Instructions to use Pinkstack/Luau-coder-v2-3B-base-32k with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Pinkstack/Luau-coder-v2-3B-base-32k with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

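pipe = pipeline("text-generation", model="Pinkstack/Luau-coder-v2-3B-base-32k")
# Base model: pass a plain-text prefix to continue instead of chat messages
prompt = "-- Returns the sum of every number in the array\nlocal function sum(values)\n"
print(pipe(prompt, max_new_tokens=64)[0]["generated_text"])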

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pinkstack/Luau-coder-v2-3B-base-32k")
model = AutoModelForCausalLM.from_pretrained("Pinkstack/Luau-coder-v2-3B-base-32k", device_map="auto")
# Base model: tokenize a plain-text prefix instead of applying a chat template
prompt = "-- Returns the sum of every number in the array\nlocal function sum(values)\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use Pinkstack/Luau-coder-v2-3B-base-32k with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Pinkstack/Luau-coder-v2-3B-base-32k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pinkstack/Luau-coder-v2-3B-base-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Pinkstack/Luau-coder-v2-3B-base-32k

SGLang

How to use Pinkstack/Luau-coder-v2-3B-base-32k with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Pinkstack/Luau-coder-v2-3B-base-32k" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pinkstack/Luau-coder-v2-3B-base-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Pinkstack/Luau-coder-v2-3B-base-32k" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pinkstack/Luau-coder-v2-3B-base-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Pinkstack/Luau-coder-v2-3B-base-32k with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Pinkstack/Luau-coder-v2-3B-base-32k to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Pinkstack/Luau-coder-v2-3B-base-32k to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Pinkstack/Luau-coder-v2-3B-base-32k to start chatting

Load model with FastModel

# Install first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Pinkstack/Luau-coder-v2-3B-base-32k",
    max_seq_length=2048,
)

Docker Model Runner
How to use Pinkstack/Luau-coder-v2-3B-base-32k with Docker Model Runner:
```
docker model run hf.co/Pinkstack/Luau-coder-v2-3B-base-32k
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Note: this is not a chat model. It is the base model, intended for further fine-tuning. A chat model is coming soon and will be released under a different repo, so stay tuned! This page will be updated once that model is out.
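Because this is a base model, a plain-text completion request is likely a better fit than a chat request. The sketch below only builds the JSON body for an OpenAI-compatible `/v1/completions` endpoint, which we assume the serving stack exposes. The helper name `build_completion_request` and the Luau prefix are hypothetical and purely illustrative.

```python
import json

MODEL_ID = "Pinkstack/Luau-coder-v2-3B-base-32k"


def build_completion_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build a plain-completion request body (no chat messages, no roles)."""
    return {"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens}


# A base model continues text, so the prompt is the start of a Luau file
# rather than a question addressed to an assistant.
luau_prefix = (
    "-- Returns the sum of every number in the array\n"
    "local function sum(values: {number}): number\n"
)

payload = build_completion_request(luau_prefix)
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent as the request body with curl or any HTTP client, in place of a `messages` array.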

print("Before we start")

We are not related to Roblox in any way; any mention of Roblox is purely to help people understand what the model is about. According to the Roblox website, they use Meta's Llama 3 (we assume the 70B variant) for their AI assistant. This model, while powerful, cannot come close to the performance of a 70B model.

But unlike Llama 3, this model (luau-coder-v2-3b-32k), luaucoder for short, is released under the open Apache 2.0 license.

print("Stages of pre-training")

This model was continually pre-trained in 3 stages. (Note: AllenAI states that OLMo 2 1B, the model this one is based on, was pre-trained on roughly 4 trillion tokens.)

Stage 1: Pre-training on Pinkstack/roblox-luau-corpus-text and Roblox/luau_corpus at a 4096-token context (the maximum OLMo 2 can usually reach).
Stage 2: Pre-training on boatbomber/roblox-info-dump with the RoPE scaling factor set to 4, which expanded the model's context to 16384 tokens.

!Stage 3 and onwards used added layers: the model started with 16 layers, then we merged in another 20 to make it bigger and deeper!

Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text, Roblox/luau_corpus, and wikimedia/wikipedia with the RoPE scaling factor set to 8, giving 32768 tokens of context. We mixed in wikimedia/wikipedia to hopefully improve the model's general text ability and knowledge.

In total, the model was continually pre-trained on up to 1.3B tokens, with a final loss of 1.916400.
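The context lengths and layer count quoted in the stages above follow from simple arithmetic. Here is a minimal sketch, assuming the scaling factor simply multiplies the 4096-token base context; the function name `scaled_context` is hypothetical and not part of any training code.

```python
BASE_CONTEXT = 4096  # stage 1 context length, as stated above


def scaled_context(base_context: int, rope_factor: int) -> int:
    """Context length under the simple 'base context x factor' reading."""
    return base_context * rope_factor


stages = {
    "stage 1": scaled_context(BASE_CONTEXT, 1),  # no scaling
    "stage 2": scaled_context(BASE_CONTEXT, 4),  # factor 4
    "stage 3": scaled_context(BASE_CONTEXT, 8),  # factor 8
}

# 16 original layers plus the 20 merged in before stage 3
total_layers = 16 + 20

for name, context in stages.items():
    print(f"{name}: {context} tokens")
print(f"layers from stage 3 onwards: {total_layers}")
```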

print("Use cases")

As this is a base model, there isn't much to do with it directly yet. However, you can fine-tune it on your own datasets to turn it into an instruct or chat model.
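As a rough illustration of preparing such a dataset, the sketch below turns instruction/response pairs into single training strings. The `### Instruction:` / `### Response:` template and the helper `format_example` are assumptions made for this example only; this model card does not prescribe a format, so use whatever template your fine-tuning setup expects.

```python
def format_example(instruction: str, response: str) -> str:
    """Join one instruction/response pair into a single training string.

    The section markers are a hypothetical template, not an official one.
    """
    return (
        f"### Instruction:\n{instruction.strip()}\n\n"
        f"### Response:\n{response.strip()}"
    )


# Illustrative data only; replace with your own dataset.
pairs = [
    (
        "Write a Luau function that doubles a number.",
        "local function double(n: number): number\n\treturn n * 2\nend",
    ),
]

texts = [format_example(instruction, response) for instruction, response in pairs]
print(texts[0])
```

Each string in `texts` can then be tokenized and used as one training sample by the fine-tuning framework of your choice.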

print("Notice")

This stage-3 base model did not undergo safety alignment by us, so it can generate unethical content. Any outputs generated by the LLM are your responsibility.