KeyLM-75M
KeyLM-75M is a 75M-parameter base language model trained from scratch on approximately 18 billion tokens. That training budget is a small fraction of what comparable small models use: SmolLM-135M was trained on roughly 600B tokens and SmolLM2-135M on roughly 2T.
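To put the training budget in proportion, the short calculation below compares the token counts quoted above and derives a tokens-per-parameter figure. It uses only numbers stated in this card (the parameter count comes from the Model Summary table); the variable names are illustrative.

```python
# Token budgets as quoted in this card (approximate figures).
KEYLM_TOKENS = 18_000_000_000
KEYLM_PARAMS = 75_251_200
OTHER_BUDGETS = {
    "SmolLM-135M": 600_000_000_000,
    "SmolLM2-135M": 2_000_000_000_000,
}

# Fraction of each comparison model's token budget that KeyLM-75M saw.
fractions = {name: KEYLM_TOKENS / tokens for name, tokens in OTHER_BUDGETS.items()}
for name, frac in fractions.items():
    print(f"KeyLM-75M saw {frac:.1%} of the tokens used for {name}")

# Training tokens per model parameter.
tokens_per_param = KEYLM_TOKENS / KEYLM_PARAMS
print(f"Tokens per parameter: {tokens_per_param:.0f}")
```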
This is the base model: a text-completion model, not instruction-tuned. For chat and instruction following, use KeyLM-75M-Instruct.
Model Summary
KeyLM is a compact decoder-only transformer built on the standard small-model recipe used by Llama and Qwen3: grouped-query attention, rotary position embeddings (RoPE), SwiGLU feed-forward layers, and per-head QK-RMSNorm.
| Field | Value |
|---|---|
| Parameters | 75,251,200 |
| Layers | 24 |
| Hidden size | 512 |
| Attention heads | 8 (2 KV heads, GQA) |
| Context length | 2048 |
| Vocabulary | 12,020 (ByteLevel BPE) |
| Precision | bfloat16 |
| Training tokens | ~18B |
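As a sanity check on the table, the parameter count can be reproduced by hand. The sketch below is not taken from the model's config: it assumes untied input and output embeddings, a SwiGLU intermediate size of 1280, no bias terms, and one shared QK-RMSNorm weight vector of length `head_dim` for queries and one for keys in each layer. None of these are stated in this card; they are simply values that happen to make the arithmetic land on the reported total.

```python
# Values from the Model Summary table.
VOCAB = 12_020
HIDDEN = 512
LAYERS = 24
HEADS = 8
KV_HEADS = 2
HEAD_DIM = HIDDEN // HEADS  # 64

# ASSUMPTIONS (not stated in the card): intermediate size and untied embeddings.
INTERMEDIATE = 1280
TIED_EMBEDDINGS = False

# Attention projections: Q and O are full width, K and V are shrunk by GQA.
attn = (
    HIDDEN * HEADS * HEAD_DIM         # q_proj
    + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # k_proj + v_proj
    + HEADS * HEAD_DIM * HIDDEN       # o_proj
)
qk_norm = 2 * HEAD_DIM                # assumed: one weight vector each for Q and K
mlp = 3 * HIDDEN * INTERMEDIATE       # SwiGLU: gate, up, and down projections
layer_norms = 2 * HIDDEN              # pre-attention and pre-MLP RMSNorm

per_layer = attn + qk_norm + mlp + layer_norms
embeddings = VOCAB * HIDDEN * (1 if TIED_EMBEDDINGS else 2)
total = embeddings + LAYERS * per_layer + HIDDEN  # + final RMSNorm

print(f"{total:,}")
```

Under these assumptions the total matches the 75,251,200 in the table, with roughly 12.3M parameters in the two embedding matrices and the rest in the 24 transformer layers.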
How to Use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Eclipse-Senpai/KeyLM-75M"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)
inputs = tokenizer("The three primary colors are", return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=40, do_sample=True,
    temperature=0.7, top_p=0.9, repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
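Because the context length is 2048 tokens, the prompt and the generated tokens together need to fit in that window. A minimal sketch of one way to enforce this is shown below; `fit_prompt` is a hypothetical helper, not part of this repository, and keeping the most recent tokens (left truncation) is an assumption about what suits a completion model.

```python
def fit_prompt(token_ids, max_new_tokens, context_length=2048):
    """Keep the most recent prompt tokens so prompt + generation fits the window.

    Hypothetical helper: drops tokens from the start of the prompt if needed.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return token_ids[-budget:]


# A 3000-token prompt is cut to the last 2008 tokens when generating 40 new ones.
long_prompt = list(range(3000))
trimmed = fit_prompt(long_prompt, max_new_tokens=40)
print(len(trimmed), trimmed[0])
```

The helper works on a plain list of token IDs, such as the `input_ids` list a tokenizer returns for a single string, so it can be applied before the IDs are converted to tensors.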
Evaluation
On zero-shot benchmarks (lm_eval; accuracy, with length-normalized accuracy for ARC and HellaSwag), KeyLM is modest but above random on ARC, HellaSwag, and PIQA, and at or near chance on MMLU, WinoGrande, and OpenBookQA.
| Benchmark | KeyLM-75M (base) | KeyLM-75M-Instruct | Random |
|---|---|---|---|
| IFEval (4-metric avg) | — | 17.85 | — |
| MMLU | 23.0 | 24.0 | 25.0 |
| ARC (avg) | 29.9 | 30.8 | 25.0 |
| HellaSwag | 29.7 | 31.0 | 25.0 |
| PIQA | 60.0 | 61.3 | 50.0 |
| WinoGrande | 48.4 | 48.3 | 50.0 |
| OpenBookQA | 25.0 | 25.0 | 25.0 |
Instruction tuning leaves knowledge and reasoning roughly unchanged; its main effect is the instruction-following ability (IFEval) that the base model lacks.
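The gap to the random baseline is easier to read as a margin. The snippet below recomputes it from the table above; the scores are copied from the table, and the `margins` dictionary is only an illustrative way to organize them.

```python
# (base, instruct, random) accuracy in percent, copied from the table above.
SCORES = {
    "MMLU": (23.0, 24.0, 25.0),
    "ARC (avg)": (29.9, 30.8, 25.0),
    "HellaSwag": (29.7, 31.0, 25.0),
    "PIQA": (60.0, 61.3, 50.0),
    "WinoGrande": (48.4, 48.3, 50.0),
    "OpenBookQA": (25.0, 25.0, 25.0),
}

# Margin over the random baseline, in percentage points, for base and instruct.
margins = {
    name: (round(base - rand, 1), round(instruct - rand, 1))
    for name, (base, instruct, rand) in SCORES.items()
}

for name, (base_margin, instruct_margin) in margins.items():
    print(f"{name:<12} base {base_margin:+.1f}  instruct {instruct_margin:+.1f}")
```

PIQA shows the largest margin (about 10 points for the base model), while MMLU and WinoGrande fall slightly below the random baseline.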
Training
KeyLM-75M was pretrained from random initialization on approximately 18B tokens, drawn from a weighted mixture of public datasets streamed through a deterministic curriculum.
| Category | Share | Sources |
|---|---|---|
| Formal / quality | ~30% | FineWeb-Edu, Wikipedia |
| Casual / social | ~30% | Reddit comments, StackExchange |
| Conversational | ~25% | WildChat, UltraChat, LMSYS-Chat, OASST2 |
| Structured knowledge | ~5% | Cosmopedia |
| Typo augmentation | ~10% | Synthetic (contrastive) |
The instruction-tuned model built on this base is available at KeyLM-75M-Instruct.
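The mixture shares translate into approximate token budgets per category. The sketch below computes them from the table and shows one generic way to draw a reproducible, weighted sequence of sources using a seeded random generator. The `sample_schedule` function is hypothetical and is not the curriculum used to train KeyLM, which this card does not describe in detail.

```python
import random

TOTAL_TOKENS = 18_000_000_000  # approximate, per this card

# Approximate shares in percent, copied from the table above.
SHARES = {
    "Formal / quality": 30,
    "Casual / social": 30,
    "Conversational": 25,
    "Structured knowledge": 5,
    "Typo augmentation": 10,
}
assert sum(SHARES.values()) == 100

# Approximate token budget per category.
budgets = {name: TOTAL_TOKENS * pct // 100 for name, pct in SHARES.items()}
for name, tokens in budgets.items():
    print(f"{name:<22} ~{tokens / 1e9:.1f}B tokens")


def sample_schedule(num_batches, seed=0):
    """Hypothetical: draw a reproducible sequence of categories by weight."""
    rng = random.Random(seed)  # fixed seed makes the order deterministic
    names = list(SHARES)
    weights = [SHARES[name] for name in names]
    return rng.choices(names, weights=weights, k=num_batches)


schedule = sample_schedule(10, seed=1)
```

With the same seed, `sample_schedule` returns the same order on every run, which is one simple way to make a streamed mixture repeatable.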
Limitations
- Minimal world knowledge. Not suitable for factual question answering, reasoning, math, or code.
- Base model: it completes text and does not follow instructions or hold a conversation. Use the Instruct version for chat.
- English only.
- No safety alignment. Apply your own filtering before any user-facing use.
License
Apache 2.0. The weights are trained from scratch and free to use, modify, and redistribute.
Citation
@misc{keylm75m2026,
title = {KeyLM-75M: a from-scratch small language model},
author = {Eclipse-Senpai},
year = {2026},
howpublished = {\url{https://huggingface.co/Eclipse-Senpai/KeyLM-75M}}
}