# Context-1 GGUF Quantizations

GGUF quantized versions of `chromadb/context-1`, converted for inference with llama.cpp, LM Studio, and other GGUF-compatible engines.
## About Context-1
Context-1 is a 20.9B parameter Mixture-of-Experts (MoE) causal language model developed by Chroma. It uses the `GptOssForCausalLM` architecture with 32 experts, 4 of which are active per token, so per-token compute stays close to that of a ~3B dense model while the full parameter count provides the model's capacity.
| Detail | Value |
|---|---|
| Architecture | GptOssForCausalLM (MoE) |
| Total Parameters | ~20.9B |
| Active Parameters | ~3B per token (4 of 32 experts) |
| Hidden Size | 2880 |
| License | Apache-2.0 |
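As a rough back-of-envelope, GGUF file size scales with total parameters times bits per weight, while the active parameter count governs compute per token. A minimal sketch; the bits-per-weight figures below are approximate averages for llama.cpp quant types, not measurements from this repository:

```python
# Approximate average bits per weight for common llama.cpp quant types.
# These are ballpark figures; real files vary by tensor mix and metadata.
APPROX_BPW = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q2_K": 2.6,
}

def est_size_gb(total_params: float, quant: str) -> float:
    """Estimated GGUF file size in GB: params x bits/weight / 8 bits per byte."""
    return total_params * APPROX_BPW[quant] / 8 / 1e9

for q in ("F16", "Q4_K_M"):
    print(f"{q}: ~{est_size_gb(20.9e9, q):.1f} GB")
```

For this ~20.9B model, F16 lands around 42 GB, while Q4_K_M comes in around a third of that, which is why the 4-bit K-quants are a common default.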
## Quantization
Quantized from F16 weights using llama.cpp with importance matrix (imatrix) calibration, running on NVIDIA H100 GPUs via Modal. All standard K-quant and I-quant variants are provided.
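The same pipeline can be reproduced with llama.cpp's own tools. A sketch, assuming you have a source F16 GGUF and a calibration text file (the file names here are hypothetical, not artifacts shipped in this repo):

```shell
# 1. Build an importance matrix from calibration text (hypothetical file names).
./llama-imatrix -m context-1-F16.gguf -f calibration.txt -o context-1.imatrix -ngl 99

# 2. Quantize, with the imatrix guiding which weights keep more precision.
./llama-quantize --imatrix context-1.imatrix context-1-F16.gguf context-1-Q4_K_M.gguf Q4_K_M
```

The imatrix step matters most for the smaller quants (3-bit and below), where uncalibrated quantization loses noticeably more quality.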
## Usage

### llama.cpp

```shell
# Download your preferred quant
huggingface-cli download nicolasembleton/context-1-GGUF context-1-Q4_K_M.gguf --local-dir .

# Run
./llama-cli -m context-1-Q4_K_M.gguf -p "Your prompt here" -ngl 99
```
### LM Studio

Search for `nicolasembleton/context-1-GGUF` in LM Studio's model browser and download the desired quantization.
### Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Downloads the chosen quant from the Hub (cached after the first run).
llm = Llama.from_pretrained(
    repo_id="nicolasembleton/context-1-GGUF",
    filename="context-1-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```
## Chat Template

This model uses a custom chat template based on the OpenAI gpt-oss (Harmony) format, with support for multi-channel output (analysis, commentary, final), tool calling, and built-in browser/python tools. The template is embedded in the GGUF files.
Format overview:

```
<|start|>system<|message|>...<|end|>
<|start|>developer<|message|>...<|end|>
<|start|>user<|message|>...<|end|>
<|start|>assistant<|channel|>final<|message|>...<|end|>
```
For the full template, see `chat_template.jinja` in the original repository.
## License

Apache-2.0, the same as the original model.