Instructions to use AlphaExaAI/ExaMind with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AlphaExaAI/ExaMind with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="AlphaExaAI/ExaMind") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("AlphaExaAI/ExaMind") model = AutoModelForCausalLM.from_pretrained("AlphaExaAI/ExaMind") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AlphaExaAI/ExaMind with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "AlphaExaAI/ExaMind" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AlphaExaAI/ExaMind", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/AlphaExaAI/ExaMind
- SGLang
How to use AlphaExaAI/ExaMind with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "AlphaExaAI/ExaMind" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AlphaExaAI/ExaMind", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "AlphaExaAI/ExaMind" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AlphaExaAI/ExaMind", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use AlphaExaAI/ExaMind with Docker Model Runner:
docker model run hf.co/AlphaExaAI/ExaMind
🧠 ExaMind
Advanced Open-Source AI by AlphaExaAI
ExaMind is an advanced open-source conversational AI model developed by the AlphaExaAI team. Designed for secure, structured, and professional AI assistance with strong identity enforcement and production-ready deployment stability.
📌 Model Overview
| Property | Details |
|---|---|
| Model Name | ExaMind |
| Version | V2-Final |
| Developer | AlphaExaAI |
| Base Architecture | Qwen2.5-Coder-7B |
| Parameters | 7.62 Billion (~8B) |
| Precision | FP32 ( |
| Context Window | 32,768 tokens (supports up to 128K with RoPE scaling) |
| License | Apache 2.0 |
| Languages | Multilingual (English preferred) |
| Deployment | ✅ CPU & GPU compatible |
✨ Key Capabilities
- 🖥️ Advanced Programming — Code generation, debugging, architecture design, and code review
- 🧩 Complex Problem Solving — Multi-step logical reasoning and deep technical analysis
- 🔒 Security-First Design — Built-in prompt injection resistance and identity enforcement
- 🌍 Multilingual — Supports all major world languages, optimized for English
- 💬 Conversational AI — Natural, structured, and professional dialogue
- 🏗️ Scalable Architecture — Secure software engineering and system design guidance
- ⚡ CPU Deployable — Runs on CPU nodes without GPU requirement
📊 Benchmarks
General Knowledge & Reasoning
| Benchmark | Setting | Score |
|---|---|---|
| MMLU – World Religions | 0-shot | 94.8% |
| MMLU – Overall | 5-shot | 72.1% |
| ARC-Challenge | 25-shot | 68.4% |
| HellaSwag | 10-shot | 78.9% |
| TruthfulQA | 0-shot | 61.2% |
| Winogrande | 5-shot | 74.5% |
Code Generation
| Benchmark | Setting | Score |
|---|---|---|
| HumanEval | pass@1 | 79.3% |
| MBPP | pass@1 | 71.8% |
| MultiPL-E (Python) | pass@1 | 76.5% |
| DS-1000 | pass@1 | 48.2% |
Math & Reasoning
| Benchmark | Setting | Score |
|---|---|---|
| GSM8K | 8-shot CoT | 82.4% |
| MATH | 4-shot | 45.7% |
🔐 Prompt Injection Resistance
| Test | Details |
|---|---|
| Test Set Size | 50 adversarial prompts |
| Attack Type | Instruction override / identity manipulation |
| Resistance Rate | 92% |
| Method | Custom red-teaming with jailbreak & override attempts |
Evaluation performed using
lm-eval-harnesson CPU. Security tests performed using custom adversarial prompt suite.
🚀 Quick Start
Installation
pip install transformers torch accelerate
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_path = "AlphaExaAI/ExaMind"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
device_map="auto"
)
messages = [
{"role": "user", "content": "Explain how to secure a REST API."}
]
inputs = tokenizer.apply_chat_template(
messages,
return_tensors="pt",
add_generation_prompt=True
).to(model.device)
outputs = model.generate(
inputs,
max_new_tokens=512,
temperature=0.7,
top_p=0.8,
top_k=20,
repetition_penalty=1.1
)
response = tokenizer.decode(
outputs[0][inputs.shape[-1]:],
skip_special_tokens=True
)
print(response)
CPU Deployment
model = AutoModelForCausalLM.from_pretrained(
"AlphaExaAI/ExaMind",
torch_dtype=torch.float32,
device_map="cpu"
)
Using with llama.cpp (GGUF — Coming Soon)
# GGUF quantized versions will be released for efficient CPU inference
# Stay tuned for Q4_K_M, Q5_K_M, and Q8_0 variants
🏗️ Architecture
ExaMind-V2-Final
├── Architecture: Qwen2ForCausalLM (Transformer)
├── Hidden Size: 3,584
├── Intermediate Size: 18,944
├── Layers: 28
├── Attention Heads: 28
├── KV Heads: 4 (GQA)
├── Vocab Size: 152,064
├── Max Position: 32,768 (extendable to 128K)
├── Activation: SiLU
├── RoPE θ: 1,000,000
└── Precision: FP32 / FP16 compatible
🛠️ Training Methodology
ExaMind was developed using a multi-stage training pipeline:
| Stage | Method | Description |
|---|---|---|
| Stage 1 | Base Model Selection | Qwen2.5-Coder-7B as foundation |
| Stage 2 | Supervised Fine-Tuning (SFT) | Training on curated 2026 datasets |
| Stage 3 | LoRA Adaptation | Low-Rank Adaptation for efficient specialization |
| Stage 4 | Identity Enforcement | Hardcoded identity alignment and security tuning |
| Stage 5 | Security Alignment | Prompt injection resistance training |
| Stage 6 | Chat Template Integration | Custom Jinja2 template with system prompt |
📚 Training Data
Public Data Sources
- Programming and code corpora (GitHub, StackOverflow)
- General web text and knowledge bases
- Technical documentation and research papers
- Multilingual text data
Custom Alignment Data
- Identity enforcement instruction dataset
- Security-focused instruction tuning samples
- Prompt injection resistance adversarial examples
- Structured conversational datasets
- Complex problem-solving chains
⚠️ No private user data was used in training. All data was collected from public sources or synthetically generated.
🔒 Security Features
ExaMind includes built-in security measures:
- Identity Lock — The model maintains its ExaMind identity and cannot be tricked into impersonating other models
- Prompt Injection Resistance — 92% resistance rate against instruction override attacks
- System Prompt Protection — Refuses to reveal internal configuration or system prompts
- Safe Output Generation — Prioritizes safety and secure development practices
- Hallucination Reduction — States assumptions and avoids fabricating information
📋 Model Files
| File | Size | Description |
|---|---|---|
model.safetensors |
~29 GB | Model weights (FP32) |
config.json |
1.4 KB | Model configuration |
tokenizer.json |
11 MB | Tokenizer vocabulary |
tokenizer_config.json |
663 B | Tokenizer settings |
generation_config.json |
241 B | Default generation parameters |
chat_template.jinja |
1.4 KB | Chat template with system prompt |
🗺️ Roadmap
- ExaMind V1 — Initial release
- ExaMind V2-Final — Production-ready with security alignment
- ExaMind V2-GGUF — Quantized versions for CPU inference
- ExaMind V3 — Extended context (128K), improved reasoning
- ExaMind-Code — Specialized coding variant
- ExaMind-Vision — Multimodal capabilities
🤝 Contributing
We welcome contributions from the community! ExaMind is fully open-source and we're excited to collaborate.
How to Contribute
- Fork the repository on GitHub
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Areas We Need Help
- 🧪 Benchmark evaluation on additional datasets
- 🌍 Multilingual evaluation and improvement
- 📝 Documentation and tutorials
- 🔧 Quantization and optimization
- 🛡️ Security testing and red-teaming
📄 License
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
You are free to:
- ✅ Use commercially
- ✅ Modify and distribute
- ✅ Use privately
- ✅ Patent use
📬 Contact
- Organization: AlphaExaAI
- GitHub: github.com/hleliofficiel/AlphaExaAI
- Email: h.hleli@tuta.io
Built with ❤️ by AlphaExaAI Team — 2026
Advancing open-source AI, one model at a time.
- Downloads last month
- 7
Model tree for AlphaExaAI/ExaMind
Evaluation results
- MMLU World Religions (0-shot) on MMLUself-reported94.800
- HumanEval pass@1 on HumanEvalself-reported79.300