Instructions to use AlphaExaAI/ExaMind with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use AlphaExaAI/ExaMind with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AlphaExaAI/ExaMind")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AlphaExaAI/ExaMind")
model = AutoModelForCausalLM.from_pretrained("AlphaExaAI/ExaMind")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use AlphaExaAI/ExaMind with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AlphaExaAI/ExaMind"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AlphaExaAI/ExaMind",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/AlphaExaAI/ExaMind

SGLang

How to use AlphaExaAI/ExaMind with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AlphaExaAI/ExaMind" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AlphaExaAI/ExaMind",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AlphaExaAI/ExaMind" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AlphaExaAI/ExaMind",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AlphaExaAI/ExaMind with Docker Model Runner:
```
docker model run hf.co/AlphaExaAI/ExaMind
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

🧠 ExaMind

Advanced Open-Source AI by AlphaExaAI

ExaMind is an advanced open-source conversational AI model developed by the AlphaExaAI team. Designed for secure, structured, and professional AI assistance with strong identity enforcement and production-ready deployment stability.

🚀 Get Started · 📊 Benchmarks · 🤝 Contributing · 📄 License

📌 Model Overview

Property	Details
Model Name	ExaMind
Version	V2-Final
Developer	AlphaExaAI
Base Architecture	Qwen2.5-Coder-7B
Parameters	7.62 Billion (~8B)
Precision	FP32 (~~29GB) / FP16 (~~15GB)
Context Window	32,768 tokens (supports up to 128K with RoPE scaling)
License	Apache 2.0
Languages	Multilingual (English preferred)
Deployment	✅ CPU & GPU compatible

✨ Key Capabilities

🖥️ Advanced Programming — Code generation, debugging, architecture design, and code review
🧩 Complex Problem Solving — Multi-step logical reasoning and deep technical analysis
🔒 Security-First Design — Built-in prompt injection resistance and identity enforcement
🌍 Multilingual — Supports all major world languages, optimized for English
💬 Conversational AI — Natural, structured, and professional dialogue
🏗️ Scalable Architecture — Secure software engineering and system design guidance
⚡ CPU Deployable — Runs on CPU nodes without GPU requirement

📊 Benchmarks

General Knowledge & Reasoning

Benchmark	Setting	Score
MMLU – World Religions	0-shot	94.8%
MMLU – Overall	5-shot	72.1%
ARC-Challenge	25-shot	68.4%
HellaSwag	10-shot	78.9%
TruthfulQA	0-shot	61.2%
Winogrande	5-shot	74.5%

Code Generation

Benchmark	Setting	Score
HumanEval	pass@1	79.3%
MBPP	pass@1	71.8%
MultiPL-E (Python)	pass@1	76.5%
DS-1000	pass@1	48.2%

Math & Reasoning

Benchmark	Setting	Score
GSM8K	8-shot CoT	82.4%
MATH	4-shot	45.7%

🔐 Prompt Injection Resistance

Test	Details
Test Set Size	50 adversarial prompts
Attack Type	Instruction override / identity manipulation
Resistance Rate	92%
Method	Custom red-teaming with jailbreak & override attempts

Evaluation performed using lm-eval-harness on CPU. Security tests performed using custom adversarial prompt suite.

🚀 Quick Start

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "AlphaExaAI/ExaMind"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain how to secure a REST API."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.1
)

response = tokenizer.decode(
    outputs[0][inputs.shape[-1]:],
    skip_special_tokens=True
)
print(response)

CPU Deployment

model = AutoModelForCausalLM.from_pretrained(
    "AlphaExaAI/ExaMind",
    torch_dtype=torch.float32,
    device_map="cpu"
)

Using with llama.cpp (GGUF — Coming Soon)

# GGUF quantized versions will be released for efficient CPU inference
# Stay tuned for Q4_K_M, Q5_K_M, and Q8_0 variants

🏗️ Architecture

ExaMind-V2-Final
├── Architecture: Qwen2ForCausalLM (Transformer)
├── Hidden Size: 3,584
├── Intermediate Size: 18,944
├── Layers: 28
├── Attention Heads: 28
├── KV Heads: 4 (GQA)
├── Vocab Size: 152,064
├── Max Position: 32,768 (extendable to 128K)
├── Activation: SiLU
├── RoPE θ: 1,000,000
└── Precision: FP32 / FP16 compatible

🛠️ Training Methodology

ExaMind was developed using a multi-stage training pipeline:

Stage	Method	Description
Stage 1	Base Model Selection	Qwen2.5-Coder-7B as foundation
Stage 2	Supervised Fine-Tuning (SFT)	Training on curated 2026 datasets
Stage 3	LoRA Adaptation	Low-Rank Adaptation for efficient specialization
Stage 4	Identity Enforcement	Hardcoded identity alignment and security tuning
Stage 5	Security Alignment	Prompt injection resistance training
Stage 6	Chat Template Integration	Custom Jinja2 template with system prompt

📚 Training Data

Public Data Sources

Programming and code corpora (GitHub, StackOverflow)
General web text and knowledge bases
Technical documentation and research papers
Multilingual text data

Custom Alignment Data

Identity enforcement instruction dataset
Security-focused instruction tuning samples
Prompt injection resistance adversarial examples
Structured conversational datasets
Complex problem-solving chains

⚠️ No private user data was used in training. All data was collected from public sources or synthetically generated.

🔒 Security Features

ExaMind includes built-in security measures:

Identity Lock — The model maintains its ExaMind identity and cannot be tricked into impersonating other models
Prompt Injection Resistance — 92% resistance rate against instruction override attacks
System Prompt Protection — Refuses to reveal internal configuration or system prompts
Safe Output Generation — Prioritizes safety and secure development practices
Hallucination Reduction — States assumptions and avoids fabricating information

📋 Model Files

File	Size	Description
`model.safetensors`	~29 GB	Model weights (FP32)
`config.json`	1.4 KB	Model configuration
`tokenizer.json`	11 MB	Tokenizer vocabulary
`tokenizer_config.json`	663 B	Tokenizer settings
`generation_config.json`	241 B	Default generation parameters
`chat_template.jinja`	1.4 KB	Chat template with system prompt

🗺️ Roadmap

ExaMind V1 — Initial release
ExaMind V2-Final — Production-ready with security alignment
ExaMind V2-GGUF — Quantized versions for CPU inference
ExaMind V3 — Extended context (128K), improved reasoning
ExaMind-Code — Specialized coding variant
ExaMind-Vision — Multimodal capabilities

🤝 Contributing

We welcome contributions from the community! ExaMind is fully open-source and we're excited to collaborate.

How to Contribute

Fork the repository on GitHub
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Areas We Need Help

🧪 Benchmark evaluation on additional datasets
🌍 Multilingual evaluation and improvement
📝 Documentation and tutorials
🔧 Quantization and optimization
🛡️ Security testing and red-teaming

📄 License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

You are free to:

✅ Use commercially
✅ Modify and distribute
✅ Use privately
✅ Patent use

📬 Contact

Organization: AlphaExaAI
GitHub: github.com/hleliofficiel/AlphaExaAI
Email: h.hleli@tuta.io

Built with ❤️ by AlphaExaAI Team — 2026

Advancing open-source AI, one model at a time.

Downloads last month: 12

Safetensors

Model size

8B params

Tensor type

F32

Model tree for AlphaExaAI/ExaMind

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-Coder-7B

Finetuned

(102)

this model

Quantizations

2 models

Evaluation results

MMLU World Religions (0-shot) on MMLU
self-reported

94.800
HumanEval pass@1 on HumanEval
self-reported

79.300