nani-qwen-3.5-2B

A fine-tuned Qwen3.5-2B for crypto wallet tool calling. Trained with Unsloth LoRA on 2,029 examples covering 103 blockchain tools from agentek.

Quickstart

# Download the GGUF
huggingface-cli download NaniDAO/nani-qwen-3.5-2B-gguf-q4km \
  --local-dir ~/models/nani-2b-q4km

# Register with Ollama
cd ~/models/nani-2b-q4km
ollama create nani -f Modelfile

# Run
ollama run nani --system "$(cat system-prompt.txt)" "resolve vitalik.eth"

Model Details

Base model Qwen3.5-2B
Method LoRA (r=16, alpha=16, dropout=0.05)
Training data 2,029 examples, 103 tools
Epochs 3 (optimal — 4th showed no improvement)
Hardware 1x T4 GPU (Kaggle)
Quantization Q4_K_M (this repo)
Context 4096 tokens
License Same as Qwen3.5-2B

Eval Results

Evaluated on 50 held-out examples. Base = Qwen3.5-2B without fine-tuning.

Metric Base Nani Delta
Tool call accuracy 98.0% 96.0% -2.0%
Correct function 75.5% 81.6% +6.1%
Correct params 67.3% 71.4% +4.1%
Format valid 100.0% 100.0% 0.0%
Has <think> block 28.0% 100.0% +72.0%
No-tool correct 0.0% 100.0% +100.0%

The model learned to pick the right tool (+6.1%), use correct parameters (+4.1%), reason before acting (a <think> block in 100% of outputs), and recognize when NOT to call a tool (+100%). The small drop in tool call accuracy (-2%) is mostly sampling noise: 3 of the 4 "failures" succeed on rerun.

Tool Call Format

The model uses Qwen3.5's native XML tool calling format:

<think>
The user wants to resolve an ENS name. I'll use resolveENS.
</think>

<tool_call>
<function=resolveENS>
<parameter=name>vitalik.eth</parameter>
</function>
</tool_call>

Tool results are passed back as tool role messages, then the model generates a final response.

System Prompt Format

Tools must be defined in the system prompt as newline-separated JSON inside <tools> tags. This matches the training data format exactly:

# Tools

You have access to the following functions:

<tools>
{"type":"function","function":{"name":"resolveENS","description":"Resolves an ENS name to an Ethereum address","parameters":{"type":"object","properties":{"name":{"type":"string","description":"The ENS name to resolve"}},"required":["name"]}}}
{"type":"function","function":{"name":"getBalance","description":"Get the native token (ETH) balance for an address","parameters":{"type":"object","properties":{"address":{"type":"string","description":"The wallet address (0x...)"},"chainId":{"type":"number","description":"Chain ID (1=Ethereum, 8453=Base)"}},"required":["address"]}}}
</tools>

If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>

You are Nani, a crypto wallet assistant.

Keep tool schemas simple — use {"type": "string", "description": "..."} per property. Complex schemas with anyOf, $ref, or nested objects confuse the 2B model.
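The prompt above can be assembled programmatically. A minimal Python sketch (build_system_prompt is a hypothetical helper, not part of this repo; the reminder text is copied verbatim from the format above):

```python
import json

REMINDER = """If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>
"""

def build_system_prompt(tools: list[dict]) -> str:
    # One compact JSON object per line inside <tools> tags,
    # matching the training data format described above.
    tool_lines = "\n".join(json.dumps(t, separators=(",", ":")) for t in tools)
    return (
        "# Tools\n\n"
        "You have access to the following functions:\n\n"
        f"<tools>\n{tool_lines}\n</tools>\n\n"
        f"{REMINDER}\n"
        "You are Nani, a crypto wallet assistant."
    )
```

Pass the same tool dicts you would send to any OpenAI-style API; json.dumps with compact separators keeps each schema on a single line as the training data expects.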

Supported Tools

Trained on 103 tools from agentek. Top tools by training examples:

Tool Examples Category
intentSwap 183 DEX trading
intentTransfer 157 Token transfers
resolveENS 71 ENS resolution
getBalance 64 Balance queries
getBalanceOf 59 ERC20 balances
resolveWNS 58 WNS resolution
getCryptoPrice 55 Price data
lookupENS 54 Reverse ENS
getQuote 53 Swap quotes
getFearAndGreedIndex 50 Market sentiment

Full coverage includes: ENS/WNS, ERC20, Uniswap V3, Aave, bridging (Across), security (ScamSniffer), Blockscout explorer, gas estimation, DefiLlama yields, NFTs, and more.

Ollama Modelfile

FROM ./nani-qwen-3.5-2B-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER num_ctx 4096

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}<|im_end|>
{{ else if eq .Role "tool" }}<|im_start|>tool
{{ .Content }}<|im_end|>
{{ end }}{{- end }}<|im_start|>assistant
"""

SYSTEM """You are Nani, a crypto wallet assistant."""

Do NOT add </tool_call> as a stop token — it prevents the model from outputting the closing tag, breaking tool call parsing.

Ollama API Usage

Call Ollama directly and parse tool calls from the text output:

curl http://localhost:11434/api/chat -d '{
  "model": "nani",
  "messages": [
    {"role": "system", "content": "<your system prompt with tools>"},
    {"role": "user", "content": "resolve vitalik.eth"}
  ],
  "stream": false
}'

Parse <tool_call> XML from message.content, execute the tool, then send the result back:

curl http://localhost:11434/api/chat -d '{
  "model": "nani",
  "messages": [
    {"role": "system", "content": "<your system prompt with tools>"},
    {"role": "user", "content": "resolve vitalik.eth"},
    {"role": "assistant", "content": "<tool_call>\n<function=resolveENS>\n<parameter=name>vitalik.eth</parameter>\n</function>\n</tool_call>"},
    {"role": "tool", "content": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"}
  ],
  "stream": false
}'
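The same two-step exchange can be sketched in Python with only the standard library (chat_once and append_tool_result are hypothetical helpers, not part of this repo; assumes Ollama is listening on localhost:11434):

```python
import json
import urllib.request

def chat_once(messages, url="http://localhost:11434/api/chat", model="nani"):
    # Single non-streaming round trip; Ollama returns the full reply text
    # in message.content (tool calls arrive as <tool_call> XML in that text).
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"]

def append_tool_result(messages, assistant_reply, tool_output):
    # Feed a tool result back exactly as in the curl example above:
    # the raw <tool_call> turn as "assistant", the result as "tool".
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "tool", "content": tool_output},
    ]
```

Call chat_once, execute the parsed tool yourself, then call chat_once again with the messages returned by append_tool_result to get the final natural-language answer.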

Tool Call Parsing

Regex to extract tool calls from model output:

// Matches <function=NAME>...</function>; the trailing </tool_call> is
// matched optionally so a missing closing tag still parses.
const regex = /<tool_call>\s*<function=(\w+)>([\s\S]*?)<\/function>\s*(?:<\/tool_call>)?/g;
// Extracts <parameter=NAME>value</parameter> pairs from a function body.
const paramRegex = /<parameter=(\w+)>([\s\S]*?)<\/parameter>/g;

All parameter values are strings — coerce to numbers/booleans based on the tool's schema before execution.
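A Python equivalent of the two regexes above, with schema-based coercion (parse_tool_calls and coerce are hypothetical helpers; assumes each tool's JSON schema properties are available by tool name):

```python
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=(\w+)>([\s\S]*?)</function>\s*(?:</tool_call>)?")
PARAM_RE = re.compile(r"<parameter=(\w+)>([\s\S]*?)</parameter>")

def coerce(value: str, schema: dict):
    # All captured values are strings; coerce per the parameter's JSON schema type.
    t = schema.get("type", "string")
    v = value.strip()
    if t == "integer":
        return int(v)
    if t == "number":
        return float(v) if "." in v else int(v)
    if t == "boolean":
        return v.lower() == "true"
    return v

def parse_tool_calls(text: str, schemas: dict) -> list[dict]:
    # schemas maps tool name -> {"properties": {...}} from its tool definition.
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        props = schemas.get(name, {}).get("properties", {})
        params = {k: coerce(v, props.get(k, {}))
                  for k, v in PARAM_RE.findall(body)}
        calls.append({"name": name, "params": params})
    return calls
```

Unknown tool names or parameters fall back to plain strings, so a hallucinated call still parses and can be rejected by your own validation layer.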

Training Config

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"],
)

# Tokenize with enable_thinking=True (critical)
text = tokenizer.apply_chat_template(messages, tokenize=False,
    add_generation_prompt=False, enable_thinking=True)

TrainingArguments(
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    optim="adamw_8bit",
)

enable_thinking=True is critical. With it disabled (v2), tool accuracy regressed by 64 points because the tokenizer mangled the <think> blocks present in 97% of the training data.

Local Testing

See nani-local for a Vite + React test app that connects to Ollama and executes real agentek tools with streaming, tool call visualization, and regex-based tool filtering (5 tools per message).

HuggingFace Repos

Repo Format Size
NaniDAO/nani-qwen-3.5-2B Merged fp16 ~5GB
NaniDAO/nani-qwen-3.5-2B-gguf-q4km GGUF Q4_K_M ~1.3GB

Vision Support

Qwen3.5-2B is a vision-language model (VLM), and the base weights include a vision encoder. However, this GGUF only contains the text/language model weights. The vision encoder (mmproj) was not included in the GGUF conversion.

Current state:

  • Text-based tool calling works fully via Ollama
  • Image input is NOT supported in this GGUF
  • Ollama does not yet support Qwen3.5's mmproj format
  • llama.cpp supports it with separate --mmproj flag, but requires reconversion

To add vision later: reconvert from the merged fp16 model using llama.cpp's convert_hf_to_gguf.py, which extracts the mmproj automatically, then run llama.cpp directly with --model nani.gguf --mmproj nani-mmproj-f16.gguf.

Known Limitations

  • Text-only — vision encoder not included in this GGUF (see above)
  • 2B model size — sometimes hallucinates tool names or picks wrong parameters. Works best with 5 or fewer tools in the system prompt.
  • Simple schemas only — complex JSON schemas with anyOf, $ref, or nested objects confuse the model. Keep tool definitions flat: {"type": "string", "description": "..."} per property.
  • Training data imbalance — 78 of 103 tools have fewer than 15 examples. Performance on underrepresented tools is weaker.

Future Work

The model has plateaued with the current ~2K dataset. Next improvements require more data:

  • Expand to 4,000-5,000 examples
  • Cover all 150+ agentek tools (currently 103)
  • Balance tool distribution (cap at 50, min 20-25 per tool)
  • More multi-tool chains (currently 47, target 150+)
  • More no-tool conversations (currently 94, target 250+)
  • Parameter edge cases (optional params, wei values, chain variants)
  • Vision encoder GGUF extraction + Ollama/llama.cpp support