gemma-4-E4B-it — Scribion German Medical Extraction v5 (recall-focused) — GGUF Q4_K_M

LoRA fine-tune of google/gemma-4-E4B-it for extracting facts from German medical consultations in Scribion's exact 5-call schema (facts of the form {id,text,section,evidence} over the 24 NOTE_STATE paths). v5 is the recall-focused successor to v3 (Mediform/gemma4): it keeps v3's section-routing fixes but recovers facts that v3 omitted.
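
As a rough illustration of the {id,text,section,evidence} shape, here is a minimal validation sketch. It is not Scribion's actual validator: the function name validate_fact is hypothetical, and KNOWN_SECTIONS contains only the three section names mentioned in this card, not the full set of 24 NOTE_STATE paths.

```python
from typing import Any

# The four keys named in the schema description.
REQUIRED_KEYS = ("id", "text", "section", "evidence")

# Assumption: only the three section names that appear in this card.
# The real list of 24 NOTE_STATE paths is not reproduced here.
KNOWN_SECTIONS = {"diagnosen_gesichert", "befunde", "vitalparameter"}


def validate_fact(fact: dict[str, Any],
                  allowed_sections: set[str] = KNOWN_SECTIONS) -> list[str]:
    """Return a list of problems found in one extracted fact (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in fact:
            problems.append(f"missing key: {key}")
    for key in sorted(set(fact) - set(REQUIRED_KEYS)):
        problems.append(f"unexpected key: {key}")
    section = fact.get("section")
    if section is not None and section not in allowed_sections:
        problems.append(f"unknown section: {section}")
    return problems
```

Pass the full set of section paths as allowed_sections when checking real model output.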

Files

  • gemma4-e4b-v5-Q4_K_M.gguf — language model, Q4_K_M (~5.3 GB).
  • mmproj-gemma-4-E4B-it-Q8_0.gguf — stock multimodal projector (audio+vision), unchanged.

v5 vs v3 (held-out Scribion clips, max_tokens 4096)

clip        | v3          | v5
arztbericht | 18/22 facts | 20/22 facts (recovers Vacoped orthosis, 2 crutches, Clexane, Fersensporn, knee finding)
froehlich   | 12/17 facts | 12/17 facts
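
To put the counts above on a common scale, the sketch below converts them to recall percentages. It assumes that each "found/total" pair means reference facts recovered out of reference facts present, which this card does not state explicitly; the pooled figure is simply a micro-average over the two clips.

```python
# Counts copied from the comparison above: (facts found, reference facts).
RESULTS = {
    "arztbericht": {"v3": (18, 22), "v5": (20, 22)},
    "froehlich": {"v3": (12, 17), "v5": (12, 17)},
}


def recall(found: int, total: int) -> float:
    """Fraction of reference facts that were recovered."""
    return found / total


def pooled_recall(version: str) -> float:
    """Micro-averaged recall: sum the counts over all clips, then divide."""
    found = sum(clip[version][0] for clip in RESULTS.values())
    total = sum(clip[version][1] for clip in RESULTS.values())
    return found / total


if __name__ == "__main__":
    for name, clip in RESULTS.items():
        r3, r5 = recall(*clip["v3"]), recall(*clip["v5"])
        print(f"{name}: v3 {r3:.1%}, v5 {r5:.1%}, delta {r5 - r3:+.1%}")
    print(f"pooled: v3 {pooled_recall('v3'):.1%}, v5 {pooled_recall('v5'):.1%}")
```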

What changed in training: the share of empty per-type calls was rebalanced from 52% to 20% (removing the sparsity bias that made v3 under-extract), and the teacher targets were regenerated with an exhaustiveness hint (capture negatives, devices, already-given meds, and minor facts, without hallucinating). Routing/dedup discipline from v3 is preserved (e.g. Weber-B → diagnosen_gesichert/befunde, not vitalparameter).
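
One way such a rebalance could be done is by downsampling the empty-target examples until they make up the desired share. The sketch below is an assumption, not the actual training script: this card does not say whether empties were dropped or non-empty calls were added, and the example layout (a dict with a "facts" list) and the name rebalance_empty are hypothetical.

```python
import random


def rebalance_empty(examples: list[dict], target_empty_frac: float = 0.20,
                    seed: int = 0) -> list[dict]:
    """Downsample examples with no target facts to a target share of the set."""
    empty = [ex for ex in examples if not ex["facts"]]
    nonempty = [ex for ex in examples if ex["facts"]]
    # Solve k / (k + n_nonempty) = target_empty_frac for k.
    keep = round(target_empty_frac * len(nonempty) / (1 - target_empty_frac))
    keep = min(keep, len(empty))  # cannot keep more empties than exist
    rng = random.Random(seed)     # fixed seed for a reproducible subset
    balanced = nonempty + rng.sample(empty, keep)
    rng.shuffle(balanced)
    return balanced
```

With 48 non-empty and 52 empty examples (52% empty), this keeps 12 empties, giving 12 of 60, or 20%.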

Trade-off: higher recall at a small precision cost. Output is slightly more verbose, with an occasional cross-section duplicate (e.g. a fact placed in two plausibly valid sections). Net: better for the importance-weighted Q metric, which penalizes omissions far more than minor duplicates. Validate on your DeepSeek Q pipeline before promoting over v3.
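
If cross-section duplicates matter downstream, they can be flagged after extraction. The following is a minimal sketch that treats two facts as duplicates when their text matches after lowercasing and whitespace normalization; that matching rule and the function names are assumptions, and a real pipeline may need fuzzier matching.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cross_section_duplicates(facts: list[dict]) -> dict[str, list[str]]:
    """Map normalized fact text to its sections, for texts seen in 2+ sections."""
    sections_by_text = defaultdict(set)
    for fact in facts:
        sections_by_text[normalize(fact["text"])].add(fact["section"])
    return {text: sorted(sections)
            for text, sections in sections_by_text.items()
            if len(sections) > 1}
```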

Usage (llama.cpp)

llama-cli -m gemma4-e4b-v5-Q4_K_M.gguf -sys "<scribion per-type system prompt>" -p "Transkript:\n..."

Note: use maxTokens≈4096 for the narrative call. Exhaustive output can be long, and a smaller cap silently truncates the JSON (→ omitted facts). Decode at low temperature (≈0.05) for near-deterministic extraction.
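
Because truncation is silent, it can help to check that each call's output parses as complete JSON before using it. This is a minimal sketch assuming the model returns a single JSON document per call; the function name is hypothetical, and a parse failure may also indicate malformed output rather than a token cap.

```python
import json
from typing import Any, Optional


def parse_or_flag(raw: str) -> tuple[Optional[Any], bool]:
    """Return (parsed, False) on success, or (None, True) if the JSON is broken.

    A True flag suggests the output was cut off (or is otherwise malformed),
    so the call should be retried with a larger token cap instead of being
    treated as "no facts found".
    """
    try:
        return json.loads(raw), False
    except json.JSONDecodeError:
        return None, True
```

For example, an output that stops mid-string, such as '[{"id": "f1", "text": "Clex', is flagged instead of being silently dropped.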

Quantized with llama.cpp (CPU build).

GGUF details: model size 7B params; architecture gemma4; 4-bit quantization.
