SAM-G-Agent
SAM-G-Agent is the autonomous-agent member of the SAM-G family: a ~30M-parameter, offline, dual-mode model that acts as the per-step tool dispatcher of a long-running agentic loop (Manus / Claude-Code style). Given an instruction or the current state of a task, it emits the next action(s) as a compact, risk-flagged JSON plan that an executor runs against real tools.
It is not a monolithic long-horizon planner. An agent built on SAM-G-Agent can run for hours because a host loop re-invokes the model each turn with the latest observation; the model returns one short action at a time. This design plays to the model's strength (short, reactive tool emission) and works around its limit (long, exactly ordered chains).
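The host loop described above could be sketched roughly as follows. This is a minimal illustration, not the released runtime: `run_agent`, `model`, `execute`, and `max_turns` are hypothetical names, the model callable is assumed to return only the JSON text (mode token already stripped), and the `goal | observation` prompt layout is an assumption based on the input format given in this card.

```python
import json
from typing import Callable


def run_agent(goal: str,
              model: Callable[[str], str],
              execute: Callable[[dict], str],
              max_turns: int = 50) -> list[dict]:
    """Re-invoke the model each turn until it emits `finish` or the budget runs out."""
    history: list[dict] = []
    observation = ""
    for _ in range(max_turns):
        # Assumed prompt layout: the goal alone on the first turn,
        # then "goal | latest observation" on later turns.
        prompt = goal if not observation else f"{goal} | {observation}"
        plan = json.loads(model(prompt))["plan"]
        for step in plan:
            history.append(step)
            if step["op"] == "finish":
                return history
            # The executor runs the real tool and returns a text observation.
            observation = execute(step)
    return history
```

The turn budget is a host-side safeguard: if the model never emits `finish`, the loop still ends.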
What it does
Input: a natural-language instruction (EN/FR), optionally followed by an observation block (intent | {observation}). Output, after the [ACTION] mode token:
{"plan":[{"op":"web_search","args":{"query":"latest diffusion models"},"risk":"safe"}]}
A terminal {"op":"finish","args":{...}} tells the host loop to stop.
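A host might decode this output along the following lines. The sketch assumes that the raw text may or may not still carry the `[ACTION]` token, and that a terminal `finish` op appears as an entry of the `plan` list; `parse_plan` is a hypothetical helper, not part of the released artifact.

```python
import json

ACTION_TOKEN = "[ACTION]"


def parse_plan(raw: str) -> tuple[list[dict], bool]:
    """Return (steps, done) decoded from raw model output."""
    # Keep only the text after the mode token when the token is present.
    _, sep, tail = raw.partition(ACTION_TOKEN)
    payload = tail if sep else raw
    steps = json.loads(payload.strip())["plan"]
    for step in steps:
        if not isinstance(step.get("op"), str):
            raise ValueError(f"plan step without a string 'op': {step!r}")
        step.setdefault("args", {})
    # Assumption: `finish` is emitted as a regular plan entry.
    done = any(step["op"] == "finish" for step in steps)
    return steps, done
```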
Tool vocabulary
| op | purpose | default risk |
|---|---|---|
| web_search | query the web | safe |
| scrape_page | fetch a page's content | safe |
| read_arxiv | read an arXiv paper | safe |
| browse | navigate a site (open / click / scroll / extract); submit/download gated | safe / critical |
| execute_python | run Python; gated when it touches os/subprocess/files/network | safe / critical |
| generate_image | text-to-image | safe |
| ffmpeg | video/audio editing (trim, concat, overlay, subtitles, extract audio) | safe |
| download_file | fetch a file to disk | critical |
| transfer_token | move crypto / value | always critical |
| summarize / ask_llm | condense / delegate hard reasoning to a larger model | safe |
| finish | terminate the agent loop | safe |
| inherited dev ops | open_file, list_dir, run_command, write_file, git_push, api_call, db_query | per op |
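Since the executor supplies the actual tools, one way to wire the vocabulary above to handlers is a small registry that rejects unknown op names. This is only a sketch: the `Executor` class and its methods are hypothetical, and it assumes each handler accepts the step's `args` as keyword arguments.

```python
from typing import Callable

# Op names copied from the tool vocabulary table above.
KNOWN_OPS = frozenset({
    "web_search", "scrape_page", "read_arxiv", "browse", "execute_python",
    "generate_image", "ffmpeg", "download_file", "transfer_token",
    "summarize", "ask_llm", "finish",
    "open_file", "list_dir", "run_command", "write_file",
    "git_push", "api_call", "db_query",
})


class Executor:
    """Maps op names to host-supplied handler functions."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., str]] = {}

    def register(self, op: str, handler: Callable[..., str]) -> None:
        if op not in KNOWN_OPS:
            raise ValueError(f"{op!r} is not in the model's tool vocabulary")
        self._handlers[op] = handler

    def run(self, step: dict) -> str:
        op = step["op"]
        if op not in self._handlers:
            raise KeyError(f"no handler registered for {op!r}")
        # Assumption: handlers take the step's args as keyword arguments.
        return self._handlers[op](**step.get("args", {}))
```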
Behaviour families (training coverage)
- dispatch: instruction → one tool call (the strongest mode).
- search_react / code_react / browse_react: react to an observation (results, stdout/error, page state) with the next action: refine, fix-and-retry, extract, finish.
- research_chain: web_search → scrape_page → summarize.
- media_pipeline: download_file → ffmpeg → finish (gated).
- risk_gate_agent: plans mixing safe + critical ops (transfer / download / system code).
- autonomous_step: goal + state → the single next op (incl. finish); the loop primitive.
- dev_dispatch: replay of inherited IDE/dev ops (anti-forgetting).
Safety: risk flag + mandatory deterministic backstop
Every op carries a learned risk flag (safe / critical) meant to drive a
user-confirmation gate. The flag is advisory, not the safety boundary. The host
application MUST enforce a deterministic policy that forces confirmation on
known-dangerous operations regardless of the flag, in particular:
- transfer_token (value movement): always confirm; never auto-execute;
- download_file, external api_call mutations, and execute_python that touches the filesystem / network / system: confirm;
- run_command matching dangerous patterns (e.g. rm -rf, git push), git_push, write_file, open_app: confirm.

The flag may only harden (safe → critical), never permit. Treat a missing critical flag as a false negative to be caught by the backstop.
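A deterministic backstop in this spirit could look like the sketch below. It is illustrative only: the argument names (`command`, `code`, `method`, `action`) are hypothetical because this card does not specify the args schema, the regular expressions are deliberately crude examples, and a real policy would need a much more complete pattern list.

```python
import re

# Ops that always require confirmation, whatever the learned flag says.
ALWAYS_CONFIRM = frozenset(
    {"transfer_token", "download_file", "git_push", "write_file", "open_app"}
)
# Example patterns only; a production list would be far longer.
DANGEROUS_COMMAND = re.compile(r"\brm\s+-rf\b|\bgit\s+push\b")
SENSITIVE_PYTHON = re.compile(
    r"\b(os|subprocess|shutil|socket|requests|urllib|open)\b"
)
MUTATING_HTTP = frozenset({"POST", "PUT", "PATCH", "DELETE"})


def needs_confirmation(step: dict) -> bool:
    """True if the host must ask the user before executing this step."""
    op = step.get("op", "")
    args = step.get("args", {})
    # The learned flag can only harden the decision, never relax it.
    if step.get("risk") == "critical":
        return True
    if op in ALWAYS_CONFIRM:
        return True
    # Hypothetical arg names below: command, code, method, action.
    if op == "run_command":
        return bool(DANGEROUS_COMMAND.search(str(args.get("command", ""))))
    if op == "execute_python":
        return bool(SENSITIVE_PYTHON.search(str(args.get("code", ""))))
    if op == "api_call":
        return str(args.get("method", "GET")).upper() in MUTATING_HTTP
    if op == "browse":
        return args.get("action") in {"submit", "download"}
    return False
```

Because the rules run after the flag check and never consult a `safe` flag, a model that mislabels `transfer_token` as safe is still gated.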
Intended use
The structured-action stage of an autonomous agent: research assistants, media-editing pipelines (ffmpeg), browser/YouTube navigation, code-execution loops, on-device automation. Runs fully offline; the executor supplies the actual tools.
Limitations (honest)
- Short chains, looped; not long monolithic plans. Reactive 1–3-op emission is the model's strength; tasks needing one long exactly-ordered plan must be decomposed by the host loop into short steps. This is by design, not a regression.
- ~30M scale: limited open-ended reasoning and world knowledge; delegate hard reasoning via ask_llm to a larger model.
- French covers agentic instructions, not free prose.
- Tool set is fixed at fine-tune time; new tools require additional fine-tuning.
- Benchmarks are synthetic (disjoint seed); they validate routing/format/risk-gating, not real-world tool success, which depends on the executor.
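To make the three validated properties concrete, a per-case check could be sketched as below. This is not the authors' benchmark harness, which is not published here; `score_case` and the expected-label layout are hypothetical, and the sketch looks only at the first op of the plan.

```python
import json


def score_case(raw_output: str, expected: dict) -> dict:
    """Score one output on format, routing (op choice), and risk flag."""
    result = {"format": False, "routing": False, "risk": False}
    try:
        first = json.loads(raw_output)["plan"][0]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError):
        return result  # unparseable or wrongly shaped output fails all checks
    if not isinstance(first, dict):
        return result
    result["format"] = True
    result["routing"] = first.get("op") == expected["op"]
    result["risk"] = first.get("risk") == expected["risk"]
    return result
```

None of these checks says whether the tool call would succeed against a real executor.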
Lineage
SAM-G (base, dual-mode) → SAM-G-Reasoning → SAM-G-CobraTooling (IDE tools, robust) → SAM-G-Agent (autonomous tool dispatcher).
Disclosure
Architecture internals, tokenizer construction, data generators, and ablations are proprietary and withheld. This card documents the released artifact only.
Model tree for AMFORGE/sam-g-agent
Base model: AMFORGE/samg-cobratooling