SAM-G-Agent

SAM-G-Agent is the autonomous-agent member of the SAM-G family: a ~30M-parameter, offline, dual-mode model that acts as the per-step tool dispatcher of a long-running agentic loop (Manus / Claude-Code style). Given an instruction or the current state of a task, it emits the next action(s) as a compact, risk-flagged JSON plan that an executor runs against real tools.

It is not a monolithic long-horizon planner. An agent built on SAM-G-Agent can run for hours because a host loop re-invokes the model each turn with the latest observation, and the model returns only the next short action(s). This design plays to the model's strength (short, reactive tool emission) and works around its limit (long, exactly-ordered chains).
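
The host loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released host: `run_agent_loop`, `model_step`, `execute`, and `max_turns` are hypothetical names, and the way the observation is appended to the instruction is assumed from the input format this card describes.

```python
import json
from typing import Callable, List


def run_agent_loop(
    instruction: str,
    model_step: Callable[[str], str],   # hypothetical: prompt -> JSON plan string
    execute: Callable[[dict], str],     # hypothetical: action -> observation text
    max_turns: int = 50,                # hypothetical safety cap on turns
) -> List[dict]:
    """Re-invoke the model each turn until it emits `finish` or the cap is hit."""
    history: List[dict] = []
    prompt = instruction
    for _ in range(max_turns):
        plan = json.loads(model_step(prompt))["plan"]
        for action in plan:
            history.append(action)
            if action["op"] == "finish":
                return history  # terminal op: stop the loop
            observation = execute(action)
            # Assumed prompt layout: the instruction, a separator, the observation.
            prompt = f"{instruction} | {observation}"
    return history
```

The cap on turns is a design choice of this sketch: a small model can fail to emit `finish`, so the host should not rely on the model alone to terminate.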

What it does

Input: a natural-language instruction (EN/FR), optionally followed by an observation block (intent | {observation}). Output, after the [ACTION] mode token:

{"plan":[{"op":"web_search","args":{"query":"latest diffusion models"},"risk":"safe"}]}

A terminal {"op":"finish","args":{...}} tells the host loop to stop.
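
A host could parse this output along the following lines. This is a sketch, not the official parser: `parse_plan` and `is_terminal` are hypothetical helpers, and it assumes the raw text may or may not still contain the `[ACTION]` token. Only `op` and `args` are required here, since the `finish` example above does not show a `risk` field.

```python
import json

ACTION_TOKEN = "[ACTION]"  # mode token named in this card


def parse_plan(raw: str) -> list:
    """Return the list of steps from raw model output, or raise ValueError."""
    # Keep only the text after the mode token, if the token is present.
    _, found, tail = raw.partition(ACTION_TOKEN)
    payload = tail if found else raw
    plan = json.loads(payload.strip())["plan"]
    if not isinstance(plan, list):
        raise ValueError("'plan' must be a list of steps")
    for step in plan:
        if not isinstance(step, dict) or "op" not in step or "args" not in step:
            raise ValueError(f"malformed step: {step!r}")
    return plan


def is_terminal(plan: list) -> bool:
    """True when the plan contains a `finish` op, i.e. the host loop should stop."""
    return any(step["op"] == "finish" for step in plan)
```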

Tool vocabulary

op	purpose	default risk
`web_search`	query the web	safe
`scrape_page`	fetch a page's content	safe
`read_arxiv`	read an arXiv paper	safe
`browse`	navigate a site (open / click / scroll / extract); submit/download gated	safe / critical
`execute_python`	run Python; gated when it touches os/subprocess/files/network	safe / critical
`generate_image`	text-to-image	safe
`ffmpeg`	video/audio editing (trim, concat, overlay, subtitles, extract audio)	safe
`download_file`	fetch a file to disk	critical
`transfer_token`	move crypto / value	always critical
`summarize` / `ask_llm`	condense / delegate hard reasoning to a larger model	safe
`finish`	terminate the agent loop	safe
`open_file`, `list_dir`, `run_command`, `write_file`, `git_push`, `api_call`, `db_query`	inherited dev ops	per op
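
Because the vocabulary is closed, an executor can reject any op outside it before dispatching. A minimal sketch, assuming the op names listed in the table above; `KNOWN_OPS` and `unknown_ops` are hypothetical names, not part of the released artifact.

```python
# Op names copied from the tool vocabulary table in this card.
KNOWN_OPS = frozenset({
    "web_search", "scrape_page", "read_arxiv", "browse", "execute_python",
    "generate_image", "ffmpeg", "download_file", "transfer_token",
    "summarize", "ask_llm", "finish",
    # inherited dev ops
    "open_file", "list_dir", "run_command", "write_file", "git_push",
    "api_call", "db_query",
})


def unknown_ops(plan: list) -> list:
    """Return the op names in `plan` that are not in the fixed vocabulary."""
    return [step.get("op") for step in plan if step.get("op") not in KNOWN_OPS]
```

For example, `unknown_ops([{"op": "web_search"}, {"op": "send_email"}])` returns `["send_email"]`, which the host can treat as a malformed plan rather than attempt to run.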

Behaviour families (training coverage)

dispatch — instruction → one tool call (the strongest mode).
search_react / code_react / browse_react — react to an observation (results, stdout/error, page state) with the next action: refine, fix-and-retry, extract, finish.
research_chain — web_search → scrape_page → summarize.
media_pipeline — download_file → ffmpeg → finish (gated).
risk_gate_agent — plans mixing safe + critical ops (transfer / download / system code).
autonomous_step — goal + state → the single next op (incl. finish): the loop primitive.
dev_dispatch — replay of inherited IDE/dev ops (anti-forgetting).

Safety: risk flag + mandatory deterministic backstop

Every op carries a learned risk flag (safe / critical) meant to drive a user-confirmation gate. The flag is advisory, not the safety boundary. The host application MUST enforce a deterministic policy that forces confirmation on known-dangerous operations regardless of the flag — in particular:

transfer_token (value movement) — always confirm; never auto-execute;
download_file, external api_call mutations, and execute_python that touches the filesystem / network / system — confirm;
run_command matching dangerous patterns (e.g. rm -rf, git push), git_push, write_file, open_app — confirm.

The flag may only harden the policy (escalating safe → critical); it can never relax it, so a safe flag does not authorize an op that the policy gates. Treat a missing critical flag as a false negative that the backstop must catch.
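
One possible shape for such a backstop is sketched below. It is an illustration under assumptions, not the reference policy: `needs_confirmation` and the regular expressions are hypothetical, the argument schemas are not documented in this card (so the sketch scans every string argument instead of relying on key names), and treating every `api_call` as gated is a deliberate simplification that is stricter than the rule above. Pattern matching of this kind is a heuristic and does not replace sandboxing in the executor.

```python
import re

# Ops the policy gates regardless of the learned flag.
# `api_call` is gated wholesale here (assumed simplification; the card names only mutations).
ALWAYS_CONFIRM = {
    "transfer_token", "download_file", "git_push", "write_file", "open_app", "api_call",
}
# Hypothetical patterns; extend them for a real deployment.
DANGEROUS_COMMAND = re.compile(r"\brm\s+-rf\b|\bgit\s+push\b")
PYTHON_SYSTEM_HINTS = re.compile(
    r"\b(os|subprocess|socket|shutil|pathlib|requests|urllib|open)\b"
)


def _strings(value):
    """Yield every string found in a (possibly nested) args structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def needs_confirmation(step: dict) -> bool:
    """Deterministic gate: the policy decides first, the flag can only add gating."""
    op = step.get("op", "")
    text = "\n".join(_strings(step.get("args", {})))
    if op in ALWAYS_CONFIRM:
        return True
    if op == "run_command" and DANGEROUS_COMMAND.search(text):
        return True
    if op == "execute_python" and PYTHON_SYSTEM_HINTS.search(text):
        return True
    # The learned flag may escalate, but a "safe" flag never overrides the rules above.
    return step.get("risk") == "critical"
```

Note the order of the checks: the policy rules run before the flag is read, so a `transfer_token` step flagged `safe` is still held for confirmation.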

Intended use

The structured-action stage of an autonomous agent: research assistants, media-editing pipelines (ffmpeg), browser/YouTube navigation, code-execution loops, on-device automation. Runs fully offline; the executor supplies the actual tools.

Limitations (honest)

Short chains, looped — not long monolithic plans. Reactive 1–3-op emission is the model's strength; tasks needing one long exactly-ordered plan must be decomposed by the host loop into short steps. This is by design, not a regression.
~30M scale: limited open-ended reasoning and world knowledge; delegate hard reasoning via ask_llm to a larger model.
French covers agentic instructions, not free prose.
Tool set is fixed at fine-tune time; new tools require additional fine-tuning.
Benchmarks are synthetic (disjoint seed); they validate routing/format/risk-gating, not real-world tool success, which depends on the executor.

Lineage

SAM-G (base, dual-mode) → SAM-G-Reasoning → SAM-G-CobraTooling (IDE tools, robust) → SAM-G-Agent (autonomous tool dispatcher).

Disclosure

Architecture internals, tokenizer construction, data generators, and ablations are proprietary and withheld. This card documents the released artifact only.

Model tree for AMFORGE/sam-g-agent: base model AMFORGE/samg-cobratooling; this model is a fine-tune of it.