InternRobotics/InternVLA-M1


Improve model card: Add pipeline tag, paper link, abstract, and sample usage

#2

by nielsr HF Staff - opened Oct 16, 2025

base: refs/heads/main ← from: refs/pr/2


This PR enhances the model card for InternVLA-M1 by:

Updating the license to mit, matching the license badge shown in the GitHub repository.
Adding pipeline_tag: robotics to the metadata, so that the model appears under the robotics pipeline filter on the Hugging Face Hub.
Linking the official Hugging Face paper page, InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy, in the model description.
Adding the paper's abstract as a dedicated section for a quick overview.
Including two Python code snippets, "InternVLA-M1 Chat Demo (image Q&A / Spatial Grounding)" and "InternVLA-M1 Action Prediction Demo (two views)", taken directly from the Quick Interactive M1 Demo section of the GitHub repository, to help users get started with the model.
Replacing the Citation section with the more complete BibTeX entry from the GitHub README.
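
For reviewers who want to picture the metadata part of this change, the following is a minimal sketch of the YAML front matter that the two metadata items above would produce. It assumes only the license and pipeline_tag fields named in this description; the helper render_front_matter is hypothetical and written by hand for illustration, and the actual diff may carry additional fields (tags, library name, and so on).

```python
# Illustrative sketch only: render_front_matter is a hypothetical helper,
# not part of huggingface_hub or the InternVLA-M1 repository.
def render_front_matter(metadata: dict) -> str:
    """Render a flat metadata dict as a model card YAML front matter block."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists are written as YAML block sequences.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Only the two fields this PR description mentions explicitly.
card_metadata = {
    "license": "mit",
    "pipeline_tag": "robotics",
}

front_matter = render_front_matter(card_metadata)
print(front_matter)
```

The Hub reads this block from the top of README.md; the pipeline_tag value is what drives the pipeline filter mentioned above.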

Please review and merge this PR.

Improve model card: Add pipeline tag, paper link, abstract, and sample usage (8e1fa83f)


Ready to merge

This branch is ready to get merged automatically.
