Robotics
Transformers
Safetensors
qwen2_5_vl
image-text-to-text
vision-language-action-model
vision-language-model
text-generation-inference
How to use InternRobotics/InternVLA-M1 with Transformers:

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("InternRobotics/InternVLA-M1")
model = AutoModelForImageTextToText.from_pretrained("InternRobotics/InternVLA-M1")
```
Improve model card: Add pipeline tag, paper link, abstract, and sample usage (#2)
Opened by nielsr (HF Staff)
This PR enhances the model card for InternVLA-M1 by:
- Updating the `license` to `mit`, based on the explicit license badge in the GitHub repository.
- Adding `pipeline_tag: robotics` to the metadata, so that the model appears in the robotics pipeline filter on the Hugging Face Hub.
- Including a direct link to the official Hugging Face paper page, InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy, in the model description.
- Adding the paper's abstract as a dedicated section for a quick overview.
- Including two detailed Python code snippets, "InternVLA-M1 Chat Demo (image Q&A / Spatial Grounding)" and "InternVLA-M1 Action Prediction Demo (two views)", taken directly from the "Quick Interactive M1 Demo" section of the GitHub repository, to help users get started with the model.
- Updating the `Citation` section with the more complete BibTeX entry from the GitHub README.
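The two metadata changes listed above (`license: mit` and `pipeline_tag: robotics`) live in the YAML front matter, the `---`-delimited block at the top of the model card's README.md. As a rough, stdlib-only sketch of that kind of update (the helper names are hypothetical, and only flat `key: value` pairs are handled; real cards often contain lists and nested fields, for which a proper YAML parser or `huggingface_hub`'s `ModelCard` class is a better fit):

```python
# Hypothetical helpers, not part of any Hugging Face library.
# Only flat "key: value" front matter is supported in this sketch.


def split_front_matter(readme: str) -> tuple[dict[str, str], str]:
    """Split README text into (flat metadata dict, body)."""
    if not readme.startswith("---\n"):
        return {}, readme  # no front matter present
    end = readme.find("\n---\n", 3)
    if end == -1:
        return {}, readme  # unterminated block: treat everything as body
    header = readme[4:end]
    body = readme[end + len("\n---\n"):]
    metadata: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key:" prefix
            metadata[key.strip()] = value.strip()
    return metadata, body


def render_front_matter(metadata: dict[str, str]) -> str:
    """Render flat metadata as a '---'-delimited YAML block."""
    lines = ["---", *(f"{k}: {v}" for k, v in metadata.items()), "---"]
    return "\n".join(lines) + "\n"


def update_model_card(readme: str, **updates: str) -> str:
    """Merge new metadata values into the card, keeping the body unchanged."""
    metadata, body = split_front_matter(readme)
    metadata.update(updates)  # new keys are appended, existing keys overwritten
    return render_front_matter(metadata) + body


# Placeholder starting card; "other" is not the card's actual prior license.
old_card = "---\nlicense: other\n---\n# InternVLA-M1\n"
print(update_model_card(old_card, license="mit", pipeline_tag="robotics"))
```

Overwriting existing keys while appending new ones keeps the card's original field order, which makes the resulting diff easy to review.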
Please review and merge this PR.