---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---
<p align="center">
  <img src="assets/logo.jpg" height=30>
</p>

# FastMochi Model Card
## Model Details

<div align="center">
<table style="margin-left: auto; margin-right: auto; border: none;">
  <tr>
    <td>
      <img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
    </td>
  </tr>
  <tr>
    <td style="text-align:center;">
      Get an 8X diffusion boost for Mochi with FastVideo
    </td>
  </tr>
</table>
</div>
FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It samples high-quality videos in 8 diffusion steps, roughly an 8X speedup over the original Mochi, which uses 64 steps.
- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo
## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README.
- You can also run FastMochi with the official [Mochi repository](https://github.com/genmoai/mochi) using the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).
<details>
<summary>Code</summary>

```python
import os

from genmo.lib.utils import save_video
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Read prompts line by line from prompt.txt.
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU pipeline with the FastMochi-compatible weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)

for i, prompt in enumerate(prompts):
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,  # FastMochi is distilled for 8-step sampling
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,  # constant guidance scale of 1.5 across all 8 steps
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```
</details>
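- Because this repository hosts Diffusers-format weights, you can also load FastMochi directly with the `diffusers` library. The snippet below is a minimal sketch, assuming a recent `diffusers` release with `MochiPipeline` support; the step count and guidance scale mirror the distilled settings used above, and the remaining parameters are illustrative.

<details>
<summary>Code</summary>

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Load the FastMochi weights from this repository.
pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers", torch_dtype=torch.bfloat16
)
# Optional memory savers for single-GPU setups.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
frames = pipe(
    prompt,
    num_inference_steps=8,  # FastMochi's distilled 8-step sampling
    guidance_scale=1.5,     # matches the cfg_schedule in the script above
    num_frames=163,
    height=480,
    width=848,
).frames[0]
export_to_video(frames, "output.mp4", fps=30)
```

</details>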
## Training details

FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:

- Batch size: 32
- Resolution: 480×848
- Number of frames: 169
- Training steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber
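For intuition, here is a schematic sketch of one consistency-distillation update with a Huber loss. This is not the FastVideo training code: a toy linear layer stands in for the Mochi-scale video DiT, and the Euler teacher step and time parameterization are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: in practice both networks are video DiTs on video latents.
teacher = torch.nn.Linear(16, 16)   # frozen teacher (original Mochi)
student = torch.nn.Linear(16, 16)   # distilled student (FastMochi)
for p in teacher.parameters():
    p.requires_grad_(False)

x_t = torch.randn(32, 16)           # noisy latents at time t (batch of 32)
t, s = 0.8, 0.7                     # adjacent points on the sampling schedule

with torch.no_grad():
    # One teacher ODE (Euler) step from t toward the earlier time s.
    x_s = x_t + (s - t) * teacher(x_t)
    # Target: the student's own prediction at time s, detached.
    target = x_s - s * student(x_s)

pred = x_t - t * student(x_t)       # student prediction at time t
loss = F.huber_loss(pred, target)   # Huber loss, as listed above
loss.backward()
```

Matching the student's prediction at time t to its (detached) prediction at the earlier time s is what lets the distilled model take far fewer sampling steps at inference.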
## Evaluation

We provide qualitative comparisons between FastMochi and the original Mochi, both run with 8-step inference:
| FastMochi 8 steps | Mochi 8 steps |
| --- | --- |
|  |  |
|  |  |
|  |  |
|  |  |