Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 5 days ago • 64
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising Paper • 2604.26694 • Published 3 days ago • 6
Nano-World-Model Collection 🌍 A minimalist repository for training video world models based on diffusion-forcing. • 17 items • Updated 1 day ago