World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 5 days ago • 113
AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published Mar 25 • 27
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 5 items • Updated 11 days ago • 52
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 17 days ago • 116
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 17 days ago • 153
WAON Collection WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models • 4 items • Updated Mar 2 • 2
Marco-MoE Collection A suite of multilingual MoE models with highly-sparse architectures • 5 items • Updated 23 days ago • 16
Article Welcome Gemma 4: Frontier multimodal intelligence on device • 30 days ago • 884