Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models Paper • 2605.26895 • Published 5 days ago • 15
Linear-Time Global Visual Modeling without Explicit Attention Paper • 2605.01711 • Published 28 days ago • 7
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published Jan 29 • 42
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Paper • 2503.16057 • Published Mar 20, 2025 • 15
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10, 2025 • 6
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26, 2025 • 36
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7, 2025 • 73
Frac-Connections: Fractional Extension of Hyper-Connections Paper • 2503.14125 • Published Mar 18, 2025 • 22
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28, 2025 • 32