Defa Zhu

mathfinder

https://zhudefa.github.io/

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 3 days ago

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Paper • 2605.26895 • Published 5 days ago • 15

upvoted a paper 23 days ago

Linear-Time Global Visual Modeling without Explicit Attention

Paper • 2605.01711 • Published 28 days ago • 7

upvoted a paper 4 months ago

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published Jan 29 • 42

upvoted a paper 5 months ago

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 328

authored 4 papers 6 months ago

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20, 2025 • 15

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10, 2025 • 6

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39

upvoted a paper 6 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39

upvoted a paper 9 months ago

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36

upvoted a paper 10 months ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published Aug 7, 2025 • 73

commented on a paper about 1 year ago

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20, 2025 • 15

upvoted a paper about 1 year ago

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20, 2025 • 15

authored a paper about 1 year ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18, 2025 • 22

commented on a paper about 1 year ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18, 2025 • 22

upvoted a paper about 1 year ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18, 2025 • 22

commented on a paper about 1 year ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18, 2025 • 22

authored a paper over 1 year ago

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28, 2025 • 32

upvoted a paper over 1 year ago

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28, 2025 • 32

authored a paper over 1 year ago

Ultra-Sparse Memory Network

Paper • 2411.12364 • Published Nov 19, 2024 • 23
