Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 11 days ago • 48
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs? Paper • 2510.01161 • Published Oct 1, 2025 • 14
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11, 2025 • 55
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published Feb 18, 2025 • 13
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training Paper • 2407.15892 • Published Jul 22, 2024