AI & ML interests
Large Language Models; Reasoning; Reinforcement Learning
TianHongZXY/CHIMERA-4B-RL
Text Generation
• 4B • Updated • 7
• 4
TianHongZXY/CHIMERA-4B-SFT
Text Generation
• 4B • Updated • 8
• 2
4B • Updated • 2
TianHongZXY/Qwen2.5-Math-7B-GRPO
8B • Updated • 5
TianHongZXY/OpenR1-Math-46k-8192-Qwen2.5-7B-Instruct-GRPO-clip_0.28
Updated
TianHongZXY/Qwen2.5-Math-7B-W-REINFORCE
8B • Updated • 5
• 1
TianHongZXY/Qwen3-4B-GRPO
4B • Updated • 1
4B • Updated • 3
4B • Updated • 3
• 1
TianHongZXY/Qwen2.5-Math-7B-PPO
8B • Updated • 2
TianHongZXY/Qwen2.5-Math-7B-PSR
8B • Updated • 4
TianHongZXY/Qwen2.5-Math-7B-NSR
8B • Updated • 4
• 2