shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 214
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 228
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 242
shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 271
shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 276
shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 297
shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 307
shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-sft-bs64 • Text Generation • 8B • Updated 16 days ago • 310
shuoxing/qwen2-5-7b-full-sft-control-tweet-1m-en-reproduce-bs128 • Text Generation • 333k • Updated Jan 26 • 6
shuoxing/qwen2-5-7b-full-sft-mix-high-tweet-1m-en-reproduce-bs128 • Text Generation • 333k • Updated Jan 26 • 3
shuoxing/qwen2-5-7b-full-sft-mix-mid-tweet-1m-en-reproduce-bs128 • Text Generation • 333k • Updated Jan 25 • 3
shuoxing/qwen2-5-7b-full-sft-mix-low-tweet-1m-en-reproduce-bs128 • Text Generation • 333k • Updated Jan 25 • 5
shuoxing/qwen3-4b-full-sft-control-tweet-1m-en-reproduce-bs128 • Text Generation • 196k • Updated Jan 25 • 5
shuoxing/qwen3-4b-full-sft-mix-high-tweet-1m-en-reproduce-bs128 • Text Generation • 196k • Updated Jan 25 • 4
shuoxing/qwen3-4b-full-sft-mix-mid-tweet-1m-en-reproduce-bs128 • Text Generation • 196k • Updated Jan 25 • 2