SmolKalam
AdaMLLab/SmolKalam-Arabic-Conversational-SFT • Updated about 8 hours ago • 1
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data • Paper • 2511.18411 • Published Nov 23, 2025 • 1
MixMinMatch
Collection of datasets from the MixMinMatch work.
Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets • Paper • 2512.18834 • Published Dec 21, 2025 • 4
AdaMLLab/AraMix • Updated Jan 30 • 394M • 2.77k • 6
AdaMLLab/TurMix • Updated Jan 30 • 681M • 2.52k • 7
AdaMLLab/HinMix • Updated Jan 30 • 179M • 2.3k • 1