bluelightai-dev/dolma-pretrain-mix-tokenized-gemma-3-eval
Viewer
• Updated • 211k • 14
bluelightai-dev/dolma-pretrain-mix-tokenized-gemma-3
Viewer
• Updated • 1.87M • 34
bluelightai-dev/nemotron-pretrain-mix-tokenized-nemotron-eval
Viewer
• Updated • 250k • 15
bluelightai-dev/nemotron-pretrain-mix-tokenized-nemotron-4B
Viewer
• Updated • 4M • 20
bluelightai-dev/clt-eval-modernbert-tokenized
Viewer
• Updated • 328k • 26
bluelightai-dev/clt-train-modernbert-tokenized
Viewer
• Updated • 1.94M • 37
bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256
Viewer
• Updated • 212k • 18
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024
Viewer
• Updated • 4.04M • 73
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3
Viewer
• Updated • 1.81M • 46
bluelightai-dev/clt-pretrain-data-v3
Viewer
• Updated • 2.99M • 73
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample
Viewer
• Updated • 6.32M • 39
bluelightai-dev/dolma3_mix-150B-1025-sample
Viewer
• Updated • 4.97M • 135
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3
Viewer
• Updated • 115k • 36
bluelightai-dev/clt-mixed-eval-data
Viewer
• Updated • 60k • 12
bluelightai-dev/clt-mixed-data-tokenized-Qwen3
Viewer
• Updated • 2.6M • 24
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256
Viewer
• Updated • 194k • 49
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024
Viewer
• Updated • 2.52M • 35
bluelightai-dev/clt-pretrain-data-v2-dedup
Preview
• Updated • 6
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024
Viewer
• Updated • 2.44M • 37
bluelightai-dev/clt-pretrain-data-v2
Preview
• Updated • 34
bluelightai-dev/MathPile_Commercial-formatted
Viewer
• Updated • 389k • 20
bluelightai-dev/clt_posttrain_data_tokenized
Viewer
• Updated • 1.34M • 45
bluelightai-dev/common-corpus-sample-open-web
Viewer
• Updated • 4.8M • 21
bluelightai-dev/common-corpus-sample-open-source
Viewer
• Updated • 2.02M • 8
bluelightai-dev/common-corpus-sample-open-science
Viewer
• Updated • 284k • 11
bluelightai-dev/common-corpus-sample-open-government
Viewer
• Updated • 373k • 52 • 1
bluelightai-dev/common-corpus-sample-open-culture
Viewer
• Updated • 462k • 18
bluelightai-dev/clt_posttrain_data_tokenized_test_1000
Viewer
• Updated • 1.22k • 3
bluelightai-dev/dclm-full-deduped-sample
Viewer
• Updated • 4.92M • 19
bluelightai-dev/the-stack-dedup-sample
Viewer
• Updated • 474k • 23