Erik
Tralalabs
AI & ML interests
pretraining from scratch
Recent Activity
updated a dataset about 1 hour ago
Tralalabs/The-Iberiapedia-Pile
published a dataset about 2 hours ago
Tralalabs/The-Iberiapedia-Pile
updated a dataset about 2 hours ago
Tralalabs/summary_gl
Organizations
PII
PII stuff on Hugging Face
kalyan-ks/ettin-68m-nemotron-pii
Token Classification • 68.5M • Updated • 702 • 7
fastino/gliner2-privacy-filter-PII-multi
Token Classification • 0.3B • Updated • 32.2k • 46
OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1
Token Classification • 0.4B • Updated • 66.7k • 17
bardsai/eu-pii-anonimization-multilang
Token Classification • 0.3B • Updated • 2.4k • 15
AI Datasets
Datasets for training AI models
Spaces 34
Paused
Agents
THMEXAAI
🌍
THMEXAAI lets you chat with SLMs.
Running
Agents
TralaIndo 32K Tokenizer
🚀
How many? Designed for Indonesian text.
Running
Agents
Huggingpedia
🏃
Wikipedia but in Hugging Face.
Starting
Agents
KnowledgeGPT
👁
No LLMs. Just prompts 😎.
Sleeping
Agents
Gpt2 Large
🏃
GPT-2 Large inference. Nostalgic, right? Old-school era AI.
Runtime error
Agents
Tokenizer 3min 32k Playground
🐨
Playground for the Tralalabs/tokenizer-3min-32k tokenizer.
Models 63
Tralalabs/Qwen2.5-7B-Tulu3-PEFT-LoRA
Text Generation • Updated
Tralalabs/TralaIndo-32K-Tokenizer-HF
Updated
Tralalabs/TralaIndo-32K-Tokenizer
Updated
Tralalabs/Reddit-posts-tokenizer
Updated
Tralalabs/TinyGPT-8M
7.24M • Updated • 94
Tralalabs/Pythia-6.9B-Instruct-v1-Merged
Text Generation • 7B • Updated • 186 • 1
Tralalabs/Pythia-6.9B-Instruct-v1-LoRA
Text Generation • Updated
Tralalabs/OpenClaude-1.7B-Merged-Q3_K_M-GGUF
2B • Updated • 168
Tralalabs/CHEETAH-350M-Merged-FP16-Q6_K-GGUF
Text Generation • 0.4B • Updated • 154
Tralalabs/CHEETAH-350M-Merged-FP16
Text Generation • 0.4B • Updated • 144
Datasets 118
Tralalabs/The-Iberiapedia-Pile
Updated
Tralalabs/summary_gl
Updated
Tralalabs/Tralish-general
Updated
Tralalabs/LarpAI-IF-Dataset
Updated • 9
Tralalabs/Reddit-Questions-and-ETC
Updated • 5
Tralalabs/image-descriptions
Updated • 4
Tralalabs/doraemon-info-dataset
Updated • 6
Tralalabs/ultrachat-uncensored-messages
Updated • 7
Tralalabs/YouTube-Comments
Updated • 21
Tralalabs/google-conversations
Updated • 10