Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 16 days ago • 107
Article DeepFabric: Generate, Train and Evaluate with Datasets curated for Model Behavior Training. lukehinds • Dec 4, 2025 • 9
Article Preference Tuning LLMs with Direct Preference Optimization Methods kashif, edbeeching, lewtun, lvwerra, osanseviero • Jan 18, 2024 • 83
LLM Training Datasets Collection A collection of datasets for training LLMs. • 127 items • Updated Mar 15 • 35
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix codelion • Nov 3, 2025 • 65
Article Training and Finetuning Reranker Models with Sentence Transformers tomaarsen • Mar 26, 2025 • 194
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published Nov 22, 2023 • 53