WTF is Fine-Tuning? (intro4devs)

Fine-tuning your LLM is like min-maxing your ARPG hero so you can push high-level dungeons and get the most out of your build/gear... Makes sense, right? 😃

Here's a cheat sheet for devs (but open to anyone!)

---

TL;DR

- Full Fine-Tuning: Max performance, high resource needs, best reliability.
- PEFT: Efficient, cost-effective, mainstream, enhanced by AutoML.
- Instruction Fine-Tuning: Ideal for command-following AI, often combined with RLHF and CoT.
- RAFT: Best for fact-grounded models with dynamic retrieval.
- RLHF: Produces ethical, high-quality conversational AI, but expensive.

Choose wisely and match your approach to your task, budget, and deployment constraints.

I just posted the full extended article here
if you want to continue reading >>>

https://huggingface.co/blog/tegridydev/fine-tuning-dev-intro-2025

reacted to Kseniase's post with 👍 over 1 year ago

Post

3356

8 New Applications of Test-Time Scaling

We've noticed a huge interest in test-time scaling (TTS), so we decided to explore this concept further. Test-time compute (TTC) refers to the amount of computational power used by an AI model when generating a response. Many researchers are now focused on scaling TTC, as it enables slow, deep "thinking" and step-by-step reasoning, which improves overall models' performance.

Here are 8 fresh studies on test-time scaling:

1. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)
Introduces an LM that scales TTC by reasoning in latent space instead of generating more tokens with no special training. Here, a recurrent block to processes information iteratively.

2. Generating Symbolic World Models via Test-time Scaling of Large Language Models (2502.04728)
Shows how TTS is applied to enhance model's Planning Domain Definition Language (PDDL) reasoning capabilities, which can be used to generate a symbolic world model.

3. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (2502.06703)
Analyzes optimal TTS strategies and shows how small models can outperform much larger ones.

4. Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (2502.04128)
Shows how TTS improves expressiveness, timbre consistency and accuracy in speech synthesis with Llasa framework. It also dives into benefits of scaling train-time compute.

5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning (2502.07154)
Suggests a modified training loss for better reasoning of LLMs when scaling TTC.

6. Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2502.05078)
Unifies the strengths of chain, tree, and graph paradigms into one framework that expands reasoning only on necessary subproblems.

7. Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification (2502.01839)
Explores scaling trends of self-verification and how to improve its capabilities with TTC.

8. CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2501.14723)
Explores how scaling serial compute (iterations) and parallel compute (trajectories), can improve accuracy in real-world software engineering issues.

Also, explore our article about TTS for more -> https://huggingface.co/blog/Kseniase/testtimecompute

1 reply

upvoted 2 papers over 1 year ago

Large Language Models Think Too Fast To Explore Effectively

Paper • 2501.18009 • Published Jan 29, 2025 • 23

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30, 2025 • 61

Adrian Chan

AI & ML interests

Recent Activity

Organizations

gravity7's activity

Argunauts: Open LLMs that Master Argument Analysis with Argdown