add newlines and thinking tokens to template to avoid having to compute 3 extra tokens per generation in chat completion+reasoning

#83

(attached image: annoyed angry Miku pointing, ComfyUI gen)
This updated template prefills the tokens the model would otherwise have had to generate itself to begin the thinking process.

Behavior with the current template:

- Thinking enabled — prefill: `<|turn>model\n`. The model then generates `<|channel>thought\n` itself: 3 tokens computed before the thinking process even begins. Wasted compute.
- Thinking disabled — prefill: `<|turn>model\n`. The template appends `<|channel>thought\n<channel|>`: no extra tokens generated, fine.
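The current-template behavior can be sketched as a toy Python renderer (the token strings are taken verbatim from this description; the function name is a hypothetical stand-in for the actual Jinja template, not part of it):

```python
def render_current_prefill(thinking_enabled: bool) -> str:
    """Sketch of what the *current* template prefills for the model turn."""
    prefill = "<|turn>model\n"
    if not thinking_enabled:
        # With thinking disabled, the template itself opens and closes the
        # thought channel, so the model generates no extra tokens.
        prefill += "<|channel>thought\n<channel|>"
    # With thinking enabled, the prefill stops here: the model must still
    # generate "<|channel>thought\n" (3 tokens) on its own before thinking.
    return prefill
```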

Now with the improved template:

- Thinking enabled — prefill: `<|turn>model\n<|channel>thought\n`. No extra tokens generated: the model starts producing the thinking process immediately, without first having to generate the 3 extra tokens.
- Thinking disabled — prefill: `<|turn>model\n<|channel>thought\n`. The template appends `<channel|>`: no extra tokens generated, same as the original template.
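The improved behavior can be sketched the same way (again, token strings come from this description and the renderer function is a hypothetical stand-in for the Jinja template):

```python
def render_improved_prefill(thinking_enabled: bool) -> str:
    """Sketch of what the *improved* template prefills for the model turn."""
    # The thinking header is always prefilled, so the model never has to
    # generate the 3 tokens of "<|channel>thought\n" itself.
    prefill = "<|turn>model\n<|channel>thought\n"
    if not thinking_enabled:
        # Close the empty thought channel, producing output identical to the
        # original template's thinking-disabled case.
        prefill += "<channel|>"
    return prefill
```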

quasar-of-mikus changed pull request title from add newlines and thinking tokens to template to avoid having to compute 3 extra tokens per generation in chat completion to add newlines and thinking tokens to template to avoid having to compute 3 extra tokens per generation in chat completion+reasoning

What does 65b7c980cdaebfe1349b1aa5/yyYc6Bt18-AfmyZbIPCX5.png have to do with this?

@drumnbass You must search within yourself to find the true answer.
