	---
	license: apache-2.0
	datasets:
	- open-r1/OpenThoughts-114k-Code_decontaminated
	base_model:
	- Qwen/Qwen2.5-Coder-3B-Instruct
	library_name: transformers
	tags:
	- code
	- grpo
	- open-r1
	---

	# Model Card for OpenCSG-R1-Qwen2.5-Code-3B-V1

	This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct) on the [open-r1/OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) dataset.
	It has been trained using [TRL](https://github.com/huggingface/trl).

	## Quick start
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import pandas as pd

	# Hub ID of this model; a path to a local checkpoint also works.
	model_name = "opencsg/OpenCSG-R1-Qwen2.5-Code-3B-V1"
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)

	# Local copy of one shard of the training dataset; adjust the path to your own download.
	df = pd.read_parquet('/data/project/pj/r1/opencsg-r1/OpenThoughts-114k-Code_decontaminated/train-00000-of-00006.parquet')
	data = df['problem'][0]

	messages = [
	    {
	        "role": "user",
	        "content": f"Please help me solve the problem: {data}.Output the thinking process within the <think> </think> tags,and then return the final result within the <answer> </answer> tags.",
	    },
	    {
	        # Pre-filled start of the assistant turn; the model continues from the open <think> tag.
	        "role": "assistant",
	        "content": "Let's solve the problem step by step.\n<think>",
	    },
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    # Continue the pre-filled assistant message instead of opening a new turn,
	    # so add_generation_prompt is not used here.
	    continue_final_message=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=1024,
	    do_sample=True,  # temperature only takes effect when sampling is enabled
	    temperature=0.6,
	)
	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
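
Because the prompt above pre-fills the opening `<think>` tag, the decoded `response` normally starts inside the thinking section and contains only the closing `</think>` before the `<answer>` block. The helper below is a minimal sketch of how the two parts could be separated. The function name `split_reasoning` is hypothetical and not part of this repository, and it assumes the model follows the requested tag format.

```python
import re
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[str, Optional[str]]:
    """Split a completion into (thinking, answer).

    Handles both a completion that starts inside the thinking section
    (opening <think> pre-filled in the prompt) and one that repeats the tag.
    """
    body = response.strip()
    # Drop an opening <think> if the model happened to emit it again.
    if body.startswith("<think>"):
        body = body[len("<think>"):]

    thinking, closing_tag, rest = body.partition("</think>")
    if not closing_tag:
        # No closing tag, e.g. generation stopped at max_new_tokens.
        return body.strip(), None

    # Take the first <answer>...</answer> block after the thinking section.
    match = re.search(r"<answer>(.*?)</answer>", rest, flags=re.DOTALL)
    answer = match.group(1).strip() if match else None
    return thinking.strip(), answer


# Illustrative completion, not real model output.
sample = "Read n, then sum the digits.\n</think>\n<answer>print(sum(map(int, input())))</answer>"
thinking, answer = split_reasoning(sample)
print(answer)
```

With the quick-start code, `split_reasoning(response)` would be called on the decoded string. A returned answer of `None` indicates that the output was truncated or did not follow the tag format.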

	This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
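
In GRPO, several completions are sampled for each prompt, each one is scored by a reward function, and the rewards are normalized within the group to obtain advantages, so no separate value model is needed. The sketch below illustrates only that normalization step. The function name, the small `eps` term, and the use of the population standard deviation are assumptions made for illustration; this card does not state which reward functions or settings were used in training.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumed stabilizer that avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one prompt: two pass the tests, two fail.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.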

	### Framework versions

	- TRL: 0.15.2
	- Transformers: 4.49.0
	- PyTorch: 2.5.1
	- Datasets: 3.3.2
	- Tokenizers: 0.21.0
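
To compare a local environment against the versions listed above, something like the following could be used. It is a sketch that relies only on the standard library. The helper names are hypothetical, and the keys are the PyPI distribution names assumed to correspond to each entry (for example `torch` for PyTorch).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed in this model card, keyed by PyPI distribution name.
EXPECTED = {
    "trl": "0.15.2",
    "transformers": "4.49.0",
    "torch": "2.5.1",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to the version found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu124" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```

A different version does not necessarily prevent the model from loading; the list records the environment used for training.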

	## Citations

	Cite GRPO as:

	```bibtex
	@article{zhihong2024deepseekmath,
	    title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
	    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
	    year   = 2024,
	    eprint = {arXiv:2402.03300},
	}
	```

	Cite TRL as:

	```bibtex
	@misc{vonwerra2022trl,
	    title        = {{TRL: Transformer Reinforcement Learning}},
	    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	    year         = 2020,
	    journal      = {GitHub repository},
	    publisher    = {GitHub},
	    howpublished = {\url{https://github.com/huggingface/trl}}
	}
	```