glaiveai/glaive-function-calling-v2
How to use srini98/mistral-function-calling with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="srini98/mistral-function-calling")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("srini98/mistral-function-calling")
model = AutoModelForCausalLM.from_pretrained("srini98/mistral-function-calling")
How to use srini98/mistral-function-calling with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srini98/mistral-function-calling"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "srini98/mistral-function-calling",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
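The curl request above sends a generic story prompt. As a minimal sketch, the same endpoint can be called from Python with a prompt in the SYSTEM/USER/ASSISTANT format described later in this card, and with `<|endoftext|>` passed as a stop sequence. The helper names `build_payload` and `complete` are illustrative, not part of the model repository, and the URL assumes the vLLM server started above is listening on its default port.

```python
import json
import urllib.request

# Assumes the vLLM server from the command above is running locally.
VLLM_URL = "http://localhost:8000/v1/completions"


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build an OpenAI-style completions request that stops at <|endoftext|>."""
    return {
        "model": "srini98/mistral-function-calling",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        # Ask the server to cut the completion at the end-of-answer marker.
        "stop": ["<|endoftext|>"],
    }


def complete(prompt):
    """POST the prompt to the server and return the generated text."""
    request = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Function spec taken from the example in this card.
spec = {
    "name": "calculate_tax",
    "description": "Calculate the tax amount",
    "parameters": {
        "type": "object",
        "properties": {
            "income": {"type": "number", "description": "The income amount"}
        },
        "required": ["income"],
    },
}
prompt = (
    "SYSTEM: You are a helpful assistant with access to the following "
    "functions. Use them if required -\n"
    + json.dumps(spec, indent=4)
    + "\nUSER: My income is $70,000. What is my tax?\nASSISTANT:"
)
payload = build_payload(prompt)
```

Calling `complete(prompt)` requires the server to be running; `build_payload` can be inspected without it.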
How to use srini98/mistral-function-calling with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "srini98/mistral-function-calling" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "srini98/mistral-function-calling",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "srini98/mistral-function-calling" \
--host 0.0.0.0 \
--port 30000
# Call the server with the same curl request as shown above for SGLang.
How to use srini98/mistral-function-calling with Docker Model Runner:
docker model run hf.co/srini98/mistral-function-calling
The model was fine-tuned on the Glaive function-calling dataset, using both QLoRA and full fine-tuning with FSDP.
Dataset: glaiveai/glaive-function-calling-v2
For training, inference, and evaluation code, see this repository:
https://github.com/Srini-98/Function-Calling-Using-Mistral
Use the following prompt format:
SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
"name": "function_name",
"description": "description",
"parameters": {
"type": "object",
"properties": {
"param_name1": {
"type": "string",
"description": "description of param"
},
"param_name2": {
"type": "string",
"description": "description of param"
},
"param_name3": {
"type": "string",
"description": "description of param"
}
},
"required": [
"param_name1"
]
}
}
USER: {question here}
ASSISTANT: {model answer} <|endoftext|>
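The template above can be assembled programmatically. The following is a minimal sketch: `build_prompt` is a hypothetical helper, and the exact whitespace and the separator between several function specs are assumptions, so check the training repository if the model is sensitive to them.

```python
import json

SYSTEM_PREFIX = (
    "SYSTEM: You are a helpful assistant with access to the following "
    "functions. Use them if required -"
)


def build_prompt(functions, question):
    """Render function specs and a user question in the card's prompt layout."""
    # Each spec is pretty-printed as JSON; joining several specs with a
    # newline is an assumption, not something the card specifies.
    specs = "\n".join(json.dumps(fn, indent=4) for fn in functions)
    # The prompt ends at "ASSISTANT:" so the model generates the answer.
    return f"{SYSTEM_PREFIX}\n{specs}\nUSER: {question}\nASSISTANT:"


calculate_tax_spec = {
    "name": "calculate_tax",
    "description": "Calculate the tax amount",
    "parameters": {
        "type": "object",
        "properties": {
            "income": {"type": "number", "description": "The income amount"}
        },
        "required": ["income"],
    },
}

prompt = build_prompt(
    [calculate_tax_spec],
    "Hi, I need to calculate my tax for this year. My income is $70,000.",
)
print(prompt)
```

Building the specs with `json.dumps` avoids hand-written JSON mistakes such as trailing commas or unbalanced quotes.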
Example:
SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
"name": "calculate_tax",
"description": "Calculate the tax amount",
"parameters": {
"type": "object",
"properties": {
"income": {
"type": "number",
"description": "The income amount"
}
},
"required": [
"income"
]
}
}
USER: Hi, I need to calculate my tax for this year. My income is $70,000.
ASSISTANT: <functioncall> {"name": "calculate_tax", "arguments": '{"income": 70000}'} <|endoftext|>
FUNCTION RESPONSE: {"tax_amount": 17500}
ASSISTANT: Based on your income, your tax for this year is $17,500. <|endoftext|>
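Note that the function call in the example is not valid JSON as a whole, because the `arguments` value is wrapped in single quotes. One way to handle this, sketched below, is to pull out the name and the quoted argument string with a regular expression and parse only the arguments as JSON. `parse_function_call` and `format_function_response` are hypothetical helpers, and the pattern assumes the model follows the layout of the example exactly.

```python
import json
import re

# Matches: <functioncall> {"name": "...", "arguments": '{...}'}
CALL_PATTERN = re.compile(
    r"""<functioncall>\s*\{\s*"name":\s*"(?P<name>[^"]+)",\s*"arguments":\s*'(?P<args>.*)'\s*\}""",
    re.DOTALL,
)


def parse_function_call(text):
    """Return (name, arguments) if the text holds a function call, else None."""
    # Drop the end-of-answer marker and anything generated after it.
    text = text.split("<|endoftext|>")[0]
    match = CALL_PATTERN.search(text)
    if match is None:
        return None  # plain-text answer, no function call
    # Only the single-quoted argument string is parsed as JSON.
    return match.group("name"), json.loads(match.group("args"))


def format_function_response(result):
    """Format a function result as the next turn and re-open the assistant turn."""
    return f"FUNCTION RESPONSE: {json.dumps(result)}\nASSISTANT:"


sample = (
    '<functioncall> {"name": "calculate_tax", '
    '"arguments": \'{"income": 70000}\'} <|endoftext|>'
)
call = parse_function_call(sample)
follow_up = format_function_response({"tax_amount": 17500})
```

Appending `follow_up` to the conversation so far and generating again should produce the final natural-language answer, as in the last ASSISTANT turn of the example.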
Stop generation when the model emits the <|endoftext|> token. You can list multiple functions in the system prompt and choose your own parameter names. The "required" field lists the parameters the model must always supply when it calls the function.
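On the calling side, it can still be worth checking that a parsed call names a known function and includes every required parameter before executing it. The sketch below assumes the call has already been parsed into a name and an argument dictionary; `missing_required`, `dispatch`, and the flat-rate `calculate_tax` handler are all hypothetical and exist only for illustration.

```python
def missing_required(spec, arguments):
    """Return the required parameter names that are absent from the arguments."""
    required = spec["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


def dispatch(call_name, arguments, specs, handlers):
    """Validate a parsed function call against its spec, then run the handler."""
    spec = specs.get(call_name)
    if spec is None:
        raise KeyError(f"model requested unknown function: {call_name}")
    missing = missing_required(spec, arguments)
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return handlers[call_name](**arguments)


def calculate_tax(income, rate=0.25):
    # Hypothetical handler: the flat 25% rate is made up for this sketch.
    return {"tax_amount": income * rate}


specs = {
    "calculate_tax": {
        "name": "calculate_tax",
        "description": "Calculate the tax amount",
        "parameters": {
            "type": "object",
            "properties": {
                "income": {"type": "number", "description": "The income amount"}
            },
            "required": ["income"],
        },
    }
}
handlers = {"calculate_tax": calculate_tax}

result = dispatch("calculate_tax", {"income": 70000}, specs, handlers)
```

Keeping specs and handlers in dictionaries keyed by function name makes it straightforward to register several functions at once.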