Tags: Text Generation · Transformers · PyTorch · GGUF · Chinese · English · codeshell · wisdomshell · pku-kcl · openbankai · custom_code
Instructions to use WisdomShell/CodeShell-7B-Chat-int4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WisdomShell/CodeShell-7B-Chat-int4 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WisdomShell/CodeShell-7B-Chat-int4", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("WisdomShell/CodeShell-7B-Chat-int4", trust_remote_code=True, dtype="auto")
```

- llama-cpp-python
How to use WisdomShell/CodeShell-7B-Chat-int4 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WisdomShell/CodeShell-7B-Chat-int4",
    filename="codeshell-chat-q4_0.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WisdomShell/CodeShell-7B-Chat-int4 with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0

# Run inference directly in the terminal:
llama-cli -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0

# Run inference directly in the terminal:
llama-cli -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0

# Run inference directly in the terminal:
./llama-cli -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
Use Docker
```sh
docker model run hf.co/WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
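Once llama-server is running, it can also be called from Python instead of curl. The sketch below is a minimal client for the OpenAI-compatible `/v1/chat/completions` endpoint, using only the standard library; it assumes llama-server's default port 8080 (adjust `base_url` if you passed a different `--port`):

```python
import json
import urllib.request

def build_chat_request(messages, base_url="http://localhost:8080", max_tokens=256):
    """Build (url, payload) for an OpenAI-compatible chat completions call.
    Split out from the network call so the request can be inspected."""
    return f"{base_url}/v1/chat/completions", {
        "messages": messages,
        "max_tokens": max_tokens,
    }

def chat(messages, **kwargs):
    """POST the request to llama-server and return the reply text."""
    url, payload = build_chat_request(messages, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running llama-server):
# print(chat([{"role": "user", "content": "Write hello world in Python."}]))
```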
- LM Studio
- Jan
- vLLM
How to use WisdomShell/CodeShell-7B-Chat-int4 with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "WisdomShell/CodeShell-7B-Chat-int4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WisdomShell/CodeShell-7B-Chat-int4",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```sh
docker model run hf.co/WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
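The same vLLM server can be called from Python rather than curl. A minimal sketch using only the standard library; the URL, port, and sampling parameters mirror the curl example above (adjust them if you changed the server defaults):

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5,
                             base_url="http://localhost:8000"):
    """Build (url, payload) for an OpenAI-compatible /v1/completions call."""
    return f"{base_url}/v1/completions", {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(model, prompt, **kwargs):
    """POST the request to the vLLM server and return the completion text."""
    url, payload = build_completion_request(model, prompt, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Example (requires a running vLLM server):
# print(complete("WisdomShell/CodeShell-7B-Chat-int4", "Once upon a time,"))
```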
- SGLang
How to use WisdomShell/CodeShell-7B-Chat-int4 with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "WisdomShell/CodeShell-7B-Chat-int4" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WisdomShell/CodeShell-7B-Chat-int4",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "WisdomShell/CodeShell-7B-Chat-int4" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WisdomShell/CodeShell-7B-Chat-int4",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Ollama
How to use WisdomShell/CodeShell-7B-Chat-int4 with Ollama:
```sh
ollama run hf.co/WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
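Besides the interactive `ollama run` session, the Ollama daemon exposes a native REST API that can be scripted. A minimal sketch against its `/api/generate` endpoint, assuming the default port 11434; `stream: false` requests a single JSON response instead of streamed NDJSON chunks:

```python
import json
import urllib.request

def build_generate_request(model, prompt, base_url="http://localhost:11434"):
    """Build (url, payload) for Ollama's native /api/generate endpoint."""
    return f"{base_url}/api/generate", {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of streamed chunks
    }

def generate(model, prompt, **kwargs):
    """POST the request to the Ollama daemon and return the generated text."""
    url, payload = build_generate_request(model, prompt, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires a running Ollama daemon with the model pulled):
# print(generate("hf.co/WisdomShell/CodeShell-7B-Chat-int4:Q4_0", "Once upon a time,"))
```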
- Unsloth Studio
How to use WisdomShell/CodeShell-7B-Chat-int4 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for WisdomShell/CodeShell-7B-Chat-int4 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for WisdomShell/CodeShell-7B-Chat-int4 to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for WisdomShell/CodeShell-7B-Chat-int4 to start chatting
```
- Docker Model Runner
How to use WisdomShell/CodeShell-7B-Chat-int4 with Docker Model Runner:
```sh
docker model run hf.co/WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
- Lemonade
How to use WisdomShell/CodeShell-7B-Chat-int4 with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull WisdomShell/CodeShell-7B-Chat-int4:Q4_0
```
Run and chat with the model
```sh
lemonade run user.CodeShell-7B-Chat-int4-Q4_0
```
List all available models
```sh
lemonade list
```
CodeShell-7B-Chat-int4 fails to run under TGI
#1
by melonrindrind - opened
Hello, and thanks to the developers for open-sourcing this project. I ran into some problems running it locally with TGI. Since my 3090 does not have enough VRAM for the unquantized 7B model, I tried the int4 quantized model instead, but CodeShell-7B-Chat-int4 fails to run under TGI. What could be the cause?
```sh
docker run --gpus 'all' --shm-size 20g -p 9090:80 \
  -v /root/codeshell/model:/data \
  --env LOG_LEVEL="info,text_generation_router=debug" \
  ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
  --model-id /data/CodeShell-7B-Chat-int4 --num-shard 1 \
  --max-total-tokens 5000 --max-input-length 4096 \
  --max-stop-sequences 12 --trust-remote-code
```
```
2023-10-23T02:59:41.362133Z  INFO text_generation_launcher: Args { model_id: "/data/CodeShell-7B-Chat-int4", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 12, max_top_n_tokens: 5, max_input_length: 4096, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "f1b26d576e27", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-10-23T02:59:41.362173Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/data/CodeShell-7B-Chat-int4` do not contain malicious code.
2023-10-23T02:59:41.362290Z  INFO download: text_generation_launcher: Starting download process.
2023-10-23T02:59:44.612287Z  WARN text_generation_launcher: No safetensors weights found for model /data/CodeShell-7B-Chat-int4 at revision None. Converting PyTorch weights to safetensors.
Error: DownloadError
2023-10-23T02:59:47.269640Z ERROR download: text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 195, in download_weights
    utils.convert_files(local_pt_files, local_st_files, discard_names)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files
    convert_file(pt_file, sf_file, discard_names)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 68, in convert_file
    to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 25, in _remove_duplicate_names
    shareds = _find_shared_tensors(state_dict)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 72, in _find_shared_tensors
    if v.device != torch.device("meta") and storage_ptr(v) != 0 and storage_size(v) != 0:
AttributeError: 'list' object has no attribute 'device'
```
Following this thread, same problem here.
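One way to read the traceback above: safetensors' `_find_shared_tensors` calls `.device` on every value in the state dict, so it fails when a value is a plain Python list. This suggests the int4 checkpoint stores some non-tensor entries (for example quantization metadata) that TGI's PyTorch-to-safetensors conversion does not expect. A minimal sketch of the failure mode and a diagnostic filter; `FakeTensor` and the key names are stand-ins for illustration, not the actual checkpoint contents. In practice you would `torch.load` the `.bin` file and run the same filter over the real state dict:

```python
def find_incompatible_entries(state_dict):
    """Return keys whose values lack a .device attribute, i.e. are not
    tensor-like; these are what trips safetensors' _find_shared_tensors."""
    return [k for k, v in state_dict.items() if not hasattr(v, "device")]

class FakeTensor:
    """Stand-in for torch.Tensor in this illustration: it has a .device."""
    device = "cpu"

# Simulated int4 checkpoint: a normal weight plus list-valued metadata
sd = {
    "h.0.attn.weight": FakeTensor(),       # tensor entry: conversion is fine
    "h.0.attn.weight_scales": [0.1, 0.2],  # list entry: triggers AttributeError
}
print(find_incompatible_entries(sd))  # → ['h.0.attn.weight_scales']
```

If the real checkpoint shows such entries, the model likely cannot be served by TGI's standard loading path without support for its custom quantization format.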