---
library_name: transformers
tags:
- medical
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-generation
base_model: Intel/neural-chat-7b-v3-1
---
## InMD-X: Large Language Models for Internal Medicine Doctors

We introduce InMD-X, a collection of multiple large language models specifically designed to cater to the unique characteristics and demands of Internal Medicine Doctors (IMD). InMD-X represents a groundbreaking development in natural language processing, offering a suite of language models fine-tuned for various aspects of the internal medicine field. These models encompass a wide range of medical sub-specialties, enabling IMDs to perform more efficient and accurate research, diagnosis, and documentation. InMD-X’s versatility and adaptability make it a valuable tool for improving the healthcare industry, enhancing communication between healthcare professionals, and advancing medical research. Each model within InMD-X is meticulously tailored to address specific challenges faced by IMDs, ensuring the highest level of precision and comprehensiveness in clinical text analysis and decision support.

(This model card is for the RHEUMATOLOGY subspecialty.)

### Model Description

- **Model type:** CausalLM
- **Language(s) (NLP):** English
- **License:** [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
- **Finetuned from model:** [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1)

### Model Sources

- **Paper:** [InMD-X](http://arxiv.org/abs/2402.11883)

## Uses

```python
import torch
import transformers
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "InMedData/InMD-X-RHE"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit (requires bitsandbytes and a GPU)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",  # if you have a GPU
)


def inference(pipeline, question, answer_only=False):
    # `answer_only` is currently unused; it is kept so existing calls still work
    sequences = pipeline(
        "Answer the next question in one sentence.\n" + question,
        do_sample=True,
        top_k=10,
        top_p=0.9,
        temperature=0.2,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=500,  # increase to allow longer sequences
    )
    answers = []
    for seq in sequences:
        # Keep only the text generated after the question and drop newlines
        answer = seq["generated_text"].split(question)[-1].replace("\n", "")
        answers.append(answer)
    return answers


question = 'What is the association between long-term beta-blocker use after myocardial infarction (MI) and the risk of reinfarction and death?'
answers = inference(pipeline, question)
print(answers)
```
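
The prompt construction and answer extraction used by `inference` can be checked without loading the model. The sketch below isolates those two string operations. `build_prompt` and `extract_answer` are hypothetical helper names that are not part of the released code, and the sample generated text is made up purely for illustration.

```python
INSTRUCTION = "Answer the next question in one sentence.\n"


def build_prompt(question: str) -> str:
    """Prepend the fixed instruction used in the usage snippet."""
    return INSTRUCTION + question


def extract_answer(generated_text: str, question: str) -> str:
    """Keep the text after the last occurrence of the question, without newlines."""
    return generated_text.split(question)[-1].replace("\n", "")


if __name__ == "__main__":
    q = "Which joints does rheumatoid arthritis typically affect first?"
    # Placeholder text standing in for real model output (hypothetical).
    fake_output = build_prompt(q) + "\nA made-up answer used only for this demo."
    print(extract_answer(fake_output, q))
```

Note that the split only removes the prompt when the generated text echoes the question verbatim; otherwise the whole string is returned unchanged. Passing `return_full_text=False` to the text-generation pipeline is an alternative way to drop the prompt.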
### List of LoRA config

Based on [Parameter-Efficient Fine-Tuning (PEFT)](https://github.com/huggingface/peft).

Parameter | PT | SFT
:------:| :------:| :------:
r | 8 | 8
lora alpha | 32 | 32
lora dropout | 0.05 | 0.05
target | q, k, v, o, up, down, gate | q, k, v, o, up, down, gate
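
As a rough sketch of how the LoRA table could map onto PEFT, the block below collects the values as keyword arguments for `peft.LoraConfig`. The expansion of the short target names to `q_proj`, `k_proj`, and so on is an assumption based on the Mistral-style module naming of the base model and is not stated in this card; `task_type="CAUSAL_LM"` is likewise assumed. The helper names are hypothetical.

```python
# Values from the LoRA table (identical for PT and SFT).
LORA_R = 8
LORA_ALPHA = 32
LORA_DROPOUT = 0.05
SHORT_TARGETS = ["q", "k", "v", "o", "up", "down", "gate"]


def expand_targets(short_names):
    """Assumption: each short name corresponds to a '<name>_proj' module."""
    return [f"{name}_proj" for name in short_names]


def lora_kwargs():
    """Keyword arguments that could be passed as peft.LoraConfig(**kwargs)."""
    return {
        "r": LORA_R,
        "lora_alpha": LORA_ALPHA,
        "lora_dropout": LORA_DROPOUT,
        "target_modules": expand_targets(SHORT_TARGETS),
        "task_type": "CAUSAL_LM",  # assumed, not stated in the card
    }


if __name__ == "__main__":
    kwargs = lora_kwargs()
    print(kwargs["target_modules"])
    # Standard LoRA scales the low-rank update by lora_alpha / r.
    print(kwargs["lora_alpha"] / kwargs["r"])
```

With these values the standard LoRA scaling factor `lora_alpha / r` is 32 / 8 = 4.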
### List of Training arguments

Based on [Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl).

Parameter | PT | SFT
:------:| :------:| :------:
train epochs | 3 | 1
per device train batch size | 1 | 1
optimizer | adamw_hf | adamw_hf
evaluation strategy | no | no
learning_rate | 1e-4 | 1e-4
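
The two training stages differ only in the number of epochs. A minimal sketch, assuming the rows map onto `transformers.TrainingArguments` field names as of the Transformers version listed in this card (`num_train_epochs`, `per_device_train_batch_size`, `optim`, `evaluation_strategy`, `learning_rate`). The dictionary layout and the `stage_kwargs` helper are hypothetical, and newer releases may have renamed or removed some of these options.

```python
# Settings shared by both stages, taken from the training-arguments table.
COMMON = {
    "per_device_train_batch_size": 1,
    "optim": "adamw_hf",
    "evaluation_strategy": "no",
    "learning_rate": 1e-4,
}
# The only setting that differs between the two columns.
EPOCHS = {"PT": 3, "SFT": 1}


def stage_kwargs(stage):
    """Return keyword arguments for one stage ('PT' or 'SFT')."""
    if stage not in EPOCHS:
        raise ValueError(f"unknown stage: {stage!r}")
    return {**COMMON, "num_train_epochs": EPOCHS[stage]}


if __name__ == "__main__":
    for stage in ("PT", "SFT"):
        print(stage, stage_kwargs(stage))
```

Each dictionary could then be unpacked into `TrainingArguments` together with an `output_dir` of your own choosing.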
### Experimental setup

- **OS**: Ubuntu 22.04.3 LTS
- **GPU**: NVIDIA A100 (40GB)
- **Python**: 3.10.12
- **PyTorch**: 2.1.1+cu118
- **Transformers**: 4.37.0.dev0
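
To compare a local environment against the versions listed in the experimental setup, a small standard-library check can help. This is only a convenience sketch: the `EXPECTED` mapping and the `installed_versions` helper are hypothetical names, and a version mismatch does not necessarily mean the model will fail to run.

```python
import sys
from importlib import metadata

# Versions reported in the experimental setup of this card.
EXPECTED = {"torch": "2.1.1+cu118", "transformers": "4.37.0.dev0"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(card lists 3.10.12)")
    for name, version in installed_versions(EXPECTED).items():
        print(f"{name}: installed={version}, card lists {EXPECTED[name]}")
```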
## Limitations

InMD-X is a collection of separate, subspecialty-specific models. These models have not yet been integrated with one another, so each one remains a standalone fragment of the overall system.

Because no suitable benchmarks are available, the individual models have not been adequately evaluated. Future research will develop new benchmarks and integrate the models to enable an objective evaluation.

## Non-commercial use

These models are available exclusively for research purposes and are not intended for commercial use.

## INMED DATA

INMED DATA is developing large language models (LLMs) specifically tailored for medical applications. For more information, please visit our website [TBD].