Website    🤖 7B Model    🤖 8B Model based on InternVL3    🤖 32B Model    MedEvalKit    Technical Report    Lingshu MCP

Lingshu - SOTA Multimodal Large Language Models for the Medical Domain

BIG NEWS: Lingshu-I-8B is released with state-of-the-art performance on medical VQA tasks. It is built on top of InternVL3 and follows the same training pipeline as the previous Lingshu models.

This repository contains the models from the paper Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning. We also release MedEvalKit, a comprehensive medical evaluation toolkit that supports fast evaluation of major multimodal and textual medical tasks.

Highlights

  • Lingshu-I-8B, the latest SOTA model at the 8B size on most medical multimodal/textual QA and report generation tasks. It is built on top of InternVL3 and follows the same training pipeline as the previous Lingshu models.
  • Lingshu models, built on top of Qwen2.5VL, achieve strong performance on medical multimodal/textual QA and report generation tasks at the 7B and 32B model sizes.
  • Lingshu-32B outperforms GPT-4.1 and Claude Sonnet 4 on most multimodal QA and report generation tasks.
  • Lingshu supports more than 12 medical imaging modalities, including X-Ray, CT Scan, MRI, Microscopy, Ultrasound, Histopathology, Dermoscopy, Fundus, OCT, Digital Photography, Endoscopy, and PET.

Release

Disclaimer: Although the weights, code, and demos are released openly, and despite our best efforts in red teaming, safety fine-tuning, and enforcement, our models, like other pre-trained language models, carry potential risks, including but not limited to inaccurate, misleading, or harmful generations. Developers and stakeholders should perform their own red teaming and put appropriate security measures in place before deployment, and they must comply with local governance and regulations. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.

Evaluation

Medical Multimodal VQA

| Models | MMMU-Med | VQA-RAD | SLAKE | PathVQA | PMC-VQA | OmniMedVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| **Proprietary Models** | | | | | | | | |
| GPT-4.1 | 75.2 | 65.0 | 72.2 | 55.5 | 55.2 | 75.5 | 45.2 | 63.4 |
| Claude Sonnet 4 | 74.6 | 67.6 | 70.6 | 54.2 | 54.4 | 65.5 | 43.3 | 61.5 |
| Gemini-2.5-Flash | 76.9 | 68.5 | 75.8 | 55.4 | 55.4 | 71.0 | 52.8 | 65.1 |
| **Open-source Models (<10B)** | | | | | | | | |
| BiomedGPT | 24.9 | 16.6 | 13.6 | 11.3 | 27.6 | 27.9 | - | - |
| Med-R1-2B | 34.8 | 39.0 | 54.5 | 15.3 | 47.4 | - | 21.1 | - |
| MedVLM-R1-2B | 35.2 | 48.6 | 56.0 | 32.5 | 47.6 | 77.7 | 20.4 | 45.4 |
| MedGemma-4B-IT | 43.7 | 72.5 | 76.4 | 48.8 | 49.9 | 69.8 | 22.3 | 54.8 |
| LLaVA-Med-7B | 29.3 | 53.7 | 48.0 | 38.8 | 30.5 | 44.3 | 20.3 | 37.8 |
| HuatuoGPT-V-7B | 47.3 | 67.0 | 67.8 | 48.0 | 53.3 | 74.2 | 21.6 | 54.2 |
| BioMediX2-8B | 39.8 | 49.2 | 57.7 | 37.0 | 43.5 | 63.3 | 21.8 | 44.6 |
| Qwen2.5VL-7B | 50.6 | 64.5 | 67.2 | 44.1 | 51.9 | 63.6 | 22.3 | 52.0 |
| InternVL2.5-8B | 53.5 | 59.4 | 69.0 | 42.1 | 51.3 | 81.3 | 21.7 | 54.0 |
| InternVL3-8B | 59.2 | 65.4 | 72.8 | 48.6 | 53.8 | 79.1 | 22.4 | 57.3 |
| Lingshu-7B | 54.0 | 67.9 | 83.1 | 61.9 | 56.3 | 82.9 | 26.7 | 61.8 |
| Lingshu-I-8B | 49.1 | 73.0 | 91.6 | 74.9 | 55.8 | 79.7 | 27.5 | 64.5 |
| **Open-source Models (>10B)** | | | | | | | | |
| HealthGPT-14B | 49.6 | 65.0 | 66.1 | 56.7 | 56.4 | 75.2 | 24.7 | 56.2 |
| HuatuoGPT-V-34B | 51.8 | 61.4 | 69.5 | 44.4 | 56.6 | 74.0 | 22.1 | 54.3 |
| MedDr-40B | 49.3 | 65.2 | 66.4 | 53.5 | 13.9 | 64.3 | - | - |
| InternVL3-14B | 63.1 | 66.3 | 72.8 | 48.0 | 54.1 | 78.9 | 23.1 | 58.0 |
| Qwen2.5VL-32B | 59.6 | 71.8 | 71.2 | 41.9 | 54.5 | 68.2 | 25.2 | 56.1 |
| InternVL2.5-38B | 61.6 | 61.4 | 70.3 | 46.9 | 57.2 | 79.9 | 24.4 | 57.4 |
| InternVL3-38B | 65.2 | 65.4 | 72.7 | 51.0 | 56.6 | 79.8 | 25.2 | 59.4 |
| Lingshu-32B | 62.3 | 76.5 | 89.2 | 65.9 | 57.9 | 83.4 | 30.9 | 66.6 |
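The Avg. column is consistent with the unweighted mean of the seven benchmark scores. A minimal sketch, illustrated with the Lingshu-I-8B row above (this is our reading of the table, not a documented scoring script):

```python
# Hedged sketch: Avg. appears to be the unweighted mean of the seven
# benchmark scores; shown here for the Lingshu-I-8B row of the table.
scores = {
    "MMMU-Med": 49.1, "VQA-RAD": 73.0, "SLAKE": 91.6, "PathVQA": 74.9,
    "PMC-VQA": 55.8, "OmniMedVQA": 79.7, "MedXpertQA": 27.5,
}
avg = round(sum(scores.values()) / len(scores), 1)
print(avg)  # 64.5, matching the Avg. column
```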

Usage

Using vLLM (0.14.0)

import dataclasses
from enum import IntEnum, auto
from typing import Dict, List, Tuple, Union
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
import os
from PIL import Image

class SeparatorStyle(IntEnum):
    """Separator styles."""

    ADD_COLON_SINGLE = auto()
    ADD_COLON_TWO = auto()
    ADD_COLON_SPACE_SINGLE = auto()
    NO_COLON_SINGLE = auto()
    NO_COLON_TWO = auto()
    ADD_NEW_LINE_SINGLE = auto()
    LLAMA2 = auto()
    CHATGLM = auto()
    CHATML = auto()
    CHATINTERN = auto()
    DOLLY = auto()
    RWKV = auto()
    PHOENIX = auto()
    ROBIN = auto()
    FALCON_CHAT = auto()
    CHATGLM3 = auto()
    INTERNVL_ZH = auto()
    MPT = auto()


@dataclasses.dataclass
class Conversation:
    """A class that manages prompt templates and keeps all conversation history."""

    # The name of this template
    name: str
    # The template of the system prompt
    system_template: str = '{system_message}'
    # The system message
    system_message: str = ''
    # The names of two roles
    roles: Tuple[str, str] = ('USER', 'ASSISTANT')
    # All messages. Each item is (role, message).
    messages: List[List[str]] = dataclasses.field(default_factory=list)
    # The number of few shot examples
    offset: int = 0
    # The separator style and configurations
    sep_style: SeparatorStyle = SeparatorStyle.ADD_COLON_SINGLE
    sep: str = '\n'
    sep2: str = None
    # Stop criteria (the default one is EOS token)
    stop_str: Union[str, List[str]] = None
    # Stops generation if meeting any token in this list
    stop_token_ids: List[int] = None

    def get_prompt(self) -> str:
        """Get the prompt for generation."""
        system_prompt = self.system_template.format(system_message=self.system_message)
        if self.sep_style == SeparatorStyle.ADD_COLON_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt + seps[0]
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ': ' + message + seps[i % 2]
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_SPACE_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ': '  # must be end with a space
            return ret
        elif self.sep_style == SeparatorStyle.ADD_NEW_LINE_SINGLE:
            ret = '' if system_prompt == '' else system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + message + self.sep
                else:
                    ret += role + '\n'
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_SINGLE:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + message + self.sep
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + message + seps[i % 2]
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.RWKV:
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += (
                        role
                        + ': '
                        + message.replace('\r\n', '\n').replace('\n\n', '\n')
                    )
                    ret += '\n\n'
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.LLAMA2:
            seps = [self.sep, self.sep2]
            if self.system_message:
                ret = system_prompt
            else:
                ret = '[INST] '
            for i, (role, message) in enumerate(self.messages):
                tag = self.roles[i % 2]
                if message:
                    if i == 0:
                        ret += message + ' '
                    else:
                        ret += tag + ' ' + message + seps[i % 2]
                else:
                    ret += tag
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM:
            # source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
            # source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926
            round_add_n = 1 if self.name == 'chatglm2' else 0
            if system_prompt:
                ret = system_prompt + self.sep
            else:
                ret = ''

            for i, (role, message) in enumerate(self.messages):
                if i % 2 == 0:
                    ret += f'[Round {i//2 + round_add_n}]{self.sep}'

                if message:
                    ret += f'{role}:{message}{self.sep}'
                else:
                    ret += f'{role}:'
            return ret
        elif self.sep_style == SeparatorStyle.CHATML:
            ret = '' if system_prompt == '' else system_prompt + self.sep + '\n'
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + message + self.sep + '\n'
                else:
                    ret += role + '\n'
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM3:
            ret = ''
            if self.system_message:
                ret += system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + ' ' + message
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.CHATINTERN:
            # source: https://huggingface.co/internlm/internlm-chat-7b-8k/blob/bd546fa984b4b0b86958f56bf37f94aa75ab8831/modeling_internlm.py#L771
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                # if i % 2 == 0:
                #     ret += "<s>"
                if message:
                    ret += role + ':' + message + seps[i % 2] + '\n'
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.DOLLY:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ':\n' + message + seps[i % 2]
                    if i % 2 == 1:
                        ret += '\n\n'
                else:
                    ret += role + ':\n'
            return ret
        elif self.sep_style == SeparatorStyle.PHOENIX:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + '<s>' + message + '</s>'
                else:
                    ret += role + ': ' + '<s>'
            return ret
        elif self.sep_style == SeparatorStyle.ROBIN:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ':\n' + message + self.sep
                else:
                    ret += role + ':\n'
            return ret
        elif self.sep_style == SeparatorStyle.FALCON_CHAT:
            ret = ''
            if self.system_message:
                ret += system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ':'

            return ret
        elif self.sep_style == SeparatorStyle.INTERNVL_ZH:
            seps = [self.sep, self.sep2]
            ret = self.system_message + seps[0]
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ': ' + message + seps[i % 2]
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.MPT:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    if type(message) is tuple:
                        message, _, _ = message
                    ret += role + message + self.sep
                else:
                    ret += role
            return ret
        else:
            raise ValueError(f'Invalid style: {self.sep_style}')

    def set_system_message(self, system_message: str):
        """Set the system message."""
        self.system_message = system_message

    def append_message(self, role: str, message: str):
        """Append a new message."""
        self.messages.append([role, message])

    def update_last_message(self, message: str):
        """Update the last output.
        The last message is typically set to be None when constructing the prompt,
        so we need to update it in-place after getting the response from a model.
        """
        self.messages[-1][1] = message

    def to_gradio_chatbot(self):
        """Convert the conversation to gradio chatbot format."""
        ret = []
        for i, (role, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                ret.append([msg, None])
            else:
                ret[-1][-1] = msg
        return ret

    def to_openai_api_messages(self):
        """Convert the conversation to OpenAI chat completion format."""
        ret = [{'role': 'system', 'content': self.system_message}]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                ret.append({'role': 'user', 'content': msg})
            else:
                if msg is not None:
                    ret.append({'role': 'assistant', 'content': msg})
        return ret

    def copy(self):
        return Conversation(
            name=self.name,
            system_template=self.system_template,
            system_message=self.system_message,
            roles=self.roles,
            messages=[[x, y] for x, y in self.messages],
            offset=self.offset,
            sep_style=self.sep_style,
            sep=self.sep,
            sep2=self.sep2,
            stop_str=self.stop_str,
            stop_token_ids=self.stop_token_ids,
        )

    def dict(self):
        return {
            'template_name': self.name,
            'system_message': self.system_message,
            'roles': self.roles,
            'messages': self.messages,
            'offset': self.offset,
        }


# A global registry for all conversation templates
conv_templates: Dict[str, Conversation] = {}


def register_conv_template(template: Conversation, override: bool = False):
    """Register a new conversation template."""
    if not override:
        assert (
            template.name not in conv_templates
        ), f'{template.name} has been registered.'

    conv_templates[template.name] = template


def get_conv_template(name: str) -> Conversation:
    """Get a conversation template."""
    return conv_templates[name].copy()


def process_messages(messages):
    conv = Conversation(
            name='internvl2_5',
            system_template='<|im_start|>system\n{system_message}',
            system_message='',
            roles=('<|im_start|>user\n', '<|im_start|>assistant\n'),
            sep_style=SeparatorStyle.MPT,
            sep='<|im_end|>\n',
        )
    conv.messages = []
    imgs = []
    for message in messages:
        role = message['role']
        content = message['content']
        if isinstance(content, str):
            conv.append_message(role, content)
        elif isinstance(content, list):
            text = ""
            for item in content:
                if item['type'] == 'text':
                    text += item['text']
                elif item['type'] == 'image':
                    text = text + "\n<IMG_CONTEXT>"
                    image = item['image']
                    if isinstance(image, str):
                        image = Image.open(item['image'])
                    imgs.append(image)
            
            conv.append_message(role, text)

    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    mm_data = {}
    if len(imgs) > 0:
        mm_data["image"] = imgs


    llm_inputs = {
        "prompt": prompt,
        "multi_modal_data": mm_data,
    }

    return llm_inputs


if __name__ == '__main__':
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    # The processor is loaded here but not used below; the prompt is built manually.
    processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-I-8B")
    llm = LLM(
        model="lingshu-medical-mllm/Lingshu-I-8B",
        limit_mm_per_prompt={"image": 4},
        tensor_parallel_size=2,
        enforce_eager=True,
        trust_remote_code=True,
        gpu_memory_utilization=0.7,
    )
    sampling_params = SamplingParams(
                temperature=0.7,
                top_p=1,
                repetition_penalty=1,
                max_tokens=1024,
                stop_token_ids=[],
            )

    text = "What does the image show?"
    image_path = "example.jpg"

    message = [
        {
            "role":"user",
            "content":[
                {"type":"image","image":image_path},
                {"type":"text","text":text}
                ]
        }
    ]
    llm_inputs = process_messages(message)
    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
    print(outputs[0].outputs[0].text)
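For reference, here is what the prompt string produced by `process_messages` looks like. This standalone sketch reproduces the formatting logic of `get_prompt` under the `internvl2_5` template settings (MPT separator style, `sep='<|im_end|>\n'`, empty system message); `render_internvl2_5` is a hypothetical helper written only for illustration:

```python
# Hypothetical helper mirroring get_prompt() for the `internvl2_5` template:
# MPT separator style with sep='<|im_end|>\n' and an empty system message.
def render_internvl2_5(messages, system_message=""):
    sep = "<|im_end|>\n"
    ret = f"<|im_start|>system\n{system_message}" + sep
    for role, message in messages:
        # A None message leaves an open assistant slot for generation.
        ret += (role + message + sep) if message is not None else role
    return ret

prompt = render_internvl2_5([
    ("<|im_start|>user\n", "What does the image show?"),
    ("<|im_start|>assistant\n", None),  # open slot for the model's reply
])
print(repr(prompt))
```

The resulting string starts with the system block, contains the user turn wrapped in `<|im_start|>user` / `<|im_end|>`, and ends with an unterminated `<|im_start|>assistant\n` so that generation continues from there.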

Citation

If you find our project useful, please consider starring our repo and citing our work as follows:

@article{xu2025lingshu,
  title={Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning},
  author={Xu, Weiwen and Chan, Hou Pong and Li, Long and Aljunied, Mahani and Yuan, Ruifeng and Wang, Jianyu and Xiao, Chenghao and Chen, Guizhen and Liu, Chaoqun and Li, Zhaodonghui and others},
  journal={arXiv preprint arXiv:2506.07044},
  year={2025}
}