Website | 🤖 7B Model | 🤖 8B Model based on InternVL3 | 🤖 32B Model | MedEvalKit | Technical Report | Lingshu MCP

Lingshu - SOTA Multimodal Large Language Models for Medical Domain

BIG NEWS: Lingshu-I-8B is released, with state-of-the-art performance on medical VQA tasks. It is built on top of InternVL3 and trained with the same pipeline as the earlier Lingshu models.

This repository contains the model from the paper Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning. We also release MedEvalKit, a comprehensive medical evaluation toolkit that supports fast evaluation of major multimodal and textual medical tasks.

Highlights

Lingshu-I-8B, the latest SOTA model on most medical multimodal/textual QA and report generation tasks at the 8B size. It is built on top of InternVL3 and trained with the same pipeline as the previous Lingshu models.
Lingshu models, built on top of Qwen2.5VL, achieve strong performance on medical multimodal/textual QA and report generation tasks at the 7B and 32B model sizes.
Lingshu-32B outperforms GPT-4.1 and Claude Sonnet 4 in most multimodal QA and report generation tasks.
Lingshu supports more than 12 medical imaging modalities, including X-Ray, CT Scan, MRI, Microscopy, Ultrasound, Histopathology, Dermoscopy, Fundus, OCT, Digital Photography, Endoscopy, and PET.

Release

Technical report: arXiv: Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning.
Model weights:

Disclaimer: We must note that even though the weights, codes, and demos are released in an open manner, similar to other pre-trained language models, and despite our best efforts in red teaming and safety fine-tuning and enforcement, our models come with potential risks, including but not limited to inaccurate, misleading or potentially harmful generation. Developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos.

Evaluation

Medical Multimodal VQA

Models	MMMU-Med	VQA-RAD	SLAKE	PathVQA	PMC-VQA	OmniMedVQA	MedXpertQA	Avg.
Proprietary Models
GPT-4.1	75.2	65.0	72.2	55.5	55.2	75.5	45.2	63.4
Claude Sonnet 4	74.6	67.6	70.6	54.2	54.4	65.5	43.3	61.5
Gemini-2.5-Flash	76.9	68.5	75.8	55.4	55.4	71.0	52.8	65.1
Open-source Models (<10B)
BiomedGPT	24.9	16.6	13.6	11.3	27.6	27.9	-	-
Med-R1-2B	34.8	39.0	54.5	15.3	47.4	-	21.1	-
MedVLM-R1-2B	35.2	48.6	56.0	32.5	47.6	77.7	20.4	45.4
MedGemma-4B-IT	43.7	72.5	76.4	48.8	49.9	69.8	22.3	54.8
LLaVA-Med-7B	29.3	53.7	48.0	38.8	30.5	44.3	20.3	37.8
HuatuoGPT-V-7B	47.3	67.0	67.8	48.0	53.3	74.2	21.6	54.2
BioMediX2-8B	39.8	49.2	57.7	37.0	43.5	63.3	21.8	44.6
Qwen2.5VL-7B	50.6	64.5	67.2	44.1	51.9	63.6	22.3	52.0
InternVL2.5-8B	53.5	59.4	69.0	42.1	51.3	81.3	21.7	54.0
InternVL3-8B	59.2	65.4	72.8	48.6	53.8	79.1	22.4	57.3
Lingshu-7B	54.0	67.9	83.1	61.9	56.3	82.9	26.7	61.8
Lingshu-I-8B	49.1	73.0	91.6	74.9	55.8	79.7	27.5	64.5
Open-source Models (>10B)
HealthGPT-14B	49.6	65.0	66.1	56.7	56.4	75.2	24.7	56.2
HuatuoGPT-V-34B	51.8	61.4	69.5	44.4	56.6	74.0	22.1	54.3
MedDr-40B	49.3	65.2	66.4	53.5	13.9	64.3	-	-
InternVL3-14B	63.1	66.3	72.8	48.0	54.1	78.9	23.1	58.0
Qwen2.5VL-32B	59.6	71.8	71.2	41.9	54.5	68.2	25.2	56.1
InternVL2.5-38B	61.6	61.4	70.3	46.9	57.2	79.9	24.4	57.4
InternVL3-38B	65.2	65.4	72.7	51.0	56.6	79.8	25.2	59.4
Lingshu-32B	62.3	76.5	89.2	65.9	57.9	83.4	30.9	66.6
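
The Avg. column appears to be the unweighted mean of the seven benchmark scores in each row. The table does not state this, so treat it as an assumption inferred from the numbers. A minimal sketch that recomputes it for a few rows (the `rows` dictionary and the `average` helper are illustrative names; the scores are copied from the table above):

```python
from statistics import mean

# Scores copied from the table, in column order:
# MMMU-Med, VQA-RAD, SLAKE, PathVQA, PMC-VQA, OmniMedVQA, MedXpertQA.
rows = {
    "GPT-4.1": [75.2, 65.0, 72.2, 55.5, 55.2, 75.5, 45.2],
    "Lingshu-7B": [54.0, 67.9, 83.1, 61.9, 56.3, 82.9, 26.7],
    "Lingshu-I-8B": [49.1, 73.0, 91.6, 74.9, 55.8, 79.7, 27.5],
    "Lingshu-32B": [62.3, 76.5, 89.2, 65.9, 57.9, 83.4, 30.9],
}


def average(scores):
    """Unweighted mean rounded to one decimal (assumed definition of Avg.)."""
    return round(mean(scores), 1)


for name, scores in rows.items():
    print(f"{name}: {average(scores)}")
```

Rows with a missing score (shown as "-") have no Avg. entry, which is consistent with this reading.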

Usage

Using vLLM (0.14.0)

import dataclasses
from enum import IntEnum, auto
from typing import Dict, List, Tuple, Union
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
import os
from PIL import Image

class SeparatorStyle(IntEnum):
    """Separator styles."""

    ADD_COLON_SINGLE = auto()
    ADD_COLON_TWO = auto()
    ADD_COLON_SPACE_SINGLE = auto()
    NO_COLON_SINGLE = auto()
    NO_COLON_TWO = auto()
    ADD_NEW_LINE_SINGLE = auto()
    LLAMA2 = auto()
    CHATGLM = auto()
    CHATML = auto()
    CHATINTERN = auto()
    DOLLY = auto()
    RWKV = auto()
    PHOENIX = auto()
    ROBIN = auto()
    FALCON_CHAT = auto()
    CHATGLM3 = auto()
    INTERNVL_ZH = auto()
    MPT = auto()


@dataclasses.dataclass
class Conversation:
    """A class that manages prompt templates and keeps all conversation history."""

    # The name of this template
    name: str
    # The template of the system prompt
    system_template: str = '{system_message}'
    # The system message
    system_message: str = ''
    # The names of two roles
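    roles: Tuple[str, str] = ('USER', 'ASSISTANT')
    # All messages. Each item is (role, message).
    # A list factory gives each instance its own mutable list, so append_message works.
    messages: List[List[str]] = dataclasses.field(default_factory=list)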
    # The number of few shot examples
    offset: int = 0
    # The separator style and configurations
    sep_style: SeparatorStyle = SeparatorStyle.ADD_COLON_SINGLE
    sep: str = '\n'
    sep2: str = None
    # Stop criteria (the default one is EOS token)
    stop_str: Union[str, List[str]] = None
    # Stops generation if meeting any token in this list
    stop_token_ids: List[int] = None

    def get_prompt(self) -> str:
        """Get the prompt for generation."""
        system_prompt = self.system_template.format(system_message=self.system_message)
        if self.sep_style == SeparatorStyle.ADD_COLON_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt + seps[0]
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ': ' + message + seps[i % 2]
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.ADD_COLON_SPACE_SINGLE:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ': '  # must be end with a space
            return ret
        elif self.sep_style == SeparatorStyle.ADD_NEW_LINE_SINGLE:
            ret = '' if system_prompt == '' else system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + message + self.sep
                else:
                    ret += role + '\n'
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_SINGLE:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + message + self.sep
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.NO_COLON_TWO:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + message + seps[i % 2]
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.RWKV:
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += (
                        role
                        + ': '
                        + message.replace('\r\n', '\n').replace('\n\n', '\n')
                    )
                    ret += '\n\n'
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.LLAMA2:
            seps = [self.sep, self.sep2]
            if self.system_message:
                ret = system_prompt
            else:
                ret = '[INST] '
            for i, (role, message) in enumerate(self.messages):
                tag = self.roles[i % 2]
                if message:
                    if i == 0:
                        ret += message + ' '
                    else:
                        ret += tag + ' ' + message + seps[i % 2]
                else:
                    ret += tag
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM:
            # source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
            # source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926
            round_add_n = 1 if self.name == 'chatglm2' else 0
            if system_prompt:
                ret = system_prompt + self.sep
            else:
                ret = ''

            for i, (role, message) in enumerate(self.messages):
                if i % 2 == 0:
                    ret += f'[Round {i//2 + round_add_n}]{self.sep}'

                if message:
                    ret += f'{role}：{message}{self.sep}'
                else:
                    ret += f'{role}：'
            return ret
        elif self.sep_style == SeparatorStyle.CHATML:
            ret = '' if system_prompt == '' else system_prompt + self.sep + '\n'
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + message + self.sep + '\n'
                else:
                    ret += role + '\n'
            return ret
        elif self.sep_style == SeparatorStyle.CHATGLM3:
            ret = ''
            if self.system_message:
                ret += system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + '\n' + ' ' + message
                else:
                    ret += role
            return ret
        elif self.sep_style == SeparatorStyle.CHATINTERN:
            # source: https://huggingface.co/internlm/internlm-chat-7b-8k/blob/bd546fa984b4b0b86958f56bf37f94aa75ab8831/modeling_internlm.py#L771
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                # if i % 2 == 0:
                #     ret += "<s>"
                if message:
                    ret += role + ':' + message + seps[i % 2] + '\n'
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.DOLLY:
            seps = [self.sep, self.sep2]
            ret = system_prompt
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ':\n' + message + seps[i % 2]
                    if i % 2 == 1:
                        ret += '\n\n'
                else:
                    ret += role + ':\n'
            return ret
        elif self.sep_style == SeparatorStyle.PHOENIX:
            ret = system_prompt
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + '<s>' + message + '</s>'
                else:
                    ret += role + ': ' + '<s>'
            return ret
        elif self.sep_style == SeparatorStyle.ROBIN:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ':\n' + message + self.sep
                else:
                    ret += role + ':\n'
            return ret
        elif self.sep_style == SeparatorStyle.FALCON_CHAT:
            ret = ''
            if self.system_message:
                ret += system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ': ' + message + self.sep
                else:
                    ret += role + ':'

            return ret
        elif self.sep_style == SeparatorStyle.INTERNVL_ZH:
            seps = [self.sep, self.sep2]
            ret = self.system_message + seps[0]
            for i, (role, message) in enumerate(self.messages):
                if message:
                    ret += role + ': ' + message + seps[i % 2]
                else:
                    ret += role + ':'
            return ret
        elif self.sep_style == SeparatorStyle.MPT:
            ret = system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    if isinstance(message, tuple):
                        message, _, _ = message
                    ret += role + message + self.sep
                else:
                    ret += role
            return ret
        else:
            raise ValueError(f'Invalid style: {self.sep_style}')

    def set_system_message(self, system_message: str):
        """Set the system message."""
        self.system_message = system_message

    def append_message(self, role: str, message: str):
        """Append a new message."""
        self.messages.append([role, message])

    def update_last_message(self, message: str):
        """Update the last output.
        The last message is typically set to be None when constructing the prompt,
        so we need to update it in-place after getting the response from a model.
        """
        self.messages[-1][1] = message

    def to_gradio_chatbot(self):
        """Convert the conversation to gradio chatbot format."""
        ret = []
        for i, (role, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                ret.append([msg, None])
            else:
                ret[-1][-1] = msg
        return ret

    def to_openai_api_messages(self):
        """Convert the conversation to OpenAI chat completion format."""
        ret = [{'role': 'system', 'content': self.system_message}]

        for i, (_, msg) in enumerate(self.messages[self.offset :]):
            if i % 2 == 0:
                ret.append({'role': 'user', 'content': msg})
            else:
                if msg is not None:
                    ret.append({'role': 'assistant', 'content': msg})
        return ret

    def copy(self):
        return Conversation(
            name=self.name,
            system_template=self.system_template,
            system_message=self.system_message,
            roles=self.roles,
            messages=[[x, y] for x, y in self.messages],
            offset=self.offset,
            sep_style=self.sep_style,
            sep=self.sep,
            sep2=self.sep2,
            stop_str=self.stop_str,
            stop_token_ids=self.stop_token_ids,
        )

    def dict(self):
        return {
            'template_name': self.name,
            'system_message': self.system_message,
            'roles': self.roles,
            'messages': self.messages,
            'offset': self.offset,
        }


# A global registry for all conversation templates
conv_templates: Dict[str, Conversation] = {}


def register_conv_template(template: Conversation, override: bool = False):
    """Register a new conversation template."""
    if not override:
        assert (
            template.name not in conv_templates
        ), f'{template.name} has been registered.'

    conv_templates[template.name] = template


def get_conv_template(name: str) -> Conversation:
    """Get a conversation template."""
    return conv_templates[name].copy()


def process_messages(messages):
    """Convert OpenAI-style chat messages into a vLLM prompt dict with optional images."""
    conv = Conversation(
        name='internvl2_5',
        system_template='<|im_start|>system\n{system_message}',
        system_message='',
        roles=('<|im_start|>user\n', '<|im_start|>assistant\n'),
        sep_style=SeparatorStyle.MPT,
        sep='<|im_end|>\n',
    )
    conv.messages = []
    # Map chat role names to the template's role tags so each turn opens with
    # <|im_start|>; unknown role names pass through unchanged.
    role_tags = {'user': conv.roles[0], 'assistant': conv.roles[1]}
    imgs = []
    for message in messages:
        role = role_tags.get(message['role'], message['role'])
        content = message['content']
        if isinstance(content, str):
            conv.append_message(role, content)
        elif isinstance(content, list):
            text = ''
            for item in content:
                if item['type'] == 'text':
                    text += item['text']
                elif item['type'] == 'image':
                    # One placeholder per image, in the order the images are collected.
                    text += '\n<IMG_CONTEXT>'
                    image = item['image']
                    if isinstance(image, str):
                        image = Image.open(image)  # load from a file path
                    imgs.append(image)
            conv.append_message(role, text)

    # An empty assistant turn leaves the prompt open for generation.
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    mm_data = {}
    if imgs:
        mm_data['image'] = imgs

    return {'prompt': prompt, 'multi_modal_data': mm_data}


if __name__ == '__main__':
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-I-8B")
    llm = LLM(
        model="lingshu-medical-mllm/Lingshu-I-8B",
        limit_mm_per_prompt={"image": 4},  # at most 4 images per prompt
        tensor_parallel_size=2,  # shards the model across 2 GPUs
        enforce_eager=True,
        trust_remote_code=True,
        gpu_memory_utilization=0.7,
    )
    sampling_params = SamplingParams(
        temperature=0.7,
        top_p=1,
        repetition_penalty=1,
        max_tokens=1024,
        stop_token_ids=[],
    )

    text = "What does the image show?"
    image_path = "example.jpg"

    message = [
        {
            "role":"user",
            "content":[
                {"type":"image","image":image_path},
                {"type":"text","text":text}
                ]
        }
    ]
    llm_inputs = process_messages(message)
    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
    print(outputs[0].outputs[0].text)
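
The example above passes a single image. For prompts with several images, the message can be built with a small helper like the one sketched below. `build_user_message` and its `max_images` parameter are hypothetical names introduced here for illustration; the default of 4 mirrors the `limit_mm_per_prompt` value in the example and should be changed if that setting changes.

```python
from typing import Dict, List


def build_user_message(image_paths: List[str], question: str, max_images: int = 4) -> Dict:
    """Build one user turn in the message format consumed by process_messages.

    Hypothetical helper: max_images mirrors limit_mm_per_prompt={"image": 4}
    from the example above and is an assumption, not a model constraint.
    """
    if len(image_paths) > max_images:
        raise ValueError(
            f"got {len(image_paths)} images, but at most {max_images} are allowed per prompt"
        )
    # Images first, then the question, matching the order used in the example.
    content = [{"type": "image", "image": path} for path in image_paths]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


message = [build_user_message(["scan_a.jpg", "scan_b.jpg"], "Compare the two scans.")]
print(message)
```

The resulting `message` list can be passed to `process_messages` in place of the hand-written one. Checking the image count up front fails early with a clear error instead of surfacing the limit at generation time.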

Citation

If you find our project useful, we hope you would kindly star our repo and cite our work as follows:

@article{xu2025lingshu,
  title={Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning},
  author={Xu, Weiwen and Chan, Hou Pong and Li, Long and Aljunied, Mahani and Yuan, Ruifeng and Wang, Jianyu and Xiao, Chenghao and Chen, Guizhen and Liu, Chaoqun and Li, Zhaodonghui and others},
  journal={arXiv preprint arXiv:2506.07044},
  year={2025}
}

Model size: 8B params (Safetensors), tensor type BF16

Paper: Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning (arXiv:2506.07044, published Jun 8, 2025)