Model Details
Model Description
This model is a PEFT adapter for binary sequence classification built on top of the base model JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI. It is designed to classify source code or code-like input into one of two classes:
- BENIGN
- MALWARE
Uses
This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:
- MALWARE
- BENIGN
Potential direct uses include:
- automated triage of code samples
- code security analysis pipelines
- research experiments on malicious vs. benign code classification
- assisting analysts in prioritizing suspicious samples
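For the triage and prioritization use cases above, the ranking logic can be kept separate from the model itself. The sketch below assumes a `score_fn` callable that returns an estimated P(MALWARE) for a snippet; `triage` and the keyword-based `stub_score` are hypothetical illustrations, and a real pipeline would instead wrap the inference code shown under "How to Get Started with the Model".

```python
def triage(samples, score_fn):
    """Rank code samples by estimated P(MALWARE), highest first."""
    return sorted(samples, key=score_fn, reverse=True)

# stub scorer for illustration only; a real pipeline would call the
# classifier and return the softmax probability of the MALWARE class
def stub_score(code):
    return 1.0 if "CreateRemoteThread" in code else 0.1

samples = ["print('hello')", "CreateRemoteThread(h, ...)"]
ranked = triage(samples, stub_score)
print(ranked[0])  # the suspicious sample comes first
```

Sorting on the raw probability keeps the output actionable for analysts: the queue is ordered by how confident the classifier is, not just by the hard BENIGN/MALWARE label.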
Citation
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
Results
Limitations
The model focuses on patterns related to Windows PE files.
How to Get Started with the Model
Use the code below to load the tokenizer, base model, and PEFT adapter for inference:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# tokenizer (always from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# base model with a two-class classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2
)
# required so the model can handle padded batches
base_model.config.pad_token_id = tokenizer.pad_token_id

# load the PEFT adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.to(device)
model.eval()

# input
input_single = """#include <iostream>
#include <string>
#include <cstdint>
... your code ...
"""

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt"
).to(device)

# inference
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=1)
pred = outputs.logits.argmax(dim=1).item()
label = "MALWARE" if pred == 1 else "BENIGN"
print("Prediction:", label)
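To classify several samples at once, the tokenizer also accepts a list of strings, and the same softmax/argmax step then applies row-wise to the logits. The logits below are made-up values purely to illustrate the label mapping; in practice they come from `model(**inputs).logits`.

```python
import torch

# hypothetical logits for a batch of three samples
# (class 0 = BENIGN, class 1 = MALWARE)
batch_logits = torch.tensor([[2.0, -1.0],
                             [0.1, 0.3],
                             [-2.0, 3.0]])

batch_probs = torch.softmax(batch_logits, dim=1)  # per-sample class probabilities
batch_preds = batch_logits.argmax(dim=1)          # 0 or 1 per sample
batch_labels = ["MALWARE" if p == 1 else "BENIGN" for p in batch_preds.tolist()]
print(batch_labels)  # ['BENIGN', 'MALWARE', 'MALWARE']
```

Keeping `batch_probs` alongside the hard labels is useful for the triage use case, since samples can then be ranked by the model's confidence rather than by label alone.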
