
Model Details

Model Description

This model is a PEFT adapter for binary sequence classification built on top of the base model JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI. It is designed to classify source code or code-like input into one of two classes:

  • BENIGN
  • MALWARE

Uses

This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:

  • MALWARE
  • BENIGN

Potential direct uses include:

  • automated triage of code samples
  • code security analysis pipelines
  • research experiments on malicious vs. benign code classification
  • assisting analysts in prioritizing suspicious samples
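For the triage use case, per-sample MALWARE probabilities can be used to rank samples so analysts see the most suspicious ones first. A minimal sketch, where `score` is any callable returning the MALWARE probability for a snippet (a hypothetical helper standing in for the inference code in the getting-started section, not part of this repo):

```python
from typing import Callable, List, Tuple

def triage(samples: List[str], score: Callable[[str], float],
           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return samples scoring at or above `threshold`, most suspicious first."""
    scored = [(sample, score(sample)) for sample in samples]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored if pair[1] >= threshold]
```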

Citation

@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}

Results

(Results figure not reproduced here; see the model repository.)

Limitations

The model focuses on patterns related to Windows PE files, so results may not transfer to code targeting other platforms or file formats.

How to Get Started with the Model

Use the code below to load the tokenizer, base model, and PEFT adapter for inference:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# tokenizer (always from base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# base model
base_model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)

# load adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)

model.to(device)
model.eval()

# input
input_single = """#include <iostream>
#include <string>
#include <cstdint>
... your code ...
"""

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt"
).to(device)

# inference
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = probs.argmax(dim=-1).item()

label = "MALWARE" if pred == 1 else "BENIGN"
confidence = probs[0, pred].item()

print(f"Prediction: {label} (confidence: {confidence:.3f})")
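The mapping from raw logits to a label (index 0 = BENIGN, index 1 = MALWARE, matching the check above) can be written out in plain Python to make the convention explicit; a minimal sketch for a single sample's logit pair:

```python
import math

def logits_to_prediction(logits):
    """Softmax a [benign, malware] logit pair and return (label, confidence)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = probs.index(max(probs))
    return ("MALWARE" if pred == 1 else "BENIGN", probs[pred])
```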
