Model Details
Model Description
This model is a PEFT adapter for binary sequence classification built on top of the base model JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI. It is designed to classify source code or code-like input into one of two classes:
- BENIGN
- MALWARE
Uses
This model is intended for binary classification of source code snippets, program text, and related code artifacts. A primary use case is malware detection, where the model predicts whether an input sample is more likely to be:
- MALWARE
- BENIGN
Potential direct uses include:
- automated triage of code samples
- code security analysis pipelines
- research experiments on malicious vs. benign code classification
- assisting analysts in prioritizing suspicious samples
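For the triage and prioritization use cases above, the ranking logic can be kept separate from the model itself. The sketch below assumes a `score_fn` callable that returns an estimated P(MALWARE) for a snippet; `triage` and the keyword-based `stub_score` are hypothetical illustrations, and a real pipeline would instead wrap the inference code shown under "How to Get Started with the Model".

```python
def triage(samples, score_fn):
    """Rank code samples by estimated P(MALWARE), highest first."""
    return sorted(samples, key=score_fn, reverse=True)

# stub scorer for illustration only; a real pipeline would call the
# classifier and return the softmax probability of the MALWARE class
def stub_score(code):
    return 1.0 if "CreateRemoteThread" in code else 0.1

samples = ["print('hello')", "CreateRemoteThread(h, ...)"]
ranked = triage(samples, stub_score)
print(ranked[0])  # the suspicious sample comes first
```

Sorting on the raw probability keeps the output actionable for analysts: the queue is ordered by how confident the classifier is, not just by the hard BENIGN/MALWARE label.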
Citation
@article{jelodar2025xgenq,
  title={XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security},
  author={Jelodar, Hamed and Meymani, Mohammad and Razavi-Far, Roozbeh and Ghorbani, Ali},
  journal={arXiv preprint arXiv:2510.19006},
  year={2025}
}
Results
Limitations
The model focuses on patterns related to Windows PE files.
How to Get Started with the Model
Use the code below to load the tokenizer, base model, and PEFT adapter for inference:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE_MODEL = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"
ADAPTER_PATH = "JeloH/xGenq-qwen2.5-coder-7b-BinaryCodeSecurity-Adapter"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# tokenizer (always from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# base model with a two-class classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2
)
# required so the model can handle padded batches
base_model.config.pad_token_id = tokenizer.pad_token_id

# load the PEFT adapter from Hugging Face
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.to(device)
model.eval()

# input
input_single = """#include <iostream>
#include <string>
#include <cstdint>
... your code ...
"""

inputs = tokenizer(
    input_single,
    truncation=True,
    padding=True,
    return_tensors="pt"
).to(device)

# inference
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=1)
pred = outputs.logits.argmax(dim=1).item()
label = "MALWARE" if pred == 1 else "BENIGN"
print("Prediction:", label)
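To classify several samples at once, the tokenizer also accepts a list of strings, and the same softmax/argmax step then applies row-wise to the logits. The logits below are made-up values purely to illustrate the label mapping; in practice they come from `model(**inputs).logits`.

```python
import torch

# hypothetical logits for a batch of three samples
# (class 0 = BENIGN, class 1 = MALWARE)
batch_logits = torch.tensor([[2.0, -1.0],
                             [0.1, 0.3],
                             [-2.0, 3.0]])

batch_probs = torch.softmax(batch_logits, dim=1)  # per-sample class probabilities
batch_preds = batch_logits.argmax(dim=1)          # 0 or 1 per sample
batch_labels = ["MALWARE" if p == 1 else "BENIGN" for p in batch_preds.tolist()]
print(batch_labels)  # ['BENIGN', 'MALWARE', 'MALWARE']
```

Keeping `batch_probs` alongside the hard labels is useful for the triage use case, since samples can then be ranked by the model's confidence rather than by label alone.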
