```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "will702/indo-roBERTa-financial-sentiment-v2"
)
tokenizer = AutoTokenizer.from_pretrained(
    "will702/indo-roBERTa-financial-sentiment-v2"
)

text = "Rupiah melemah tajam terhadap dolar AS akibat sentimen global"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

model.eval()
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)

# Note: this model's label mapping is non-standard (0 = Positive).
label_map = {0: "Positive", 1: "Neutral", 2: "Negative"}
predicted = torch.argmax(outputs.logits, dim=1).item()
print(f"Sentiment: {label_map[predicted]}")
```
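If you also want a confidence score next to the predicted label, the logits can be passed through a softmax. A minimal stdlib sketch of that post-processing step (the three logit values below are illustrative, not real model output; in practice use `outputs.logits[0].tolist()`):

```python
import math

label_map = {0: "Positive", 1: "Neutral", 2: "Negative"}

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for one sentence (assumed values, not model output).
logits = [0.2, 0.5, 2.1]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"Sentiment: {label_map[pred]} ({probs[pred]:.2%})")
```

The same index that `torch.argmax` would return is recovered here, plus a probability you can threshold on before trusting the prediction.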
## Improvements Over Base Model

| Aspect | Base (v1) | This Model (v2) |
|---|---|---|
| Datasets | 1 (intanm only) | 3 (intanm + CNBC + SmSA) |
| Learning Rate | 2e-5 | 1e-5 (preserves prior knowledge) |
| Scheduler | Linear | Cosine with warmup |
| Primary Metric | Accuracy | F1 (weighted) |
| Early Stopping | patience=2 | patience=3 |
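For readers who want to reproduce a comparable setup, the v2 choices above map onto the `transformers` Trainer API roughly as follows. This is a sketch, not the authors' actual training script: the warmup ratio, output directory, and evaluation cadence are assumed values, and `metric_for_best_model="f1"` presumes a `compute_metrics` function that reports weighted F1 under the key `eval_f1`.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Hyperparameters from the comparison table; warmup_ratio is an assumed value.
args = TrainingArguments(
    output_dir="indo-roberta-fin-v2",
    learning_rate=1e-5,            # lower LR to preserve prior knowledge
    lr_scheduler_type="cosine",    # cosine schedule with warmup
    warmup_ratio=0.1,              # illustrative warmup fraction
    metric_for_best_model="f1",    # weighted F1 as the primary metric
    load_best_model_at_end=True,   # required for early stopping on best metric
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

# Stop after 3 evaluations without improvement, matching patience=3.
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
```

The lower learning rate plus cosine decay is a common recipe when fine-tuning an already fine-tuned checkpoint, since it nudges weights gently rather than overwriting the prior task knowledge.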
## Limitations

- Primarily trained on formal financial news; may underperform on very informal social-media text or slang.
- The label mapping is non-standard (0 = Positive), so downstream systems must account for it.
- Augmented data includes synthetic samples that may not perfectly reflect the real-world distribution.
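Because many downstream tools assume the opposite ordering (0 = Negative), a small remapping shim can normalize this model's indices before they leave your pipeline. A sketch, assuming the common 0 = Negative, 1 = Neutral, 2 = Positive target convention (that target is one usual choice, not something the model mandates):

```python
# This model's own (non-standard) class indices.
MODEL_LABELS = {0: "Positive", 1: "Neutral", 2: "Negative"}

# An assumed downstream convention: 0 = Negative ... 2 = Positive.
STANDARD_IDS = {"Negative": 0, "Neutral": 1, "Positive": 2}

def to_standard_id(model_idx: int) -> int:
    """Map this model's argmax index to the 0 = Negative convention."""
    return STANDARD_IDS[MODEL_LABELS[model_idx]]

# Model index 0 ("Positive") becomes standard index 2.
print(to_standard_id(0))
```

Doing the remap in one named function keeps the inversion visible and auditable, rather than scattering hard-coded index swaps through downstream code.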
## Citation

```bibtex
@misc{indo_roberta_fin_v2_2026,
  title        = {IndoRoBERTa Financial Sentiment v2},
  author       = {Gregorius Willson},
  howpublished = {\url{https://huggingface.co/will702/indo-roBERTa-financial-sentiment-v2}},
  year         = {2026},
  note         = {Fine-tuned from ihsan31415/indo-roBERTa-financial-sentiment with multi-source data and augmentation},
}
```