Text Classification
Transformers
Safetensors
English
llama
text-generation
content-moderation
safety
text-embeddings-inference
VISION-1: Content Safety Analysis Model
VISION-1 is a fine-tuned version of Llama 3.1 8B Instruct, specialized for content safety analysis and moderation. The model is trained to identify and analyze potential safety concerns in text content, including scams, fraud, harmful content, and inappropriate material.
Model Details
- Base Model: Llama 3.1 8B Instruct
- Training Data: Specialized safety and content moderation dataset
- Model Type: Decoder-only transformer
- Parameters: 8 billion
- Training Infrastructure: 2x NVIDIA H200 SXM GPUs
- License: Same as base model (Llama 3.1 Community License)
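As a rough illustration of what the 8 billion parameter figure means for deployment, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It uses the rounded parameter count from the list above and ignores activations, the KV cache, and framework overhead, so treat the numbers as a lower bound rather than a sizing guide. The helper name `weight_memory_gb` is made up for this example.

```python
# Bytes used to store a single parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Rounded parameter count taken from the model details list (assumption: exactly 8e9).
NUM_PARAMS = 8e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

At 16-bit precision this comes to about 16 GB for the weights alone, which is why half-precision or quantized loading is the usual choice on a single GPU.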
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model and tokenizer (the tokenizer is taken from the base model)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("UnionStreet/VISION-1")
# Format prompt
prompt = "Analyze the following content for safety concerns: 'Click here to win a free iPhone! Just enter your credit card details.'"
formatted_prompt = f"<s>[INST] {prompt} [/INST]"
# Tokenize (a single prompt needs no padding)
inputs = tokenizer(formatted_prompt, return_tensors="pt")
# Generate response
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
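When analyzing many pieces of content, it can help to factor the prompt formatting into small helpers. The sketch below reuses the instruction wording and the `[INST]` wrapper from the usage snippet above. The names `build_prompt` and `completion_ids` are hypothetical and not part of any library. It runs without loading the model, since it only handles strings and sequences of token ids.

```python
# Instruction wording copied from the usage snippet.
INSTRUCTION = "Analyze the following content for safety concerns:"


def build_prompt(content: str) -> str:
    """Wrap a piece of content in the instruction format used in the usage snippet."""
    return f"<s>[INST] {INSTRUCTION} '{content}' [/INST]"


def completion_ids(output_ids, prompt_length: int):
    """Drop the echoed prompt tokens and keep only the generated ones.

    Works on any sliceable sequence, including a 1-D tensor of token ids.
    """
    return output_ids[prompt_length:]


print(build_prompt("Click here to win a free iPhone!"))
```

With the objects from the usage snippet, the second helper would be called as `completion_ids(outputs[0], inputs["input_ids"].shape[1])` before decoding.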
Training Details
- Training Type: Fine-tuning
- Framework: PyTorch with DeepSpeed
- Training Data: Specialized dataset focused on content safety
- Hardware: 2x NVIDIA H100 SXM GPUs
- Training Duration: ~4 epochs
Intended Use
- Content moderation
- Safety analysis
- Fraud detection
- Harmful content identification
Limitations
- Model outputs should be used as suggestions, not definitive judgments
- May have biases from training data
- Should be used as part of a broader content moderation strategy
- Performance may vary based on content type and context
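One way to act on these limitations is to treat the model's text as an advisory signal that routes content to a person instead of triggering an automatic action. The sketch below is a deliberately naive illustration: the `Suggestion` type, the `triage` and `next_step` functions, the keyword list, and the sample analysis strings are all assumptions made for this example, and the card does not specify the model's output format. A keyword check like this will misfire on negated phrases such as "not a scam", which is one more reason to keep a reviewer in the loop.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    content: str    # the text that was analyzed
    analysis: str   # raw model output, treated as advisory only
    flagged: bool   # True when the analysis mentions a concern


def triage(content: str, analysis: str,
           flag_terms=("scam", "fraud", "harmful", "inappropriate")) -> Suggestion:
    """Flag content when the model's analysis mentions any term of concern."""
    lowered = analysis.lower()
    flagged = any(term in lowered for term in flag_terms)
    return Suggestion(content=content, analysis=analysis, flagged=flagged)


def next_step(suggestion: Suggestion) -> str:
    """Flagged content goes to a human reviewer; nothing is removed automatically."""
    return "human_review" if suggestion.flagged else "allow"


# Hypothetical analysis text, written by hand for illustration.
example = triage("Win a free iPhone!", "This message resembles a common scam.")
print(next_step(example))
```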
Ethical Considerations
- Model should be used responsibly for content moderation
- Human oversight recommended for critical decisions
- Consider privacy implications when analyzing user content
- Regular evaluation of model outputs for bias
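On the privacy point, one option is to mask obvious personal data before the text reaches the model or any log. The sketch below uses two illustrative regular expressions for email addresses and card-like digit runs. The patterns and the `redact` helper are assumptions for this example and are far from exhaustive, so they would not replace a proper PII detection step.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = CARD_RE.sub("[CARD]", text)
    return text


print(redact("Email bob@example.com or pay with 4111 1111 1111 1111 now"))
```

Redacting before analysis keeps the wording of a message intact for safety classification while keeping identifiers out of prompts and stored outputs.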