# Multilingual Indic Profanity Detector (XLM-RoBERTa)
This is a text classification model fine-tuned on xlm-roberta-base to detect profanity in multilingual Indic text, with a strong focus on Malayalam. It is designed to be used as a content moderation tool or an LLM guardrail.
The model classifies text into two categories: safe and not safe.
## Model Details
- Base Model: `xlm-roberta-base`
- Dataset: `mangalathkedar/multilingual-indic-profane` (see the loading sketch below this list)
- Task: Binary Text Classification (Profanity Detection)
- Training Framework: Hugging Face Transformers with PyTorch
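For context, the sketch below shows how the base model and dataset listed above might be loaded for fine-tuning. It assumes the dataset exposes `text` and `label` columns and that label `0` corresponds to `safe`; both are assumptions about the schema rather than facts stated in this card.

```python
# Minimal loading sketch. Assumption: the dataset has "text"/"label" columns
# and label 0 means "safe"; adjust if the actual schema differs.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

dataset = load_dataset("mangalathkedar/multilingual-indic-profane")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=2,
    id2label={0: "safe", 1: "not safe"},
    label2id={"safe": 0, "not safe": 1},
)

def tokenize(batch):
    # Truncate long texts to the model's maximum sequence length.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
```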
## Key Features
- Multilingual: Built on XLM-RoBERTa, capable of understanding multiple languages, especially those in the Indic family.
- Handles Class Imbalance: Trained using a custom `WeightedTrainer` with class weights to counteract the imbalance between 'safe' and 'not safe' samples in the training data, improving recall for the minority class (see the sketch after this list).
- Optimized for Production: Trained with mixed-precision (`fp16`) for faster inference and a smaller memory footprint.
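The class-imbalance handling above is typically implemented by overriding `compute_loss` in a `Trainer` subclass. The sketch below, which builds on the loading sketch in Model Details, shows one way a `WeightedTrainer` could look; the actual class weights and hyperparameters used for this model are not published, so the values here are placeholders.

```python
# A minimal sketch of a weighted-loss Trainer. The class weights and
# hyperparameters below are placeholders, not the published training config.
import torch
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

class WeightedTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Weighted cross-entropy up-weights the minority class.
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

training_args = TrainingArguments(
    output_dir="indic-profanity-detector",
    fp16=True,                       # mixed-precision training (requires a CUDA GPU)
    num_train_epochs=3,              # placeholder
    per_device_train_batch_size=16,  # placeholder
)

trainer = WeightedTrainer(
    model=model,                             # from the loading sketch above
    args=training_args,
    train_dataset=tokenized["train"],        # assumes a "train" split
    data_collator=DataCollatorWithPadding(tokenizer),
    class_weights=torch.tensor([1.0, 1.0]),  # placeholder; derive from class frequencies
)
trainer.train()
```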
## Performance
The model was evaluated on a held-out test set (15% of the original data). The F1 score, precision, and recall below are reported for the `not safe` class:
| Metric | Score |
|---|---|
| Accuracy | 0.8836 |
| F1 Score | 0.8918 |
| Precision | 0.8790 |
| Recall | 0.9050 |
### Detailed Classification Report
The report below shows the precision, recall, and F1-score for each class on the test set.
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| safe | 0.8893 | 0.8595 | 0.8741 | 299 |
| not safe | 0.8790 | 0.9050 | 0.8918 | 337 |
| accuracy | | | 0.8836 | 636 |
| macro avg | 0.8841 | 0.8823 | 0.8830 | 636 |
| weighted avg | 0.8838 | 0.8836 | 0.8835 | 636 |
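A report like this can be reproduced with scikit-learn once predictions have been collected on the held-out split. The sketch below assumes the dataset ships a single `train` split with `text`/`label` columns and that label `1` means `not safe`; the split seed behind the published numbers is not specified, so the one used here is a placeholder.

```python
# Sketch for reproducing a per-class report; the split seed is a placeholder.
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

ds = load_dataset("mangalathkedar/multilingual-indic-profane", split="train")
splits = ds.train_test_split(test_size=0.15, seed=42)  # placeholder seed
test = splits["test"]

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/indic-profanity-detector-xlm-roberta",
)

pred_labels = [p["label"] for p in classifier(test["text"], truncation=True)]
true_labels = ["not safe" if y == 1 else "safe" for y in test["label"]]
print(classification_report(true_labels, pred_labels))
```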
### Confusion Matrix
The confusion matrix provides a detailed look at the model's predictions versus the true labels.
| | Predicted: Safe | Predicted: Not Safe |
|---|---|---|
| Actual: Safe | 257 (TN) | 42 (FP) |
| Actual: Not Safe | 32 (FN) | 305 (TP) |
- True Negatives (TN): 257 texts were correctly identified as `safe`.
- False Positives (FP): 42 `safe` texts were incorrectly flagged as `not safe`.
- False Negatives (FN): 32 `not safe` texts were missed and incorrectly classified as `safe`.
- True Positives (TP): 305 texts were correctly identified as `not safe`.
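The same breakdown can be computed with scikit-learn's `confusion_matrix`; the labels below are illustrative stand-ins for the true and predicted labels from the evaluation sketch above.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels; in practice reuse the true/predicted labels from the
# evaluation sketch in the previous section.
true_labels = ["safe", "safe", "not safe", "not safe"]
pred_labels = ["safe", "not safe", "not safe", "not safe"]

cm = confusion_matrix(true_labels, pred_labels, labels=["safe", "not safe"])
# Rows are actual classes, columns are predicted classes:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
print(cm)
```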
## How to Use
You can use this model directly with the `transformers` library's `pipeline` API:
```python
from transformers import pipeline

# Model name on the Hub
hub_model_name = "mangalathkedar/indic-profanity-detector-xlm-roberta"

# Load the model from the Hub
classifier = pipeline("text-classification", model=hub_model_name)

# --- Test Cases ---

# Example 1 (Safe Malayalam)
text_safe_ml = "നല്ല ദിവസം"  # "Good day"
print(classifier(text_safe_ml))
# Expected output: [{'label': 'safe', 'score': ...}]

# Example 2 (Not Safe Malayalam)
text_profane_ml = "നീ ഒരു മൈരൻ ആണ്"  # Profanity
print(classifier(text_profane_ml))
# Expected output: [{'label': 'not safe', 'score': ...}]

# Example 3 (Safe English)
text_safe_en = "Have a wonderful afternoon!"
print(classifier(text_safe_en))
# Expected output: [{'label': 'safe', 'score': ...}]
```
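Since the model is intended as an LLM guardrail, a common pattern is to screen user input before it reaches a downstream model. The helper below is a minimal sketch; the `is_safe` name and the 0.5 score threshold are illustrative choices, not part of the released model.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/indic-profanity-detector-xlm-roberta",
)

def is_safe(text: str, threshold: float = 0.5) -> bool:
    """Return False if the classifier flags the text as 'not safe' above the threshold."""
    result = classifier(text, truncation=True)[0]
    return not (result["label"] == "not safe" and result["score"] >= threshold)

user_message = "നല്ല ദിവസം"
if is_safe(user_message):
    pass  # forward the message to the LLM
else:
    print("Message blocked by the profanity guardrail.")
```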