# Multilingual Indic Profanity Detector (XLM-RoBERTa)
This is a text classification model fine-tuned on xlm-roberta-base to detect profanity in multilingual Indic text, with a strong focus on Malayalam. It is designed to be used as a content moderation tool or an LLM guardrail.
The model classifies text into two categories: safe and not safe.
## Model Details
- Base Model: `xlm-roberta-base`
- Dataset: `mangalathkedar/multilingual-indic-profane` (see the loading sketch below this list)
- Task: Binary Text Classification (Profanity Detection)
- Training Framework: Hugging Face Transformers with PyTorch
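For context, the sketch below shows how the base model and dataset listed above might be loaded for fine-tuning. It assumes the dataset exposes `text` and `label` columns and that label `0` corresponds to `safe`; both are assumptions about the schema rather than facts stated in this card.

```python
# Minimal loading sketch. Assumption: the dataset has "text"/"label" columns
# and label 0 means "safe"; adjust if the actual schema differs.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

dataset = load_dataset("mangalathkedar/multilingual-indic-profane")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=2,
    id2label={0: "safe", 1: "not safe"},
    label2id={"safe": 0, "not safe": 1},
)

def tokenize(batch):
    # Truncate long texts to the model's maximum sequence length.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
```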
## Key Features
- Multilingual: Built on XLM-RoBERTa, capable of understanding multiple languages, especially those in the Indic family.
- Handles Class Imbalance: Trained using a custom `WeightedTrainer` with class weights to counteract the imbalance between 'safe' and 'not safe' samples in the training data, improving recall for the minority class (see the sketch after this list).
- Optimized for Production: Trained with mixed-precision (`fp16`) for faster inference and a smaller memory footprint.
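The class-imbalance handling above is typically implemented by overriding `compute_loss` in a `Trainer` subclass. The sketch below, which builds on the loading sketch in Model Details, shows one way a `WeightedTrainer` could look; the actual class weights and hyperparameters used for this model are not published, so the values here are placeholders.

```python
# A minimal sketch of a weighted-loss Trainer. The class weights and
# hyperparameters below are placeholders, not the published training config.
import torch
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

class WeightedTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Weighted cross-entropy up-weights the minority class.
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

training_args = TrainingArguments(
    output_dir="indic-profanity-detector",
    fp16=True,                       # mixed-precision training (requires a CUDA GPU)
    num_train_epochs=3,              # placeholder
    per_device_train_batch_size=16,  # placeholder
)

trainer = WeightedTrainer(
    model=model,                             # from the loading sketch above
    args=training_args,
    train_dataset=tokenized["train"],        # assumes a "train" split
    data_collator=DataCollatorWithPadding(tokenizer),
    class_weights=torch.tensor([1.0, 1.0]),  # placeholder; derive from class frequencies
)
trainer.train()
```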
## Performance
The model was evaluated on a held-out test set (15% of the original data). The F1 score, precision, and recall below are reported for the `not safe` class:
| Metric | Score |
|---|---|
| Accuracy | 0.8836 |
| F1 Score | 0.8918 |
| Precision | 0.8790 |
| Recall | 0.9050 |
### Detailed Classification Report
The report below shows the precision, recall, and F1-score for each class on the test set.
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| safe | 0.8893 | 0.8595 | 0.8741 | 299 |
| not safe | 0.8790 | 0.9050 | 0.8918 | 337 |
| accuracy | | | 0.8836 | 636 |
| macro avg | 0.8841 | 0.8823 | 0.8830 | 636 |
| weighted avg | 0.8838 | 0.8836 | 0.8835 | 636 |
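A report like this can be reproduced with scikit-learn once predictions have been collected on the held-out split. The sketch below assumes the dataset ships a single `train` split with `text`/`label` columns and that label `1` means `not safe`; the split seed behind the published numbers is not specified, so the one used here is a placeholder.

```python
# Sketch for reproducing a per-class report; the split seed is a placeholder.
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

ds = load_dataset("mangalathkedar/multilingual-indic-profane", split="train")
splits = ds.train_test_split(test_size=0.15, seed=42)  # placeholder seed
test = splits["test"]

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/indic-profanity-detector-xlm-roberta",
)

pred_labels = [p["label"] for p in classifier(test["text"], truncation=True)]
true_labels = ["not safe" if y == 1 else "safe" for y in test["label"]]
print(classification_report(true_labels, pred_labels))
```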
### Confusion Matrix
The confusion matrix provides a detailed look at the model's predictions versus the true labels.
| | Predicted: Safe | Predicted: Not Safe |
|---|---|---|
| Actual: Safe | 257 (TN) | 42 (FP) |
| Actual: Not Safe | 32 (FN) | 305 (TP) |
- True Negatives (TN): 257 texts were correctly identified as `safe`.
- False Positives (FP): 42 `safe` texts were incorrectly flagged as `not safe`.
- False Negatives (FN): 32 `not safe` texts were missed and incorrectly classified as `safe`.
- True Positives (TP): 305 texts were correctly identified as `not safe`.
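The same breakdown can be computed with scikit-learn's `confusion_matrix`; the labels below are illustrative stand-ins for the true and predicted labels from the evaluation sketch above.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels; in practice reuse the true/predicted labels from the
# evaluation sketch in the previous section.
true_labels = ["safe", "safe", "not safe", "not safe"]
pred_labels = ["safe", "not safe", "not safe", "not safe"]

cm = confusion_matrix(true_labels, pred_labels, labels=["safe", "not safe"])
# Rows are actual classes, columns are predicted classes:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
print(cm)
```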
## How to Use
You can use this model directly with the `transformers` library's `pipeline` API:
```python
from transformers import pipeline

# Model name on the Hub
hub_model_name = "mangalathkedar/indic-profanity-detector-xlm-roberta"

# Load the model from the Hub
classifier = pipeline("text-classification", model=hub_model_name)

# --- Test Cases ---

# Example 1 (Safe Malayalam)
text_safe_ml = "നല്ല ദിവസം"  # "Good day"
print(classifier(text_safe_ml))
# Expected output: [{'label': 'safe', 'score': ...}]

# Example 2 (Not Safe Malayalam)
text_profane_ml = "നീ ഒരു മൈരൻ ആണ്"  # Profanity
print(classifier(text_profane_ml))
# Expected output: [{'label': 'not safe', 'score': ...}]

# Example 3 (Safe English)
text_safe_en = "Have a wonderful afternoon!"
print(classifier(text_safe_en))
# Expected output: [{'label': 'safe', 'score': ...}]
```
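Since the model is intended as an LLM guardrail, a common pattern is to screen user input before it reaches a downstream model. The helper below is a minimal sketch; the `is_safe` name and the 0.5 score threshold are illustrative choices, not part of the released model.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/indic-profanity-detector-xlm-roberta",
)

def is_safe(text: str, threshold: float = 0.5) -> bool:
    """Return False if the classifier flags the text as 'not safe' above the threshold."""
    result = classifier(text, truncation=True)[0]
    return not (result["label"] == "not safe" and result["score"] >= threshold)

user_message = "നല്ല ദിവസം"
if is_safe(user_message):
    pass  # forward the message to the LLM
else:
    print("Message blocked by the profanity guardrail.")
```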