---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- content-moderation
- multi-label-classification
- text-classification
- safety
- moderation
- bert
- nlp
- transformers
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: turkish-safety
  results:
  - task:
      type: text-classification
      name: Multi-label Content Safety Classification
    metrics:
    - type: f1
      value: 0.9165
      name: Macro F1
    - type: mcc
      value: 0.9045
      name: Matthews Correlation Coefficient
---

# Turkish Safety - Content Moderation Classifier v5.0

**Multi-label classification model for Turkish content moderation**

*Developed by SiriusAI Tech Brain Team*

---

## Mission

> **Empowering digital platforms with AI-driven content safety solutions.**

Turkish Safety is an NLP model that analyzes Turkish content in real time and detects harmful content across 7 categories. It provides content moderation for social media platforms, messaging applications, in-game chats, and community forums.

### Why This Model Matters

- **7 Risk Categories**: Detects SAFE, GROOMING, SEXUAL, OFFENSIVE, BULLYING, SELF_HARM, and THREAT
- **Turkish-First Design**: Optimized for Turkish linguistics and cultural context using BERTurk
- **Production-Ready**: <50 ms inference, battle-tested architecture, enterprise-grade reliability
- **Multi-Label Intelligence**: Classification that recognizes content can belong to multiple categories at once
- **Expert Validation**: Curated training data with clear category boundaries and edge-case handling

---

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | BERT (Bidirectional Encoder Representations from Transformers) |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` (BERTurk) |
| **Task** | Multi-label Text Classification |
| **Language** | Turkish (tr) |
| **Categories** | 7 content safety labels |
| **Model Size** | 443 MB (FP32) |
| **Inference Time** | ~10-15 ms (GPU) / ~40-50 ms (CPU) |

---

## Performance Metrics

### Final Evaluation Results (Epoch 2)

| Metric | Score | Description |
|--------|-------|-------------|
| **Macro F1** | **0.9165** | Unweighted mean of per-category F1 scores |
| **MCC** | **0.9045** | Matthews Correlation Coefficient (correlation-based metric, robust to class imbalance) |
| **Eval Loss** | 0.0268 | Focal loss on the validation set |

### Training Progress

| Epoch | Train Loss | Eval Loss | Macro F1 | MCC |
|-------|------------|-----------|----------|-----|
| 1 | 0.038 | 0.0282 | 0.9085 | 0.8957 |
| **2** | **0.038** | **0.0268** | **0.9165** | **0.9045** |

### Validation Test Results (86.4% Accuracy)

Manual spot check on 22 hand-picked edge cases (19/22 correct):

| Category | Test Cases | Correct | Notes |
|----------|-----------|---------|-------|
| **SAFE** | 5 | 4 | One false positive (compliment misclassified as OFFENSIVE) |
| **GROOMING** | 4 | 2 | Boundary cases confused with SEXUAL/THREAT |
| **SEXUAL** | 3 | 3 | Perfect detection |
| **OFFENSIVE** | 3 | 3 | Perfect detection |
| **THREAT** | 3 | 3 | Perfect detection |
| **SELF_HARM** | 2 | 2 | Perfect detection |
| **BULLYING** | 2 | 2 | Perfect detection |
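The reported Macro F1 and MCC can be computed along the lines below. This is a minimal sketch, assuming `y_true` and `probs` are `(n_samples, 7)` arrays of gold labels and sigmoid outputs; the 0.5 decision threshold and the flattened-matrix MCC convention are assumptions, since the card does not document how these metrics were aggregated for multi-label outputs.

```python
# Hedged sketch: Macro F1 / MCC for multi-label predictions.
# `y_true` (gold 0/1 labels) and `probs` (sigmoid outputs) are assumed
# to be (n_samples, 7) numpy arrays; the 0.5 threshold and the
# flattened-MCC convention are assumptions, not documented choices.
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

def evaluate(y_true: np.ndarray, probs: np.ndarray, threshold: float = 0.5) -> dict:
    y_pred = (probs >= threshold).astype(int)
    macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
    mcc = matthews_corrcoef(y_true.ravel(), y_pred.ravel())  # MCC on flattened binary labels
    return {"macro_f1": macro_f1, "mcc": mcc}
```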
---

## Dataset

### Dataset Statistics

| Split | Samples | Purpose |
|-------|---------|---------|
| **Train** | 68,128 | Model training |
| **Test** | 17,033 | Model evaluation |
| **Total** | 85,161 | Complete dataset |

### Category Distribution (Full Dataset)

| Category | Samples | Percentage | Description |
|----------|---------|------------|-------------|
| **SAFE** | 25,488 | 29.9% | Benign, normal communication |
| **SELF_HARM** | 14,234 | 16.7% | Self-harm ideation, suicidal thoughts |
| **BULLYING** | 13,259 | 15.6% | Harassment, exclusion, cyberbullying |
| **THREAT** | 9,193 | 10.8% | Physical threats, violence, blackmail |
| **SEXUAL** | 8,642 | 10.1% | Sexual content, body comments |
| **GROOMING** | 7,517 | 8.8% | Manipulation, trust-building tactics |
| **OFFENSIVE** | 6,849 | 8.0% | Profanity, slurs, offensive language |

### Subcategory Breakdown

| Category | Subcategories |
|----------|---------------|
| **SAFE** | greetings (1,958), farewells (1,485), wellbeing_questions (2,900), daily_conversation (2,435), weather_talk (1,445), food_drink (1,481), normal_questions (1,861), school_talk (1,961), family_talk (1,487), hobbies_games (1,455), sports_talk (1,000), tech_internet (994), genuine_compliments (1,000), encouragement (1,000), appreciation (1,000), apology_understanding (998), help_cooperation (1,000) |
| **GROOMING** | secrecy (953), isolation (729), trust_manipulation (792), meeting_private (701), gift_promise (565), age_questioning (688), private_communication (628), emotional_manipulation (654), normalization (655), excessive_flattery (559), testing_boundaries (583) |
| **THREAT** | physical_violence (1,307), weapon_threat (936), blackmail (1,168), family_threat (1,071), implicit_threat (906), revenge (947), death_threat (886), social_threat (930), stalking_threat (532), property_threat (500) |
| **OFFENSIVE** | insults (1,286), cursing_sik (1,535), cursing_am (1,398), cursing_ana_orospu (1,383), derogatory (849), mockery (398) |
| **SEXUAL** | explicit_content (1,085), sexual_body_focus (1,612), sexual_invitation (1,237), pornographic (1,060), sexual_questions (1,232), romantic_pressure (1,030), inappropriate_comments (856), sexual_fantasy (530) |
| **BULLYING** | exclusion (1,904), mockery_repeated (1,690), emotional_abuse (1,678), appearance_attack (1,490), public_humiliation (1,091), intimidation (979), cyberbullying (1,138), name_calling (1,178), spreading_rumors (1,000), academic_bullying (1,111) |
| **SELF_HARM** | hopelessness (1,923), giving_up (1,690), not_waking_up (1,435), suicide_ideation (1,413), self_harm_plan (1,532), burden_feeling (1,018), worthlessness (1,037), isolation_feeling (1,025), goodbye_messages (807), self_blame (894), depression_signs (1,452) |

### Data Generation Methodology

1. **Synthetic Generation**: LLM-based generation with expert-defined category boundaries
2. **Hard Negative Mining**: Difficult edge cases for boundary discrimination
3. **Quality Filtering**: Duplicate detection, minimum word count, forbidden-token filtering (see the sketch below)
4. **Parallel Processing**: 20 concurrent workers with a batch size of 50
5. **Pass Rate**: 97.5% average acceptance rate across all categories
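The exact filtering rules are not published. The sketch below only illustrates the three checks named in step 3; `MIN_WORDS` and `FORBIDDEN_TOKENS` are hypothetical placeholders, not the team's actual settings.

```python
# Illustrative sketch of step 3 (Quality Filtering). The real rule set,
# word-count threshold, and forbidden-token list are not published;
# MIN_WORDS and FORBIDDEN_TOKENS below are hypothetical placeholders.
MIN_WORDS = 3
FORBIDDEN_TOKENS = {"lorem", "<example>"}  # hypothetical list

def quality_filter(samples: list[str]) -> list[str]:
    seen, kept = set(), []
    for text in samples:
        normalized = " ".join(text.lower().split())
        if normalized in seen:                                  # duplicate detection
            continue
        if len(normalized.split()) < MIN_WORDS:                 # minimum word count
            continue
        if any(tok in normalized for tok in FORBIDDEN_TOKENS):  # forbidden tokens
            continue
        seen.add(normalized)
        kept.append(text)
    return kept
```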
---

## Label Definitions

The model classifies text into 7 categories that are not mutually exclusive:

| Label | ID | Description | Turkish Examples |
|-------|-----|-------------|------------------|
| `SAFE` | 0 | Benign, normal communication | "Bugün hava güzel", "Oyun oynayalım mı?" |
| `OFFENSIVE` | 1 | Profanity, slurs, offensive language | "Aptal mısın", "Salak herif" |
| `SELF_HARM` | 2 | Self-harm ideation, suicidal thoughts | "Ölmek istiyorum", "Kendimi kesmek istiyorum" |
| `GROOMING` | 3 | Manipulation, trust-building, isolation tactics | "Kimseye söyleme", "Sen özelsin", "Evime gel" |
| `BULLYING` | 4 | Harassment, exclusion, cyberbullying | "Kimse seninle oynamak istemiyor", "Çirkinsin" |
| `SEXUAL` | 5 | Sexual content, body comments, inappropriate questions | "Vücudun güzel", "Hiç öpüştün mü?", "Ne giyiyorsun?" |
| `THREAT` | 6 | Physical threats, violence, blackmail | "Seni döverim", "Fotoğrafını yayarım" |

### Important: Category Boundaries

**GROOMING vs SEXUAL Distinction:**

- **GROOMING**: Non-sexual manipulation tactics (trust-building, secrecy, gift promises, meeting requests)
- **SEXUAL**: Any body-related comments, physical compliments, sexual questions, explicit content

```
"Kimseye söyleme tamam mı?" → GROOMING (secrecy/isolation)
"Vücudun çok güzel"         → SEXUAL (body comment)
"Telefon alırım sana"       → GROOMING (gift promise)
"Dudakların çok güzel"      → SEXUAL (body-focused compliment)
"Gel evime yalnızım"        → GROOMING (meeting request/isolation)
"Hiç öpüştün mü?"           → SEXUAL (sexual experience question)
```

---

## Training Procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 64 tokens |
| **Batch Size** | 16 (effective 32 with gradient accumulation) |
| **Gradient Accumulation** | 2 steps |
| **Learning Rate** | 2e-5 (cosine schedule with restarts) |
| **Epochs** | 2 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Warmup Ratio** | 0.1 |
| **Loss Function** | Focal Loss (gamma=1.2) |
| **Label Smoothing** | 0.05 |
| **Problem Type** | Multi-label Classification |
| **Evaluation Strategy** | Per epoch |

### Training Environment

| Resource | Specification |
|----------|---------------|
| **Hardware** | Apple M1 Pro (MPS) |
| **Framework** | PyTorch 2.x + Transformers 4.37+ |
| **Training Time** | ~14 minutes (864 seconds) |
| **Throughput** | 157.8 samples/second |
| **Steps** | 4,258 total |

### Learning Rate Schedule

```
Peak LR:  2e-5 (after warmup)
Schedule: Cosine with restarts
Final LR: ~1.1e-8
```
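The training code itself is not published. The sketch below shows one standard way to combine a binary-cross-entropy focal loss (gamma=1.2) with label smoothing (0.05) for a multi-label head, consistent with the hyperparameter table above, but it should be read as an illustration rather than the team's implementation.

```python
# Hedged sketch of a multi-label focal loss matching the hyperparameters
# above (gamma=1.2, label smoothing=0.05). The actual training code is
# not published; this is one standard formulation.
import torch
import torch.nn.functional as F

def focal_bce_loss(logits: torch.Tensor, targets: torch.Tensor,
                   gamma: float = 1.2, smoothing: float = 0.05) -> torch.Tensor:
    # Label smoothing: pull hard 0/1 targets toward the midpoint
    targets = targets * (1.0 - smoothing) + 0.5 * smoothing
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model probability assigned to the (smoothed) target
    return ((1.0 - p_t) ** gamma * bce).mean()
```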
---

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "hayatiali/turkish-safety"
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-uncased")
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Label mapping (MUST match the model's id2label; see the sanity check below)
LABELS = ["SAFE", "OFFENSIVE", "SELF_HARM", "GROOMING", "BULLYING", "SEXUAL", "THREAT"]

def predict(text):
    # max_length=64 matches the training max sequence length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        outputs = model(**inputs)
    # Multi-label: use sigmoid (NOT softmax!)
    probs = torch.sigmoid(outputs.logits)[0].numpy()
    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Vücudun çok güzel"))        # → SEXUAL
print(predict("Kimseye söyleme tamam mı")) # → GROOMING
print(predict("Ölmek istiyorum"))          # → SELF_HARM
print(predict("Bugün hava güzel"))         # → SAFE
```

### Production Class

```python
class TurkishSafetyClassifier:
    LABELS = ["SAFE", "OFFENSIVE", "SELF_HARM", "GROOMING", "BULLYING", "SEXUAL", "THREAT"]

    def __init__(self, model_path="hayatiali/turkish-safety"):
        self.tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-uncased")
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = (
            "cuda" if torch.cuda.is_available()
            else "mps" if torch.backends.mps.is_available()
            else "cpu"
        )
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.sigmoid(logits)[0].cpu().numpy()
        scores = {label: float(prob) for label, prob in zip(self.LABELS, probs)}
        primary = max(scores, key=scores.get)
        return {
            "category": primary,
            "confidence": scores[primary],
            "scores": scores,
            "action": self._get_action(scores[primary], primary),
        }

    def _get_action(self, score: float, category: str) -> str:
        # Critical categories use lower (stricter) thresholds
        if category in ["GROOMING", "SEXUAL", "SELF_HARM", "THREAT"]:
            if score > 0.5:
                return "hard_block"
            if score > 0.3:
                return "soft_block"
            return "allow"
        # Default thresholds for the remaining categories
        if score > 0.75:
            return "hard_block"
        if score > 0.60:
            return "soft_block"
        if score > 0.45:
            return "flag"
        if score > 0.30:
            return "allow_log"
        return "allow"
```

### Batch Inference

```python
# Reuses `tokenizer`, `model`, and `LABELS` from the Quick Start example
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_batch(texts: list, batch_size: int = 32) -> list:
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           max_length=64, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            probs = torch.sigmoid(model(**inputs).logits).cpu().numpy()
        for prob in probs:
            scores = {label: float(p) for label, p in zip(LABELS, prob)}
            results.append(scores)
    return results
```
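All of the snippets above hard-code the label order. Assuming the checkpoint ships a proper `id2label` mapping in its config (standard for `transformers` models, but worth confirming for this repo), the hard-coded list can be verified against it:

```python
# Sanity check: confirm the hard-coded LABELS list (from Quick Start)
# matches the checkpoint's own id2label mapping. Assumes the config
# defines real names rather than the default "LABEL_0"... placeholders.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("hayatiali/turkish-safety")
checkpoint_labels = [config.id2label[i] for i in range(config.num_labels)]
assert checkpoint_labels == LABELS, f"Label order mismatch: {checkpoint_labels}"
```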
---

## Limitations & Known Issues

### ⚠️ Evaluation Limitations

**Note**: Two separate evaluation sets exist:

- **Automated Test Set**: 17,033 samples from test.csv → Macro F1: 0.9165, MCC: 0.9045
- **Manual Edge Case Test**: 22 hand-picked samples → 86.4% accuracy (19/22 correct)

| Limitation | Details | Impact |
|------------|---------|--------|
| **Small Manual Test Set** | Edge-case validation on only 22 samples (86.4%) | Manual test is not statistically significant; the automated metrics (17K samples) are more reliable |
| **No Per-Class Metrics** | Only Macro F1 and MCC reported for the 17K test set | Individual category performance (e.g., SELF_HARM precision/recall vs. SAFE) cannot be assessed |
| **No Confusion Matrix** | Category confusion patterns not documented | Unclear which categories are most confused beyond the GROOMING/SEXUAL boundary |
| **No PR/ROC Curves** | Precision-recall and ROC analysis not performed | Optimal threshold selection methodology not documented |
| **No Calibration Analysis** | Model confidence calibration not tested | Unknown whether a 0.7 confidence truly corresponds to a 70% probability |

### ⚠️ Architectural Limitations

| Limitation | Details | Impact |
|------------|---------|--------|
| **Short Context Window** | Max sequence length: 64 tokens | Long messages may lose critical information; truncation may remove key context |
| **Single-Turn Only** | No conversation-history analysis | GROOMING patterns often emerge across multiple messages ("Kaç yaşındasın?", "Nerelisin?", "Fotoğraf atar mısın?" may each appear SAFE individually) |
| **No Temporal Patterns** | No escalation-detection capability | Cannot detect behavior changes over time; user history is not considered |
| **Static Analysis** | Each message is analyzed independently | Contextual red flags from message sequences are not captured |

### ⚠️ Data & Coverage Limitations

| Limitation | Details | Impact |
|------------|---------|--------|
| **Dialect/Slang Gaps** | Regional dialects and internet slang underrepresented | Performance may degrade on "napıon", "nbr", "slm", "mrb", and regional variations |
| **No Adversarial Testing** | Evasion techniques not systematically tested | Robustness unknown against "S 3 x" instead of "sex", character substitution, Unicode tricks |
| **Synthetic Data Bias** | Training data is synthetically generated (LLM-based, 97.5% acceptance rate) | May not capture real-world linguistic patterns; potential distribution shift |
| **Spelling-Error Tolerance** | Not explicitly tested | Common typos and intentional misspellings may bypass detection |

### ⚠️ Production Deployment Considerations

| Consideration | Details | Recommendation |
|---------------|---------|----------------|
| **Threshold Selection** | Current thresholds (0.3, 0.5, 0.75) are heuristic | Perform PR-curve analysis for your specific use case; adjust based on FP/FN tolerance |
| **Confidence Calibration** | Model may be over- or under-confident | Consider temperature scaling or Platt calibration before production |
| **Category Boundaries** | GROOMING ↔ SEXUAL boundary is a known issue | Review flagged content in these categories; implement human review for edge cases |
| **Real-Time Context** | No session-level analysis | Consider a sliding-window or conversation-aggregation layer |

### Not Suitable For

- Languages other than Turkish
- Adult content moderation (requires different domain expertise)
- Sole decision-making without human review in high-stakes situations
- Legal evidence or court proceedings
- Detection of sophisticated, multi-turn grooming attempts without an additional context layer
- Highly informal, slang-heavy communication without additional preprocessing

---

## Ethical Considerations

### Intended Use

- Social media content moderation
- Messaging platform safety filters
- Gaming chat moderation
- Community forum monitoring
- Parental control applications
- Research and educational purposes

### Risks

- **False Negatives**: May miss sophisticated grooming attempts
- **False Positives**: May flag benign content incorrectly
- **Automation Bias**: Over-reliance on model predictions

### Recommendations

1. **Human Oversight**: Always combine with human review for critical decisions
2. **Threshold Calibration**: Adjust thresholds based on your risk tolerance (see the sketch below)
3. **Monitoring**: Track performance metrics in production
4. **Regular Updates**: Retrain with new data periodically
5. **Transparency**: Inform users about automated moderation
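For recommendation 2 (and the PR-curve analysis suggested in the deployment table), per-label thresholds can be tuned on held-out data. A minimal sketch, assuming `y_true` and `probs` are `(n_samples, 7)` validation arrays and that a minimum-precision floor is the policy constraint; `min_precision` is a placeholder value:

```python
# Hedged sketch: pick a per-label threshold as the lowest score that
# still meets a precision floor on validation data. `y_true`/`probs`
# are assumed (n_samples, 7) arrays; `min_precision` is a policy choice.
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_thresholds(y_true, probs, labels, min_precision: float = 0.90) -> dict:
    thresholds = {}
    for i, label in enumerate(labels):
        prec, _, thr = precision_recall_curve(y_true[:, i], probs[:, i])
        ok = np.where(prec[:-1] >= min_precision)[0]  # prec[:-1] aligns with thr
        thresholds[label] = float(thr[ok[0]]) if ok.size else 0.5  # 0.5 fallback
    return thresholds
```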
---

## Technical Specifications

### Model Architecture

```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=7)
)

Total Parameters:     ~110M
Trainable Parameters: ~110M
```

### Input/Output

- **Input**: Turkish text (truncated to the 64-token training length)
- **Output**: 7-dimensional probability vector (sigmoid activated)
- **Tokenizer**: BERTurk WordPiece (32k vocab)

---

## Citation

```bibtex
@misc{turkish-safety-2025,
  title={Turkish Safety - Content Moderation Classifier},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/turkish-safety}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased, Macro F1: 0.9165}
}
```

---

## Model Card Authors

**SiriusAI Tech Brain Team**

## Contact

- **Issues**: [GitHub Issues](https://github.com/sirius-tedarik/Omni-Moderation-API/issues)
- **Repository**: [Omni-Moderation-API](https://github.com/sirius-tedarik/Omni-Moderation-API)

---

## Changelog

### v5.0 (Current)

- Major dataset expansion: 85,161 samples (68,128 train / 17,033 test)
- Improved metrics: **Macro F1: 0.9165**, **MCC: 0.9045**
- Hyperparameters tuned for the larger dataset (focal loss, cosine restarts)
- 73 subcategories across 7 main categories (as listed above)
- 86.4% validation accuracy on edge cases

### v4.0

- Initial production release
- 7-category multi-label content safety classification
- Macro F1: 0.9076, MCC: 0.8931
- Training on 30,596 samples
- Clear category boundary definitions (GROOMING vs SEXUAL)
- Optimized for real-time inference (<50 ms)

---

**License**: SiriusAI Tech Premium License v1.0

**Commercial Use**: Requires a Premium License. Contact: info@siriusaitech.com

**Free Use Allowed For**:

- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)

**Disclaimer**: This model is designed for content moderation and safety applications. Always deploy it with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.