---
license: mit
datasets:
- BothBosu/scam-dialogue
- BothBosu/Scammer-Conversation
- BothBosu/youtube-scam-conversations
- BothBosu/multi-agent-scam-conversation
- BothBosu/single-agent-scam-conversations
- an19352/scam-baiting-conversations
- scambaitermailbox/scambaiting_dataset
language:
- en
metrics:
- bertscore
- perplexity
- f1
- rouge
- distinct-n
- dialogrpt
base_model:
- meta-llama/LlamaGuard-7b
- meta-llama/Meta-Llama-Guard-2-8B
- meta-llama/Llama-Guard-3-8B
- OpenSafetyLab/MD-Judge-v0.1
pipeline_tag: text-generation
library_name: transformers
---

# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**.  
The models are trained and evaluated as part of the paper:  
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)**  

---

## Model Details

- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale  
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)  
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam  
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)  
- **Languages:** English  
- **License:** MIT  
- **Finetuned from:** LlamaGuard family & MD-Judge  

### Model Sources
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)  
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)  
- **Paper:** [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)  

---

## Uses

### Direct Use
- Real-time scam classification (scam vs. non-scam conversations)  
- Conversational **scam-baiting** to waste scammer time safely  
- **PII risk scoring** to filter unsafe outputs  

### Downstream Use
- Integration into messaging platforms for scam prevention  
- Benchmarks for **AI safety alignment** in adversarial contexts  
- Research in **federated privacy-preserving LLMs**  

### Out-of-Scope Use
- Should **not** be used as a replacement for law enforcement tools  
- Should **not** be deployed without safety filters and human-in-the-loop monitoring  
- Not intended for **financial or medical decision-making**  

---

## Bias, Risks, and Limitations

- Models may **over-engage with scammers** in rare cases  
- Possible **false positives** in benign conversations  
- Cultural/linguistic bias: trained primarily on **English data**  
- Risk of **hallucination** when generating long responses  

### Recommendations
- Always deploy with **safety thresholds (δ, θ1, θ2)**  
- Use in **controlled environments** first (research, simulations)  
- Extend to **multilingual settings** before real-world deployment  
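
This card does not define the thresholds δ, θ1, and θ2, so the following is only a minimal sketch of how such gating might look. It assumes δ is the scam score at which baiting starts and that θ1 and θ2 cap a PII-risk score and a harm score on each generated reply. The `Thresholds` class, the `gate_reply` function, and all score names are hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # All three meanings are assumptions made for this sketch.
    delta: float   # assumed: minimum scam score required before baiting
    theta1: float  # assumed: maximum tolerated PII risk of a reply
    theta2: float  # assumed: maximum tolerated harm score of a reply


def gate_reply(
    scam_score: float,
    pii_risk: float,
    harm_score: float,
    reply: str,
    t: Thresholds,
) -> Optional[str]:
    """Return the reply only if every check passes, otherwise None."""
    if scam_score < t.delta:
        # Not confident this is a scam: do not engage.
        return None
    if pii_risk > t.theta1 or harm_score > t.theta2:
        # Reply looks unsafe: withhold it (e.g. route to a human reviewer).
        return None
    return reply


if __name__ == "__main__":
    limits = Thresholds(delta=0.7, theta1=0.2, theta2=0.2)
    print(gate_reply(0.9, 0.05, 0.01, "Which bank did you say you are from?", limits))
```

Returning `None` rather than a rewritten reply keeps the decision auditable, which fits the human-in-the-loop monitoring recommended above.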

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
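# Replace <x> with 2 or 3, or drop "-<x>" to get "llama-guard-multi-task".
# Assumption: each variant is stored as a subfolder of the Hub repo
# "supreme-lab/ai-in-the-loop". A Hub repo id has only two segments
# (namespace/name), so the variant is passed via `subfolder`.
repo_id = "supreme-lab/ai-in-the-loop"
variant = "llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=variant)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=variant)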

inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)  
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcripts)  
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset  

### Training Procedure
- **Fine-tuning setup:**  
  - 3 epochs, batch size = 8  
  - LoRA rank = 8, α = 16  
  - Mixed precision (bf16)  
  - Optimizer: AdamW  
- **Federated Learning (FL):**  
  - 10 simulated clients, 30 rounds of FedAvg  
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)  
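
For readers unfamiliar with FedAvg, here is a minimal sketch of one aggregation round in plain Python. It is not the training code used for these models. The helper names (`fedavg`, `add_gaussian_noise`), the `clip_norm` parameter, and the toy parameter name `lora_A` are hypothetical, and the noise step is a simplified stand-in: a real differentially private setup would also clip each client update, and this card does not say where the noise is applied.

```python
import random
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        dim = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)
        ]
    return averaged


def add_gaussian_noise(
    weights: Weights,
    noise_multiplier: float,
    clip_norm: float = 1.0,  # hypothetical sensitivity bound
    seed: Optional[int] = None,
) -> Weights:
    """Add zero-mean Gaussian noise with std = noise_multiplier * clip_norm."""
    rng = random.Random(seed)
    sigma = noise_multiplier * clip_norm
    return {
        name: [v + rng.gauss(0.0, sigma) for v in values]
        for name, values in weights.items()
    }


if __name__ == "__main__":
    # Two toy clients; the second holds three times as much data.
    clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
    global_weights = fedavg(clients, client_sizes=[1, 3])
    print(global_weights)  # {'lora_A': [2.5, 3.5]}
    # 0.1 is one of the noise multipliers listed above.
    print(add_gaussian_noise(global_weights, noise_multiplier=0.1, seed=0))
```

In the setup described above, this aggregation would repeat for 30 rounds over 10 clients, with each client fine-tuning locally between rounds.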

---

## Evaluation

### Metrics
- **Classification:** F1, AUPRC, FPR, FNR  
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench  
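
As a rough illustration of how some of these metrics are defined, the sketch below computes F1, FPR, and FNR from binary labels, and Distinct-n from generated replies. It is not the paper's evaluation script. It assumes the label `1` means "scam" and uses lowercase whitespace tokenization for Distinct-n; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute F1, false-positive rate, and false-negative rate (1 = scam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return {"f1": f1, "fpr": fpr, "fnr": fnr}


def distinct_n(texts: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()  # assumed tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    print(classification_rates([1, 1, 0, 0], [1, 0, 1, 0]))
    print(distinct_n(["please send the code", "please send the money"], 2))
```

A higher Distinct-1/2 value indicates less repetitive replies, while FPR and FNR separate the cost of flagging benign conversations from the cost of missing scams.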

### Results
- **Classification:** BiGRU and BiLSTM baselines exceed 0.99 F1; RoBERTa is competitive  
- **Instruction-tuned LLMs:** MD-Judge is best overall (F1 = 0.89+); LlamaGuard3 is strong for moderation  
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, the **highest engagement (0.79)**, and **96% safety compliance** in human evaluation  

---

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs  
- **Training Time:** ~30 hrs across models  
- **Federated Setup:** 10 simulated clients, 30 rounds 

---

## Technical Specifications

- **Architecture:** Instruction-tuned transformer (decoder-only)  
- **Objective:** Multi-task (classification, risk scoring, safe generation)

---

## Citation

If you use these models, please cite our paper:

```bibtex
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  url={https://arxiv.org/abs/2509.05362},
  year={2025}
}
```

---

## Contact

- **Authors:** ihossain@miners.utep.edu, sai.puppala@siu.edu, stalukder@utep.edu  
- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)
- **Personal Web:** [https://ismail102.github.io/](https://ismail102.github.io/)

---