---
license: mit
datasets:
- BothBosu/scam-dialogue
- BothBosu/Scammer-Conversation
- BothBosu/youtube-scam-conversations
- BothBosu/multi-agent-scam-conversation
- BothBosu/single-agent-scam-conversations
- an19352/scam-baiting-conversations
- scambaitermailbox/scambaiting_dataset
language:
- en
metrics:
- bertscore
- perplexity
- f1
- rouge
- distinct-n
- dialogrpt
base_model:
- meta-llama/LlamaGuard-7b
- meta-llama/Meta-Llama-Guard-2-8B
- meta-llama/Llama-Guard-3-8B
- OpenSafetyLab/MD-Judge-v0.1
pipeline_tag: text-generation
library_name: transformers
---

# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**. The models were trained and evaluated as part of the paper:

**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)**

---

## Model Details

- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)
- **Languages:** English
- **License:** MIT
- **Finetuned from:** LlamaGuard family & MD-Judge

### Model Sources

- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)
- **Paper:** [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)

---

## Uses

### Direct Use

- Real-time scam classification (scam vs. non-scam conversations)
- Conversational **scam-baiting** to safely waste scammers' time
- **PII risk scoring** to filter unsafe outputs

### Downstream Use

- Integration into messaging platforms for scam prevention
- Benchmarks for **AI safety alignment** in adversarial contexts
- Research on **federated, privacy-preserving LLMs**

### Out-of-Scope Use

- Should **not** be used as a replacement for law enforcement tools
- Should **not** be deployed without safety filters and human-in-the-loop monitoring
- Not intended for **financial or medical decision-making**

---

## Bias, Risks, and Limitations

- Models may **over-engage with scammers** in rare cases
- Possible **false positives** on benign conversations
- Cultural/linguistic bias: trained primarily on **English data**
- Risk of **hallucination** when generating long responses

### Recommendations

- Always deploy with **safety thresholds (δ, θ1, θ2)**
- Use in **controlled environments** first (research, simulations)
- Extend to **multilingual settings** before real-world deployment

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick one variant: "llama-guard-multi-task", "llama-guard-2-multi-task",
# or "llama-guard-3-multi-task".
# Assumes each variant is stored in a subfolder of the Hub repository,
# since from_pretrained only accepts "namespace/repo" as a repo ID.
repo_id = "supreme-lab/ai-in-the-loop"
variant = "llama-guard-3-multi-task"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=variant)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=variant)

inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data

- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset

### Training Procedure

- **Fine-tuning setup:**
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- **Federated Learning (FL):**
  - 10 simulated clients, 30 rounds of FedAvg
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)

---

## Evaluation

### Metrics

- **Classification:** F1, AUPRC, FPR, FNR
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench

### Results

- **Classification:** BiGRU and BiLSTM exceed 0.99 F1; RoBERTa is competitive
- **Instruction-tuned LLMs:** MD-Judge performs best overall (F1 of 0.89 or higher); Llama Guard 3 is strong for moderation
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, the **highest engagement (0.79)**, and **96% safety compliance** in human evaluations

---

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** ~30 hours in total across models
- **Federated Setup:** 10 simulated clients, 30 rounds

---

## Technical Specifications

- **Architecture:** Instruction-tuned transformer (decoder-only)
- **Objective:** Multi-task (classification, risk scoring, safe generation)

---

## Citation

If you use these models, please cite our paper:

```bibtex
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  url={https://arxiv.org/abs/2509.05362},
  year={2025}
}
```

---

## Contact

- **Authors:** ihossain@miners.utep.edu, sai.puppala@siu.edu, stalukder@utep.edu
- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)
- **Personal Web:** [https://ismail102.github.io/](https://ismail102.github.io/)

---
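## Appendix: FedAvg Aggregation Sketch

The Training Procedure lists 10 simulated clients and 30 rounds of FedAvg. The sketch below illustrates the standard FedAvg aggregation rule: after each round, the server replaces the global parameters with an average of the client parameters, weighted by each client's local sample count. It is a minimal, framework-free illustration under stated assumptions: parameters are flat lists of floats, `local_update` is a hypothetical stand-in for a client's local fine-tuning, and the client targets and dataset sizes are toy values. It is not the authors' training code and omits the optional differential-privacy noise.

```python
import random
from typing import List

Weights = List[float]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def local_update(global_w: Weights, target: Weights, lr: float = 0.5) -> Weights:
    """Hypothetical client step: move each parameter part of the way toward a
    client-specific target. Stands in for real local fine-tuning."""
    return [g + lr * (t - g) for g, t in zip(global_w, target)]


def run_fedavg(num_clients: int = 10, rounds: int = 30, dim: int = 4, seed: int = 0) -> Weights:
    """Simulate FedAvg with toy, non-identical client objectives."""
    rng = random.Random(seed)
    # Hypothetical per-client optima and dataset sizes (toy data, not from the paper).
    targets = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_clients)]
    sizes = [rng.randint(50, 500) for _ in range(num_clients)]
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Every client starts from the current global weights, trains locally,
        # then the server aggregates the results.
        updates = [local_update(global_w, t) for t in targets]
        global_w = fedavg(updates, sizes)
    return global_w


if __name__ == "__main__":
    print(run_fedavg())
```

With this toy client step, the global vector converges to the sample-weighted mean of the client targets. In a real deployment each client would run actual fine-tuning in place of `local_update`; with LoRA, one common choice is to aggregate only the adapter parameters, which this card does not specify.

---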