|
|
--- |
|
|
title: Falconz - Red teamers |
|
|
emoji: ⚡ |
|
|
colorFrom: blue |
|
|
colorTo: yellow |
|
|
sdk: gradio |
|
|
sdk_version: 5.49.1 |
|
|
app_file: app.py |
|
|
pinned: true |
|
|
thumbnail: >- |
|
|
/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F621c88aca7d6c7e0563256ae%2FsCv6mFixuQLmzhTJuzgXG.png%3C%2Fspan%3E%3C!-- HTML_TAG_END --> |
|
|
short_description: MCP Powered Redteaming tool to Safeguard your Agentic Apps!! |
|
|
tags: |
|
|
- building-mcp-track-enterprise |
|
|
- mcp-in-action-track-enterprise |
|
|
- security |
|
|
- red-teaming |
|
|
- ai-safety |
|
|
--- |
|
|
|
|
|
# 🛡️ Falconz – Unified LLM Security & Red Teaming Platform |
|
|
|
|
|
Welcome to our submission for the **Hugging Face GenAI Agents & MCP Hackathon**! |
|
|
Falconz is a **multi-model AI security platform** built with **Gradio & MCP** and Anthropic Claude models, designed to detect **jailbreaks, prompt injections, and unsafe LLM outputs in Agentic pipelines / LLM based workflows across multiple foundation models** in real time. |
|
|
|
|
|
|
|
|
|
|
|
🎥 **Demo working Video:** |
|
|
Main Falconz demo showcasing core features with MCP in Action in Claude Desktop. |
|
|
https://www.youtube.com/watch?v=HTEs5Sw-ID0 |
|
|
|
|
|
|
|
|
🌐 **Social media - LinkedIn and twitter Post:** |
|
|
Public announcement . |
|
|
https://www.linkedin.com/posts/sallu-mandya_ai-aiagents-mcp-activity-7399436956662841344-3o1I?utm_source=share&utm_medium=member_desktop&rcm=ACoAACD-K8sBnXZWALlW2yw-AnT_4KptCJFJs7M |
|
|
|
|
|
https://x.com/SalluMandya/status/1993948272780825003?s=20 |
|
|
|
|
|
|
|
|
🌐 **Google CO:lab:** |
|
|
https://colab.research.google.com/drive/1PSuPQ35UZntKcUBd43QtjrsRLVvHJYlm?usp=sharing |
|
|
|
|
|
🌐 **HF Blog:** |
|
|
https://huggingface.co/blog/Xhaheen/falconz-mcp-hackathon |
|
|
|
|
|
## 🏷️ Hackathon Track Tags |
|
|
|
|
|
This project is officially submitted to the following MCP Hackathon tracks: |
|
|
|
|
|
- **building-mcp-track-enterprise** |
|
|
- **mcp-in-action-track-enterprise** |
|
|
- **security** |
|
|
- **red-teaming** |
|
|
- **ai-safety** |
|
|
## 🌐 Platform Overview |
|
|
|
|
|
Falconz provides a unified security layer for LLM-based apps by combining: |
|
|
|
|
|
- 🔐 **Real-time jailbreak & prompt-injection detection using CLaude Model** |
|
|
- 🧠 **Multi-model testing across Anthropic, OpenAI, Gemini, Mistral, Phi & more** |
|
|
- 🖼️ **Image-based prompt injection scanning** |
|
|
- 📊 **Analytics dashboard for threat trends** |
|
|
- 🪝 **MCP integration for agentic workflows** |
|
|
|
|
|
This platform helps developers validate and harden LLM systems against manipulation and unsafe outputs. |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧩 Core Modules |
|
|
|
|
|
### 💬 Chat & Response Analysis |
|
|
- Interact with multiple LLMs |
|
|
- Automatically evaluates model responses for: |
|
|
- Jailbreak signals |
|
|
- Policy violations |
|
|
- Manipulation attempts |
|
|
- Outputs structured JSON + visual risk scoring |
|
|
|
|
|
### 📝 Prompt Tester |
|
|
- Test known or custom jailbreak prompts |
|
|
- Compare how different models respond |
|
|
- Ideal for red-teaming and benchmarking model safety |
|
|
|
|
|
### 🖼️ Image Scanner |
|
|
- Detects hidden prompt instructions within images |
|
|
- Flags potential injection attempts (SAFE / UNSAFE) |
|
|
|
|
|
### ⚙️ Prompt Library (Customizable) |
|
|
- Built-in top 10 jailbreak templates (OWASP-inspired) |
|
|
- Users can update and auto-modify prompt templates |
|
|
- Supports CSV import + dynamic replacements |
|
|
|
|
|
### 📊 Analytics Dashboard |
|
|
- Trends of SAFE vs UNSAFE detections |
|
|
- Risk score visualization |
|
|
- Model performance insights |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔗 Multi-Model Support |
|
|
|
|
|
Falconz integrates with (With openAI like Endpoints): |
|
|
- ✅ Anthropic |
|
|
- ✅ openai |
|
|
- ✅ Google Gemini |
|
|
- ✅ Mistral |
|
|
- ✅ Microsoft Phi |
|
|
- ✅ Meta (Guard Models) |
|
|
- ✅ Meta (Guard Models) |
|
|
- Any Custom model from OpenRouter or OpenAI like endpoints |
|
|
|
|
|
Each model can be tested independently for safety robustness. |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
High-level components: |
|
|
- **Frontend:** Gradio UI (Multi-tab interaction) |
|
|
- **Middleware:** MCP-powered routing & agent logic |
|
|
- **Backend:** Multi-model OpenRouter API |
|
|
- **Analytics:** Local CSV logging + dashboards |
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 How It Works (Full App Flow Across All Tabs) |
|
|
|
|
|
### ✅ 1️⃣ Chat & Analysis Flow |
|
|
1. User enters a message in the **Chat** tab |
|
|
2. Falconz sends the message to the selected LLM model |
|
|
3. The model responds normally |
|
|
4. The response is passed through the **risk analysis engine** |
|
|
5. A JSON risk score + visual report is generated |
|
|
6. Conversation & analysis logs are stored for analytics |
|
|
|
|
|
--- |
|
|
|
|
|
### ✅ 2️⃣ Text Prompt Tester Flow |
|
|
1. User inputs a jailbreak/prompt-injection test prompt |
|
|
2. Falconz sends it directly to the selected guard model |
|
|
3. The raw model response is returned (no chat history) |
|
|
4. Users compare responses to evaluate model safety behavior |
|
|
|
|
|
--- |
|
|
|
|
|
### ✅ 3️⃣ Image Scanner Flow |
|
|
1. User uploads an image containing text or hidden instructions |
|
|
2. Falconz extracts image content and sends it to a vision model |
|
|
3. The model evaluates the content for injection threats |
|
|
4. Output is classified as **SAFE** or **UNSAFE** |
|
|
|
|
|
## 🧑💻 Authors |
|
|
|
|
|
- [Mohammed Arsalan](http://linkedin.com/in/sallu-mandya/) |
|
|
|
|
|
## 📝 License |
|
|
|
|
|
This project is licensed under the **MIT License**. |
|
|
|
|
|
--- |
|
|
## 📝 Architecture |
|
|
|
|
|
[View Architecture Diagram](https://huggingface.co/spaces/MCP-1st-Birthday/Falconzz_M.C.P_Hackathon/blob/main/mcparchitecture.png) |
|
|
|
|
|
## 🏗️ System Architecture Overview |
|
|
|
|
|
Falconz is a **multi-layered LLM security platform** with the following core components: |
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Component Breakdown |
|
|
|
|
|
### 1️⃣ **Frontend Layer** (Gradio UI) |
|
|
- **Chat & Analysis Tab** - Real-time chat with integrated threat detection |
|
|
- **Image Scanner Tab** - Vision-based prompt injection detection |
|
|
- **Text Prompt Tester Tab** - Custom jailbreak testing interface |
|
|
- **Analytics Dashboard Tab** - Real-time threat trends and metrics |
|
|
- **Learning Hub Tab** - Educational resources for red teaming |
|
|
|
|
|
### 2️⃣ **Detection Engine Layer** (Claude-Powered) |
|
|
- **falcon_prompt_text** - Text-based jailbreak and prompt injection detection |
|
|
- **Falcon_prompt_image** - Vision-based injection scanning (SAFE/UNSAFE classification) |
|
|
- **prompt_injection_templates** - Top 10 OWASP-inspired jailbreak patterns |
|
|
- **Risk Scoring Engine** - Generates risk scores (0-100) with policy violation flags |
|
|
|
|
|
### 3️⃣ **Multi-Model API Layer** (OpenRouter Gateway) |
|
|
|
|
|
**Detection Models:** |
|
|
- Claude Sonnet 4.5 |
|
|
- Claude Opus 4.1 |
|
|
- Claude Haiku 4.5 |
|
|
- Llama Guard 4 |
|
|
|
|
|
**Chat Models:** |
|
|
- Google Gemini 2.5 |
|
|
- OpenAI GPT-4o |
|
|
- Mistral Medium |
|
|
- Microsoft Phi-4 |
|
|
|
|
|
**Vision Models:** |
|
|
- Claude Sonnet 4.5 |
|
|
- Google Gemini 2.5 |
|
|
- OpenAI GPT-4o |
|
|
- Phi-4 Multimodal |
|
|
|
|
|
### 4️⃣ **Data Storage Layer** (CSV & Logging) |
|
|
- **analytics.csv** - Logs timestamps, detection results, and models used |
|
|
- **Prompts.csv** - Customizable prompt injection templates |
|
|
- **Prompts_updated.csv** - Modified templates with dynamic replacements |
|
|
|
|
|
### 5️⃣ **Analysis Engine Layer** (Processing & Formatting) |
|
|
- **JSON Parser** - Extracts risk_score, jailbreak flags, policy breaks, attack types |
|
|
- **Visual Formatter** - Color-coded risk display (Green/Orange/Red), Markdown rendering |
|
|
- **Dashboard Aggregator** - Computes trends, KPIs, and generates recommendations |
|
|
|
|
|
### 6️⃣ **Output Layer** (Results & Reports) |
|
|
- **Raw JSON Output** - Structured threat detection data |
|
|
- **Visual Analysis Report** - Color-coded risk scores with policy violations |
|
|
- **Analytics Dashboard** - Interactive charts, trends, and security insights |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔄 Data Flow |
|
|
|
|
|
``` |
|
|
User Input (Frontend) |
|
|
↓ |
|
|
Request Router (Message/Image/Prompt Handler) |
|
|
↓ |
|
|
Detection Engine (Claude Analysis) |
|
|
↓ |
|
|
Multi-Model API (OpenRouter Gateway) |
|
|
↓ |
|
|
External APIs (Google, OpenAI, Meta) |
|
|
↓ |
|
|
Data Storage (CSV Logging) |
|
|
↓ |
|
|
Analysis Engine (JSON/Format Processing) |
|
|
↓ |
|
|
Output Layer (JSON, Visual Report, Dashboard) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🛠️ Technology Stack |
|
|
|
|
|
| Component | Technology | |
|
|
|-----------|-----------| |
|
|
| **Frontend** | Gradio 5.49.1 (Glass Theme) | |
|
|
| **Backend** | Python 3.x + OpenAI Client | |
|
|
| **API Gateway** | OpenRouter.ai/api/v1 | |
|
|
| **Detection** | Anthropic Claude Models | |
|
|
| **Data Format** | JSON, CSV, Pandas | |
|
|
| **Visualization** | Matplotlib, Markdown | |
|
|
| **Logging** | IST Timezone + CSV Storage | |
|
|
| **Deployment** | Gradio Share + MCP Support | |
|
|
|
|
|
--- |
|
|
|
|
|
## ✨ Key Features |
|
|
|
|
|
✅ **Real-time Detection** - Jailbreak and prompt injection scanning |
|
|
✅ **Multi-Model Testing** - Compare safety across 15+ LLM providers |
|
|
✅ **Vision Scanning** - Image-based threat detection |
|
|
✅ **Customizable Templates** - Top 10 OWASP-inspired attack patterns |
|
|
✅ **Risk Scoring** - Automated 0-100 risk assessment |
|
|
✅ **Analytics Dashboard** - Trend visualization and KPI tracking |
|
|
✅ **MCP Integration** - Enterprise-grade agentic workflow support |
|
|
✅ **Ethical Red Teaming** - Secure, responsible AI safety testing |
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Use Cases |
|
|
|
|
|
- **AI Safety Teams** - Test LLM robustness against adversarial prompts |
|
|
- **Security Researchers** - Benchmark jailbreak techniques across models |
|
|
- **DevOps Engineers** - Monitor LLM-based applications for injection risks |
|
|
- **Enterprise Security** - Validate agentic systems before production deployment |
|
|
|
|
|
--- |
|
|
|
|
|
## 📄 License |
|
|
|
|
|
MIT License - See project repository for details |
|
|
|
|
|
## ✅ Reminder |
|
|
|
|
|
Falconz is intended **only for ethical security testing** and **AI safety research** as part of MCP Gradio Hackathon. |
|
|
Users are responsible for complying with all laws, policies, and platform terms. |
|
|
|
|
|
🛡️ Build safe. Test responsibly. Protect the future of AI , contact me to [Xhaheen](http://linkedin.com/in/sallu-mandya/) for Collab . |