Enhanced RAG System Setup Guide
This guide will help you set up the Enhanced RAG (Retrieval-Augmented Generation) system for saving high-confidence news to Google Drive.
Overview
The Enhanced RAG system automatically saves news with 95%+ confidence from Gemini analysis to Google Drive, allowing you to:
- View all high-confidence news entries
- Use them for better RAG analysis
- Track user input patterns
- Build a comprehensive knowledge base
Setup Steps
Step 1: Google Cloud Console Setup
Go to Google Cloud Console
Create or Select Project
- Create a new project or select existing one
- Note your project ID
Enable Google Drive API
- Go to "APIs & Services" → "Library"
- Search for "Google Drive API"
- Click "Enable"
Create OAuth 2.0 Credentials
- Go to "APIs & Services" → "Credentials"
- Click "Create Credentials" → "OAuth 2.0 Client IDs"
- Choose "Desktop application"
- Download the JSON file
- Rename it to credentials.json
- Place it in your project directory
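Before running the setup script, it can help to confirm that the downloaded file is the right kind of client. The sketch below is a hypothetical helper, not part of the project. It relies on the fact that Desktop application clients are stored under a top-level "installed" key, and it only checks for the fields an OAuth flow typically needs.

```python
import json
from typing import List


def check_client_secrets(raw_json: str) -> List[str]:
    """Return a list of problems found in an OAuth client file (empty = looks fine)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # Desktop application clients live under "installed";
    # a "web" key means a different client type was created.
    section = data.get("installed")
    if not isinstance(section, dict):
        found = ", ".join(sorted(data)) or "none"
        return [f'missing "installed" section (found: {found})']
    return [f'missing "{key}"'
            for key in ("client_id", "client_secret", "token_uri")
            if not section.get(key)]


# Placeholder values for illustration only.
sample = json.dumps({"installed": {
    "client_id": "example-id",
    "client_secret": "example-secret",
    "token_uri": "https://oauth2.googleapis.com/token",
}})
print(check_client_secrets(sample))
```

To check the real file, pass the text of credentials.json to check_client_secrets; an empty list means the structure looks as expected.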
Step 2: Local Setup
Run the Setup Script
python setup_google_drive_rag.py
Follow the Authentication Process
- A browser window will open
- Log in with your Google account
- Grant permissions for Google Drive access
- The script will save your credentials
Verify Setup
- The script will test Google Drive access
- It will create the RAG folder and file
- You'll see confirmation messages
Step 3: Hugging Face Spaces Setup (Optional)
If you want to use this on Hugging Face Spaces:
Add Secrets to Hugging Face
- Go to your Space settings
- Add these secrets:
- GOOGLE_CLIENT_ID: Your OAuth client ID
- GOOGLE_CLIENT_SECRET: Your OAuth client secret
- GOOGLE_REFRESH_TOKEN: Get this from your local token.json
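Space secrets are exposed to the running app as environment variables. The sketch below shows one way the three secrets could be read and checked at startup. The function name is hypothetical, and the returned dictionary uses the key names that google-auth's Credentials.from_authorized_user_info is expected to accept; treat that pairing as an assumption about how rag_news_manager.py authenticates.

```python
import os
from typing import Dict, Mapping

REQUIRED_SECRETS = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REFRESH_TOKEN")


def authorized_user_info(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the OAuth values from the environment, failing early if any is unset."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return {
        "client_id": env["GOOGLE_CLIENT_ID"],
        "client_secret": env["GOOGLE_CLIENT_SECRET"],
        "refresh_token": env["GOOGLE_REFRESH_TOKEN"],
        # Google's standard OAuth 2.0 token endpoint.
        "token_uri": "https://oauth2.googleapis.com/token",
    }


# Demonstration with placeholder values instead of the real environment.
demo_info = authorized_user_info({
    "GOOGLE_CLIENT_ID": "example-id",
    "GOOGLE_CLIENT_SECRET": "example-secret",
    "GOOGLE_REFRESH_TOKEN": "example-refresh-token",
})
print(sorted(demo_info))
```

Failing at startup with the names of the missing secrets is usually easier to diagnose than an authentication error later on.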
Get Refresh Token
- Run the setup script locally first
- Check the token.json file
- Copy the refresh_token value
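Copying the value by hand works, but a few lines of Python avoid copy-paste mistakes. This sketch assumes token.json is a flat JSON object with refresh_token, client_id, and client_secret keys, which is the layout google-auth writes with Credentials.to_json(); the setup script may store it differently. The function name is hypothetical.

```python
import json
from typing import Dict


def extract_space_secrets(token_json: str) -> Dict[str, str]:
    """Map the contents of token.json to the three Space secret names."""
    data = json.loads(token_json)
    refresh = data.get("refresh_token")
    if not refresh:
        # Without a refresh token the Space cannot renew access on its own.
        raise ValueError("no refresh_token found; run the setup script again")
    return {
        "GOOGLE_CLIENT_ID": data.get("client_id", ""),
        "GOOGLE_CLIENT_SECRET": data.get("client_secret", ""),
        "GOOGLE_REFRESH_TOKEN": refresh,
    }


# Placeholder token contents for illustration only.
sample_token = json.dumps({
    "token": "example-access-token",
    "refresh_token": "example-refresh-token",
    "client_id": "example-id",
    "client_secret": "example-secret",
})
secrets = extract_space_secrets(sample_token)
print(list(secrets))
```

For the real file, pass the text of token.json to extract_space_secrets and paste each value into the matching secret in the Space settings.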
File Structure
After setup, you'll have:
your-project/
├── credentials.json           # Google OAuth credentials
├── token.json                 # Saved authentication token
├── rag_news_manager.py        # Main RAG system
├── setup_google_drive_rag.py  # Setup script
├── view_rag_news.py           # News viewer
└── app.py                     # Your main app (updated)
Google Drive Structure
The system creates:
Google Drive/
└── Vietnamese_Fake_News_RAG/
    └── high_confidence_news.json
How It Works
Automatic Saving
- When users input news, the system analyzes it
- If Gemini confidence > 95%, it's automatically saved to Google Drive
- Each entry includes:
- News text
- Prediction (REAL/FAKE)
- Confidence score
- Gemini analysis
- Search results
- Timestamp
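The saving rule above can be sketched as a small gate that either builds an entry or returns nothing. This is an illustration, not the code in rag_news_manager.py: the function name is hypothetical, and hashing the trimmed text with SHA-256 is an assumption about how the content hash is produced.

```python
import hashlib
from datetime import datetime
from typing import Optional

RAG_CONFIDENCE_THRESHOLD = 0.95  # same value as the setting in app.py


def build_entry(news_text: str, prediction: str, gemini_confidence: float,
                gemini_analysis: str, search_results: list,
                now: Optional[datetime] = None) -> Optional[dict]:
    """Return an entry dict if the prediction clears the threshold, else None."""
    # Strict ">" follows the rule stated above, so exactly 0.95 is not saved.
    if gemini_confidence <= RAG_CONFIDENCE_THRESHOLD:
        return None
    timestamp = (now or datetime.now()).isoformat(timespec="seconds")
    return {
        # Assumption: the hash is SHA-256 of the trimmed text.
        "content_hash": hashlib.sha256(news_text.strip().encode("utf-8")).hexdigest(),
        "news_text": news_text,
        "prediction": prediction,
        "gemini_confidence": gemini_confidence,
        "gemini_analysis": gemini_analysis,
        "search_results": search_results,
        "created_at": timestamp,
    }


kept = build_entry("Sample headline", "REAL", 0.98, "Sample analysis", [],
                   now=datetime(2024, 1, 1, 10, 0, 0))
skipped = build_entry("Another headline", "FAKE", 0.90, "Sample analysis", [])
print(kept["created_at"], skipped)
```

Keeping the threshold check in one function makes it easy to change the cutoff or switch to ">=" in a single place.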
Data Format
{
  "metadata": {
    "created_at": "2024-01-01T00:00:00",
    "description": "High-confidence Vietnamese fake news for RAG",
    "threshold": 0.95,
    "total_entries": 10,
    "last_updated": "2024-01-01T12:00:00"
  },
  "news_entries": [
    {
      "id": 1,
      "content_hash": "abc123...",
      "news_text": "Argentina vô địch World Cup 2022...",
      "prediction": "REAL",
      "gemini_confidence": 0.98,
      "gemini_analysis": "1. KẾT LUẬN: THẬT...",
      "distilbert_confidence": 0.85,
      "search_results": [...],
      "created_at": "2024-01-01T10:00:00",
      "source": "user_input",
      "verified": true
    }
  ]
}
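Given this layout, adding an entry means appending to news_entries and refreshing the two counters in metadata. The sketch below works on an in-memory dictionary and leaves the Drive upload out. Skipping entries whose content_hash is already present and numbering ids sequentially are assumptions, and the function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def add_entry(store: dict, entry: dict, now: Optional[datetime] = None) -> bool:
    """Append entry to the store unless its content_hash already exists."""
    entries = store.setdefault("news_entries", [])
    if any(e.get("content_hash") == entry["content_hash"] for e in entries):
        return False  # duplicate text, nothing to do
    next_id = max((e.get("id", 0) for e in entries), default=0) + 1
    entries.append(dict(entry, id=next_id))
    # Keep the metadata block in step with the entry list.
    meta = store.setdefault("metadata", {})
    meta["total_entries"] = len(entries)
    meta["last_updated"] = (now or datetime.now()).isoformat(timespec="seconds")
    return True


store = {"metadata": {"threshold": 0.95, "total_entries": 0}, "news_entries": []}
entry = {"content_hash": "abc123", "news_text": "Sample headline", "prediction": "REAL"}
first = add_entry(store, entry, now=datetime(2024, 1, 1, 12, 0, 0))
second = add_entry(store, entry)  # same hash, so it is skipped
print(first, second, store["metadata"]["total_entries"])
```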
Viewing Saved News
Option 1: Command Line Viewer
python view_rag_news.py
Features:
- View all saved news
- Filter by prediction (REAL/FAKE)
- Search through entries
- View statistics
- Open Google Drive directly
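Filtering and searching can also be done directly on the downloaded JSON. The helpers below are a minimal sketch over the news_entries list and are not taken from view_rag_news.py; the function names are hypothetical, and the search is a plain case-insensitive substring match on news_text.

```python
from typing import List


def filter_by_prediction(entries: List[dict], label: str) -> List[dict]:
    """Keep entries whose prediction equals label (REAL or FAKE), ignoring case."""
    wanted = label.upper()
    return [e for e in entries if str(e.get("prediction", "")).upper() == wanted]


def search_entries(entries: List[dict], query: str) -> List[dict]:
    """Keep entries whose news_text contains query, ignoring case."""
    needle = query.casefold()
    return [e for e in entries if needle in str(e.get("news_text", "")).casefold()]


sample_entries = [
    {"id": 1, "prediction": "REAL", "news_text": "New COVID-19 guidance published"},
    {"id": 2, "prediction": "FAKE", "news_text": "Miracle cure for covid-19 found"},
    {"id": 3, "prediction": "REAL", "news_text": "Election results announced"},
]
real_ids = [e["id"] for e in filter_by_prediction(sample_entries, "real")]
covid_ids = [e["id"] for e in search_entries(sample_entries, "COVID-19")]
print(real_ids, covid_ids)
```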
Option 2: Google Drive Web Interface
- Go to your Google Drive
- Find the "Vietnamese_Fake_News_RAG" folder
- Open "high_confidence_news.json"
- View the raw JSON data
Option 3: Direct Google Drive Links
The system provides direct links:
- Folder: https://drive.google.com/drive/folders/{folder_id}
- File: https://drive.google.com/file/d/{file_id}/view
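Both links are the Drive ID dropped into a fixed URL pattern. A small hypothetical helper makes that explicit; the IDs in the example are placeholders, not real Drive IDs.

```python
from typing import Dict


def drive_links(folder_id: str, file_id: str) -> Dict[str, str]:
    """Build the browser links for the RAG folder and its JSON file."""
    return {
        "folder": f"https://drive.google.com/drive/folders/{folder_id}",
        "file": f"https://drive.google.com/file/d/{file_id}/view",
    }


links = drive_links("FOLDER_ID_PLACEHOLDER", "FILE_ID_PLACEHOLDER")
print(links["folder"])
print(links["file"])
```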
Configuration
In app.py
# Enhanced RAG System Configuration
ENABLE_ENHANCED_RAG = True # Enable/disable the system
RAG_CONFIDENCE_THRESHOLD = 0.95 # 95% threshold for saving
Thresholds
- 95%: Only very high-confidence predictions are saved
- 90%: More entries saved, but still high quality
- 85%: More entries, but some uncertainty
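To get a feel for the trade-off before changing the setting, count how many past predictions each threshold would have saved. The sketch below uses made-up confidence scores and a hypothetical function name, and it applies the strict "greater than" comparison described under Automatic Saving.

```python
from typing import Dict, Iterable, Sequence


def count_saved(confidences: Iterable[float],
                thresholds: Sequence[float] = (0.95, 0.90, 0.85)) -> Dict[float, int]:
    """Count, per threshold, how many confidence scores would be saved."""
    scores = list(confidences)
    return {t: sum(1 for c in scores if c > t) for t in thresholds}


# Made-up scores for illustration only.
example_scores = [0.99, 0.96, 0.93, 0.91, 0.88, 0.80]
saved = count_saved(example_scores)
for threshold, count in saved.items():
    print(f"threshold {threshold:.2f}: {count} of {len(example_scores)} saved")
```

Running the same count on real confidence scores from the app shows how quickly the knowledge base would grow at each setting.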
Statistics
The system tracks:
- Total entries saved
- Real vs Fake news count
- Average confidence score
- Latest entry timestamp
- Google Drive folder/file IDs
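Most of these figures can be derived from the JSON file alone. The sketch below computes them from the structure shown under Data Format; the function name is hypothetical, and it relies on created_at being an ISO 8601 string in one consistent format so that string comparison finds the latest entry.

```python
def summarize(store: dict) -> dict:
    """Compute entry counts, mean Gemini confidence, and the latest timestamp."""
    entries = store.get("news_entries", [])
    confidences = [e["gemini_confidence"] for e in entries if "gemini_confidence" in e]
    timestamps = [e["created_at"] for e in entries if "created_at" in e]
    return {
        "total": len(entries),
        "real": sum(1 for e in entries if e.get("prediction") == "REAL"),
        "fake": sum(1 for e in entries if e.get("prediction") == "FAKE"),
        # None rather than 0.0 when there is nothing to average.
        "average_confidence": sum(confidences) / len(confidences) if confidences else None,
        # ISO 8601 strings in one format sort in chronological order.
        "latest": max(timestamps, default=None),
    }


sample_store = {"news_entries": [
    {"prediction": "REAL", "gemini_confidence": 0.98, "created_at": "2024-01-01T10:00:00"},
    {"prediction": "FAKE", "gemini_confidence": 0.96, "created_at": "2024-01-02T09:30:00"},
]}
stats = summarize(sample_store)
print(stats["total"], stats["real"], stats["fake"], stats["latest"])
```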
Troubleshooting
Common Issues
"credentials.json not found"
- Make sure you downloaded the OAuth credentials
- Rename the file to exactly credentials.json
- Place it in the project directory
"Authentication failed"
- Check your internet connection
- Make sure Google Drive API is enabled
- Try running the setup script again
"Permission denied"
- Make sure you granted all required permissions
- Check if your Google account has Drive access
"RAG system not available"
- Check if all dependencies are installed
- Make sure rag_news_manager.py is in the same directory
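Two of the issues above come down to a file missing from the project directory. A quick pre-flight check can rule that out before looking further. This is a hypothetical helper, and the demonstration uses a temporary directory so it does not depend on your project layout.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

EXPECTED_FILES = ("credentials.json", "rag_news_manager.py")


def missing_files(project_dir: str, names: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in names if not (root / name).is_file()]


# Demonstration: a directory that contains only credentials.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "credentials.json").write_text("{}", encoding="utf-8")
    report = missing_files(tmp)
print(report)
```

In a real project, call missing_files(".") from the directory that contains app.py.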
Debug Mode
Add this to see detailed logs:
import logging
logging.basicConfig(level=logging.DEBUG)
Integration with Existing System
The Enhanced RAG system works alongside your existing knowledge base:
- Local Knowledge Base: Still works as before
- Enhanced RAG: Additional Google Drive storage
- Both systems: Can be used together for comprehensive RAG
Usage Examples
View Recent News
python view_rag_news.py
# Select option 2: View Recent News
Search for Specific Topics
python view_rag_news.py
# Select option 6: Search News
# Enter: "COVID-19"
Check Statistics
python view_rag_news.py
# Select option 1: View Statistics
Benefits
- Automatic Collection: No manual intervention needed
- High Quality: Only 95%+ confidence entries saved
- Easy Access: View through multiple interfaces
- Scalable: Google Drive handles large datasets
- Searchable: Find specific news entries quickly
- Analytics: Track patterns and statistics
Security
- OAuth 2.0 authentication
- Credentials are kept in local files (credentials.json, token.json) or in Hugging Face Space secrets
- Only your Google account can access
- No sensitive data exposed
Support
If you encounter issues:
- Check the troubleshooting section
- Verify all setup steps completed
- Check Google Cloud Console for API quotas
- Ensure proper file permissions
Congratulations! You now have a RAG system that automatically saves high-confidence news to Google Drive for analysis and viewing.