# 🚀 Enhanced RAG System Setup Guide

This guide will help you set up the Enhanced RAG (Retrieval-Augmented Generation) system for saving high-confidence news to Google Drive.

## 📋 Overview

The Enhanced RAG system automatically saves news with **95%+ confidence** from Gemini analysis to Google Drive, allowing you to:

- View all high-confidence news entries
- Use them for better RAG analysis
- Track user input patterns
- Build a comprehensive knowledge base

## 🔧 Setup Steps

### Step 1: Google Cloud Console Setup

1. **Go to Google Cloud Console**
   - Visit: https://console.cloud.google.com/

2. **Create or Select Project**
   - Create a new project or select an existing one
   - Note your project ID

3. **Enable Google Drive API**
   - Go to "APIs & Services" → "Library"
   - Search for "Google Drive API"
   - Click "Enable"

4. **Create OAuth 2.0 Credentials**
   - Go to "APIs & Services" → "Credentials"
   - Click "Create Credentials" → "OAuth 2.0 Client IDs"
   - Choose "Desktop application"
   - Download the JSON file
   - Rename it to `credentials.json`
   - Place it in your project directory

### Step 2: Local Setup

1. **Run the Setup Script**
   ```bash
   python setup_google_drive_rag.py
   ```

2. **Follow the Authentication Process**
   - A browser window will open
   - Log in with your Google account
   - Grant permissions for Google Drive access
   - The script will save your credentials

3. **Verify Setup**
   - The script will test Google Drive access
   - It will create the RAG folder and file
   - You'll see confirmation messages

### Step 3: Hugging Face Spaces Setup (Optional)

If you want to use this on Hugging Face Spaces:

1. **Add Secrets to Hugging Face**
   - Go to your Space settings
   - Add these secrets:
     - `GOOGLE_CLIENT_ID`: Your OAuth client ID
     - `GOOGLE_CLIENT_SECRET`: Your OAuth client secret
     - `GOOGLE_REFRESH_TOKEN`: Get this from your local `token.json`

2. **Get Refresh Token**
   - Run the setup script locally first
   - Check the `token.json` file
   - Copy the `refresh_token` value

## 📁 File Structure

After setup, you'll have:

```
your-project/
├── credentials.json          # Google OAuth credentials
├── token.json                # Saved authentication token
├── rag_news_manager.py       # Main RAG system
├── setup_google_drive_rag.py # Setup script
├── view_rag_news.py          # News viewer
└── app.py                    # Your main app (updated)
```

## 🔍 Google Drive Structure

The system creates:

```
Google Drive/
└── Vietnamese_Fake_News_RAG/
    └── high_confidence_news.json
```

## 📊 How It Works

### Automatic Saving

- When users input news, the system analyzes it
- If Gemini confidence > 95%, it's automatically saved to Google Drive
- Each entry includes:
  - News text
  - Prediction (REAL/FAKE)
  - Confidence score
  - Gemini analysis
  - Search results
  - Timestamp

### Data Format

```json
{
  "metadata": {
    "created_at": "2024-01-01T00:00:00",
    "description": "High-confidence Vietnamese fake news for RAG",
    "threshold": 0.95,
    "total_entries": 10,
    "last_updated": "2024-01-01T12:00:00"
  },
  "news_entries": [
    {
      "id": 1,
      "content_hash": "abc123...",
      "news_text": "Argentina vô địch World Cup 2022...",
      "prediction": "REAL",
      "gemini_confidence": 0.98,
      "gemini_analysis": "1. KẾT LUẬN: THẬT...",
      "distilbert_confidence": 0.85,
      "search_results": [...],
      "created_at": "2024-01-01T10:00:00",
      "source": "user_input",
      "verified": true
    }
  ]
}
```

In this sample, `news_text` reads "Argentina wins the 2022 World Cup..." and `gemini_analysis` begins "1. CONCLUSION: REAL...".

## 🖥️ Viewing Saved News

### Option 1: Command Line Viewer

```bash
python view_rag_news.py
```

Features:

- View all saved news
- Filter by prediction (REAL/FAKE)
- Search through entries
- View statistics
- Open Google Drive directly

### Option 2: Google Drive Web Interface

- Go to your Google Drive
- Find the "Vietnamese_Fake_News_RAG" folder
- Open "high_confidence_news.json"
- View the raw JSON data

### Option 3: Direct Google Drive Links

The system provides direct links:

- Folder: `https://drive.google.com/drive/folders/{folder_id}`
- File: `https://drive.google.com/file/d/{file_id}/view`

## 🔧 Configuration

### In app.py

```python
# Enhanced RAG System Configuration
ENABLE_ENHANCED_RAG = True        # Enable/disable the system
RAG_CONFIDENCE_THRESHOLD = 0.95   # 95% threshold for saving
```

### Thresholds

- **95%**: Only very high-confidence predictions are saved
- **90%**: More entries saved, but still high quality
- **85%**: More entries, but some uncertainty

## 📈 Statistics

The system tracks:

- Total entries saved
- Real vs Fake news count
- Average confidence score
- Latest entry timestamp
- Google Drive folder/file IDs

## 🚨 Troubleshooting

### Common Issues

1. **"credentials.json not found"**
   - Make sure you downloaded the OAuth credentials
   - Rename the file to exactly `credentials.json`
   - Place it in the project directory

2. **"Authentication failed"**
   - Check your internet connection
   - Make sure Google Drive API is enabled
   - Try running the setup script again

3. **"Permission denied"**
   - Make sure you granted all required permissions
   - Check if your Google account has Drive access

4. **"RAG system not available"**
   - Check if all dependencies are installed
   - Make sure `rag_news_manager.py` is in the same directory

### Debug Mode

Add this to see detailed logs:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🔄 Integration with Existing System

The Enhanced RAG system works alongside your existing knowledge base:

- **Local Knowledge Base**: Still works as before
- **Enhanced RAG**: Additional Google Drive storage
- **Both systems**: Can be used together for comprehensive RAG

## 📱 Usage Examples

### View Recent News

```bash
python view_rag_news.py
# Select option 2: View Recent News
```

### Search for Specific Topics

```bash
python view_rag_news.py
# Select option 6: Search News
# Enter: "COVID-19"
```

### Check Statistics

```bash
python view_rag_news.py
# Select option 1: View Statistics
```

## 🎯 Benefits

1. **Automatic Collection**: No manual intervention needed
2. **High Quality**: Only 95%+ confidence entries saved
3. **Easy Access**: View through multiple interfaces
4. **Scalable**: Google Drive handles large datasets
5. **Searchable**: Find specific news entries quickly
6. **Analytics**: Track patterns and statistics

## 🔐 Security

- OAuth 2.0 authentication
- Credentials stored securely
- Only your Google account can access
- No sensitive data exposed

## 📞 Support

If you encounter issues:

1. Check the troubleshooting section
2. Verify all setup steps completed
3. Check Google Cloud Console for API quotas
4. Ensure proper file permissions

---

**🎉 Congratulations!** You now have a comprehensive RAG system that automatically saves high-confidence news to Google Drive for analysis and viewing!
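The automatic saving rule and the entry layout described under "How It Works" can be illustrated with a small sketch. This is not the code in `rag_news_manager.py`; the function names `should_save`, `build_entry`, and `add_if_confident` are hypothetical. The strict `>` comparison, the SHA-256 `content_hash`, and the de-duplication step are assumptions made for illustration only.

```python
import hashlib
from datetime import datetime

RAG_CONFIDENCE_THRESHOLD = 0.95  # mirrors the setting shown for app.py


def should_save(gemini_confidence, threshold=RAG_CONFIDENCE_THRESHOLD):
    """Return True when an analysis is confident enough to store.

    Assumption: a strict '>' comparison, following "confidence > 95%".
    """
    return gemini_confidence > threshold


def build_entry(entry_id, news_text, prediction, gemini_confidence,
                gemini_analysis, distilbert_confidence, search_results):
    """Build one record shaped like the items in "news_entries"."""
    # Assumption: content_hash is a SHA-256 hex digest of the UTF-8 text.
    content_hash = hashlib.sha256(news_text.encode("utf-8")).hexdigest()
    return {
        "id": entry_id,
        "content_hash": content_hash,
        "news_text": news_text,
        "prediction": prediction,
        "gemini_confidence": gemini_confidence,
        "gemini_analysis": gemini_analysis,
        "distilbert_confidence": distilbert_confidence,
        "search_results": search_results,
        "created_at": datetime.now().isoformat(timespec="seconds"),
        "source": "user_input",
        "verified": True,
    }


def add_if_confident(store, entry):
    """Append entry to the in-memory store if it passes the threshold.

    Returns True when the entry was added, False otherwise.
    """
    threshold = store["metadata"]["threshold"]
    if not should_save(entry["gemini_confidence"], threshold):
        return False
    # Assumed de-duplication: skip entries whose hash is already stored.
    if any(e["content_hash"] == entry["content_hash"]
           for e in store["news_entries"]):
        return False
    store["news_entries"].append(entry)
    store["metadata"]["total_entries"] = len(store["news_entries"])
    store["metadata"]["last_updated"] = entry["created_at"]
    return True
```

With this sketch, an entry at 0.98 confidence is added once, a second submission of the same text is skipped, and an entry at 0.90 is rejected. Uploading the resulting dictionary to Google Drive is left out here, since that part depends on the project's own Drive code.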
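The figures listed under "Statistics" can be derived directly from the JSON layout shown in "Data Format". The following is a minimal sketch under that assumption; `summarize` is a hypothetical helper, not a function from `view_rag_news.py`, and it uses `gemini_confidence` as the confidence score being averaged.

```python
def summarize(store):
    """Compute summary statistics from a dict shaped like high_confidence_news.json."""
    entries = store.get("news_entries", [])
    total = len(entries)
    real = sum(1 for e in entries if e.get("prediction") == "REAL")
    fake = sum(1 for e in entries if e.get("prediction") == "FAKE")
    # Average of gemini_confidence; 0.0 when there are no entries.
    average = (
        sum(e["gemini_confidence"] for e in entries) / total if total else 0.0
    )
    # ISO 8601 timestamps in the same format sort correctly as strings.
    latest = max((e["created_at"] for e in entries), default=None)
    return {
        "total_entries": total,
        "real_count": real,
        "fake_count": fake,
        "average_confidence": average,
        "latest_entry": latest,
    }
```

To try it on a downloaded copy of the file, load it with `json.load` and pass the resulting dictionary to `summarize`.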
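For the optional Hugging Face Spaces step, the secret values can be read from the local `token.json` instead of being copied by hand. This sketch assumes `token.json` is a JSON object with top-level `refresh_token`, `client_id`, and `client_secret` keys; only `refresh_token` is confirmed by this guide, so the other two are read defensively. `space_secrets_from_token` is a hypothetical helper name.

```python
import json
from pathlib import Path


def space_secrets_from_token(token_path="token.json"):
    """Map fields of token.json to the secret names used in Step 3."""
    data = json.loads(Path(token_path).read_text(encoding="utf-8"))
    return {
        # Assumption: these keys may be absent; .get() then yields None.
        "GOOGLE_CLIENT_ID": data.get("client_id"),
        "GOOGLE_CLIENT_SECRET": data.get("client_secret"),
        "GOOGLE_REFRESH_TOKEN": data.get("refresh_token"),
    }
```

Any value that comes back as `None` would need to be taken from `credentials.json` or the Google Cloud Console instead. Treat the returned values as secrets and avoid printing them in shared logs.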