root committed on
Commit
26e8660
·
1 Parent(s): bca907c
Files changed (6)
  1. README.md +249 -4
  2. config.py +182 -0
  3. requirements.txt +16 -2
  4. sample_resumes.csv +143 -0
  5. src/streamlit_app.py +730 -38
  6. test_installation.py +99 -0
README.md CHANGED
@@ -11,9 +11,254 @@ pinned: false
11
  short_description: Streamlit template space
12
  ---
13
 
14
- # Welcome to Streamlit!
15
 
16
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
17
 
18
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
19
- forums](https://discuss.streamlit.io).
14
+ # 🤖 AI Resume Screener
15
 
16
+ An advanced Streamlit application that automatically ranks candidate resumes against job descriptions using a sophisticated multi-stage AI pipeline.
17
 
18
+ ## 🚀 Features
19
+
20
+ ### Multi-Stage AI Pipeline
21
+ 1. **FAISS Recall**: Semantic similarity search using BGE embeddings (top 50 candidates)
22
+ 2. **Cross-Encoder Reranking**: Deep semantic matching using MS-Marco model (top 20 candidates)
23
+ 3. **BM25 Scoring**: Traditional keyword-based relevance scoring
24
+ 4. **Intent Analysis**: AI-powered candidate interest assessment using Qwen LLM
25
+ 5. **Final Ranking**: Weighted combination of all scores
26
+
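+ A minimal sketch of how the stages chain together, using the `ResumeScreener` methods from `src/streamlit_app.py` (assumes the models are loaded and `resume_texts`/`resume_filenames` are populated):
+
+ ```python
+ screener = ResumeScreener()
+
+ # Stage 1: semantic recall, Stage 2: cross-encoder rerank
+ top_50 = screener.faiss_recall(resume_texts, job_description, top_k=50)
+ top_20 = screener.cross_encoder_rerank(resume_texts, job_description, top_50, top_k=20)
+
+ # Or run all five stages (recall, rerank, BM25, intent, final ranking) in one call:
+ results_df = screener.advanced_pipeline_ranking(resume_texts, resume_filenames, job_description)
+ ```
+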
27
+ ### Advanced AI Models
28
+ - **Embedding Model**: BAAI/bge-large-en-v1.5 for semantic understanding
29
+ - **Cross-Encoder**: cross-encoder/ms-marco-MiniLM-L6-v2 for precise ranking
30
+ - **LLM**: Qwen2-1.5B with 4-bit quantization for intent analysis
31
+
32
+ ### Multiple Input Methods
33
+ - **File Upload**: PDF, DOCX, TXT files
34
+ - **CSV Upload**: Bulk resume processing
35
+ - **Hugging Face Datasets**: Direct integration with HF datasets
36
+
37
+ ### Comprehensive Analysis
38
+ - **Skills Extraction**: Technical skills and job-specific keywords
39
+ - **Score Breakdown**: Detailed analysis of each scoring component
40
+ - **Interactive Visualizations**: Charts and metrics for insights
41
+ - **Export Capabilities**: Download results as CSV
42
+
43
+ ## 📋 Requirements
44
+
45
+ ### System Requirements
46
+ - Python 3.8+
47
+ - CUDA-compatible GPU (recommended for optimal performance)
48
+ - 8GB+ RAM (16GB+ recommended)
49
+ - 10GB+ disk space for models
50
+
51
+ ### Dependencies
52
+ All dependencies are listed in `requirements.txt`:
53
+ - streamlit
54
+ - sentence-transformers
55
+ - transformers
56
+ - torch
57
+ - faiss-cpu
58
+ - rank-bm25
59
+ - nltk
60
+ - pdfplumber
61
+ - PyPDF2
62
+ - python-docx
63
+ - datasets
64
+ - plotly
65
+ - pandas
66
+ - numpy
+ - accelerate
+ - bitsandbytes
+ - altair
67
+
68
+ ## 🛠️ Installation
69
+
70
+ 1. **Clone the repository**:
71
+ ```bash
72
+ git clone <repository-url>
73
+ cd resumescreener_v2
74
+ ```
75
+
76
+ 2. **Install dependencies**:
77
+ ```bash
78
+ pip install -r requirements.txt
79
+ ```
80
+
81
+ 3. **Run the application**:
82
+ ```bash
83
+ streamlit run src/streamlit_app.py
84
+ ```
85
+
86
+ ## 📖 Usage Guide
87
+
88
+ ### Step 1: Model Loading
89
+ - Models are automatically loaded when the app starts
90
+ - First run may take 5-10 minutes to download models
91
+ - Check the sidebar for model loading status
92
+
93
+ ### Step 2: Job Description
94
+ - Enter the complete job description in the text area
95
+ - Include requirements, responsibilities, and desired skills
96
+ - More detailed descriptions yield better matching results
97
+
98
+ ### Step 3: Load Resumes
99
+ Choose from three options:
100
+
101
+ #### Option A: File Upload
102
+ - Upload PDF, DOCX, or TXT files
103
+ - Supports multiple file selection
104
+ - Automatic text extraction
105
+
106
+ #### Option B: CSV Upload
107
+ - Upload CSV with resume texts
108
+ - Select text and name columns
109
+ - Bulk processing capability
110
+
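+ A minimal CSV follows the layout of the bundled `sample_resumes.csv` (the row below is illustrative):
+
+ ```csv
+ name,resume_text
+ John Smith,"Senior Software Engineer with Python, Django, React and AWS experience..."
+ ```
+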
111
+ #### Option C: Hugging Face Dataset
112
+ - Load from public datasets
113
+ - Specify dataset name and columns
114
+ - Limited to 100 resumes for performance
115
+
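+ Under the hood this calls `datasets.load_dataset(dataset_name, split="train")` and keeps only the first 100 rows.
+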
116
+ ### Step 4: Run Pipeline
117
+ - Click "Run Advanced Ranking Pipeline"
118
+ - Monitor progress through 5 stages
119
+ - Results appear in three tabs
120
+
121
+ ### Step 5: Analyze Results
122
+
123
+ #### Summary Tab
124
+ - Top-ranked candidates table
125
+ - Key metrics and scores
126
+ - CSV download option
127
+
128
+ #### Detailed Analysis Tab
129
+ - Individual candidate breakdowns
130
+ - Score components explanation
131
+ - Skills and keywords analysis
132
+ - Resume excerpts
133
+
134
+ #### Visualizations Tab
135
+ - Score distribution charts
136
+ - Comparative analysis
137
+ - Intent distribution
138
+ - Average metrics
139
+
140
+ ## 🧮 Scoring Formula
141
+
142
+ **Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
143
+
144
+ ### Score Components
145
+
146
+ 1. **Cross-Encoder Score (50%)**
147
+ - Deep semantic matching between job and resume
148
+ - Considers context and meaning
149
+ - Range: 0-1 (normalized)
150
+
151
+ 2. **BM25 Score (30%)**
152
+ - Traditional keyword-based relevance
153
+ - Based on term frequency and inverse document frequency
154
+ - Range: 0-1 (normalized)
155
+
156
+ 3. **Intent Score (20%)**
157
+ - AI-assessed candidate interest level
158
+ - Based on experience-job alignment
159
+ - Categories: Yes (0.9), Maybe (0.5), No (0.1)
160
+
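+ As a worked example: a candidate with normalized scores of 0.8 (cross-encoder) and 0.6 (BM25) whose intent is "Yes" (0.9) receives 0.5 × 0.8 + 0.3 × 0.6 + 0.2 × 0.9 = 0.40 + 0.18 + 0.18 = 0.76.
+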
161
+ ## 🎯 Best Practices
162
+
163
+ ### For Optimal Results
164
+ 1. **Detailed Job Descriptions**: Include specific requirements, technologies, and responsibilities
165
+ 2. **Quality Resume Data**: Ensure resumes contain relevant information
166
+ 3. **Appropriate Batch Size**: Process 20-100 resumes for best performance
167
+ 4. **Clear Requirements**: Specify must-have vs. nice-to-have skills
168
+
169
+ ### Performance Tips
170
+ 1. **GPU Usage**: Enable CUDA for faster processing
171
+ 2. **Memory Management**: Use cleanup controls for large batches
172
+ 3. **Model Caching**: Models are cached after first load
173
+ 4. **Batch Processing**: Process resumes in smaller batches if memory is limited
174
+
175
+ ## 🔧 Configuration
176
+
177
+ ### Model Configuration
178
+ Models can be customized by modifying the `load_models()` function:
179
+ - Change model names for different embeddings
180
+ - Adjust quantization settings
181
+ - Modify device mapping
182
+
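+ For example, to try a smaller embedding checkpoint (a sketch; any sentence-transformers model should work, `BAAI/bge-base-en-v1.5` is shown purely as an illustration):
+
+ ```python
+ # inside load_models(), swap the checkpoint name
+ st.session_state.embedding_model = SentenceTransformer('BAAI/bge-base-en-v1.5')
+ ```
+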
183
+ ### Scoring Weights
184
+ Adjust weights in `calculate_final_scores()`:
185
+ ```python
186
+ final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
187
+ ```
188
+
189
+ ### Skills List
190
+ Customize the predefined skills list in the `ResumeScreener` class:
191
+ ```python
192
+ self.skills_list = [
193
+ 'python', 'java', 'javascript',
194
+ # Add your specific skills
195
+ ]
196
+ ```
197
+
198
+ ## 🐛 Troubleshooting
199
+
200
+ ### Common Issues
201
+
202
+ 1. **Model Loading Errors**
203
+ - Check internet connection for model downloads
204
+ - Ensure sufficient disk space
205
+ - Verify CUDA compatibility
206
+
207
+ 2. **Memory Issues**
208
+ - Reduce batch size
209
+ - Use CPU-only mode
210
+ - Clear cache between runs
211
+
212
+ 3. **File Processing Errors**
213
+ - Check file formats (PDF, DOCX, TXT)
214
+ - Ensure files are not corrupted
215
+ - Verify text extraction quality
216
+
217
+ 4. **Performance Issues**
218
+ - Enable GPU acceleration
219
+ - Process smaller batches
220
+ - Use model quantization
221
+
222
+ ### Error Messages
223
+ - **"Models not loaded"**: Wait for model loading to complete
224
+ - **"ML libraries not available"**: Install missing dependencies
225
+ - **"CUDA out of memory"**: Reduce batch size or use CPU
226
+
227
+ ## 📊 Sample Data
228
+
229
+ Use the included `sample_resumes.csv` for testing:
230
+ - 5 sample resumes with different roles
231
+ - Realistic job experience and skills
232
+ - Good for testing all features
233
+
234
+ ## 🤝 Contributing
235
+
236
+ 1. Fork the repository
237
+ 2. Create a feature branch
238
+ 3. Make your changes
239
+ 4. Add tests if applicable
240
+ 5. Submit a pull request
241
+
242
+ ## 📄 License
243
+
244
+ This project is licensed under the MIT License - see the LICENSE file for details.
245
+
246
+ ## 🙏 Acknowledgments
247
+
248
+ - **BAAI** for the BGE embedding model
249
+ - **Microsoft** for the MS-Marco cross-encoder
250
+ - **Alibaba** for the Qwen language model
251
+ - **Streamlit** for the web framework
252
+ - **Hugging Face** for model hosting and transformers library
253
+
254
+ ## 📞 Support
255
+
256
+ For issues and questions:
257
+ 1. Check the troubleshooting section
258
+ 2. Review error messages in the sidebar
259
+ 3. Open an issue on GitHub
260
+ 4. Check model compatibility
261
+
262
+ ---
263
+
264
+ **Built with ❤️ using Streamlit and state-of-the-art AI models**
config.py ADDED
@@ -0,0 +1,182 @@
1
+ """
2
+ Configuration file for AI Resume Screener
3
+ Modify these settings to customize the application behavior
4
+ """
5
+
6
+ # Model Configuration
7
+ MODELS = {
8
+ "embedding_model": "BAAI/bge-large-en-v1.5",
9
+ "cross_encoder": "cross-encoder/ms-marco-MiniLM-L6-v2",
10
+ "llm_model": "Qwen/Qwen2-1.5B", # Using smaller model for compatibility
11
+ }
12
+
13
+ # Pipeline Configuration
14
+ PIPELINE_CONFIG = {
15
+ "faiss_recall_top_k": 50,
16
+ "cross_encoder_top_k": 20,
17
+ "max_text_length": 8000,
18
+ "embedding_dimension": 1024,
19
+ }
20
+
21
+ # Scoring Weights (must sum to 1.0)
22
+ SCORING_WEIGHTS = {
23
+ "cross_encoder": 0.5,
24
+ "bm25": 0.3,
25
+ "intent": 0.2,
26
+ }
27
+
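+ # Illustrative sanity check (not required by the app): catches weight typos early
+ assert abs(sum(SCORING_WEIGHTS.values()) - 1.0) < 1e-9, "scoring weights must sum to 1.0"
+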
28
+ # Intent Analysis Configuration
29
+ INTENT_CONFIG = {
30
+ "max_prompt_length": 1024,
31
+ "max_new_tokens": 10,
32
+ "temperature": 0.1,
33
+ "intent_scores": {
34
+ "yes": 0.9,
35
+ "maybe": 0.5,
36
+ "no": 0.1,
37
+ }
38
+ }
39
+
40
+ # File Processing Configuration
41
+ FILE_CONFIG = {
42
+ "supported_formats": ["pdf", "docx", "txt", "csv"],
43
+ "max_file_size_mb": 10,
44
+ "max_files_per_upload": 50,
45
+ }
46
+
47
+ # UI Configuration
48
+ UI_CONFIG = {
49
+ "page_title": "🤖 AI Resume Screener",
50
+ "page_icon": "🤖",
51
+ "layout": "wide",
52
+ "sidebar_state": "expanded",
53
+ "max_display_resumes": 100,
54
+ }
55
+
56
+ # Performance Configuration
57
+ PERFORMANCE_CONFIG = {
58
+ "use_gpu": True,
59
+ "quantization": True,
60
+ "batch_size": 32,
61
+ "cache_models": True,
62
+ }
63
+
64
+ # Skills Database
65
+ TECHNICAL_SKILLS = [
66
+ # Programming Languages
67
+ 'python', 'java', 'javascript', 'typescript', 'c++', 'c#', 'go', 'rust',
68
+ 'scala', 'r', 'matlab', 'php', 'ruby', 'swift', 'kotlin', 'dart',
69
+
70
+ # Web Technologies
71
+ 'html', 'css', 'react', 'angular', 'vue', 'node.js', 'express', 'django',
72
+ 'flask', 'fastapi', 'spring', 'laravel', 'bootstrap', 'tailwind',
73
+
74
+ # Databases
75
+ 'sql', 'mongodb', 'postgresql', 'mysql', 'redis', 'elasticsearch',
76
+ 'cassandra', 'dynamodb', 'sqlite', 'oracle',
77
+
78
+ # Cloud & DevOps
79
+ 'aws', 'azure', 'gcp', 'docker', 'kubernetes', 'terraform', 'ansible',
80
+ 'jenkins', 'gitlab', 'github', 'ci/cd', 'devops', 'microservices',
81
+
82
+ # Data Science & ML
83
+ 'machine learning', 'deep learning', 'tensorflow', 'pytorch', 'keras',
84
+ 'scikit-learn', 'pandas', 'numpy', 'matplotlib', 'plotly', 'seaborn',
85
+ 'jupyter', 'spark', 'hadoop', 'kafka', 'airflow',
86
+
87
+ # Analytics & BI
88
+ 'tableau', 'powerbi', 'excel', 'google analytics', 'mixpanel', 'amplitude',
89
+ 'looker', 'qlik', 'sas', 'spss', 'stata',
90
+
91
+ # Operating Systems & Tools
92
+ 'linux', 'ubuntu', 'centos', 'windows', 'macos', 'bash', 'powershell',
93
+ 'git', 'vim', 'vscode', 'intellij', 'eclipse',
94
+
95
+ # Methodologies
96
+ 'agile', 'scrum', 'kanban', 'lean', 'waterfall', 'tdd', 'bdd',
97
+
98
+ # Networking & Security
99
+ 'tcp/ip', 'http', 'https', 'ssl', 'oauth', 'jwt', 'api', 'rest', 'graphql',
100
+ 'nginx', 'apache', 'load balancing', 'vpn', 'firewall',
101
+ ]
102
+
103
+ # Job Categories for Enhanced Matching
104
+ JOB_CATEGORIES = {
105
+ "software_engineer": [
106
+ "programming", "coding", "development", "software", "engineer", "developer"
107
+ ],
108
+ "data_scientist": [
109
+ "data", "analytics", "machine learning", "statistics", "modeling", "scientist"
110
+ ],
111
+ "devops_engineer": [
112
+ "devops", "infrastructure", "deployment", "automation", "cloud", "operations"
113
+ ],
114
+ "product_manager": [
115
+ "product", "manager", "strategy", "roadmap", "requirements", "stakeholder"
116
+ ],
117
+ "designer": [
118
+ "design", "ui", "ux", "user experience", "interface", "visual", "creative"
119
+ ],
120
+ "marketing": [
121
+ "marketing", "campaign", "brand", "social media", "content", "seo", "sem"
122
+ ],
123
+ "sales": [
124
+ "sales", "business development", "account", "revenue", "client", "customer"
125
+ ]
126
+ }
127
+
128
+ # Default Job Description Template
129
+ DEFAULT_JOB_DESCRIPTION = """
130
+ Software Engineer - Full Stack Development
131
+
132
+ We are looking for a talented Software Engineer to join our growing team.
133
+
134
+ Requirements:
135
+ - 3+ years of experience in software development
136
+ - Proficiency in Python, JavaScript, and SQL
137
+ - Experience with React and Node.js
138
+ - Knowledge of cloud platforms (AWS, Azure, or GCP)
139
+ - Familiarity with Docker and CI/CD pipelines
140
+ - Strong problem-solving and communication skills
141
+
142
+ Responsibilities:
143
+ - Develop and maintain web applications
144
+ - Collaborate with cross-functional teams
145
+ - Write clean, maintainable code
146
+ - Participate in code reviews
147
+ - Contribute to technical architecture decisions
148
+
149
+ Nice to have:
150
+ - Experience with machine learning
151
+ - Knowledge of microservices architecture
152
+ - DevOps experience
153
+ - Open source contributions
154
+ """
155
+
156
+ # Error Messages
157
+ ERROR_MESSAGES = {
158
+ "models_not_loaded": "❌ AI models are still loading. Please wait...",
159
+ "no_job_description": "❌ Please enter a job description",
160
+ "no_resumes": "❌ Please load some resumes first",
161
+ "file_processing_error": "❌ Error processing file: {filename}",
162
+ "model_loading_error": "❌ Error loading model: {model_name}",
163
+ "pipeline_error": "❌ Error in pipeline stage: {stage}",
164
+ }
165
+
166
+ # Success Messages
167
+ SUCCESS_MESSAGES = {
168
+ "models_loaded": "✅ All AI models loaded successfully!",
169
+ "files_processed": "✅ Processed {count} resume files",
170
+ "pipeline_complete": "✅ Resume screening pipeline completed!",
171
+ "results_exported": "✅ Results exported successfully",
172
+ }
173
+
174
+ # Validation Rules
175
+ VALIDATION_RULES = {
176
+ "min_job_description_length": 50,
177
+ "max_job_description_length": 10000,
178
+ "min_resume_length": 20,
179
+ "max_resume_length": 20000,
180
+ "min_resumes_for_ranking": 1,
181
+ "max_resumes_for_ranking": 1000,
182
+ }
requirements.txt CHANGED
@@ -1,3 +1,17 @@
1
- altair
2
  pandas
3
- streamlit
1
+ streamlit
2
  pandas
3
+ numpy
4
+ sentence-transformers
5
+ transformers
6
+ torch
7
+ accelerate
8
+ bitsandbytes
9
+ faiss-cpu
10
+ rank-bm25
11
+ nltk
12
+ pdfplumber
13
+ PyPDF2
14
+ python-docx
15
+ datasets
16
+ plotly
17
+ altair
sample_resumes.csv ADDED
@@ -0,0 +1,143 @@
1
+ name,resume_text
2
+ John Smith,"John Smith
3
+ Software Engineer
4
5
+ Phone: (555) 123-4567
6
+
7
+ EXPERIENCE
8
+ Senior Software Engineer | TechCorp | 2020-2023
9
+ - Developed scalable web applications using Python, Django, and React
10
+ - Led a team of 5 developers in building microservices architecture
11
+ - Implemented CI/CD pipelines using Jenkins and Docker
12
+ - Worked with AWS services including EC2, S3, and RDS
13
+
14
+ Software Developer | StartupXYZ | 2018-2020
15
+ - Built REST APIs using Flask and PostgreSQL
16
+ - Developed frontend components using JavaScript and Vue.js
17
+ - Collaborated with cross-functional teams using Agile methodology
18
+
19
+ EDUCATION
20
+ Bachelor of Science in Computer Science | University of Technology | 2018
21
+
22
+ SKILLS
23
+ Programming: Python, JavaScript, Java, SQL
24
+ Frameworks: Django, Flask, React, Vue.js
25
+ Databases: PostgreSQL, MySQL, MongoDB
26
+ Cloud: AWS, Docker, Kubernetes
27
+ Tools: Git, Jenkins, JIRA"
28
+
29
+ Sarah Johnson,"Sarah Johnson
30
+ Data Scientist
31
32
+ Phone: (555) 987-6543
33
+
34
+ EXPERIENCE
35
+ Senior Data Scientist | DataTech Solutions | 2021-2023
36
+ - Developed machine learning models using Python, scikit-learn, and TensorFlow
37
+ - Built predictive analytics solutions for customer behavior analysis
38
+ - Created data pipelines using Apache Spark and Kafka
39
+ - Deployed models to production using MLOps practices
40
+
41
+ Data Analyst | Analytics Inc | 2019-2021
42
+ - Performed statistical analysis using R and Python
43
+ - Created interactive dashboards using Tableau and PowerBI
44
+ - Worked with large datasets using SQL and Pandas
45
+ - Collaborated with business stakeholders to define KPIs
46
+
47
+ EDUCATION
48
+ Master of Science in Data Science | Data University | 2019
49
+ Bachelor of Science in Statistics | Math College | 2017
50
+
51
+ SKILLS
52
+ Programming: Python, R, SQL, Scala
53
+ ML/AI: scikit-learn, TensorFlow, PyTorch, Keras
54
+ Big Data: Spark, Hadoop, Kafka
55
+ Visualization: Tableau, PowerBI, Matplotlib, Plotly
56
+ Statistics: Hypothesis testing, A/B testing, Regression analysis"
57
+
58
+ Mike Chen,"Mike Chen
59
+ DevOps Engineer
60
61
+ Phone: (555) 456-7890
62
+
63
+ EXPERIENCE
64
+ DevOps Engineer | CloudFirst | 2020-2023
65
+ - Managed AWS infrastructure using Terraform and CloudFormation
66
+ - Implemented monitoring and alerting using Prometheus and Grafana
67
+ - Automated deployment processes using Jenkins and GitLab CI
68
+ - Maintained Kubernetes clusters and Docker containers
69
+
70
+ System Administrator | TechServices | 2018-2020
71
+ - Administered Linux servers and network infrastructure
72
+ - Implemented backup and disaster recovery solutions
73
+ - Managed database systems including MySQL and PostgreSQL
74
+ - Provided technical support and troubleshooting
75
+
76
+ EDUCATION
77
+ Bachelor of Science in Information Technology | Tech Institute | 2018
78
+
79
+ SKILLS
80
+ Cloud Platforms: AWS, Azure, GCP
81
+ Infrastructure: Terraform, CloudFormation, Ansible
82
+ Containers: Docker, Kubernetes, OpenShift
83
+ Monitoring: Prometheus, Grafana, ELK Stack
84
+ Operating Systems: Linux, Ubuntu, CentOS
85
+ Scripting: Bash, Python, PowerShell"
86
+
87
+ Lisa Wang,"Lisa Wang
88
+ Frontend Developer
89
90
+ Phone: (555) 321-0987
91
+
92
+ EXPERIENCE
93
+ Senior Frontend Developer | WebSolutions | 2021-2023
94
+ - Developed responsive web applications using React and TypeScript
95
+ - Implemented modern CSS frameworks including Tailwind and Bootstrap
96
+ - Optimized application performance and user experience
97
+ - Collaborated with UX/UI designers and backend developers
98
+
99
+ Frontend Developer | DigitalAgency | 2019-2021
100
+ - Built interactive user interfaces using Angular and JavaScript
101
+ - Created mobile-responsive designs using HTML5 and CSS3
102
+ - Integrated frontend applications with REST APIs
103
+ - Participated in code reviews and agile development processes
104
+
105
+ EDUCATION
106
+ Bachelor of Arts in Web Design | Design College | 2019
107
+
108
+ SKILLS
109
+ Languages: JavaScript, TypeScript, HTML5, CSS3
110
+ Frameworks: React, Angular, Vue.js
111
+ Styling: Tailwind CSS, Bootstrap, Sass, Less
112
+ Tools: Webpack, Vite, npm, yarn
113
+ Version Control: Git, GitHub, GitLab
114
+ Testing: Jest, Cypress, React Testing Library"
115
+
116
+ Robert Brown,"Robert Brown
117
+ Product Manager
118
119
+ Phone: (555) 654-3210
120
+
121
+ EXPERIENCE
122
+ Senior Product Manager | InnovateTech | 2020-2023
123
+ - Led product strategy and roadmap for B2B SaaS platform
124
+ - Managed cross-functional teams of 15+ engineers and designers
125
+ - Conducted market research and competitive analysis
126
+ - Defined product requirements and user stories using Agile methodology
127
+
128
+ Product Manager | StartupHub | 2018-2020
129
+ - Launched 3 new product features resulting in 25% user growth
130
+ - Collaborated with engineering teams to prioritize development tasks
131
+ - Analyzed user feedback and metrics to drive product decisions
132
+ - Coordinated go-to-market strategies with marketing and sales teams
133
+
134
+ EDUCATION
135
+ MBA in Business Administration | Business School | 2018
136
+ Bachelor of Science in Engineering | Engineering University | 2016
137
+
138
+ SKILLS
139
+ Product Management: Roadmapping, User Research, A/B Testing
140
+ Analytics: Google Analytics, Mixpanel, Amplitude
141
+ Project Management: JIRA, Asana, Trello
142
+ Methodologies: Agile, Scrum, Lean Startup
143
+ Communication: Stakeholder Management, Presentation Skills"
src/streamlit_app.py CHANGED
@@ -1,40 +1,732 @@
1
- import altair as alt
2
- import numpy as np
3
- import pandas as pd
4
  import streamlit as st
5
 
6
- """
7
- # Welcome to Streamlit!
8
-
9
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
10
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
11
- forums](https://discuss.streamlit.io).
12
-
13
- In the meantime, below is an example of what you can do with just a few lines of code:
14
- """
15
-
16
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
17
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
18
-
19
- indices = np.linspace(0, 1, num_points)
20
- theta = 2 * np.pi * num_turns * indices
21
- radius = indices
22
-
23
- x = radius * np.cos(theta)
24
- y = radius * np.sin(theta)
25
-
26
- df = pd.DataFrame({
27
- "x": x,
28
- "y": y,
29
- "idx": indices,
30
- "rand": np.random.randn(num_points),
31
- })
32
-
33
- st.altair_chart(alt.Chart(df, height=700, width=700)
34
- .mark_point(filled=True)
35
- .encode(
36
- x=alt.X("x", axis=None),
37
- y=alt.Y("y", axis=None),
38
- color=alt.Color("idx", legend=None, scale=alt.Scale()),
39
- size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
40
- ))
1
  import streamlit as st
2
+ import pandas as pd
3
+ import numpy as np
4
+ import plotly.express as px
5
+ import plotly.graph_objects as go
6
+ from io import BytesIO
7
+ import base64
8
+ import os
9
+ import re
10
+ import warnings
11
+ warnings.filterwarnings("ignore")
12
+
13
+ # ML/NLP imports
14
+ try:
15
+ from sentence_transformers import SentenceTransformer, CrossEncoder
16
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
17
+ import torch
18
+ import faiss
19
+ from rank_bm25 import BM25Okapi
20
+ import nltk
21
+ from nltk.tokenize import word_tokenize
22
+ import pdfplumber
23
+ import PyPDF2
24
+ from docx import Document
25
+ from datasets import load_dataset
26
+ ML_IMPORTS_AVAILABLE = True
27
+ except ImportError as e:
28
+ st.error(f"Missing required ML libraries: {e}")
29
+ ML_IMPORTS_AVAILABLE = False
30
+
31
+ # Download NLTK data
32
+ try:
33
+ nltk.download('punkt', quiet=True)
34
+ nltk.download('stopwords', quiet=True)
35
+ except Exception:
36
+ pass
37
+
38
+ # Page configuration
39
+ st.set_page_config(
40
+ page_title="🤖 AI Resume Screener",
41
+ page_icon="🤖",
42
+ layout="wide",
43
+ initial_sidebar_state="expanded"
44
+ )
45
+
46
+ # Initialize session state
47
+ if 'models_loaded' not in st.session_state:
48
+ st.session_state.models_loaded = False
49
+ if 'embedding_model' not in st.session_state:
50
+ st.session_state.embedding_model = None
51
+ if 'cross_encoder' not in st.session_state:
52
+ st.session_state.cross_encoder = None
53
+ if 'llm_tokenizer' not in st.session_state:
54
+ st.session_state.llm_tokenizer = None
55
+ if 'llm_model' not in st.session_state:
56
+ st.session_state.llm_model = None
57
+ if 'model_errors' not in st.session_state:
58
+ st.session_state.model_errors = {}
59
+ if 'resume_texts' not in st.session_state:
60
+ st.session_state.resume_texts = []
61
+ if 'resume_filenames' not in st.session_state:
62
+ st.session_state.resume_filenames = []
63
+ if 'results' not in st.session_state:
64
+ st.session_state.results = None
65
+
66
+ def load_models():
67
+ """Load all ML models at startup"""
68
+ if st.session_state.models_loaded:
69
+ return
70
+
71
+ st.info("🔄 Loading AI models... This may take a few minutes on first run.")
72
+
73
+ # Load embedding model
74
+ try:
75
+ print("Loading embedding model: BAAI/bge-large-en-v1.5")
76
+ st.text("Loading embedding model...")
77
+ try:
78
+ st.session_state.embedding_model = SentenceTransformer(
79
+ 'BAAI/bge-large-en-v1.5',
80
+ device_map="auto"
81
+ )
82
+ except Exception as e:
83
+ print(f"Device map failed, falling back to default: {e}")
84
+ st.session_state.embedding_model = SentenceTransformer('BAAI/bge-large-en-v1.5')
85
+ print("✅ Embedding model loaded successfully")
86
+ except Exception as e:
87
+ print(f"❌ Error loading embedding model: {e}")
88
+ st.session_state.model_errors['embedding'] = str(e)
89
+
90
+ # Load cross-encoder
91
+ try:
92
+ print("Loading cross-encoder: cross-encoder/ms-marco-MiniLM-L6-v2")
93
+ st.text("Loading cross-encoder...")
94
+ st.session_state.cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L6-v2')
95
+ print("✅ Cross-encoder loaded successfully")
96
+ except Exception as e:
97
+ print(f"❌ Error loading cross-encoder: {e}")
98
+ st.session_state.model_errors['cross_encoder'] = str(e)
99
+
100
+ # Load LLM for intent analysis
101
+ try:
102
+ print("Loading LLM: Qwen/Qwen2-1.5B") # Using smaller model for better compatibility
103
+ st.text("Loading LLM for intent analysis...")
104
+
105
+ # Quantization config
106
+ bnb_config = BitsAndBytesConfig(
107
+ load_in_4bit=True,
108
+ bnb_4bit_use_double_quant=True,
109
+ bnb_4bit_quant_type="nf4",
110
+ bnb_4bit_compute_dtype=torch.bfloat16
111
+ )
112
+
113
+ st.session_state.llm_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B")
114
+ st.session_state.llm_model = AutoModelForCausalLM.from_pretrained(
115
+ "Qwen/Qwen2-1.5B",
116
+ quantization_config=bnb_config,
117
+ device_map="auto",
118
+ trust_remote_code=True
119
+ )
120
+ print("✅ LLM loaded successfully")
121
+ except Exception as e:
122
+ print(f"❌ Error loading LLM: {e}")
123
+ st.session_state.model_errors['llm'] = str(e)
124
+
125
+ st.session_state.models_loaded = True
126
+ st.success("✅ All models loaded successfully!")
127
+
128
+ class ResumeScreener:
129
+ def __init__(self):
130
+ self.embedding_model = st.session_state.embedding_model
131
+ self.cross_encoder = st.session_state.cross_encoder
132
+ self.llm_tokenizer = st.session_state.llm_tokenizer
133
+ self.llm_model = st.session_state.llm_model
134
+
135
+ # Predefined skills list
136
+ self.skills_list = [
137
+ 'python', 'java', 'javascript', 'react', 'angular', 'vue', 'node.js',
138
+ 'sql', 'mongodb', 'postgresql', 'mysql', 'aws', 'azure', 'gcp',
139
+ 'docker', 'kubernetes', 'git', 'machine learning', 'deep learning',
140
+ 'tensorflow', 'pytorch', 'scikit-learn', 'pandas', 'numpy',
141
+ 'html', 'css', 'bootstrap', 'tailwind', 'api', 'rest', 'graphql',
142
+ 'microservices', 'agile', 'scrum', 'devops', 'ci/cd', 'jenkins',
143
+ 'linux', 'bash', 'shell scripting', 'data analysis', 'statistics',
144
+ 'excel', 'powerbi', 'tableau', 'spark', 'hadoop', 'kafka',
145
+ 'redis', 'elasticsearch', 'nginx', 'apache', 'django', 'flask',
146
+ 'spring', 'express', 'fastapi', 'laravel', 'php', 'c++', 'c#',
147
+ 'go', 'rust', 'scala', 'r', 'matlab', 'sas', 'spss'
148
+ ]
149
+
150
+ def extract_text_from_file(self, file):
151
+ """Extract text from uploaded files"""
152
+ try:
153
+ if file.type == "application/pdf":
154
+ # Try pdfplumber first
155
+ try:
156
+ with pdfplumber.open(file) as pdf:
157
+ text = ""
158
+ for page in pdf.pages:
159
+ text += page.extract_text() or ""
160
+ return text
161
+ except Exception:
162
+ # Fallback to PyPDF2
163
+ file.seek(0)
164
+ reader = PyPDF2.PdfReader(file)
165
+ text = ""
166
+ for page in reader.pages:
167
+ text += page.extract_text()
168
+ return text
169
+
170
+ elif file.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
171
+ doc = Document(file)
172
+ text = ""
173
+ for paragraph in doc.paragraphs:
174
+ text += paragraph.text + "\n"
175
+ return text
176
+
177
+ elif file.type == "text/plain":
178
+ return str(file.read(), "utf-8")
179
+
180
+ elif file.type == "text/csv":
181
+ df = pd.read_csv(file)
182
+ return df.to_string()
183
+
184
+ else:
185
+ return "Unsupported file type"
186
+
187
+ except Exception as e:
188
+ st.warning(f"Error extracting text from {file.name}: {str(e)}")
189
+ return ""
190
+
191
+ def get_embedding(self, text):
192
+ """Get embedding for text"""
193
+ if not self.embedding_model:
194
+ return None
195
+
196
+ if not text or len(text.strip()) == 0:
197
+ return np.zeros(1024) # Default embedding size for BGE
198
+
199
+ # Truncate if too long
200
+ if len(text) > 8000:
201
+ text = text[:8000]
202
+
203
+ try:
204
+ embedding = self.embedding_model.encode(text, normalize_embeddings=True)
205
+ return embedding
206
+ except Exception as e:
207
+ st.warning(f"Error getting embedding: {e}")
208
+ return np.zeros(1024)
209
+
210
+ def calculate_bm25_scores(self, resume_texts, job_description):
211
+ """Calculate BM25 scores"""
212
+ try:
213
+ # Tokenize documents
214
+ tokenized_resumes = [word_tokenize(text.lower()) for text in resume_texts]
215
+ tokenized_job = word_tokenize(job_description.lower())
216
+
217
+ # Create BM25 object
218
+ bm25 = BM25Okapi(tokenized_resumes)
219
+
220
+ # Get scores
221
+ scores = bm25.get_scores(tokenized_job)
222
+ return scores
223
+ except Exception as e:
224
+ st.warning(f"Error calculating BM25 scores: {e}")
225
+ return np.zeros(len(resume_texts))
226
+
227
+ def faiss_recall(self, resume_texts, job_description, top_k=50):
228
+ """FAISS-based recall for top candidates"""
229
+ try:
230
+ if not self.embedding_model:
231
+ return list(range(min(top_k, len(resume_texts))))
232
+
233
+ # Get embeddings
234
+ resume_embeddings = np.array([self.get_embedding(text) for text in resume_texts])
235
+ job_embedding = self.get_embedding(job_description).reshape(1, -1)
236
+
237
+ # Build FAISS index
238
+ dimension = resume_embeddings.shape[1]
239
+ index = faiss.IndexFlatIP(dimension) # Inner product for cosine similarity
240
+ index.add(resume_embeddings.astype('float32'))
241
+
242
+ # Search
243
+ scores, indices = index.search(job_embedding.astype('float32'), min(top_k, len(resume_texts)))
244
+
245
+ return indices[0].tolist()
246
+ except Exception as e:
247
+ st.warning(f"Error in FAISS recall: {e}")
248
+ return list(range(min(top_k, len(resume_texts))))
249
+
250
+ def cross_encoder_rerank(self, resume_texts, job_description, candidate_indices, top_k=20):
251
+ """Re-rank candidates using cross-encoder"""
252
+ try:
253
+ if not self.cross_encoder:
254
+ return candidate_indices[:top_k]
255
+
256
+ # Prepare pairs for cross-encoder
257
+ pairs = [(job_description, resume_texts[i]) for i in candidate_indices]
258
+
259
+ # Get scores
260
+ scores = self.cross_encoder.predict(pairs)
261
+
262
+ # Sort by scores and return top_k
263
+ scored_indices = list(zip(candidate_indices, scores))
264
+ scored_indices.sort(key=lambda x: x[1], reverse=True)
265
+
266
+ return [idx for idx, _ in scored_indices[:top_k]]
267
+ except Exception as e:
268
+ st.warning(f"Error in cross-encoder reranking: {e}")
269
+ return candidate_indices[:top_k]
270
+
271
+ def analyze_intent(self, resume_text, job_description):
272
+ """Analyze candidate intent using LLM"""
273
+ try:
274
+ if not self.llm_model or not self.llm_tokenizer:
275
+ return "Maybe", 0.5
276
+
277
+ prompt = f"""Analyze if this candidate is genuinely interested in this job based on their resume.
278
+
279
+ Job Description: {job_description[:500]}...
280
+
281
+ Resume: {resume_text[:1000]}...
282
+
283
+ Based on the alignment between the candidate's experience and the job requirements, classify their intent as:
284
+ - Yes: Strong alignment and genuine interest
285
+ - Maybe: Some alignment but unclear intent
286
+ - No: Poor alignment or likely not interested
287
+
288
+ Intent:"""
289
+
290
+ inputs = self.llm_tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
291
+
292
+ with torch.no_grad():
293
+ outputs = self.llm_model.generate(
294
+ **inputs,
295
+ max_new_tokens=10,
296
+ temperature=0.1,
297
+ do_sample=True,
298
+ pad_token_id=self.llm_tokenizer.eos_token_id
299
+ )
300
+
301
+ response = self.llm_tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
302
+
303
+ # Parse response
304
+ if "yes" in response.lower():
305
+ return "Yes", 0.9
306
+ elif "no" in response.lower():
307
+ return "No", 0.1
308
+ else:
309
+ return "Maybe", 0.5
310
+
311
+ except Exception as e:
312
+ st.warning(f"Error in intent analysis: {e}")
313
+ return "Maybe", 0.5
314
+
315
+ def extract_skills(self, text, job_description):
316
+ """Extract matching skills from resume"""
317
+ text_lower = text.lower()
318
+ job_lower = job_description.lower()
319
+
320
+ # Find skills from predefined list
321
+ found_skills = []
322
+ for skill in self.skills_list:
323
+ if skill in text_lower:
324
+ found_skills.append(skill)
325
+
326
+ # Extract job-specific keywords (simple approach)
327
+ job_words = set(re.findall(r'\b[a-zA-Z]{3,}\b', job_lower))
328
+ text_words = set(re.findall(r'\b[a-zA-Z]{3,}\b', text_lower))
329
+ job_specific = list(job_words.intersection(text_words))[:10] # Top 10
330
+
331
+ return {
332
+ 'technical_skills': found_skills,
333
+ 'job_specific_keywords': job_specific,
334
+ 'total_skills': len(found_skills) + len(job_specific)
335
+ }
336
+
337
+ def add_bm25_scores(self, results_df, resume_texts, job_description):
338
+ """Add BM25 scores to results"""
339
+ bm25_scores = self.calculate_bm25_scores(resume_texts, job_description)
340
+ results_df['bm25_score'] = bm25_scores
341
+ return results_df
342
+
343
+ def add_intent_scores(self, results_df, resume_texts, job_description):
344
+ """Add intent analysis scores"""
345
+ intent_labels = []
346
+ intent_scores = []
347
+
348
+ progress_bar = st.progress(0)
349
+ for i, text in enumerate(resume_texts):
350
+ label, score = self.analyze_intent(text, job_description)
351
+ intent_labels.append(label)
352
+ intent_scores.append(score)
353
+ progress_bar.progress((i + 1) / len(resume_texts))
354
+
355
+ results_df['intent_label'] = intent_labels
356
+ results_df['intent_score'] = intent_scores
357
+ return results_df
358
+
359
+ def calculate_final_scores(self, results_df):
360
+ """Calculate final weighted scores"""
361
+ # Normalize scores to 0-1 range
362
+ if 'cross_encoder_score' in results_df.columns:
363
+ ce_scores = (results_df['cross_encoder_score'] - results_df['cross_encoder_score'].min()) / \
364
+ (results_df['cross_encoder_score'].max() - results_df['cross_encoder_score'].min() + 1e-8)
365
+ else:
366
+ ce_scores = np.zeros(len(results_df))
367
+
368
+ if 'bm25_score' in results_df.columns:
369
+ bm25_scores = (results_df['bm25_score'] - results_df['bm25_score'].min()) / \
370
+ (results_df['bm25_score'].max() - results_df['bm25_score'].min() + 1e-8)
371
+ else:
372
+ bm25_scores = np.zeros(len(results_df))
373
+
374
+ intent_scores = results_df.get('intent_score', np.ones(len(results_df)) * 0.5)
375
+
376
+ # Weighted combination
377
+ final_scores = 0.5 * ce_scores + 0.3 * bm25_scores + 0.2 * intent_scores
378
+ results_df['final_score'] = final_scores
379
+
380
+ results_df = results_df.sort_values('final_score', ascending=False)
+ results_df['rank'] = range(1, len(results_df) + 1) # refresh ranks so they match the final ordering (index labels are kept so the later skills concat still aligns)
+ return results_df
381
+
382
+ def advanced_pipeline_ranking(self, resume_texts, resume_filenames, job_description):
383
+ """Run the complete advanced pipeline"""
384
+ st.info("🚀 Starting advanced pipeline ranking...")
385
+
386
+ # Stage 1: FAISS Recall
387
+ st.text("Stage 1: FAISS-based recall (top 50 candidates)")
388
+ top_50_indices = self.faiss_recall(resume_texts, job_description, top_k=50)
389
+
390
+ # Stage 2: Cross-encoder reranking
391
+ st.text("Stage 2: Cross-encoder reranking (top 20 candidates)")
392
+ top_20_indices = self.cross_encoder_rerank(resume_texts, job_description, top_50_indices, top_k=20)
393
+
394
+ # Create results dataframe
395
+ results_df = pd.DataFrame({
396
+ 'rank': range(1, len(top_20_indices) + 1),
397
+ 'filename': [resume_filenames[i] for i in top_20_indices],
398
+ 'resume_index': top_20_indices
399
+ })
400
+
401
+ # Stage 3: Add cross-encoder scores
402
+ st.text("Stage 3: Adding detailed cross-encoder scores")
403
+ if self.cross_encoder:
404
+ pairs = [(job_description, resume_texts[i]) for i in top_20_indices]
405
+ ce_scores = self.cross_encoder.predict(pairs)
406
+ results_df['cross_encoder_score'] = ce_scores
407
+
408
+ # Stage 4: Add BM25 scores
409
+ st.text("Stage 4: Adding BM25 scores")
410
+ top_20_texts = [resume_texts[i] for i in top_20_indices]
411
+ results_df = self.add_bm25_scores(results_df, top_20_texts, job_description)
412
+
413
+ # Stage 5: Add intent analysis
414
+ st.text("Stage 5: Analyzing candidate intent")
415
+ results_df = self.add_intent_scores(results_df, top_20_texts, job_description)
416
+
417
+ # Calculate final scores
418
+ st.text("Calculating final weighted scores...")
419
+ results_df = self.calculate_final_scores(results_df)
420
+
421
+ # Add skills analysis
422
+ st.text("Extracting skills and keywords...")
423
+ skills_data = []
424
+ for i in top_20_indices:
425
+ skills = self.extract_skills(resume_texts[i], job_description)
426
+ skills_data.append({
427
+ 'top_skills': ', '.join(skills['technical_skills'][:5]),
428
+ 'job_keywords': ', '.join(skills['job_specific_keywords'][:5]),
429
+ 'total_skills_count': skills['total_skills']
430
+ })
431
+
432
+ skills_df = pd.DataFrame(skills_data)
433
+ results_df = pd.concat([results_df, skills_df], axis=1)
434
+
435
+ st.success("✅ Pipeline completed successfully!")
436
+ return results_df
437
+
438
+ # Load models on startup
439
+ if ML_IMPORTS_AVAILABLE and not st.session_state.models_loaded:
440
+ load_models()
441
+
442
+ # Initialize screener
443
+ if ML_IMPORTS_AVAILABLE and st.session_state.models_loaded:
444
+ screener = ResumeScreener()
445
+
446
+ # Sidebar
447
+ with st.sidebar:
448
+ st.title("🤖 AI Resume Screener")
449
+ st.markdown("---")
450
+
451
+ st.subheader("📋 Pipeline Stages")
452
+ st.markdown("""
453
+ 1. **FAISS Recall**: Semantic similarity search (top 50)
454
+ 2. **Cross-Encoder**: Deep reranking (top 20)
455
+ 3. **BM25 Scoring**: Keyword-based relevance
456
+ 4. **Intent Analysis**: AI-powered candidate intent
457
+ 5. **Final Ranking**: Weighted score combination
458
+ """)
459
+
460
+ st.subheader("🧠 AI Models")
461
+ if st.session_state.models_loaded:
462
+ st.success("✅ Embedding: BGE-Large-EN")
463
+ st.success("✅ Cross-Encoder: MS-Marco-MiniLM")
464
+ st.success("✅ LLM: Qwen2-1.5B")
465
+ else:
466
+ st.warning("⏳ Models loading...")
467
+
468
+ if st.session_state.model_errors:
469
+ st.error("❌ Model Errors:")
470
+ for model, error in st.session_state.model_errors.items():
471
+ st.text(f"{model}: {error[:100]}...")
472
+
473
+ st.subheader("📊 Scoring Formula")
474
+ st.markdown("""
475
+ **Final Score = 0.5 × Cross-Encoder + 0.3 × BM25 + 0.2 × Intent**
476
+
477
+ - Cross-Encoder: Deep semantic matching
478
+ - BM25: Keyword relevance
479
+ - Intent: Candidate interest level
480
+ """)
481
+
482
+ # Main content
483
+ st.title("🤖 AI Resume Screener")
484
+ st.markdown("Automatically rank candidate resumes against job descriptions using advanced AI")
485
+
486
+ # Step 1: Job Description Input
487
+ st.header("📝 Step 1: Job Description")
488
+ job_description = st.text_area(
489
+ "Enter the job description:",
490
+ height=200,
491
+ placeholder="Paste the complete job description here..."
492
+ )
493
+
494
+ # Step 2: Resume Upload
495
+ st.header("📄 Step 2: Load Resumes")
496
+
497
+ upload_option = st.radio(
498
+ "Choose how to load resumes:",
499
+ ["Upload Files", "Upload CSV", "Load from Hugging Face Dataset"]
500
+ )
501
+
502
+ if upload_option == "Upload Files":
503
+ uploaded_files = st.file_uploader(
504
+ "Upload resume files",
505
+ type=['pdf', 'docx', 'txt'],
506
+ accept_multiple_files=True
507
+ )
508
+
509
+ if uploaded_files and st.button("Process Uploaded Files"):
510
+ with st.spinner("Processing files..."):
511
+ texts = []
512
+ filenames = []
513
+
514
+ for file in uploaded_files:
515
+ if ML_IMPORTS_AVAILABLE and st.session_state.models_loaded:
516
+ text = screener.extract_text_from_file(file)
517
+ if text:
518
+ texts.append(text)
519
+ filenames.append(file.name)
520
+ else:
521
+ st.error("Models not loaded. Cannot process files.")
522
+ break
523
+
524
+ st.session_state.resume_texts = texts
525
+ st.session_state.resume_filenames = filenames
526
+ st.success(f"✅ Processed {len(texts)} resumes")
527
+
528
+ elif upload_option == "Upload CSV":
529
+ csv_file = st.file_uploader("Upload CSV with resume texts", type=['csv'])
530
+
531
+ if csv_file:
532
+ df = pd.read_csv(csv_file)
533
+ st.write("CSV Preview:", df.head())
534
+
535
+ text_column = st.selectbox("Select text column:", df.columns)
536
+ name_column = st.selectbox("Select name/ID column:", df.columns)
537
+
538
+ if st.button("Load from CSV"):
539
+ st.session_state.resume_texts = df[text_column].fillna("").tolist()
540
+ st.session_state.resume_filenames = df[name_column].fillna("Unknown").tolist()
541
+ st.success(f"✅ Loaded {len(st.session_state.resume_texts)} resumes from CSV")
542
+
543
+ elif upload_option == "Load from Hugging Face Dataset":
544
+ dataset_name = st.text_input("Dataset name:", "resume-dataset/resume-screening")
545
+
546
+ if st.button("Load Dataset"):
547
+ try:
548
+ with st.spinner("Loading dataset..."):
549
+ dataset = load_dataset(dataset_name, split="train")
550
+
551
+ # Try to identify text and name columns
552
+ columns = dataset.column_names
553
+ text_col = st.selectbox("Select text column:", columns)
554
+ name_col = st.selectbox("Select name/ID column:", columns)
555
+
556
+ if text_col and name_col:
557
+ st.session_state.resume_texts = dataset[text_col][:100] # Limit to 100
558
+ st.session_state.resume_filenames = [f"Resume_{i}" for i in range(len(st.session_state.resume_texts))]
559
+ st.success(f"✅ Loaded {len(st.session_state.resume_texts)} resumes from dataset")
560
+ except Exception as e:
561
+ st.error(f"Error loading dataset: {e}")
562
+
563
+ # Display current resume count
564
+ if st.session_state.resume_texts:
565
+ st.info(f"📊 Currently loaded: {len(st.session_state.resume_texts)} resumes")
566
+
567
+ # Step 3: Run Pipeline
568
+ st.header("🚀 Step 3: Run Advanced Pipeline")
569
+
570
+ can_run = (
571
+ ML_IMPORTS_AVAILABLE and
572
+ st.session_state.models_loaded and
573
+ job_description.strip() and
574
+ st.session_state.resume_texts
575
+ )
576
+
577
+ if st.button("🎯 Run Advanced Ranking Pipeline", disabled=not can_run):
578
+ if not can_run:
579
+ if not ML_IMPORTS_AVAILABLE:
580
+ st.error("❌ ML libraries not available")
581
+ elif not st.session_state.models_loaded:
582
+ st.error("❌ Models not loaded")
583
+ elif not job_description.strip():
584
+ st.error("❌ Please enter a job description")
585
+ elif not st.session_state.resume_texts:
586
+ st.error("❌ Please load some resumes")
587
+ else:
588
+ with st.spinner("Running advanced pipeline..."):
589
+ results = screener.advanced_pipeline_ranking(
590
+ st.session_state.resume_texts,
591
+ st.session_state.resume_filenames,
592
+ job_description
593
+ )
594
+ st.session_state.results = results
595
+
596
+ # Display Results
597
+ if st.session_state.results is not None:
598
+ st.header("📊 Results")
599
+
600
+ # Create tabs for different views
601
+ tab1, tab2, tab3 = st.tabs(["📋 Summary", "🔍 Detailed Analysis", "📈 Visualizations"])
602
+
603
+ with tab1:
604
+ st.subheader("Top Ranked Candidates")
605
+
606
+ # Style the dataframe
607
+ display_df = st.session_state.results[['rank', 'filename', 'final_score', 'cross_encoder_score',
608
+ 'bm25_score', 'intent_score', 'intent_label', 'top_skills']].copy()
609
+ display_df['final_score'] = display_df['final_score'].round(3)
610
+ display_df['cross_encoder_score'] = display_df['cross_encoder_score'].round(3)
611
+ display_df['bm25_score'] = display_df['bm25_score'].round(3)
612
+ display_df['intent_score'] = display_df['intent_score'].round(3)
613
+
614
+ st.dataframe(display_df, use_container_width=True)
615
+
616
+ # Download link
617
+ csv = display_df.to_csv(index=False)
618
+ b64 = base64.b64encode(csv.encode()).decode()
619
+ href = f'<a href="data:file/csv;base64,{b64}" download="resume_rankings.csv">📥 Download Results as CSV</a>'
620
+ st.markdown(href, unsafe_allow_html=True)
621
+
622
+ with tab2:
623
+ st.subheader("Detailed Candidate Analysis")
624
+
625
+ for idx, row in st.session_state.results.iterrows():
626
+ with st.expander(f"#{row['rank']} - {row['filename']} (Score: {row['final_score']:.3f})"):
627
+ col1, col2 = st.columns(2)
628
+
629
+ with col1:
630
+ st.metric("Final Score", f"{row['final_score']:.3f}")
631
+ st.metric("Cross-Encoder", f"{row['cross_encoder_score']:.3f}")
632
+ st.metric("BM25 Score", f"{row['bm25_score']:.3f}")
633
+
634
+ with col2:
635
+ st.metric("Intent Score", f"{row['intent_score']:.3f}")
636
+ st.metric("Intent Label", row['intent_label'])
637
+ st.metric("Skills Count", row['total_skills_count'])
638
+
639
+ st.write("**Top Skills:**", row['top_skills'])
640
+ st.write("**Job Keywords:**", row['job_keywords'])
641
+
642
+ # Show resume excerpt
643
+ resume_text = st.session_state.resume_texts[row['resume_index']]
644
+ st.text_area("Resume Excerpt:", resume_text[:500] + "...", height=100, key=f"excerpt_{idx}")
645
+
646
+ with tab3:
647
+ st.subheader("Score Visualizations")
648
+
649
+ # Score distribution
650
+ fig1 = px.bar(
651
+ st.session_state.results.head(10),
652
+ x='filename',
653
+ y='final_score',
654
+ title="Top 10 Candidates - Final Scores",
655
+ color='final_score',
656
+ color_continuous_scale='viridis'
657
+ )
658
+ fig1.update_xaxes(tickangle=45)
659
+ st.plotly_chart(fig1, use_container_width=True)
660
+
661
+ # Score breakdown
662
+ score_cols = ['cross_encoder_score', 'bm25_score', 'intent_score']
663
+ fig2 = go.Figure()
664
+
665
+ for col in score_cols:
666
+ fig2.add_trace(go.Bar(
667
+ name=col.replace('_', ' ').title(),
668
+ x=st.session_state.results['filename'].head(10),
669
+ y=st.session_state.results[col].head(10)
670
+ ))
671
+
672
+ fig2.update_layout(
673
+ title="Score Breakdown - Top 10 Candidates",
674
+ barmode='group',
675
+ xaxis_tickangle=45
676
+ )
677
+ st.plotly_chart(fig2, use_container_width=True)
678
+
679
+ # Intent distribution
680
+ intent_counts = st.session_state.results['intent_label'].value_counts()
681
+ fig3 = px.pie(
682
+ values=intent_counts.values,
683
+ names=intent_counts.index,
684
+ title="Candidate Intent Distribution"
685
+ )
686
+ st.plotly_chart(fig3, use_container_width=True)
687
+
688
+ # Average metrics
689
+ col1, col2, col3, col4 = st.columns(4)
690
+ with col1:
691
+ st.metric("Avg Final Score", f"{st.session_state.results['final_score'].mean():.3f}")
692
+ with col2:
693
+ st.metric("Avg Cross-Encoder", f"{st.session_state.results['cross_encoder_score'].mean():.3f}")
694
+ with col3:
695
+ st.metric("Avg BM25", f"{st.session_state.results['bm25_score'].mean():.3f}")
696
+ with col4:
697
+ st.metric("Avg Intent", f"{st.session_state.results['intent_score'].mean():.3f}")
698
+
699
+ # Cleanup Controls
700
+ st.header("🧹 Cleanup")
701
+ col1, col2 = st.columns(2)
702
+
703
+ with col1:
704
+ if st.button("Clear Resumes Only"):
705
+ st.session_state.resume_texts = []
706
+ st.session_state.resume_filenames = []
707
+ st.session_state.results = None
708
+ st.success("✅ Resumes cleared")
709
+
710
+ with col2:
711
+ if st.button("Reset Entire App"):
712
+ # Clear all session state
713
+ for key in list(st.session_state.keys()):
714
+ del st.session_state[key]
715
+
716
+ # Free GPU memory
717
+ if torch.cuda.is_available():
718
+ torch.cuda.empty_cache()
719
+
720
+ st.success("✅ App reset complete")
721
+ st.rerun()
722
 
723
+ # Footer
724
+ st.markdown("---")
725
+ st.markdown(
726
+ """
727
+ <div style='text-align: center; color: #666; font-size: 0.8em;'>
728
+ 🤖 Powered by BGE-Large-EN, MS-Marco-MiniLM, Qwen2-1.5B | Built with Streamlit
729
+ </div>
730
+ """,
731
+ unsafe_allow_html=True
732
+ )
test_installation.py ADDED
@@ -0,0 +1,99 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script to verify AI Resume Screener installation
4
+ """
5
+
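+ # Run with: python test_installation.py
+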
6
+ import sys
7
+ import importlib
8
+
9
+ def test_import(module_name, package_name=None):
10
+ """Test if a module can be imported"""
11
+ try:
12
+ importlib.import_module(module_name)
13
+ print(f"✅ {package_name or module_name}")
14
+ return True
15
+ except ImportError as e:
16
+ print(f"❌ {package_name or module_name}: {e}")
17
+ return False
18
+
19
+ def main():
20
+ print("🧪 Testing AI Resume Screener Installation\n")
21
+
22
+ # Core dependencies
23
+ print("📦 Core Dependencies:")
24
+ core_deps = [
25
+ ("streamlit", "Streamlit"),
26
+ ("pandas", "Pandas"),
27
+ ("numpy", "NumPy"),
28
+ ("plotly", "Plotly"),
29
+ ]
30
+
31
+ # Use a list comprehension so every import is attempted; all() over a generator stops at the first failure
+ core_success = all([test_import(module, name) for module, name in core_deps])
32
+
33
+ # ML/AI dependencies
34
+ print("\n🤖 ML/AI Dependencies:")
35
+ ml_deps = [
36
+ ("sentence_transformers", "Sentence Transformers"),
37
+ ("transformers", "Transformers"),
38
+ ("torch", "PyTorch"),
39
+ ("faiss", "FAISS"),
40
+ ("rank_bm25", "Rank BM25"),
41
+ ("nltk", "NLTK"),
42
+ ]
43
+
44
+ ml_success = all([test_import(module, name) for module, name in ml_deps])
45
+
46
+ # File processing dependencies
47
+ print("\n📄 File Processing Dependencies:")
48
+ file_deps = [
49
+ ("pdfplumber", "PDF Plumber"),
50
+ ("PyPDF2", "PyPDF2"),
51
+ ("docx", "python-docx"),
52
+ ("datasets", "Hugging Face Datasets"),
53
+ ]
54
+
55
+ file_success = all([test_import(module, name) for module, name in file_deps])
56
+
57
+ # Optional dependencies
58
+ print("\n⚡ Optional Dependencies:")
59
+ optional_deps = [
60
+ ("accelerate", "Accelerate"),
61
+ ("bitsandbytes", "BitsAndBytes"),
62
+ ]
63
+
64
+ for module, name in optional_deps:
65
+ test_import(module, name)
66
+
67
+ # Summary
68
+ print("\n" + "="*50)
69
+ if core_success and ml_success and file_success:
70
+ print("🎉 All required dependencies are installed!")
71
+ print("✅ Ready to run AI Resume Screener")
72
+
73
+ # Test basic functionality
74
+ print("\n🔧 Testing basic functionality...")
75
+ try:
76
+ import pandas as pd
77
+ import numpy as np
78
+ from sentence_transformers import SentenceTransformer
79
+
80
+ # Test data creation
81
+ test_df = pd.DataFrame({'test': [1, 2, 3]})
82
+ test_array = np.array([1, 2, 3])
83
+
84
+ print("✅ Pandas and NumPy working")
85
+ print("✅ Installation test completed successfully!")
86
+
87
+ except Exception as e:
88
+ print(f"❌ Basic functionality test failed: {e}")
89
+
90
+ else:
91
+ print("❌ Some required dependencies are missing")
92
+ print("📝 Please install missing packages using:")
93
+ print(" pip install -r requirements.txt")
94
+
95
+ print("\n🚀 To run the application:")
96
+ print(" streamlit run src/streamlit_app.py")
97
+
98
+ if __name__ == "__main__":
99
+ main()