Enhance project structure and configuration for LangChain agent evaluation
- Updated README.md to include project features, installation, and usage instructions.
- Refactored app.py to support model selection and question filtering.
- Introduced configuration files for model and MCP server settings.
- Added Dockerfile and docker-compose.yml for containerized deployment.
- Improved requirements.txt for clarity and consistency.
- Created .dockerignore to exclude unnecessary files from Docker context.
- .dockerignore +14 -0
- Dockerfile +16 -0
- README.md +92 -4
- app.py +58 -8
- configurations/app-config.json +21 -0
- configurations/mcp-server-config.json +19 -0
- docker-compose.yml +42 -0
- langchain_agent.py +22 -44
- requirements.txt +3 -9
**.dockerignore** (ADDED)

```diff
@@ -0,0 +1,14 @@
+.git
+.gitignore
+.venv
+__pycache__
+*.pyc
+*.pyo
+*.pyd
+.idea
+.vscode
+.dockerignore
+Dockerfile
+Dockerfile.dev
+docker-compose.yml
+README.md
```
**Dockerfile** (ADDED)

```diff
@@ -0,0 +1,16 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+RUN apt-get update && apt-get install -y nodejs npm && rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+RUN npx playwright install chrome
+
+COPY . .
+
+EXPOSE 7860
+
+CMD ["python", "app.py"]
```
**README.md** (CHANGED)

````diff
@@ -3,13 +3,101 @@ title: Template Final Assignment
 emoji: 🕵🏻♂️
 colorFrom: indigo
 colorTo: indigo
-sdk: gradio
-sdk_version: 5.25.2
-app_file: app.py
+# sdk: gradio
+# sdk_version: 5.25.2
+# app_file: app.py
+sdk: docker
+app_port: 7860
 pinned: false
 hf_oauth: true
 # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
 hf_oauth_expiration_minutes: 480
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+# Agent Course Final Assignment by Hugging Face
+
+This project contains a Gradio application for evaluating a LangChain agent based on the GAIA (General AI Assistant) benchmark. The agent is designed to answer questions using a variety of tools, and its performance is scored by an external API.
+
+## Features
+
+- **Gradio Interface**: An easy-to-use web interface for running the evaluation and viewing the results.
+- **LangChain Agent**: A sophisticated agent built with LangChain, capable of using tools to answer questions.
+- **Multi-Tool Integration**: The agent can interact with multiple tools, such as a browser (via Playwright) and a YouTube transcript fetcher.
+- **Docker Support**: The entire application can be built and run using Docker and Docker Compose, ensuring a consistent environment.
+- **Observability**: Integrated with Langfuse for tracing and monitoring the agent's behavior.
+
+## Installation
+
+1. **Clone the repository:**
+
+   ```bash
+   git clone https://huggingface.co/spaces/hf-agent-course/final-assignment-template
+   cd final-assignment-template
+   ```
+
+2. **Create a virtual environment and install dependencies:**
+
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
+   pip install -r requirements.txt
+   ```
+
+3. **Install Playwright browsers:**
+
+   ```bash
+   npx playwright install
+   ```
+
+4. **Set up environment variables:**
+
+   Create a `.env` file in the root of the project and add the following variables:
+
+   ```
+   HF_TOKEN=<your-hugging-face-token>
+   GOOGLE_API_KEY=<your-google-api-key>
+   LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
+   LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
+   ```
+
+## Usage
+
+To run the Gradio application locally, use the following command:
+
+```bash
+python app.py
+```
+
+This will start a local web server, and you can access the application in your browser at `http://127.0.0.1:7860`.
+
+## Docker
+
+This project includes `Dockerfile` and `docker-compose.yml` for running the application in a containerized environment.
+
+### Build and Run with Docker Compose
+
+To build and run the application using Docker Compose, use the following command:
+
+```bash
+docker-compose up --build
+```
+
+This will build the Docker image and start the application. You can access the Gradio interface at `http://localhost:7860`.
+
+### Development Environment
+
+A `Dockerfile.dev` is also provided for development purposes. To build and run the development environment, use the following command:
+
+```bash
+docker-compose -f docker-compose.yml -f docker-compose.dev.yml up --build
+```
+
+This will mount the local code into the container, allowing for live reloading of changes.
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a pull request or open an issue.
````
**app.py** (CHANGED)

```diff
@@ -7,6 +7,7 @@ import pandas as pd
 from langfuse import observe
 from langchain_agent import LangChainAgent
 from dotenv import load_dotenv
+import json
 
 load_dotenv()
 
@@ -14,8 +15,16 @@ load_dotenv()
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
+# --- Model Definitions ---
+def load_model_config():
+    with open('configurations/app-config.json', 'r') as f:
+        config = json.load(f)
+    return config.get('model_config', {})
+
+AVAILABLE_MODELS = load_model_config()
+
 @observe()
-async def run_and_submit_all( profile: gr.OAuthProfile | None):
+async def run_and_submit_all(model_provider: str, model_name: str, selected_questions: list, profile: gr.OAuthProfile | None):
     """
     Fetches all questions, runs the BasicAgent on them, submits all answers,
     and displays the results.
@@ -42,7 +51,7 @@ async def run_and_submit_all( profile: gr.OAuthProfile | None):
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
     # In the case of an app running as a hugging Face space, this link points toward your codebase ( usefull for others so please keep it public)
-    agent_code = f"https://huggingface.co/spaces/
+    agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
     print(agent_code)
 
     # 2. Fetch Questions
@@ -66,11 +75,15 @@ async def run_and_submit_all( profile: gr.OAuthProfile | None):
         print(f"An unexpected error occurred fetching questions: {e}")
         return f"An unexpected error occurred fetching questions: {e}", None
 
+    # Filter questions
+    if "All" not in selected_questions:
+        questions_data = [q for q in questions_data if q['task_id'] in selected_questions]
+
     # 3. Run your Agent
     results_log = []
     answers_payload = []
     print(f"Running agent on {len(questions_data)} questions...")
-    for item in questions_data
+    for item in questions_data:
         task_id = item.get("task_id")
         question_text = item.get("question")
         file_name = item.get("file_name")
@@ -99,7 +112,7 @@ async def run_and_submit_all( profile: gr.OAuthProfile | None):
             print(f"Error reading file {file_name}: {e}")
 
         try:
-            submitted_answer = await agent(question_text)
+            submitted_answer = await agent(question_text, model_name, model_provider)
             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
         except Exception as e:
@@ -161,6 +174,27 @@ async def run_and_submit_all( profile: gr.OAuthProfile | None):
     results_df = pd.DataFrame(results_log)
     return status_message, results_df
 
+def get_questions():
+    api_url = DEFAULT_API_URL
+    questions_url = f"{api_url}/questions"
+    try:
+        response = requests.get(questions_url, timeout=15)
+        response.raise_for_status()
+        questions_data = response.json()
+
+        formatted_questions = [("All", "All")]
+        for index, q in enumerate(questions_data):
+            task_id = q.get('task_id')
+            question_text = q.get('question', '')
+            if task_id is not None:
+                label = f"{index + 1} - {question_text[:20]}..."
+                print(f"Generated label for task_id {task_id}: {label}")  # Debug print
+                formatted_questions.append((label, task_id))
+
+        return formatted_questions
+    except Exception as e:
+        print(f"Error fetching questions for UI: {e}")
+        return [("All", "All")]
 
 # --- Build Gradio Interface using Blocks ---
 with gr.Blocks() as demo:
@@ -171,7 +205,9 @@ with gr.Blocks() as demo:
 
 1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
 2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
-3.
+3. Select the model provider and model to use.
+4. Select the questions to run (or "All").
+5. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
 
 ---
 **Disclaimers:**
@@ -182,14 +218,28 @@ with gr.Blocks() as demo:
 
     gr.LoginButton()
 
-
+    with gr.Row():
+        providers = list(AVAILABLE_MODELS.keys())
+        default_provider = providers[0] if providers else None
+        model_provider_dd = gr.Dropdown(label="Model Provider", choices=providers, value=default_provider)
+        model_name_dd = gr.Dropdown(label="Model Name", choices=AVAILABLE_MODELS.get(default_provider, []))
+
+    def update_models(provider):
+        models = AVAILABLE_MODELS.get(provider, [])
+        return gr.Dropdown(choices=models, value=models[0] if models else None)
+
+    model_provider_dd.change(fn=update_models, inputs=model_provider_dd, outputs=model_name_dd)
+
+    question_selection = gr.CheckboxGroup(label="Select Questions to Run", choices=get_questions(), value=["All"])
+
+    run_button = gr.Button("Run Evaluation & Submit Selected Answers")
 
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
-    # Removed max_rows=10 from DataFrame constructor
     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
 
     run_button.click(
         fn=run_and_submit_all,
+        inputs=[model_provider_dd, model_name_dd, question_selection],
         outputs=[status_output, results_table]
     )
 
@@ -215,4 +265,4 @@ if __name__ == "__main__":
     print("-"*(60 + len(" App Starting ")) + "\n")
 
     print("Launching Gradio Interface for Basic Agent Evaluation...")
-    demo.launch(debug=
+    demo.launch(debug=False, share=False, server_name="0.0.0.0")
```
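The linked provider and model dropdowns are the core of the new UI wiring. Below is a minimal, self-contained sketch of the same cascade, using an inline dict in place of `configurations/app-config.json` (hypothetical demo code, not part of the commit):

```python
import gradio as gr

# Inline stand-in for the "model_config" section of configurations/app-config.json
AVAILABLE_MODELS = {
    "google_genai": ["gemini-2.0-flash", "gemini-2.5-pro"],
    "nvidia": ["moonshotai/kimi-k2-instruct-0905"],
}

with gr.Blocks() as demo:
    providers = list(AVAILABLE_MODELS.keys())
    provider_dd = gr.Dropdown(label="Model Provider", choices=providers, value=providers[0])
    model_dd = gr.Dropdown(label="Model Name", choices=AVAILABLE_MODELS[providers[0]])

    def update_models(provider):
        # Re-populate the model dropdown whenever the provider changes
        models = AVAILABLE_MODELS.get(provider, [])
        return gr.Dropdown(choices=models, value=models[0] if models else None)

    provider_dd.change(fn=update_models, inputs=provider_dd, outputs=model_dd)

demo.launch()
```

Returning a fresh `gr.Dropdown` from the change handler is the idiomatic way to swap a component's choices in recent Gradio versions, which is exactly what `update_models` in app.py does.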
**configurations/app-config.json** (ADDED)

```diff
@@ -0,0 +1,21 @@
+{
+    "model_config": {
+        "nvidia": [
+            "deepseek-ai/deepseek-v3.1",
+            "deepseek-ai/deepseek-v3.1-terminus",
+            "minimaxai/minimax-m2",
+            "mistralai/mistral-nemotron",
+            "qwen/qwen3-next-80b-a3b-instruct",
+            "qwen/qwen3-next-80b-a3b-thinking",
+            "moonshotai/kimi-k2-instruct-0905",
+            "nvidia/llama-3.3-nemotron-super-49b-v1.5"
+        ],
+        "google_genai": [
+            "gemini-2.0-flash",
+            "gemini-2.0-flash-lite",
+            "gemini-2.5-flash",
+            "gemini-2.5-flash-lite",
+            "gemini-2.5-pro"
+        ]
+    }
+}
```
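For a quick sanity check of the new config file, the loader in app.py can be mirrored in a few lines (a sketch that assumes it runs from the repository root):

```python
import json

# Mirrors load_model_config() in app.py: read the provider -> model-list mapping
with open("configurations/app-config.json", "r") as f:
    model_config = json.load(f).get("model_config", {})

for provider, models in model_config.items():
    # With the file above this prints: nvidia: 8 models, google_genai: 5 models
    print(f"{provider}: {len(models)} models")
```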
**configurations/mcp-server-config.json** (ADDED)

```diff
@@ -0,0 +1,19 @@
+{
+    "playwright_mcp": {
+        "transport": "stdio",
+        "command": "npx",
+        "args": [
+            "@playwright/mcp@latest",
+            "--headless",
+            "--isolated",
+            "--no-sandbox"
+        ]
+    },
+    "youtube_transcript_mcp": {
+        "transport": "stdio",
+        "command": "python",
+        "args": [
+            "mcp-servers/youtube-transcript.py"
+        ]
+    }
+}
```
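langchain_agent.py now feeds this file directly to `MultiServerMCPClient`. A standalone sketch of that flow (assumes `langchain-mcp-adapters` is installed and both stdio servers can start):

```python
import asyncio
import json

from langchain_mcp_adapters.client import MultiServerMCPClient

async def list_mcp_tools():
    # Load the server definitions exactly as langchain_agent.py does
    with open("configurations/mcp-server-config.json", "r") as f:
        mcp_config = json.load(f)

    client = MultiServerMCPClient(mcp_config)
    tools = await client.get_tools()  # spawns the stdio servers and collects their tools
    for tool in tools:
        print(tool.name)

asyncio.run(list_mcp_tools())
```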
**docker-compose.yml** (ADDED)

```diff
@@ -0,0 +1,42 @@
+services:
+  hf-final-assignment-prod:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "12500:7860"
+    environment:
+      - NVIDIA_API_KEY=${NVIDIA_API_KEY} # Load this key from .env or manually add the secret
+      - GOOGLE_API_KEY=${GOOGLE_API_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_HOST=${LANGFUSE_HOST} # Load this key from .env or manually add the secret
+      - HF_TOKEN=${HF_TOKEN} # Load this key from .env or manually add the secret
+    # volumes:
+    #   - .:/app
+    restart: unless-stopped
+    networks:
+      - app-network
+
+  hf-final-assignment-dev:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "12501:7860"
+    environment:
+      - NVIDIA_API_KEY=${NVIDIA_API_KEY} # Load this key from .env or manually add the secret
+      - GOOGLE_API_KEY=${GOOGLE_API_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY} # Load this key from .env or manually add the secret
+      - LANGFUSE_HOST=${LANGFUSE_HOST} # Load this key from .env or manually add the secret
+      - HF_TOKEN=${HF_TOKEN} # Load this key from .env or manually add the secret
+    restart: unless-stopped
+    volumes:
+      - .:/app
+    networks:
+      - app-network
+
+networks:
+  app-network:
+    driver: bridge
```
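Both services forward the same secrets from `.env` into the container. A small fail-fast check along these lines (a hypothetical helper, not included in this commit) catches a missing key before the agent makes its first API call:

```python
import os
import sys

# Keys the docker-compose services forward from .env into the container
REQUIRED_KEYS = [
    "NVIDIA_API_KEY",
    "GOOGLE_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "HF_TOKEN",
]

missing = [key for key in REQUIRED_KEYS if not os.getenv(key)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
```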
**langchain_agent.py** (CHANGED)

```diff
@@ -8,6 +8,7 @@ from langchain_openai import ChatOpenAI
 from dotenv import load_dotenv
 from langfuse import observe
 from langfuse.langchain import CallbackHandler
+import json
 
 
 
@@ -47,52 +48,32 @@ class LangChainAgent:
 
 
     @observe()
-    async def __call__(self, question: str) -> str:
+    async def __call__(self, question: str, model_name: str, model_provider: str) -> str:
         print(f"Agent received question (first 50 chars): {question[:50]}...")
-
-        client = MultiServerMCPClient({
-            "playwright_mcp":{
-                "transport": "stdio",
-                "command": "npx",
-                "args": [
-                    "@playwright/mcp@latest",
-                    # "--headless"
-                ]
-            },
-            "youtube_transcript_mcp":{
-                "transport": "stdio",
-                "command": "python",
-                "args": [
-                    "mcp-servers/youtube-transcript.py"
-                ]
-            }
-        })
-
-        tools = await client.get_tools()
-        print(tools)
-
-        model_name = "gemini-2.0-flash"
-        model_provider = "google_genai" #google_genai
 
+        with open("configurations/mcp-server-config.json", "r") as config_file:
+            mcp_config = json.load(config_file)
 
+        client = MultiServerMCPClient(mcp_config)
 
-        # # model_name = "minimaxai/minimax-m2"
-        # # model_name = "mistralai/mistral-nemotron"
-        # # model_name = "qwen/qwen3-next-80b-a3b-instruct"
-        # model_name = "qwen/qwen3-next-80b-a3b-thinking"
-        # # model_name = "moonshotai/kimi-k2-instruct-0905"
-        # model_name = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
-        # # model_provider = "nvidia"
-
+        tools = await client.get_tools()
+        print(tools)
 
-
-
-
-
+        if model_provider == "google_genai":
+            model = init_chat_model(model_name, model_provider=model_provider)
+        elif model_provider == "nvidia":
+            model = ChatOpenAI(
+                model=model_name,
+                openai_api_key=os.getenv("NVIDIA_API_KEY"),
+                openai_api_base="https://integrate.api.nvidia.com/v1"
+            )
+        else:
+            # Default to nvidia if provider is not specified
+            model = ChatOpenAI(
+                model=model_name,
+                openai_api_key=os.getenv("NVIDIA_API_KEY"),
+                openai_api_base="https://integrate.api.nvidia.com/v1"
+            )
 
         agent = create_agent(model, tools)
 
@@ -103,9 +84,6 @@ class LangChainAgent:
             ]
         },
         config={"callbacks": self.callbacks})
-
-        # print(f"Agent returning answer: {answer}")
-
 
         final_answer = self.extract_final_answer(answer)
         print(f"Extracted final answer: {final_answer}")
```
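The provider dispatch added to `__call__` reduces to a small helper; here is a sketch of the same logic factored out (a hypothetical refactor mirroring the diff above, not code from the commit):

```python
import os

from langchain.chat_models import init_chat_model
from langchain_openai import ChatOpenAI

def build_model(model_name: str, model_provider: str):
    """Same dispatch as LangChainAgent.__call__: native init for Gemini,
    OpenAI-compatible endpoint for NVIDIA (which is also the fallback)."""
    if model_provider == "google_genai":
        return init_chat_model(model_name, model_provider=model_provider)
    # NVIDIA-hosted models are served through an OpenAI-compatible API
    return ChatOpenAI(
        model=model_name,
        openai_api_key=os.getenv("NVIDIA_API_KEY"),
        openai_api_base="https://integrate.api.nvidia.com/v1",
    )
```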
**requirements.txt** (CHANGED)

```diff
@@ -1,17 +1,11 @@
-
-
-gradio[oauth]
+requests==2.32.5
+gradio[oauth]==5.49.1
 langfuse==3.8.1
-# smolagents[mcp]
-# smolagents[openai]
-# langchain-core==1.0.2
 langchain-mcp-adapters==0.1.12
 mcp==1.20.0
 langchain-google-genai==3.0.0
-# langchain-nvidia-ai-endpoints==0.3.19
 langchain==1.0.3
 openai==2.6.1
 langchain-deepseek==1.0.0
 langchain-openai==1.0.2
-youtube-transcript-api==1.2.3
-
+youtube-transcript-api==1.2.3
```