Commit 73445ba by farjadmalik · 1 parent: 5d24c5c

fix readme

Files changed (1)
  1. README.md +35 -20
README.md CHANGED
@@ -5,7 +5,7 @@ colorFrom: "yellow"
5
  colorTo: "purple"
6
  sdk: "gradio"
7
  python_version: "3.10"
8
- start_command: "bash start.sh"
9
  short_description: "A Gradio RAG app for querying the poetry of Allama Iqbal."
10
  tags:
11
  - rag
@@ -17,15 +17,18 @@ tags:
17
 
18
  # Iqbal Poetry RAG System
19
 
20
- A Retrieval-Augmented Generation (RAG) system for exploring and querying the poetry of Allama Iqbal. This project leverages vector search and large language models (LLMs) to answer questions about Iqbal's poetry, providing relevant poem excerpts as context.
 
 
21
 
22
  ---
23
 
24
  ## 🚀 Hugging Face Spaces Ready
25
 
 
26
  This project is ready to be deployed as a [Hugging Face Space](https://huggingface.co/spaces). The configuration block above (in YAML) tells Hugging Face how to launch the app:
27
  - **sdk**: Uses Gradio for the web interface.
28
- - **app_file**: Entry point for the app (`app/main.py`).
29
  - **python_version**: Uses Python 3.10.
30
  - **short_description**: Shown in the Space's thumbnail.
31
  - **tags**: For discoverability.
@@ -36,7 +39,7 @@ To deploy, simply upload this repository to your Hugging Face account as a new S
36
 
37
  ## Features
38
 
39
- - **Semantic Search**: Retrieve the most relevant poems for a given question using vector embeddings.
40
  - **LLM-Powered Answers**: Generate answers using a language model, grounded in retrieved poem context.
41
  - **Gradio Interface**: User-friendly web interface powered by [Gradio](https://gradio.app/).
42
  - **Plug-and-Play Dataset**: The poetry dataset is already included in the repository, with all paths set up for immediate use.
@@ -52,6 +55,16 @@ To deploy, simply upload this repository to your Hugging Face account as a new S
52
 
53
  - Python 3.9+
54
  - [uv](https://github.com/astral-sh/uv) (a fast Python package installer, drop-in replacement for pip)
 
 
 
 
 
 
 
 
 
 
55
 
56
  ### 1. Clone the repository
57
 
@@ -76,7 +89,7 @@ The poetry dataset is already included in the repository, and all file paths are
76
  To launch the Gradio app locally:
77
 
78
  ```bash
79
- python app/main.py
80
  ```
81
 
82
  This will start a Gradio web interface in your browser, where you can enter your questions about Iqbal's poetry and receive contextually grounded answers.
@@ -88,33 +101,35 @@ This will start a Gradio web interface in your browser, where you can enter your
88
  ```
89
  iqbal_poetry_rag/
90
  │
91
- ├── app/
92
- │ ├── RAGSystem.py # Main RAG system class
93
- │ ├── main.py # Entry point for the Gradio app
94
- │ └── config.py # Configuration (thresholds, file paths, etc.)
95
  │
96
  ├── rag/
97
- │ ├── vector_store.py # Vector store initialization and building
98
- │ ├── retriever.py # Retriever configuration
99
- │ ├── llm.py # LLM initialization and prompt management
 
100
  │
101
  ├── utils/
102
- │ ├── error_handling.py # Error handling decorators
103
- │ └── feedback_logger.py # (Optional) Feedback logging
104
  │
105
- ├── dataset/
106
- │ └── poems.json # Iqbal's poetry dataset (already included)
107
  │
108
- ├── requirements.txt # Project dependencies
109
- └── README.md # This file
 
110
  ```
111
 
112
  ---
113
 
114
  ## Configuration
115
 
116
- Edit `app/config.py` to set:
117
-
 
118
  - `SCORE_THRESHOLD`: Minimum similarity score for retrieved poems.
119
  - `JSON_FILE_PATH`: Path to your poems data file (already set to the included dataset).
120
 
 
5
  colorTo: "purple"
6
  sdk: "gradio"
7
  python_version: "3.10"
8
+ start_command: "python app.py"
9
  short_description: "A Gradio RAG app for querying the poetry of Allama Iqbal."
10
  tags:
11
  - rag
 
17
 
18
  # Iqbal Poetry RAG System
19
 
20
+ A Retrieval-Augmented Generation (RAG) system for exploring and querying the poetry of Allama Iqbal. This project leverages vector search and large language models (LLMs) to answer questions about Iqbal's poetry, providing relevant poem excerpts as context.
21
+
22
+ Note: On first run you will need to build the vector embeddings store, so setup and initialization can take a few hours depending on the performance of your PC.
23
 
24
  ---
25
 
26
  ## 🚀 Hugging Face Spaces Ready
27
 
28
+ ### In Progress:
29
  This project is ready to be deployed as a [Hugging Face Space](https://huggingface.co/spaces). The configuration block above (in YAML) tells Hugging Face how to launch the app:
30
  - **sdk**: Uses Gradio for the web interface.
31
+ - **app_file**: Entry point for the app (`app.py`).
32
  - **python_version**: Uses Python 3.10.
33
  - **short_description**: Shown in the Space's thumbnail.
34
  - **tags**: For discoverability.
 
39
 
40
  ## Features
41
 
42
+ - **Semantic Search**: Retrieve the most relevant poems and their themes for a given question using vector embeddings.
43
  - **LLM-Powered Answers**: Generate answers using a language model, grounded in retrieved poem context.
44
  - **Gradio Interface**: User-friendly web interface powered by [Gradio](https://gradio.app/).
45
  - **Plug-and-Play Dataset**: The poetry dataset is already included in the repository, with all paths set up for immediate use.
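The semantic search feature can be illustrated with a minimal sketch. This is not the project's actual retriever: the toy three-dimensional vectors, the placeholder poem names, the `retrieve` helper, and the threshold value of 0.5 are all assumptions made for illustration. It only shows the general idea of ranking poems by cosine similarity to a query embedding and dropping anything below a score threshold.

```python
import math

SCORE_THRESHOLD = 0.5  # hypothetical value, not taken from the project's config


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, poems, threshold=SCORE_THRESHOLD, top_k=3):
    """Return up to top_k (score, name) pairs scoring at or above threshold, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in poems]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(reverse=True)
    return kept[:top_k]


# Toy "embeddings"; a real system would get these from an embedding model.
poems = [
    ("poem_a", [1.0, 0.0, 0.0]),
    ("poem_b", [0.0, 1.0, 0.0]),
    ("poem_c", [0.9, 0.1, 0.0]),
]

results = retrieve([1.0, 0.0, 0.0], poems)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

Here `poem_b` is orthogonal to the query, so it falls below the threshold and is excluded, while the other two are returned in order of similarity.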
 
55
 
56
  - Python 3.9+
57
  - [uv](https://github.com/astral-sh/uv) (a fast Python package installer, drop-in replacement for pip)
58
+ - [Hugging Face](https://huggingface.co/) account (to use pretrained models)
59
+ - [Ollama](https://ollama.com/) (to create vector embeddings)
60
+
61
+ ```bash
62
+ # install Ollama
63
+ curl -sSfL https://ollama.ai/install.sh | sh
64
+
65
+ # pull a model
66
+ ollama pull llama3
67
+ ```
68
 
69
  ### 1. Clone the repository
70
 
 
89
  To launch the Gradio app locally:
90
 
91
  ```bash
92
+ python app.py
93
  ```
94
 
95
  This will start a Gradio web interface in your browser, where you can enter your questions about Iqbal's poetry and receive contextually grounded answers.
 
101
  ```
102
  iqbal_poetry_rag/
103
  │
104
+ ├── interface/
105
+ │ ├── RAGSystem.py # Main RAG system class
106
+ │ ├── gradio_interface.py # Gradio app and its interface
107
+ │ └── config.py # Configuration (thresholds, file paths, etc.)
108
  │
109
  ├── rag/
110
+ │ ├── vector_store.py # Vector store initialization and building
111
+ │ ├── retriever.py # Retriever configuration
112
+ │ ├── llm.py # LLM initialization and prompt management
113
+ │ └── embeddings.py # Embedding functionality for the RAG system (uses Ollama)
114
  │
115
  ├── utils/
116
+ │ ├── error_handling.py # Error handling decorators
117
+ │ └── feedback_logger.py # (Optional) Feedback logging
118
  │
119
+ ├── data/ # Iqbal's poetry dataset (already included)
 
120
  │
121
+ ├── requirements.txt # Project dependencies
122
+ ├── app.py # Entry point for the app
123
+ └── README.md # This file
124
  ```
125
 
126
  ---
127
 
128
  ## Configuration
129
 
130
+ Edit `interface/config.py` to set:
131
+ - `HUGGING_FACE_TOKEN`: Your personal Hugging Face token. It can be loaded with dotenv: create a `.env` file in the home folder containing
132
+ `HUGGING_FACE_TOKEN = <YOUR_TOKEN>`.
133
  - `SCORE_THRESHOLD`: Minimum similarity score for retrieved poems.
134
  - `JSON_FILE_PATH`: Path to your poems data file (already set to the included dataset).
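As a rough sketch of how these three settings might be read, the snippet below loads `KEY = VALUE` pairs from a `.env` file and exposes them as module-level constants. It is not the contents of `interface/config.py`: the `load_env_file` helper is a hypothetical stdlib stand-in for a dotenv loader, and the default threshold (`0.5`) and default path (`data/poems.json`) are assumptions.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY = VALUE pairs from a .env file into os.environ.

    Variables already present in the environment are left untouched.
    Hypothetical stand-in for a dotenv loader, stdlib only.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


load_env_file()

# Token is read from the environment so it never has to be committed.
HUGGING_FACE_TOKEN = os.environ.get("HUGGING_FACE_TOKEN", "")
# Assumed default; the project's real threshold may differ.
SCORE_THRESHOLD = float(os.environ.get("SCORE_THRESHOLD", "0.5"))
# Assumed file name inside the data/ folder.
JSON_FILE_PATH = os.environ.get("JSON_FILE_PATH", "data/poems.json")
```

Keeping the token in `.env` (and listing `.env` in `.gitignore`) avoids hard-coding credentials in the config file.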
135