---
title: Ko-FreshQA Leaderboard
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
---

## Ko-FreshQA Leaderboard

An automatic evaluation and leaderboard system built on the Korean FreshQA benchmark. It matches the `model_response` column of each uploaded CSV against the reference data, runs both Relaxed and Strict evaluation with the Upstage Solar model, and publishes the results to the leaderboard. The app runs as a Gradio UI.

### Key features

- Dataset distribution: a tab for downloading the DEV/TEST CSVs
- Submission and automatic evaluation: an uploaded CSV is merged, evaluated, aggregated into metrics, and added to the leaderboard
- Detailed metrics: accuracy by fact type, premise validity (vp/fp), hop (one/multi), year (old/new), and domain
- Submission limit (optional): up to 3 submissions per user per day (backed by a Hugging Face repository)

---

## Directory structure

- `app.py`: Gradio app initialization and tab layout
- `config.py`: loads environment variables and validates required settings
- `freshqa/`
  - `fresheval.py`: evaluation logic for a single sample
  - `fresheval_parallel.py`: parallel evaluation wrapper for DataFrames
  - `freshqa_acc.py`: aggregation of evaluation results (accuracy calculation and per-domain statistics)
  - `merge_csv_with_model_response.py`: merges the reference data with the user's CSV
- `src/`
  - `submission_handler.py`: orchestrates the whole flow from submission to leaderboard update
  - `submission_tracker.py`: submission history tracking (HF repo based, optional)
  - `leaderboard_manager.py`: loads, saves, and cleans up the leaderboard CSV for display
  - `quick_csv_loader.py`, `hf_private_csv_loader.py`: utilities for loading CSVs from a private HF repo
  - `api_key_rotator.py`, `utils.py`: utilities
- `ui/`
  - `leaderboard_tab.py`, `submission_tab.py`, `dataset_tab.py`, `styles.css`
- `data/leaderboard_results.csv`: accumulated leaderboard data

---

## Requirements

- Python 3.10
- Upstage API key (single or multiple)
- Hugging Face token (for access to the private HF repo)
- Hugging Face Dataset repo
  - Reference data: `FRESHQA_DATA_REPO_ID` / `FRESHQA_DATA_FILENAME`
  - (Optional) Submission tracking repo: `SUBMISSION_TRACKER_REPO_ID`
  - (Optional) To back up the leaderboard to a Hugging Face dataset, set `UPLOAD_LEADERBOARD_TO_HF=true`

Install:

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

Or with Conda:

```bash
conda env create -f environment.yml
conda activate freshqa-leaderboard
```

---

## Environment variables (.env)

Copy `env.example` to `.env` and fill in the values:

```bash
cp env.example .env
```

Required/key variables

- HF_TOKEN
- FRESHQA_DATA_REPO_ID
- FRESHQA_DATA_FILENAME (default: ko-freshqa_2025_total.csv)
- UPSTAGE_API_KEY or UPSTAGE_API_KEYS (comma-separated)
- ENABLE_SUBMISSION_LIMIT (default: true)
- SUBMISSION_TRACKER_REPO_ID (required when the submission limit is enabled)
- UPLOAD_LEADERBOARD_TO_HF
  - true: also back up the leaderboard to a private HF Dataset (recommended for production)
  - false: save to the local CSV only (recommended for local development)

Validation: at app startup, `Config.validate_required_configs()` checks for missing required settings.
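
As an illustration of the variables above, here is a minimal sketch of loading and checking them. It is not the project's `Config` class: `SketchConfig`, `from_env`, and `missing_required` are hypothetical names, and preferring `UPSTAGE_API_KEYS` over `UPSTAGE_API_KEY` is an assumption.

```python
import os
from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the project's Config class."""

    hf_token: str = ""
    data_repo_id: str = ""
    data_filename: str = "ko-freshqa_2025_total.csv"
    upstage_api_keys: list[str] = field(default_factory=list)
    enable_submission_limit: bool = True
    tracker_repo_id: str = ""

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        # Assumption: the comma-separated list wins over the single key.
        raw_keys = env.get("UPSTAGE_API_KEYS") or env.get("UPSTAGE_API_KEY", "")
        keys = [k.strip() for k in raw_keys.split(",") if k.strip()]
        limit_flag = env.get("ENABLE_SUBMISSION_LIMIT", "true").strip().lower()
        return cls(
            hf_token=env.get("HF_TOKEN", ""),
            data_repo_id=env.get("FRESHQA_DATA_REPO_ID", ""),
            data_filename=env.get("FRESHQA_DATA_FILENAME", "ko-freshqa_2025_total.csv"),
            upstage_api_keys=keys,
            enable_submission_limit=limit_flag == "true",
            tracker_repo_id=env.get("SUBMISSION_TRACKER_REPO_ID", ""),
        )

    def missing_required(self):
        """Return the names of required settings that are empty."""
        missing = []
        if not self.hf_token:
            missing.append("HF_TOKEN")
        if not self.data_repo_id:
            missing.append("FRESHQA_DATA_REPO_ID")
        if not self.upstage_api_keys:
            missing.append("UPSTAGE_API_KEY or UPSTAGE_API_KEYS")
        # The tracker repo only matters when the submission limit is on.
        if self.enable_submission_limit and not self.tracker_repo_id:
            missing.append("SUBMISSION_TRACKER_REPO_ID")
        return missing
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.
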

---

## Running

Local:

```bash
python app.py
```

Default port: 7860

Hugging Face Spaces:

- If the `SPACE_ID` environment variable is set, the app runs in Spaces mode.
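
The `SPACE_ID` check can be sketched as below. The helper `launch_options` is hypothetical, and the choice of bind address per mode is an assumption made for illustration; what "Spaces mode" changes in the actual app is not documented here.

```python
import os


def launch_options(env=None):
    """Pick launch settings depending on whether SPACE_ID is set."""
    env = os.environ if env is None else env
    on_spaces = bool(env.get("SPACE_ID"))
    return {
        "on_spaces": on_spaces,
        # Assumption: listen on all interfaces inside a Space, loopback locally.
        "server_name": "0.0.0.0" if on_spaces else "127.0.0.1",
        "server_port": 7860,  # default port stated above
    }
```

The `server_name` and `server_port` values could then be passed to Gradio's `launch()`.
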

Docker (optional):

- A `Dockerfile` and `docker-compose.yml` are provided (adjust them to your setup if needed)

---

## Usage (Gradio UI)

1) Dataset tab
   - Download the DEV/TEST CSVs
2) Submission and evaluation tab
   - Upload: the TEST CSV with `model_response` filled in
   - Input: submitter name, model used, description
   - Evaluation: Relaxed and Strict evaluation run together with the Upstage Solar model
   - Output: overall and detailed metrics are computed and added to the leaderboard
3) Leaderboard tab
   - Submission results accumulate in `data/leaderboard_results.csv`
   - (Optional) When `UPLOAD_LEADERBOARD_TO_HF=true`, the file is also uploaded automatically to the Hugging Face Dataset as `leaderboard_results.csv`.
   - Search and refresh are available
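
Preparing the upload for step 2 amounts to filling the `model_response` column of the TEST CSV. The sketch below assumes the questions live in a column named `question`; that column name, the helper `fill_model_responses`, and the placeholder answer function are hypothetical.

```python
import pandas as pd


def fill_model_responses(test_df, answer_fn, question_col="question"):
    """Return a copy of the TEST data with `model_response` filled in.

    `answer_fn` is whatever calls your model; `question_col` is an
    assumed column name, so adjust it to the downloaded CSV.
    """
    out = test_df.copy()
    out["model_response"] = [str(answer_fn(q)) for q in out[question_col]]
    return out


# In-memory demo; in practice load the downloaded file with pd.read_csv(...)
# and save the result with submission.to_csv(<path>, index=False).
demo = pd.DataFrame({"question": ["Q1", "Q2"]})
submission = fill_model_responses(demo, lambda q: f"answer to {q}")
csv_text = submission.to_csv(index=False)
```
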

---

## Internal flow

1) Submission intake: `src/submission_handler.py::process_submission`
2) Load the user's CSV and merge it with the reference data:
   - `freshqa/merge_csv_with_model_response.py::merge_dataframe_with_model_response_df`
3) Evaluation:
   - `freshqa/fresheval_parallel.py::evaluate_dataframe` → `freshqa/fresheval.py::FreshEval`
4) Accuracy aggregation:
   - `freshqa/freshqa_acc.py::calculate_accuracy`, `process_freshqa_dataframe`
5) Storage:
   - Leaderboard: `src/leaderboard_manager.py::append_to_leaderboard_data`
   - (Optional) Leaderboard backup to the HF repo: only when `UPLOAD_LEADERBOARD_TO_HF=true`
   - (Optional) Submission history: `src/submission_tracker.py` (only when `ENABLE_SUBMISSION_LIMIT=true`)

Note: when `ENABLE_SUBMISSION_LIMIT=false`, the code does not attempt to access the Hugging Face repo used for submission history tracking.
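
Steps 2 and 4 can be illustrated with plain pandas. This sketch does not reproduce `merge_dataframe_with_model_response_df` or `calculate_accuracy`; the functions `merge_responses` and `accuracy_by` and the column names `question`, `domain`, and `relaxed` are assumptions for illustration.

```python
import pandas as pd


def merge_responses(reference_df, submission_df, key="question"):
    """Attach each submitted `model_response` to its reference row."""
    base = reference_df.drop(columns=["model_response"], errors="ignore")
    merged = base.merge(
        submission_df[[key, "model_response"]],
        on=key,
        how="left",               # keep every reference row, in reference order
        validate="one_to_one",    # reject duplicated keys on either side
    )
    missing = int(merged["model_response"].isna().sum())
    if missing:
        raise ValueError(f"{missing} reference rows have no model_response")
    return merged


def accuracy_by(df, rating_col, group_col):
    """Percentage of correct ratings (1/0 or True/False) per group."""
    return (df.groupby(group_col)[rating_col].mean() * 100).round(2).to_dict()
```

The same `accuracy_by` call could be repeated for each breakdown (fact type, premise validity, hop, year, domain) and for the Relaxed and Strict ratings separately.
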

---

## Submission limit (optional)

- Setting: `ENABLE_SUBMISSION_LIMIT=true` (default)
- Storage: `user_submissions.json` is maintained in `SUBMISSION_TRACKER_REPO_ID`
- Logic:
  - Each user is allowed up to 3 successful submissions per day
  - Counts are kept per calendar day, with the day starting at 00:00 Korea Standard Time
- When disabled (no HF repo access): `SubmissionHandler` does not create the tracker
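
The counting rule above can be sketched as follows. The layout of `history` (user → date → count) is a hypothetical stand-in for whatever `user_submissions.json` actually stores, and the function names are illustrative. Korea does not observe daylight saving time, so a fixed UTC+9 offset is used.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")
DAILY_LIMIT = 3


def kst_date_key(now=None):
    """Calendar date in Korea Standard Time, as YYYY-MM-DD."""
    now = datetime.now(timezone.utc) if now is None else now
    return now.astimezone(KST).date().isoformat()


def can_submit(history, user_id, now=None):
    """True while the user has fewer than DAILY_LIMIT successes today (KST)."""
    return history.get(user_id, {}).get(kst_date_key(now), 0) < DAILY_LIMIT


def record_success(history, user_id, now=None):
    """Count one successful submission for the user's current KST day."""
    day = kst_date_key(now)
    per_user = history.setdefault(user_id, {})
    per_user[day] = per_user.get(day, 0) + 1
```

Because only successful submissions are recorded, a failed evaluation would not use up one of the three daily slots.
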

---

## Troubleshooting

- "Missing required settings" error at startup
  - Check `UPSTAGE_API_KEY` (or `UPSTAGE_API_KEYS`), `HF_TOKEN`, and `FRESHQA_DATA_REPO_ID` in `.env`
- HF 404 warning appears even though the submission limit is disabled
  - The current version no longer initializes the submission tracker when `ENABLE_SUBMISSION_LIMIT=false`
- HF 404 (submission limit enabled)
  - If `user_submissions.json` does not exist in the `SUBMISSION_TRACKER_REPO_ID` repo, the first access can return a 404. Create the file in advance with an empty JSON object `{}`.
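
One way to create the empty file is with `huggingface_hub`. This is a sketch under assumptions: the tracker repo is taken to be a dataset repo, and the helper names are hypothetical. The library import is kept inside the upload function so the payload helper can be used without it installed.

```python
import json


def empty_tracker_payload():
    """Bytes of an empty JSON object, i.e. b"{}"."""
    return json.dumps({}).encode("utf-8")


def upload_empty_tracker(repo_id, token, repo_type="dataset"):
    """Upload an empty user_submissions.json to the tracker repo.

    Assumption: the tracker lives in a dataset repo; change `repo_type`
    if yours differs.
    """
    from huggingface_hub import HfApi  # lazy import: only needed for the upload

    HfApi(token=token).upload_file(
        path_or_fileobj=empty_tracker_payload(),
        path_in_repo="user_submissions.json",
        repo_id=repo_id,
        repo_type=repo_type,
    )
```

Call `upload_empty_tracker` once with the value of `SUBMISSION_TRACKER_REPO_ID` and a token that has write access.
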

---

## License/Credits

- This leaderboard was inspired by FreshQA.

For questions, please open an issue.