|
|
--- |
|
|
license: llama3 |
|
|
base_model: |
|
|
- meta-llama/Meta-Llama-3-8B-Instruct |
|
|
tags: |
|
|
- question_answering |
|
|
- fine_tuned |
|
|
- lora |
|
|
- explainability |
|
|
--- |
|
|
# Llama3-8B-Instruct Fine-tuned for QED (Question-Explanation-Data) |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) specifically adapted for the QED (Question-Explanation-Data) task. The model has been trained to provide structured explanations for question answering by generating three key components simultaneously: direct answers, supporting sentences, and referential entity mappings. |
|
|
|
|
|
## Task Overview |
|
|
|
|
|
The QED task, introduced in ["A Framework and Dataset for Explanations in Question Answering"](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00398/106795/QED-A-Framework-and-Dataset-for-Explanations-in), requires models to: |
|
|
|
|
|
1. **Answer Extraction**: Identify the shortest span from a passage that directly answers a given question |
|
|
2. **Evidence Selection**: Select the single sentence from the passage that best entails or implies the answer |
|
|
3. **Referential Mapping**: Establish connections between entities mentioned in the question and their corresponding references in the selected sentence |
|
|
|
|
|
## Fine-tuning Details |
|
|
|
|
|
- **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct |
|
|
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with rank=16, alpha=32 (see the configuration sketch after this list)
|
|
- **Quantization**: 4-bit quantization for memory efficiency |
|
|
- **Training Strategy**: Few-shot learning with "random_two" example prompting |
|
|
- **Training Data**: Curated subset of QED training examples |
|
|
- **Output Format**: Structured JSON containing answer, selected_sentence, and referential_equalities |
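
As a rough guide, the setup above maps onto a PEFT + bitsandbytes configuration along the following lines. Only the rank, alpha, base model, and 4-bit loading are taken from this card; the target modules, dropout, and compute dtype are assumptions for illustration, not the exact training configuration.

```python
# Sketch of the LoRA + 4-bit setup listed above. Rank, alpha, and 4-bit loading come
# from this card; target modules, dropout, and compute dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit quantization for memory efficiency
    bnb_4bit_quant_type="nf4",               # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,   # assumption
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                    # rank from this card
    lora_alpha=32,                           # alpha from this card
    lora_dropout=0.05,                       # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```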
|
|
|
|
|
## Performance Improvements |
|
|
|
|
|
The fine-tuned model achieves significant improvements over the base model on the QED evaluation metrics:
|
|
|
|
|
| Metric | Base Model (Zero-shot) | Fine-tuned Model | Improvement | |
|
|
|--------|------------------------|------------------|-------------| |
|
|
| Exact Match Accuracy | 0.9% | **11.8%** | **+10.9%** | |
|
|
| Answer Accuracy | 82.0% | **86.4%** | **+4.4%** | |
|
|
| All Mention F1 | 5.5% | **38.4%** | **+32.9%** | |
|
|
| Question Mention F1 | 6.0% | **47.6%** | **+41.6%** | |
|
|
| Context Mention F1 | 5.0% | **29.2%** | **+24.2%** | |
|
|
|
|
|
*Results are based on a 0.5 F1 overlap threshold with non-strict matching.*
|
|
|
|
|
## Training Code & Methodology |
|
|
|
|
|
This model was trained using our comprehensive QED fine-tuning framework available on GitHub: |
|
|
|
|
|
**[QED Fine-Tuning Framework](https://github.com/denisrize/QED-LLM-ExplanationGeneration/tree/main)**
|
|
|
|
|
## Usage |
|
|
|
|
|
The model expects input in a specific format and outputs structured JSON: |
|
|
|
|
|
```python
# Input format
prompt = """
Title: [Document Title]
Question: [Your Question]
Passage: [Context Passage]

You are an expert at extracting answers and structured explanations from text.
Your response MUST be **valid JSON only** (no extra commentary).

Task
====
Given:
• a **title** for the passage,
• a **question** about the passage, and
• the **context passage** itself,

produce an explanation object with three parts:

1. "answer" – the **shortest span** from the passage that fully answers the question.
2. "selected_sentence" – the **single sentence** in the passage that entails or implies the answer.
3. "referential_equalities" – a list of mappings between phrases in the question and phrases in the selected sentence
   that refer to the **same real-world entity/event**.

• Each mapping has two keys:
  - "question_reference": the exact phrase from the question (**must be a contiguous substring from the question,
    not from the context or title**).
  - "sentence_reference": the exact phrase from the selected sentence (**must be a contiguous substring from the selected sentence,
    not from the question or title**), or "" (empty string if the entire sentence is the referent).

▸ Use **""** for "sentence_reference" when the entity/event is not named by any specific phrase in the sentence –
  i.e. the entire sentence acts as the referent (a *bridge* to the whole sentence).
  This corresponds to the (start = end = -1) convention in the QED dataset.

Output format
=============
Return **only** JSON in this exact schema:

{
  "answer": "<string from passage>",
  "selected_sentence": "<string from passage>",
  "referential_equalities": [
    {
      "question_reference": "<string from question only>",
      "sentence_reference": "<string from selected_sentence only, or "">",
      "bridge": "<false if not a bridge; otherwise, a string explaining the bridge connection, e.g., 'in', 'for', 'of', 'at', 'on'>"
    }
    ...
  ]
}
"""

# Expected output format (the model's JSON response, parsed into a Python dict)
expected_output = {
    "answer": "<shortest span from passage>",
    "selected_sentence": "<sentence that entails the answer>",
    "referential_equalities": [
        {
            "question_reference": "<entity from question>",
            "sentence_reference": "<corresponding entity from sentence>",
            "bridge": False,
        }
    ],
}
```
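
A minimal inference sketch is shown below, assuming the LoRA adapter from this repository is loaded on top of the base model with `peft`. The adapter identifier is a placeholder and the generation settings are assumptions; the JSON explanation is parsed from the generated text.

```python
# Minimal inference sketch. The adapter id is a placeholder for this repository's id,
# and the generation settings are illustrative assumptions.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "<this-adapter-repo>"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# `prompt` is the instruction string shown in the block above,
# with the Title / Question / Passage placeholders filled in.
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens and parse the JSON explanation.
response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
explanation = json.loads(response)
print(explanation["answer"], explanation["selected_sentence"])
```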
|
|
|
|
|
## Evaluation |
|
|
|
|
|
The model was evaluated on the QED development set with the official metrics across multiple overlap thresholds (0.5-0.9). It shows consistent improvements in all measured aspects of the QED task, and is particularly strong at entity reference mapping and answer extraction.
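
For intuition, the sketch below illustrates what an F1 overlap threshold means for span matching: a predicted span counts as correct when its token-level F1 against the gold span reaches the threshold. This is only an illustration and not the official QED scorer; the official evaluation script should be used to reproduce the reported numbers.

```python
# Simplified illustration of matching a predicted span against a gold span
# at a token-level F1 overlap threshold. Not the official QED scorer.
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def spans_match(pred: str, gold: str, threshold: float = 0.5) -> bool:
    """A predicted span counts as correct when its token F1 against the gold span meets the threshold."""
    return token_f1(pred, gold) >= threshold

print(spans_match("the Eiffel Tower", "Eiffel Tower", threshold=0.5))  # True (F1 = 0.8)
```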
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Dataset**: QED training subset with careful example curation |
|
|
- **Learning Rate**: 5e-6 with warmup ratio of 0.2 |
|
|
- **Batch Size**: Effective batch size of 16 through gradient accumulation |
|
|
- **Optimizer**: Paged AdamW 8-bit for memory efficiency |
|
|
- **Evaluation**: Multi-threshold validation (0.5-0.9 F1 overlap) |
|
|
- **Epochs**: 3 (see the training-arguments sketch after this list)
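
A rough sketch of these settings with Hugging Face `TrainingArguments` follows. The per-device batch size / gradient-accumulation split, precision, logging, and output directory are assumptions chosen only to illustrate an effective batch size of 16; the learning rate, warmup ratio, optimizer, and epoch count come from the list above.

```python
# Sketch of the training hyperparameters listed above. The batch-size split, precision,
# logging, and output directory are assumptions, not the exact training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-qed-lora",     # assumption
    num_train_epochs=3,
    learning_rate=5e-6,
    warmup_ratio=0.2,
    per_device_train_batch_size=2,       # assumption: 2 x 8 accumulation steps = effective batch size 16
    gradient_accumulation_steps=8,       # assumption (any split reaching 16 works)
    optim="paged_adamw_8bit",            # paged AdamW 8-bit for memory efficiency
    bf16=True,                           # assumption
    logging_steps=10,                    # assumption
)
```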
|
|
|
|
|
## Applications |
|
|
|
|
|
This model is particularly suitable for: |
|
|
- Educational question answering systems requiring explanations |
|
|
- Research applications needing interpretable QA |
|
|
- Systems where answer provenance and entity tracking are important |
|
|
- Building more transparent and accountable AI assistants |
|
|
|
|
|
## Citation |
|
|
|
|
|
Please cite the original QED work when using this model: |
|
|
|
|
|
```bibtex
@article{lamm2020qed,
  title={QED: A Framework and Dataset for Explanations in Question Answering},
  author={Lamm, Matthew and Palomaki, Jennimaria and Alberti, Chris and Andor, Daniel and Choi, Eunsol and Baldini Soares, Livio and Collins, Michael},
  journal={arXiv preprint arXiv:2010.13806},
  year={2020}
}
```
|
|
|
|
|