|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: mit |
|
|
library_name: peft |
|
|
tags: |
|
|
- reranking |
|
|
- information-retrieval |
|
|
- listwise |
|
|
- lora |
|
|
- peft |
|
|
- generative |
|
|
base_model: meta-llama/Llama-3.1-8B |
|
|
datasets: |
|
|
- abdoelsayed/DeAR-COT |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# DeAR-8B-Reranker-Listwise-LoRA-v1 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**DeAR-8B-Reranker-Listwise-LoRA-v1** is a LoRA adapter for listwise neural reranking. It enables generative document ranking with Chain-of-Thought reasoning while requiring only ~100MB of storage, and achieves near-full-model performance on complex ranking tasks.
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Model Type:** LoRA Adapter for Listwise Reranking |
|
|
- **Base Model:** meta-llama/Llama-3.1-8B |
|
|
- **Adapter Size:** ~100MB |
|
|
- **Training Method:** LoRA with Supervised Fine-tuning + CoT |
|
|
- **LoRA Rank:** 16 |
|
|
- **LoRA Alpha:** 32 |
|
|
- **Framework:** LLaMA-Factory |
|
|
|
|
|
## Key Features |
|
|
|
|
|
✅ **Lightweight:** Only 100MB vs 16GB full model

✅ **CoT Reasoning:** Generates ranking explanations

✅ **Listwise:** Considers document relationships

✅ **State-of-the-Art:** Outperforms GPT-4 on NovelEval

✅ **Efficient:** Faster training and deployment
|
|
|
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Load with PEFT |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoTokenizer |
|
|
from peft import AutoPeftModelForCausalLM |
|
|
|
|
|
# Load LoRA adapter (automatically loads base model) |
|
|
adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1" |
|
|
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16 |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True) |
|
|
model = AutoPeftModelForCausalLM.from_pretrained( |
|
|
adapter_path, |
|
|
torch_dtype=dtype, |
|
|
device_map="auto", |
|
|
trust_remote_code=True, |
|
|
low_cpu_mem_usage=True |
|
|
) |
|
|
|
|
|
if tokenizer.pad_token is None: |
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
|
|
# Prepare ranking prompt |
|
|
query = "When did Thomas Edison invent the light bulb?" |
|
|
documents = [ |
|
|
"Lightning strike at Seoul National University", |
|
|
"Thomas Edison tried to invent a device for car but failed", |
|
|
"Coffee is good for diet", |
|
|
"KEPCO fixes light problems", |
|
|
"Thomas Edison invented the light bulb in 1879", |
|
|
] |
|
|
|
|
|
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)]) |
|
|
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier []. |
|
|
Rank the passages based on their relevance to the search query: {query}. |
|
|
|
|
|
{doc_list} |
|
|
|
|
|
Search Query: {query}. |
|
|
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.""" |
|
|
|
|
|
# Generate ranking |
|
|
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048) |
|
|
inputs = {k: v.to(model.device) for k, v in inputs.items()} |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=50, |
|
|
do_sample=False, |
|
|
pad_token_id=tokenizer.pad_token_id |
|
|
) |
|
|
|
|
|
ranking = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True) |
|
|
print(f"Ranking: {ranking}") |
|
|
# Output: [4] > [1] > [0] > [3] > [2] |
|
|
``` |
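The model returns the ranking as text, so downstream code has to parse it back into indices. A minimal sketch using the `ranking` string and `documents` list from the block above (the `ListwiseLoRAReranker` class below wraps this more robustly):

```python
import re

# Parse "[4] > [1] > ..." into document indices and reorder the candidates
order = [int(i) for i in re.findall(r"\[(\d+)\]", ranking) if int(i) < len(documents)]
reranked = [documents[i] for i in order]
print(reranked[0])  # "Thomas Edison invented the light bulb in 1879"
```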
|
|
|
|
|
### 4-bit Quantization (Low Memory) |
|
|
|
|
|
```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"

# Load with 4-bit quantization (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
|
|
|
|
|
### Complete Reranking Pipeline |
|
|
|
|
|
```python |
|
|
import re
import torch
from typing import List

from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
|
|
|
|
|
class ListwiseLoRAReranker: |
|
|
def __init__(self, adapter_path: str): |
|
|
self.tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True) |
|
|
self.model = AutoPeftModelForCausalLM.from_pretrained( |
|
|
adapter_path, |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto", |
|
|
low_cpu_mem_usage=True |
|
|
) |
|
|
|
|
|
if self.tokenizer.pad_token is None: |
|
|
self.tokenizer.pad_token = self.tokenizer.eos_token |
|
|
|
|
|
def create_prompt(self, query: str, documents: List[str]) -> str: |
|
|
doc_list = "\n".join([f"[{i}] {doc[:300]}" for i, doc in enumerate(documents)]) |
|
|
return f"""I will provide you with {len(documents)} passages, each indicated by a number identifier []. |
|
|
Rank the passages based on their relevance to the search query: {query}. |
|
|
|
|
|
{doc_list} |
|
|
|
|
|
Search Query: {query}. |
|
|
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.""" |
|
|
|
|
|
def parse_ranking(self, text: str, num_docs: int) -> List[int]: |
|
|
        numbers = re.findall(r'\[(\d+)\]', text)
        # Keep the first occurrence of each valid index, preserving model order
        seen = set()
        ranking = []
        for n in numbers:
            idx = int(n)
            if idx < num_docs and idx not in seen:
                seen.add(idx)
                ranking.append(idx)
|
|
|
|
|
# Add missing docs |
|
|
for i in range(num_docs): |
|
|
if i not in ranking: |
|
|
ranking.append(i) |
|
|
|
|
|
return ranking[:num_docs] |
|
|
|
|
|
def rerank(self, query: str, documents: List[str]) -> List[int]: |
|
|
prompt = self.create_prompt(query, documents) |
|
|
inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048) |
|
|
inputs = {k: v.to(self.model.device) for k, v in inputs.items()} |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = self.model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=50, |
|
|
do_sample=False, |
|
|
pad_token_id=self.tokenizer.pad_token_id |
|
|
) |
|
|
|
|
|
output_text = self.tokenizer.decode( |
|
|
outputs[0][inputs['input_ids'].shape[1]:], |
|
|
skip_special_tokens=True |
|
|
) |
|
|
|
|
|
return self.parse_ranking(output_text, len(documents)) |
|
|
|
|
|
# Usage, reusing `query` and `documents` from the first example
|
|
reranker = ListwiseLoRAReranker("abdoelsayed/dear-8b-reranker-listwise-lora-v1") |
|
|
ranking = reranker.rerank(query, documents) |
|
|
print(f"Ranked indices: {ranking}") |
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### LoRA Configuration |
|
|
```yaml |
|
|
lora_rank: 16 |
|
|
lora_alpha: 32 |
|
|
target_modules: |
|
|
- q_proj |
|
|
- v_proj |
|
|
- k_proj |
|
|
- o_proj |
|
|
- gate_proj |
|
|
- up_proj |
|
|
- down_proj |
|
|
lora_dropout: 0.05 |
|
|
task_type: CAUSAL_LM |
|
|
``` |
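The same configuration expressed as a PEFT `LoraConfig`, for readers reproducing the setup directly with the peft library rather than LLaMA-Factory (values taken from the YAML above):

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,                 # lora_rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type=TaskType.CAUSAL_LM,
)
```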
|
|
|
|
|
### Training Setup |
|
|
- **Framework:** LLaMA-Factory |
|
|
- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT) |
|
|
- **Learning Rate:** 1e-5 |
|
|
- **Batch Size:** 4 |
|
|
- **Gradient Accumulation:** 4 |
|
|
- **Epochs:** 2 |
|
|
- **Max Length:** 2048 |
|
|
- **GPUs:** 4x A100 (80GB) |
|
|
- **Training Time:** ~24 hours (~3x faster than full fine-tuning)
|
|
- **Memory:** ~50GB per GPU |
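
Outside LLaMA-Factory, the hyperparameters above map to transformers `TrainingArguments` roughly as follows (a sketch under the stated setup; `output_dir` is a hypothetical name, and the effective batch size is 4 x 4 accumulation x 4 GPUs = 64):

```python
from transformers import TrainingArguments

# Sketch of the training setup above, for reproduction outside LLaMA-Factory
training_args = TrainingArguments(
    output_dir="dear-8b-listwise-lora",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,       # effective batch = 4 * 4 * num_gpus
    num_train_epochs=2,
    bf16=True,
    logging_steps=10,
)
```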
|
|
|
|
|
## Advantages of LoRA |
|
|
|
|
|
| Feature | LoRA | Full Model | |
|
|
|---------|------|------------| |
|
|
| Storage | 100MB | 16GB | |
|
|
| Training Time | 24h | 72h | |
|
|
| Training Memory | 50GB | 70GB | |
|
|
| Performance | 99% | 100% | |
|
|
| Deployment | Fast | Slow | |
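
The "Deployment" row above assumes adapter-style loading; if you prefer a single standalone checkpoint, PEFT can fold the LoRA weights into the base model. A minimal sketch, assuming `model` and `tokenizer` from the "Load with PEFT" example (the merged model is a plain transformers model the size of the full 16GB checkpoint):

```python
# Fold the LoRA weights into the base model for standalone deployment
merged = model.merge_and_unload()  # returns a plain transformers model
merged.save_pretrained("dear-8b-listwise-merged")    # hypothetical output dir
tokenizer.save_pretrained("dear-8b-listwise-merged")
```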
|
|
|
|
|
## Performance Comparison |
|
|
|
|
|
### TREC Deep Learning (NDCG@10)
|
|
|
|
|
| Method | DL19 | DL20 | Avg | |
|
|
|--------|------|------|-----| |
|
|
| LoRA | 77.6 | 75.3 | 76.5 | |
|
|
| Full | 77.9 | 75.6 | 76.8 | |
|
|
| RankGPT-4 | 75.6 | 70.6 | 73.1 | |
|
|
|
|
|
### NovelEval |
|
|
|
|
|
| Method | NDCG@10 | |
|
|
|--------|---------| |
|
|
| **LoRA** | **90.6** | |
|
|
| Full | 91.0 | |
|
|
| GPT-4 | 87.9 | |
|
|
|
|
|
## When to Use |
|
|
|
|
|
**Best for:** |
|
|
- ✅ Resource-constrained environments
- ✅ Multiple domain-specific versions
- ✅ Fast experimentation
- ✅ Complex reasoning queries
|
|
|
|
|
**Use full model for:** |
|
|
- ❌ Absolute maximum performance
- ❌ Single production deployment
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Slightly lower performance than the full fine-tuned model (-0.3 NDCG@10)
- Still slower than pointwise rerankers (~11 s per query)
- Limited to roughly 20-50 documents per query in a single prompt (see the sliding-window sketch below)
- Requires the base model weights (meta-llama/Llama-3.1-8B) at inference time
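
For candidate lists longer than a single prompt can hold, a sliding-window pass over the list is a common workaround in listwise reranking (popularized by RankGPT-style rerankers). The sketch below is an illustration layered on the `ListwiseLoRAReranker` class above, not necessarily the exact DeAR procedure:

```python
from typing import List

def sliding_window_rerank(
    reranker: "ListwiseLoRAReranker",
    query: str,
    documents: List[str],
    window: int = 20,
    step: int = 10,
) -> List[int]:
    """Rerank a long candidate list with overlapping windows, moving from
    the back of the list to the front so relevant documents bubble up."""
    order = list(range(len(documents)))
    start = max(0, len(order) - window)
    while True:
        chunk = order[start:start + window]
        local = reranker.rerank(query, [documents[i] for i in chunk])
        order[start:start + window] = [chunk[i] for i in local]
        if start == 0:
            break
        start = max(0, start - step)
    return order

ranking = sliding_window_rerank(reranker, query, documents)
```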
|
|
|
|
|
## Related Models |
|
|
|
|
|
**Full Version:** |
|
|
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) |
|
|
|
|
|
**Other LoRA:** |
|
|
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1) |
|
|
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) |
|
|
|
|
|
**Resources:** |
|
|
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT) |
|
|
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher) |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{abdallah2025dear, |
|
|
title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation}, |
|
|
author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam}, |
|
|
journal={arXiv preprint arXiv:2508.16998}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
MIT License |
|
|
|
|
|
## More Information |
|
|
|
|
|
- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking) |
|
|
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998) |
|
|
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking) |
|
|
|