---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- pointwise
- lora
- peft
- ranknet
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-RankNet-LoRA-v1

## Model Description

**DeAR-8B-Reranker-RankNet-LoRA-v1** is a LoRA (Low-Rank Adaptation) adapter for neural reranking. This lightweight adapter can be applied to LLaMA-3.1-8B to create a pointwise reranker with minimal storage overhead. It achieves performance comparable to the full fine-tuned model while requiring only ~100MB of storage.

## Model Details

- **Model Type:** LoRA Adapter for Pointwise Reranking
- **Base Model:** meta-llama/Llama-3.1-8B
- **Adapter Size:** ~100MB (vs 16GB for the full model)
- **Training Method:** LoRA with RankNet Loss + Knowledge Distillation
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj

## Key Features

✅ **Lightweight:** Only ~100MB vs 16GB for the full model
✅ **Efficient Training:** Trains ~3x faster than full fine-tuning
✅ **Easy Deployment:** Just load the adapter on top of the base model
✅ **Comparable Performance:** ~98% of full-model performance
✅ **Memory Efficient:** Lower GPU memory usage during training

## Usage

### Option 1: Load with PEFT (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# Load LoRA adapter
adapter_path = "abdoelsayed/dear-8b-reranker-ranknet-lora-v1"

# Get base model from adapter config
config = PeftConfig.from_pretrained(adapter_path)
base_model_name = config.base_model_name_or_path

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=1,
    torch_dtype=torch.bfloat16
)
base_model.config.pad_token_id = tokenizer.pad_token_id  # needed for correct pooling with padded inputs

# Load and merge LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()  # Merge adapter into base model

model.eval().cuda()

# Use the model
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Option 2: Use Helper Function

```python
import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig


def load_lora_ranker(adapter_path: str, device: str = "cuda"):
    """Load LoRA adapter and merge with base model."""
    # Get base model path from adapter config
    peft_config = PeftConfig.from_pretrained(adapter_path)
    base_model_name = peft_config.base_model_name_or_path

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"

    # Load base model
    base_model = AutoModelForSequenceClassification.from_pretrained(
        base_model_name,
        num_labels=1,
        torch_dtype=torch.bfloat16
    )
    base_model.config.pad_token_id = tokenizer.pad_token_id  # required for batched (padded) inference

    # Load LoRA adapter and merge
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model = model.merge_and_unload()

    model.eval().to(device)
    return tokenizer, model


# Load model
tokenizer, model = load_lora_ranker("abdoelsayed/dear-8b-reranker-ranknet-lora-v1")


# Rerank documents
@torch.inference_mode()
def rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 64):
    """Rerank documents for a query."""
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {title} {text}" for title, text in batch]

        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Thomas Edison invented the light bulb in 1879"),
    ("", "Coffee is good for diet"),
    ("", "Lightning strike at Seoul"),
]

ranking = rerank(tokenizer, model, query, docs)
print(ranking)  # (index, score) pairs sorted best-first, e.g. [(0, 5.2), (2, -3.1), (1, -4.8)]
```

### Using Without Merging (Memory Efficient)

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification

adapter_path = "abdoelsayed/dear-8b-reranker-ranknet-lora-v1"
config = PeftConfig.from_pretrained(adapter_path)

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load adapter (without merging)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Use model (adapter layers will be applied automatically)
# ... same inference code as above ...
```
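
A side benefit of keeping the adapter unmerged is that PEFT can temporarily bypass it, which makes it easy to sanity-check how much the adapter shifts a score relative to the raw base model. A minimal sketch, assuming `model` and `config` come from the unmerged block above and reusing the Option 1 query/document strings as placeholders:

```python
import torch
from transformers import AutoTokenizer

# Tokenizer loaded the same way as in Option 1
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(
    "query: What is machine learning?",
    "document: Machine learning is a subset of artificial intelligence...",
    return_tensors="pt",
    truncation=True,
    max_length=228,
)
device = next(model.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    adapted_score = model(**inputs).logits.squeeze().item()
    # Temporarily bypass the LoRA layers to score with the raw base model
    with model.disable_adapter():
        base_score = model(**inputs).logits.squeeze().item()

print(f"with adapter: {adapted_score:.3f}, without adapter: {base_score:.3f}")
```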

## Performance

| Benchmark | LoRA | Full Model | Difference |
|-----------|------|------------|------------|
| TREC DL19 | 74.2 | 74.5 | -0.3 |
| TREC DL20 | 72.5 | 72.8 | -0.3 |
| BEIR (Avg) | 44.9 | 45.2 | -0.3 |
| MS MARCO | 68.6 | 68.9 | -0.3 |

✅ **98% of full-model performance with only 0.6% of the storage!**

## Training Details

### LoRA Configuration

```python
lora_config = {
    "r": 16,                # LoRA rank
    "lora_alpha": 32,       # Scaling factor
    "target_modules": [
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "SEQ_CLS"
}
```
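
These values can also be read straight from the published adapter rather than copied from this card; a quick sanity check with PEFT's config loader (the expected values in the comments are the ones listed above):

```python
from peft import PeftConfig

# Fetch the adapter's stored configuration from the Hub and inspect it
cfg = PeftConfig.from_pretrained("abdoelsayed/dear-8b-reranker-ranknet-lora-v1")
print(cfg.r, cfg.lora_alpha)       # expected: 16 32
print(sorted(cfg.target_modules))  # expected: the seven projection modules listed above
print(cfg.lora_dropout, cfg.task_type)
```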

### Training Hyperparameters

```python
training_args = {
    "learning_rate": 1e-4,        # Higher than for full fine-tuning
    "batch_size": 4,              # Larger batches possible due to lower memory
    "gradient_accumulation": 2,
    "epochs": 2,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "max_length": 228,
    "bf16": True
}
```
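
The adapter was trained with a RankNet loss plus knowledge distillation from the 13B teacher listed under Resources. The released training code is not reproduced in this card; the sketch below only illustrates one common way to combine the two objectives, using the teacher's pairwise score differences as soft targets for the student's pairwise differences. The function name and tensor shapes are assumptions for illustration, not the actual DeAR training code.

```python
import torch
import torch.nn.functional as F

def ranknet_distill_loss(student_scores: torch.Tensor,
                         teacher_scores: torch.Tensor) -> torch.Tensor:
    """RankNet-style pairwise loss with soft targets from a teacher.

    Both tensors have shape (batch, n_docs): one relevance score per
    query-document pair. For every document pair (i, j), the student's
    score difference is pushed toward the teacher's pairwise preference.
    """
    # Pairwise score differences: (batch, n_docs, n_docs)
    s_diff = student_scores.unsqueeze(-1) - student_scores.unsqueeze(-2)
    t_diff = teacher_scores.unsqueeze(-1) - teacher_scores.unsqueeze(-2)

    # Teacher preference P(doc_i > doc_j), RankNet-style soft label
    soft_targets = torch.sigmoid(t_diff)

    # Cross-entropy between student pair logits and teacher preferences
    return F.binary_cross_entropy_with_logits(s_diff, soft_targets)

# Example with random scores for 2 queries x 4 candidate documents
student = torch.randn(2, 4)
teacher = torch.randn(2, 4)
print(ranknet_distill_loss(student, teacher))
```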

### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~12 hours (~3x faster than the full model)
- **Memory Usage:** ~28GB per GPU (vs ~38GB for full fine-tuning)
- **Trainable Parameters:** 67M (0.8% of total)

## Advantages of LoRA Version

| Aspect | LoRA | Full Model |
|--------|------|------------|
| Storage | 100MB | 16GB |
| Training Time | 12h | 36h |
| Training Memory | 28GB | 38GB |
| Performance | 98% | 100% |
| Loading Time | Fast | Slow |
| Easy Updates | ✅ Yes | ❌ No |

## When to Use LoRA vs Full Model

**Use LoRA when:**
- ✅ Storage is limited
- ✅ Training multiple domain-specific versions
- ✅ You need fast iteration/experimentation
- ✅ A 0.3 NDCG@10 difference is acceptable

**Use Full Model when:**
- ❌ Maximum performance is required
- ❌ Storage is not a concern
- ❌ You have a single production deployment

## Fine-tuning on Your Data

```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    num_labels=1
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    # Add gate_proj, up_proj, down_proj to match this adapter's full configuration
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Apply LoRA
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# e.g.: trainable params: ~67M || all params: ~8B || trainable%: ~0.8%

# Train
training_args = TrainingArguments(
    output_dir="./lora-finetuned",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,  # your tokenized query-document dataset
)

trainer.train()

# Save only the LoRA adapter
model.save_pretrained("./lora-adapter")
```
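
The directory written by `save_pretrained` can then be loaded back onto a fresh base model exactly like the published adapter in Option 1; a minimal sketch using the local path from the snippet above:

```python
import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
# Load the locally saved adapter and merge it for inference
reloaded = PeftModel.from_pretrained(base, "./lora-adapter")
reloaded = reloaded.merge_and_unload()
```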

## Model Files

This adapter contains:
- `adapter_config.json` - LoRA configuration
- `adapter_model.safetensors` or `adapter_model.bin` - Adapter weights (~100MB)
- `README.md` - This documentation

## Related Models

**Full Model:**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - Full fine-tuned version

**Other LoRA Adapters:**
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) - Binary Cross-Entropy
- [DeAR-8B-Listwise-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-lora-v1) - Listwise ranking

**Resources:**
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)