---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- listwise
- lora
- peft
- generative
base_model: meta-llama/Llama-3.1-8B
datasets:
- abdoelsayed/DeAR-COT
pipeline_tag: text-generation
---
# DeAR-8B-Reranker-Listwise-LoRA-v1

## Model Description
DeAR-8B-Reranker-Listwise-LoRA-v1 is a LoRA adapter for listwise neural reranking. The adapter enables generative document ranking with Chain-of-Thought reasoning while requiring only ~100MB of storage, and it achieves near full-model performance on complex ranking tasks.
## Model Details
- Model Type: LoRA Adapter for Listwise Reranking
- Base Model: meta-llama/Llama-3.1-8B
- Adapter Size: ~100MB
- Training Method: LoRA with Supervised Fine-tuning + CoT
- LoRA Rank: 16
- LoRA Alpha: 32
- Framework: LLaMA-Factory
## Key Features

- **Lightweight:** Only ~100MB vs. 16GB for the full model
- **CoT Reasoning:** Generates ranking explanations
- **Listwise:** Considers relationships between documents
- **State-of-the-Art:** Outperforms GPT-4 on NovelEval
- **Efficient:** Faster training and deployment
## Usage

### Load with PEFT
```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load LoRA adapter (automatically loads the base model)
adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    torch_dtype=dtype,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare ranking prompt
query = "When did Thomas Edison invent the light bulb?"
documents = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for car but failed",
    "Coffee is good for diet",
    "KEPCO fixes light problems",
    "Thomas Edison invented the light bulb in 1879",
]

doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.
{doc_list}
Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

# Generate ranking (greedy decoding, so no sampling parameters are needed)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

ranking = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking}")
# Output: [4] > [1] > [0] > [3] > [2]
```
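### Merge Adapter for Deployment

If you prefer a standalone checkpoint, the LoRA weights can be folded into the base model with PEFT's `merge_and_unload`, so inference no longer routes through the adapter. A minimal sketch (the output directory name is illustrative):

```python
# Fold the LoRA weights into the base model for standalone serving
merged_model = model.merge_and_unload()

# Save the merged weights and tokenizer (directory name is illustrative)
merged_model.save_pretrained("dear-8b-listwise-merged")
tokenizer.save_pretrained("dear-8b-listwise-merged")
```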
### 4-bit Quantization (Low Memory)
```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

# Load with 4-bit quantization (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
### Complete Reranking Pipeline
```python
import re
from typing import List

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM


class ListwiseLoRAReranker:
    def __init__(self, adapter_path: str):
        self.tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True)
        self.model = AutoPeftModelForCausalLM.from_pretrained(
            adapter_path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            low_cpu_mem_usage=True,
        )
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def create_prompt(self, query: str, documents: List[str]) -> str:
        doc_list = "\n".join([f"[{i}] {doc[:300]}" for i, doc in enumerate(documents)])
        return f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.
{doc_list}
Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

    def parse_ranking(self, text: str, num_docs: int) -> List[int]:
        # Extract identifiers like [3]; keep the first occurrence of each valid index
        numbers = re.findall(r'\[(\d+)\]', text)
        ranking = []
        for n in numbers:
            idx = int(n)
            if idx < num_docs and idx not in ranking:
                ranking.append(idx)
        # Append any documents the model omitted
        for i in range(num_docs):
            if i not in ranking:
                ranking.append(i)
        return ranking[:num_docs]

    def rerank(self, query: str, documents: List[str]) -> List[int]:
        prompt = self.create_prompt(query, documents)
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=50,
                do_sample=False,
                pad_token_id=self.tokenizer.pad_token_id,
            )
        output_text = self.tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[1]:],
            skip_special_tokens=True,
        )
        return self.parse_ranking(output_text, len(documents))


# Usage
reranker = ListwiseLoRAReranker("abdoelsayed/dear-8b-reranker-listwise-lora-v1")
ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
```
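In a retrieve-then-rerank setup, the listwise adapter only sees a small candidate set produced by a first-stage retriever. The sketch below illustrates that flow with the external `rank_bm25` package as a stand-in first stage; the package, the whitespace tokenization, and the top-20 cutoff are assumptions for illustration, not part of this model card.

```python
# Illustrative retrieve-then-rerank flow (rank_bm25 is an external, assumed dependency)
from rank_bm25 import BM25Okapi

corpus = documents  # in practice, a large passage collection
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

scores = bm25.get_scores(query.lower().split())

# Take the top-k first-stage candidates, then let the listwise reranker order them
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:20]
candidates = [corpus[i] for i in top_k]

reranked = reranker.rerank(query, candidates)
final_order = [top_k[i] for i in reranked]  # map back to corpus indices
print(f"Final order (corpus indices): {final_order}")
```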
## Training Details

### LoRA Configuration
```yaml
lora_rank: 16
lora_alpha: 32
target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- up_proj
- down_proj
lora_dropout: 0.05
task_type: CAUSAL_LM
```
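For reference, the same adapter configuration expressed with PEFT's `LoraConfig`, as a sketch for readers who reproduce the setup directly with `peft` rather than LLaMA-Factory (only the values above come from this card):

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,                      # lora_rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type=TaskType.CAUSAL_LM,
)
```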
### Training Setup
- Framework: LLaMA-Factory
- Dataset: DeAR-COT
- Learning Rate: 1e-5
- Batch Size: 4
- Gradient Accumulation: 4
- Epochs: 2
- Max Length: 2048
- GPUs: 4x A100 (80GB)
- Training Time: ~24 hours (~3x faster than full fine-tuning)
- Memory: ~50GB per GPU
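A LLaMA-Factory style training config matching the settings above might look roughly like the following. The key names follow LLaMA-Factory's YAML conventions, but the exact recipe is an assumption here; consult the GitHub repository for the actual training files.

```yaml
# Illustrative LLaMA-Factory SFT config (key names assumed; values taken from this card)
model_name_or_path: meta-llama/Llama-3.1-8B
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target: q_proj,v_proj,k_proj,o_proj,gate_proj,up_proj,down_proj
dataset: dear_cot              # registered name for abdoelsayed/DeAR-COT (assumed)
cutoff_len: 2048
learning_rate: 1.0e-5
num_train_epochs: 2
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
bf16: true
output_dir: saves/dear-8b-listwise-lora
```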
## Advantages of LoRA
| Feature | LoRA | Full Model |
|---|---|---|
| Storage | 100MB | 16GB |
| Training Time | 24h | 72h |
| Training Memory | 50GB | 70GB |
| Performance | 99% | 100% |
| Deployment | Fast | Slow |
## Performance Comparison

### TREC Deep Learning (NDCG@10)
| Method | DL19 | DL20 | Avg |
|---|---|---|---|
| LoRA | 77.6 | 75.3 | 76.5 |
| Full | 77.9 | 75.6 | 76.8 |
| RankGPT-4 | 75.6 | 70.6 | 73.1 |
### NovelEval
| Method | NDCG@10 |
|---|---|
| LoRA | 90.6 |
| Full | 91.0 |
| GPT-4 | 87.9 |
## When to Use

**Best for:**
- Resource-constrained environments
- Multiple domain-specific versions
- Fast experimentation
- Complex reasoning queries

**Use the full model for:**
- Absolute maximum performance
- Single production deployment
## Limitations
- Slightly lower performance (-0.3 NDCG@10)
- Still slower than pointwise models (~11s)
- Limited to ~20-50 documents per query (a sliding-window workaround is sketched below)
- Requires base model for inference
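The document-count limit is typically worked around by reranking long candidate lists with a sliding window, as is common for listwise LLM rerankers. A minimal sketch built on the `ListwiseLoRAReranker` class from the usage section (the window and stride values are illustrative, not taken from the paper):

```python
from typing import List

def sliding_window_rerank(reranker, query: str, documents: List[str],
                          window: int = 20, stride: int = 10) -> List[int]:
    """Rerank a long candidate list with a fixed-size window that slides
    from the bottom of the list to the top (illustrative parameters)."""
    order = list(range(len(documents)))
    end = len(order)
    while end > 0:
        start = max(0, end - window)
        window_ids = order[start:end]
        # Rerank only the documents inside the current window
        ranked = reranker.rerank(query, [documents[i] for i in window_ids])
        order[start:end] = [window_ids[i] for i in ranked]
        end -= stride
    return order

# Example: final_order = sliding_window_rerank(reranker, query, candidate_passages)
```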
## Related Models

**Full Version:**

**Other LoRA:**

**Resources:**
## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```
## License
MIT License
## More Information
- GitHub: DataScienceUIBK/DeAR-Reranking
- Paper: arXiv:2508.16998
- Collection: DeAR Models