|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: mit |
|
|
library_name: peft |
|
|
tags: |
|
|
- reranking |
|
|
- information-retrieval |
|
|
- listwise |
|
|
- lora |
|
|
- peft |
|
|
- generative |
|
|
base_model: meta-llama/Llama-3.1-8B |
|
|
datasets: |
|
|
- abdoelsayed/DeAR-COT |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# DeAR-8B-Reranker-Listwise-LoRA-v1 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**DeAR-8B-Reranker-Listwise-LoRA-v1** is a LoRA adapter for listwise neural reranking. It enables generative document ranking with Chain-of-Thought reasoning while requiring only ~100MB of storage, and achieves near-full-model performance on complex ranking tasks.
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Model Type:** LoRA Adapter for Listwise Reranking |
|
|
- **Base Model:** meta-llama/Llama-3.1-8B |
|
|
- **Adapter Size:** ~100MB |
|
|
- **Training Method:** LoRA with Supervised Fine-tuning + CoT |
|
|
- **LoRA Rank:** 16 |
|
|
- **LoRA Alpha:** 32 |
|
|
- **Framework:** LLaMA-Factory |
|
|
|
|
|
## Key Features |
|
|
|
|
|
✅ **Lightweight:** Only 100MB vs 16GB full model

✅ **CoT Reasoning:** Generates ranking explanations

✅ **Listwise:** Considers document relationships

✅ **State-of-the-Art:** Outperforms GPT-4 on NovelEval

✅ **Efficient:** Faster training and deployment
|
|
|
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Load with PEFT |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoTokenizer |
|
|
from peft import AutoPeftModelForCausalLM |
|
|
|
|
|
# Load LoRA adapter (automatically loads base model) |
|
|
adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1" |
|
|
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16 |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True) |
|
|
model = AutoPeftModelForCausalLM.from_pretrained( |
|
|
adapter_path, |
|
|
torch_dtype=dtype, |
|
|
device_map="auto", |
|
|
trust_remote_code=True, |
|
|
low_cpu_mem_usage=True |
|
|
) |
|
|
|
|
|
if tokenizer.pad_token is None: |
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
|
|
# Prepare ranking prompt |
|
|
query = "When did Thomas Edison invent the light bulb?" |
|
|
documents = [ |
|
|
"Lightning strike at Seoul National University", |
|
|
"Thomas Edison tried to invent a device for car but failed", |
|
|
"Coffee is good for diet", |
|
|
"KEPCO fixes light problems", |
|
|
"Thomas Edison invented the light bulb in 1879", |
|
|
] |
|
|
|
|
|
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)]) |
|
|
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier []. |
|
|
Rank the passages based on their relevance to the search query: {query}. |
|
|
|
|
|
{doc_list} |
|
|
|
|
|
Search Query: {query}. |
|
|
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.""" |
|
|
|
|
|
# Generate ranking |
|
|
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048) |
|
|
inputs = {k: v.to(model.device) for k, v in inputs.items()} |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=50, |
|
|
do_sample=False, |
|
|
pad_token_id=tokenizer.pad_token_id |
|
|
) |
|
|
|
|
|
ranking = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True) |
|
|
print(f"Ranking: {ranking}") |
|
|
# Output: [4] > [1] > [0] > [3] > [2] |
|
|
``` |
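The model returns the ranking as text, so downstream code has to parse it back into indices. A minimal sketch using the `ranking` string and `documents` list from the block above (the `ListwiseLoRAReranker` class below wraps this more robustly):

```python
import re

# Parse "[4] > [1] > ..." into document indices and reorder the candidates
order = [int(i) for i in re.findall(r"\[(\d+)\]", ranking) if int(i) < len(documents)]
reranked = [documents[i] for i in order]
print(reranked[0])  # "Thomas Edison invented the light bulb in 1879"
```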
|
|
|
|
|
### 4-bit Quantization (Low Memory) |
|
|
|
|
|
```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"

# Load with 4-bit quantization (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
|
|
|
|
|
### Complete Reranking Pipeline |
|
|
|
|
|
```python |
|
|
import re
import torch
from typing import List

from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
|
|
|
|
|
class ListwiseLoRAReranker: |
|
|
def __init__(self, adapter_path: str): |
|
|
self.tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True) |
|
|
self.model = AutoPeftModelForCausalLM.from_pretrained( |
|
|
adapter_path, |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto", |
|
|
low_cpu_mem_usage=True |
|
|
) |
|
|
|
|
|
if self.tokenizer.pad_token is None: |
|
|
self.tokenizer.pad_token = self.tokenizer.eos_token |
|
|
|
|
|
def create_prompt(self, query: str, documents: List[str]) -> str: |
|
|
doc_list = "\n".join([f"[{i}] {doc[:300]}" for i, doc in enumerate(documents)]) |
|
|
return f"""I will provide you with {len(documents)} passages, each indicated by a number identifier []. |
|
|
Rank the passages based on their relevance to the search query: {query}. |
|
|
|
|
|
{doc_list} |
|
|
|
|
|
Search Query: {query}. |
|
|
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.""" |
|
|
|
|
|
def parse_ranking(self, text: str, num_docs: int) -> List[int]: |
|
|
        numbers = re.findall(r'\[(\d+)\]', text)
        # Keep the first occurrence of each valid index, preserving model order
        seen = set()
        ranking = []
        for n in numbers:
            idx = int(n)
            if idx < num_docs and idx not in seen:
                seen.add(idx)
                ranking.append(idx)
|
|
|
|
|
# Add missing docs |
|
|
for i in range(num_docs): |
|
|
if i not in ranking: |
|
|
ranking.append(i) |
|
|
|
|
|
return ranking[:num_docs] |
|
|
|
|
|
def rerank(self, query: str, documents: List[str]) -> List[int]: |
|
|
prompt = self.create_prompt(query, documents) |
|
|
inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048) |
|
|
inputs = {k: v.to(self.model.device) for k, v in inputs.items()} |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = self.model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=50, |
|
|
do_sample=False, |
|
|
pad_token_id=self.tokenizer.pad_token_id |
|
|
) |
|
|
|
|
|
output_text = self.tokenizer.decode( |
|
|
outputs[0][inputs['input_ids'].shape[1]:], |
|
|
skip_special_tokens=True |
|
|
) |
|
|
|
|
|
return self.parse_ranking(output_text, len(documents)) |
|
|
|
|
|
# Usage, reusing `query` and `documents` from the first example
|
|
reranker = ListwiseLoRAReranker("abdoelsayed/dear-8b-reranker-listwise-lora-v1") |
|
|
ranking = reranker.rerank(query, documents) |
|
|
print(f"Ranked indices: {ranking}") |
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### LoRA Configuration |
|
|
```yaml |
|
|
lora_rank: 16 |
|
|
lora_alpha: 32 |
|
|
target_modules: |
|
|
- q_proj |
|
|
- v_proj |
|
|
- k_proj |
|
|
- o_proj |
|
|
- gate_proj |
|
|
- up_proj |
|
|
- down_proj |
|
|
lora_dropout: 0.05 |
|
|
task_type: CAUSAL_LM |
|
|
``` |
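The same configuration expressed as a PEFT `LoraConfig`, for readers reproducing the setup directly with the peft library rather than LLaMA-Factory (values taken from the YAML above):

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,                 # lora_rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type=TaskType.CAUSAL_LM,
)
```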
|
|
|
|
|
### Training Setup |
|
|
- **Framework:** LLaMA-Factory |
|
|
- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT) |
|
|
- **Learning Rate:** 1e-5 |
|
|
- **Batch Size:** 4 |
|
|
- **Gradient Accumulation:** 4 |
|
|
- **Epochs:** 2 |
|
|
- **Max Length:** 2048 |
|
|
- **GPUs:** 4x A100 (80GB) |
|
|
- **Training Time:** ~24 hours (~3x faster than full fine-tuning)
|
|
- **Memory:** ~50GB per GPU |
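
Outside LLaMA-Factory, the hyperparameters above map to transformers `TrainingArguments` roughly as follows (a sketch under the stated setup; `output_dir` is a hypothetical name, and the effective batch size is 4 x 4 accumulation x 4 GPUs = 64):

```python
from transformers import TrainingArguments

# Sketch of the training setup above, for reproduction outside LLaMA-Factory
training_args = TrainingArguments(
    output_dir="dear-8b-listwise-lora",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,       # effective batch = 4 * 4 * num_gpus
    num_train_epochs=2,
    bf16=True,
    logging_steps=10,
)
```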
|
|
|
|
|
## Advantages of LoRA |
|
|
|
|
|
| Feature | LoRA | Full Model | |
|
|
|---------|------|------------| |
|
|
| Storage | 100MB | 16GB | |
|
|
| Training Time | 24h | 72h | |
|
|
| Training Memory | 50GB | 70GB | |
|
|
| Performance | 99% | 100% | |
|
|
| Deployment | Fast | Slow | |
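
The "Deployment" row above assumes adapter-style loading; if you prefer a single standalone checkpoint, PEFT can fold the LoRA weights into the base model. A minimal sketch, assuming `model` and `tokenizer` from the "Load with PEFT" example (the merged model is a plain transformers model the size of the full 16GB checkpoint):

```python
# Fold the LoRA weights into the base model for standalone deployment
merged = model.merge_and_unload()  # returns a plain transformers model
merged.save_pretrained("dear-8b-listwise-merged")    # hypothetical output dir
tokenizer.save_pretrained("dear-8b-listwise-merged")
```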
|
|
|
|
|
## Performance Comparison |
|
|
|
|
|
### TREC Deep Learning (NDCG@10)
|
|
|
|
|
| Method | DL19 | DL20 | Avg | |
|
|
|--------|------|------|-----| |
|
|
| LoRA | 77.6 | 75.3 | 76.5 | |
|
|
| Full | 77.9 | 75.6 | 76.8 | |
|
|
| RankGPT-4 | 75.6 | 70.6 | 73.1 | |
|
|
|
|
|
### NovelEval |
|
|
|
|
|
| Method | NDCG@10 | |
|
|
|--------|---------| |
|
|
| **LoRA** | **90.6** | |
|
|
| Full | 91.0 | |
|
|
| GPT-4 | 87.9 | |
|
|
|
|
|
## When to Use |
|
|
|
|
|
**Best for:** |
|
|
- ✅ Resource-constrained environments
- ✅ Multiple domain-specific versions
- ✅ Fast experimentation
- ✅ Complex reasoning queries
|
|
|
|
|
**Use full model for:** |
|
|
- ❌ Absolute maximum performance
- ❌ Single production deployment
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Slightly lower performance than the full fine-tuned model (-0.3 NDCG@10)
- Still slower than pointwise rerankers (~11 s per query)
- Limited to roughly 20-50 documents per query in a single prompt (see the sliding-window sketch below)
- Requires the base model weights (meta-llama/Llama-3.1-8B) at inference time
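
For candidate lists longer than a single prompt can hold, a sliding-window pass over the list is a common workaround in listwise reranking (popularized by RankGPT-style rerankers). The sketch below is an illustration layered on the `ListwiseLoRAReranker` class above, not necessarily the exact DeAR procedure:

```python
from typing import List

def sliding_window_rerank(
    reranker: "ListwiseLoRAReranker",
    query: str,
    documents: List[str],
    window: int = 20,
    step: int = 10,
) -> List[int]:
    """Rerank a long candidate list with overlapping windows, moving from
    the back of the list to the front so relevant documents bubble up."""
    order = list(range(len(documents)))
    start = max(0, len(order) - window)
    while True:
        chunk = order[start:start + window]
        local = reranker.rerank(query, [documents[i] for i in chunk])
        order[start:start + window] = [chunk[i] for i in local]
        if start == 0:
            break
        start = max(0, start - step)
    return order

ranking = sliding_window_rerank(reranker, query, documents)
```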
|
|
|
|
|
## Related Models |
|
|
|
|
|
**Full Version:** |
|
|
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) |
|
|
|
|
|
**Other LoRA:** |
|
|
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1) |
|
|
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) |
|
|
|
|
|
**Resources:** |
|
|
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT) |
|
|
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher) |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{abdallah2025dear, |
|
|
title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation}, |
|
|
author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam}, |
|
|
journal={arXiv preprint arXiv:2508.16998}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
MIT License |
|
|
|
|
|
## More Information |
|
|
|
|
|
- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking) |
|
|
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998) |
|
|
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking) |
|
|
|