---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- listwise
- lora
- peft
- generative
base_model: meta-llama/Llama-3.1-8B
datasets:
- abdoelsayed/DeAR-COT
pipeline_tag: text-generation
---

# DeAR-8B-Reranker-Listwise-LoRA-v1

## Model Description

**DeAR-8B-Reranker-Listwise-LoRA-v1** is a LoRA adapter for listwise neural reranking. It enables generative document ranking with Chain-of-Thought reasoning while requiring only ~100MB of storage, and it achieves near full-model performance on complex ranking tasks.

## Model Details

- **Model Type:** LoRA Adapter for Listwise Reranking
- **Base Model:** meta-llama/Llama-3.1-8B
- **Adapter Size:** ~100MB
- **Training Method:** LoRA with Supervised Fine-tuning + CoT
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Framework:** LLaMA-Factory

## Key Features

✅ **Lightweight:** Only 100MB vs 16GB full model  
✅ **CoT Reasoning:** Generates ranking explanations  
✅ **Listwise:** Considers document relationships  
✅ **State-of-the-Art:** Outperforms GPT-4 on NovelEval  
✅ **Efficient:** Faster training and deployment  

## Usage

### Load with PEFT

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load LoRA adapter (automatically loads base model)
adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    torch_dtype=dtype,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare ranking prompt
query = "When did Thomas Edison invent the light bulb?"
documents = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for car but failed",
    "Coffee is good for diet",
    "KEPCO fixes light problems",
    "Thomas Edison invented the light bulb in 1879",
]

doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

# Generate ranking
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,  # greedy decoding (temperature has no effect without sampling)
        pad_token_id=tokenizer.pad_token_id
    )

ranking = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking}")
# Output: [4] > [1] > [0] > [3] > [2]
```
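
The generated string can be mapped back onto the document list; a minimal sketch of consuming the output (assuming the `ranking` string and `documents` list from the snippet above):

```python
import re

# Turn "[4] > [1] > ..." back into indices and reorder the documents
order = [int(n) for n in re.findall(r"\[(\d+)\]", ranking) if int(n) < len(documents)]
reranked = [documents[i] for i in order]
print(reranked[0])  # "Thomas Edison invented the light bulb in 1879"
```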

### 4-bit Quantization (Low Memory)

```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

# Load with 4-bit quantized base weights via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
```
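
With 4-bit base weights, the 8B model typically fits in well under half the memory of the bf16 setup; it is worth spot-checking ranking quality against bf16 on a few of your own queries.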

### Complete Reranking Pipeline

```python
import re
from typing import List

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

class ListwiseLoRAReranker:
    def __init__(self, adapter_path: str):
        self.tokenizer = AutoTokenizer.from_pretrained(adapter_path, use_fast=True)
        self.model = AutoPeftModelForCausalLM.from_pretrained(
            adapter_path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
    
    def create_prompt(self, query: str, documents: List[str]) -> str:
        doc_list = "\n".join([f"[{i}] {doc[:300]}" for i, doc in enumerate(documents)])
        return f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
    
    def parse_ranking(self, text: str, num_docs: int) -> List[int]:
        # Extract identifiers like "[4]", dropping out-of-range and duplicate ids
        numbers = re.findall(r'\[(\d+)\]', text)
        ranking = []
        for n in numbers:
            idx = int(n)
            if idx < num_docs and idx not in ranking:
                ranking.append(idx)

        # Append any documents the model omitted, in original order
        for i in range(num_docs):
            if i not in ranking:
                ranking.append(i)

        return ranking[:num_docs]
    
    def rerank(self, query: str, documents: List[str]) -> List[int]:
        prompt = self.create_prompt(query, documents)
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=50,
                do_sample=False,
                pad_token_id=self.tokenizer.pad_token_id
            )
        
        output_text = self.tokenizer.decode(
            outputs[0][inputs['input_ids'].shape[1]:],
            skip_special_tokens=True
        )
        
        return self.parse_ranking(output_text, len(documents))

# Usage
reranker = ListwiseLoRAReranker("abdoelsayed/dear-8b-reranker-listwise-lora-v1")
ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
```

## Training Details

### LoRA Configuration
```yaml
lora_rank: 16
lora_alpha: 32
target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
lora_dropout: 0.05
task_type: CAUSAL_LM
```
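
For reference, the same configuration expressed as a PEFT `LoraConfig` (an illustrative reconstruction; the actual training ran through LLaMA-Factory):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                     # lora_rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```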

### Training Setup
- **Framework:** LLaMA-Factory
- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Gradient Accumulation:** 4
- **Epochs:** 2
- **Max Length:** 2048
- **GPUs:** 4x A100 (80GB)
- **Training Time:** ~24 hours (3x faster than full)
- **Memory:** ~50GB per GPU
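
Assuming the batch size above is per device, the effective batch size works out to 4 × 4 (gradient accumulation) × 4 (GPUs) = 64 sequences per optimizer step.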

## Advantages of LoRA

| Feature | LoRA | Full Model |
|---------|------|------------|
| Storage | 100MB | 16GB |
| Training Time | 24h | 72h |
| Training Memory | 50GB | 70GB |
| Performance | 99% | 100% |
| Deployment | Fast | Slow |

## Performance Comparison

### TREC Deep Learning (NDCG@10)

| Method | DL19 | DL20 | Avg |
|--------|------|------|-----|
| LoRA | 77.6 | 75.3 | 76.5 |
| Full | 77.9 | 75.6 | 76.8 |
| RankGPT-4 | 75.6 | 70.6 | 73.1 |

### NovelEval

| Method | NDCG@10 |
|--------|---------|
| **LoRA** | **90.6** |
| Full | 91.0 |
| GPT-4 | 87.9 |

## When to Use

**Best for:**
- ✅ Resource-constrained environments
- ✅ Multiple domain-specific versions
- ✅ Fast experimentation
- ✅ Complex reasoning queries

**Use full model for:**
- ❌ Absolute maximum performance
- ❌ Single production deployment (though the adapter can be merged into the base model, as sketched below)
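
If you later need a standalone checkpoint for a single production deployment, the adapter can be merged into the base weights with PEFT; a minimal sketch (output path is illustrative):

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_path = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"

# Merge the LoRA weights into the base model; the result is a plain
# transformers checkpoint that no longer needs PEFT at load time.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()
merged.save_pretrained("dear-8b-listwise-merged")  # illustrative output path
AutoTokenizer.from_pretrained(adapter_path).save_pretrained("dear-8b-listwise-merged")
```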

## Limitations

- Slightly lower performance than the full model (−0.3 NDCG@10 on average)
- Still slower than pointwise rerankers (~11 s per query)
- Limited to ~20-50 documents per query (a sliding-window pass, sketched below, can extend this)
- Requires the base model weights at inference time
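
For candidate lists longer than a single prompt can hold, a RankGPT-style sliding window is a common workaround. The helper below is a hypothetical sketch built on the `ListwiseLoRAReranker` class above, not part of this release:

```python
from typing import List

def rerank_sliding(reranker: "ListwiseLoRAReranker", query: str,
                   documents: List[str], window: int = 20, stride: int = 10) -> List[int]:
    """Rank a long list by sliding a fixed-size window from back to front."""
    order = list(range(len(documents)))
    start = max(len(documents) - window, 0)
    while True:
        # Rerank the current window and write the new order back in place,
        # so strong documents bubble toward the top across passes.
        idx = order[start:start + window]
        local = reranker.rerank(query, [documents[i] for i in idx])
        order[start:start + window] = [idx[j] for j in local]
        if start == 0:
            break
        start = max(start - stride, 0)
    return order
```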

## Related Models

**Full Version:**
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1)

**Other LoRA:**
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1)
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1)

**Resources:**
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)