---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- pointwise
- lora
- peft
- binary-cross-entropy
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-CE-LoRA-v1

## Model Description

**DeAR-8B-Reranker-CE-LoRA-v1** is a LoRA (Low-Rank Adaptation) adapter for pointwise neural reranking, trained with binary cross-entropy loss and knowledge distillation. The adapter occupies only ~100MB of storage and, when applied to LLaMA-3.1-8B, reaches near full-model reranking performance with minimal overhead.

## Model Details

- **Model Type:** LoRA Adapter for Pointwise Reranking
- **Base Model:** meta-llama/Llama-3.1-8B
- **Adapter Size:** ~100MB
- **Training Method:** LoRA with Binary Cross-Entropy + Knowledge Distillation
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Trainable Parameters:** 67M (0.8% of total)

## Key Features

✅ **Ultra Lightweight:** Only 100MB storage  
✅ **Efficient:** 3x faster training than full fine-tuning  
✅ **High Performance:** 98% of full model accuracy  
✅ **Easy Integration:** Simple adapter loading  
✅ **Classification-based:** Binary relevance prediction  


## Usage

### Load and Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# Load LoRA adapter
adapter_path = "abdoelsayed/dear-8b-reranker-ce-lora-v1"
config = PeftConfig.from_pretrained(adapter_path)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model and tell it which token is padding, so the classification
# head pools the last non-pad token when inputs are padded
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16
)
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load and merge LoRA
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()
model.eval().cuda()

# Score query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Relevance score: {score}")
```
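
The head outputs an unnormalized logit. Because the adapter was trained with a binary cross-entropy objective, the logit can optionally be passed through a sigmoid when a relevance probability in [0, 1] is more convenient than a raw score (a minimal sketch, continuing from the snippet above):

```python
# Optional: convert the raw logit into a [0, 1] relevance probability.
# This is reasonable because the classification head was trained with BCE.
probability = torch.sigmoid(torch.tensor(score)).item()
print(f"Relevance probability: {probability:.4f}")
```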

### Batch Reranking

```python
@torch.inference_mode()
def rerank(tokenizer, model, query: str, documents, batch_size=64):
    """Score (title, text) documents against a query and return
    (document_index, score) pairs sorted by descending relevance."""
    scores = []
    device = next(model.parameters()).device

    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]

        inputs = tokenizer(queries, docs, return_tensors="pt",
                           truncation=True, max_length=228, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
```
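
A minimal usage example (the query and document list below are illustrative, not from the training data):

```python
# Documents are (title, text) tuples, matching what rerank() expects.
docs = [
    ("Machine learning", "Machine learning is a subset of artificial intelligence..."),
    ("Sourdough bread", "Bread is made by baking a dough of flour, water, and salt..."),
]

ranking = rerank(tokenizer, model, "What is machine learning?", docs, batch_size=2)
for doc_index, score in ranking:
    print(doc_index, round(score, 4), docs[doc_index][0])
```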

## Training Details

### LoRA Configuration
```python
{
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj", 
                       "gate_proj", "up_proj", "down_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "SEQ_CLS"
}
```
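
The same settings expressed as a `peft.LoraConfig` (a sketch of how the adapter could be recreated for training, not the released training script):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_CLS,
)

# Wrap the base model so that only the LoRA matrices (~0.8% of parameters) are trainable.
base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B", num_labels=1
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()
```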

### Training Hyperparameters
- **Learning Rate:** 1e-4
- **Batch Size:** 4
- **Gradient Accumulation:** 2
- **Epochs:** 2
- **Hardware:** 4x A100 (40GB)
- **Training Time:** ~12 hours
- **Memory:** ~28GB per GPU
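
### Training Objective (Sketch)

This card describes the objective only as binary cross-entropy plus knowledge distillation from the teacher model. The snippet below is a rough sketch of what such a pointwise loss can look like; the mixing weight `alpha` and the sigmoid treatment of teacher scores are assumptions, not the released recipe.

```python
import torch
import torch.nn.functional as F

def ce_distill_loss(student_logits, labels, teacher_scores, alpha=0.5):
    # Hard-label term: BCE against 0/1 relevance labels.
    bce = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    # Soft-label term: BCE against the teacher's sigmoid-squashed relevance scores.
    soft_targets = torch.sigmoid(teacher_scores)
    kd = F.binary_cross_entropy_with_logits(student_logits, soft_targets)
    # alpha is an illustrative assumption; the actual weighting is not documented here.
    return alpha * bce + (1.0 - alpha) * kd
```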

## Advantages

| Feature | LoRA Adapter | Full Model |
|---------|--------------|------------|
| Storage | ~100 MB | ~16 GB |
| Training Time | ~12 h | ~34 h |
| Performance | ~98% | 100% |
| Training Memory (per GPU) | ~28 GB | ~38 GB |

## Related Models

- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Full model
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1) - RankNet variant
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)