---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- ranknet
- efficient
- llama
base_model: meta-llama/Llama-3.2-3B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-3B-Reranker-RankNet-v1

## Model Description

**DeAR-3B-Reranker-RankNet-v1** is a 3B parameter efficient neural reranker trained with RankNet loss and knowledge distillation. This model offers the best speed-performance tradeoff in the DeAR family, achieving competitive results with significantly faster inference than larger models.

## Model Details

- **Model Type:** Pointwise Reranker (Sequence Classification)
- **Base Model:** LLaMA-3.2-3B
- **Parameters:** 3 billion
- **Training Method:** Knowledge Distillation + RankNet Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO + DeAR-COT
- **Precision:** BFloat16

## Key Features

✅ **Ultra Fast:** 1.5s inference (1.5x faster than 8B models)  
✅ **Efficient:** Runs on single 16GB GPU  
✅ **Strong Performance:** Competitive with larger models  
✅ **Low Latency:** Ideal for production deployments  
✅ **Small Footprint:** Only 6GB model size  


**Speed-Performance Tradeoff:** 95% accuracy at 1.5x speed!

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ranknet-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
model.eval().cuda()

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
    
print(f"Relevance score: {score}")
```

### Batch Reranking

```python
from typing import List, Tuple

@torch.inference_mode()
def rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 64):
    """
    Rerank documents for a query.
    
    Args:
        docs: List of (title, text) tuples
    
    Returns:
        List of (index, score) sorted by relevance
    """
    device = next(model.parameters()).device
    scores = []
    
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {title} {text}" for title, text in batch]
        
        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]

ranking = rerank(tokenizer, model, query, docs)
print(ranking)
# DeAR-P-3B-RL Output:
# [(4, -1.3046875), (1, -5.125), (3, -6.3125), (0, -6.4375), (2, -6.96875)]
```

## Training Details

### Training Configuration
```python
{
    "base_model": "meta-llama/Llama-3.2-3B",
    "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
    "loss": "RankNet",
    "distillation": {
        "temperature": 2.0,
        "alpha": 0.1
    },
    "learning_rate": 1e-4,
    "batch_size": 4,
    "gradient_accumulation": 2,
    "epochs": 2,
    "max_length": 228,
    "bf16": true
}
```

### Hardware
- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~18 hours (2x faster than 8B)
- **Memory Usage:** ~24GB per GPU
- **Framework:** DeepSpeed ZeRO Stage 2

### Loss Function

**RankNet Loss** with Knowledge Distillation:
```
L_total = (1 - α) * L_RankNet + α * L_KD
where α = 0.1, temperature = 2.0
```

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP |
|---------|---------|---------|--------|-----|
| DL19 | 71.2 | 67.8 | 84.5 | 42.1 |
| DL20 | 69.4 | 66.2 | 82.3 | 40.5 |

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 65.8 |
| NQ | 49.2 |
| HotpotQA | 58.4 |
| FiQA | 44.1 |
| ArguAna | 56.2 |
| SciFact | 70.8 |
| TREC-COVID | 82.3 |
| NFCorpus | 37.6 |
| **Average** | **42.1** |

### Efficiency Metrics

| Metric | Value |
|--------|-------|
| Inference Time (100 docs) | 1.5s |
| Throughput | ~67 docs/sec |
| GPU Memory (inference) | 12GB |
| Model Size (BF16) | 6GB |

## Comparison

### vs. Larger Models

| Model | Size | DL19 | DL20 | BEIR | Speed (s) |
|-------|------|------|------|------|-----------|
| **DeAR-3B-RL** | 3B | 71.2 | 69.4 | 42.1 | **1.5** |
| DeAR-8B-RL | 8B | 74.5 | 72.8 | 45.2 | 2.2 |
| Teacher-13B | 13B | 73.8 | 71.2 | 44.8 | 5.8 |
| MonoT5-3B | 3B | 71.8 | 68.9 | 43.5 | 3.5 |

**Key Insight:** Similar accuracy to MonoT5-3B with 2.3x faster inference!

### Speed-Accuracy Tradeoff

```
Accuracy: 95% of 8B model performance
Speed: 1.5x faster
Memory: 50% less GPU memory
Size: 38% smaller on disk
```

## Model Architecture

```
Input: "query: [Q] [SEP] document: [D]"
    ↓
LLaMA-3.2-3B Encoder (24 layers)
    ↓
[CLS] Token Representation
    ↓
Linear Classification Head
    ↓
Relevance Score
```

## When to Use This Model

**Best for:**
- ✅ Production deployments requiring low latency
- ✅ Resource-constrained environments
- ✅ Large-scale reranking (millions of queries)
- ✅ Cost-sensitive applications
- ✅ Single GPU inference

**Consider 8B models for:**
- ❌ Maximum accuracy required
- ❌ Research benchmarks
- ❌ GPU resources not a constraint

## Deployment Recommendations

### Production Setup

```python
# Optimize for inference
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ranknet-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

# Enable torch.compile for 20% speedup (PyTorch 2.0+)
model = torch.compile(model, mode="reduce-overhead")
```

### Batch Processing

For maximum throughput:
- Use batch size 64-128
- Enable mixed precision (bf16)
- Use torch.compile()
- Consider ONNX export for CPU deployment

## Limitations

1. **Accuracy:** ~3 NDCG@10 points lower than 8B models
2. **Complex Queries:** May struggle with nuanced queries
3. **Document Length:** Same 196 token limit as larger models
4. **Language:** English only

## Fine-tuning

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ranknet-v1"
)

training_args = TrainingArguments(
    output_dir="./finetuned-3b",
    learning_rate=5e-6,  # Lower for fine-tuning
    per_device_train_batch_size=8,
    num_train_epochs=2,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
)

trainer.train()
```

## Related Models

**DeAR 3B Family:**
- [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Binary Cross-Entropy variant
- [DeAR-3B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-lora-v1) - LoRA adapter

**Larger Models:**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - Better accuracy

**Resources:**
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)