---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- ranknet
- efficient
- llama
base_model: meta-llama/Llama-3.2-3B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-3B-Reranker-RankNet-v1

## Model Description

**DeAR-3B-Reranker-RankNet-v1** is an efficient 3-billion-parameter pointwise reranker trained with a RankNet loss and knowledge distillation from a 13B teacher. It offers the best speed-performance tradeoff in the DeAR family, delivering ranking quality competitive with much larger models at significantly lower inference cost.

## Model Details

- **Model Type:** Pointwise Reranker (Sequence Classification)
- **Base Model:** LLaMA-3.2-3B
- **Parameters:** 3 billion
- **Training Method:** Knowledge Distillation + RankNet Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO + DeAR-COT
- **Precision:** BFloat16

## Key Features

✅ **Ultra Fast:** ~1.5 s to rerank 100 documents (about 1.5x faster than the 8B models)
✅ **Efficient:** Runs on a single 16GB GPU
✅ **Strong Performance:** Competitive with larger models
✅ **Low Latency:** Ideal for production deployments
✅ **Small Footprint:** Only ~6GB in BF16

**Speed-Performance Tradeoff:** roughly 95% of the 8B model's accuracy at about 1.5x its speed.

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-3b-reranker-ranknet-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    # Ensure a pad token exists; LLaMA tokenizers may not define one.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
)
model.eval().cuda()

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length",
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Batch Reranking

```python
from typing import List, Tuple

@torch.inference_mode()
def rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 64):
    """
    Rerank documents for a query.

    Args:
        docs: List of (title, text) tuples

    Returns:
        List of (index, score) sorted by relevance
    """
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        documents = [f"document: {title} {text}" for title, text in batch]

        inputs = tokenizer(
            queries,
            documents,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)


# Example
query = "When did Thomas Edison invent the light bulb?"
docs = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]

ranking = rerank(tokenizer, model, query, docs)
print(ranking)
# DeAR-P-3B-RL Output:
# [(4, -1.3046875), (1, -5.125), (3, -6.3125), (0, -6.4375), (2, -6.96875)]
```

## Training Details

### Training Configuration
```json
{
  "base_model": "meta-llama/Llama-3.2-3B",
  "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
  "loss": "RankNet",
  "distillation": {
    "temperature": 2.0,
    "alpha": 0.1
  },
  "learning_rate": 1e-4,
  "batch_size": 4,
  "gradient_accumulation": 2,
  "epochs": 2,
  "max_length": 228,
  "bf16": true
}
```

### Hardware
- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~18 hours (about 2x faster than the 8B models)
- **Memory Usage:** ~24GB per GPU
- **Framework:** DeepSpeed ZeRO Stage 2 (a sketch of a comparable configuration follows this list)
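
The card names DeepSpeed ZeRO Stage 2 but does not ship the configuration file. The dict below is a hedged sketch of a typical ZeRO-2 + bf16 setup that mirrors the training configuration above (micro-batch 4, gradient accumulation 2); it is an assumption, not the authors' actual file.

```python
# Hypothetical ZeRO-2 + bf16 DeepSpeed config (assumed, not from the original run).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 2,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # shard optimizer states and gradients
        "overlap_comm": True,          # overlap communication with the backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
}

# With the Hugging Face Trainer, the dict can be passed directly:
# TrainingArguments(..., deepspeed=ds_config)
```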

### Loss Function

**RankNet Loss** with Knowledge Distillation:
```
L_total = (1 - α) * L_RankNet + α * L_KD
where α = 0.1, temperature = 2.0
```
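
The snippet below is a minimal PyTorch sketch of this objective, assuming per-query score lists from the student and the teacher plus graded relevance labels; the actual training code may differ in details such as pair sampling and masking.

```python
import torch
import torch.nn.functional as F

def ranknet_kd_loss(student_scores, teacher_scores, labels, alpha=0.1, temperature=2.0):
    """L_total = (1 - alpha) * L_RankNet + alpha * L_KD (hedged sketch).

    student_scores, teacher_scores: [batch, n_docs] relevance scores per query.
    labels: [batch, n_docs] graded relevance used to form preference pairs.
    """
    # RankNet: for every document pair where i is more relevant than j,
    # push the student's score margin s_i - s_j toward a positive value.
    score_diff = student_scores.unsqueeze(-1) - student_scores.unsqueeze(-2)  # [B, n, n]
    label_diff = labels.unsqueeze(-1) - labels.unsqueeze(-2)
    pair_mask = (label_diff > 0).float()
    pairwise = F.binary_cross_entropy_with_logits(
        score_diff, torch.ones_like(score_diff), reduction="none"
    )
    l_ranknet = (pairwise * pair_mask).sum() / pair_mask.sum().clamp(min=1.0)

    # Distillation: match the student's softened score distribution to the teacher's.
    l_kd = F.kl_div(
        F.log_softmax(student_scores / temperature, dim=-1),
        F.softmax(teacher_scores / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return (1 - alpha) * l_ranknet + alpha * l_kd
```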

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP |
|---------|---------|---------|--------|-----|
| DL19 | 71.2 | 67.8 | 84.5 | 42.1 |
| DL20 | 69.4 | 66.2 | 82.3 | 40.5 |

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 65.8 |
| NQ | 49.2 |
| HotpotQA | 58.4 |
| FiQA | 44.1 |
| ArguAna | 56.2 |
| SciFact | 70.8 |
| TREC-COVID | 82.3 |
| NFCorpus | 37.6 |
| **Average** | **42.1** |

### Efficiency Metrics

| Metric | Value |
|--------|-------|
| Inference Time (100 docs) | 1.5s |
| Throughput | ~67 docs/sec |
| GPU Memory (inference) | 12GB |
| Model Size (BF16) | 6GB |

## Comparison

### vs. Larger Models

| Model | Size | DL19 | DL20 | BEIR | Latency (s, 100 docs) |
|-------|------|------|------|------|-----------------------|
| **DeAR-3B-RL** | 3B | 71.2 | 69.4 | 42.1 | **1.5** |
| DeAR-8B-RL | 8B | 74.5 | 72.8 | 45.2 | 2.2 |
| Teacher-13B | 13B | 73.8 | 71.2 | 44.8 | 5.8 |
| MonoT5-3B | 3B | 71.8 | 68.9 | 43.5 | 3.5 |

**Key Insight:** Similar accuracy to MonoT5-3B with roughly 2.3x faster inference.

### Speed-Accuracy Tradeoff

```
Accuracy: ~95% of the 8B model's performance
Speed:    ~1.5x faster
Memory:   ~50% less GPU memory at inference
Size:     ~38% of the 8B model's size on disk
```

## Model Architecture

```
Input: "query: [Q] [SEP] document: [D]"
          ↓
LLaMA-3.2-3B Backbone (decoder-only, 28 layers)
          ↓
Last-Token Sequence Representation
          ↓
Linear Classification Head
          ↓
Relevance Score
```
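
To check the backbone and head dimensions yourself, the configuration can be inspected directly; the snippet below only reads metadata and assumes the checkpoint exposes a single-logit classification head, as the scoring examples above imply.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("abdoelsayed/dear-3b-reranker-ranknet-v1")
print(config.num_hidden_layers)  # transformer layers in the LLaMA backbone
print(config.hidden_size)        # width of the representation fed to the head
print(config.num_labels)         # expected to be 1: a single relevance logit per pair
```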

## When to Use This Model

**Best for:**
- ✅ Production deployments requiring low latency
- ✅ Resource-constrained environments
- ✅ Large-scale reranking (millions of queries)
- ✅ Cost-sensitive applications
- ✅ Single-GPU inference

**Consider the 8B models instead when:**
- ❌ Maximum accuracy is required
- ❌ You are running research benchmarks
- ❌ GPU resources are not a constraint

## Deployment Recommendations

### Production Setup

```python
# Optimize for inference
model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ranknet-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# Enable torch.compile for a ~20% speedup (PyTorch 2.0+)
model = torch.compile(model, mode="reduce-overhead")
```

### Batch Processing

For maximum throughput (a sketch combining these points follows the list):
- Use a batch size of 64-128
- Enable mixed precision (bf16)
- Use torch.compile()
- Consider ONNX export for CPU deployment
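
A hedged sketch of a throughput-oriented scoring helper, reusing the `model` and `tokenizer` from the Production Setup snippet; padding every batch to the same `max_length` keeps tensor shapes static, which avoids `torch.compile` recompilations between batches. The helper name is an illustrative choice, not part of the original card.

```python
import torch

compiled_model = torch.compile(model, mode="reduce-overhead")  # PyTorch 2.0+

@torch.inference_mode()
def score_batch(query, batch_docs):
    """Score a batch of (title, text) documents against one query."""
    inputs = tokenizer(
        [f"query: {query}"] * len(batch_docs),
        [f"document: {title} {text}" for title, text in batch_docs],
        return_tensors="pt",
        truncation=True,
        max_length=228,
        padding="max_length",  # fixed shapes -> no recompilation between batches
    )
    inputs = {k: v.cuda() for k, v in inputs.items()}
    return compiled_model(**inputs).logits.squeeze(-1).float().cpu().tolist()
```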

## Limitations

1. **Accuracy:** ~3 NDCG@10 points below the 8B models
2. **Complex Queries:** May struggle with nuanced queries
3. **Document Length:** Same 196-token document limit as the larger models
4. **Language:** English only

## Fine-tuning

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-3b-reranker-ranknet-v1"
)

training_args = TrainingArguments(
    output_dir="./finetuned-3b",
    learning_rate=5e-6,  # Lower for fine-tuning
    per_device_train_batch_size=8,
    num_train_epochs=2,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
)

trainer.train()
```
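
The card does not specify the format of `your_dataset`. One way to build it, shown below as a hedged sketch, is to tokenize query-document pairs with a float relevance label; because the checkpoint produces a single relevance logit (as in the Quick Start example), the Trainer's default sequence-classification loss falls back to MSE regression on that label. The example rows are hypothetical.

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abdoelsayed/dear-3b-reranker-ranknet-v1")

# Hypothetical (query, document, relevance) rows for illustration only.
raw = [
    {"query": "what is machine learning", "doc": "Machine learning is ...", "label": 1.0},
    {"query": "what is machine learning", "doc": "Coffee is good for diet", "label": 0.0},
]

def preprocess(example):
    enc = tokenizer(
        f"query: {example['query']}",
        f"document: {example['doc']}",
        truncation=True,
        max_length=228,
        padding="max_length",
    )
    enc["labels"] = float(example["label"])
    return enc

your_dataset = Dataset.from_list(raw).map(preprocess, remove_columns=["query", "doc", "label"])
```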

## Related Models

**DeAR 3B Family:**
- [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Binary Cross-Entropy variant
- [DeAR-3B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-lora-v1) - LoRA adapter

**Larger Models:**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - Better accuracy

**Resources:**
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)