--- language: - en license: mit library_name: transformers tags: - reranking - information-retrieval - pointwise - ranknet - efficient - llama base_model: meta-llama/Llama-3.2-3B datasets: - Tevatron/msmarco-passage - abdoelsayed/DeAR-COT pipeline_tag: text-classification --- # DeAR-3B-Reranker-RankNet-v1 ## Model Description **DeAR-3B-Reranker-RankNet-v1** is a 3B parameter efficient neural reranker trained with RankNet loss and knowledge distillation. This model offers the best speed-performance tradeoff in the DeAR family, achieving competitive results with significantly faster inference than larger models. ## Model Details - **Model Type:** Pointwise Reranker (Sequence Classification) - **Base Model:** LLaMA-3.2-3B - **Parameters:** 3 billion - **Training Method:** Knowledge Distillation + RankNet Loss - **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher) - **Training Data:** MS MARCO + DeAR-COT - **Precision:** BFloat16 ## Key Features ✅ **Ultra Fast:** 1.5s inference (1.5x faster than 8B models) ✅ **Efficient:** Runs on single 16GB GPU ✅ **Strong Performance:** Competitive with larger models ✅ **Low Latency:** Ideal for production deployments ✅ **Small Footprint:** Only 6GB model size **Speed-Performance Tradeoff:** 95% accuracy at 1.5x speed! ## Usage ### Quick Start ```python import torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Load model model_path = "abdoelsayed/dear-3b-reranker-ranknet-v1" tokenizer = AutoTokenizer.from_pretrained(model_path) model = AutoModelForSequenceClassification.from_pretrained( model_path, torch_dtype=torch.bfloat16 ) model.eval().cuda() # Score a query-document pair query = "What is machine learning?" document = "Machine learning is a subset of artificial intelligence..." inputs = tokenizer( f"query: {query}", f"document: {document}", return_tensors="pt", truncation=True, max_length=228, padding="max_length" ) inputs = {k: v.cuda() for k, v in inputs.items()} with torch.no_grad(): score = model(**inputs).logits.squeeze().item() print(f"Relevance score: {score}") ``` ### Batch Reranking ```python from typing import List, Tuple @torch.inference_mode() def rerank(tokenizer, model, query: str, docs: List[Tuple[str, str]], batch_size: int = 64): """ Rerank documents for a query. Args: docs: List of (title, text) tuples Returns: List of (index, score) sorted by relevance """ device = next(model.parameters()).device scores = [] for i in range(0, len(docs), batch_size): batch = docs[i:i + batch_size] queries = [f"query: {query}"] * len(batch) documents = [f"document: {title} {text}" for title, text in batch] inputs = tokenizer( queries, documents, return_tensors="pt", truncation=True, max_length=228, padding=True ) inputs = {k: v.to(device) for k, v in inputs.items()} logits = model(**inputs).logits.squeeze(-1) scores.extend(logits.cpu().tolist()) return sorted(enumerate(scores), key=lambda x: x[1], reverse=True) # Example query = "When did Thomas Edison invent the light bulb?" docs = [ ("", "Lightning strike at Seoul National University"), ("", "Thomas Edison tried to invent a device for car but failed"), ("", "Coffee is good for diet"), ("", "KEPCO fixes light problems"), ("", "Thomas Edison invented the light bulb in 1879"), ] ranking = rerank(tokenizer, model, query, docs) print(ranking) # DeAR-P-3B-RL Output: # [(4, -1.3046875), (1, -5.125), (3, -6.3125), (0, -6.4375), (2, -6.96875)] ``` ## Training Details ### Training Configuration ```python { "base_model": "meta-llama/Llama-3.2-3B", "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher", "loss": "RankNet", "distillation": { "temperature": 2.0, "alpha": 0.1 }, "learning_rate": 1e-4, "batch_size": 4, "gradient_accumulation": 2, "epochs": 2, "max_length": 228, "bf16": true } ``` ### Hardware - **GPUs:** 4x NVIDIA A100 (40GB) - **Training Time:** ~18 hours (2x faster than 8B) - **Memory Usage:** ~24GB per GPU - **Framework:** DeepSpeed ZeRO Stage 2 ### Loss Function **RankNet Loss** with Knowledge Distillation: ``` L_total = (1 - α) * L_RankNet + α * L_KD where α = 0.1, temperature = 2.0 ``` ## Evaluation Results ### TREC Deep Learning | Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP | |---------|---------|---------|--------|-----| | DL19 | 71.2 | 67.8 | 84.5 | 42.1 | | DL20 | 69.4 | 66.2 | 82.3 | 40.5 | ### BEIR Benchmark | Dataset | NDCG@10 | |---------|---------| | MS MARCO | 65.8 | | NQ | 49.2 | | HotpotQA | 58.4 | | FiQA | 44.1 | | ArguAna | 56.2 | | SciFact | 70.8 | | TREC-COVID | 82.3 | | NFCorpus | 37.6 | | **Average** | **42.1** | ### Efficiency Metrics | Metric | Value | |--------|-------| | Inference Time (100 docs) | 1.5s | | Throughput | ~67 docs/sec | | GPU Memory (inference) | 12GB | | Model Size (BF16) | 6GB | ## Comparison ### vs. Larger Models | Model | Size | DL19 | DL20 | BEIR | Speed (s) | |-------|------|------|------|------|-----------| | **DeAR-3B-RL** | 3B | 71.2 | 69.4 | 42.1 | **1.5** | | DeAR-8B-RL | 8B | 74.5 | 72.8 | 45.2 | 2.2 | | Teacher-13B | 13B | 73.8 | 71.2 | 44.8 | 5.8 | | MonoT5-3B | 3B | 71.8 | 68.9 | 43.5 | 3.5 | **Key Insight:** Similar accuracy to MonoT5-3B with 2.3x faster inference! ### Speed-Accuracy Tradeoff ``` Accuracy: 95% of 8B model performance Speed: 1.5x faster Memory: 50% less GPU memory Size: 38% smaller on disk ``` ## Model Architecture ``` Input: "query: [Q] [SEP] document: [D]" ↓ LLaMA-3.2-3B Encoder (24 layers) ↓ [CLS] Token Representation ↓ Linear Classification Head ↓ Relevance Score ``` ## When to Use This Model **Best for:** - ✅ Production deployments requiring low latency - ✅ Resource-constrained environments - ✅ Large-scale reranking (millions of queries) - ✅ Cost-sensitive applications - ✅ Single GPU inference **Consider 8B models for:** - ❌ Maximum accuracy required - ❌ Research benchmarks - ❌ GPU resources not a constraint ## Deployment Recommendations ### Production Setup ```python # Optimize for inference model = AutoModelForSequenceClassification.from_pretrained( "abdoelsayed/dear-3b-reranker-ranknet-v1", torch_dtype=torch.bfloat16, device_map="auto" ) model.eval() # Enable torch.compile for 20% speedup (PyTorch 2.0+) model = torch.compile(model, mode="reduce-overhead") ``` ### Batch Processing For maximum throughput: - Use batch size 64-128 - Enable mixed precision (bf16) - Use torch.compile() - Consider ONNX export for CPU deployment ## Limitations 1. **Accuracy:** ~3 NDCG@10 points lower than 8B models 2. **Complex Queries:** May struggle with nuanced queries 3. **Document Length:** Same 196 token limit as larger models 4. **Language:** English only ## Fine-tuning ```python from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer model = AutoModelForSequenceClassification.from_pretrained( "abdoelsayed/dear-3b-reranker-ranknet-v1" ) training_args = TrainingArguments( output_dir="./finetuned-3b", learning_rate=5e-6, # Lower for fine-tuning per_device_train_batch_size=8, num_train_epochs=2, bf16=True, ) trainer = Trainer( model=model, args=training_args, train_dataset=your_dataset, ) trainer.train() ``` ## Related Models **DeAR 3B Family:** - [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Binary Cross-Entropy variant - [DeAR-3B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-3b-reranker-ranknet-lora-v1) - LoRA adapter **Larger Models:** - [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - Better accuracy **Resources:** - [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher) - [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT) ## Citation ```bibtex @article{abdallah2025dear, title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation}, author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam}, journal={arXiv preprint arXiv:2508.16998}, year={2025} } ``` ## License MIT License ## More Information - **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking) - **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998) - **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)