---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- llama
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-CE-v1

## Model Description

**DeAR-8B-Reranker-CE-v1** is an 8B-parameter neural reranker trained with binary cross-entropy loss and knowledge distillation. The model uses a classification-based (pointwise) approach to document reranking and is optimized for both accuracy and inference speed.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.1-8B
- **Parameters:** 8 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO
- **Precision:** BFloat16

## Key Features

✅ **Classification-based:** Binary relevance prediction with probabilistic outputs

✅ **Fast Inference:** ~2.2 s to score a batch of 64 documents on a single GPU

✅ **Strong Baseline:** Competitive performance across benchmarks

✅ **CoT Enhanced:** Trained with Chain-of-Thought reasoning from the teacher model

## Performance

| Benchmark | NDCG@10 |
|-----------|---------|
| TREC DL19 | 73.9 |
| TREC DL20 | 72.1 |
| BEIR (Avg) | 44.8 |
| MS MARCO Dev | 68.5 |

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)

# LLaMA tokenizers ship without a pad token; fall back to EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```
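The value returned above is a raw logit. Because the head is trained with a binary cross-entropy objective (and the architecture below lists an optional sigmoid), you can map it to a [0, 1] relevance probability when you need to threshold documents rather than only rank them. A minimal sketch, reusing `score` from the snippet above; the calibration of these probabilities has not been verified here:

```python
import torch

# Optional: squash the raw logit into a pseudo-probability of relevance.
# The ranking order is unchanged; this is only useful for thresholding.
probability = torch.sigmoid(torch.tensor(score)).item()
print(f"Relevance probability: {probability:.4f}")
```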
### Complete Reranking Example

```python
import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def load_reranker(model_path: str, device: str = "cuda"):
    """Load the reranker model and tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16
    )

    # Configure padding token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"

    model.eval()
    model.to(device)
    return tokenizer, model


@torch.inference_mode()
def rerank(
    tokenizer,
    model,
    query: str,
    documents: List[Tuple[str, str]],  # (title, text)
    batch_size: int = 64
) -> List[Tuple[int, float]]:
    """
    Rerank documents for a query.

    Returns:
        List of (doc_index, score) sorted by relevance (descending)
    """
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]

        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]

        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
            return_attention_mask=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Score batch
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    # Rank by score
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example
tokenizer, model = load_reranker("abdoelsayed/dear-8b-reranker-ce-v1")

query = "When did Thomas Edison invent the light bulb?"
documents = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]

ranking = rerank(tokenizer, model, query, documents)
print(ranking)
# Output: [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
# Document at index 4 is most relevant
```

## Training Details

### Training Data

- **Primary Dataset:** MS MARCO Passage Ranking (~8M pairs)
- **CoT Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Teacher Annotations:** Soft labels from the 13B teacher model

### Training Configuration

```json
{
    "base_model": "meta-llama/Llama-3.1-8B",
    "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
    "loss": "Binary Cross-Entropy",
    "distillation": {
        "temperature": 2.0,
        "alpha": 0.1
    },
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 2,
    "gradient_accumulation": 2,
    "epochs": 2,
    "max_length": 228,
    "q_max_len": 32,
    "p_max_len": 196,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "bf16": true
}
```

### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~34 hours
- **Framework:** DeepSpeed ZeRO Stage 2
- **Memory Usage:** ~38GB per GPU

### Loss Function

**Binary Cross-Entropy** with Knowledge Distillation:

```
L_total = (1 - α) * BCE(y_pred, y_true) + α * KL(σ(z_s / T), σ(z_t / T))

where:
- BCE: binary cross-entropy loss
- KL:  KL divergence
- z_s: student logits
- z_t: teacher logits
- T:   temperature (2.0)
- α:   distillation weight (0.1)
- σ:   sigmoid function
```
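A minimal PyTorch sketch of this objective, using the stated hyperparameters (T = 2.0, α = 0.1), is shown below. The function name `distillation_bce_loss` and the exact form of the KL term (each sigmoid output treated as a two-class distribution) are illustrative assumptions, not the released training code:

```python
import torch
import torch.nn.functional as F


def distillation_bce_loss(
    student_logits: torch.Tensor,   # (batch,) raw scores from the 8B student
    teacher_logits: torch.Tensor,   # (batch,) raw scores from the 13B teacher
    labels: torch.Tensor,           # (batch,) binary relevance labels {0, 1}
    temperature: float = 2.0,
    alpha: float = 0.1,
) -> torch.Tensor:
    """Sketch of (1 - α) * BCE + α * KL over temperature-scaled sigmoid outputs."""
    # Hard-label term: binary cross-entropy against the relevance labels.
    bce = F.binary_cross_entropy_with_logits(student_logits, labels.float())

    # Soft-label term: KL divergence between temperature-scaled sigmoid outputs,
    # treating each score as a (relevant, not relevant) two-class distribution.
    p_student = torch.sigmoid(student_logits / temperature)
    p_teacher = torch.sigmoid(teacher_logits / temperature)
    student_dist = torch.stack([p_student, 1.0 - p_student], dim=-1).clamp_min(1e-7)
    teacher_dist = torch.stack([p_teacher, 1.0 - p_teacher], dim=-1)
    kl = F.kl_div(student_dist.log(), teacher_dist, reduction="batchmean")

    return (1.0 - alpha) * bce + alpha * kl
```

With α = 0.1, most of the gradient comes from the hard binary labels; the teacher's soft scores act as a secondary signal, which is consistent with the stable-training behavior noted in the comparison below.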
## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP |
|---------|---------|---------|--------|-----|
| DL19 | 73.90 | 69.82 | 87.3 | 44.92 |
| DL20 | 72.10 | 68.45 | 85.1 | 42.67 |

### BEIR Benchmark

| Dataset | NDCG@10 | NDCG@100 |
|---------|---------|----------|
| MS MARCO | 68.5 | 75.2 |
| NQ | 51.8 | 69.4 |
| HotpotQA | 61.2 | 74.8 |
| FiQA | 46.8 | 62.3 |
| ArguAna | 58.9 | 71.5 |
| SciFact | 73.1 | 82.6 |
| TREC-COVID | 84.7 | 88.3 |
| NFCorpus | 39.4 | 51.7 |
| **Average** | **44.8** | **68.2** |

### Efficiency Metrics

| Metric | Value |
|--------|-------|
| Inference Time (batch=64) | 2.2s |
| Throughput | ~45 docs/sec |
| GPU Memory (inference) | 18GB |
| Model Size (BF16) | 16GB |

## Comparison

| Model | Loss | DL19 | DL20 | BEIR Avg | Speed (s) |
|-------|------|------|------|----------|-----------|
| **DeAR-8B-CE** | BCE | 73.9 | 72.1 | 44.8 | 2.2 |
| **DeAR-8B-RankNet** | RankNet | 74.5 | 72.8 | 45.2 | 2.2 |
| MonoT5-3B | - | 71.8 | 68.9 | 43.5 | 3.5 |
| Teacher-13B | - | 73.8 | 71.2 | 44.8 | 5.8 |

**Key Observations:**

- Slightly lower performance than the RankNet variant
- Identical inference speed
- More stable training (simpler loss)
- Better suited to binary relevance tasks

## Model Architecture

```
Input Format: "query: [QUERY] document: [TITLE] [TEXT]"
        ↓
Tokenization (max_length=228)
        ↓
LLaMA-3.1-8B Transformer
        ↓
[CLS] Token Pooling
        ↓
Linear(hidden_size → 1)
        ↓
Sigmoid (optional)
        ↓
Relevance Score
```

## When to Use This Model

**Best for:**

- ✅ Binary relevance classification
- ✅ Large-scale reranking (fast inference)
- ✅ General-purpose IR tasks
- ✅ Resource-constrained environments

**Consider alternatives for:**

- ❌ Listwise ranking (use DeAR-8B-Listwise)
- ❌ Maximum performance (use the RankNet variant)
- ❌ Extremely low latency (use the 3B models)

## Limitations

1. **Document Truncation:** Limited to 196 tokens per document
2. **Query Length:** Optimal for queries ≤32 tokens
3. **Language:** English only
4. **Domain:** Trained on MS MARCO (web documents)
5. **Pointwise:** Does not model inter-document dependencies

## Bias and Ethical Considerations

- **Training Data Bias:** Inherits biases from the MS MARCO dataset
- **Representation Bias:** May perform differently across demographics
- **Language Bias:** Optimized for English; other languages not evaluated
- **Domain Bias:** Best performance on web-style documents

**Recommendations:**

- Evaluate fairness for your specific use case
- Test on diverse query sets
- Monitor for biased ranking patterns
- Consider domain-specific fine-tuning

## Fine-tuning

To fine-tune on your own data:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-8b-reranker-ce-v1",
    num_labels=1
)

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    learning_rate=5e-6,  # Lower LR for fine-tuning
    per_device_train_batch_size=4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
)
trainer.train()
```

## Related Models

**DeAR Family (8B):**

- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - RankNet loss variant
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) - Generative listwise reranker
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) - LoRA adapter version

**Other Sizes:**

- [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Faster 3B variant

**Resources:**

- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Model Collection](https://huggingface.co/collections/abdoelsayed/dear-reranking)