---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- llama
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-CE-v1

## Model Description

**DeAR-8B-Reranker-CE-v1** is an 8B-parameter neural reranker trained with a binary cross-entropy (BCE) loss and knowledge distillation. It scores each query-document pair independently as a binary relevance classification problem (pointwise reranking), which keeps inference simple and fast while retaining strong ranking accuracy.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.1-8B
- **Parameters:** 8 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO Passage Ranking
- **Precision:** BFloat16

## Key Features

- ✅ **Classification-based:** Binary relevance prediction; a sigmoid over the output logit yields a relevance probability
- ✅ **Fast inference:** 2.2 s average latency with batch size 64 on a standard GPU
- ✅ **Strong baseline:** 73.9 NDCG@10 on TREC DL19, 72.1 on TREC DL20, and 44.8 average NDCG@10 on BEIR
- ✅ **CoT-enhanced:** Trained with Chain-of-Thought reasoning from the teacher

## Performance

| Benchmark | NDCG@10 |
|-----------|---------|
| TREC DL19 | 73.9 |
| TREC DL20 | 72.1 |
| BEIR (Avg) | 44.8 |
| MS MARCO Dev | 68.5 |

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    # The LLaMA-3.1 tokenizer defines no pad token by default; reuse EOS
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
# The score is read from the last non-padding token, so the model needs the pad id
model.config.pad_token_id = tokenizer.pad_token_id
model.eval().cuda()

# Score a single query-document pair (no padding needed for one pair)
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."
inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,  # total budget used in training (q_max_len 32 + p_max_len 196)
)
inputs = {k: v.cuda() for k, v in inputs.items()}
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Relevance score: {score}")
```

### Complete Reranking Example

```python
import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def load_reranker(model_path: str, device: str = "cuda"):
    """Load the reranker model and tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16
    )
    # Configure padding token (the LLaMA-3.1 tokenizer has none by default)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"
    # The score is read from the last non-padding token, so the model must
    # know the pad id to handle padded batches correctly
    model.config.pad_token_id = tokenizer.pad_token_id
    model.eval()
    model.to(device)
    return tokenizer, model


@torch.inference_mode()
def rerank(
    tokenizer,
    model,
    query: str,
    documents: List[Tuple[str, str]],  # (title, text)
    batch_size: int = 64
) -> List[Tuple[int, float]]:
    """
    Rerank documents for a query.

    Returns:
        List of (doc_index, score) sorted by relevance (descending)
    """
    device = next(model.parameters()).device
    scores = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]
        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
            return_attention_mask=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        # Score batch: one logit per query-document pair
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    # Rank by score
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example
tokenizer, model = load_reranker("abdoelsayed/dear-8b-reranker-ce-v1")
query = "When did Thomas Edison invent the light bulb?"
documents = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]
ranking = rerank(tokenizer, model, query, documents)
print(ranking)
# Example output (exact scores may vary with hardware and library versions):
# [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
# Document at index 4 is most relevant
```
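
The scores returned by `rerank` are raw logits. Assuming the head was trained with BCE on σ(logit), as the Loss Function section describes, a sigmoid maps each logit to a relevance probability. Because the sigmoid is monotonic, the ranking itself does not change; probabilities matter mainly for thresholding. A minimal sketch using the example output above (`to_probabilities` is a hypothetical helper, not part of the model API):

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_probabilities(ranking: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Map (doc_index, logit) pairs to (doc_index, probability), preserving order."""
    return [(idx, sigmoid(logit)) for idx, logit in ranking]


# Scores copied from the example output of `rerank`
example_ranking = [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
probs = to_probabilities(example_ranking)
for idx, p in probs:
    print(f"doc {idx}: p(relevant) = {p:.4f}")
```

In this example every probability is below 0.5 (the top document maps to about 0.12), so a fixed 0.5 cut-off would reject even the correct answer. If you need a relevance threshold, calibrate it on held-out labeled data.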

## Training Details

### Training Data

- **Primary Dataset:** MS MARCO Passage Ranking (~8M pairs)
- **CoT Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Teacher Annotations:** Soft labels from the 13B teacher model

### Training Configuration

```json
{
    "base_model": "meta-llama/Llama-3.1-8B",
    "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
    "loss": "Binary Cross-Entropy",
    "distillation": {
        "temperature": 2.0,
        "alpha": 0.1
    },
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 2,
    "gradient_accumulation": 2,
    "epochs": 2,
    "max_length": 228,
    "q_max_len": 32,
    "p_max_len": 196,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "bf16": true
}
```
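
Two quantities follow from this configuration. The sketch assumes that `batch_size` is the per-device batch size and that all four GPUs listed under Hardware ran data-parallel training. The card does not state either assumption explicitly.

```python
# Values copied from the training configuration above
batch_size = 2             # assumed to be the per-device (micro) batch size
gradient_accumulation = 2
num_gpus = 4               # from the Hardware section; assumes pure data parallelism
q_max_len, p_max_len, max_length = 32, 196, 228

# Training examples that contribute to one optimizer update
effective_batch_size = batch_size * gradient_accumulation * num_gpus

# The total sequence budget splits exactly into the query and passage budgets
token_budget_ok = q_max_len + p_max_len == max_length

print(f"effective batch size: {effective_batch_size}")
print(f"query + passage budget equals max_length: {token_budget_ok}")
```

Under these assumptions, each optimizer update sees 16 training examples. If `batch_size` were instead the global batch, the effective batch would be 4.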

### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~34 hours
- **Framework:** DeepSpeed ZeRO Stage 2
- **Memory Usage:** ~38GB per GPU
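
For reference, a ZeRO Stage 2 setup can be passed to the Hugging Face `Trainer` as a Python dict through `TrainingArguments(deepspeed=...)`. The dict below is a hypothetical minimal configuration, not the authors' actual DeepSpeed file. The `"auto"` values are filled in by the Trainer from `TrainingArguments`, so batch and precision settings are defined in one place.

```python
# Hypothetical minimal DeepSpeed config; not the file used to train DeAR.
ds_config = {
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients across GPUs
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # copy gradients into a contiguous buffer to limit fragmentation
    },
    "bf16": {"enabled": "auto"},               # follows TrainingArguments(bf16=...)
    "train_micro_batch_size_per_gpu": "auto",  # follows per_device_train_batch_size
    "gradient_accumulation_steps": "auto",     # follows gradient_accumulation_steps
    "gradient_clipping": "auto",               # follows max_grad_norm
}

# Usage (requires the `deepspeed` package and a multi-GPU launcher):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", bf16=True, per_device_train_batch_size=2,
#                          gradient_accumulation_steps=2, deepspeed=ds_config)
```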

### Loss Function

**Binary Cross-Entropy** with Knowledge Distillation:

```text
L_total = (1 - α) * BCE(y_pred, y_true) + α * KL(σ(z_s/T), σ(z_t/T))

where:
- y_pred = σ(z_s): Student relevance probability
- y_true: Binary relevance label (1 = relevant, 0 = non-relevant)
- BCE: Binary cross-entropy loss
- KL: KL divergence
- z_s: Student logits
- z_t: Teacher logits
- T: Temperature (2.0)
- α: Distillation weight (0.1)
- σ: Sigmoid function
```
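
To make the objective concrete, here is a per-pair sketch in plain Python that follows the formula term by term. It reads `KL(a, b)` as KL(a ‖ b) with the student distribution first, exactly as the formula is written. It also omits the T² factor that some distillation setups multiply into the soft term, because the formula above does not include one. Both readings are assumptions about the actual training code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def kl_bernoulli(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def dear_ce_loss(z_s: float, z_t: float, y: float,
                 alpha: float = 0.1, T: float = 2.0) -> float:
    """L_total for one query-document pair, mirroring the card's formula."""
    hard = bce(sigmoid(z_s), y)                              # BCE(y_pred, y_true), y_pred = σ(z_s)
    soft = kl_bernoulli(sigmoid(z_s / T), sigmoid(z_t / T))  # KL(σ(z_s/T), σ(z_t/T))
    return (1.0 - alpha) * hard + alpha * soft


print(dear_ce_loss(z_s=1.5, z_t=2.0, y=1.0))
```

When the student and teacher logits are identical, the KL term vanishes and the loss reduces to 0.9 × BCE. With α = 0, it is plain BCE.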

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP |
|---------|---------|---------|--------|-----|
| DL19 | 73.90 | 69.82 | 87.3 | 44.92 |
| DL20 | 72.10 | 68.45 | 85.1 | 42.67 |
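
### BEIR Benchmark

| Dataset | NDCG@10 | NDCG@100 |
|---------|---------|----------|
| MS MARCO | 68.5 | 75.2 |
| NQ | 51.8 | 69.4 |
| HotpotQA | 61.2 | 74.8 |
| FiQA | 46.8 | 62.3 |
| ArguAna | 58.9 | 71.5 |
| SciFact | 73.1 | 82.6 |
| TREC-COVID | 84.7 | 88.3 |
| NFCorpus | 39.4 | 51.7 |
| **Average** | **44.8** | **68.2** |

Note: the **Average** row is not the arithmetic mean of the eight datasets listed above (that mean would be about 60.6 NDCG@10 and 72.0 NDCG@100), so it presumably aggregates a different or larger set of BEIR datasets.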

### Efficiency Metrics

| Metric | Value |
|--------|-------|
| Inference latency (batch size 64) | 2.2 s |
| Throughput | ~45 docs/s |
| GPU memory (inference) | 18 GB |
| Model size (BF16) | 16 GB |
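
The 16 GB model size follows directly from the parameter count, because BF16 stores each parameter in 2 bytes. The remaining ~2 GB of inference memory in the table would then cover activations, the CUDA context, and allocator overhead. That breakdown is an estimate, not a measured profile.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 8e9  # nominal parameter count from Model Details
print(weight_memory_gb(params))     # BF16: 2 bytes per parameter
print(weight_memory_gb(params, 4))  # FP32: 4 bytes per parameter, for comparison
```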

## Comparison

| Model | Loss | DL19 (NDCG@10) | DL20 (NDCG@10) | BEIR Avg (NDCG@10) | Latency (s) |
|-------|------|----------------|----------------|--------------------|-------------|
| **DeAR-8B-CE** | BCE | 73.9 | 72.1 | 44.8 | 2.2 |
| **DeAR-8B-RankNet** | RankNet | 74.5 | 72.8 | 45.2 | 2.2 |
| MonoT5-3B | - | 71.8 | 68.9 | 43.5 | 3.5 |
| Teacher-13B | - | 73.8 | 71.2 | 44.8 | 5.8 |

**Key Observations:**

- Slightly lower effectiveness than the RankNet variant (0.6, 0.7, and 0.4 NDCG@10 points lower on DL19, DL20, and BEIR)
- Identical inference latency (2.2 s)
- More stable training (simpler loss)
- Better suited to binary relevance tasks
- Matches or exceeds the 13B teacher on DL19, DL20, and BEIR at about 2.6× lower latency (2.2 s vs. 5.8 s)

## Model Architecture

```text
Input Format: "query: [QUERY] document: [TITLE] [TEXT]"
    ↓
Tokenization (max_length=228)
    ↓
LLaMA-3.1-8B Transformer
    ↓
Last-Token Pooling (final non-padding token)
    ↓
Linear(hidden_size → 1)
    ↓
Sigmoid (optional)
    ↓
Relevance Score
```
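
LLaMA has no `[CLS]` token. When loaded through `AutoModelForSequenceClassification`, the model is a `LlamaForSequenceClassification`. In recent Transformers versions, that class reads the score from the hidden state of the last non-padding token, located via `config.pad_token_id`. The pure-Python sketch below reproduces only that index computation for a right-padded batch. `last_token_indices` is an illustrative name, not a Transformers function.

```python
from typing import List


def last_token_indices(input_ids: List[List[int]], pad_token_id: int) -> List[int]:
    """Index of the last non-padding token in each right-padded sequence."""
    indices = []
    for seq in input_ids:
        last = -1
        for pos, tok in enumerate(seq):
            if tok != pad_token_id:
                last = pos
        indices.append(last)
    return indices


# Two right-padded sequences; 0 plays the role of pad_token_id here
batch = [
    [101, 7, 8, 9, 0, 0],
    [101, 5, 6, 0, 0, 0],
]
print(last_token_indices(batch, pad_token_id=0))  # [3, 2]
```

If `pad_token_id` is not set, the model cannot tell padding from content in a padded batch. This is why `model.config.pad_token_id` should match the tokenizer's pad token.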

## When to Use This Model

**Best for:**

- ✅ Binary relevance classification
- ✅ Large-scale reranking (fast inference)
- ✅ General-purpose IR tasks
- ✅ Moderately resource-constrained settings (about 18 GB of GPU memory at inference)

**Consider alternatives for:**

- ❌ Listwise ranking (use DeAR-8B-Listwise)
- ❌ Maximum effectiveness (use the RankNet variant)
- ❌ Extreme low-latency settings (use 3B models)

## Limitations

1. **Document Truncation:** Limited to 196 tokens per document
2. **Query Length:** Optimal for queries ≤32 tokens
3. **Language:** English only
4. **Domain:** Trained on MS MARCO (web documents)
5. **Pointwise:** Does not model inter-document dependencies
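
For documents longer than the 196-token budget, one common workaround is MaxP. The DeAR authors do not prescribe it. MaxP splits the document into overlapping windows, scores each window as a separate passage, and keeps the highest score. The sketch works on pre-tokenized input. `score_fn` is a hypothetical stand-in for a function that would decode a window and run the reranker on it, and the window size and stride are assumptions.

```python
from typing import Callable, List, Sequence


def split_windows(tokens: Sequence, size: int = 196, stride: int = 128) -> List[Sequence]:
    """Cut tokens into windows of at most `size`, one starting every `stride` tokens.

    Keep stride <= size so that every token lands in at least one window.
    """
    if len(tokens) <= size:
        return [tokens]
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return windows


def maxp_score(tokens: Sequence, score_fn: Callable[[Sequence], float],
               size: int = 196, stride: int = 128) -> float:
    """Document score = highest score among its windows (MaxP aggregation)."""
    return max(score_fn(window) for window in split_windows(tokens, size, stride))


# Toy usage: a 400-token document and a stand-in scorer
doc_tokens = list(range(400))
print(len(split_windows(doc_tokens)))  # 3 windows: [0:196], [128:324], [256:400]
print(maxp_score(doc_tokens, score_fn=lambda w: float(w[0])))
```

This approach addresses only the truncation limit. Windows are still scored independently, so the pointwise limitation remains.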

## Bias and Ethical Considerations

- **Training Data Bias:** Inherits biases from the MS MARCO dataset
- **Representation Bias:** May perform differently across demographics
- **Language Bias:** Optimized for English; other languages not evaluated
- **Domain Bias:** Best performance on web-style documents

**Recommendations:**

- Evaluate fairness for your specific use case
- Test on diverse query sets
- Monitor for biased ranking patterns
- Consider domain-specific fine-tuning
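
## Fine-tuning

To fine-tune on your own data, supply (query, document, label) examples with binary labels. Setting `problem_type="multi_label_classification"` makes the built-in loss `BCEWithLogitsLoss`, which keeps the BCE part of the original objective (without distillation). With `num_labels=1` alone, Transformers treats the task as regression and uses an MSE loss. Full fine-tuning of an 8B model also needs multi-GPU sharding (the original run used DeepSpeed ZeRO Stage 2 on 4x A100) or a parameter-efficient method such as the LoRA variant listed under Related Models.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    num_labels=1,
    problem_type="multi_label_classification",  # BCE-with-logits loss instead of MSE
)
model.config.pad_token_id = tokenizer.pad_token_id
# Toy data: replace with your own pairs
raw = Dataset.from_dict({
    "query": ["What is llama?"] * 2,
    "document": ["The llama is a domesticated South American camelid...", "Coffee is good for diet"],
    "label": [[1.0], [0.0]],  # one-element float lists match the (batch, 1) logits
})
train_dataset = raw.map(
    lambda b: tokenizer([f"query: {q}" for q in b["query"]],
                        [f"document: {d}" for d in b["document"]],
                        truncation=True, max_length=228),
    batched=True, remove_columns=["query", "document"],
)
training_args = TrainingArguments(
    output_dir="./finetuned-model",
    learning_rate=5e-6,  # Lower LR for fine-tuning
    per_device_train_batch_size=4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
)
trainer.train()
```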

## Related Models

**DeAR Family (8B):**

- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - RankNet loss variant
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) - Generative listwise reranker
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) - LoRA adapter version

**Other Sizes:**

- [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Faster 3B variant

**Resources:**

- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Model Collection](https://huggingface.co/collections/abdoelsayed/dear-reranking)