---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- binary-cross-entropy
- llama
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-CE-v1

## Model Description

**DeAR-8B-Reranker-CE-v1** is an 8B-parameter neural reranker trained with binary cross-entropy loss and knowledge distillation. The model uses a classification-based (pointwise) approach to document reranking and is optimized for both accuracy and inference speed.

## Model Details

- **Model Type:** Pointwise Reranker (Binary Classification)
- **Base Model:** LLaMA-3.1-8B
- **Parameters:** 8 billion
- **Training Method:** Knowledge Distillation + Binary Cross-Entropy Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO
- **Precision:** BFloat16

## Key Features

✅ **Classification-based:** Binary relevance prediction with probabilistic outputs

✅ **Fast Inference:** ~2.2 s to score a batch of 64 documents on a single GPU

✅ **Strong Baseline:** Competitive performance across benchmarks

✅ **CoT Enhanced:** Trained with Chain-of-Thought reasoning from the teacher model

## Performance

| Benchmark | NDCG@10 |
|-----------|---------|
| TREC DL19 | 73.9 |
| TREC DL20 | 72.1 |
| BEIR (Avg) | 44.8 |
| MS MARCO Dev | 68.5 |

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)

# LLaMA tokenizers ship without a pad token; fall back to EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

model.eval().cuda()

# Score a query-document pair
query = "What is llama?"
document = "The llama is a domesticated South American camelid..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```
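The value returned above is a raw logit. Because the head is trained with a binary cross-entropy objective (and the architecture below lists an optional sigmoid), you can map it to a [0, 1] relevance probability when you need to threshold documents rather than only rank them. A minimal sketch, reusing `score` from the snippet above; the calibration of these probabilities has not been verified here:

```python
import torch

# Optional: squash the raw logit into a pseudo-probability of relevance.
# The ranking order is unchanged; this is only useful for thresholding.
probability = torch.sigmoid(torch.tensor(score)).item()
print(f"Relevance probability: {probability:.4f}")
```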
### Complete Reranking Example

```python
import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def load_reranker(model_path: str, device: str = "cuda"):
    """Load the reranker model and tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16
    )

    # Configure padding token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "right"

    model.eval()
    model.to(device)
    return tokenizer, model


@torch.inference_mode()
def rerank(
    tokenizer,
    model,
    query: str,
    documents: List[Tuple[str, str]],  # (title, text)
    batch_size: int = 64
) -> List[Tuple[int, float]]:
    """
    Rerank documents for a query.

    Returns:
        List of (doc_index, score) sorted by relevance (descending)
    """
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]

        # Prepare batch
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]

        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True,
            return_attention_mask=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Score batch
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())

    # Rank by score
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example
tokenizer, model = load_reranker("abdoelsayed/dear-8b-reranker-ce-v1")

query = "When did Thomas Edison invent the light bulb?"
documents = [
    ("", "Lightning strike at Seoul National University"),
    ("", "Thomas Edison tried to invent a device for car but failed"),
    ("", "Coffee is good for diet"),
    ("", "KEPCO fixes light problems"),
    ("", "Thomas Edison invented the light bulb in 1879"),
]

ranking = rerank(tokenizer, model, query, documents)
print(ranking)
# Output: [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
# Document at index 4 is most relevant
```

## Training Details

### Training Data

- **Primary Dataset:** MS MARCO Passage Ranking (~8M pairs)
- **CoT Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Teacher Annotations:** Soft labels from the 13B teacher model

### Training Configuration

```json
{
    "base_model": "meta-llama/Llama-3.1-8B",
    "teacher_model": "abdoelsayed/llama2-13b-rankllama-teacher",
    "loss": "Binary Cross-Entropy",
    "distillation": {
        "temperature": 2.0,
        "alpha": 0.1
    },
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 2,
    "gradient_accumulation": 2,
    "epochs": 2,
    "max_length": 228,
    "q_max_len": 32,
    "p_max_len": 196,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "bf16": true
}
```

### Hardware

- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~34 hours
- **Framework:** DeepSpeed ZeRO Stage 2
- **Memory Usage:** ~38GB per GPU

### Loss Function

**Binary Cross-Entropy** with Knowledge Distillation:

```
L_total = (1 - α) * BCE(y_pred, y_true) + α * KL(σ(z_s / T), σ(z_t / T))

where:
- BCE: binary cross-entropy loss
- KL:  KL divergence
- z_s: student logits
- z_t: teacher logits
- T:   temperature (2.0)
- α:   distillation weight (0.1)
- σ:   sigmoid function
```
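A minimal PyTorch sketch of this objective, using the stated hyperparameters (T = 2.0, α = 0.1), is shown below. The function name `distillation_bce_loss` and the exact form of the KL term (each sigmoid output treated as a two-class distribution) are illustrative assumptions, not the released training code:

```python
import torch
import torch.nn.functional as F


def distillation_bce_loss(
    student_logits: torch.Tensor,   # (batch,) raw scores from the 8B student
    teacher_logits: torch.Tensor,   # (batch,) raw scores from the 13B teacher
    labels: torch.Tensor,           # (batch,) binary relevance labels {0, 1}
    temperature: float = 2.0,
    alpha: float = 0.1,
) -> torch.Tensor:
    """Sketch of (1 - α) * BCE + α * KL over temperature-scaled sigmoid outputs."""
    # Hard-label term: binary cross-entropy against the relevance labels.
    bce = F.binary_cross_entropy_with_logits(student_logits, labels.float())

    # Soft-label term: KL divergence between temperature-scaled sigmoid outputs,
    # treating each score as a (relevant, not relevant) two-class distribution.
    p_student = torch.sigmoid(student_logits / temperature)
    p_teacher = torch.sigmoid(teacher_logits / temperature)
    student_dist = torch.stack([p_student, 1.0 - p_student], dim=-1).clamp_min(1e-7)
    teacher_dist = torch.stack([p_teacher, 1.0 - p_teacher], dim=-1)
    kl = F.kl_div(student_dist.log(), teacher_dist, reduction="batchmean")

    return (1.0 - alpha) * bce + alpha * kl
```

With α = 0.1, most of the gradient comes from the hard binary labels; the teacher's soft scores act as a secondary signal, which is consistent with the stable-training behavior noted in the comparison below.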
## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MRR@10 | MAP |
|---------|---------|---------|--------|-----|
| DL19 | 73.90 | 69.82 | 87.3 | 44.92 |
| DL20 | 72.10 | 68.45 | 85.1 | 42.67 |

### BEIR Benchmark

| Dataset | NDCG@10 | NDCG@100 |
|---------|---------|----------|
| MS MARCO | 68.5 | 75.2 |
| NQ | 51.8 | 69.4 |
| HotpotQA | 61.2 | 74.8 |
| FiQA | 46.8 | 62.3 |
| ArguAna | 58.9 | 71.5 |
| SciFact | 73.1 | 82.6 |
| TREC-COVID | 84.7 | 88.3 |
| NFCorpus | 39.4 | 51.7 |
| **Average** | **44.8** | **68.2** |

### Efficiency Metrics

| Metric | Value |
|--------|-------|
| Inference Time (batch=64) | 2.2s |
| Throughput | ~45 docs/sec |
| GPU Memory (inference) | 18GB |
| Model Size (BF16) | 16GB |

## Comparison

| Model | Loss | DL19 | DL20 | BEIR Avg | Speed (s) |
|-------|------|------|------|----------|-----------|
| **DeAR-8B-CE** | BCE | 73.9 | 72.1 | 44.8 | 2.2 |
| **DeAR-8B-RankNet** | RankNet | 74.5 | 72.8 | 45.2 | 2.2 |
| MonoT5-3B | - | 71.8 | 68.9 | 43.5 | 3.5 |
| Teacher-13B | - | 73.8 | 71.2 | 44.8 | 5.8 |

**Key Observations:**

- Slightly lower performance than the RankNet variant
- Identical inference speed
- More stable training (simpler loss)
- Better suited to binary relevance tasks

## Model Architecture

```
Input Format: "query: [QUERY] document: [TITLE] [TEXT]"
        ↓
Tokenization (max_length=228)
        ↓
LLaMA-3.1-8B Transformer
        ↓
[CLS] Token Pooling
        ↓
Linear(hidden_size → 1)
        ↓
Sigmoid (optional)
        ↓
Relevance Score
```

## When to Use This Model

**Best for:**

- ✅ Binary relevance classification
- ✅ Large-scale reranking (fast inference)
- ✅ General-purpose IR tasks
- ✅ Resource-constrained environments

**Consider alternatives for:**

- ❌ Listwise ranking (use DeAR-8B-Listwise)
- ❌ Maximum performance (use the RankNet variant)
- ❌ Extremely low latency (use the 3B models)

## Limitations

1. **Document Truncation:** Limited to 196 tokens per document
2. **Query Length:** Optimal for queries ≤32 tokens
3. **Language:** English only
4. **Domain:** Trained on MS MARCO (web documents)
5. **Pointwise:** Does not model inter-document dependencies

## Bias and Ethical Considerations

- **Training Data Bias:** Inherits biases from the MS MARCO dataset
- **Representation Bias:** May perform differently across demographics
- **Language Bias:** Optimized for English; other languages not evaluated
- **Domain Bias:** Best performance on web-style documents

**Recommendations:**

- Evaluate fairness for your specific use case
- Test on diverse query sets
- Monitor for biased ranking patterns
- Consider domain-specific fine-tuning

## Fine-tuning

To fine-tune on your own data:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/dear-8b-reranker-ce-v1",
    num_labels=1
)

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    learning_rate=5e-6,  # Lower LR for fine-tuning
    per_device_train_batch_size=4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
)
trainer.train()
```

## Related Models

**DeAR Family (8B):**

- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1) - RankNet loss variant
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) - Generative listwise reranker
- [DeAR-8B-CE-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-lora-v1) - LoRA adapter version

**Other Sizes:**

- [DeAR-3B-CE](https://huggingface.co/abdoelsayed/dear-3b-reranker-ce-v1) - Faster 3B variant

**Resources:**

- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Model Collection](https://huggingface.co/collections/abdoelsayed/dear-reranking)