---
language:
- en
license: mit
library_name: peft
tags:
- reranking
- information-retrieval
- pointwise
- lora
- peft
- binary-cross-entropy
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-CE-LoRA-v1

## Model Description

**DeAR-8B-Reranker-CE-LoRA-v1** is a LoRA (Low-Rank Adaptation) adapter for neural reranking, trained with a Binary Cross-Entropy loss. The adapter requires only ~100MB of storage and, when applied to LLaMA-3.1-8B, reaches near full-model reranking performance with minimal overhead.

## Model Details

- **Model Type:** LoRA Adapter for Pointwise Reranking
- **Base Model:** meta-llama/Llama-3.1-8B
- **Adapter Size:** ~100MB
- **Training Method:** LoRA with Binary Cross-Entropy + Knowledge Distillation
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Trainable Parameters:** 67M (0.8% of total)

## Key Features

✅ **Ultra Lightweight:** Only ~100MB of storage
✅ **Efficient:** ~3x faster training than full fine-tuning
✅ **High Performance:** ~98% of full-model accuracy
✅ **Easy Integration:** Simple adapter loading
✅ **Classification-based:** Binary relevance prediction

## Usage

### Load and Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# Load the LoRA adapter config (it records the base model to load)
adapter_path = "abdoelsayed/dear-8b-reranker-ce-lora-v1"
config = PeftConfig.from_pretrained(adapter_path)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model with a single-logit classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16
)

# Load and merge the LoRA weights into the base model
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()
model.eval().cuda()

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Relevance score: {score}")
```

### Batch Reranking

```python
@torch.inference_mode()
def rerank(tokenizer, model, query: str, documents, batch_size=64):
    """Score (title, text) candidates and return (index, score) pairs sorted by relevance."""
    scores = []
    device = next(model.parameters()).device
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]
        inputs = tokenizer(queries, docs, return_tensors="pt",
                           truncation=True, max_length=228, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
```
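As a minimal usage sketch of the `rerank` helper above (reusing the `tokenizer` and `model` loaded earlier): the query and candidate `(title, text)` pairs below are made-up placeholders for illustration, and `rerank` returns `(original_index, score)` tuples sorted by descending relevance.

```python
# Illustrative candidates only; any real retrieval output with (title, text)
# tuples can be passed in the same way.
query = "What is machine learning?"
candidates = [
    ("Machine Learning", "Machine learning is a subset of artificial intelligence..."),
    ("Deep Learning", "Deep learning uses multi-layer neural networks to learn representations."),
    ("Cooking Basics", "A good stock is the foundation of many soups and sauces."),
]

ranking = rerank(tokenizer, model, query, candidates, batch_size=16)

# Print candidates from most to least relevant
for idx, score in ranking:
    title, _ = candidates[idx]
    print(f"{score:.3f}  {title}")
```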
## Training Details

### LoRA Configuration

```python
{
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "SEQ_CLS"
}
```

### Training Hyperparameters

- **Learning Rate:** 1e-4
- **Batch Size:** 4
- **Gradient Accumulation:** 2
- **Epochs:** 2
- **Hardware:** 4x A100 (40GB)
- **Training Time:** ~12 hours
- **Memory:** ~28GB per GPU

## Advantages

| Feature | LoRA | Full Model |
|---------|------|------------|
| Storage | 100MB | 16GB |
| Training Time | 12h | 34h |
| Performance | 98% | 100% |
| Training Memory (per GPU) | 28GB | 38GB |

## Related Models

- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Full model
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1) - RankNet variant
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)