---
library_name: transformers
license: mit
datasets:
- rajpurkar/squad_v2
language:
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: question-answering
---

# ModernBERT-base-squad2

ModernBERT fine-tuned for extractive question answering. Given a question and a context, the model extracts the span of the context that answers the question.

- Base Model: `answerdotai/ModernBERT-base`
- Fine-tuned on: SQuAD 2.0 dataset
- Use: Extractive question answering

---

# Usage
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch.nn.functional as F
import torch

def predict_answers(batch, model, tokenizer, device):
    inputs = tokenizer(
        [item["question"] for item in batch],
        [item["context"] for item in batch],
        return_tensors="pt",
        max_length=512,
        truncation=True,
        padding="max_length",
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)

    # Convert the QA head's start/end logits into probabilities and pick
    # the most likely span. For SQuAD 2.0 "no answer" cases the model
    # points at the CLS token, which decodes to an empty string below.
    start_probs = F.softmax(outputs.start_logits, dim=-1)
    end_probs = F.softmax(outputs.end_logits, dim=-1)
    start_indices = torch.argmax(start_probs, dim=-1)
    end_indices = torch.argmax(end_probs, dim=-1)

    # Decode each predicted span and score it by the joint start/end probability.
    return [
        (
            tokenizer.decode(inputs["input_ids"][i][start:end + 1], skip_special_tokens=True),
            (start_probs[i, start] * end_probs[i, end]).item(),
        )
        for i, (start, end) in enumerate(zip(start_indices, end_indices))
    ]

model_id = "smangla/ModernBERT-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

batch = [
    {"question": "What is the capital of France?", "context": "Paris is the capital of France."},
    {"question": "Who wrote Hamlet?", "context": "William Shakespeare wrote the play Hamlet."},
]

results = predict_answers(batch, model, tokenizer, device)

for i, (answer, prob) in enumerate(results):
    print(f"Question {i + 1}: {batch[i]['question']}")
    print(f"Answer: {answer}")
    print(f"Probability: {prob:.4f}")
```

Output:
```
Question 1: What is the capital of France?
Answer: Paris
Probability: 0.9929
Question 2: Who wrote Hamlet?
Answer: William Shakespeare
Probability: 0.9995
```
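
For quick experiments, the model can also be served through the `question-answering` pipeline, which takes care of tokenization and span decoding. A minimal sketch (scores may differ slightly from the manual decoding above; `handle_impossible_answer=True` lets the pipeline return an empty answer for SQuAD 2.0-style unanswerable questions):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="smangla/ModernBERT-base-squad2")

result = qa(
    question="Who wrote Hamlet?",
    context="William Shakespeare wrote the play Hamlet.",
    handle_impossible_answer=True,  # return "" when no span answers the question
)
print(result)
# e.g. {'score': 0.99, 'start': 0, 'end': 19, 'answer': 'William Shakespeare'}
```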

---

# Metrics
Evaluation results from the official SQuAD 2.0 evaluation script on the dev set:

```json
{
  "exact": 80.29141750189505,
  "f1": 83.22890970115323,
  "total": 11873,
  "HasAns_exact": 72.08164642375169,
  "HasAns_f1": 77.96505480462089,
  "HasAns_total": 5928,
  "NoAns_exact": 88.47771236333053,
  "NoAns_f1": 88.47771236333053,
  "NoAns_total": 5945
}
```
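
To compute the same fields for your own predictions without the official script, the `squad_v2` metric from the `evaluate` library reports an identical breakdown. A minimal sketch with one illustrative example (`example-0` is a placeholder id; predicted no-answers are encoded as empty strings):

```python
import evaluate

squad_v2 = evaluate.load("squad_v2")

# Each prediction carries the example id, the predicted text (empty string
# for "no answer"), and a no-answer probability used for threshold search.
predictions = [
    {"id": "example-0", "prediction_text": "Paris", "no_answer_probability": 0.0},
]
references = [
    {"id": "example-0", "answers": {"text": ["Paris"], "answer_start": [0]}},
]

print(squad_v2.compute(predictions=predictions, references=references))
# -> {'exact': 100.0, 'f1': 100.0, 'total': 1, 'HasAns_exact': 100.0, ...}
```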
---

# Limitations
- The model only extracts answer spans from the provided context; it does not generate free-form text or draw on external knowledge.

---

# Training Details
- **Dataset:** SQuAD 2.0 ([https://huggingface.co/datasets/rajpurkar/squad_v2](https://huggingface.co/datasets/rajpurkar/squad_v2))
- **Epochs:** 4
- **Batch Size:** 32
- **Scheduler:** Linear
- **Learning Rate:** 5e-5
- **Weight decay:** 0.01
- **Warmup ratio:** 0.6
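
A hedged sketch of a `Trainer` setup matching these hyperparameters (the training code was not published with this card; `train_dataset` is a placeholder for the tokenized SQuAD 2.0 train split, and the preprocessing step is omitted):

```python
from transformers import AutoModelForQuestionAnswering, Trainer, TrainingArguments

model = AutoModelForQuestionAnswering.from_pretrained("answerdotai/ModernBERT-base")

# Placeholder: SQuAD 2.0 train split tokenized into input_ids plus
# start_positions/end_positions labels (preprocessing omitted here).
train_dataset = ...

args = TrainingArguments(
    output_dir="ModernBERT-base-squad2",
    num_train_epochs=4,
    per_device_train_batch_size=32,
    lr_scheduler_type="linear",
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_ratio=0.6,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```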

---