---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---
# ModernBERT Fact-Checking Model
## Model Description
This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidation of four public fact-checking datasets (FELM, FEVER, HaluEval, and LIAR). The model predicts whether a given claim is likely true (label 1) or false (label 0).
**Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
## Intended Uses
### Primary Use
- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools
### Out-of-Scope Uses
- Multilingual fact-checking (English only)
- Medical/legal claim verification
- Highly domain-specific claims
### How to use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

# Tokenize the claim; max_length=512 matches the training configuration
inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)

# Class probabilities: index 0 = false, index 1 = true
outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
```
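To turn the probabilities into a verdict, take the argmax over the two classes. A minimal sketch, using the label convention stated above (0 = false, 1 = true):

```python
# Pick the higher-probability class and report its probability as confidence
label = torch.argmax(predictions, dim=-1).item()
verdict = "true" if label == 1 else "false"
confidence = predictions[0, label].item()
print(f"Claim is likely {verdict} (confidence: {confidence:.2f})")
```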
## Training Data
The model was trained on a combination of four datasets:
| Dataset | Samples | Domain |
|---------|---------|--------|
| FELM | 34,000 | General claims |
| FEVER | 145,000 | Wikipedia-based claims |
| HaluEval | 12,000 | QA hallucination detection |
| LIAR | 12,800 | Political claims |
**Total training samples:** ~203,800
## Training Procedure
### Hyperparameters
- Learning Rate: 5e-5
- Batch Size: 32
- Epochs: 1
- Max Sequence Length: 512 tokens
- Optimizer: adamw_torch_fused
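As a sketch, these settings correspond to a Hugging Face `TrainingArguments` configuration along the following lines; the output directory is an illustrative assumption, not the actual run's, and `max_length=512` is applied at tokenization rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-factchecking",  # assumed path for illustration
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
```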
### Preprocessing
All datasets were converted to a standardized format:
```python
{
    "text": "full claim text",
    "label": 0.0,  # 1.0 for true claims, 0.0 for false claims
    "source": "dataset_name"
}
```
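A minimal conversion sketch for one source, assuming FEVER-style rows with `claim` and `label` fields; the field names and label mapping are assumptions for illustration, and each dataset would need its own mapping:

```python
def to_standard_format(row, source):
    """Map a source-specific row to the shared schema (hypothetical field names)."""
    return {
        "text": row["claim"],
        "label": 1.0 if row["label"] == "SUPPORTS" else 0.0,  # FEVER-style labels assumed
        "source": source,
    }
```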