---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---

# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four widely used fact-checking and hallucination-detection benchmarks (FELM, FEVER, HaluEval, and LIAR). Given a claim, the model predicts whether it is likely true (label 1) or false (label 0).

**Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
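
A minimal inference sketch with the Hugging Face Transformers library (a recent release that includes ModernBERT support) is shown below. The model ID is a placeholder for this repository's Hub ID, and the sketch assumes a standard two-label classification head; if the checkpoint instead exposes a single sigmoid output, threshold the probability at 0.5.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: replace with this repository's Hub ID or a local checkpoint path.
model_id = "<this-model-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

claim = "The Eiffel Tower is located in Berlin."
inputs = tokenizer(claim, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 = likely true, label 0 = likely false.
prediction = logits.argmax(dim=-1).item()
print(f"Predicted label: {prediction}")
```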

## Intended Uses

### Primary Use
- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses
- Multilingual fact-checking (English only)
- Medical/legal claim verification
- Highly domain-specific claims

## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800

## Training Procedure

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Epochs: 1
- Max sequence length: 512 tokens
- Optimizer: `adamw_torch_fused`
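
These hyperparameters map onto a `TrainingArguments` configuration roughly as sketched below. This is an illustrative reconstruction rather than the exact training script; the output directory, the toy dataset, and the integer label encoding (assuming a two-label classification head) are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Tiny stand-in for the ~203,800-sample combined dataset described above.
train_dataset = Dataset.from_list([
    {"text": "The Eiffel Tower is in Paris.", "label": 1, "source": "example"},
    {"text": "The Eiffel Tower is in Berlin.", "label": 0, "source": "example"},
])

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2
)

def tokenize(batch):
    # Truncate to the 512-token maximum sequence length listed above.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-fact-checking",  # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```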

### Preprocessing
All datasets were converted to a standardized record format:
```python
{
    "text": "full claim text",   # the claim to classify
    "label": 1.0,                # 1.0 = true, 0.0 = false
    "source": "dataset_name"     # which source dataset the record came from
}
```
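
As an illustration only, a conversion step for the source datasets might look like the sketch below; the raw field names (`claim`, `is_true`) and the toy records are assumptions, since each of the four datasets has its own schema and label scheme (FEVER's three-way labels, LIAR's six-point scale, and so on).

```python
from datasets import Dataset, concatenate_datasets

def standardize(record, source):
    # Map a raw record onto the {"text", "label", "source"} schema above.
    return {
        "text": record["claim"],                     # assumed raw field name
        "label": 1.0 if record["is_true"] else 0.0,  # assumed boolean truth field
        "source": source,
    }

# Toy stand-ins for two of the four source datasets.
raw_fever = [{"claim": "Paris is the capital of France.", "is_true": True}]
raw_liar = [{"claim": "The moon is made of cheese.", "is_true": False}]

standardized = [
    Dataset.from_list([standardize(r, "FEVER") for r in raw_fever]),
    Dataset.from_list([standardize(r, "LIAR") for r in raw_liar]),
]

# Concatenate and shuffle into a single training set.
combined = concatenate_datasets(standardized).shuffle(seed=42)
print(combined[0])
```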