---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---

# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four widely used fact-checking and hallucination-detection benchmarks (FELM, FEVER, HaluEval, and LIAR). Given a claim, the model predicts whether it is likely true (label 1) or false (label 0).

**Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
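
A minimal inference sketch with the Hugging Face Transformers library (a recent release that includes ModernBERT support) is shown below. The model ID is a placeholder for this repository's Hub ID, and the sketch assumes a standard two-label classification head; if the checkpoint instead exposes a single sigmoid output, threshold the probability at 0.5.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: replace with this repository's Hub ID or a local checkpoint path.
model_id = "<this-model-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

claim = "The Eiffel Tower is located in Berlin."
inputs = tokenizer(claim, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 = likely true, label 0 = likely false.
prediction = logits.argmax(dim=-1).item()
print(f"Predicted label: {prediction}")
```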

## Intended Uses

### Primary Use
- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses
- Multilingual fact-checking (English only)
- Medical/legal claim verification
- Highly domain-specific claims

## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800

## Training Procedure

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Epochs: 1
- Max sequence length: 512 tokens
- Optimizer: `adamw_torch_fused`
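
These hyperparameters map onto a `TrainingArguments` configuration roughly as sketched below. This is an illustrative reconstruction rather than the exact training script; the output directory, the toy dataset, and the integer label encoding (assuming a two-label classification head) are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Tiny stand-in for the ~203,800-sample combined dataset described above.
train_dataset = Dataset.from_list([
    {"text": "The Eiffel Tower is in Paris.", "label": 1, "source": "example"},
    {"text": "The Eiffel Tower is in Berlin.", "label": 0, "source": "example"},
])

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2
)

def tokenize(batch):
    # Truncate to the 512-token maximum sequence length listed above.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-fact-checking",  # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```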

### Preprocessing
All datasets were converted to a standardized record format:
```python
{
    "text": "full claim text",   # the claim to classify
    "label": 1.0,                # 1.0 = true, 0.0 = false
    "source": "dataset_name"     # which source dataset the record came from
}
```
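
As an illustration only, a conversion step for the source datasets might look like the sketch below; the raw field names (`claim`, `is_true`) and the toy records are assumptions, since each of the four datasets has its own schema and label scheme (FEVER's three-way labels, LIAR's six-point scale, and so on).

```python
from datasets import Dataset, concatenate_datasets

def standardize(record, source):
    # Map a raw record onto the {"text", "label", "source"} schema above.
    return {
        "text": record["claim"],                     # assumed raw field name
        "label": 1.0 if record["is_true"] else 0.0,  # assumed boolean truth field
        "source": source,
    }

# Toy stand-ins for two of the four source datasets.
raw_fever = [{"claim": "Paris is the capital of France.", "is_true": True}]
raw_liar = [{"claim": "The moon is made of cheese.", "is_true": False}]

standardized = [
    Dataset.from_list([standardize(r, "FEVER") for r in raw_fever]),
    Dataset.from_list([standardize(r, "LIAR") for r in raw_liar]),
]

# Concatenate and shuffle into a single training set.
combined = concatenate_datasets(standardized).shuffle(seed=42)
print(combined[0])
```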