rlogh committed · Commit b431aac · verified · 1 Parent(s): 8edcbb2

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +176 -176
  2. checkpoint-18/config.json +36 -36
  3. checkpoint-18/model.safetensors +1 -1
  4. checkpoint-18/optimizer.pt +2 -2
  5. checkpoint-18/rng_state.pth +2 -2
  6. checkpoint-18/scheduler.pt +1 -1
  7. checkpoint-18/special_tokens_map.json +7 -7
  8. checkpoint-18/tokenizer_config.json +58 -58
  9. checkpoint-18/trainer_state.json +73 -73
  10. checkpoint-18/training_args.bin +1 -1
  11. checkpoint-18/vocab.txt +0 -0
  12. checkpoint-27/config.json +36 -36
  13. checkpoint-27/model.safetensors +1 -1
  14. checkpoint-27/optimizer.pt +2 -2
  15. checkpoint-27/rng_state.pth +2 -2
  16. checkpoint-27/scheduler.pt +1 -1
  17. checkpoint-27/special_tokens_map.json +7 -7
  18. checkpoint-27/tokenizer_config.json +58 -58
  19. checkpoint-27/trainer_state.json +96 -96
  20. checkpoint-27/training_args.bin +1 -1
  21. checkpoint-27/vocab.txt +0 -0
  22. checkpoint-36/config.json +36 -36
  23. checkpoint-36/model.safetensors +1 -1
  24. checkpoint-36/optimizer.pt +2 -2
  25. checkpoint-36/rng_state.pth +2 -2
  26. checkpoint-36/scheduler.pt +1 -1
  27. checkpoint-36/special_tokens_map.json +7 -7
  28. checkpoint-36/tokenizer_config.json +58 -58
  29. checkpoint-36/trainer_state.json +119 -119
  30. checkpoint-36/training_args.bin +1 -1
  31. checkpoint-36/vocab.txt +0 -0
  32. checkpoint-45/config.json +36 -36
  33. checkpoint-45/model.safetensors +1 -1
  34. checkpoint-45/optimizer.pt +2 -2
  35. checkpoint-45/rng_state.pth +2 -2
  36. checkpoint-45/scheduler.pt +1 -1
  37. checkpoint-45/special_tokens_map.json +7 -7
  38. checkpoint-45/tokenizer_config.json +58 -58
  39. checkpoint-45/trainer_state.json +142 -142
  40. checkpoint-45/training_args.bin +1 -1
  41. checkpoint-45/vocab.txt +0 -0
  42. checkpoint-54/config.json +36 -36
  43. checkpoint-54/model.safetensors +1 -1
  44. checkpoint-54/optimizer.pt +2 -2
  45. checkpoint-54/rng_state.pth +2 -2
  46. checkpoint-54/scheduler.pt +1 -1
  47. checkpoint-54/special_tokens_map.json +7 -7
  48. checkpoint-54/tokenizer_config.json +58 -58
  49. checkpoint-54/trainer_state.json +158 -158
  50. checkpoint-54/training_args.bin +1 -1
README.md CHANGED
@@ -1,176 +1,176 @@
1
- ---
2
- license: mit
3
- tags:
4
- - text-classification
5
- - cheese
6
- - texture
7
- - distilbert
8
- - transformers
9
- - fine-tuned
10
- datasets:
11
- - aslan-ng/cheese-text
12
- metrics:
13
- - accuracy
14
- model-index:
15
- - name: Cheese Texture Classifier (DistilBERT)
16
-   results:
17
-   - task:
18
-       type: text-classification
19
-       name: Cheese Texture Classification
20
-     dataset:
21
-       type: aslan-ng/cheese-text
22
-       name: Cheese Text Dataset
23
-     metrics:
24
-     - type: accuracy
25
-       value: 0.400
26
-       name: Test Accuracy
27
- ---
28
-
29
- # Cheese Texture Classifier (DistilBERT)
30
-
31
- **Model Creator**: Rumi Loghmani (@rlogh)
32
- **Original Dataset**: aslan-ng/cheese-text (by Aslan Noorghasemi)
33
-
34
- This model performs 4-class texture classification on cheese descriptions using fine-tuned DistilBERT.
35
-
36
- ## Model Description
37
-
38
- - **Architecture**: DistilBERT-base-uncased fine-tuned for sequence classification
39
- - **Task**: 4-class texture classification (hard, semi-hard, semi-soft, soft)
40
- - **Input**: Cheese description text (up to 512 tokens)
41
- - **Output**: 4-class probability distribution
42
-
43
- ## Training Details
44
-
45
- ### Data
46
- - **Dataset**: [aslan-ng/cheese-text](https://huggingface.co/datasets/aslan-ng/cheese-text) (original split: 100 samples)
47
- - **Train/Val/Test Split**: 70/15/15 (stratified)
48
- - **Text Source**: Cheese descriptions from the dataset
49
- - **Labels**: Texture categories (hard, semi-hard, semi-soft, soft)
50
-
51
- ### Preprocessing
52
- - **Tokenization**: DistilBERT tokenizer with 512 max length
53
- - **Padding**: Max length padding
54
- - **Truncation**: Long descriptions truncated to 512 tokens
55
-
56
- ### Training Setup
57
- - **Model**: distilbert-base-uncased
58
- - **Epochs**: 10
59
- - **Batch Size**: 8 (train/val)
60
- - **Learning Rate**: 2e-5
61
- - **Warmup Steps**: 10
62
- - **Weight Decay**: 0.01
63
- - **Optimizer**: AdamW
64
- - **Scheduler**: Linear warmup + linear decay
65
- - **Mixed Precision**: FP16 (if GPU available)
66
- - **Seed**: 42 (for reproducibility)
67
-
68
- ### Hardware/Compute
69
- - **Training Device**: GPU (CUDA)
70
- - **Training Time**: ~5-10 minutes on GPU
71
- - **Model Size**: ~67M parameters
72
- - **Memory Usage**: ~2-4GB GPU memory
73
-
74
- ## Performance
75
-
76
- - **Test Accuracy**: 0.400
77
- - **Test Loss**: 1.274
78
-
79
- ### Class-wise Performance
80
-              precision    recall  f1-score   support
81
-
82
-         hard      0.50      0.33      0.40         3
83
-    semi-hard      0.33      0.50      0.40         4
84
-    semi-soft      0.33      0.50      0.40         4
85
-         soft      1.00      0.25      0.40         4
86
-
87
-     accuracy                          0.40        15
88
-    macro avg      0.54      0.40      0.40        15
89
- weighted avg      0.54      0.40      0.40        15
90
-
91
-
92
- ## Usage
93
-
94
- ```python
95
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
96
- import torch
97
-
98
- # Load model and tokenizer
99
- model_name = "rlogh/cheese-texture-classifier-distilbert"
100
- tokenizer = AutoTokenizer.from_pretrained(model_name)
101
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
102
-
103
- # Example prediction
104
- text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
105
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
106
-
107
- with torch.no_grad():
108
-     outputs = model(**inputs)
109
-     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
110
-     predicted_class = torch.argmax(predictions, dim=-1).item()
111
-
112
- class_names = ["hard", "semi-hard", "semi-soft", "soft"]
113
- print(f"Predicted texture: {class_names[predicted_class]}")
114
- ```
115
-
116
- ## Class Definitions
117
-
118
- - **Hard**: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
119
- - **Semi-hard**: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
120
- - **Semi-soft**: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
121
- - **Soft**: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)
122
-
123
- ## Limitations and Ethics
124
-
125
- ### Limitations
126
- - **Small Dataset**: Trained on only 100 samples, limiting generalization
127
- - **Text Quality**: Performance depends on description quality and consistency
128
- - **Subjective Labels**: Texture classification has inherent subjectivity
129
- - **Domain Specific**: Only applicable to cheese texture classification
130
- - **Language**: English-only model
131
-
132
- ### Ethical Considerations
133
- - **Bias**: Model may reflect biases in the original dataset
134
- - **Cultural Context**: Cheese descriptions may be culturally specific
135
- - **Commercial Use**: Not intended for commercial cheese production decisions
136
- - **Accuracy**: Should not be used for critical food safety applications
137
-
138
- ### Recommendations
139
- - Use for educational/research purposes only
140
- - Validate predictions with domain experts
141
- - Consider cultural context when applying to different regions
142
- - Retrain with larger, more diverse datasets for production use
143
-
144
- ## AI Usage Disclosure
145
-
146
- This model was developed using:
147
- - **Base Model**: DistilBERT (distilbert-base-uncased)
148
- - **Training Framework**: Hugging Face Transformers
149
- - **Fine-tuning**: Standard BERT fine-tuning techniques
150
- - **No Additional AI**: No other AI systems were used in development
151
-
152
- ## Citation
153
-
154
- **Model Citation:**
155
- ```bibtex
156
- @model{rlogh/cheese-texture-classifier-distilbert,
157
- title={Cheese Texture Classifier (DistilBERT)},
158
- author={Rumi Loghmani},
159
- year={2024},
160
- url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
161
- }
162
- ```
163
-
164
- **Dataset Citation:**
165
- ```bibtex
166
- @dataset{aslan-ng/cheese-text,
167
- title={Cheese Text Dataset},
168
- author={Aslan Noorghasemi},
169
- year={2024},
170
- url={https://huggingface.co/datasets/aslan-ng/cheese-text}
171
- }
172
- ```
173
-
174
- ## License
175
-
176
- MIT License - See LICENSE file for details.
 
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - text-classification
5
+ - cheese
6
+ - texture
7
+ - distilbert
8
+ - transformers
9
+ - fine-tuned
10
+ datasets:
11
+ - aslan-ng/cheese-text
12
+ metrics:
13
+ - accuracy
14
+ model-index:
15
+ - name: Cheese Texture Classifier (DistilBERT)
16
+   results:
17
+   - task:
18
+       type: text-classification
19
+       name: Cheese Texture Classification
20
+     dataset:
21
+       type: aslan-ng/cheese-text
22
+       name: Cheese Text Dataset
23
+     metrics:
24
+     - type: accuracy
25
+       value: 0.400
26
+       name: Test Accuracy
27
+ ---
28
+
29
+ # Cheese Texture Classifier (DistilBERT)
30
+
31
+ **Model Creator**: Rumi Loghmani (@rlogh)
32
+ **Original Dataset**: aslan-ng/cheese-text (by Aslan Noorghasemi)
33
+
34
+ This model performs 4-class texture classification on cheese descriptions using fine-tuned DistilBERT.
35
+
36
+ ## Model Description
37
+
38
+ - **Architecture**: DistilBERT-base-uncased fine-tuned for sequence classification
39
+ - **Task**: 4-class texture classification (hard, semi-hard, semi-soft, soft)
40
+ - **Input**: Cheese description text (up to 512 tokens)
41
+ - **Output**: 4-class probability distribution
42
+
43
+ ## Training Details
44
+
45
+ ### Data
46
+ - **Dataset**: [aslan-ng/cheese-text](https://huggingface.co/datasets/aslan-ng/cheese-text) (original split: 100 samples)
47
+ - **Train/Val/Test Split**: 70/15/15 (stratified)
48
+ - **Text Source**: Cheese descriptions from the dataset
49
+ - **Labels**: Texture categories (hard, semi-hard, semi-soft, soft)
50
+
51
+ ### Preprocessing
52
+ - **Tokenization**: DistilBERT tokenizer with 512 max length
53
+ - **Padding**: Max length padding
54
+ - **Truncation**: Long descriptions truncated to 512 tokens
55
+
56
+ ### Training Setup
57
+ - **Model**: distilbert-base-uncased
58
+ - **Epochs**: 10
59
+ - **Batch Size**: 8 (train/val)
60
+ - **Learning Rate**: 2e-5
61
+ - **Warmup Steps**: 10
62
+ - **Weight Decay**: 0.01
63
+ - **Optimizer**: AdamW
64
+ - **Scheduler**: Linear warmup + linear decay
65
+ - **Mixed Precision**: FP16 (if GPU available)
66
+ - **Seed**: 42 (for reproducibility)
67
+
68
+ ### Hardware/Compute
69
+ - **Training Device**: CPU
70
+ - **Training Time**: ~5-10 minutes on GPU
71
+ - **Model Size**: ~67M parameters
72
+ - **Memory Usage**: ~2-4GB GPU memory
73
+
74
+ ## Performance
75
+
76
+ - **Test Accuracy**: 0.400
77
+ - **Test Loss**: 1.290
78
+
79
+ ### Class-wise Performance
80
+              precision    recall  f1-score   support
81
+
82
+         hard      0.50      0.33      0.40         3
83
+    semi-hard      0.29      0.50      0.36         4
84
+    semi-soft      0.40      0.50      0.44         4
85
+         soft      1.00      0.25      0.40         4
86
+
87
+     accuracy                          0.40        15
88
+    macro avg      0.55      0.40      0.40        15
89
+ weighted avg      0.55      0.40      0.40        15
90
+
91
+
92
+ ## Usage
93
+
94
+ ```python
95
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
96
+ import torch
97
+
98
+ # Load model and tokenizer
99
+ model_name = "rlogh/cheese-texture-classifier-distilbert"
100
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
101
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
102
+
103
+ # Example prediction
104
+ text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
105
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
106
+
107
+ with torch.no_grad():
108
+     outputs = model(**inputs)
109
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
110
+     predicted_class = torch.argmax(predictions, dim=-1).item()
111
+
112
+ class_names = ["hard", "semi-hard", "semi-soft", "soft"]
113
+ print(f"Predicted texture: {class_names[predicted_class]}")
114
+ ```
115
+
116
+ ## Class Definitions
117
+
118
+ - **Hard**: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
119
+ - **Semi-hard**: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
120
+ - **Semi-soft**: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
121
+ - **Soft**: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)
122
+
123
+ ## Limitations and Ethics
124
+
125
+ ### Limitations
126
+ - **Small Dataset**: Trained on only 100 samples, limiting generalization
127
+ - **Text Quality**: Performance depends on description quality and consistency
128
+ - **Subjective Labels**: Texture classification has inherent subjectivity
129
+ - **Domain Specific**: Only applicable to cheese texture classification
130
+ - **Language**: English-only model
131
+
132
+ ### Ethical Considerations
133
+ - **Bias**: Model may reflect biases in the original dataset
134
+ - **Cultural Context**: Cheese descriptions may be culturally specific
135
+ - **Commercial Use**: Not intended for commercial cheese production decisions
136
+ - **Accuracy**: Should not be used for critical food safety applications
137
+
138
+ ### Recommendations
139
+ - Use for educational/research purposes only
140
+ - Validate predictions with domain experts
141
+ - Consider cultural context when applying to different regions
142
+ - Retrain with larger, more diverse datasets for production use
143
+
144
+ ## AI Usage Disclosure
145
+
146
+ This model was developed using:
147
+ - **Base Model**: DistilBERT (distilbert-base-uncased)
148
+ - **Training Framework**: Hugging Face Transformers
149
+ - **Fine-tuning**: Standard BERT fine-tuning techniques
150
+ - **No Additional AI**: No other AI systems were used in development
151
+
152
+ ## Citation
153
+
154
+ **Model Citation:**
155
+ ```bibtex
156
+ @model{rlogh/cheese-texture-classifier-distilbert,
157
+ title={Cheese Texture Classifier (DistilBERT)},
158
+ author={Rumi Loghmani},
159
+ year={2024},
160
+ url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
161
+ }
162
+ ```
163
+
164
+ **Dataset Citation:**
165
+ ```bibtex
166
+ @dataset{aslan-ng/cheese-text,
167
+ title={Cheese Text Dataset},
168
+ author={Aslan Noorghasemi},
169
+ year={2024},
170
+ url={https://huggingface.co/datasets/aslan-ng/cheese-text}
171
+ }
172
+ ```
173
+
174
+ ## License
175
+
176
+ MIT License - See LICENSE file for details.
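Reviewer note: the class-wise report in the updated README can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the commit. The per-class counts in `COUNTS` are an assumption, inferred from the rounded precision, recall, and support values in the report (for example, a precision of 0.29 with 2 correct predictions suggests 7 predictions of that class). Under that assumption, the macro and weighted averages and the 0.40 accuracy follow directly.

```python
from fractions import Fraction

# (true positives, predicted count, support) per class.
# ASSUMPTION: inferred from the rounded report values, not logged in the commit.
COUNTS = {
    "hard": (1, 2, 3),
    "semi-hard": (2, 7, 4),
    "semi-soft": (2, 5, 4),
    "soft": (1, 1, 4),
}


def per_class(tp, predicted, support):
    """Return exact (precision, recall, f1) for one class."""
    precision = Fraction(tp, predicted)
    recall = Fraction(tp, support)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize(counts):
    """Return per-class rows, macro averages, weighted averages, and accuracy."""
    rows = {name: per_class(*c) for name, c in counts.items()}
    total = sum(c[2] for c in counts.values())
    # Macro average: unweighted mean over classes.
    macro = tuple(sum(r[i] for r in rows.values()) / len(rows) for i in range(3))
    # Weighted average: mean over classes weighted by support.
    weighted = tuple(
        sum(rows[name][i] * counts[name][2] for name in rows) / total
        for i in range(3)
    )
    accuracy = Fraction(sum(c[0] for c in counts.values()), total)
    return rows, macro, weighted, accuracy
```

With only 15 test samples, each prediction moves accuracy by about 0.067, so the small differences between the old and new reports are within what a single changed prediction would produce.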
checkpoint-18/config.json CHANGED
@@ -1,36 +1,36 @@
1
- {
2
- "activation": "gelu",
3
- "architectures": [
4
- "DistilBertForSequenceClassification"
5
- ],
6
- "attention_dropout": 0.1,
7
- "dim": 768,
8
- "dropout": 0.1,
9
- "dtype": "float32",
10
- "hidden_dim": 3072,
11
- "id2label": {
12
- "0": "hard",
13
- "1": "semi-hard",
14
- "2": "semi-soft",
15
- "3": "soft"
16
- },
17
- "initializer_range": 0.02,
18
- "label2id": {
19
- "hard": 0,
20
- "semi-hard": 1,
21
- "semi-soft": 2,
22
- "soft": 3
23
- },
24
- "max_position_embeddings": 512,
25
- "model_type": "distilbert",
26
- "n_heads": 12,
27
- "n_layers": 6,
28
- "pad_token_id": 0,
29
- "problem_type": "single_label_classification",
30
- "qa_dropout": 0.1,
31
- "seq_classif_dropout": 0.2,
32
- "sinusoidal_pos_embds": false,
33
- "tie_weights_": true,
34
- "transformers_version": "4.56.1",
35
- "vocab_size": 30522
36
- }
 
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
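Reviewer note: `config.json` stores the label mapping, so inference code does not need to hard-code the class list as the README usage snippet does. The sketch below is a self-contained illustration in plain Python rather than the author's code. `CONFIG_FRAGMENT` copies only the two mapping fields from the diff above, and `load_id2label`, `softmax`, and `predict_label` are hypothetical helper names. In Transformers the same mapping is available on a loaded model as `model.config.id2label`.

```python
import json
import math

# Only the label-mapping fields of config.json, copied from the diff above.
CONFIG_FRAGMENT = """
{
  "id2label": {"0": "hard", "1": "semi-hard", "2": "semi-soft", "3": "soft"},
  "label2id": {"hard": 0, "semi-hard": 1, "semi-soft": 2, "soft": 3}
}
"""


def load_id2label(config_text):
    """Parse the config and convert the JSON string keys to integer class ids."""
    config = json.loads(config_text)
    return {int(key): label for key, label in config["id2label"].items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]
```

JSON object keys are always strings, which is why the sketch converts them to integers before indexing with the argmax result.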
checkpoint-18/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9186fe36c00e96c7e1bcb163360a0c126b68f912e75a7b99162c4d2bb613851c
3
  size 267838720
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11fe9e2b21185c4f2954811e3a55e84e135c56e516110a0c1fa796bedc5fac7e
3
  size 267838720
checkpoint-18/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:764865d51baf2ef3802c555c974821b44033f32c9711af3cc57760694e04f01a
3
- size 535740043
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a6c4c757ffaa0a421a500e295e32ff062d66229a9454efbaf7e87273b3f3b4f2
3
+ size 535737163
checkpoint-18/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:128fd6789dbecdaa703802c84141d4eeb7956a1f3aa57027a4a20d800b5b22e4
3
- size 14645
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af011acd911ebca2868ec2c14dc0310b255271ca08299aad8c829637dbba9d41
3
+ size 14455
checkpoint-18/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c5cb1142c9841e543151dfa10e04da4d1ddc82e8206c217979304fd6bbdcf000
3
  size 1465
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e2c385954a605c7e89276ade36d44efa2f1e4d36d1ad3f35c3db6322f80f00a
3
  size 1465
checkpoint-18/special_tokens_map.json CHANGED
@@ -1,7 +1,7 @@
1
- {
2
- "cls_token": "[CLS]",
3
- "mask_token": "[MASK]",
4
- "pad_token": "[PAD]",
5
- "sep_token": "[SEP]",
6
- "unk_token": "[UNK]"
7
- }
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-18/tokenizer_config.json CHANGED
@@ -1,58 +1,58 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "extra_special_tokens": {},
49
- "mask_token": "[MASK]",
50
- "model_max_length": 512,
51
- "never_split": null,
52
- "pad_token": "[PAD]",
53
- "sep_token": "[SEP]",
54
- "strip_accents": null,
55
- "tokenize_chinese_chars": true,
56
- "tokenizer_class": "DistilBertTokenizer",
57
- "unk_token": "[UNK]"
58
- }
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-18/trainer_state.json CHANGED
@@ -1,73 +1,73 @@
1
- {
2
- "best_global_step": 9,
3
- "best_metric": 0.4,
4
- "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
- "epoch": 2.0,
6
- "eval_steps": 500,
7
- "global_step": 18,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.5555555555555556,
14
- "grad_norm": 2.3992674350738525,
15
- "learning_rate": 8.000000000000001e-06,
16
- "loss": 1.3393,
17
- "step": 5
18
- },
19
- {
20
- "epoch": 1.0,
21
- "eval_accuracy": 0.4,
22
- "eval_loss": 1.3667316436767578,
23
- "eval_runtime": 0.1993,
24
- "eval_samples_per_second": 75.254,
25
- "eval_steps_per_second": 10.034,
26
- "step": 9
27
- },
28
- {
29
- "epoch": 1.1111111111111112,
30
- "grad_norm": 3.988067865371704,
31
- "learning_rate": 1.8e-05,
32
- "loss": 1.334,
33
- "step": 10
34
- },
35
- {
36
- "epoch": 1.6666666666666665,
37
- "grad_norm": 3.5621254444122314,
38
- "learning_rate": 1.9e-05,
39
- "loss": 1.3276,
40
- "step": 15
41
- },
42
- {
43
- "epoch": 2.0,
44
- "eval_accuracy": 0.4,
45
- "eval_loss": 1.351041555404663,
46
- "eval_runtime": 0.1866,
47
- "eval_samples_per_second": 80.391,
48
- "eval_steps_per_second": 10.719,
49
- "step": 18
50
- }
51
- ],
52
- "logging_steps": 5,
53
- "max_steps": 90,
54
- "num_input_tokens_seen": 0,
55
- "num_train_epochs": 10,
56
- "save_steps": 500,
57
- "stateful_callbacks": {
58
- "TrainerControl": {
59
- "args": {
60
- "should_epoch_stop": false,
61
- "should_evaluate": false,
62
- "should_log": false,
63
- "should_save": true,
64
- "should_training_stop": false
65
- },
66
- "attributes": {}
67
- }
68
- },
69
- "total_flos": 18546097274880.0,
70
- "train_batch_size": 8,
71
- "trial_name": null,
72
- "trial_params": null
73
- }
 
1
+ {
2
+ "best_global_step": 9,
3
+ "best_metric": 0.26666666666666666,
4
+ "best_model_checkpoint": "./cheese-text-classifier/checkpoint-9",
5
+ "epoch": 2.0,
6
+ "eval_steps": 500,
7
+ "global_step": 18,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.245687246322632,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3897,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.26666666666666666,
22
+ "eval_loss": 1.3788894414901733,
23
+ "eval_runtime": 15.3581,
24
+ "eval_samples_per_second": 0.977,
25
+ "eval_steps_per_second": 0.13,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.2673349380493164,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.3742,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 2.7203733921051025,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.376,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.26666666666666666,
45
+ "eval_loss": 1.3665523529052734,
46
+ "eval_runtime": 11.7383,
47
+ "eval_samples_per_second": 1.278,
48
+ "eval_steps_per_second": 0.17,
49
+ "step": 18
50
+ }
51
+ ],
52
+ "logging_steps": 5,
53
+ "max_steps": 90,
54
+ "num_input_tokens_seen": 0,
55
+ "num_train_epochs": 10,
56
+ "save_steps": 500,
57
+ "stateful_callbacks": {
58
+ "TrainerControl": {
59
+ "args": {
60
+ "should_epoch_stop": false,
61
+ "should_evaluate": false,
62
+ "should_log": false,
63
+ "should_save": true,
64
+ "should_training_stop": false
65
+ },
66
+ "attributes": {}
67
+ }
68
+ },
69
+ "total_flos": 18546097274880.0,
70
+ "train_batch_size": 8,
71
+ "trial_name": null,
72
+ "trial_params": null
73
+ }
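Reviewer note: the `learning_rate` values in `log_history` are consistent with the schedule the README describes (linear warmup for 10 steps, then linear decay, base rate 2e-5, 90 total steps). The sketch below is a plain-Python reconstruction under stated assumptions, not the trainer's own code. It assumes 70 training samples (the README's 70/15/15 split of 100), no gradient accumulation, and that the value logged at global step `s` is the rate used for that step, i.e. the schedule evaluated at `s - 1`. The function names are hypothetical.

```python
import math


def linear_warmup_decay_lr(step, base_lr=2e-5, warmup_steps=10, max_steps=90):
    """Learning rate after `step` scheduler updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)
    else:
        factor = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * factor


def steps_per_epoch(num_train_samples=70, batch_size=8):
    """Optimizer steps per epoch, assuming no gradient accumulation."""
    return math.ceil(num_train_samples / batch_size)
```

Under these assumptions, `linear_warmup_decay_lr(4)`, `linear_warmup_decay_lr(9)`, and `linear_warmup_decay_lr(14)` reproduce the rates logged at steps 5, 10, and 15, and `steps_per_epoch() * 10` gives the logged `max_steps` of 90, which also explains why checkpoints land on multiples of 9.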
checkpoint-18/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
  size 5713
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:601f05c65c90d8f98b95428e2f3212dc00f04a7ec1ca04f350f1094719f63609
3
  size 5713
checkpoint-18/vocab.txt CHANGED
The diff for this file is too large to render.
 
checkpoint-27/config.json CHANGED
@@ -1,36 +1,36 @@
1
- {
2
- "activation": "gelu",
3
- "architectures": [
4
- "DistilBertForSequenceClassification"
5
- ],
6
- "attention_dropout": 0.1,
7
- "dim": 768,
8
- "dropout": 0.1,
9
- "dtype": "float32",
10
- "hidden_dim": 3072,
11
- "id2label": {
12
- "0": "hard",
13
- "1": "semi-hard",
14
- "2": "semi-soft",
15
- "3": "soft"
16
- },
17
- "initializer_range": 0.02,
18
- "label2id": {
19
- "hard": 0,
20
- "semi-hard": 1,
21
- "semi-soft": 2,
22
- "soft": 3
23
- },
24
- "max_position_embeddings": 512,
25
- "model_type": "distilbert",
26
- "n_heads": 12,
27
- "n_layers": 6,
28
- "pad_token_id": 0,
29
- "problem_type": "single_label_classification",
30
- "qa_dropout": 0.1,
31
- "seq_classif_dropout": 0.2,
32
- "sinusoidal_pos_embds": false,
33
- "tie_weights_": true,
34
- "transformers_version": "4.56.1",
35
- "vocab_size": 30522
36
- }
 
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-27/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:55b4ae0a98e84722b94644c4c957706abff999d48271c524f0fbcc939eef3fb4
3
  size 267838720
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d4888eeca7da0a0b10e6b5a3fb079da48820c2f9fdd6e89feffcf85bd2a0d0ef
3
  size 267838720
checkpoint-27/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e43ef309428ff39308802dfcd91f6e2e108ced7d58eee8b2a0454b509db0c9e7
3
- size 535740043
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6fae6998da3a0801252042a7e3bfc3dc0ce8d6c774b40c1093e349b6ab71b28b
3
+ size 535737163
checkpoint-27/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:67c9ded891071528d1d7d37f98c82a4150c15973ace82e86232ed82afc455292
3
- size 14645
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:554a6f518e4416ac9e811b97c1541ed8428693dee1381b601f6162a53b82547e
3
+ size 14455
checkpoint-27/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:261ab7bb7f2dd6da966c1a5d536f05b09c8e99cc5b65e2e3c3057a488f68aed4
3
  size 1465
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:095e33bc11ef1e7d68d549fbfb34695fddb25a664a3a948781f0a9761c641095
3
  size 1465
checkpoint-27/special_tokens_map.json CHANGED
@@ -1,7 +1,7 @@
1
- {
2
- "cls_token": "[CLS]",
3
- "mask_token": "[MASK]",
4
- "pad_token": "[PAD]",
5
- "sep_token": "[SEP]",
6
- "unk_token": "[UNK]"
7
- }
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-27/tokenizer_config.json CHANGED
@@ -1,58 +1,58 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "extra_special_tokens": {},
49
- "mask_token": "[MASK]",
50
- "model_max_length": 512,
51
- "never_split": null,
52
- "pad_token": "[PAD]",
53
- "sep_token": "[SEP]",
54
- "strip_accents": null,
55
- "tokenize_chinese_chars": true,
56
- "tokenizer_class": "DistilBertTokenizer",
57
- "unk_token": "[UNK]"
58
- }
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-27/trainer_state.json CHANGED
@@ -1,96 +1,96 @@
1
- {
2
- "best_global_step": 9,
3
- "best_metric": 0.4,
4
- "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
- "epoch": 3.0,
6
- "eval_steps": 500,
7
- "global_step": 27,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.5555555555555556,
14
- "grad_norm": 2.3992674350738525,
15
- "learning_rate": 8.000000000000001e-06,
16
- "loss": 1.3393,
17
- "step": 5
18
- },
19
- {
20
- "epoch": 1.0,
21
- "eval_accuracy": 0.4,
22
- "eval_loss": 1.3667316436767578,
23
- "eval_runtime": 0.1993,
24
- "eval_samples_per_second": 75.254,
25
- "eval_steps_per_second": 10.034,
26
- "step": 9
27
- },
28
- {
29
- "epoch": 1.1111111111111112,
30
- "grad_norm": 3.988067865371704,
31
- "learning_rate": 1.8e-05,
32
- "loss": 1.334,
33
- "step": 10
34
- },
35
- {
36
- "epoch": 1.6666666666666665,
37
- "grad_norm": 3.5621254444122314,
38
- "learning_rate": 1.9e-05,
39
- "loss": 1.3276,
40
- "step": 15
41
- },
42
- {
43
- "epoch": 2.0,
44
- "eval_accuracy": 0.4,
45
- "eval_loss": 1.351041555404663,
46
- "eval_runtime": 0.1866,
47
- "eval_samples_per_second": 80.391,
48
- "eval_steps_per_second": 10.719,
49
- "step": 18
50
- },
51
- {
52
- "epoch": 2.2222222222222223,
53
- "grad_norm": 3.5041863918304443,
54
- "learning_rate": 1.775e-05,
55
- "loss": 1.3127,
56
- "step": 20
57
- },
58
- {
59
- "epoch": 2.7777777777777777,
60
- "grad_norm": 3.1173243522644043,
61
- "learning_rate": 1.65e-05,
62
- "loss": 1.2905,
63
- "step": 25
64
- },
65
- {
66
- "epoch": 3.0,
67
- "eval_accuracy": 0.4,
68
- "eval_loss": 1.3322917222976685,
69
- "eval_runtime": 0.187,
70
- "eval_samples_per_second": 80.199,
71
- "eval_steps_per_second": 10.693,
72
- "step": 27
73
- }
74
- ],
75
- "logging_steps": 5,
76
- "max_steps": 90,
77
- "num_input_tokens_seen": 0,
78
- "num_train_epochs": 10,
79
- "save_steps": 500,
80
- "stateful_callbacks": {
81
- "TrainerControl": {
82
- "args": {
83
- "should_epoch_stop": false,
84
- "should_evaluate": false,
85
- "should_log": false,
86
- "should_save": true,
87
- "should_training_stop": false
88
- },
89
- "attributes": {}
90
- }
91
- },
92
- "total_flos": 27819145912320.0,
93
- "train_batch_size": 8,
94
- "trial_name": null,
95
- "trial_params": null
96
- }
 
1
+ {
2
+ "best_global_step": 27,
3
+ "best_metric": 0.4,
4
+ "best_model_checkpoint": "./cheese-text-classifier/checkpoint-27",
5
+ "epoch": 3.0,
6
+ "eval_steps": 500,
7
+ "global_step": 27,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.245687246322632,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3897,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.26666666666666666,
22
+ "eval_loss": 1.3788894414901733,
23
+ "eval_runtime": 15.3581,
24
+ "eval_samples_per_second": 0.977,
25
+ "eval_steps_per_second": 0.13,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.2673349380493164,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.3742,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 2.7203733921051025,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.376,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.26666666666666666,
45
+ "eval_loss": 1.3665523529052734,
46
+ "eval_runtime": 11.7383,
47
+ "eval_samples_per_second": 1.278,
48
+ "eval_steps_per_second": 0.17,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 2.6290931701660156,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3642,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 2.063326358795166,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.3545,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3509334325790405,
69
+ "eval_runtime": 11.9563,
70
+ "eval_samples_per_second": 1.255,
71
+ "eval_steps_per_second": 0.167,
72
+ "step": 27
73
+ }
74
+ ],
75
+ "logging_steps": 5,
76
+ "max_steps": 90,
77
+ "num_input_tokens_seen": 0,
78
+ "num_train_epochs": 10,
79
+ "save_steps": 500,
80
+ "stateful_callbacks": {
81
+ "TrainerControl": {
82
+ "args": {
83
+ "should_epoch_stop": false,
84
+ "should_evaluate": false,
85
+ "should_log": false,
86
+ "should_save": true,
87
+ "should_training_stop": false
88
+ },
89
+ "attributes": {}
90
+ }
91
+ },
92
+ "total_flos": 27819145912320.0,
93
+ "train_batch_size": 8,
94
+ "trial_name": null,
95
+ "trial_params": null
96
+ }
checkpoint-27/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
  size 5713
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:601f05c65c90d8f98b95428e2f3212dc00f04a7ec1ca04f350f1094719f63609
3
  size 5713
checkpoint-27/vocab.txt CHANGED
The diff for this file is too large to render.
 
checkpoint-36/config.json CHANGED
@@ -1,36 +1,36 @@
1
- {
2
- "activation": "gelu",
3
- "architectures": [
4
- "DistilBertForSequenceClassification"
5
- ],
6
- "attention_dropout": 0.1,
7
- "dim": 768,
8
- "dropout": 0.1,
9
- "dtype": "float32",
10
- "hidden_dim": 3072,
11
- "id2label": {
12
- "0": "hard",
13
- "1": "semi-hard",
14
- "2": "semi-soft",
15
- "3": "soft"
16
- },
17
- "initializer_range": 0.02,
18
- "label2id": {
19
- "hard": 0,
20
- "semi-hard": 1,
21
- "semi-soft": 2,
22
- "soft": 3
23
- },
24
- "max_position_embeddings": 512,
25
- "model_type": "distilbert",
26
- "n_heads": 12,
27
- "n_layers": 6,
28
- "pad_token_id": 0,
29
- "problem_type": "single_label_classification",
30
- "qa_dropout": 0.1,
31
- "seq_classif_dropout": 0.2,
32
- "sinusoidal_pos_embds": false,
33
- "tie_weights_": true,
34
- "transformers_version": "4.56.1",
35
- "vocab_size": 30522
36
- }
 
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
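The `id2label` block in this `config.json` stores class indices as JSON strings, so they need converting to integers before they can be matched against a model's output positions. A minimal sketch, assuming a hypothetical helper `label_for` and made-up logits (neither is part of this commit):

```python
import json

# Fragment copied from the config.json shown in this diff.
config_fragment = """
{
  "id2label": {"0": "hard", "1": "semi-hard", "2": "semi-soft", "3": "soft"}
}
"""

# JSON object keys are always strings; convert them to integer class indices.
id2label = {int(k): v for k, v in json.loads(config_fragment)["id2label"].items()}


def label_for(logits):
    """Return the label whose index has the largest logit (hypothetical helper)."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best_index]


# Illustrative logits only, not output from this model.
print(label_for([0.1, 0.2, 1.5, -0.3]))
```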
checkpoint-36/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9c9103761626f348ad8ca62f8ef15453e3af535e38f46e03550cf0d68cfff73a
3
  size 267838720
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:29b812f914923941dbe98fce4ff58906f5d0eae634ad2df4198c356581fc654c
3
  size 267838720
checkpoint-36/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:cedb5646f450b04e186c4c25cbb1dca1a8f0c7dd82ac5ae174215f95f7ea0834
3
- size 535740043
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8690a3e03e5a439a76615bbee3420038fe419e65b4be94ff4e685f69f12804fd
3
+ size 535737163
checkpoint-36/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bf1e78dd4b5227c2191e07b09d8fa38a94689402f68cf150e526371b1a4a54c5
3
- size 14645
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:10c93fff6674ca3522c6a4ae993b108084f886b364f5e3ff427c4e294456a47d
3
+ size 14455
checkpoint-36/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:df566675e03ea54508b1a61983473558080c554f19f4207d0ed860743b01bbbf
3
  size 1465
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea88c3b0d587a8a75030aaac449e3d64407849d75a8c69dcf18ed1c9863f7914
3
  size 1465
checkpoint-36/special_tokens_map.json CHANGED
@@ -1,7 +1,7 @@
1
- {
2
- "cls_token": "[CLS]",
3
- "mask_token": "[MASK]",
4
- "pad_token": "[PAD]",
5
- "sep_token": "[SEP]",
6
- "unk_token": "[UNK]"
7
- }
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-36/tokenizer_config.json CHANGED
@@ -1,58 +1,58 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "extra_special_tokens": {},
49
- "mask_token": "[MASK]",
50
- "model_max_length": 512,
51
- "never_split": null,
52
- "pad_token": "[PAD]",
53
- "sep_token": "[SEP]",
54
- "strip_accents": null,
55
- "tokenize_chinese_chars": true,
56
- "tokenizer_class": "DistilBertTokenizer",
57
- "unk_token": "[UNK]"
58
- }
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-36/trainer_state.json CHANGED
@@ -1,119 +1,119 @@
1
- {
2
- "best_global_step": 9,
3
- "best_metric": 0.4,
4
- "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
- "epoch": 4.0,
6
- "eval_steps": 500,
7
- "global_step": 36,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.5555555555555556,
14
- "grad_norm": 2.3992674350738525,
15
- "learning_rate": 8.000000000000001e-06,
16
- "loss": 1.3393,
17
- "step": 5
18
- },
19
- {
20
- "epoch": 1.0,
21
- "eval_accuracy": 0.4,
22
- "eval_loss": 1.3667316436767578,
23
- "eval_runtime": 0.1993,
24
- "eval_samples_per_second": 75.254,
25
- "eval_steps_per_second": 10.034,
26
- "step": 9
27
- },
28
- {
29
- "epoch": 1.1111111111111112,
30
- "grad_norm": 3.988067865371704,
31
- "learning_rate": 1.8e-05,
32
- "loss": 1.334,
33
- "step": 10
34
- },
35
- {
36
- "epoch": 1.6666666666666665,
37
- "grad_norm": 3.5621254444122314,
38
- "learning_rate": 1.9e-05,
39
- "loss": 1.3276,
40
- "step": 15
41
- },
42
- {
43
- "epoch": 2.0,
44
- "eval_accuracy": 0.4,
45
- "eval_loss": 1.351041555404663,
46
- "eval_runtime": 0.1866,
47
- "eval_samples_per_second": 80.391,
48
- "eval_steps_per_second": 10.719,
49
- "step": 18
50
- },
51
- {
52
- "epoch": 2.2222222222222223,
53
- "grad_norm": 3.5041863918304443,
54
- "learning_rate": 1.775e-05,
55
- "loss": 1.3127,
56
- "step": 20
57
- },
58
- {
59
- "epoch": 2.7777777777777777,
60
- "grad_norm": 3.1173243522644043,
61
- "learning_rate": 1.65e-05,
62
- "loss": 1.2905,
63
- "step": 25
64
- },
65
- {
66
- "epoch": 3.0,
67
- "eval_accuracy": 0.4,
68
- "eval_loss": 1.3322917222976685,
69
- "eval_runtime": 0.187,
70
- "eval_samples_per_second": 80.199,
71
- "eval_steps_per_second": 10.693,
72
- "step": 27
73
- },
74
- {
75
- "epoch": 3.3333333333333335,
76
- "grad_norm": 5.531054973602295,
77
- "learning_rate": 1.525e-05,
78
- "loss": 1.2266,
79
- "step": 30
80
- },
81
- {
82
- "epoch": 3.888888888888889,
83
- "grad_norm": 3.253871440887451,
84
- "learning_rate": 1.4e-05,
85
- "loss": 1.1597,
86
- "step": 35
87
- },
88
- {
89
- "epoch": 4.0,
90
- "eval_accuracy": 0.4,
91
- "eval_loss": 1.316666841506958,
92
- "eval_runtime": 0.156,
93
- "eval_samples_per_second": 96.134,
94
- "eval_steps_per_second": 12.818,
95
- "step": 36
96
- }
97
- ],
98
- "logging_steps": 5,
99
- "max_steps": 90,
100
- "num_input_tokens_seen": 0,
101
- "num_train_epochs": 10,
102
- "save_steps": 500,
103
- "stateful_callbacks": {
104
- "TrainerControl": {
105
- "args": {
106
- "should_epoch_stop": false,
107
- "should_evaluate": false,
108
- "should_log": false,
109
- "should_save": true,
110
- "should_training_stop": false
111
- },
112
- "attributes": {}
113
- }
114
- },
115
- "total_flos": 37092194549760.0,
116
- "train_batch_size": 8,
117
- "trial_name": null,
118
- "trial_params": null
119
- }
 
1
+ {
2
+ "best_global_step": 36,
3
+ "best_metric": 0.4666666666666667,
4
+ "best_model_checkpoint": "./cheese-text-classifier/checkpoint-36",
5
+ "epoch": 4.0,
6
+ "eval_steps": 500,
7
+ "global_step": 36,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.245687246322632,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3897,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.26666666666666666,
22
+ "eval_loss": 1.3788894414901733,
23
+ "eval_runtime": 15.3581,
24
+ "eval_samples_per_second": 0.977,
25
+ "eval_steps_per_second": 0.13,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.2673349380493164,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.3742,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 2.7203733921051025,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.376,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.26666666666666666,
45
+ "eval_loss": 1.3665523529052734,
46
+ "eval_runtime": 11.7383,
47
+ "eval_samples_per_second": 1.278,
48
+ "eval_steps_per_second": 0.17,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 2.6290931701660156,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3642,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 2.063326358795166,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.3545,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3509334325790405,
69
+ "eval_runtime": 11.9563,
70
+ "eval_samples_per_second": 1.255,
71
+ "eval_steps_per_second": 0.167,
72
+ "step": 27
73
+ },
74
+ {
75
+ "epoch": 3.3333333333333335,
76
+ "grad_norm": 4.2415618896484375,
77
+ "learning_rate": 1.525e-05,
78
+ "loss": 1.3202,
79
+ "step": 30
80
+ },
81
+ {
82
+ "epoch": 3.888888888888889,
83
+ "grad_norm": 2.6389851570129395,
84
+ "learning_rate": 1.4e-05,
85
+ "loss": 1.2773,
86
+ "step": 35
87
+ },
88
+ {
89
+ "epoch": 4.0,
90
+ "eval_accuracy": 0.4666666666666667,
91
+ "eval_loss": 1.3365625143051147,
92
+ "eval_runtime": 16.0326,
93
+ "eval_samples_per_second": 0.936,
94
+ "eval_steps_per_second": 0.125,
95
+ "step": 36
96
+ }
97
+ ],
98
+ "logging_steps": 5,
99
+ "max_steps": 90,
100
+ "num_input_tokens_seen": 0,
101
+ "num_train_epochs": 10,
102
+ "save_steps": 500,
103
+ "stateful_callbacks": {
104
+ "TrainerControl": {
105
+ "args": {
106
+ "should_epoch_stop": false,
107
+ "should_evaluate": false,
108
+ "should_log": false,
109
+ "should_save": true,
110
+ "should_training_stop": false
111
+ },
112
+ "attributes": {}
113
+ }
114
+ },
115
+ "total_flos": 37092194549760.0,
116
+ "train_batch_size": 8,
117
+ "trial_name": null,
118
+ "trial_params": null
119
+ }
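The `learning_rate` values logged in this `trainer_state.json` are consistent with a linear warmup followed by a linear decay to zero at `max_steps` (90). The sketch below reproduces them under assumptions that are inferred from the numbers and not stated anywhere in this diff: a peak rate of 2e-5, 10 warmup steps, and a logged value that reflects the schedule one step behind the reported `step`. The function name `logged_lr` is hypothetical.

```python
import math

PEAK_LR = 2e-5      # assumption: inferred from the logged values
WARMUP_STEPS = 10   # assumption: inferred from the logged values
MAX_STEPS = 90      # from "max_steps" in trainer_state.json


def logged_lr(step):
    """Learning rate expected in the log entry for `step` (hypothetical helper)."""
    s = step - 1  # assumption: the logged value lags the reported step by one
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    return PEAK_LR * (MAX_STEPS - s) / (MAX_STEPS - WARMUP_STEPS)


# (step, learning_rate) pairs copied from log_history in this diff.
logged = [
    (5, 8.000000000000001e-06),
    (10, 1.8e-05),
    (15, 1.9e-05),
    (20, 1.775e-05),
    (25, 1.65e-05),
    (30, 1.525e-05),
    (35, 1.4e-05),
]

for step, value in logged:
    print(step, value, math.isclose(logged_lr(step), value, rel_tol=1e-9))
```

The real hyperparameters live in `training_args.bin`, which is a binary file and is not readable from this diff, so a match here only shows that the assumed schedule fits the log.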
checkpoint-36/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
  size 5713
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:601f05c65c90d8f98b95428e2f3212dc00f04a7ec1ca04f350f1094719f63609
3
  size 5713
checkpoint-36/vocab.txt CHANGED
The diff for this file is too large to render.
 
checkpoint-45/config.json CHANGED
@@ -1,36 +1,36 @@
1
- {
2
- "activation": "gelu",
3
- "architectures": [
4
- "DistilBertForSequenceClassification"
5
- ],
6
- "attention_dropout": 0.1,
7
- "dim": 768,
8
- "dropout": 0.1,
9
- "dtype": "float32",
10
- "hidden_dim": 3072,
11
- "id2label": {
12
- "0": "hard",
13
- "1": "semi-hard",
14
- "2": "semi-soft",
15
- "3": "soft"
16
- },
17
- "initializer_range": 0.02,
18
- "label2id": {
19
- "hard": 0,
20
- "semi-hard": 1,
21
- "semi-soft": 2,
22
- "soft": 3
23
- },
24
- "max_position_embeddings": 512,
25
- "model_type": "distilbert",
26
- "n_heads": 12,
27
- "n_layers": 6,
28
- "pad_token_id": 0,
29
- "problem_type": "single_label_classification",
30
- "qa_dropout": 0.1,
31
- "seq_classif_dropout": 0.2,
32
- "sinusoidal_pos_embds": false,
33
- "tie_weights_": true,
34
- "transformers_version": "4.56.1",
35
- "vocab_size": 30522
36
- }
 
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-45/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:de112e603b7ce6bea67f3c24b99fafac7bf34253d9bdf5ff989b09ac2b83ca4a
3
  size 267838720
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:881e3ee05fb017ce1e24d8350e287da798aabc02cafb046a9aaa2420b7ba77f3
3
  size 267838720
checkpoint-45/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:22a00865eaa9f97851e56a88891b92d080ddf724e23e3f3cfcdf1ccfa7f9ec5e
3
- size 535740043
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3aee79bdea1f103631c54ca9a61e674818b2052f820821825012f7f18df6e5fd
3
+ size 535737163
checkpoint-45/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5cbb01837febe7919f0932a54877b346f17063c101f3fe1e012d2f57b20df246
3
- size 14645
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dfd0ad23828b8751d20859a2c2db516e48d99722a8d6c22df92cd57e959ce4aa
3
+ size 14455
checkpoint-45/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:eb288caf5a3a28e45dd5cedb94250ec50410cd795dd5578017e9d2d2319ad46d
3
  size 1465
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ab1da70076c3717ffc71289d6363e31a50dc677989fbcd4dc39336f5515d6d4
3
  size 1465
checkpoint-45/special_tokens_map.json CHANGED
@@ -1,7 +1,7 @@
1
- {
2
- "cls_token": "[CLS]",
3
- "mask_token": "[MASK]",
4
- "pad_token": "[PAD]",
5
- "sep_token": "[SEP]",
6
- "unk_token": "[UNK]"
7
- }
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-45/tokenizer_config.json CHANGED
@@ -1,58 +1,58 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "extra_special_tokens": {},
49
- "mask_token": "[MASK]",
50
- "model_max_length": 512,
51
- "never_split": null,
52
- "pad_token": "[PAD]",
53
- "sep_token": "[SEP]",
54
- "strip_accents": null,
55
- "tokenize_chinese_chars": true,
56
- "tokenizer_class": "DistilBertTokenizer",
57
- "unk_token": "[UNK]"
58
- }
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-45/trainer_state.json CHANGED
@@ -1,142 +1,142 @@
1
- {
2
- "best_global_step": 9,
3
- "best_metric": 0.4,
4
- "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
- "epoch": 5.0,
6
- "eval_steps": 500,
7
- "global_step": 45,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.5555555555555556,
14
- "grad_norm": 2.3992674350738525,
15
- "learning_rate": 8.000000000000001e-06,
16
- "loss": 1.3393,
17
- "step": 5
18
- },
19
- {
20
- "epoch": 1.0,
21
- "eval_accuracy": 0.4,
22
- "eval_loss": 1.3667316436767578,
23
- "eval_runtime": 0.1993,
24
- "eval_samples_per_second": 75.254,
25
- "eval_steps_per_second": 10.034,
26
- "step": 9
27
- },
28
- {
29
- "epoch": 1.1111111111111112,
30
- "grad_norm": 3.988067865371704,
31
- "learning_rate": 1.8e-05,
32
- "loss": 1.334,
33
- "step": 10
34
- },
35
- {
36
- "epoch": 1.6666666666666665,
37
- "grad_norm": 3.5621254444122314,
38
- "learning_rate": 1.9e-05,
39
- "loss": 1.3276,
40
- "step": 15
41
- },
42
- {
43
- "epoch": 2.0,
44
- "eval_accuracy": 0.4,
45
- "eval_loss": 1.351041555404663,
46
- "eval_runtime": 0.1866,
47
- "eval_samples_per_second": 80.391,
48
- "eval_steps_per_second": 10.719,
49
- "step": 18
50
- },
51
- {
52
- "epoch": 2.2222222222222223,
53
- "grad_norm": 3.5041863918304443,
54
- "learning_rate": 1.775e-05,
55
- "loss": 1.3127,
56
- "step": 20
57
- },
58
- {
59
- "epoch": 2.7777777777777777,
60
- "grad_norm": 3.1173243522644043,
61
- "learning_rate": 1.65e-05,
62
- "loss": 1.2905,
63
- "step": 25
64
- },
65
- {
66
- "epoch": 3.0,
67
- "eval_accuracy": 0.4,
68
- "eval_loss": 1.3322917222976685,
69
- "eval_runtime": 0.187,
70
- "eval_samples_per_second": 80.199,
71
- "eval_steps_per_second": 10.693,
72
- "step": 27
73
- },
74
- {
75
- "epoch": 3.3333333333333335,
76
- "grad_norm": 5.531054973602295,
77
- "learning_rate": 1.525e-05,
78
- "loss": 1.2266,
79
- "step": 30
80
- },
81
- {
82
- "epoch": 3.888888888888889,
83
- "grad_norm": 3.253871440887451,
84
- "learning_rate": 1.4e-05,
85
- "loss": 1.1597,
86
- "step": 35
87
- },
88
- {
89
- "epoch": 4.0,
90
- "eval_accuracy": 0.4,
91
- "eval_loss": 1.316666841506958,
92
- "eval_runtime": 0.156,
93
- "eval_samples_per_second": 96.134,
94
- "eval_steps_per_second": 12.818,
95
- "step": 36
96
- },
97
- {
98
- "epoch": 4.444444444444445,
99
- "grad_norm": 5.929306507110596,
100
- "learning_rate": 1.275e-05,
101
- "loss": 1.1161,
102
- "step": 40
103
- },
104
- {
105
- "epoch": 5.0,
106
- "grad_norm": 4.499341011047363,
107
- "learning_rate": 1.15e-05,
108
- "loss": 1.1096,
109
- "step": 45
110
- },
111
- {
112
- "epoch": 5.0,
113
- "eval_accuracy": 0.4,
114
- "eval_loss": 1.304785132408142,
115
- "eval_runtime": 0.1465,
116
- "eval_samples_per_second": 102.38,
117
- "eval_steps_per_second": 13.651,
118
- "step": 45
119
- }
120
- ],
121
- "logging_steps": 5,
122
- "max_steps": 90,
123
- "num_input_tokens_seen": 0,
124
- "num_train_epochs": 10,
125
- "save_steps": 500,
126
- "stateful_callbacks": {
127
- "TrainerControl": {
128
- "args": {
129
- "should_epoch_stop": false,
130
- "should_evaluate": false,
131
- "should_log": false,
132
- "should_save": true,
133
- "should_training_stop": false
134
- },
135
- "attributes": {}
136
- }
137
- },
138
- "total_flos": 46365243187200.0,
139
- "train_batch_size": 8,
140
- "trial_name": null,
141
- "trial_params": null
142
- }
 
1
+ {
2
+ "best_global_step": 36,
3
+ "best_metric": 0.4666666666666667,
4
+ "best_model_checkpoint": "./cheese-text-classifier/checkpoint-36",
5
+ "epoch": 5.0,
6
+ "eval_steps": 500,
7
+ "global_step": 45,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.245687246322632,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3897,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.26666666666666666,
22
+ "eval_loss": 1.3788894414901733,
23
+ "eval_runtime": 15.3581,
24
+ "eval_samples_per_second": 0.977,
25
+ "eval_steps_per_second": 0.13,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.2673349380493164,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.3742,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 2.7203733921051025,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.376,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.26666666666666666,
45
+ "eval_loss": 1.3665523529052734,
46
+ "eval_runtime": 11.7383,
47
+ "eval_samples_per_second": 1.278,
48
+ "eval_steps_per_second": 0.17,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 2.6290931701660156,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3642,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 2.063326358795166,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.3545,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3509334325790405,
69
+ "eval_runtime": 11.9563,
70
+ "eval_samples_per_second": 1.255,
71
+ "eval_steps_per_second": 0.167,
72
+ "step": 27
73
+ },
74
+ {
75
+ "epoch": 3.3333333333333335,
76
+ "grad_norm": 4.2415618896484375,
77
+ "learning_rate": 1.525e-05,
78
+ "loss": 1.3202,
79
+ "step": 30
80
+ },
81
+ {
82
+ "epoch": 3.888888888888889,
83
+ "grad_norm": 2.6389851570129395,
84
+ "learning_rate": 1.4e-05,
85
+ "loss": 1.2773,
86
+ "step": 35
87
+ },
88
+ {
89
+ "epoch": 4.0,
90
+ "eval_accuracy": 0.4666666666666667,
91
+ "eval_loss": 1.3365625143051147,
92
+ "eval_runtime": 16.0326,
93
+ "eval_samples_per_second": 0.936,
94
+ "eval_steps_per_second": 0.125,
95
+ "step": 36
96
+ },
97
+ {
98
+ "epoch": 4.444444444444445,
99
+ "grad_norm": 4.96947717666626,
100
+ "learning_rate": 1.275e-05,
101
+ "loss": 1.2581,
102
+ "step": 40
103
+ },
104
+ {
105
+ "epoch": 5.0,
106
+ "grad_norm": 4.3252763748168945,
107
+ "learning_rate": 1.15e-05,
108
+ "loss": 1.2385,
109
+ "step": 45
110
+ },
111
+ {
112
+ "epoch": 5.0,
113
+ "eval_accuracy": 0.4,
114
+ "eval_loss": 1.3152297735214233,
115
+ "eval_runtime": 11.4967,
116
+ "eval_samples_per_second": 1.305,
117
+ "eval_steps_per_second": 0.174,
118
+ "step": 45
119
+ }
120
+ ],
121
+ "logging_steps": 5,
122
+ "max_steps": 90,
123
+ "num_input_tokens_seen": 0,
124
+ "num_train_epochs": 10,
125
+ "save_steps": 500,
126
+ "stateful_callbacks": {
127
+ "TrainerControl": {
128
+ "args": {
129
+ "should_epoch_stop": false,
130
+ "should_evaluate": false,
131
+ "should_log": false,
132
+ "should_save": true,
133
+ "should_training_stop": false
134
+ },
135
+ "attributes": {}
136
+ }
137
+ },
138
+ "total_flos": 46365243187200.0,
139
+ "train_batch_size": 8,
140
+ "trial_name": null,
141
+ "trial_params": null
142
+ }
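The `best_global_step` and `best_metric` fields can be cross-checked against the evaluation entries in `log_history`. A minimal sketch, assuming the tracked metric is `eval_accuracy` with higher values being better and ties going to the earliest step (the helper `best_eval` is hypothetical, and the tie rule is an assumption about the trainer's behaviour):

```python
# Entries copied from the updated checkpoint-45 trainer_state.json in this diff.
# One training-loss entry is included to show that non-eval entries are skipped.
log_history = [
    {"step": 5, "loss": 1.3897},
    {"step": 9, "eval_accuracy": 0.26666666666666666},
    {"step": 18, "eval_accuracy": 0.26666666666666666},
    {"step": 27, "eval_accuracy": 0.4},
    {"step": 36, "eval_accuracy": 0.4666666666666667},
    {"step": 45, "eval_accuracy": 0.4},
]


def best_eval(history, metric="eval_accuracy"):
    """Return the entry with the highest metric; the earliest entry wins ties."""
    best = None
    for entry in history:
        if metric not in entry:
            continue  # skip training-loss entries
        if best is None or entry[metric] > best[metric]:
            best = entry
    return best


best = best_eval(log_history)
print(best["step"], best["eval_accuracy"])
```

For the values above this selects step 36 with accuracy 0.4666666666666667, which agrees with the `best_global_step` and `best_metric` recorded in the file.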
checkpoint-45/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
  size 5713
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:601f05c65c90d8f98b95428e2f3212dc00f04a7ec1ca04f350f1094719f63609
3
  size 5713
checkpoint-45/vocab.txt CHANGED
The diff for this file is too large to render.
 
checkpoint-54/config.json CHANGED
@@ -1,36 +1,36 @@
1
- {
2
- "activation": "gelu",
3
- "architectures": [
4
- "DistilBertForSequenceClassification"
5
- ],
6
- "attention_dropout": 0.1,
7
- "dim": 768,
8
- "dropout": 0.1,
9
- "dtype": "float32",
10
- "hidden_dim": 3072,
11
- "id2label": {
12
- "0": "hard",
13
- "1": "semi-hard",
14
- "2": "semi-soft",
15
- "3": "soft"
16
- },
17
- "initializer_range": 0.02,
18
- "label2id": {
19
- "hard": 0,
20
- "semi-hard": 1,
21
- "semi-soft": 2,
22
- "soft": 3
23
- },
24
- "max_position_embeddings": 512,
25
- "model_type": "distilbert",
26
- "n_heads": 12,
27
- "n_layers": 6,
28
- "pad_token_id": 0,
29
- "problem_type": "single_label_classification",
30
- "qa_dropout": 0.1,
31
- "seq_classif_dropout": 0.2,
32
- "sinusoidal_pos_embds": false,
33
- "tie_weights_": true,
34
- "transformers_version": "4.56.1",
35
- "vocab_size": 30522
36
- }
 
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-54/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5aa846c5c226056cf5ce5353619db80041460988d55162b17a591cffe149a800
3
  size 267838720
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c67f13c61deeb6b7adef2ccee9818a40ccb3f8d91726396efcfe5f63186e2597
3
  size 267838720
checkpoint-54/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d1eb8fea032ede5f8effc13bef01b07d87685d4ecd7c665082c802a5915b57d1
- size 535740043
+ oid sha256:668a302afd21ca03554c15ff8c56e7f2e9f8439d40c5666355f63c4c20b84194
+ size 535737163
checkpoint-54/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:067a193dac0906a35e128e6d201f4390897d736a9c9e8d5a2830394c21b775c0
- size 14645
+ oid sha256:1ee8d2e9a02430c78237891f53745435230af59814a0e2823c3b4c7c861b7824
+ size 14455
checkpoint-54/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4671a75c3a48b181d2f951bb250e6d670ab787e12764f4c7021663e9d714a63d
+ oid sha256:413360cb5e557f5ecf080a401c6b3faae1a8f1d812b7d0c52a9cc6ff59355c54
  size 1465
checkpoint-54/special_tokens_map.json CHANGED
@@ -1,7 +1,7 @@
- {
-   "cls_token": "[CLS]",
-   "mask_token": "[MASK]",
-   "pad_token": "[PAD]",
-   "sep_token": "[SEP]",
-   "unk_token": "[UNK]"
- }
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
checkpoint-54/tokenizer_config.json CHANGED
@@ -1,58 +1,58 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "extra_special_tokens": {},
49
- "mask_token": "[MASK]",
50
- "model_max_length": 512,
51
- "never_split": null,
52
- "pad_token": "[PAD]",
53
- "sep_token": "[SEP]",
54
- "strip_accents": null,
55
- "tokenize_chinese_chars": true,
56
- "tokenizer_class": "DistilBertTokenizer",
57
- "unk_token": "[UNK]"
58
- }
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-54/trainer_state.json CHANGED
@@ -1,158 +1,158 @@
1
- {
2
- "best_global_step": 54,
3
- "best_metric": 0.4666666666666667,
4
- "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-54",
5
- "epoch": 6.0,
6
- "eval_steps": 500,
7
- "global_step": 54,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "epoch": 0.5555555555555556,
14
- "grad_norm": 2.3992674350738525,
15
- "learning_rate": 8.000000000000001e-06,
16
- "loss": 1.3393,
17
- "step": 5
18
- },
19
- {
20
- "epoch": 1.0,
21
- "eval_accuracy": 0.4,
22
- "eval_loss": 1.3667316436767578,
23
- "eval_runtime": 0.1993,
24
- "eval_samples_per_second": 75.254,
25
- "eval_steps_per_second": 10.034,
26
- "step": 9
27
- },
28
- {
29
- "epoch": 1.1111111111111112,
30
- "grad_norm": 3.988067865371704,
31
- "learning_rate": 1.8e-05,
32
- "loss": 1.334,
33
- "step": 10
34
- },
35
- {
36
- "epoch": 1.6666666666666665,
37
- "grad_norm": 3.5621254444122314,
38
- "learning_rate": 1.9e-05,
39
- "loss": 1.3276,
40
- "step": 15
41
- },
42
- {
43
- "epoch": 2.0,
44
- "eval_accuracy": 0.4,
45
- "eval_loss": 1.351041555404663,
46
- "eval_runtime": 0.1866,
47
- "eval_samples_per_second": 80.391,
48
- "eval_steps_per_second": 10.719,
49
- "step": 18
50
- },
51
- {
52
- "epoch": 2.2222222222222223,
53
- "grad_norm": 3.5041863918304443,
54
- "learning_rate": 1.775e-05,
55
- "loss": 1.3127,
56
- "step": 20
57
- },
58
- {
59
- "epoch": 2.7777777777777777,
60
- "grad_norm": 3.1173243522644043,
61
- "learning_rate": 1.65e-05,
62
- "loss": 1.2905,
63
- "step": 25
64
- },
65
- {
66
- "epoch": 3.0,
67
- "eval_accuracy": 0.4,
68
- "eval_loss": 1.3322917222976685,
69
- "eval_runtime": 0.187,
70
- "eval_samples_per_second": 80.199,
71
- "eval_steps_per_second": 10.693,
72
- "step": 27
73
- },
74
- {
75
- "epoch": 3.3333333333333335,
76
- "grad_norm": 5.531054973602295,
77
- "learning_rate": 1.525e-05,
78
- "loss": 1.2266,
79
- "step": 30
80
- },
81
- {
82
- "epoch": 3.888888888888889,
83
- "grad_norm": 3.253871440887451,
84
- "learning_rate": 1.4e-05,
85
- "loss": 1.1597,
86
- "step": 35
87
- },
88
- {
89
- "epoch": 4.0,
90
- "eval_accuracy": 0.4,
91
- "eval_loss": 1.316666841506958,
92
- "eval_runtime": 0.156,
93
- "eval_samples_per_second": 96.134,
94
- "eval_steps_per_second": 12.818,
95
- "step": 36
96
- },
97
- {
98
- "epoch": 4.444444444444445,
99
- "grad_norm": 5.929306507110596,
100
- "learning_rate": 1.275e-05,
101
- "loss": 1.1161,
102
- "step": 40
103
- },
104
- {
105
- "epoch": 5.0,
106
- "grad_norm": 4.499341011047363,
107
- "learning_rate": 1.15e-05,
108
- "loss": 1.1096,
109
- "step": 45
110
- },
111
- {
112
- "epoch": 5.0,
113
- "eval_accuracy": 0.4,
114
- "eval_loss": 1.304785132408142,
115
- "eval_runtime": 0.1465,
116
- "eval_samples_per_second": 102.38,
117
- "eval_steps_per_second": 13.651,
118
- "step": 45
119
- },
120
- {
121
- "epoch": 5.555555555555555,
122
- "grad_norm": 4.16509485244751,
123
- "learning_rate": 1.025e-05,
124
- "loss": 0.9954,
125
- "step": 50
126
- },
127
- {
128
- "epoch": 6.0,
129
- "eval_accuracy": 0.4666666666666667,
130
- "eval_loss": 1.3045247793197632,
131
- "eval_runtime": 0.1861,
132
- "eval_samples_per_second": 80.608,
133
- "eval_steps_per_second": 10.748,
134
- "step": 54
135
- }
136
- ],
137
- "logging_steps": 5,
138
- "max_steps": 90,
139
- "num_input_tokens_seen": 0,
140
- "num_train_epochs": 10,
141
- "save_steps": 500,
142
- "stateful_callbacks": {
143
- "TrainerControl": {
144
- "args": {
145
- "should_epoch_stop": false,
146
- "should_evaluate": false,
147
- "should_log": false,
148
- "should_save": true,
149
- "should_training_stop": false
150
- },
151
- "attributes": {}
152
- }
153
- },
154
- "total_flos": 55638291824640.0,
155
- "train_batch_size": 8,
156
- "trial_name": null,
157
- "trial_params": null
158
- }
 
1
+ {
2
+ "best_global_step": 36,
3
+ "best_metric": 0.4666666666666667,
4
+ "best_model_checkpoint": "./cheese-text-classifier/checkpoint-36",
5
+ "epoch": 6.0,
6
+ "eval_steps": 500,
7
+ "global_step": 54,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.245687246322632,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3897,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.26666666666666666,
22
+ "eval_loss": 1.3788894414901733,
23
+ "eval_runtime": 15.3581,
24
+ "eval_samples_per_second": 0.977,
25
+ "eval_steps_per_second": 0.13,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.2673349380493164,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.3742,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 2.7203733921051025,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.376,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.26666666666666666,
45
+ "eval_loss": 1.3665523529052734,
46
+ "eval_runtime": 11.7383,
47
+ "eval_samples_per_second": 1.278,
48
+ "eval_steps_per_second": 0.17,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 2.6290931701660156,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3642,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 2.063326358795166,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.3545,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3509334325790405,
69
+ "eval_runtime": 11.9563,
70
+ "eval_samples_per_second": 1.255,
71
+ "eval_steps_per_second": 0.167,
72
+ "step": 27
73
+ },
74
+ {
75
+ "epoch": 3.3333333333333335,
76
+ "grad_norm": 4.2415618896484375,
77
+ "learning_rate": 1.525e-05,
78
+ "loss": 1.3202,
79
+ "step": 30
80
+ },
81
+ {
82
+ "epoch": 3.888888888888889,
83
+ "grad_norm": 2.6389851570129395,
84
+ "learning_rate": 1.4e-05,
85
+ "loss": 1.2773,
86
+ "step": 35
87
+ },
88
+ {
89
+ "epoch": 4.0,
90
+ "eval_accuracy": 0.4666666666666667,
91
+ "eval_loss": 1.3365625143051147,
92
+ "eval_runtime": 16.0326,
93
+ "eval_samples_per_second": 0.936,
94
+ "eval_steps_per_second": 0.125,
95
+ "step": 36
96
+ },
97
+ {
98
+ "epoch": 4.444444444444445,
99
+ "grad_norm": 4.96947717666626,
100
+ "learning_rate": 1.275e-05,
101
+ "loss": 1.2581,
102
+ "step": 40
103
+ },
104
+ {
105
+ "epoch": 5.0,
106
+ "grad_norm": 4.3252763748168945,
107
+ "learning_rate": 1.15e-05,
108
+ "loss": 1.2385,
109
+ "step": 45
110
+ },
111
+ {
112
+ "epoch": 5.0,
113
+ "eval_accuracy": 0.4,
114
+ "eval_loss": 1.3152297735214233,
115
+ "eval_runtime": 11.4967,
116
+ "eval_samples_per_second": 1.305,
117
+ "eval_steps_per_second": 0.174,
118
+ "step": 45
119
+ },
120
+ {
121
+ "epoch": 5.555555555555555,
122
+ "grad_norm": 3.6944730281829834,
123
+ "learning_rate": 1.025e-05,
124
+ "loss": 1.1673,
125
+ "step": 50
126
+ },
127
+ {
128
+ "epoch": 6.0,
129
+ "eval_accuracy": 0.4666666666666667,
130
+ "eval_loss": 1.3012595176696777,
131
+ "eval_runtime": 11.6907,
132
+ "eval_samples_per_second": 1.283,
133
+ "eval_steps_per_second": 0.171,
134
+ "step": 54
135
+ }
136
+ ],
137
+ "logging_steps": 5,
138
+ "max_steps": 90,
139
+ "num_input_tokens_seen": 0,
140
+ "num_train_epochs": 10,
141
+ "save_steps": 500,
142
+ "stateful_callbacks": {
143
+ "TrainerControl": {
144
+ "args": {
145
+ "should_epoch_stop": false,
146
+ "should_evaluate": false,
147
+ "should_log": false,
148
+ "should_save": true,
149
+ "should_training_stop": false
150
+ },
151
+ "attributes": {}
152
+ }
153
+ },
154
+ "total_flos": 55638291824640.0,
155
+ "train_batch_size": 8,
156
+ "trial_name": null,
157
+ "trial_params": null
158
+ }
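The `log_history` list mixes training-loss entries with per-epoch evaluation entries, and `best_global_step` / `best_metric` summarize the best evaluation so far. A minimal sketch of recovering that summary from the log follows. The evaluation values are copied from the added side of the diff above (timing fields omitted, one training entry kept to show the filtering); the helper name `best_eval` is hypothetical, and treating ties as "first one wins" is an assumption of this sketch rather than a confirmed description of the trainer's logic.

```python
import json

# Entries copied from the added (+) side of log_history above.
TRAINER_STATE = """
{
  "log_history": [
    {"epoch": 0.5555555555555556, "loss": 1.3897, "step": 5},
    {"epoch": 1.0, "eval_accuracy": 0.26666666666666666, "eval_loss": 1.3788894414901733, "step": 9},
    {"epoch": 2.0, "eval_accuracy": 0.26666666666666666, "eval_loss": 1.3665523529052734, "step": 18},
    {"epoch": 3.0, "eval_accuracy": 0.4, "eval_loss": 1.3509334325790405, "step": 27},
    {"epoch": 4.0, "eval_accuracy": 0.4666666666666667, "eval_loss": 1.3365625143051147, "step": 36},
    {"epoch": 5.0, "eval_accuracy": 0.4, "eval_loss": 1.3152297735214233, "step": 45},
    {"epoch": 6.0, "eval_accuracy": 0.4666666666666667, "eval_loss": 1.3012595176696777, "step": 54}
  ]
}
"""


def best_eval(state, metric="eval_accuracy"):
    """Return the evaluation entry with the highest metric (first one on ties)."""
    evals = [entry for entry in state["log_history"] if metric in entry]
    # max() keeps the first maximal element, so an earlier step wins a tie.
    return max(evals, key=lambda entry: entry[metric])


best = best_eval(json.loads(TRAINER_STATE))
print(best["step"], best["eval_accuracy"])
```

Under that tie-breaking assumption the sketch selects step 36, which agrees with the `best_global_step` of 36 and `best_metric` of 0.4666666666666667 on the added side, even though step 54 reaches the same accuracy with a lower `eval_loss`.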
checkpoint-54/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
+ oid sha256:601f05c65c90d8f98b95428e2f3212dc00f04a7ec1ca04f350f1094719f63609
  size 5713