6e1a46d2917b4862d38a4f0c9b349b78

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on the nyu-mll/glue dataset (stsb configuration). It achieves the following results on the evaluation set:

Loss: 2.0875
Data Size: 1.0
Epoch Runtime: 161.4278
Mse: 0.5218
Mae: 0.5519
R2: 0.7666
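
For reference, the three regression metrics listed above can be computed from predicted and gold similarity scores as in the minimal sketch below. The function name `regression_metrics` and the sample scores are illustrative assumptions and are not taken from the training code, so the actual evaluation script may differ in detail.

```python
def regression_metrics(predictions, labels):
    """Return MSE, MAE, and R2 for two equal-length sequences of scores."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    n = len(labels)
    errors = [p - y for p, y in zip(predictions, labels)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean label
    mean_label = sum(labels) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_label) ** 2 for y in labels)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all labels are identical")
    return {"mse": mse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}


# Hypothetical scores on the 0-5 STS-B similarity scale
example = regression_metrics([4.5, 1.0, 3.0, 0.5], [5.0, 1.5, 2.5, 0.0])
print(example)
```

An R2 below zero means the predictions fit worse than always predicting the mean label.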

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
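
As a rough sketch, the values above could be written as keyword arguments in the naming used by `transformers.TrainingArguments`. The mapping of names is an assumption about how the run was configured, and the total batch sizes are derived assuming no gradient accumulation, since none is listed.

```python
# Values copied from the hyperparameter list; key names follow
# transformers.TrainingArguments and are an assumption about the setup.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}
num_devices = 4  # from the list above (multi-GPU)

# Effective batch size = per-device batch size x number of devices
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * num_devices
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 32 32
```

The dictionary could then be unpacked into `TrainingArguments` together with an `output_dir`, which this card does not specify.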

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	448.9303	0	4.7458	112.2354	10.3057	-49.2069
No log	1	179	88.2811	0.0078	5.8425	22.0705	3.7456	-8.8729
No log	2	358	79.4700	0.0156	15.1152	19.8687	3.4889	-7.8880
No log	3	537	8.6103	0.0312	27.0614	2.1534	1.1714	0.0367
No log	4	716	12.3908	0.0625	35.6056	3.0979	1.5029	-0.3858
No log	5	895	5.2826	0.125	52.5455	1.3211	0.9492	0.4090
2.2677	6	1074	4.7406	0.25	74.0799	1.1853	0.9211	0.4698
3.7239	7	1253	4.6343	0.5	117.4151	1.1584	0.8625	0.4818
2.6132	8.0	1432	6.7421	1.0	192.2860	1.6860	1.0907	0.2458
1.6451	9.0	1611	2.3480	1.0	155.3065	0.5871	0.5916	0.7374
1.4991	10.0	1790	2.3674	1.0	167.1774	0.5922	0.6104	0.7351
1.2215	11.0	1969	2.0947	1.0	156.1953	0.5239	0.5679	0.7657
1.058	12.0	2148	3.7766	1.0	167.6526	0.9443	0.7900	0.5776
1.5368	13.0	2327	2.0198	1.0	180.6919	0.5051	0.5459	0.7740
0.7465	14.0	2506	2.2596	1.0	170.0243	0.5652	0.5900	0.7472
0.5813	15.0	2685	2.1532	1.0	155.3671	0.5384	0.5654	0.7591
0.5467	16.0	2864	3.2212	1.0	164.9452	0.8057	0.7418	0.6396
0.6682	17.0	3043	2.0875	1.0	161.4278	0.5218	0.5519	0.7666
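
Validation loss fluctuates from epoch to epoch once the full dataset is in use (Data Size 1.0). The sketch below, with values copied from those rows of the table, picks out the epoch with the lowest validation loss; the variable names are illustrative.

```python
# (epoch, validation_loss, r2) for the rows with Data Size 1.0
full_data_rows = [
    (8, 6.7421, 0.2458),
    (9, 2.3480, 0.7374),
    (10, 2.3674, 0.7351),
    (11, 2.0947, 0.7657),
    (12, 3.7766, 0.5776),
    (13, 2.0198, 0.7740),
    (14, 2.2596, 0.7472),
    (15, 2.1532, 0.7591),
    (16, 3.2212, 0.6396),
    (17, 2.0875, 0.7666),
]

# Lowest validation loss and highest R2 over the full-data epochs
best_by_loss = min(full_data_rows, key=lambda row: row[1])
best_by_r2 = max(full_data_rows, key=lambda row: row[2])
last_logged = full_data_rows[-1]

print("best by loss:", best_by_loss)
print("best by R2:  ", best_by_r2)
print("last logged: ", last_logged)
```

In this log the lowest validation loss falls at epoch 13, while the headline results at the top of the card match the last logged row, epoch 17.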

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.22.1

Safetensors

Model size: 2B params
Tensor type: F32

Model tree for contemmcm/6e1a46d2917b4862d38a4f0c9b349b78

Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
