---
library_name: transformers
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- generated_from_trainer
model-index:
- name: 6e1a46d2917b4862d38a4f0c9b349b78
  results: []
---


# 6e1a46d2917b4862d38a4f0c9b349b78

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on the STS-B (`stsb`) subset of the [nyu-mll/glue](https://huggingface.co/datasets/nyu-mll/glue) dataset.
At the last logged epoch (17), it achieves the following results on the evaluation set:
- Loss: 2.0875
- Data Size: 1.0
- Epoch Runtime: 161.4278
- Mse: 0.5218
- Mae: 0.5519
- R2: 0.7666
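
For readers who want to reproduce numbers of this kind, here is a minimal sketch of how MSE, MAE, and R2 are conventionally defined. Whether the training script computed them in exactly this way is an assumption; the function name `regression_metrics` and the toy values are hypothetical and are not model output.

```python
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MSE, MAE, and R2 for paired targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                    # undefined if all targets are equal
    return {"mse": mse, "mae": mae, "r2": r2}


# Toy similarity scores on a 0-5 scale (made-up values for illustration only).
targets = [0.0, 2.5, 5.0, 3.0]
predictions = [0.5, 2.0, 4.5, 3.5]
metrics = regression_metrics(targets, predictions)
print(metrics)
```

Under this definition R2 is negative whenever the predictions fit worse than always predicting the mean target, which is consistent with the strongly negative R2 values in the early rows of the training results table.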

## Model description

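A fine-tune of DeepSeek-R1-Distill-Qwen-7B on the STS-B sentence-pair similarity task. The reported metrics (MSE, MAE, R2) are regression metrics, which suggests the model outputs a continuous similarity score. More information needed.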

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
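
The total batch sizes in the list follow from the per-device sizes and the device count. The sketch below checks that arithmetic. It assumes `train_batch_size` and `eval_batch_size` are per-device values and that gradient accumulation was 1 (the list does not mention it); the helper `total_batch_size` is hypothetical.

```python
# Values copied from the hyperparameter list above.
hyperparameters = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,   # assumed to be per device
    "eval_batch_size": 8,    # assumed to be per device
    "num_devices": 4,
    "seed": 42,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
}


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


total_train = total_batch_size(
    hyperparameters["train_batch_size"], hyperparameters["num_devices"]
)
total_eval = total_batch_size(
    hyperparameters["eval_batch_size"], hyperparameters["num_devices"]
)
print(total_train, total_eval)
```

Both values come out to 32, matching `total_train_batch_size` and `total_eval_batch_size` in the list.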

### Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Mse      | Mae     | R2       |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:--------:|:-------:|:--------:|
| No log        | 0     | 0    | 448.9303        | 0         | 4.7458        | 112.2354 | 10.3057 | -49.2069 |
| No log        | 1     | 179  | 88.2811         | 0.0078    | 5.8425        | 22.0705  | 3.7456  | -8.8729  |
| No log        | 2     | 358  | 79.4700         | 0.0156    | 15.1152       | 19.8687  | 3.4889  | -7.8880  |
| No log        | 3     | 537  | 8.6103          | 0.0312    | 27.0614       | 2.1534   | 1.1714  | 0.0367   |
| No log        | 4     | 716  | 12.3908         | 0.0625    | 35.6056       | 3.0979   | 1.5029  | -0.3858  |
| No log        | 5     | 895  | 5.2826          | 0.125     | 52.5455       | 1.3211   | 0.9492  | 0.4090   |
| 2.2677        | 6     | 1074 | 4.7406          | 0.25      | 74.0799       | 1.1853   | 0.9211  | 0.4698   |
| 3.7239        | 7     | 1253 | 4.6343          | 0.5       | 117.4151      | 1.1584   | 0.8625  | 0.4818   |
| 2.6132        | 8     | 1432 | 6.7421          | 1.0       | 192.2860      | 1.6860   | 1.0907  | 0.2458   |
| 1.6451        | 9     | 1611 | 2.3480          | 1.0       | 155.3065      | 0.5871   | 0.5916  | 0.7374   |
| 1.4991        | 10    | 1790 | 2.3674          | 1.0       | 167.1774      | 0.5922   | 0.6104  | 0.7351   |
| 1.2215        | 11    | 1969 | 2.0947          | 1.0       | 156.1953      | 0.5239   | 0.5679  | 0.7657   |
| 1.0580        | 12    | 2148 | 3.7766          | 1.0       | 167.6526      | 0.9443   | 0.7900  | 0.5776   |
| 1.5368        | 13    | 2327 | 2.0198          | 1.0       | 180.6919      | 0.5051   | 0.5459  | 0.7740   |
| 0.7465        | 14    | 2506 | 2.2596          | 1.0       | 170.0243      | 0.5652   | 0.5900  | 0.7472   |
| 0.5813        | 15    | 2685 | 2.1532          | 1.0       | 155.3671      | 0.5384   | 0.5654  | 0.7591   |
| 0.5467        | 16    | 2864 | 3.2212          | 1.0       | 164.9452      | 0.8057   | 0.7418  | 0.6396   |
| 0.6682        | 17    | 3043 | 2.0875          | 1.0       | 161.4278      | 0.5218   | 0.5519  | 0.7666   |
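
The table can be read programmatically to compare the last logged epoch with the best one. The sketch below uses only the rows trained on the full data (Data Size 1.0), with values copied from the table. The tuple layout and variable names are illustrative choices, not part of the training code.

```python
# (epoch, validation_loss, mse, r2) copied from the Data Size 1.0 rows above.
rows = [
    (8, 6.7421, 1.6860, 0.2458),
    (9, 2.3480, 0.5871, 0.7374),
    (10, 2.3674, 0.5922, 0.7351),
    (11, 2.0947, 0.5239, 0.7657),
    (12, 3.7766, 0.9443, 0.5776),
    (13, 2.0198, 0.5051, 0.7740),
    (14, 2.2596, 0.5652, 0.7472),
    (15, 2.1532, 0.5384, 0.7591),
    (16, 3.2212, 0.8057, 0.6396),
    (17, 2.0875, 0.5218, 0.7666),
]

# Pick the row with the lowest validation loss and compare it to the last row.
best = min(rows, key=lambda r: r[1])
last = rows[-1]
print(f"best epoch by validation loss: {best[0]} (loss={best[1]}, R2={best[3]})")
print(f"last logged epoch: {last[0]} (loss={last[1]}, R2={last[3]})")

# Ratio of the logged validation loss to the logged MSE for each row.
ratios = [loss / mse for _, loss, mse, _ in rows]
print([round(r, 3) for r in ratios])
```

In these rows, epoch 13 has the lowest validation loss and the highest R2, while the headline results correspond to epoch 17. The logged validation loss is also about 4 times the logged MSE in every row; the card does not state why, so treat the Loss and Mse columns as being on different scales when comparing them.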


### Framework versions

- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1