Update README.md
README.md

## Finetune (1) Hyperparameters (NQ Dataset) zero-shot:

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 2
...
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5
release_model = "roberta-finetuned-nq"
gradient_checkpointing = True
```
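
The training script is not part of this README, but the values above point at discriminative learning rates: a small rate for the top of the encoder and a larger one for the QA head. Below is a minimal sketch of how that could be wired up with `transformers` and `torch`; the base checkpoint (`roberta-base`), the use of AdamW in this stage, and the exact parameter grouping are assumptions, not something the README states.

```python
import torch
from transformers import AutoModelForQuestionAnswering

# Assumed base checkpoint; the README does not say which RoBERTa variant was used.
model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")
model.gradient_checkpointing_enable()  # gradient_checkpointing = True

# Finetune (1) learning rates from the README.
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5

# Assumed grouping: the last encoder layer gets the small rate,
# the span-prediction head gets the larger one.
last_layer_params = model.roberta.encoder.layer[-1].parameters()
qa_head_params = model.qa_outputs.parameters()

# AdamW is an assumption here; the README only names the optimizer for finetune (2).
optimizer = torch.optim.AdamW([
    {"params": last_layer_params, "lr": last_layer_learning_rate},
    {"params": qa_head_params, "lr": qa_head_learning_rate},
])
```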

## Finetune (2) Hyperparameters (Nasa train):

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 5
...
optimizer = adamW
layer_learning_rate = 1e-6
qa_head_learning_rate = 1e-5
release_model = "roberta-finetuned-nq-nasa"
gradient_checkpointing = True
```
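
Finetune (2) presumably continues from the finetune (1) model with much smaller learning rates, which fits its role of adjusting behaviour (when to abstain) rather than relearning QA. A sketch under the same assumptions: the checkpoint name is taken from `release_model` above and may not be the actual hub id, and treating `layer_learning_rate` as covering the whole encoder is a guess.

```python
import torch
from transformers import AutoModelForQuestionAnswering

# Assumed: finetune (2) starts from the finetune (1) release named above.
model = AutoModelForQuestionAnswering.from_pretrained("roberta-finetuned-nq")
model.gradient_checkpointing_enable()  # gradient_checkpointing = True

# Finetune (2) values from the README; optimizer = adamW.
layer_learning_rate = 1e-6    # assumed to cover all encoder parameters
qa_head_learning_rate = 1e-5

encoder_params = [p for n, p in model.named_parameters() if not n.startswith("qa_outputs")]
qa_head_params = list(model.qa_outputs.parameters())

optimizer = torch.optim.AdamW([
    {"params": encoder_params, "lr": layer_learning_rate},
    {"params": qa_head_params, "lr": qa_head_learning_rate},
])
```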

## Motivation for the two finetunes
Finetune 1 already gave the model strong answering capability; however, it tried to answer more often than it abstained, so finetune 2 was needed to teach the model when to abstain.