roberta-finetuned-nq-nasa for Extractive QA
The base model is deepset/roberta-base-squad2. First, the last encoder layer and the QA head were fine-tuned on the short-answer version of Natural Questions; then the entire model was fine-tuned on the NASA SMD QA training split.
Task: Question Answering
Language: English
Local setup: 1x RTX 4080 super
Evaluation Metric: squad_v2
Evaluation Dataset: NASA SMD QA validation split
Finetune (1) Hyperparameters (NQ short dataset, starting from the zero-shot base):
train_batch_size = 16
val_batch_size = 8
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5
release_model = "roberta-finetuned-nq"
gradient_checkpointing = True
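The stage-1 recipe above (freeze everything except the last encoder layer and the QA head, and give each an independent learning rate) can be sketched in PyTorch. The toy module below is a hypothetical stand-in for the real RoBERTa encoder; only the freezing and parameter-group pattern is the point:

```python
import torch
from torch import nn

# Hypothetical stand-in for the encoder: 12 "layers" plus a QA head
# producing start/end logits (the real model is deepset/roberta-base-squad2).
class ToyQAModel(nn.Module):
    def __init__(self, hidden=16, n_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_layers))
        self.qa_head = nn.Linear(hidden, 2)  # start and end logits

model = ToyQAModel()

# Stage 1: freeze everything, then unfreeze the last layer and the QA head.
for p in model.parameters():
    p.requires_grad = False
for p in model.layers[-1].parameters():
    p.requires_grad = True
for p in model.qa_head.parameters():
    p.requires_grad = True

# Two parameter groups with the stage-1 learning rates from the card.
optimizer = torch.optim.AdamW([
    {"params": model.layers[-1].parameters(), "lr": 5e-6},
    {"params": model.qa_head.parameters(), "lr": 3e-5},
])
```

Frozen parameters receive no gradients, so only the two unfrozen groups move during stage 1; the same pattern extends to stage 2 by unfreezing everything and lowering the rates.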
Finetune (2) Hyperparameters (NASA SMD QA train split):
train_batch_size = 16
val_batch_size = 8
n_epochs = 5
base_LM_model = "roberta-finetuned-nq"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
layer_learning_rate = 1e-6
qa_head_learning_rate = 1e-5
release_model = "roberta-finetuned-nq-nasa"
gradient_checkpointing = True
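With max_seq_len = 384 and doc_stride = 128, contexts longer than one window are split into overlapping chunks. A minimal pure-Python sketch of the window-start computation, assuming doc_stride is the step between successive windows (as in the original SQuAD preprocessing) and a hypothetical question_len token budget:

```python
def window_starts(n_context_tokens, max_seq_len=384, doc_stride=128, question_len=16):
    """Start offsets of the overlapping context windows fed to the model.

    question_len is an assumed budget for the question plus special tokens;
    the remaining max_seq_len - question_len slots hold context tokens.
    """
    ctx_len = max_seq_len - question_len
    starts = [0]
    # Advance by doc_stride until the last window reaches the end of the context.
    while starts[-1] + ctx_len < n_context_tokens:
        starts.append(starts[-1] + doc_stride)
    return starts
```

For a 1,000-token context this yields windows starting at 0, 128, 256, ..., each 368 context tokens wide, so every token falls inside at least one window and most tokens appear in several.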
Motivation for the two finetunes
Finetune 1 already gave the model strong answering capability, but it attempted an answer far more often than it abstained; finetune 2 was therefore needed to teach the model when to abstain.
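The abstention behaviour targeted by finetune 2 is usually realised at inference time with the standard SQuAD v2 decision rule: compare the best answer-span score against the null score at the CLS position (index 0) and abstain when the null score wins by more than a threshold. A minimal sketch with toy logits and a hypothetical threshold:

```python
def decide_answer(start_logits, end_logits, null_threshold=0.0):
    """SQuAD v2-style abstention: return the best (start, end) span,
    or None when the null (CLS-position) score beats it by the threshold."""
    null_score = start_logits[0] + end_logits[0]  # CLS token at index 0
    best_score, best_span = float("-inf"), None
    for i in range(1, len(start_logits)):
        for j in range(i, len(end_logits)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if null_score - best_score > null_threshold:
        return None  # abstain: the model predicts "no answer"
    return best_span

# Confident null logits -> abstain
# decide_answer([5.0, 1.0, 0.0], [5.0, 0.0, 1.0]) returns None
```

Raising null_threshold makes the model abstain more aggressively; finetune 2 shifts the logits themselves so that unanswerable questions score high at the CLS position.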
Performance
"exact": 66.0,
"f1": 79.86357273703948,
"total": 50,
"HasAns_exact": 53.333333333333336,
"HasAns_f1": 76.43928789506579,
"HasAns_total": 30,
"NoAns_exact": 85.0,
"NoAns_f1": 85.0,
"NoAns_total": 20
Acknowledgements
- deepset/roberta-base-squad2 authors (Hugging Face)
- cjlovering's Natural Questions (short) dataset (Hugging Face)
- NASA SMD QA dataset authors (Hugging Face)
- Hugging Face Trainer documentation
- Developed as part of the Westcliff MSCS AIT500 course, 2025 Fall 1, session 2. We thank our professor Desmond for his continuous guidance.