roberta-finetuned-nq-nasa for Extractive QA
The base model is deepset/roberta-base-squad2. First, the last encoder layer and the QA head were fine-tuned on the short-answer version of Natural Questions; then the entire model was fine-tuned on the NASA SMD QA training split.
Task: Question Answering
Language: English
Local setup: 1x RTX 4080 super
Evaluation Metric: squad_v2
Evaluation Dataset: NASA SMD QA validation split
Finetune (1) Hyperparameters (NQ short dataset, starting from the zero-shot base):
train_batch_size = 16
val_batch_size = 8
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5
release_model = "roberta-finetuned-nq"
gradient_checkpointing = True
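The stage-1 recipe above (freeze everything except the last encoder layer and the QA head, and give each an independent learning rate) can be sketched in PyTorch. The toy module below is a hypothetical stand-in for the real RoBERTa encoder; only the freezing and parameter-group pattern is the point:

```python
import torch
from torch import nn

# Hypothetical stand-in for the encoder: 12 "layers" plus a QA head
# producing start/end logits (the real model is deepset/roberta-base-squad2).
class ToyQAModel(nn.Module):
    def __init__(self, hidden=16, n_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_layers))
        self.qa_head = nn.Linear(hidden, 2)  # start and end logits

model = ToyQAModel()

# Stage 1: freeze everything, then unfreeze the last layer and the QA head.
for p in model.parameters():
    p.requires_grad = False
for p in model.layers[-1].parameters():
    p.requires_grad = True
for p in model.qa_head.parameters():
    p.requires_grad = True

# Two parameter groups with the stage-1 learning rates from the card.
optimizer = torch.optim.AdamW([
    {"params": model.layers[-1].parameters(), "lr": 5e-6},
    {"params": model.qa_head.parameters(), "lr": 3e-5},
])
```

Frozen parameters receive no gradients, so only the two unfrozen groups move during stage 1; the same pattern extends to stage 2 by unfreezing everything and lowering the rates.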
Finetune (2) Hyperparameters (NASA SMD QA train split):
train_batch_size = 16
val_batch_size = 8
n_epochs = 5
base_LM_model = "roberta-finetuned-nq"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
layer_learning_rate = 1e-6
qa_head_learning_rate = 1e-5
release_model = "roberta-finetuned-nq-nasa"
gradient_checkpointing = True
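With max_seq_len = 384 and doc_stride = 128, contexts longer than one window are split into overlapping chunks. A minimal pure-Python sketch of the window-start computation, assuming doc_stride is the step between successive windows (as in the original SQuAD preprocessing) and a hypothetical question_len token budget:

```python
def window_starts(n_context_tokens, max_seq_len=384, doc_stride=128, question_len=16):
    """Start offsets of the overlapping context windows fed to the model.

    question_len is an assumed budget for the question plus special tokens;
    the remaining max_seq_len - question_len slots hold context tokens.
    """
    ctx_len = max_seq_len - question_len
    starts = [0]
    # Advance by doc_stride until the last window reaches the end of the context.
    while starts[-1] + ctx_len < n_context_tokens:
        starts.append(starts[-1] + doc_stride)
    return starts
```

For a 1,000-token context this yields windows starting at 0, 128, 256, ..., each 368 context tokens wide, so every token falls inside at least one window and most tokens appear in several.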
Motivation for the two finetunes
Finetune 1 already gave the model strong answering capability, but it attempted an answer far more often than it abstained; finetune 2 was therefore needed to teach the model when to abstain.
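The abstention behaviour targeted by finetune 2 is usually realised at inference time with the standard SQuAD v2 decision rule: compare the best answer-span score against the null score at the CLS position (index 0) and abstain when the null score wins by more than a threshold. A minimal sketch with toy logits and a hypothetical threshold:

```python
def decide_answer(start_logits, end_logits, null_threshold=0.0):
    """SQuAD v2-style abstention: return the best (start, end) span,
    or None when the null (CLS-position) score beats it by the threshold."""
    null_score = start_logits[0] + end_logits[0]  # CLS token at index 0
    best_score, best_span = float("-inf"), None
    for i in range(1, len(start_logits)):
        for j in range(i, len(end_logits)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if null_score - best_score > null_threshold:
        return None  # abstain: the model predicts "no answer"
    return best_span

# Confident null logits -> abstain
# decide_answer([5.0, 1.0, 0.0], [5.0, 0.0, 1.0]) returns None
```

Raising null_threshold makes the model abstain more aggressively; finetune 2 shifts the logits themselves so that unanswerable questions score high at the CLS position.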
Performance
"exact": 66.0,
"f1": 79.86357273703948,
"total": 50,
"HasAns_exact": 53.333333333333336,
"HasAns_f1": 76.43928789506579,
"HasAns_total": 30,
"NoAns_exact": 85.0,
"NoAns_f1": 85.0,
"NoAns_total": 20
Acknowledgements
- deepset/roberta-base-squad2 authors (Hugging Face)
- cjlovering's Natural Questions (short) dataset (Hugging Face)
- NASA SMD QA dataset authors (Hugging Face)
- Hugging Face Trainer documentation
- Developed as part of the Westcliff MSCS AIT500 course, 2025 Fall 1, session 2. We thank our professor Desmond for his continuous guidance.