---
datasets:
  - cjlovering/natural-questions-short
  - nasa-impact/nasa-smd-qa-benchmark
language:
  - en
metrics:
  - squad_v2
base_model:
  - deepset/roberta-base-squad2
pipeline_tag: question-answering
library_name: transformers
tags:
  - astronomy
  - space
license: cc-by-4.0
model-index:
  - name: quantaRoche/roberta-finetuned-nq-nasa-qa
    results:
      - task:
          type: question-answering
        dataset:
          type: nasa-impact/nasa-smd-qa-benchmark
          name: NASA SMD QA Benchmark
          split: validation
        metrics:
          - type: exact_match
            name: EM
            value: 66
          - type: f1
            name: f1
            value: 79.86357273703948
---

# roberta-finetuned-nq-nasa for Extractive QA

The base model is deepset/roberta-base-squad2. Training ran in two stages: first, only the last transformer layer and the QA head were fine-tuned on the short-answer version of Natural Questions; then the entire model was fine-tuned on the NASA SMD QA training split.
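
A minimal usage sketch with the transformers question-answering pipeline (the question and context strings here are illustrative, not from the evaluation data):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for extractive QA.
qa = pipeline("question-answering", model="quantaRoche/roberta-finetuned-nq-nasa-qa")

# handle_impossible_answer=True lets the pipeline return an empty answer
# when the context does not contain one, matching the abstention behavior
# this model was trained for.
result = qa(
    question="What does SMD stand for?",  # illustrative example
    context="NASA's Science Mission Directorate (SMD) leads the agency's space science program.",
    handle_impossible_answer=True,
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```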

- **Task:** Question Answering
- **Language:** English
- **Local setup:** 1x RTX 4080 Super
- **Evaluation metric:** squad_v2 (see the sketch after this list)
- **Evaluation dataset:** NASA SMD QA validation split
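
The reported numbers use the squad_v2 metric, which scores abstentions as well as extracted spans. A toy sketch of computing it with the evaluate library (the ids, texts, and probabilities below are made up; the actual run used the NASA SMD QA validation split):

```python
import evaluate

squad_v2 = evaluate.load("squad_v2")

# Each prediction carries a no_answer_probability; an empty prediction_text
# means the model abstained on that question.
predictions = [
    {"id": "q1", "prediction_text": "Science Mission Directorate", "no_answer_probability": 0.1},
    {"id": "q2", "prediction_text": "", "no_answer_probability": 0.9},
]
references = [
    {"id": "q1", "answers": {"text": ["Science Mission Directorate"], "answer_start": [7]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},  # unanswerable
]
print(squad_v2.compute(predictions=predictions, references=references))
```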

## Fine-tune 1 hyperparameters (NQ dataset, zero-shot starting point)

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5
release_model = "roberta-finetuned-nq"
gradient_checkpointing = True
```
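
A sketch of how this selective unfreezing and the two learning rates could be wired up (module names follow the Hugging Face RoBERTa implementation; this is a reconstruction under those assumptions, not the original training script):

```python
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

# Freeze everything, then unfreeze only the last encoder layer and the QA head.
for param in model.parameters():
    param.requires_grad = False
for param in model.roberta.encoder.layer[-1].parameters():
    param.requires_grad = True
for param in model.qa_outputs.parameters():
    param.requires_grad = True

# One parameter group per learning rate, matching the values listed above.
optimizer = torch.optim.AdamW([
    {"params": model.roberta.encoder.layer[-1].parameters(), "lr": 5e-6},
    {"params": model.qa_outputs.parameters(), "lr": 3e-5},
])

# Gradient checkpointing needs inputs that require grad when the embedding
# layers are frozen, hence enable_input_require_grads().
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
```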

## Fine-tune 2 hyperparameters (NASA SMD QA train split)

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 5
base_LM_model = "roberta-finetuned-nq"
max_seq_len = 384
doc_stride = 128
optimizer = AdamW
layer_learning_rate = 1e-6
qa_head_learning_rate = 1e-5
release_model = "roberta-finetuned-nq-nasa"
gradient_checkpointing = True
```
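
In the second stage the whole model is trainable, again with per-group learning rates. A similar reconstruction (the checkpoint name is the hypothetical local output of stage 1):

```python
import torch
from transformers import AutoModelForQuestionAnswering

# "roberta-finetuned-nq" here stands in for the stage-1 output checkpoint.
model = AutoModelForQuestionAnswering.from_pretrained("roberta-finetuned-nq")

# All parameters train in stage 2: a very small learning rate for the
# encoder and a slightly larger one for the QA head.
optimizer = torch.optim.AdamW([
    {"params": model.roberta.parameters(), "lr": 1e-6},
    {"params": model.qa_outputs.parameters(), "lr": 1e-5},
])

model.gradient_checkpointing_enable()
```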

## Why two fine-tuning stages

After fine-tune 1 the model was already strong at extracting answers, but it attempted an answer far more often than it abstained on unanswerable questions. Fine-tune 2 was needed to teach the model when to abstain.

## Performance

squad_v2 results on the NASA SMD QA validation split (50 questions: 30 answerable, 20 unanswerable):

"exact": 66.0,
"f1": 79.86357273703948,
"total": 50,
"HasAns_exact": 53.333333333333336,
"HasAns_f1": 76.43928789506579,
"HasAns_total": 30,
"NoAns_exact": 85.0,
"NoAns_f1": 85.0,
"NoAns_total": 20

## Acknowledgements