---
datasets:
- cjlovering/natural-questions-short
- nasa-impact/nasa-smd-qa-benchmark
language:
- en
metrics:
- squad_v2
base_model:
- deepset/roberta-base-squad2
pipeline_tag: question-answering
library_name: transformers
tags:
- astronomy
- space
license: cc-by-4.0
model-index:
- name: quantaRoche/roberta-finetuned-nq-nasa-qa
  results:
  - task:
      type: question-answering
    dataset:
      type: question-answering
      name: nasa-impact/nasa-smd-qa-benchmark
      split: validation
    metrics:
    - type: exact_match
      name: EM
      value: 66.0
    - type: f1
      name: f1
      value: 79.86357273703948
---

# roberta-finetuned-nq-nasa for Extractive QA

The base model is [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2). Its last encoder layer and QA head were first fine-tuned on the [Natural Questions (short) dataset](https://huggingface.co/datasets/cjlovering/natural-questions-short), and then the entire model was fine-tuned on the [NASA SMD QA benchmark training split](https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark).

**Task:** Question Answering
**Language:** English
**Local setup:** 1x RTX 4080 Super
**Evaluation metric:** squad_v2
**Evaluation dataset:** [NASA SMD QA benchmark validation split](https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark)

## Finetune (1) Hyperparameters (NQ dataset, zero-shot)

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 384
doc_stride = 128
optimizer = adamW
last_layer_learning_rate = 5e-6
qa_head_learning_rate = 3e-5
release_model = "roberta-finetuned-nq"
gradient_checkpointing = True
```

## Finetune (2) Hyperparameters (NASA SMD train)

```
train_batch_size = 16
val_batch_size = 8
n_epochs = 5
base_LM_model = "roberta-finetuned-nq"
max_seq_len = 384
doc_stride = 128
optimizer = adamW
layer_learning_rate = 1e-6
qa_head_learning_rate = 1e-5
release_model = "roberta-finetuned-nq-nasa"
gradient_checkpointing = True
```

## Why two finetunes

Finetune 1 already produced strong answering capability, but the model attempted an answer far more often than it abstained. Finetune 2 was therefore needed to teach the model when to abstain.

## Performance

```
"exact": 66.0,
"f1": 79.86357273703948,
"total": 50,
"HasAns_exact": 53.333333333333336,
"HasAns_f1": 76.43928789506579,
"HasAns_total": 30,
"NoAns_exact": 85.0,
"NoAns_f1": 85.0,
"NoAns_total": 20
```

## Acknowledgements

- [Hugging Face deepset/roberta-base-squad2 authors](https://huggingface.co/deepset/roberta-base-squad2)
- [Hugging Face cjlovering Natural Questions (short) dataset](https://huggingface.co/datasets/cjlovering/natural-questions-short)
- [Hugging Face NASA SMD dataset authors](https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark)
- [Hugging Face Trainer documentation](https://huggingface.co/docs/transformers/en/main_classes/trainer)
- Developed as part of the Westcliff MSCS AIT500 course, 2025 Fall 1 session 2. We thank our professor Desmond for his continuous guidance.
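
## Usage (sketch)

A minimal usage sketch with the `transformers` question-answering pipeline. The repository id is taken from the model-index above; the question/context pair is illustrative, not from the benchmark.

```python
from transformers import pipeline

# Load the fine-tuned extractive QA model (repo id from the model-index above).
qa = pipeline(
    "question-answering",
    model="quantaRoche/roberta-finetuned-nq-nasa-qa",
)

# Illustrative question/context pair; handle_impossible_answer lets the model
# abstain with a SQuAD v2-style null answer, matching the "when to abstain" goal above.
result = qa(
    question="Which instrument aboard Aqua measures sea surface temperature?",
    context="The MODIS instrument aboard the Aqua satellite measures sea surface temperature.",
    handle_impossible_answer=True,
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```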
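
## Differential learning-rate setup (sketch)

The hyperparameter blocks above list separate learning rates for the last encoder layer and the QA head. The sketch below shows one way such parameter groups can be built with AdamW, using the finetune (1) rates; the exact grouping and training loop used for this model are an assumption, not the released training code.

```python
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
model.gradient_checkpointing_enable()  # matches gradient_checkpointing = True above

# Finetune (1) setup (assumed): freeze everything, then unfreeze only the
# last encoder layer and the QA head.
for param in model.parameters():
    param.requires_grad = False
for param in model.roberta.encoder.layer[-1].parameters():
    param.requires_grad = True
for param in model.qa_outputs.parameters():
    param.requires_grad = True

# Separate learning rates, as listed in the finetune (1) block above.
optimizer = torch.optim.AdamW(
    [
        {"params": model.roberta.encoder.layer[-1].parameters(), "lr": 5e-6},
        {"params": model.qa_outputs.parameters(), "lr": 3e-5},
    ]
)
```

For finetune (2), all parameters would instead stay trainable, with the lower rates (1e-6 / 1e-5) from the second block above.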
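
## Evaluation with squad_v2 (sketch)

The performance numbers above are SQuAD v2-style EM/F1. A sketch of how they can be computed with the `evaluate` library; the prediction/reference dicts below are illustrative placeholders, not the actual benchmark outputs.

```python
import evaluate

squad_v2 = evaluate.load("squad_v2")

# Illustrative predictions/references in SQuAD v2 format; no_answer_probability
# lets the metric score abstentions (the HasAns/NoAns splits reported above).
predictions = [
    {"id": "q1", "prediction_text": "MODIS", "no_answer_probability": 0.0},
    {"id": "q2", "prediction_text": "", "no_answer_probability": 1.0},
]
references = [
    {"id": "q1", "answers": {"text": ["MODIS"], "answer_start": [4]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},
]

results = squad_v2.compute(predictions=predictions, references=references)
print(results)  # includes "exact", "f1", "HasAns_*" and "NoAns_*" keys as reported above
```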