Hello, when I fine-tune my RAG-based model on my 4x V100 box, I'm running into OOM issues on the GPUs. I can only fit the examples into GPU memory with a batch size of 1 for both train and eval. Each GPU has about 16 GB of memory, and a batch size of 1 uses between 11 and 15 GB depending on the other parameters I'm using. This could just be the nature of the model, but I want to make sure I'm not doing something wrong that is blowing up the memory. My knowledge dataset is much smaller than the default indexes. I am using the fine-tuning script at examples/research_projects/rag/finetune_rag.sh. Thank you for your help.
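For reference, one way to check where the memory goes would be something like the sketch below, using PyTorch's built-in per-device memory stats (the helper name `log_gpu_memory` is hypothetical, not part of the RAG scripts):

```python
import torch

def log_gpu_memory(tag: str = "") -> None:
    # Print allocated / reserved / peak memory per GPU so you can see
    # which part (static weights vs. per-step activations) dominates.
    gib = 1024 ** 3
    for i in range(torch.cuda.device_count()):
        print(
            f"{tag} GPU {i}: "
            f"allocated={torch.cuda.memory_allocated(i) / gib:.2f} GiB, "
            f"reserved={torch.cuda.memory_reserved(i) / gib:.2f} GiB, "
            f"peak={torch.cuda.max_memory_allocated(i) / gib:.2f} GiB"
        )

# Call once after loading the model (static weights) and once after the
# first training step (weights + activations + optimizer state) to see
# which component accounts for most of the 11-15 GB.
```

If most of the memory only appears after the first training step, the usual levers would be gradient accumulation or mixed precision rather than anything RAG-specific.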