Hello, all!
My computer doesn't have an internet connection, so I have to download the dataset on another computer first and then copy it to my offline computer.
I use the following code snippet to download the wikitext-2-raw-v1 dataset.
from datasets import load_dataset
datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
I found that some cached files end up in subdirectories of ~/.cache/huggingface/.
In the ~/.cache/huggingface/modules/datasets_modules/datasets/wikitext/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 dir I can see:
__init__.py, __pycache__, dataset_infos.json, wikitext.json, wikitext.py
In the ~/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 dir I can see:
LICENSE dataset_info.json wikitext-test.arrow wikitext-train.arrow wikitext-validation.arrow
Do I have to copy all of those files to the offline computer? Can I rename the a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 directory to something else?
Or, alternatively, how can I convert those arrow files to CSV files?
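For the CSV route, this is roughly what I have in mind. It is only a sketch: the helper names export_splits_to_csv and download_and_export and the output directory wikitext2_csv are my own placeholders, and I am assuming that each split returned by load_dataset has a to_csv(path) method.

```python
from pathlib import Path


def export_splits_to_csv(splits, out_dir):
    """Write each split to <out_dir>/<split_name>.csv and return the paths.

    `splits` is assumed to behave like a datasets.DatasetDict: a mapping
    from split name to an object that has a `to_csv(path)` method.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, split in splits.items():
        csv_path = out_dir / f"{name}.csv"
        split.to_csv(str(csv_path))
        written[name] = csv_path
    return written


def download_and_export(out_dir="wikitext2_csv"):
    # Meant to be run on the computer that has internet access.
    # Imported lazily so the helper above works without the library.
    from datasets import load_dataset

    splits = load_dataset("wikitext", "wikitext-2-raw-v1")
    return export_splits_to_csv(splits, out_dir)
```

I would then copy the wikitext2_csv directory to the offline computer and, if I understand the docs correctly, load the files there with load_dataset("csv", data_files={...}). I have not verified that the raw text survives the CSV round trip unchanged.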