Add Link
https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
Describe the bug
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 9912422/9912422 [00:03<00:00, 3078874.05it/s]
5%|█████▎ | 491520/9912422 [00:01<00:22, 417952.41it/s]Traceback (most recent call last):
  File "fsdp_mnist.py", line 173, in <module>
    mp.spawn(fsdp_main,
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/ssd1/gaotianlin/baidu/hac-aiacc/Megatron/old_scripts/fsdp/fsdp_mnist.py", line 94, in fsdp_main
    dataset1 = datasets.MNIST('./data', train=True, download=True,
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torchvision/datasets/mnist.py", line 99, in __init__
    self.download()
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torchvision/datasets/mnist.py", line 187, in download
    download_and_extract_archive(url, download_root=self.raw_folder, filename=filename, md5=md5)
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 434, in download_and_extract_archive
    download_url(url, download_root, filename, md5)
  File "/root/miniconda3/envs/old_mega/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 155, in download_url
    raise RuntimeError("File not found or corrupted.")
RuntimeError: File not found or corrupted.
/root/miniconda3/envs/old_mega/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
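A possible reading of the log above, not confirmed: every process started by mp.spawn calls datasets.MNIST('./data', train=True, download=True), so several ranks write the same file under ./data/MNIST/raw at once. One progress bar is at 100% while another is at 5%, which would explain a checksum failure on a half-written file. If that is the cause, a workaround might be to fetch the data once in the parent process before spawning. The sketch below shows that pattern with the standard library only; prepare_dataset_once and the .download_complete marker are hypothetical names, not part of torchvision or the tutorial.

```python
from pathlib import Path


def prepare_dataset_once(root, fetch):
    """Run fetch(root) at most once per root, guarded by a marker file.

    Meant to be called in the parent process before mp.spawn, so worker
    ranks never download into the same directory concurrently.
    Returns True if fetch ran, False if the marker was already present.
    """
    root = Path(root)
    marker = root / ".download_complete"  # hypothetical marker name
    if marker.exists():
        return False  # data was prepared by an earlier call
    root.mkdir(parents=True, exist_ok=True)
    fetch(str(root))  # raises on failure, so the marker is not written
    marker.touch()    # recorded only after fetch returned normally
    return True


# Assumed usage in fsdp_mnist.py (untested against the tutorial script):
#
#   if __name__ == '__main__':
#       prepare_dataset_once(
#           './data',
#           lambda r: datasets.MNIST(r, train=True, download=True))
#       mp.spawn(fsdp_main, ...)
#
# with fsdp_main then constructing datasets.MNIST(..., download=False).
```

This helper is not safe if called from several processes at the same time; it relies on being run once before any workers exist.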
Describe your environment
...