Description
Hi, thank you for sharing your code.
I'm trying to follow your instructions, but when I run the discovery code it fails to load the pretrained model.
My environment:
- Ubuntu 16.04 LTS
- 2x NVIDIA RTX 3090
- Python 3.8, CUDA 11.0, PyTorch 1.7.1, torchvision 0.8.2
- the same versions of pytorch-lightning and lightning-bolts as the repo
The error is:
Traceback (most recent call last):
  File "main_discover.py", line 280, in <module>
    main(args)
  File "main_discover.py", line 266, in main
    model = Discoverer(**args.__dict__)
  File "main_discover.py", line 70, in __init__
    state_dict = torch.load(self.hparams.pretrained, map_location=self.device)
  File "/home/dircon/anaconda3/envs/uno/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/dircon/anaconda3/envs/uno/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
  File "/home/dircon/anaconda3/envs/uno/lib/python3.8/site-packages/torch/serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/home/dircon/anaconda3/envs/uno/lib/python3.8/site-packages/torch/serialization.py", line 833, in load_tensor
    storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/94820505364016: invalid header or archive is corrupted
I believe this is caused by distributed data parallel (DDP) training: with two GPUs, it looks like more than one process wrote the checkpoint file. How can I prevent multiple GPUs from saving the model?
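
For reference, here is a minimal sketch of one common way to avoid this kind of corruption, assuming the cause really is several DDP processes writing the same file. It is not the repo's actual saving code. The helper names `get_rank` and `save_on_rank_zero` are hypothetical, and the sketch assumes the launcher exports `RANK` or `LOCAL_RANK` for each process. Only rank 0 writes, and it writes to a temporary file first so a reader never sees a half-written checkpoint.

```python
import os
import tempfile


def get_rank() -> int:
    """Return this process's rank (assumption: the launcher sets RANK or LOCAL_RANK)."""
    for var in ("RANK", "LOCAL_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    return 0  # no launcher variables set: treat as a single-process run


def save_on_rank_zero(save_fn, path):
    """Call save_fn(tmp_path) on rank 0 only, then move the result to path.

    Returns True if this process wrote the file, False otherwise.
    """
    if get_rank() != 0:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        save_fn(tmp_path)
        os.replace(tmp_path, path)  # atomic replacement of the target file
    finally:
        # Clean up the temporary file if save_fn failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True


# Hypothetical usage with PyTorch (not executed here):
# save_on_rank_zero(lambda p: torch.save(model.state_dict(), p), "pretrained.pth")
```

If other ranks need to load the file right after it is saved, they would also have to wait for rank 0 to finish, for example with `torch.distributed.barrier()`. A checkpoint that is already corrupted cannot be repaired this way and has to be generated again.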