RuntimeError: Trying to resize storage that is not resizable

Hi. I am using datasets of 2-D (x, y) coordinates, where each instance has length n. For example, a batch from the length-20 dataset has shape (batch_size, 2, length). I have 5 datasets, with lengths 10, 20, 30, 50, and 100.
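To make the shapes concrete, a minimal sketch of such a dataset could look like this (random placeholder data, not my real loading code):

import torch
from torch.utils.data import Dataset, DataLoader

class CoordDataset(Dataset):
    """Toy dataset: each item is a (2, length) tensor of (x, y) coordinates."""
    def __init__(self, num_instances, length):
        self.data = torch.randn(num_instances, 2, length)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]  # shape (2, length)

# default_collate stacks the (2, length) items into (batch_size, 2, length)
loader = DataLoader(CoordDataset(100, 20), batch_size=8)
print(next(iter(loader)).shape)  # torch.Size([8, 2, 20])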

Training on the length-10, length-20, and length-30 datasets works fine, but when I try to train on the length-50 and length-100 datasets, I get the following error during sanity checking.

Traceback (most recent call last):
File "/home/gailab/ms/netsp/train.py", line 81, in <module>
main(args)
File "/home/gailab/ms/netsp/train.py", line 58, in main
trainer.fit(model, dm, ckpt_path=ckpt_path)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
results = self._run_stage()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
return self._run_train()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1276, in _run_train
self._run_sanity_check()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_sanity_check
val_loop.run()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 127, in advance
batch = next(data_fetcher)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 185, in __next__
return self.fetching_function()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 264, in fetching_function
self._fetch_next_batch(self.dataloader_iter)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 278, in _fetch_next_batch
batch = next(iterator)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/_utils.py", line 457, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 137, in default_collate
out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable

At first, I thought it was a memory issue, since training on the shorter datasets didn't cause this error, so I reduced my batch size, but the result is the same.

I also tried changing num_workers for the dataloaders. I was using num_workers=40 and reduced it to 20, 10, and 1, but I still got the same error. When I set num_workers=0, I got a different error.

Traceback (most recent call last):
File "/home/gailab/ms/netsp/train.py", line 81, in <module>
main(args)
File "/home/gailab/ms/netsp/train.py", line 58, in main
trainer.fit(model, dm, ckpt_path=ckpt_path)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
results = self._run_stage()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
return self._run_train()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1276, in _run_train
self._run_sanity_check()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_sanity_check
val_loop.run()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 127, in advance
batch = next(data_fetcher)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 185, in __next__
return self.fetching_function()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 264, in fetching_function
self._fetch_next_batch(self.dataloader_iter)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 278, in _fetch_next_batch
batch = next(iterator)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 138, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [200000, 2, 50] at entry 0 and [200000, 50] at entry 1

I have no idea what this error is about. Could someone please help me out? Any help would be much appreciated.

I encountered the same error as your first one and worked around it by setting num_workers to 0.
For your second error: the dataloader's default collate function calls torch.stack(), which takes a list of tensors and stacks them into a single batch tensor, so every item in the batch must have the same shape. The problem with your code is that the __getitem__() you defined returned a tensor of shape [200000, 2, 50] the first time it was called and a tensor of shape [200000, 50] the second time.
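You can see the size requirement with a tiny standalone example (shapes shrunk down here, but analogous to the ones in your error message):

import torch

# every tensor in a batch must have the same shape for stacking to work
good = [torch.zeros(4, 2, 50), torch.zeros(4, 2, 50)]
bad = [torch.zeros(4, 2, 50), torch.zeros(4, 50)]

print(torch.stack(good, 0).shape)  # torch.Size([2, 4, 2, 50])
torch.stack(bad, 0)  # RuntimeError: stack expects each tensor to be equal size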


I just got the same error as your second one XD.
You can check your dataset by printing the tensor shape and the filename each time __getitem__() is called. I just found out that some images in tiny-imagenet are 1-channel while most of them are 3-channel, and that was the root cause. A sketch of what I mean is below.
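Something like this is what I mean, assuming your dataset loads images from a list of file paths (the names here are just placeholders, adapt them to your own dataset class):

from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class DebugImageDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = file_paths
        self.to_tensor = T.ToTensor()

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        path = self.file_paths[idx]
        img = self.to_tensor(Image.open(path))
        # print the filename and shape so mismatched items stand out
        print(path, tuple(img.shape))  # e.g. (1, 64, 64) vs (3, 64, 64)
        return img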

Can you please explain how I can see which images have one channel and which have three? And, more importantly (I think I already have an idea for the first part), how can I convert them so that all my images have the same number of channels? I have been searching for a long time now. I am using a different dataset than the one in this topic, but I don't think that's the problem. What can I do?
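Would something along these lines be the right idea? This is just my guess, assuming the images are loaded with PIL (file_paths here is a placeholder for my list of image paths):

from PIL import Image

for path in file_paths:  # placeholder: list of image file paths
    img = Image.open(path)
    if img.mode != "RGB":  # e.g. "L" is a 1-channel grayscale image
        print(path, img.mode)  # shows which files don't have 3 channels
        img = img.convert("RGB")  # replicates the gray channel into 3 channels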