I’m trying to use the 4 GPUs on my machine to train a Hugging Face model for a project. Single-GPU training with 32-bit precision works without any problems (16-bit is not working; I’ve asked a separate question about that here). Multi-GPU training with 32-bit precision just hangs. I’m running this in a Jupyter notebook, and I saw this error in the terminal:
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-7c85b1e2.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
This seems to be a PyTorch error that is discussed here, and a couple of solutions have been proposed, one of which is to import numpy before torch.multiprocessing. I am already importing numpy first (and don’t actually import torch.multiprocessing myself), but I’m not sure how PL handles this internally.
I’m using PyTorch 1.7.1 and PL 1.1.2. Has anyone else run into this problem? Is there a solution?
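One workaround that is often suggested for this mkl-service error is to pick the threading layer explicitly via an environment variable before any MKL-backed library loads. This is a minimal sketch, assuming the variable is set at the very top of the notebook, before numpy or torch are imported (the `"GNU"` value matches the libgomp OpenMP runtime the error message complains about):

```python
import os

# Must run before any import that loads MKL (numpy, torch, ...).
# "GNU" selects the libgomp threading layer, avoiding the clash with
# the Intel threading layer reported in the error message.
os.environ["MKL_THREADING_LAYER"] = "GNU"

# Only after setting the variable:
# import numpy as np
# import torch
```

Note that this has to execute in the very first cell; if numpy or torch were already imported in the kernel, the threading layer is already chosen and the kernel needs a restart.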
I set MKL_SERVICE_FORCE_INTEL=1, but still got the following error:
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-7c85b1e2.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/miniconda3/envs/eml/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/home/miniconda3/envs/eml/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'EmailAuthorSequenceClf' on <module '__main__' (built-in)>
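This second error is a separate Jupyter issue: with the spawn start method, each worker process re-imports `__main__` and unpickles the model there, but in a notebook kernel `__main__` is the built-in interactive module, so a class defined in a notebook cell (here `EmailAuthorSequenceClf`, from the traceback) can’t be found. The usual fix is to move the class into a regular `.py` file and import it in the notebook. A minimal sketch, using a hypothetical module name `email_clf.py` and a stand-in class body to show the pickling round-trip succeeding:

```python
import importlib
import pathlib
import pickle
import sys

# Hypothetical module file; in practice you would save this as email_clf.py
# next to the notebook rather than writing it from code.
pathlib.Path("email_clf.py").write_text(
    "class EmailAuthorSequenceClf:\n"
    "    '''Stand-in for the model class defined in the notebook.'''\n"
    "    pass\n"
)

sys.path.insert(0, ".")
email_clf = importlib.import_module("email_clf")

# Because the class now lives in an importable module (not __main__),
# spawned worker processes can locate it when unpickling.
clf = email_clf.EmailAuthorSequenceClf()
roundtrip = pickle.loads(pickle.dumps(clf))
print(type(roundtrip).__module__)  # email_clf, not __main__
```

With the real class moved out, the notebook would just do `from email_clf import EmailAuthorSequenceClf` and pass the instance to the trainer as before.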