I’m trying to use the 4 GPUs on my machine to train a Hugging Face model for a project. Single-GPU training with 32-bit precision works without any problems (16-bit is not working, and I’ve asked a separate question about that here). Multi-GPU training with 32-bit precision just hangs. I’m running this in a Jupyter notebook, and I saw this error on the terminal:
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-7c85b1e2.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
This seems to be a PyTorch error that is discussed here, and a couple of solutions are proposed, one of which is to import torch.multiprocessing. I am importing numpy first (and don’t actually import torch.multiprocessing myself), but I’m not sure how PL handles this internally.
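For reference, the workaround I tried follows the error message’s own advice. This is only a sketch of what I put at the very top of the notebook, before any other imports (the choice of `GNU` as the threading layer is my assumption; the message also mentions `MKL_SERVICE_FORCE_INTEL` as an alternative):

```python
import os

# Assumed workaround: pick the MKL threading layer explicitly
# before numpy/torch (and their MKL bindings) are imported.
os.environ["MKL_THREADING_LAYER"] = "GNU"

# Alternative suggested by the error message itself:
# keep the Intel layer but force it past the libgomp conflict.
# os.environ["MKL_SERVICE_FORCE_INTEL"] = "1"
```

Both variants only take effect if they run before the first `import numpy` / `import torch` in the process, which is why I’m unsure whether PL’s own imports defeat this in the multi-GPU case.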
I’m using PyTorch version 1.7.1 and PL version 1.1.2. Has anyone else run into this problem? Is there a solution to it?