Hello, I’m trying to use the 4 GPUs on my machine to train a huggingface model for a project. Single GPU with 32 bit precision works without any problems (16 bit is not working and I’ve asked a question about it here ). Multi GPU with 32 bit precision just hangs. I’m running this on a Jupyter NB and…

Multigpu training just hangs

sudarshan85 January 8, 2021, 4:29am 7

This is issue is tied to the other issue based on my silly mistake! It got solved. FYI, the silly mistake: I copied my Lightning module that I created for a similar project over. When I copied the training_step, by mistake I didn’t include return loss. So basically, training was being done with no loss.

show post in topic

Home
Categories
FAQ/Guidelines
Terms of Service
Privacy Policy

Powered by Discourse, best viewed with JavaScript enabled