Easily skipping optimizers for modular networks

My network is composed of several modules, only some of which get called each iteration. When I run my code, I get an error, probably because there aren’t gradients flowing to each module every time:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Is there any easy way to just skip over the optimizers for modules that aren’t being used?
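One common pattern (sketched here in plain PyTorch; in Lightning you would do the same thing under manual optimization with `self.automatic_optimization = False`) is to give each module its own optimizer and only step the one whose module actually produced the loss. The two-branch model and the alternating routing decision below are hypothetical, just to illustrate the idea:

```python
import torch
import torch.nn as nn

# Hypothetical model with two branches; only one runs per iteration.
branch_a = nn.Linear(4, 1)
branch_b = nn.Linear(4, 1)

opt_a = torch.optim.SGD(branch_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(branch_b.parameters(), lr=0.1)

x = torch.randn(8, 4)

for step in range(4):
    use_a = step % 2 == 0  # stand-in for whatever routing logic you use
    module, opt = (branch_a, opt_a) if use_a else (branch_b, opt_b)

    opt.zero_grad()
    loss = module(x).pow(2).mean()
    loss.backward()
    opt.step()  # only the optimizer of the module in this iteration's graph steps
```

In Lightning, `configure_optimizers` would return both optimizers, and `training_step` would pick the right one via `self.optimizers()` and call `self.manual_backward(loss)` before stepping it.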

Are you explicitly setting something to .eval() in your LightningModule definition?

No, I’m not setting anything to .eval() during training.


Could you post your LightningModule definition here?

The optimizer will not skip any parameters, but it doesn’t need to: if a layer has requires_grad set to False, or isn’t used in the forward pass, its gradient will be None (or zero), so optimizer.step() won’t change that layer’s weights.
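A minimal check of that claim: a layer that never appears in the forward pass gets no gradient, and optimizer.step() leaves its weights untouched (standard torch.optim optimizers skip parameters whose .grad is None):

```python
import torch
import torch.nn as nn

used = nn.Linear(4, 4)
unused = nn.Linear(4, 4)  # never called in this forward pass

# One optimizer over everything, used and unused alike
opt = torch.optim.SGD(list(used.parameters()) + list(unused.parameters()), lr=0.1)

before = unused.weight.clone()
loss = used(torch.randn(2, 4)).sum()
loss.backward()
opt.step()

print(unused.weight.grad)                   # None — no gradient flowed here
print(torch.equal(before, unused.weight))   # True — weights unchanged
```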

The error you are getting is likely because either you are using .detach() somewhere, which breaks the backward flow, or all of your model parameters have requires_grad=False.
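You can reproduce the exact error with a few lines: if every parameter feeding the loss is frozen (or the graph is cut with .detach()), the loss tensor has no grad_fn and backward() raises it:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)  # freeze everything; a .detach() in forward has the same effect

loss = model(torch.randn(2, 4)).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn
```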