I want to train two subnetworks, each with its own optimizer and learning-rate scheduler.
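In plain PyTorch, the kind of joint step I have in mind looks roughly like this. It is a minimal sketch: `net_a`, `net_b`, the chained architecture, the optimizer choices, and the scheduler settings are all hypothetical placeholders. One loss is backpropagated once, and both optimizers step on the resulting gradients:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical subnetworks, chained so that one loss depends on both.
net_a = nn.Linear(4, 8)
net_b = nn.Linear(8, 1)

# One optimizer and one scheduler per subnetwork (settings are illustrative).
opt_a = torch.optim.SGD(net_a.parameters(), lr=0.1)
opt_b = torch.optim.Adam(net_b.parameters(), lr=1e-3)
sched_a = torch.optim.lr_scheduler.StepLR(opt_a, step_size=10, gamma=0.5)
sched_b = torch.optim.lr_scheduler.ExponentialLR(opt_b, gamma=0.9)

# Toy data standing in for a real batch.
x = torch.randn(16, 4)
y = torch.randn(16, 1)


def joint_step(x, y):
    """One training step that updates both subnetworks from a single loss."""
    opt_a.zero_grad()
    opt_b.zero_grad()
    loss = nn.functional.mse_loss(net_b(net_a(x)), y)
    loss.backward()  # fills .grad for the parameters of BOTH subnetworks
    opt_a.step()
    opt_b.step()
    return loss.item()


for epoch in range(3):
    joint_step(x, y)
    # Each scheduler is stepped after its optimizer, once per epoch here.
    sched_a.step()
    sched_b.step()
```

The point is that a single `backward()` call produces gradients for every parameter involved in the loss, so both optimizers can step in the same iteration instead of alternating.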
However, the documentation states: "If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step."
As a result, I can't update both networks jointly from the same loss in a single training step. Is there any way to solve this problem? Thanks!