Gradient Manipulation for Multitasking
I would like to implement [1812.02224] Adapting Auxiliary Losses Using Gradient Similarity. I have a main task loss and several auxiliary losses, and I want an auxiliary loss to contribute to the optimization step only if its gradient has a cosine similarity greater than zero with the gradient of the main task loss. Currently I compute the gradients with respect to each loss using torch.autograd.grad, compute the cosine similarities, and add only the selected auxiliary losses to the total loss. However, since I don't know how to pass the already-computed gradients to the optimizer, I end up running the backward pass twice every step. I would like to learn how to implement this efficiently.
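For reference, here is a minimal sketch of my current approach with one auxiliary loss (the model, heads, and data here are hypothetical placeholders, not my real setup):

```python
import torch

torch.manual_seed(0)

# Toy setup: a shared trunk with a main head and one auxiliary head.
shared = torch.nn.Linear(4, 8)
main_head = torch.nn.Linear(8, 1)
aux_head = torch.nn.Linear(8, 1)

shared_params = list(shared.parameters())
opt = torch.optim.SGD(
    list(shared.parameters())
    + list(main_head.parameters())
    + list(aux_head.parameters()),
    lr=0.1,
)

x = torch.randn(16, 4)
y_main = torch.randn(16, 1)
y_aux = torch.randn(16, 1)

h = shared(x)
main_loss = torch.nn.functional.mse_loss(main_head(h), y_main)
aux_loss = torch.nn.functional.mse_loss(aux_head(h), y_aux)

# Gradients of each loss w.r.t. the shared parameters only.
# retain_graph=True keeps the graph alive for the later backward().
g_main = torch.autograd.grad(main_loss, shared_params, retain_graph=True)
g_aux = torch.autograd.grad(aux_loss, shared_params, retain_graph=True)

# Cosine similarity between the flattened gradient vectors.
flat_main = torch.cat([g.reshape(-1) for g in g_main])
flat_aux = torch.cat([g.reshape(-1) for g in g_aux])
cos = torch.nn.functional.cosine_similarity(flat_main, flat_aux, dim=0)

# Keep the auxiliary loss only when its gradient points in a similar direction.
total = main_loss + aux_loss if cos > 0 else main_loss

# This is the part I would like to avoid: the backward pass effectively
# runs twice per step (once via autograd.grad, once here).
opt.zero_grad()
total.backward()
opt.step()
```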

Thanks a lot.