Error resuming from checkpoint with multiple GPUs

I started training a model on two GPUs, using the following trainer:

    trainer = pl.Trainer(
        devices=[0, 2], accelerator='gpu', precision=16, max_epochs=2000,
        callbacks=checkpoint_callback, logger=pl.loggers.TensorBoardLogger('logs/'),
        gradient_clip_val=5.0, gradient_clip_algorithm='norm')

The checkpoint callback is configured to keep checkpoints for the three best epochs (based on validation loss) as well as the last epoch:

    checkpoint_callback = ModelCheckpoint(
        monitor="val_loss",
        save_top_k=3,
        mode="min",
        save_last=True,
    )

Training halted unexpectedly, and I now want to resume it. I tried to do that by configuring the trainer as follows:

    trainer = pl.Trainer(
        devices=[2, 0], accelerator='gpu', precision=16, max_epochs=2000,
        callbacks=checkpoint_callback, logger=pl.loggers.TensorBoardLogger('logs/'),
        gradient_clip_val=5.0, gradient_clip_algorithm='norm',
        resume_from_checkpoint="path/to/checkpoint.ckpt")
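
For completeness, this is roughly how I then kick off training again ("model" and "data_module" here are just placeholders for my actual LightningModule and LightningDataModule objects):

    # `model` / `data_module` stand in for my actual module and data module;
    # the checkpoint itself is picked up via `resume_from_checkpoint` above.
    trainer.fit(model, datamodule=data_module)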

However, after initializing the two distributed processes and completing the validation sanity check, it crashes at the start of the first training step of the resumed run, with a long error stack that ends in:

    File "/home/username/miniconda3/lib/python3.8/site-packages/torch/optim/_functional.py", line 86, in adam
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!

So it seems that the tensors are not all being placed correctly across the two GPUs, and I suspect this has to do with how the checkpoint is loaded. Am I doing something wrong here? Is resuming a multi-GPU run from a checkpoint even possible, and if so, how do I do it correctly?

(If I try to resume with a trainer that’s set to use just one GPU, there’s no problem.)
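
In case it is relevant, this is how I would check which devices the tensors in the saved optimizer state actually live on. I am assuming the standard Lightning checkpoint layout with an "optimizer_states" entry, and the path is the same placeholder as above, so treat this as a sketch:

    import torch

    # Load without map_location so each tensor is restored to the device it was
    # saved from (this requires the GPUs to be visible).
    ckpt = torch.load("path/to/checkpoint.ckpt")

    # Each entry of "optimizer_states" is a regular torch.optim state dict;
    # print the device of every tensor it contains (exp_avg, exp_avg_sq, ...).
    for opt_state in ckpt.get("optimizer_states", []):
        for param_id, state in opt_state.get("state", {}).items():
            for name, value in state.items():
                if torch.is_tensor(value):
                    print(param_id, name, value.device)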