Code stops executing after 1 epoch

I’m trying to replicate the MNIST example from here

However, the code runs for one epoch and then hangs as soon as the second epoch starts. It doesn’t raise an error, but it doesn’t make any further progress either.

Please find the Colab notebook here.

I really need to be able to use a TPU for my training. Please let me know what I’m doing wrong.

Thanks in advance.

I ran your code with just one small modification:

trainer = pl.Trainer(tpu_cores=8, max_epochs=5)

Epoch 2 started in my case. This might not be the problem you are facing, but can you please check?

It gets stuck after the first step of the second epoch.

Can you share a screenshot of what you saw?