Why is `precision=16` almost useless for speeding up training in my case?

The relevant part of my code is:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, ModelCheckpoint, RichProgressBar

checkpoint_callback = ModelCheckpoint(
    save_weights_only=False, mode="min", monitor="val_loss",
    dirpath="outputs", save_last=False, save_top_k=1,
)
trainer = pl.Trainer(
    gpus=1,
    strategy="dp",
    max_epochs=10,
    auto_lr_find=True,
    precision=16,
    callbacks=[
        checkpoint_callback,
        LearningRateMonitor("epoch"),
        RichProgressBar(),
    ],
    log_every_n_steps=10,
)
trainer.tune(model, train_loader, val_loader)
trainer.fit(model, train_loader, val_loader, ckpt_path=None)

After ten epochs, precision=32 takes 5m 33s of real time while precision=16 takes 5m 55s. They are almost the same, and half precision is even a bit slower.
Package versions: pytorch-lightning 1.5.5, torch 1.10.0.
The GPU is a GeForce GTX 1080 Ti with CUDA 11.1. GPU memory usage is 1167 MiB for precision=32 and 1149 MiB for precision=16, so memory is also about the same.
Has anybody else run into a similar problem?

Hey,

This is expected for the 1080 Ti. The big speedup from `precision=16` comes from Tensor Cores, and those only arrived with the Volta/Turing generations; Pascal cards like the 1080 Ti don't have them. These GPUs can still run FP16 operations in general, but they may be slower and more memory hungry than FP32. Mixed precision really only started to pay off with the 20XX series and newer.
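
If you want to verify this on your own machine, here is a minimal sketch (plain `torch`, not part of Lightning) that queries the device's compute capability; Tensor Cores require compute capability 7.0 or higher, while the 1080 Ti reports 6.1:

```python
import torch

# Tensor Cores (fast FP16 matmuls) require compute capability >= 7.0
# (Volta/Turing and newer). Pascal cards such as the 1080 Ti report 6.1.
name = torch.cuda.get_device_name(0)
major, minor = torch.cuda.get_device_capability(0)
print(f"{name}: compute capability {major}.{minor}")
if (major, minor) >= (7, 0):
    print("Tensor Cores present -> precision=16 should give a real speedup.")
else:
    print("No Tensor Cores -> precision=16 is unlikely to be faster here.")
```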

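You can also benchmark the raw matmul throughput in both precisions, independent of Lightning; on a card without Tensor Cores the FP16 timing will be close to (or worse than) FP32. A rough sketch, assuming a CUDA device is available:

```python
import time
import torch

def bench(dtype, n=4096, iters=50):
    # Time repeated square matmuls on the GPU in the given dtype.
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        _ = a @ b
    torch.cuda.synchronize()
    return time.time() - start

print(f"fp32: {bench(torch.float32):.2f}s, fp16: {bench(torch.float16):.2f}s")
```
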
Best,
Justus