On the MNIST autoencoder example from here, and more importantly my code, when I set profiler="pytorch" I only get statistics for whatever “records” is. I don’t get stats for training_step_and_backward, training_step, backward, validation_step, test_step, and predict_step like the documentation says I’m supposed to. Is there something else I need to do to profile my training? I’m on torch 1.9.0+cu111, torchvision 0.10.0+cu111, and pytorch-lightning 1.4.1.
Here’s the console output:
python testmnistautoencoder.py
/home/enolan/mystuff/code/clip-gen/venv/lib/python3.9/site-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------------
0 | encoder | Sequential | 50.4 K
1 | decoder | Sequential | 51.2 K
---------------------------------------
101 K Trainable params
0 Non-trainable params
101 K Total params
0.407 Total estimated model params size (MB)
/home/enolan/mystuff/code/clip-gen/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 24: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 235/235 [00:06<00:00, 35.37it/s, loss=0.0376, v_num=15]
FIT Profiler Report
Profile stats for: records
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
ProfilerStep* 4.54% 4.265ms 98.37% 92.353ms 30.784ms 0.000us 0.00% 517.000us 172.333us 3
enumerate(DataLoader)#_SingleProcessDataLoaderIter._... 59.39% 55.760ms 82.97% 77.897ms 25.966ms 0.000us 0.00% 0.000us 0.000us 3
aten::to 4.20% 3.939ms 11.01% 10.336ms 6.464us 0.000us 0.00% 193.000us 0.121us 1599
optimizer_step_and_closure_0 0.18% 173.000us 9.07% 8.516ms 2.839ms 0.000us 0.00% 324.000us 108.000us 3
Optimizer.step#Adam.step 1.07% 1.000ms 8.87% 8.330ms 2.777ms 0.000us 0.00% 324.000us 108.000us 3
aten::div 3.94% 3.699ms 6.66% 6.253ms 7.836us 16.000us 2.01% 16.000us 0.020us 798
training_step_and_backward 0.55% 516.000us 5.60% 5.257ms 1.752ms 0.000us 0.00% 165.000us 55.000us 3
aten::copy_ 3.64% 3.421ms 4.83% 4.533ms 2.889us 195.000us 24.47% 195.000us 0.124us 1569
aten::select 2.80% 2.633ms 3.19% 2.995ms 1.942us 0.000us 0.00% 0.000us 0.000us 1542
backward 2.57% 2.417ms 2.67% 2.509ms 836.333us 0.000us 0.00% 1.000us 0.333us 3
aten::empty_strided 2.09% 1.965ms 2.09% 1.965ms 1.255us 0.000us 0.00% 0.000us 0.000us 1566
aten::permute 1.52% 1.431ms 2.06% 1.932ms 2.516us 0.000us 0.00% 0.000us 0.000us 768
aten::view 2.03% 1.904ms 2.03% 1.904ms 2.432us 0.000us 0.00% 0.000us 0.000us 783
aten::stack 0.47% 445.000us 1.85% 1.734ms 578.000us 0.000us 0.00% 0.000us 0.000us 3
training_step 0.62% 585.000us 1.75% 1.639ms 546.333us 0.000us 0.00% 152.000us 50.667us 3
cudaLaunchKernel 1.65% 1.550ms 1.65% 1.550ms 4.572us 0.000us 0.00% 0.000us 0.000us 339
aten::as_strided 0.98% 917.000us 1.48% 1.387ms 0.438us 0.000us 0.00% 0.000us 0.000us 3165
aten::item 1.18% 1.112ms 1.27% 1.191ms 1.545us 0.000us 0.00% 0.000us 0.000us 771
cudaStreamSynchronize 0.84% 786.000us 0.84% 786.000us 131.000us 0.000us 0.00% 0.000us 0.000us 6
aten::add_ 0.41% 388.000us 0.74% 696.000us 9.667us 49.000us 6.15% 49.000us 0.681us 72
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 93.887ms
Self CUDA time total: 797.000us
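One way to check which of the documented action names show up in a report like the one above is to scan its rows by name. The following is only a minimal sketch: `find_actions` is a hypothetical helper (not part of PyTorch Lightning), the excerpt is a handful of rows copied from the output above, and it assumes row names contain no spaces so that whitespace splitting lines the columns up.

```python
# Action names the documentation lists for the profiler.
DOCUMENTED_ACTIONS = [
    "training_step_and_backward",
    "training_step",
    "backward",
    "validation_step",
    "test_step",
    "predict_step",
]

# A few rows copied verbatim from the "records" table above.
REPORT_EXCERPT = """\
ProfilerStep* 4.54% 4.265ms 98.37% 92.353ms 30.784ms 0.000us 0.00% 517.000us 172.333us 3
optimizer_step_and_closure_0 0.18% 173.000us 9.07% 8.516ms 2.839ms 0.000us 0.00% 324.000us 108.000us 3
training_step_and_backward 0.55% 516.000us 5.60% 5.257ms 1.752ms 0.000us 0.00% 165.000us 55.000us 3
backward 2.57% 2.417ms 2.67% 2.509ms 836.333us 0.000us 0.00% 1.000us 0.333us 3
training_step 0.62% 585.000us 1.75% 1.639ms 546.333us 0.000us 0.00% 152.000us 50.667us 3
"""


def find_actions(report, actions):
    """Map each action name to its "CPU total" value, or None if no row has that name."""
    cpu_total_by_name = {}
    for line in report.splitlines():
        fields = line.split()
        # Columns: Name, Self CPU %, Self CPU, CPU total %, CPU total, ...
        if len(fields) >= 5:
            cpu_total_by_name[fields[0]] = fields[4]
    return {name: cpu_total_by_name.get(name) for name in actions}


if __name__ == "__main__":
    for name, cpu_total in find_actions(REPORT_EXCERPT, DOCUMENTED_ACTIONS).items():
        print(f"{name}: {cpu_total if cpu_total else 'not in report'}")
```

On this excerpt, the three training-related names are found as rows inside the single “records” table, while validation_step, test_step, and predict_step are not. That is at least consistent with the script only calling trainer.fit with a training dataloader.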
And the source for testmnistautoencoder.py:
import os

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
import pytorch_lightning as pl


class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 3)
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 64),
            nn.ReLU(),
            nn.Linear(64, 28*28)
        )

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        # Logging to TensorBoard by default
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train_loader = DataLoader(dataset, batch_size=256)

# init model
autoencoder = LitAutoEncoder()

# most basic trainer, uses good defaults (auto-tensorboard, checkpoints, logs, and more)
# trainer = pl.Trainer(gpus=8) (if you have GPUs)
trainer = pl.Trainer(profiler="pytorch", gpus=1, max_epochs=25)
trainer.fit(autoencoder, train_loader)
Any help would be very much appreciated.