CUDA out of memory

Here is a link to my code: https://colab.research.google.com/drive/18dJM0iyhhiJnahkz9lnKfa4UKyDhJx08?usp=sharing

The batch size is already 1, but it still doesn't work:

RuntimeError: CUDA out of memory. Tried to allocate 2.68 GiB (GPU 0; 8.00 GiB total capacity; 5.36 GiB already allocated; 888.75 MiB free; 5.36 GiB reserved in total by PyTorch)

On the CPU it runs fine…

The issue is that you never reduce the spatial size, which results in huge activations and a huge linear layer (with over 700 million parameters) that blows up your GPU memory usage.
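To see why pooling matters so much here, a quick back-of-the-envelope calculation (assuming 224×224 inputs, which is what the `32 * 14 * 14` flatten size below implies; the exact numbers for the notebook's original net may differ):

```python
# Each nn.MaxPool2d(2) halves the spatial dimensions,
# so four pools shrink 224 -> 112 -> 56 -> 28 -> 14.
size = 224
for _ in range(4):
    size //= 2
print(size)  # 14, hence the 32 * 14 * 14 input features of the first Linear

# A Linear layer has in_features * out_features weights (plus out_features biases).
with_pool = 32 * 14 * 14 * 1024        # ~6.4 million weights
without_pool = 32 * 224 * 224 * 1024   # ~1.6 billion weights
print(with_pool, without_pool)
```

The weight matrix alone dominates memory: without any pooling, the first fully connected layer would need hundreds of times more parameters (and correspondingly large activation buffers during backprop).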

Changing your network structure to the following (to reduce the spatial dimensions) works for me:

self.seq1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),

            nn.MaxPool2d(2),

            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1),
            nn.ReLU(),

            nn.MaxPool2d(2),

            nn.Conv2d(512, 256, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 128, 3, padding=1),
            nn.ReLU(),

            nn.MaxPool2d(2),

            nn.Conv2d(128, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1),
            nn.ReLU(),

            nn.MaxPool2d(2),
        )
        self.dense = nn.Sequential(
            nn.Linear(32 * 14 * 14, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 2)
        )
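For completeness, a minimal sketch of how these layers could be wired together, runnable on the CPU (the class name and the `forward` logic here are my own, not taken from the linked notebook; it assumes 224×224 RGB inputs):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Same layer stack as above, condensed: four MaxPool2d(2) stages
        # shrink 224x224 down to 14x14 before the dense head.
        self.seq1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1),
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.dense = nn.Sequential(
            nn.Linear(32 * 14 * 14, 1024), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 2),
        )

    def forward(self, x):
        x = self.seq1(x)
        x = x.flatten(1)   # [N, 32, 14, 14] -> [N, 6272]
        return self.dense(x)

out = Net()(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2])
```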

Also note: you don't have to create a new instance of torch.nn.CrossEntropyLoss in every iteration. Either create the criterion once and reuse it, or use the functional interface :slight_smile:
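Both options give the same result; a small sketch with dummy logits and targets:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 2)            # dummy model outputs, batch of 4
targets = torch.tensor([0, 1, 1, 0])  # dummy class labels

# Option 1: create the criterion once (e.g. in __init__) and reuse it each step.
criterion = nn.CrossEntropyLoss()
loss_a = criterion(logits, targets)

# Option 2: the stateless functional interface, no object needed.
loss_b = F.cross_entropy(logits, targets)

print(torch.allclose(loss_a, loss_b))  # True
```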