Best practices for image loading in DataModule

I have a dataset of images, and in regular PyTorch I would define a dataset that uses the __getitem__ function to load images as the dataset is being iterated over. How can I mimic this behavior using the DataModule? The examples I’ve seen that use the setup function seem to load all the data in that function instead of loading samples one by one as the data is being iterated over. This would be a problem if my training dataset is large and I have to load all those images into memory when setup is called.

I haven’t seen an example that uses DataModule to load images/samples on the fly.

Here is my old dataset:

import torch
from PIL import Image
from torchvision import transforms

class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, x_train, y_train=None):
        # x_train and y_train are pandas DataFrames
        self.data = x_train
        self.label = y_train
        self.transform = transforms.Compose(
        [
            transforms.CenterCrop(128),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
    
    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        ''' Load image, transform, and return image along with target (if present) '''
        image = Image.open(self.data.iloc[idx]["file_name"]).convert("RGB")
        image = self.transform(image)
        image_id = self.data.iloc[idx]["image_id"]
        if self.label is not None:
            label = self.label.iloc[idx]
            sample = {"image_id": image_id, "image": image, "label": label}
        else:
            sample = {
                "image_id": image_id,
                "image": image,
            }
        return sample

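DataModules are only meant to hold your datasets and dataloaders. Everything else related to data loading is still done with a regular PyTorch Dataset/DataLoader, so you can keep your ImageDataset as it is: create it in setup (which only stores the DataFrames, not the images) and return a DataLoader wrapping it from train_dataloader. The images are still loaded one by one in __getitem__ as the DataLoader iterates.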