validation_step and validation_epoch_end are not called in the trainer.fit() routine

Hello,

My problem is the following:

If I use the default data loader to feed the training data into the trainer.fit() routine, everything works fine: the validation step runs after each epoch.

However, when I create a custom batch sampler (pulling an equal number of events from each class), only the training_step gets executed inside the trainer loop (the training behaviour itself seems to be as expected).

The validation step is then executed only during the initial validation check.

You can see my batch sampler here: Susy1LeptonAnalysis/PytorchHelp.py on the pytorch_tryout branch of frengelk/Susy1LeptonAnalysis on GitHub.

In the __iter__ method, I create an array of indices for each of the steps_per_epoch steps (an int defined by me), and each array is returned via yield (last line of the previous link; I can only put 2 links in my post).
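For readers who cannot open the linked file, here is a minimal sketch of the kind of sampler described above. It is not the code from the repository: the class name `BalancedBatchSampler` and the parameters `labels`, `per_class`, `steps_per_epoch` and `seed` are hypothetical, and drawing with replacement is an assumption made so that small classes never run out of events.

```python
import random
from collections import defaultdict


class BalancedBatchSampler:
    """Hypothetical class-balanced batch sampler (illustration only)."""

    def __init__(self, labels, per_class, steps_per_epoch, seed=0):
        self.per_class = per_class
        self.steps_per_epoch = steps_per_epoch
        self.rng = random.Random(seed)
        # Group dataset indices by their class label.
        self.by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)

    def __len__(self):
        # Number of batches produced per epoch; iteration stops after this many.
        return self.steps_per_epoch

    def __iter__(self):
        for _ in range(self.steps_per_epoch):
            batch = []
            for indices in self.by_class.values():
                # Assumption: sample with replacement within each class.
                batch.extend(self.rng.choices(indices, k=self.per_class))
            self.rng.shuffle(batch)
            yield batch  # one list of dataset indices per step


# Toy labels: three classes of very different sizes.
labels = [0] * 50 + [1] * 5 + [2] * 20
sampler = BalancedBatchSampler(labels, per_class=4, steps_per_epoch=3)
batches = list(sampler)
```

An object like this can be handed to `torch.utils.data.DataLoader` through its `batch_sampler` argument (in which case `batch_size`, `shuffle`, `sampler` and `drop_last` must be left at their defaults). Defining `__len__` and letting `__iter__` terminate keeps the length of an epoch well defined.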

The model is defined here:
(Same file as Batch Sampler, starting in line 49.)

The trainer gets called here:

I know that the code is nested and embedded in luigi, so it might be difficult to read at some points.

If you have any questions, or need more information, I am happy to make my problem easier to understand.

Best regards and thanks in advance,
Frederic

From a quick look, I don’t think you are using the BatchSampler for the validation dataloader.
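If the validation loader is meant to go through a batch sampler as well, it needs its own sampler instance built over the validation set. The following is only a sketch under that assumption: `SequentialBatchSampler`, `num_samples` and `batch_size` are hypothetical names, and the sampler simply visits every validation index once, in order.

```python
class SequentialBatchSampler:
    """Hypothetical validation batch sampler: every index once, in order."""

    def __init__(self, num_samples, batch_size):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __len__(self):
        # Ceiling division: the last batch may be smaller than batch_size.
        return (self.num_samples + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        for start in range(0, self.num_samples, self.batch_size):
            stop = min(start + self.batch_size, self.num_samples)
            yield list(range(start, stop))


# Toy validation set with 10 events.
val_sampler = SequentialBatchSampler(num_samples=10, batch_size=4)
val_batches = list(val_sampler)
```

Such an instance would be passed as `batch_sampler=val_sampler` when constructing the validation `DataLoader`. PyTorch also provides `torch.utils.data.BatchSampler`, which wraps an existing sampler and may be enough for validation without any custom class.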

We have moved the discussions to GitHub Discussions. You might want to check that out instead to get a quick response. The forums will be marked read-only after some time.

Thank you

Hi Frengelk, have you already solved your problem? I seem to be running into the same one. Waiting for your reply :slight_smile: Thanks in advance!

Hi - have you already solved this problem? I am having the same problem!

Hi @Cynthia_Maldonado, could you check if you are using BatchSampler correctly?

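>>> from torch.utils.data import DataLoader, DistributedSampler
>>> # dataset and is_distributed are assumed to be defined elsewhere
>>> sampler = DistributedSampler(dataset) if is_distributed else None
>>> loader = DataLoader(dataset, shuffle=(sampler is None),
...                     sampler=sampler)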