I’m looking for a good way to sync my output dir name (which contains a timestamp etc) between DDP processes. For now, I’m doing something like this:
```python
import os
from datetime import datetime
import dateutil.tz

# Note: environment variables are strings, so cast before comparing to 0.
local_rank = int(os.environ.get('LOCAL_RANK', 0))
if local_rank == 0:
    now = datetime.now(dateutil.tz.tzlocal())
    timestamp = now.strftime('%Y_%m_%d_%H_%M_%S')
    run_output_dir = os.path.join(
        cfg.output_dir,
        '%s_%s_%s_%s' % (cfg.dataset, cfg.cfg_name, timestamp, cfg.seed))
    os.environ['RUN_OUTPUT_DIR'] = run_output_dir
else:
    run_output_dir = os.environ['RUN_OUTPUT_DIR']
```
Is this OK or does someone have a better solution?
I’ve tried to use torch.distributed.recv (and its send counterpart), but those only work for tensors.
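For context, the tensor-only limitation can be worked around by packing the string into a fixed-size byte buffer and broadcasting that. The helpers below are a sketch (the names and the MAX_LEN bound are my own, not from my code above); the torch calls in the trailing comments show roughly how they would be used after the process group is initialized:

```python
# Sketch (hypothetical helpers): pack the directory name into a
# fixed-size, zero-padded byte buffer so it can travel as a tensor.
MAX_LEN = 256  # assumed upper bound on the path length

def encode_path(s: str, max_len: int = MAX_LEN) -> list:
    """UTF-8 encode the string and zero-pad to a fixed length."""
    data = list(s.encode('utf-8'))
    if len(data) > max_len:
        raise ValueError('path too long for the buffer')
    return data + [0] * (max_len - len(data))

def decode_path(buf: list) -> str:
    """Strip the zero padding and decode back to a string."""
    return bytes(b for b in buf if b != 0).decode('utf-8')

# In each process, after dist.init_process_group(...), roughly:
#   buf = torch.tensor(encode_path(run_output_dir), dtype=torch.uint8)  # rank 0
#   buf = torch.zeros(MAX_LEN, dtype=torch.uint8)                       # other ranks
#   dist.broadcast(buf, src=0)
#   run_output_dir = decode_path(buf.tolist())
```

Note that recent PyTorch versions also provide torch.distributed.broadcast_object_list, which broadcasts arbitrary picklable Python objects (including strings) directly, so the manual encoding may not be necessary.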
I’m also using the WandbLogger, so I have considered having all processes save their output to wandb_logger.experiment.dir, but that doesn’t work because the logger returns a dummy experiment in every process except the main one (link).