Is it possible in ddp mode to log combined metrics across processes? At least val epoch end metrics?

Yes! Use self.log.

With sync_dist=True, this syncs the metric across devices (averaged by default):

def training_step(self, batch, batch_idx):
    ...
    self.log('x', x, sync_dist=True)

@williamfalcon How can I use this with one of the logger classes like NeptuneLogger? I can't find any documentation on it.

Currently, each DDP process logs separately.

I am also interested in this! For per step reporting during training and at the end of each epoch.

If I call self.logger.experiment.add_scalar("some_name", some_tensor, global_step=self.global_step) (when using the TensorBoard logger), I get multiple entries for each time step. That's hard to read, and an aggregated value would be much more useful.
If I use self.logger.experiment.add_histogram("some_name", some_tensor, global_step=self.global_step), the histograms do not even appear in TensorBoard because multiple entries are written to the same time step / name tag. Ideally, the tensors being reported should be merged from all processes and then logged jointly.

To be clear, I know how to do the aggregation (e.g. mean for scalars, concat for histograms); I just would like to know how to do this in PyTorch Lightning.

Currently I am thinking of a workaround: create a class inheriting from EvalResult or TrainResult with an additional function like def custom_log(self, value, fn, sync_dist=True), which takes a lambda like fn = lambda x: self.logger.experiment.add_scalar("some_name", x, self.global_step).
This function would use the standard sync from the base class, but execute the lambda on the aggregated values.
Probably not the cleanest API, but it could work.
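
For what it's worth, here is a rough sketch of the same idea done directly with torch.distributed instead of subclassing the Result classes. This is not Lightning's API: the helper names (gather_across_processes, is_rank_zero, log_aggregated) are made up for illustration, and it assumes every process calls the helper with a tensor of the same shape, since all_gather requires that. Outside of DDP it simply falls back to the local tensor.

```python
import torch
import torch.distributed as dist


def _in_distributed_run():
    # True only when a process group has actually been initialised (e.g. DDP).
    return dist.is_available() and dist.is_initialized()


def gather_across_processes(tensor):
    """Return a list with one tensor per process (just [tensor] outside DDP)."""
    if not _in_distributed_run():
        return [tensor]
    # Assumption: all ranks contribute tensors of identical shape and dtype.
    gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, tensor)
    return gathered


def is_rank_zero():
    """Only one process should write to the logger."""
    return (not _in_distributed_run()) or dist.get_rank() == 0


def log_aggregated(tensor, reduce_fn, log_fn):
    """Gather `tensor` from all processes, reduce it, and log once on rank 0.

    The gather is a collective call, so every rank must reach this function,
    even though only rank 0 ends up calling `log_fn`.
    """
    gathered = gather_across_processes(tensor.detach())
    if is_rank_zero():
        log_fn(reduce_fn(gathered))


# Hypothetical usage inside a LightningModule hook with the TensorBoard logger:
#
#   # scalars: mean over processes
#   log_aggregated(
#       loss,
#       lambda ts: torch.stack(ts).mean(),
#       lambda v: self.logger.experiment.add_scalar("val_loss", v, self.global_step),
#   )
#
#   # histograms: concatenate the values from all processes
#   log_aggregated(
#       preds,
#       torch.cat,
#       lambda v: self.logger.experiment.add_histogram("preds", v, self.global_step),
#   )
```

The reduce step is passed in as a function so the same helper covers both cases mentioned above (mean for scalars, concat for histograms). If the per-process tensors can differ in size, they would need padding or a different gather strategy before this works.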