How to sync ROUGE scores between different processes?

I’m working on a project that fine-tunes a sequence generation model. I’m looking for an example of how to gather the generated outputs from different GPUs (one machine, multiple GPUs) so that I can calculate a correct ROUGE score for the whole validation dataset. I know DDP can help me sync tensors across devices, but I have no idea how to gather ROUGE scores across devices.
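
To make the question concrete, here is a minimal sketch of one possible approach: gather the decoded texts rather than the per-process scores, then compute ROUGE once over the merged set. It assumes each process keeps a list of `(example_id, prediction, reference)` tuples; that record layout and the helper name `gather_generations` are hypothetical, not taken from any library. The only PyTorch collective used is `torch.distributed.all_gather_object`.

```python
try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run on a machine without PyTorch
    dist = None


def gather_generations(local_records):
    """Merge per-rank (example_id, prediction, reference) tuples on every rank.

    Hypothetical helper: the record layout is an assumption of this sketch.
    """
    local_records = list(local_records)
    if dist is None or not (dist.is_available() and dist.is_initialized()):
        # Single-process fallback: nothing to gather.
        per_rank = [local_records]
    else:
        # all_gather_object pickles arbitrary Python objects, so strings work.
        per_rank = [None] * dist.get_world_size()
        dist.all_gather_object(per_rank, local_records)

    # DistributedSampler may repeat samples to even out the shards,
    # so keep only the first record seen for each example_id.
    merged = {}
    for records in per_rank:
        for example_id, pred, ref in records:
            merged.setdefault(example_id, (pred, ref))

    ordered = [merged[key] for key in sorted(merged)]
    preds = [pred for pred, _ in ordered]
    refs = [ref for _, ref in ordered]
    return preds, refs
```

After this call every rank holds the full, de-duplicated lists, so the ROUGE library of choice could be run on rank 0 only over `preds` and `refs`. With the NCCL backend, each process would need to have selected its own GPU (for example via `torch.cuda.set_device`) before calling the collective.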