
Incorrect and non-deterministic category scores in multi-GPU inference #56

@Bitterdhg

Description


During multi-GPU inference, I am experiencing issues with category scores that are both incorrect and non-deterministic. Single-GPU or CPU-based inference produces consistent and expected results without this issue.

Environment

  • OS: Ubuntu 20.04
  • Python version: Python 3.11.9

Steps to Reproduce

  1. Step 1: Generate images using the provided prompts.
  2. Step 2: Execute the evaluation command:
bash dpg_bench/dist_eval.sh $YOUR_IMAGE_PATH $RESOLUTION
  3. Step 3: Re-run Step 2 and compare the resulting scores.

Expected Behavior

  • Evaluating the same images should yield identical L1 category scores, L2 category scores, and DPG-Bench scores across multiple runs of multi-GPU inference.

Actual Behavior

  • When evaluating the same images, the L1 and L2 category scores vary between runs of multi-GPU inference.

Possible Solutions or Workarounds

  • The gather_object function does not guarantee a consistent order of global_categories when running on multiple GPUs. To address this, we need to enforce a deterministic order.
  • Modify the compute_dpg_bench.py script at line 222 as follows:
# global_categories = set(global_categories)
global_categories = sorted(set(global_categories))

This change ensures that the categories are processed in a fixed, sorted order, which may resolve the inconsistency observed during multi-GPU inference.
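To illustrate the idea outside the benchmark, here is a minimal sketch (with hypothetical category names, not the actual DPG-Bench data): the per-rank lists returned by gather_object may be concatenated in different orders across runs, and iterating a raw `set` adds another source of run-to-run variation, since Python string hashes are randomized per process unless PYTHONHASHSEED is fixed. Sorting the deduplicated categories restores a stable processing order either way:

```python
# Simulated per-rank category lists, as gather_object might concatenate
# them in different orders on different multi-GPU runs (hypothetical data):
run_a = ["color", "shape", "count", "color"]   # e.g. ranks gathered as 0, 1
run_b = ["count", "color", "color", "shape"]   # e.g. ranks gathered as 1, 0

# Deduplicate, then sort: the resulting order no longer depends on how the
# per-rank lists were concatenated or on set iteration order.
cats_a = sorted(set(run_a))
cats_b = sorted(set(run_b))

assert cats_a == cats_b == ["color", "count", "shape"]
```

Since the per-category averaging that follows should be order-insensitive in principle, sorting only pins down the order in which categories are visited and reported; it does not change which scores are computed.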
