
Correctness of when to call set_device in the docs for DDP #2859

Closed
@craymichael

Description

📚 The doc issue

In the docs tutorial on how to set up multi-GPU training, it is suggested that the following is the proper way to set up each process: initializing the process group (e.g., with the NCCL backend) and then calling torch.cuda.set_device(rank):

import os

import torch
from torch.distributed import init_process_group


def ddp_setup(rank: int, world_size: int):
    """
    Args:
        rank: Unique identifier of each process
        world_size: Total number of processes
    """
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    # Tutorial ordering: create the process group first, then bind
    # this process to its GPU.
    init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

However, several related issues suggest that the proper way is to call torch.cuda.set_device before initializing the process group.
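For concreteness, a minimal sketch (my addition, not from the tutorial) of the alternative ordering those issues describe; ddp_setup_alt is a hypothetical name:

```python
import os

import torch
from torch.distributed import init_process_group


def ddp_setup_alt(rank: int, world_size: int):
    """Alternative ordering: pin the CUDA device, then create the group."""
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    # Bind this process to its GPU first so the NCCL communicators are
    # created on the intended device during initialization.
    torch.cuda.set_device(rank)
    init_process_group(backend="nccl", rank=rank, world_size=world_size)
```

The only change from the tutorial snippet is swapping the last two lines.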

Which order is correct? Can the wrong order cause hangs or slowdowns?
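As a further hedged sketch (my addition, assuming PyTorch roughly >= 2.3), init_process_group also accepts a device_id argument that binds the process group to a specific device at creation, which sidesteps the ordering question; ddp_setup_device_id is a hypothetical name:

```python
import os

import torch
from torch.distributed import init_process_group


def ddp_setup_device_id(rank: int, world_size: int):
    """Sketch: let init_process_group bind the device itself."""
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    # Passing device_id ties the process group to this rank's GPU
    # during initialization, so no separate set_device ordering is needed.
    init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),
    )
```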

Suggest a potential alternative/fix

No response

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225
