[BUG] - No output in tutorial "Writing Distributed Applications with PyTorch" #3003

Closed
@chrischang80

Description

Add Link

https://pytorch.org/tutorials/intermediate/dist_tuto.html

Describe the bug

After running the code in this tutorial, I found that the sample programs produce no output. For example, when I execute the snippet below, nothing is printed.

"""run.py:"""
#!/usr/bin/env python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

""" All-Reduce example."""
def run(rank, size):
    """ Simple collective communication. """
    group = dist.new_group([0, 1])
    tensor = torch.ones(1)
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group)
    print('Rank ', rank, ' has data ', tensor[0])

def init_process(rank, size, fn, backend='gloo'):
    """ Initialize the distributed environment. """
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)


if __name__ == "__main__":
    size = 2
    processes = []
    mp.set_start_method("spawn")
    for rank in range(size):
        p = mp.Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()
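
When a script like this prints nothing, one way to narrow it down is to look at each worker's exit code once `join()` has returned: a non-zero value suggests the child died before it reached `print`. The sketch below is a hypothetical helper (`summarize_exit_codes` is not part of the tutorial) and relies only on the `name` and `exitcode` attributes that `multiprocessing.Process` objects expose.

```python
def summarize_exit_codes(processes):
    """Return one status line per worker process.

    Hypothetical helper, not from the tutorial. It reads only the
    `name` and `exitcode` attributes of multiprocessing.Process.
    """
    lines = []
    for p in processes:
        if p.exitcode is None:
            # exitcode stays None until the process has terminated.
            status = "has not terminated yet"
        elif p.exitcode == 0:
            status = "exited cleanly"
        elif p.exitcode < 0:
            # A negative exit code -N means the child was killed by signal N.
            status = f"killed by signal {-p.exitcode}"
        else:
            status = f"failed with exit code {p.exitcode}"
        lines.append(f"{p.name}: {status}")
    return lines
```

Calling `print("\n".join(summarize_exit_codes(processes)))` after the final `join()` loop in run.py would show whether the workers ran to completion or failed silently.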

Finally, I found that if I replace the line

mp.set_start_method("spawn")

with

mp.get_context("spawn")

then everything is fine.
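
A possible explanation for why this swap changes the behavior, offered as an observation rather than a confirmed diagnosis: `get_context("spawn")` returns a context object and leaves the global start method untouched. If its return value is discarded, `mp.Process` still uses the platform default, which is typically `fork` on Linux. The sketch below uses the standard-library `multiprocessing` module, which `torch.multiprocessing` wraps, to show that difference; the function name `start_method_report` is hypothetical.

```python
import multiprocessing as mp


def start_method_report():
    """Show that get_context() does not change the global start method."""
    # allow_none=True reads the global setting without fixing it to a default.
    before = mp.get_start_method(allow_none=True)
    ctx = mp.get_context("spawn")  # returns a context; no global side effect
    after = mp.get_start_method(allow_none=True)
    return {
        "global_before": before,
        "global_after": after,
        "context_method": ctx.get_start_method(),
    }
```

To actually start workers with `spawn` through a context, they would need to be created with `ctx.Process(...)` rather than `mp.Process(...)`.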

Describe your environment

Google Colab

cc @wconstab @osalpekar @H-Huang @kwen2501
