Update DDP Tutorial to remove Single-Process Multi-Device Use Case #973


Merged: 1 commit merged into pytorch:master on Apr 29, 2020

Conversation


@mrshenli (Contributor) commented on Apr 28, 2020

As we do not recommend using Single-Process Multi-Device mode with DDP due to
its reduced throughput, and as this mode is broken in v1.5, this PR updates the
DDP tutorial to use Multi-Process Single-Device mode (a minimal sketch of that
pattern follows the list below).

This PR also includes the following minor fixes:

  1. Highlight that this tutorial requires at least 8 GPUs to run.
  2. Add a pointer to RPC.
  3. Add a pointer to torchelastic.
  4. As DDP now broadcasts initial parameter values from rank 0 to all other
     ranks, it no longer requires setting an RNG seed.
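
For readers skimming this thread, here is a minimal sketch of the Multi-Process
Single-Device pattern described above: one process per GPU, each process owning
exactly one device. The backend, master address/port, model, and helper names
are illustrative placeholders, not taken from the tutorial diff.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo_basic(rank, world_size):
    # One process per GPU: this process drives device `rank` only.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = nn.Linear(10, 10).to(rank)
    # DDP broadcasts rank 0's initial parameter values to every other rank
    # at construction time, so no shared RNG seed is needed to keep replicas identical.
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)

    outputs = ddp_model(torch.randn(20, 10).to(rank))
    labels = torch.randn(20, 10).to(rank)
    loss_fn(outputs, labels).backward()  # gradients are all-reduced across ranks here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # one process per available GPU
    mp.spawn(demo_basic, args=(world_size,), nprocs=world_size, join=True)
```

Because DDP broadcasts rank 0's initial parameter values in its constructor
(point 4 above), all replicas start from the same weights without a shared RNG seed.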


netlify bot commented Apr 28, 2020

Deploy preview for pytorch-tutorials-preview ready!

Built with commit 5ca59d7

https://deploy-preview-973--pytorch-tutorials-preview.netlify.app

@mrshenli (Contributor, Author)

The corresponding example PR is pytorch/examples#763

@mrshenli changed the title from "[WIP] Update DDP Tutorial to remove Single-Process Multi-Device Use Case" to "Update DDP Tutorial to remove Single-Process Multi-Device Use Case" on Apr 28, 2020
@mrshenli (Contributor, Author)

Hey @kiukchung, this currently links to https://github.com/pytorch/elastic. Let me know if you prefer https://pytorch.org/elastic/0.2.0rc1/index.html or a different link. Thanks!

@mrshenli (Contributor, Author)

create a DDP instance in each process. DDP uses collective communications in the
`torch.distributed <https://pytorch.org/tutorials/intermediate/dist_tuto.html>`__
package to synchronize gradients and buffers. More specifically, DDP inserts
an autograd hook for each model parameter which will fire when the
Member

Is the autograd hook indeed per-parameter, as given by model.parameters()? If a user takes a model and then manually adds an nn.Parameter() to it, will DDP sync gradients for this parameter as well? If so, it might be worth mentioning this (though maybe the docs are a better place).

@mrshenli (Contributor, Author)

If the user manually adds an nn.Parameter(), Module.__setattr__ should detect that and call register_parameter for it, so this manually added parameter would also be included in model.parameters(). Let me modify this to "each model parameter in model.parameters()".
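
As a quick standalone illustration of that registration behavior (not part of the tutorial diff; the module and attribute name are made up):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Assigning an nn.Parameter as an attribute goes through nn.Module.__setattr__,
# which registers it the same way register_parameter() would.
model.extra_weight = nn.Parameter(torch.zeros(3))

# The manually added parameter now appears in model.parameters(),
# so DDP would install a gradient hook for it as well.
print([name for name, _ in model.named_parameters()])
# ['weight', 'bias', 'extra_weight']
```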

are inevitable due to, e.g., network delays, resource contentions,
unpredictable workload spikes. To avoid timeouts in these situations, make
sure that you pass a sufficiently large ``timeout`` value when calling

In DDP, the constructor and the backward pass are distributed synchronization
Member

Noticed we removed the forward method; was it inaccurate that it was a point of synchronization, or did something change?


kiuk

@mrshenli that would be awesome, the correct link is https://pytorch.org/elastic (it redirects to the latest release)

@mrshenli (Contributor, Author)

Noticed we removed the forward method; was it inaccurate that it was a point of synchronization, or did something change?

There is a sync in forward to broadcast buffers from rank 0 to all other ranks. I removed this because 1) there is no intra-node sync in MPSD mode, and 2) buffers are not used in this tutorial. But giving it a second thought, I think it might be better to keep this to stay consistent with the previous version, which is correct for general cases. Thanks for spotting this!
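
For context on the buffer sync being discussed, a minimal sketch: it assumes the process group has already been initialized (as in the tutorial's setup()) and that rank is this process's GPU index; demo_buffer_sync and the BatchNorm module are illustrative, not from the tutorial.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo_buffer_sync(rank):
    # Assumes dist.init_process_group() has already run in this process.
    model = nn.BatchNorm1d(10).to(rank)        # running_mean / running_var are buffers
    ddp_model = DDP(model, device_ids=[rank])  # broadcast_buffers defaults to True

    # With broadcast_buffers=True, each forward() call first broadcasts rank 0's
    # buffers to the other ranks, then runs the local forward computation.
    return ddp_model(torch.randn(20, 10).to(rank))
```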

@mrshenli (Contributor, Author)

links are updated, thx!

@rohan-varma (Member) left a comment


Looks good!

@jlin27 merged commit f557ee0 into pytorch:master on Apr 29, 2020
rodrigo-techera pushed a commit to Experience-Monks/tutorials that referenced this pull request on Nov 29, 2021:
Update DDP Tutorial to remove Single-Process Multi-Device Use Case