Update DDP Tutorial to remove Single-Process Multi-Device Use Case #973
Conversation
Deploy preview for pytorch-tutorials-preview ready! Built with commit 5ca59d7 https://deploy-preview-973--pytorch-tutorials-preview.netlify.app
The corresponding example PR is pytorch/examples#763
Hey @kiukchung, this currently links to https://github.com/pytorch/elastic. Let me know if you prefer https://pytorch.org/elastic/0.2.0rc1/index.html or a different link. Thanks!
intermediate_source/ddp_tutorial.rst
Outdated
create a DDP instance in each process. DDP uses collective communications in the
`torch.distributed <https://pytorch.org/tutorials/intermediate/dist_tuto.html>`__
package to synchronize gradients and buffers. More specifically, DDP inserts
an autograd hook for each model parameter which will fire when the
Is the autograd hook indeed per-parameter, as given by model.parameters()? If a user takes a model and then manually adds an nn.Parameter() to it, will DDP sync gradients from this parameter as well? If so, it might be worth mentioning this (though the docs may be a better place for it).
If a user manually adds an nn.Parameter(), Model.__setattr__ should detect that and call register_parameter for it, so this manually added parameter would also be included in model.parameters(). Let me modify this to "each model parameter in model.parameters()".
intermediate_source/ddp_tutorial.rst
Outdated
are inevitable due to, e.g., network delays, resource contentions,
unpredictable workload spikes. To avoid timeouts in these situations, make
sure that you pass a sufficiently large ``timeout`` value when calling
In DDP, the constructor and the backward pass are distributed synchronization
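A hedged sketch of the timeout advice in the snippet above, assuming environment-variable initialization (MASTER_ADDR/MASTER_PORT already set by the launcher); the 30-minute value is illustrative, not a recommendation from the tutorial.

```python
import datetime
import torch.distributed as dist

def setup(rank, world_size):
    # Pass a generous timeout so that transient network delays, resource
    # contention, or workload spikes on one rank do not abort the job.
    dist.init_process_group(
        "gloo",
        rank=rank,
        world_size=world_size,
        timeout=datetime.timedelta(minutes=30),
    )
```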
Noticed we removed the forward method; was it inaccurate that it was a point of synchronization, or did something change?
@mrshenli that would be awesome. The correct link is https://pytorch.org/elastic (it redirects to the latest release).
Noticed we removed the forward method; was it inaccurate that it was a point of synchronization, or did something change?
There is a sync in forward to broadcast buffers from rank 0 to all other ranks. I removed this because 1) there is no intra-node sync in MPSD mode, and 2) buffers are not used in this tutorial. But on second thought, I think it might be better to keep it to stay consistent with the previous version, which is correct for the general case. Thanks for spotting this!
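For context, a minimal sketch of the buffer sync being discussed, assuming a model that actually owns buffers (the running statistics of BatchNorm1d here): with the default broadcast_buffers=True, DDP broadcasts buffers from rank 0 to the other ranks at the start of each forward pass.

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(rank):
    # BatchNorm1d carries running_mean / running_var buffers; DDP will
    # broadcast them from rank 0 in every forward call unless
    # broadcast_buffers is set to False.
    model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8)).to(rank)
    return DDP(model, device_ids=[rank], broadcast_buffers=True)
```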
links are updated, thx!
As we do not recommend using Single-Process Multi-Device mode with DDP due to its reduced throughput, and as this mode is broken in v1.5, this PR updates the DDP tutorial to use Multi-Process Single-Device mode. It also includes the following minor fixes:
1. highlight that this tutorial requires at least 8 GPUs to run
2. added a pointer to RPC
3. added a pointer to torchelastic
4. as DDP now broadcasts initial parameter values from rank 0 to the other ranks, it no longer requires setting the RNG seed
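A hedged sketch of the Multi-Process Single-Device pattern the PR switches the tutorial to: one process per GPU, each wrapping its replica with device_ids=[rank]. Because the DDP constructor broadcasts rank 0's parameter values to all other ranks, the replicas start out identical without any manual RNG seeding. The function names (demo_basic, run_demo) and the port are illustrative, not necessarily the tutorial's exact code.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def demo_basic(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # One device per process; the DDP constructor syncs initial parameters
    # from rank 0, so no manual seeding is needed.
    model = nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
    outputs = ddp_model(torch.randn(20, 10).to(rank))
    loss_fn(outputs, torch.randn(20, 10).to(rank)).backward()  # grads sync here
    optimizer.step()

    dist.destroy_process_group()

def run_demo(world_size):
    mp.spawn(demo_basic, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    run_demo(torch.cuda.device_count())
```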
Looks good!