📚 Documentation
I believe the optimizer in this example should be created after the `parallelize_module` call, as it is in the sequence parallelism example. Otherwise, with the latest torch, the example appears not to update the weights and therefore does not actually train. Please let me know if I'm missing anything, and thanks so much for all your work!
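My understanding of why the order matters: `parallelize_module` re-registers the targeted parameters as new (DTensor) `nn.Parameter` objects. An optimizer built before that call still holds the old parameter objects, which no longer take part in the forward pass, so they never receive gradients and `step()` skips them. Below is a minimal single-process sketch of that pitfall. It does not call the tensor-parallel API; `fake_parallelize` and `ToyModel` are hypothetical stand-ins that only mimic the parameter swap.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.in_proj(x)


def fake_parallelize(model: nn.Module) -> nn.Module:
    # Hypothetical stand-in for parallelize_module: replace every parameter
    # with a brand-new nn.Parameter (the real call wraps them as DTensors).
    # After this, forward() only uses the new Parameter objects.
    for mod in model.modules():
        for name, p in list(mod.named_parameters(recurse=False)):
            setattr(mod, name, nn.Parameter(p.detach().clone()))
    return model


def train_step(order: str) -> bool:
    """Run one SGD step; return True if the weights used in forward changed."""
    torch.manual_seed(0)
    model = ToyModel()
    if order == "optimizer_first":
        # Optimizer captures the ORIGINAL parameters, which get swapped out.
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        model = fake_parallelize(model)
    else:
        # Optimizer captures the parameters that forward() actually uses.
        model = fake_parallelize(model)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

    before = model.in_proj.weight.detach().clone()
    loss = model(torch.ones(2, 4)).sum()
    loss.backward()  # gradients land only on the live (new) parameters
    opt.step()       # params with grad=None are skipped by SGD
    return not torch.equal(before, model.in_proj.weight.detach())


print(train_step("optimizer_first"))     # weights untouched
print(train_step("parallelize_first"))   # weights updated
```

In the actual example, the equivalent fix would be to construct the optimizer from the parallelized model's parameters, i.e. after `parallelize_module` returns.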
Tiny fix PR below:
#1324