DQN tutorial fixes #2026
Conversation
Thanks @SiftingSands for this.
LGTM. If you don't mind, I'll take a deeper look at the tutorial and check why it doesn't learn before merging.
Forgive me for putting this on hold!
@vmoens No worries at all. I did try tweaking various hyperparameters and doing reward shaping to get it to train within 500 to 1000 episodes. As others have reported, I saw a non-monotonic reward history, where the agent would perform well and then forget how to solve the problem again. I believe this is to be expected for vanilla DQN learning from pixels instead of the raw state variables. I can send over my best results in a new merge request later this evening (GMT-4) if you're interested (I'm still only able to consistently reach ~100 timesteps by 1000 episodes, but I haven't tried saving the "best" model to see how it generalizes). I tried to keep my changes minor relative to Adam's original work, such as only using the "duration" in the reward and not using any state variables such as angle or position.
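For context, here is a minimal sketch of the two reward choices mentioned above, in plain Python. The shaped variant is hypothetical, the kind of state-based shaping the comment deliberately avoids; the thresholds mirror CartPole's termination bounds and are assumptions, not values from this PR.

```python
import math

def duration_reward(terminated: bool) -> float:
    # CartPole's default signal: +1 for every timestep the pole stays up.
    return 0.0 if terminated else 1.0

def shaped_reward(state, terminated: bool,
                  x_threshold: float = 2.4,
                  theta_threshold: float = 12 * math.pi / 180) -> float:
    # Hypothetical shaping using the raw state variables (cart position x,
    # pole angle theta), i.e. exactly what the comment above chose NOT to do.
    x, _x_dot, theta, _theta_dot = state
    if terminated:
        return -1.0
    return 1.0 - abs(x) / x_threshold - abs(theta) / theta_threshold
```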
Sure, let's keep iterating on this. There are some things I'm trying on my side too!
Unfortunately, I don't know if it trained successfully in the past. I looked through a few blogs to see if anyone reported it working, but all the successful examples involved significant modifications, such as using the state variables instead of pixels. I wonder if it's always been somewhat "broken", since there's this issue from 2018 (#209) about the missing target network. Just speculating here, though...
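For reference, a minimal sketch of the target-network bookkeeping that issue #209 was about, written against a generic PyTorch Q-network. The network shape, the `TARGET_UPDATE` period, and the function names are illustrative assumptions, not code from the tutorial.

```python
import copy
import torch
import torch.nn as nn

# Illustrative Q-network (4 state inputs, 2 actions, as in a raw-state CartPole setup).
policy_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

# The target network starts as a frozen copy of the policy network.
target_net = copy.deepcopy(policy_net)
target_net.eval()

TARGET_UPDATE = 10  # hard-update period in episodes (hyperparameter, an assumption)

def maybe_sync_target(episode: int) -> None:
    # Periodically copy the policy weights into the target network so the
    # bootstrapped Q-targets lag behind the online network.
    if episode % TARGET_UPDATE == 0:
        target_net.load_state_dict(policy_net.state_dict())

@torch.no_grad()
def td_target(reward, next_state, done, gamma: float = 0.999):
    # Targets are computed with target_net, not policy_net.
    next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * next_q * (1.0 - done.float())
```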
Add a cell for installing `gym` on Colab, by adding pygame to the requirements and bumping gym to 0.25.0 (to account for changes introduced by #2026).
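A Colab cell along these lines would cover the pinned dependencies; the exact pin and extras here are an assumption, not copied from the PR diff.

```python
# Run once at the top of the Colab notebook (illustrative pins, not the PR's exact cell).
!pip install "gym==0.25.0" pygame
```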
Gym's domain name has changed, as noted in #1928 (comment).
Minor updates to Gym calls, which would otherwise produce errors with the latest version of Gym (0.25.2) (possibly fixes #1432 (comment)).
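As a rough illustration of the kind of call-site changes involved (this sketch assumes the newer Gym conventions, roughly the 0.26+ defaults; on 0.25.x some of these behaviours were opt-in, and it is not the PR's exact diff):

```python
import gym

# render_mode is now fixed when the environment is constructed,
# instead of being passed to env.render(mode=...).
env = gym.make("CartPole-v1", render_mode="rgb_array")

# reset() takes the seed and returns (observation, info).
obs, info = env.reset(seed=0)

frame = env.render()  # returns an RGB array because of render_mode above

# step() returns five values; the old `done` flag is split into terminated/truncated.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated

env.close()
```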
Still doesn't train properly, as noted in #1755 (comment), but at least it runs now, as shown in the figure below (I didn't touch any other settings in the tutorial).