DQN tutorial fixes #2026

Merged: 4 commits into pytorch:master on Sep 6, 2022
Conversation

@SiftingSands (Contributor) commented Sep 5, 2022

Gym's domain name has changed since #1928 (comment).

Minor updates to the Gym calls, which produce errors with the latest version of Gym (0.25.2). (Possibly fixes #1432 (comment).)

It still doesn't train properly, as noted in #1755 (comment), but at least it runs now, as shown in the figure below (I didn't touch any other settings in the tutorial).

[Attached image: Figure_2]
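For context, the Gym API changes around 0.25/0.26 that break older tutorial code (`reset()` optionally returning `(obs, info)`, and `step()` moving from a 4-tuple to a 5-tuple with separate `terminated`/`truncated` flags) can be papered over with small helpers. This is a hypothetical sketch, not code from this PR; `unwrap_reset` and `unwrap_step` are made-up names:

```python
def unwrap_reset(out):
    """Handle reset() returning either `obs` (old API) or `(obs, info)` (new API)."""
    return out[0] if isinstance(out, tuple) else out

def unwrap_step(result):
    """Normalize step() results to the classic 4-tuple.

    New API:  (obs, reward, terminated, truncated, info)
    Old API:  (obs, reward, done, info)
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # either natural termination or hitting the time limit ends the episode
        return obs, reward, terminated or truncated, info
    return result
```

A training loop written against helpers like these keeps working whether the installed Gym version uses the old or the new signatures.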

@facebook-github-bot (Contributor)

Hi @SiftingSands!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@SiftingSands SiftingSands changed the title Dqn fix DQN tutorial fixes Sep 5, 2022
netlify bot commented Sep 5, 2022

Deploy Preview for pytorch-tutorials-preview ready!

Latest commit: cf2e13a
Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/63176793fd8b890008e7f3f1
Deploy Preview: https://deploy-preview-2026--pytorch-tutorials-preview.netlify.app

@facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@svekars svekars requested a review from vmoens September 6, 2022 15:29
@svekars svekars requested a review from nairbv September 6, 2022 15:30
@vmoens (Contributor) left a comment

Thanks @SiftingSands for this.
LGTM; if you don't mind, I'll take a deeper look at the tutorial and check why it doesn't learn before merging.
Forgive me for putting this on hold!

@svekars svekars merged commit 9fb9f47 into pytorch:master Sep 6, 2022
@SiftingSands (Contributor, Author) commented Sep 6, 2022

@vmoens No worries at all. I did try tweaking various hyperparameters and doing reward shaping to get it to train within 500 to 1000 episodes. Like others, I saw a non-monotonic reward history, where the agent would perform well and then forget how to solve the problem again. I believe this is to be expected for vanilla DQN learning from pixels rather than from the raw state variables.

I can send over my best results in a new pull request later this evening (GMT-4) if you're interested (I'm still only able to consistently get ~100 timesteps by 1000 episodes, but I haven't tried saving the "best" model to see how it generalizes). I tried to keep my changes minor relative to Adam's original work, such as only using the "duration" in the reward and not using any state variables such as angle or position.
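As an illustration of what duration-only shaping might look like, here is a hypothetical sketch; `shaped_reward` and its values are assumptions, not the actual changes in the follow-up PR:

```python
def shaped_reward(step_count, done, max_steps=500):
    """Duration-only shaping for CartPole: no angle or position terms.

    Keeps the default +1 survival reward, but penalizes episodes
    that terminate before the time limit (i.e. the pole fell over).
    """
    if done and step_count < max_steps:
        return -1.0   # early failure
    return 1.0        # still balancing, or reached the time limit
```

The point of shaping on duration alone is to stay faithful to the tutorial's pixels-only setup: the agent never sees privileged state variables, only a stronger learning signal at failure.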

@SiftingSands SiftingSands deleted the DQN_fix branch September 6, 2022 18:28
@vmoens (Contributor) commented Sep 6, 2022

> [quotes @SiftingSands's comment above]

Sure, let's keep iterating on this. There are some things I'm trying on my side too!
Do you know if it used to train and then stopped, or did we always have this performance?

@SiftingSands (Contributor, Author)

> [quotes @vmoens's comment above]

Unfortunately, I don't know if it trained successfully in the past. I looked through a few blogs to see if anyone reported it working, but all the successful examples involved significant modifications, such as using the state variables instead of pixels. I wonder if it's always been somewhat "broken", since there's an issue from 2018 (#209) about the missing target network. Just speculating here, though...
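For readers following the #209 reference: the fix there was to add a target network, i.e. a lagged copy of the policy network used to compute the bootstrapped Q-targets. A toy sketch of the mechanism (`TinyNet` is a stand-in with plain lists as "weights", not the tutorial's actual convolutional network):

```python
import copy

class TinyNet:
    """Stand-in for a Q-network: its 'parameters' are just a list of numbers."""
    def __init__(self, weights):
        self.weights = list(weights)

policy_net = TinyNet([0, 0])
target_net = copy.deepcopy(policy_net)   # frozen copy used to compute TD targets

TARGET_UPDATE = 10   # sync period, like the tutorial's TARGET_UPDATE hyperparameter
for step in range(1, 26):
    # stand-in for one gradient update of the policy network
    policy_net.weights = [w + 1 for w in policy_net.weights]
    # the target network lags behind, stabilizing the bootstrapped Q-targets
    if step % TARGET_UPDATE == 0:
        target_net.weights = list(policy_net.weights)
```

In the tutorial this sync is `target_net.load_state_dict(policy_net.state_dict())` every `TARGET_UPDATE` episodes; freezing the target between syncs keeps the TD targets from chasing a moving network.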

malfet added a commit referencing this pull request on Oct 5, 2022:

Add cell for installing `gym` on Colab

By adding pygame to the requirements and bumping gym to 0.25.0 (to account for changes introduced by #2026).
Successfully merging this pull request may close this issue: Reinforcement Tutorial (DQN)

4 participants