Commit 3b2fd56

Resolve conflict
1 parent 397057d · commit 3b2fd56

1 file changed: +2 -21 lines

intermediate_source/reinforcement_q_learning.py

Lines changed: 2 additions & 21 deletions
@@ -3,58 +3,41 @@
 Reinforcement Learning (DQN) Tutorial
 =====================================
 **Author**: `Adam Paszke <https://github.com/apaszke>`_
-
-
 This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent
 on the CartPole-v1 task from the `OpenAI Gym <https://www.gymlibrary.dev/>`__.
-
 **Task**
-
 The agent has to decide between two actions - moving the cart left or
 right - so that the pole attached to it stays upright. You can find an
 official leaderboard with various algorithms and visualizations at the
 `Gym website <https://www.gymlibrary.dev/environments/classic_control/cart_pole>`__.
-
 .. figure:: /_static/img/cartpole.gif
    :alt: cartpole
-
    cartpole
-
 As the agent observes the current state of the environment and chooses
 an action, the environment *transitions* to a new state, and also
 returns a reward that indicates the consequences of the action. In this
 task, rewards are +1 for every incremental timestep and the environment
 terminates if the pole falls over too far or the cart moves more then 2.4
 units away from center. This means better performing scenarios will run
 for longer duration, accumulating larger return.
-
 The CartPole task is designed so that the inputs to the agent are 4 real
 values representing the environment state (position, velocity, etc.).
 We take these 4 inputs without any scaling and pass them through a
 small fully-connected network with 2 outputs, one for each action.
 The network is trained to predict the expected value for each action,
 given the input state. The action with the highest expected value is
 then chosen.
-
-
 **Packages**
-
-
 First, let's import needed packages. Firstly, we need
 `gym <https://github.com/openai/gym>`__ for the environment
 Install by using `pip`. If you are running this in Google colab, run:
-
 .. code-block:: bash
-
    %%bash
    pip3 install gym[classic_control]
-
 We'll also use the following from PyTorch:
-
 - neural networks (``torch.nn``)
 - optimization (``torch.optim``)
 - automatic differentiation (``torch.autograd``)
-
 """

 import gym
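
The docstring in the hunk above describes a small fully-connected network that takes the 4 CartPole state values and outputs one expected value per action, with the agent picking the action whose predicted value is highest. The sketch below only illustrates that idea; it is not the tutorial's actual DQN class, and the hidden width of 128 and the names TinyQNet/q_net are assumptions.

import torch
import torch.nn as nn

# Illustrative sketch (not the tutorial's exact code): a small fully-connected
# Q-network with 4 inputs (the CartPole state) and 2 outputs (one per action).
class TinyQNet(nn.Module):
    def __init__(self, n_observations=4, n_actions=2, hidden=128):  # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_observations, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)  # predicted value of each action for the given state

q_net = TinyQNet()
state = torch.rand(1, 4)                    # a dummy 4-value observation
action = q_net(state).argmax(dim=1).item()  # greedy choice: action with the highest value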
@@ -443,14 +426,12 @@ def optimize_model():
             break

 print('Complete')
-env.render()
-env.close()
 durations_t = torch.tensor(episode_durations, dtype=torch.float)
 plt.title('Result')
 plt.xlabel('Episode')
 plt.ylabel('Duration')
-plt.ioff()
 plt.plot(durations_t.numpy())
+plt.ioff()
 plt.show()

 ######################################################################
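
After this change, the final-plot block drops env.render() and env.close() and turns interactive mode off only after the data is plotted, right before the blocking plt.show() call. A minimal standalone sketch of that ordering, assuming interactive mode was switched on earlier with plt.ion() (as the tutorial's plotting helper does) and using dummy durations:

import matplotlib.pyplot as plt
import torch

plt.ion()                               # interactive mode assumed on during training
episode_durations = [12, 35, 60, 110]   # dummy data standing in for the recorded durations

print('Complete')
durations_t = torch.tensor(episode_durations, dtype=torch.float)
plt.title('Result')
plt.xlabel('Episode')
plt.ylabel('Duration')
plt.plot(durations_t.numpy())
plt.ioff()                              # switch interactive mode off so the final window blocks
plt.show()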
@@ -464,4 +445,4 @@ def optimize_model():
 # Optimization picks a random batch from the replay memory to do training of the
 # new policy. The "older" target_net is also used in optimization to compute the
 # expected Q values. A soft update of its weights are performed at every step.
-#
+#
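
The closing comment refers to a soft update of the target network's weights at every step. Below is a self-contained sketch of that kind of update; the stand-in networks, the helper name soft_update, and the value TAU = 0.005 are assumptions for illustration, not a quote of the tutorial's code.

import torch.nn as nn

TAU = 0.005                       # assumed soft-update rate
policy_net = nn.Linear(4, 2)      # stand-in for the tutorial's policy network
target_net = nn.Linear(4, 2)      # stand-in for the tutorial's target network
target_net.load_state_dict(policy_net.state_dict())

def soft_update(policy_net, target_net, tau=TAU):
    # Move each target parameter a small step toward the corresponding policy parameter:
    # target <- tau * policy + (1 - tau) * target
    target_state = target_net.state_dict()
    policy_state = policy_net.state_dict()
    for key in policy_state:
        target_state[key] = policy_state[key] * tau + target_state[key] * (1 - tau)
    target_net.load_state_dict(target_state)

soft_update(policy_net, target_net)   # performed once per optimization step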
