Commit 684421c

Resolve conflict
1 parent 3b2fd56 commit 684421c

File tree

1 file changed: +18 -1 lines


intermediate_source/reinforcement_q_learning.py

Lines changed: 18 additions & 1 deletion
@@ -3,41 +3,58 @@
 Reinforcement Learning (DQN) Tutorial
 =====================================
 **Author**: `Adam Paszke <https://github.com/apaszke>`_
+
+
 This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent
 on the CartPole-v1 task from the `OpenAI Gym <https://www.gymlibrary.dev/>`__.
+
 **Task**
+
 The agent has to decide between two actions - moving the cart left or
 right - so that the pole attached to it stays upright. You can find an
 official leaderboard with various algorithms and visualizations at the
 `Gym website <https://www.gymlibrary.dev/environments/classic_control/cart_pole>`__.
+
 .. figure:: /_static/img/cartpole.gif
    :alt: cartpole
+
    cartpole
+
 As the agent observes the current state of the environment and chooses
 an action, the environment *transitions* to a new state, and also
 returns a reward that indicates the consequences of the action. In this
 task, rewards are +1 for every incremental timestep and the environment
 terminates if the pole falls over too far or the cart moves more than 2.4
 units away from center. This means better performing scenarios will run
 for a longer duration, accumulating a larger return.
+
 The CartPole task is designed so that the inputs to the agent are 4 real
 values representing the environment state (position, velocity, etc.).
 We take these 4 inputs without any scaling and pass them through a
 small fully-connected network with 2 outputs, one for each action.
 The network is trained to predict the expected value for each action,
 given the input state. The action with the highest expected value is
 then chosen.
+
+
 **Packages**
+
+
 First, let's import the needed packages. We need
 `gym <https://github.com/openai/gym>`__ for the environment,
 installed by using `pip`. If you are running this in Google Colab, run:
+
 .. code-block:: bash
+
    %%bash
    pip3 install gym[classic_control]
+
 We'll also use the following from PyTorch:
+
 - neural networks (``torch.nn``)
 - optimization (``torch.optim``)
 - automatic differentiation (``torch.autograd``)
+
 """

 import gym
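The docstring in this hunk describes a small fully-connected network that maps the 4 CartPole state values to 2 outputs, one expected value per action, and then picks the action with the highest value. A minimal sketch of such a network in PyTorch (the hidden-layer size, single hidden layer, and class name are assumptions for illustration; the tutorial's actual architecture may differ):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Hypothetical sketch: 4 state inputs -> hidden layer -> 2 Q-values."""

    def __init__(self, n_observations=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_observations, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

policy_net = DQN()
state = torch.zeros(1, 4)        # dummy CartPole observation (batch of 1)
q_values = policy_net(state)     # shape (1, 2): one value per action
action = q_values.argmax(dim=1)  # choose the action with highest expected value
```

Because the inputs are used without scaling, the first linear layer sees the raw state values directly, as the docstring notes.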
@@ -445,4 +462,4 @@ def optimize_model():
 # Optimization picks a random batch from the replay memory to do training of the
 # new policy. The "older" target_net is also used in optimization to compute the
 # expected Q values. A soft update of its weights is performed at every step.
-#
+#
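The comment in this hunk mentions a soft update of the target network's weights. A minimal pure-Python sketch of that update rule, where each target weight moves a small fraction ``tau`` toward the corresponding policy weight (the helper name and the value of ``tau`` are assumptions; real code would iterate over the networks' parameter tensors):

```python
TAU = 0.005  # assumed blending factor, not taken from this diff

def soft_update(policy_weights, target_weights, tau=TAU):
    # target <- tau * policy + (1 - tau) * target, element by element,
    # so target_net slowly tracks policy_net instead of copying it outright.
    return [tau * p + (1.0 - tau) * t
            for p, t in zip(policy_weights, target_weights)]

policy = [1.0, 2.0]  # stand-ins for flattened network parameters
target = [0.0, 0.0]
target = soft_update(policy, target)  # nudged slightly toward policy
```

Because ``tau`` is small, the target network changes slowly, which keeps the "expected Q values" it produces stable between optimization steps.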

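The same comment says optimization "picks a random batch from the replay memory". A sketch of one common way to implement such a memory, as a bounded buffer of past transitions sampled uniformly at random (class name, transition layout, and capacity are assumptions for illustration):

```python
import random
from collections import deque

class ReplayMemory:
    """Hypothetical sketch: bounded buffer of (state, action, next_state, reward)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest entries drop off when full

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # uniform random batch, which decorrelates consecutive transitions
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

memory = ReplayMemory(capacity=1000)
for t in range(5):
    memory.push((t, 0, t + 1, 1.0))  # (state, action, next_state, reward)
batch = memory.sample(3)             # random batch of 3 transitions
```

The ``deque`` with ``maxlen`` gives the buffer fixed capacity for free: once full, pushing a new transition silently evicts the oldest one.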