Closed
Description
Opening because #683 is closed and the issue persists. When running the vanilla code from the Q learning example, even for 10k episodes, the agent does not learn. The reward is constant at 1.0, and losses increase over time.
$ python3 -c 'import torch;print(torch.__version__)'
1.10.0a0
Metadata
Metadata
Assignees
Labels
No labels