cartpole example not working

I am unable to get the cartpole example to work. It fails to learn even after 2000 iterations. Please check what may be wrong. My test notebook, based entirely on your notebook, is here: https://github.com/poedator/otus_data_science/blob/master/project/reinforcement_q_learning_torch_example_tested.ipynb