Using simulated self-play to solve all OpenAI Gym classic control problems with PyTorch
Marton Trencseni - Thu 14 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym
I use simulated self-play by ranking episodes by summed reward. The episodes are split in two at the median: winners are assigned a reward of +1 and losers a reward of -1, as in games like Go and Chess. Unlike the naive policy gradient descent used in previous posts, this version solves all OpenAI Gym classic control problems, albeit slowly.
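The median split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the post: the function name `self_play_rewards`, the sample returns, and the choice to count episodes exactly at the median as losses are all assumptions made for the example.

```python
from statistics import median


def self_play_rewards(episode_returns):
    """Map each episode's summed reward to +1 (winner) or -1 (loser).

    Hypothetical helper: episodes strictly above the median are winners.
    Treating ties at the median as losses is an assumption of this sketch.
    """
    cutoff = median(episode_returns)
    return [1.0 if r > cutoff else -1.0 for r in episode_returns]


# Summed rewards of four simulated episodes (made-up numbers).
returns = [12.0, 45.0, 30.0, 8.0]
labels = self_play_rewards(returns)
print(labels)  # [-1.0, 1.0, 1.0, -1.0]
```

In a policy gradient update, each label would presumably replace the raw return as the weight on the log-probabilities of the actions taken in that episode, so the policy is pushed toward above-median episodes and away from below-median ones regardless of the reward scale of the environment.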