I have a reinforcement-learning neural network training on the OpenAI Gym CartPole-v1 environment. For the structure and training algorithm, assume it is the same as the one in this article.
On average, it becomes more effective over time and eventually solves the environment perfectly (500/500 reward for several hundred games in a row). Then it starts to regress, typically settling somewhere in the range of 100 to 150/500 total reward. This seems to happen regardless of learning rate, but I haven’t tested very low learning rates (below 0.01) because of how long training takes.
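To make "solved for several hundred games in a row" and the later drop measurable, the per-episode rewards can be summarized as a rolling mean plus the longest streak of maximum-reward episodes (500 is the per-episode cap in CartPole-v1). This is a minimal sketch, assuming the rewards are collected in a plain Python list. The `summarize_rewards` helper, the 100-episode window, and the synthetic reward data below are hypothetical illustrations, not part of the article's code or my actual results.

```python
from collections import deque


def summarize_rewards(episode_rewards, window=100, max_reward=500):
    """Return (rolling_means, longest_streak) for a list of episode rewards.

    Hypothetical helper: `window` and `max_reward` are illustrative defaults.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    rolling_means = []
    longest_streak = current_streak = 0
    for reward in episode_rewards:
        recent.append(reward)
        rolling_means.append(sum(recent) / len(recent))
        # Count consecutive episodes that hit the reward cap.
        if reward >= max_reward:
            current_streak += 1
            longest_streak = max(longest_streak, current_streak)
        else:
            current_streak = 0
    return rolling_means, longest_streak


# Synthetic example shaped like the behaviour described above:
# a learning phase, a long perfect streak, then a regression.
rewards = [20] * 50 + [500] * 300 + [120] * 200
means, streak = summarize_rewards(rewards)
print(max(means), means[-1], streak)
```

Comparing the peak rolling mean with the most recent one gives a single number for how far performance has fallen after being solved.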
Can anyone tell me why this happens? I can’t seem to find any literature on it, but perhaps I just don’t know what the phenomenon is called.