Hindsight experience replay

A motivating example for hindsight experience replay is the bit-flipping problem. Start with a binary number as the initial state; an action flips any one bit, and the goal is to reach a particular target state. It is, of course, a toy problem that is easy to solve directly. But suppose we want to train a reinforcement learning agent to reach the goal. The difficulty is that a naive reward is given only when the agent reaches the target state. When the initial policy is random, it is very unlikely to reach the target (with n bits there are 2^n possible states), so a reward is almost never assigned and the agent has essentially nothing to learn from.
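To make the sparse-reward difficulty concrete, here is a minimal sketch of the bit-flipping problem in plain Python. The class name BitFlipEnv, the function random_success_rate, the 1/0 reward convention, and the episode length of n_bits steps are all assumptions made for illustration, not part of any particular library.

```python
import random


class BitFlipEnv:
    """Toy bit-flipping environment with a sparse reward (illustrative only)."""

    def __init__(self, n_bits, rng):
        self.n_bits = n_bits
        self.rng = rng
        self.state = None
        self.goal = None

    def reset(self):
        # Draw a random start state and a random target state.
        self.state = tuple(self.rng.randint(0, 1) for _ in range(self.n_bits))
        self.goal = tuple(self.rng.randint(0, 1) for _ in range(self.n_bits))
        return self.state, self.goal

    def step(self, action):
        # An action is the index of the bit to flip.
        bits = list(self.state)
        bits[action] ^= 1
        self.state = tuple(bits)
        # Assumed reward convention: 1 only when the target is reached, else 0.
        reward = 1.0 if self.state == self.goal else 0.0
        return self.state, reward, reward == 1.0


def random_success_rate(n_bits, episodes, seed=0):
    """Fraction of episodes in which a uniformly random policy hits the goal."""
    rng = random.Random(seed)
    env = BitFlipEnv(n_bits, rng)
    successes = 0
    for _ in range(episodes):
        env.reset()
        for _ in range(n_bits):  # assumed budget: n_bits steps per episode
            _, _, done = env.step(rng.randrange(n_bits))
            if done:
                successes += 1
                break
    return successes / episodes
```

Calling random_success_rate(2, 200) and random_success_rate(20, 200) should show the gap: with 2 bits a random policy succeeds fairly often, while with 20 bits it should almost never see a nonzero reward.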

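The simple idea of hindsight experience replay is to modify the goal in hindsight: after an episode ends, we pretend that a state the agent actually reached (for example, the final one) was the goal all along. Under this relabeled goal the episode counts as a success, so it always carries a reward signal and training can proceed accordingly.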
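The relabeling step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the (state, action, next_state) tuple layout, the 1/0 reward, and the choice of the final state as the substitute goal are mine; other ways of picking the hindsight goal are possible.

```python
def sparse_reward(state, goal):
    # Assumed reward convention: 1 when the goal is reached, else 0.
    return 1.0 if state == goal else 0.0


def relabel_with_final_state(episode):
    """Relabel an episode so that its final state becomes the goal.

    episode: list of (state, action, next_state) tuples.
    Returns a list of (state, action, next_state, goal, reward) tuples
    that can be stored in a replay buffer alongside the original ones.
    """
    hindsight_goal = episode[-1][2]  # the state the agent actually ended in
    return [
        (s, a, s_next, hindsight_goal, sparse_reward(s_next, hindsight_goal))
        for s, a, s_next in episode
    ]


# A two-step episode on 3 bits: flip bit 0, then flip bit 2.
episode = [
    ((0, 0, 0), 0, (1, 0, 0)),
    ((1, 0, 0), 2, (1, 0, 1)),
]
relabeled = relabel_with_final_state(episode)
rewards = [transition[4] for transition in relabeled]
```

Here the last relabeled transition gets reward 1 because it ends exactly in the substituted goal, even if the episode never reached the original target.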
