Machine learning systems fail when the training data distribution differs significantly from the test data distribution. For a “simple” problem like image classification, we can avoid this by including sufficient diversity in the training data. But for more complicated real-world problems, such as robotic AI, there are simply too many possibilities to include them all in the training data. To improve robustness, we may instead perturb the domain (the distribution of the training data) randomly. This is known as domain randomization. It is similar to data augmentation, except that it is applied globally, to the environment that generates the data, rather than to individual samples. For example, we may alter the environment slightly by changing the gravitational force of the simulated physical system.
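As a concrete illustration, here is a minimal sketch of the idea in Python. The parameter names, ranges, and the `make_env`/`run_episode` callables are illustrative assumptions, not any particular simulator’s API:

```python
import random

# Illustrative parameter ranges; before each episode we re-sample them,
# so the learner never trains against a single fixed environment.
PARAM_RANGES = {
    "gravity":  (8.0, 12.0),   # m/s^2, perturbed around Earth's 9.81
    "friction": (0.5, 1.5),    # dimensionless friction coefficient
    "mass":     (0.2, 2.0),    # kg, mass of the manipulated object
}

def sample_domain():
    """Draw one random domain: a dict of simulator parameters."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def train(agent, make_env, run_episode, num_episodes=1000):
    """Train `agent` across randomly perturbed domains.

    `make_env` builds a simulator from sampled parameters and
    `run_episode` rolls the agent out and applies its update;
    both are supplied by the caller (hypothetical here).
    """
    for _ in range(num_episodes):
        env = make_env(**sample_domain())  # a freshly perturbed world
        run_episode(agent, env)
```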
Domain randomization often requires human involvement, for example in choosing the randomization ranges by hand. One work by OpenAI, used for solving a Rubik’s Cube with a robot hand, is known as automatic domain randomization: it builds a “curriculum” for the learning system automatically, extending the distribution of the domain only when the learner’s performance surpasses a predefined threshold.
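The mechanism can be sketched as follows. The sketch occasionally probes one boundary of a parameter’s range and widens it once the learner succeeds often enough there; the threshold, step size, and probing probability are illustrative assumptions, not the values used by OpenAI (which also averages success over a buffer of boundary episodes):

```python
import random

EXPAND_THRESHOLD = 0.8  # success rate needed before a boundary widens
STEP = 0.05             # how far a boundary moves when it widens

class ADRParam:
    """One randomized parameter whose range grows with learner competence."""

    def __init__(self, low, high, hard_low, hard_high):
        self.low, self.high = low, high                   # current range
        self.hard_low, self.hard_high = hard_low, hard_high  # hard limits

    def sample(self):
        """Usually sample uniformly; occasionally pin a boundary to probe it."""
        if random.random() < 0.1:  # boundary-evaluation episode
            side = random.choice(("low", "high"))
            return getattr(self, side), side
        return random.uniform(self.low, self.high), None

    def update(self, side, success_rate):
        """Widen the probed boundary once the learner masters it."""
        if side is None or success_rate < EXPAND_THRESHOLD:
            return
        if side == "low":
            self.low = max(self.hard_low, self.low - STEP)
        else:
            self.high = min(self.hard_high, self.high + STEP)
```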
Another possibility is to drive domain randomization with an adversarial network: a second network is trained simultaneously to adjust the domain parameters so as to reduce the learner’s performance. The problem with this approach, however, is that it may produce unrealistic domains that no learner can solve.
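To make the adversary’s objective concrete, the sketch below replaces the adversarial network with a simple gradient-free hill climber that keeps any perturbation of the domain parameters that lowers the learner’s measured return; `evaluate` is an assumed caller-supplied rollout function. Note that nothing constrains the search, which is precisely how it can drift into unsolvable domains:

```python
import random

def adversarial_step(params, evaluate, scale=0.1):
    """Propose a perturbed domain; keep it if it hurts the learner more.

    `params` is a dict of domain parameters and `evaluate(params)`
    returns the learner's reward on that domain (both assumptions).
    """
    proposal = {k: v + random.gauss(0.0, scale) for k, v in params.items()}
    return proposal if evaluate(proposal) < evaluate(params) else params
```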
Yet another recent approach modifies the reward of the adversarial network. Rather than minimizing the learner’s score, the network tries to maximize the learner’s regret, defined as the difference between the reward of the “antagonist” learner and that of the “protagonist” learner, both of which are trained at the same time. Since an unsolvable environment yields low reward for both learners, and hence low regret, the adversarial network has an incentive to create environments that are solvable but of reasonable difficulty.
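Reusing the hill-climbing stand-in from the previous sketch, the only change needed is the objective: the designer now maximizes the regret between the two assumed learners rather than minimizing a single learner’s return:

```python
import random

def regret(params, protagonist, antagonist, evaluate):
    """Regret = antagonist's return minus protagonist's return
    on the same domain; `evaluate(agent, params)` is assumed."""
    return evaluate(antagonist, params) - evaluate(protagonist, params)

def designer_step(params, protagonist, antagonist, evaluate, scale=0.1):
    """Hill-climb the domain parameters to *maximize* regret."""
    proposal = {k: v + random.gauss(0.0, scale) for k, v in params.items()}
    old = regret(params, protagonist, antagonist, evaluate)
    new = regret(proposal, protagonist, antagonist, evaluate)
    return proposal if new > old else params
```

Because an impossible domain gives both learners near-zero reward, its regret is near zero as well, so hill-climbing on regret avoids the degenerate domains that plain reward minimization produces.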