Self-supervised learning

A very good lecture by Ishan Misra summarized many self-supervised learning methods. The idea of self-supervised learning is very simple: we design pretext tasks for which labels can be obtained for free, and instead of training the model directly on the real (downstream) task, we pretrain it on these pretext tasks.

Let’s summarize some of the basic ones below:

  • For images
    • Take the 8 surrounding patches of a center patch and predict which of the 8 locations each patch actually belongs to. This is one of the earliest proposed pretext tasks.
    • Predict the rotation applied to an image (no rotation, 90 degrees, 180 degrees, 270 degrees); see the sketch after this list
    • Fill in the blank (inpainting)
  • For videos
    • Predict whether a frame actually lies between two adjacent frames or has been replaced by an unrelated one
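
As a rough illustration of the rotation task above, here is a minimal sketch in PyTorch. The backbone and shapes are hypothetical (any feature extractor would do); the point is that the rotation index serves as a free label.

```python
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation index is the free label."""
    rotations, labels = [], []
    for k in range(4):
        rotations.append(torch.rot90(images, k, dims=[2, 3]))  # rotate in the H-W plane
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotations), torch.cat(labels)

# Hypothetical setup: a tiny conv net stands in for the real backbone.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(64, 4)  # predicts which of the 4 rotations was applied
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)   # stand-in for an unlabeled batch
x, y = make_rotation_batch(images)
loss = criterion(rotation_head(backbone(x)), y)
loss.backward()  # pretrain the backbone; later reuse it for the downstream task
```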

Many more sophisticated approaches are based on contrastive learning: roughly, increase the distance between unrelated patches (negative samples) and decrease the distance between related patches (positive samples). This includes PIRL and the more heavily trained SimCLR. One thing that seemed important in the lecture is the memory bank trick: instead of comparing the features of two samples directly, one compares each feature against a running average stored in a memory bank. I am not completely sure how it is implemented exactly, but it seems similar in spirit to the trick in double DQN for reducing bias.
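My rough understanding of the memory bank idea is sketched below (a hypothetical implementation, not PIRL's actual code): each training sample keeps one slot in the bank, updated as a running (exponential moving) average of its past features, and the contrastive loss compares the current feature against those stored entries rather than against fresh features from a second forward pass.

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    """Hypothetical memory bank: one running-average feature per training sample."""
    def __init__(self, num_samples, dim, momentum=0.5):
        self.bank = F.normalize(torch.randn(num_samples, dim), dim=1)
        self.momentum = momentum

    def update(self, indices, features):
        # Exponential moving average of each sample's feature over training.
        new = self.momentum * self.bank[indices] + (1 - self.momentum) * features
        self.bank[indices] = F.normalize(new, dim=1)

def contrastive_loss(features, indices, bank, temperature=0.07):
    """NCE-style loss: pull each feature toward its own bank slot, away from all others."""
    features = F.normalize(features, dim=1)
    logits = features @ bank.bank.t() / temperature  # similarity to every stored entry
    return F.cross_entropy(logits, indices)          # positive = the sample's own slot

# Toy usage with random "features" standing in for encoder outputs.
bank = MemoryBank(num_samples=1000, dim=128)
feats = torch.randn(16, 128)
idx = torch.randint(0, 1000, (16,))
loss = contrastive_loss(feats, idx, bank)
bank.update(idx, F.normalize(feats, dim=1).detach())
```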

===

What are some techniques to do self-supervised learning?

Self-supervised learning is a type of machine learning in which a model is trained on a dataset without explicit labels or annotations. Some common techniques for self-supervised learning include:

  • Predictive coding, in which a model is trained to predict the next frame in a sequence of images or videos
  • Contrastive learning, in which a model is trained to maximize the similarity of representations for different views of the same data, while minimizing the similarity of representations for different data
  • Generative adversarial networks (GANs), in which two models are trained simultaneously, with one model generating data and the other model trying to distinguish between real and fake data
  • Autoencoders, in which a model is trained to encode input data into a lower-dimensional representation and then decode it back to its original form (see the sketch after this list)
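
As a small illustration of the last item, here is a minimal autoencoder sketch (the architecture and input size are assumptions for illustration, not a recommended design):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal autoencoder: compress inputs to a low-dimensional code, then reconstruct."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(16, 784)            # stand-in for flattened images
loss = nn.MSELoss()(model(x), x)    # reconstruction error is the "free" supervision
loss.backward()
```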

These techniques can be used to train models to learn useful features and representations from data without the need for explicit labels or annotations.

 
