Let’s summarize a few techniques for visualizing what a network has learned. Here all networks are trained to do classification.
- The simplest one is to randomly black out part of the image and evaluate the classification result. For example, the dog score should drop when we black out the dog’s face. This approach is readily applicable to any classifier regardless of its structure.
- Another approach is through back-prop. We simply start with the neuron of interest and propagate back. For a max-pooling layer, only the “activated” connection is propagated back. For matrix multiplication, the backward pass is just multiplication by the matrix transpose. For a ReLU unit, we have several options:
    - Handle it like regular backprop, i.e., only pass the gradient back where the input to the ReLU was positive.
    - “Deconv” approach: pass the gradient back where the gradient itself is positive (essentially apply a ReLU to the gradient, just as in the forward pass).
    - “Guided” backprop: pass the gradient back only where both the gradient is positive and the input to the ReLU was positive.
- A third approach is through optimization. Start with a randomly initialized image and try to maximize the response of a target unit through backprop on the image pixels. It is similar to the second approach, but we can impose regularization in the cost function. For example, we can encourage the image to be smoother by applying blurring after each update.
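A minimal sketch of the first (occlusion) technique, assuming a hypothetical `classify` function that maps an image to the target class score; with a real network this would be a forward pass:

```python
import numpy as np

def occlusion_map(image, classify, patch=8, stride=8):
    """Slide a black patch over the image and record the class score.

    `classify` is a stand-in for the network's score of the target
    class; positions where the score drops sharply mark the regions
    the classifier relies on.
    """
    H, W = image.shape[:2]
    heat = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.0
            heat[i, j] = classify(occluded)
    return heat
```

Low values in the returned heat map indicate patch positions that hurt the score the most, i.e., the most important image regions.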
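The three ReLU backward rules above differ only in which mask is applied to the incoming gradient. A side-by-side sketch (NumPy, with `x` the ReLU input saved from the forward pass):

```python
import numpy as np

def relu_backward(grad_out, x, mode="backprop"):
    """Three ways to send a gradient back through ReLU(x).

    grad_out: gradient arriving from the layer above.
    x: the ReLU's input, recorded during the forward pass.
    """
    if mode == "backprop":   # plain chain rule: mask by forward input
        return grad_out * (x > 0)
    if mode == "deconv":     # mask by the gradient's own sign
        return grad_out * (grad_out > 0)
    if mode == "guided":     # require both conditions at once
        return grad_out * (x > 0) * (grad_out > 0)
    raise ValueError(mode)
```

Note that guided backprop is simply the elementwise AND of the other two masks, which is why it tends to produce the cleanest visualizations.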
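The optimization approach can be sketched with a toy stand-in for the network: here the “unit” is a fixed linear response `(w * img).sum()`, whose gradient with respect to the image is just `w`, and a simple box blur plays the role of the smoothness regularizer. In real use, the gradient would come from backprop through the trained network and the blur would typically be Gaussian; both simplifications here are assumptions for the sake of a self-contained example.

```python
import numpy as np

def box_blur(img, k=3):
    """Cheap smoothing: average over a k x k neighborhood (zero-padded)."""
    pad = k // 2
    padded = np.pad(img, pad)
    out = np.zeros_like(img)
    for di in range(-pad, pad + 1):
        for dj in range(-pad, pad + 1):
            out += padded[pad+di:pad+di+img.shape[0],
                          pad+dj:pad+dj+img.shape[1]]
    return out / (k * k)

def maximize_unit(w, steps=50, lr=0.1):
    """Gradient-ascend a (small) random image to maximize the toy
    unit's response (w * img).sum(), blurring after each update."""
    img = 0.01 * np.random.default_rng(0).standard_normal(w.shape)
    for _ in range(steps):
        img += lr * w      # gradient of (w * img).sum() w.r.t. img is w
        img = box_blur(img)
    return img
```

Without the blur step the result is typically high-frequency noise that happens to excite the unit; the regularization is what makes the optimized image interpretable.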