LeCun has a new course on deep learning this spring. I found two things he mentioned worth jotting down. First, natural data lives on a low-dimensional manifold. I probably should have come across that idea before, but it didn't register earlier. Come to think of it, this is a very important fact.
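To make the idea concrete, here is a minimal numpy sketch (my own toy example, not something from the course): points that look 100-dimensional but are actually generated from a 2-dimensional latent circle, so almost all of the variance sits in just a couple of directions.

```python
# Toy illustration of the manifold idea: high-dimensional samples generated
# from a low-dimensional latent structure. Dimensions and names are arbitrary
# choices for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_points, ambient_dim = 1000, 100
theta = rng.uniform(0, 2 * np.pi, size=n_points)
latent = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # 2-D "true" coordinates

embedding = rng.standard_normal((2, ambient_dim))           # lift into 100-D
data = latent @ embedding + 0.01 * rng.standard_normal((n_points, ambient_dim))

# Singular values of the centered data: nearly all variance lives in ~2
# directions, even though each sample is a 100-dimensional vector.
s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
print((s[:5] ** 2 / np.sum(s ** 2)).round(3))
```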
Second, it is often mentioned that a shallow model can also represent any function, but a deep model can be much more efficient at representing complex functions. One analogy he gave is very nice. Consider writing a program with only two steps: if each step can contain any instruction, the overall program can still perform any task, but the “instruction set” has to be very complex. It is essentially a CISC vs RISC trade-off. An “extremely” shallow model is like building a CISC CPU that can compute anything in two steps; one can imagine how complex the chip has to be.
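A small counting sketch of the same trade-off, using n-bit parity as the example (my choice of example, not one from the lecture): a deep circuit composes two-input XORs, so the gate count grows linearly with n, while a “two-step” AND/OR circuit must enumerate every odd-parity input pattern, so its width grows exponentially.

```python
# Deep vs shallow cost of computing n-bit parity.
from math import comb

def deep_gate_count(n: int) -> int:
    # Chain x1 XOR x2 XOR ... XOR xn: n - 1 two-input gates.
    return n - 1

def shallow_term_count(n: int) -> int:
    # Depth-two AND/OR form needs one AND term per odd-parity assignment
    # of n bits, i.e. 2^(n-1) terms.
    return sum(comb(n, k) for k in range(1, n + 1, 2))

for n in (4, 8, 16, 32):
    print(n, deep_gate_count(n), shallow_term_count(n))
```

The “complex instruction set” shows up as the exponentially growing number of terms the shallow circuit has to pack into its two steps.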