I think I knew how to show this before, but suddenly I realize I can’t. Say we define $latex F(w) = \frac{1}{2 \pi} \int f(t) e^{-i wt} dt$ and the inverse transform $latex \hat{f}(t)= \int F(w) e^{i w t}dw = f(t)$. So we expect $latex f(t)=\frac{1}{2 \pi} \int \int f(\tau) e^{-i w\tau} d\tau \, e^{iwt}dw$ $latex…
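As a quick numerical sanity check of this transform pair (the grids and the Gaussian test function below are my own choices; this is just a sketch):

```python
import numpy as np

# Check the pair numerically on f(t) = exp(-t^2/2), with the post's
# convention that the 1/(2*pi) factor sits on the forward transform.
t = np.linspace(-15, 15, 2001)
w = np.linspace(-15, 15, 2001)
dt, dw = t[1] - t[0], w[1] - w[0]
f = np.exp(-t**2 / 2)

# Forward: F(w) = 1/(2 pi) * int f(t) e^{-iwt} dt, as a Riemann sum
F = (f * np.exp(-1j * np.outer(w, t))).sum(axis=1) * dt / (2 * np.pi)

# Inverse: f_hat(t) = int F(w) e^{+iwt} dw
f_hat = (F * np.exp(1j * np.outer(t, w))).sum(axis=1) * dw

print(np.abs(f_hat.real - f).max())  # tiny: the round trip recovers f(t)
```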
Bandlimited channel
By the Shannon-Nyquist theorem, a bandlimited signal can be completely reconstructed from $latex 2W$ samples per second. Thinking of it the other way, this means that each second we have $latex 2W$ independent components that we can vary. Each sample carries noise power $latex N_0/2$, so the total noise power per second is $latex 2 W (N_0/2) = N_0 W$ and as…
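To make the arithmetic concrete, here is a sketch with made-up numbers; the capacity line at the end is the standard Shannon-Hartley formula, which is where this argument usually heads, not something stated above:

```python
import numpy as np

# Illustrative numbers of my own: a 1 MHz channel, noise PSD N0/2, power P.
W = 1e6     # bandwidth in Hz
N0 = 1e-9   # noise spectral density in W/Hz
P = 1e-3    # signal power in W

samples_per_sec = 2 * W          # Nyquist: 2W independent samples per second
noise_power = 2 * W * (N0 / 2)   # = N0 * W, total noise power per second
snr = P / noise_power
C = W * np.log2(1 + snr)         # Shannon-Hartley capacity in bits/s
print(f"noise power = {noise_power:.1e} W, capacity = {C / 1e6:.2f} Mbit/s")
```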
Confusing things
I keep getting confused between the partition function and the evidence. When they appear together, I’m not confused; but when they appear in different contexts, I get confused, as if falling for an illusion trick. Given data $latex D$ and parameter $latex \theta$, the evidence is simply $latex p(D)=\int p(D|\theta)\, p(\theta)\, d\theta$. And $latex p(\theta|D)=\frac{p(D|\theta)\, p(\theta)}{p(D)}$, where the denominator…
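A toy computation (the Bernoulli model and the coin-flip data are my own example) showing that the evidence is exactly that normalizing denominator:

```python
import numpy as np

# Bernoulli likelihood with a uniform prior on theta: the evidence
# p(D) = int p(D|theta) p(theta) dtheta is the normalizer of the posterior.
data = np.array([1, 1, 0, 1, 0, 1, 1])       # 5 heads out of 7 flips
theta = np.linspace(1e-6, 1 - 1e-6, 10001)
dtheta = theta[1] - theta[0]

likelihood = theta**data.sum() * (1 - theta)**(len(data) - data.sum())
prior = np.ones_like(theta)                  # uniform prior

evidence = (likelihood * prior).sum() * dtheta   # p(D)
posterior = likelihood * prior / evidence        # Bayes' rule

print(evidence)                    # ~0.00595 = 1/168
print((posterior * dtheta).sum())  # ~1.0: the evidence normalizes the posterior
```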
Distributed representation
It sounds like a misnomer to me; I would probably just call it a “vector” representation. It doesn’t have the “distributed” meaning of scattering information into different places. For example, to recognize a cat with a “distributed” representation, we might distribute the features into questions like “does it have a tail?”, “does it have four legs?”, and “does…
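To spell out the two readings (a hypothetical sketch; the random numbers just stand in for learned values):

```python
import numpy as np

# Reading the term suggests to me: each coordinate is a named human feature.
cat_as_named_features = {"has_tail": 1, "has_four_legs": 1, "has_whiskers": 1}

# What it actually means in deep learning: a dense vector whose individual
# coordinates carry no standalone meaning; "cat-ness" is smeared across all
# of them.
rng = np.random.default_rng(0)
cat_as_vector = rng.normal(size=8)   # stand-in for a learned embedding
print(cat_as_vector)
```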
LeCun’s first lecture
LeCun has a new course on deep learning this spring. I found two things he mentioned worth jotting down. First, natural data lives on a low-dimensional manifold. I probably came across that before, but it didn’t register at the time. Come to think of it, this is a very important fact. Second, as it is…
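A toy construction (mine, not from the lecture) of what living on a low-dimensional manifold means: a 1-D curve embedded in 50 dimensions, whose variance collapses onto a couple of directions:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0, 1, size=1000)          # 1-D latent coordinate
basis = rng.normal(size=(2, 50))          # two fixed embedding directions

# Each 50-D data point is really a function of the single number s.
X = np.cos(2 * np.pi * s)[:, None] * basis[0] \
    + np.sin(2 * np.pi * s)[:, None] * basis[1]

# Nearly all variance sits in 2 linear dimensions (and the curve itself
# is 1-D inside that plane).
sing = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print((sing**2 / (sing**2).sum())[:4])    # first two ratios carry ~everything
```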
Framing and prospect theory
The Asian disease problem illustrated that framing can alter one’s decision depending on whether we emphasize gain or loss. Prospect theory is just a fancy name for conjecturing what happens when the utility function is indeed what economists believe.
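For concreteness, a sketch of the Kahneman-Tversky value function; the functional form and the parameters (alpha = 0.88, lambda = 2.25) are their published estimates, not something stated in this post:

```python
import numpy as np

# Value is measured on gains/losses relative to a reference point:
# concave for gains, convex and steeper for losses (loss aversion).
def value(x, alpha=0.88, lam=2.25):
    x = np.asarray(x, dtype=float)
    return np.piecewise(x, [x >= 0], [lambda g: g**alpha,
                                      lambda l: -lam * (-l)**alpha])

# A loss looms larger than an equal gain, which is why the gain vs. loss
# framing of the Asian disease problem can flip choices.
print(value(100.0), value(-100.0))   # ~57.5 vs ~-129.5
```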
One pager for proposal
A good slide from a workshop.
Free energy
When we model the probability of a variable $latex x$ by $latex p(x)=\frac{e^{-F(x)}}{Z}$, $latex F(x)$ is often referred to as the free energy. The name comes from historical reasons. The Gibbs-Boltzmann distribution for a configuration $latex x$ is proportional to $latex e^{-\beta E(x)}$. And the closest reason I found is from here: $latex F=-\frac{1}{\beta}\ln Z$ is the Helmholtz free energy. Unfortunately, I doubt the above explanation…
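A minimal numeric sketch (toy energies of my own) of how the term is used for energy-based models: marginalizing a latent variable out of an energy gives the free energy, and exponentiating its negative gives the unnormalized probability:

```python
import numpy as np

# E(x, h): rows index 2 values of x, columns index 3 values of a latent h.
E = np.array([[1.0, 3.0, 0.5],
              [2.0, 0.1, 4.0]])

F = -np.log(np.exp(-E).sum(axis=1))  # free energy F(x) = -log sum_h e^{-E(x,h)}
Z = np.exp(-F).sum()                 # partition function over x
p = np.exp(-F) / Z                   # p(x) = e^{-F(x)} / Z
print(F, p, p.sum())                 # p sums to 1
```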
Convex conjugate
I wasn’t aware that the convex conjugate is just the Legendre transformation. That is, $latex f^{*}(p)=\sup_{\tilde{x}}\{\langle p,\tilde{x}\rangle -f(\tilde{x})\}\geq \langle p,x\rangle -f(x)$. Note that it is also known as the Legendre-Fenchel transformation. Btw, we have the Fenchel inequality $latex \langle p,x\rangle \le f(x)+f^{*}(p)$ directly from the definition, since $latex f^{*}(p)=\sup _{\tilde {x}}\{\langle p,{\tilde…
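A quick numeric check on a toy function of my own, $latex f(x)=x^2/2$, which happens to be its own conjugate:

```python
import numpy as np

x = np.linspace(-10, 10, 20001)   # grid over which we take the sup
f = x**2 / 2

def conjugate(p):
    """f*(p) = sup_x { p*x - f(x) }, approximated over the grid."""
    return np.max(p * x - f)

for p in [-2.0, 0.5, 3.0]:
    print(p, conjugate(p), p**2 / 2)   # conjugate(p) matches p^2/2

# Fenchel's inequality <p, x0> <= f(x0) + f*(p) for an arbitrary x0:
p, x0 = 3.0, 1.0
print(p * x0 <= x0**2 / 2 + conjugate(p))   # True
```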
Self-supervised learning
A very good lecture by Ishan Misra summarizes many self-supervised learning methods. The idea of self-supervised learning is very simple: we design pretext tasks whose labels can be obtained for free. Then, instead of training a model on the real task (the downstream task), we pretrain the model with these pretext…
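As a sketch of the labels-for-free idea, here is the rotation pretext task (the choice of task is mine, one standard example from this literature): every unlabeled image yields four (input, label) pairs at zero labeling cost:

```python
import numpy as np

def rotation_pretext(images):
    """images: array of shape (n, h, w). Returns rotated copies and labels."""
    inputs, labels = [], []
    for img in images:
        for k in range(4):                 # 0, 90, 180, 270 degrees
            inputs.append(np.rot90(img, k))
            labels.append(k)               # the label comes for free
    return np.stack(inputs), np.array(labels)

unlabeled = np.random.rand(8, 32, 32)      # stand-in for unlabeled data
X, y = rotation_pretext(unlabeled)
print(X.shape, y[:8])                      # (32, 32, 32), [0 1 2 3 0 1 2 3]
# A model pretrained to predict y from X is then fine-tuned on the
# downstream task.
```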