Maybe it is my communications background, but it took me a long time to really “understand” why we need MCMC. Consider a really simple inference problem of trying to recover a state $x$ from data $y$. In Bayesian inference, we try to compute the posterior distribution

$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)}.$$
All textbooks will just explain that the denominator is hard to compute and so we need MCMC. But wait a minute, the denominator is just $p(y)$. It is independent of the thing we need to estimate, namely $x$. Say we are satisfied with doing MAP; then solving

$$\hat{x} = \arg\max_x p(x|y)$$
is the same as solving

$$\hat{x} = \arg\max_x p(y|x)\,p(x).$$
So why do we bother with the denominator at all?
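To make this concrete, here is a minimal numeric sketch (a made-up five-state example of my own; the prior and likelihood values are arbitrary). Dividing by the constant $p(y)$ rescales every entry by the same factor, so it cannot move the argmax:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete example: 5 possible states, arbitrary prior/likelihood.
prior = rng.dirichlet(np.ones(5))       # p(x), one value per state
likelihood = rng.dirichlet(np.ones(5))  # p(y|x) for the observed y, one value per state

unnormalized = likelihood * prior               # p(y|x) p(x)
posterior = unnormalized / unnormalized.sum()   # divide by p(y) = sum_x p(y|x) p(x)

# The constant denominator p(y) cannot change where the maximum sits.
assert np.argmax(unnormalized) == np.argmax(posterior)
```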
Update: Yes, indeed. I think we don’t need to bother with the denominator, at least for MAP.
In most communications (decoding) problems, the state $x$ is usually discrete or of quite low dimension, so we can realistically consider all (or most) combinations of $x$. After all, solving $\arg\max_x p(y|x)\,p(x)$ means that we need to consider all possible $x$, as in the brute-force sketch below.
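Here is a toy decoding sketch (a hypothetical setup of my own, with made-up noise and prior assumptions): with an 8-bit state there are only $2^8 = 256$ candidates, so exhaustive MAP is trivial.

```python
import itertools
import numpy as np

# Hypothetical setup: x is a length-8 bit vector sent through a noisy
# channel, y is the received real-valued vector.
n = 8
rng = np.random.default_rng(1)
x_true = rng.integers(0, 2, n)
y = x_true + 0.5 * rng.standard_normal(n)   # observed data

def log_unnormalized_posterior(x, y):
    # log p(y|x) + log p(x): Gaussian noise likelihood with sigma = 0.5,
    # uniform prior over bit vectors (its constant is dropped).
    return -np.sum((y - x) ** 2) / (2 * 0.5 ** 2)

# Exhaustive MAP: only 2^8 = 256 candidates, so we can simply try them all.
x_map = max(itertools.product([0, 1], repeat=n),
            key=lambda x: log_unnormalized_posterior(np.array(x), y))
print(x_map, tuple(x_true))  # with mild noise, the MAP estimate matches x_true
```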
For very high-dimensional $x$, solving the MAP problem above can be as complex as, or even more complex than, computing $p(x|y)$ directly. For the latter problem, something not obvious is rarely stated explicitly in introductory textbooks: the difficulty of computing $p(x|y)$ is not just evaluating the probability itself, but deciding which values of $x$ are worth evaluating it at. And that is why we need MCMC. Basically, we perturb $x$ in its space so that we cover most of the high-probability region (i.e., a random walk in the space of $x$ guided by $p(x|y)$). So a key issue in MCMC is to design a transition procedure/Markov chain (state transition model) whose stationary distribution converges to $p(x|y)$.
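As a sketch of this idea, here is a minimal random-walk Metropolis implementation (my own illustration of one standard MCMC construction, not a reference implementation; the 2D Gaussian target is a stand-in for a real posterior). Note that the acceptance ratio only ever uses $p(y|x)\,p(x)$, so the intractable denominator $p(y)$ cancels, and the chain naturally spends its time at the values of $x$ that are worth computing.

```python
import numpy as np

def metropolis(log_unnorm_post, x0, n_steps=10_000, step=0.5, seed=2):
    """Random-walk Metropolis: samples from p(x|y) given only log p(y|x)p(x).

    The accept/reject rule uses the *ratio* of unnormalized posteriors,
    so the normalizing constant p(y) cancels out entirely.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(x.shape)  # symmetric proposal
        log_alpha = log_unnorm_post(proposal) - log_unnorm_post(x)
        if np.log(rng.uniform()) < log_alpha:               # accept w.p. min(1, alpha)
            x = proposal
        samples.append(x.copy())
    return np.array(samples)

# Hypothetical target: unnormalized log-posterior of a standard 2D Gaussian.
log_post = lambda x: -0.5 * np.sum(x ** 2)
chain = metropolis(log_post, x0=np.zeros(2))
# The chain's empirical distribution converges to the target: its visits
# concentrate exactly where the posterior mass is.
```

By design, the Markov chain's stationary distribution is the posterior itself, which is the "random walk in the space of $x$ according to $p(x|y)$" described above.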