Question:
I was confused on the proof of Theorem 11.1.3 in yersterday’s class. And I am still confused after review the book and your notes many many times.
I think that one of key points of Theorem 11.1.3 is that, under any source distribution $latex Q(x)$, the upper bound and the lower bound is true for all possible types $latex P$. (Theorem 11.1.4 express $latex P$ and $latex Q$ seperately and use the conclusion of Theorem 11.1.3. Obviously, the bounds for the size of a type class $latex T(P)$ should be free from the souce distribution $latex Q(x)$.)
However, during the proof of Theorem 11.1.3, we always assume $latex P$ is the same as the source distribution.
(In our text book, the author always use the notation $latex P^n(T(P)$) and related theorems and corollary which are only applicable for $latex P=Q$ the source distribution)
Answer:
Let us first unify the notation here… let say $latex Q(.)$ is the source distrbution and $latex P(.)$ is the type distribution as we discussed in class.
You got it right. Apparently, $latex |T(P)|$ should not depend on $latex Q(x)$ (just as in 11.17).
Hmmm… I’m not sure I understand your exact question. But it seems to me that you feel uncomfortable why we only consider $latex P=Q$ in the proof. The reason is just because the type $latex Q$ is the most significant one and “occupies most of the probability”. For example, in 11.18, we can also consider $latex 1 \ge Q^n(T(P))$ instead (i.e., the probability of getting a type $latex P$ sequence out of source $latex Q(.)$ is less than 1). But then $latex Q^n(T(P))=\sum_{x\in{T(P)}} Q^n(x) = \sum_{x\in{T(P)}} 2^{-n(D(P||Q) + H(P))}$ (from 11.14)
and hence we have
$latex 1 \ge |T(P)| 2^{-n(H(P) +D(P||Q))}$ and thus $latex |T(P)| \le 2^{n(H(P)+D(P||Q))}$.
This bound is worse than that in the notes and the book as $latex D(P||Q)$ is bigger than zero in general. By considering $latex P=Q$, we get a better bound as in 11.22.
Whereas in 11.35, it is necessary to consider $latex P=Q$, because only then we get the maximum $latex Q^n(T(P))$ and that allows us to move on to 11.35 and 11.36
Using my current notation,
$latex 1=\sum_{P \in \mathcal{P}_n}Q^n(T(P))\le \sum_{P \in \mathcal{P}_n} max_P Q^n(T(P))$
$latex =\sum_{P \in \mathcal{P}_n} Q^n(T(Q)) \le (n+1)^{|\mathcal{X}|} Q^n(T(Q))$
$latex = (n+1)^{|\mathcal{X}|}|T(Q)|2^{-nH(Q)}$.
I hope this answers your question.