Came across of this video explaining robust PCA. It came with a book and it looks okay.
As PCA can be considered as decomposition of data matrix with the highest singular value. So what PCA is doing is simply low-rank matrix approximation of the data matrix.
So robust PCA is a simple idea that tries to decompose the data matrix $latex X$ into $latex L + S$, where $latex L$ is a low-rank matrix and $latex S$ is a sparse matrix.
The most natural formulation will be
$latex min_{L,S} rank(L) + \lambda \|S\|_0$. However, this is not computationally feasible, and so is relaxed to
$latex min_{L,S} \|L\|_* + \lambda \|S\|_1$, where the nuclear norm $latex \|L\|_*$ is just the sum of singular value of $latex L$.