Introduction to generalized linear models

As the name suggests, the generalized linear model (GLM) is a generalization of the linear regression model. In a classic linear model with input $latex X$, the output $latex Y$ is modeled by $latex Y|X \sim \mathcal{N}(B X, \sigma^2 I)$. GLM extends the model in two ways: first, instead of having the mean as simple…
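
As a quick illustration (a minimal sketch of my own, assuming the statsmodels package; the toy data and variable names are not from the post), a Poisson GLM swaps the Gaussian output for a Poisson response with a log link:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
beta = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta))   # Poisson response, log link

X_const = sm.add_constant(X)        # prepend an intercept column
result = sm.GLM(y, X_const, family=sm.families.Poisson()).fit()
print(result.params)                # estimated coefficients
```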

Dual positive semi-definite cone

The set of all positive semi-definite (PSD) matrices $latex \mathcal S^+$ forms a cone: for any two matrices $latex A$ and $latex B$ in $latex \mathcal S^+$ and any $latex \theta_1, \theta_2 \ge 0$, $latex \theta_1 A + \theta_2 B$ is still PSD and thus in $latex \mathcal S^+$. In the context of cones, $latex \mathcal S^+$ is self-dual….
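
For reference (my own restatement, not spelled out in the excerpt), self-duality here is with respect to the trace inner product $latex \langle A, B \rangle = \operatorname{tr}(AB)$: the dual cone is $latex (\mathcal S^+)^* = \{ B : \operatorname{tr}(AB) \ge 0 \;\; \forall A \in \mathcal S^+ \}$, and this set is exactly $latex \mathcal S^+$ again.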

Pareto optimality

I always forget what Pareto optimality means. I recently reviewed the slides of the convex optimization course by Stephen Boyd and found the following very intuitive example. The curve below shows that to maintain a certain production volume, a factory needs to allocate its resources (fuel and labor) in the region shown. Essentially, we can…
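
A small sketch of the idea (my own illustration, assuming NumPy; the allocations are made up): keep only the (fuel, labor) allocations that no other allocation beats in both coordinates.

```python
import numpy as np

def pareto_optimal(points):
    """Keep points not dominated by any other point
    (minimization in every coordinate)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for p in pts:
        # q dominates p if q <= p everywhere and q < p somewhere
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(p)
    return np.array(keep)

allocations = [(4, 1), (3, 2), (2, 3), (3, 3), (5, 1), (1, 5)]
print(pareto_optimal(allocations))   # (3, 3) and (5, 1) are dominated
```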

Would It Be a Scam?

The latest buzz in the Forex market is the “Bitcoin Evolution.” It has been out for a little while, but it has only taken off in recent weeks. Some traders have embraced it because it can supposedly give them an edge when it comes to trading. The basic notion is that if you set the…

Boosting

While bagging mitigates the variance problem, boosting can reduce bias through a combination of weak learners. Boosting can be viewed as gradient descent in function space. With $latex H$ as the current classifier, by Taylor expansion, we have $latex \ell(H + \alpha h) \approx \ell(H) + \alpha \langle \nabla \ell(H), h \rangle$. So out of an ensemble of weak classifiers, we want to find the best $latex h$ as follows: $latex h^* = \operatorname{argmin}_{h \in \mathcal H} \langle \nabla \ell(H), h \rangle$. Denote …
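
To make the functional-gradient view concrete, here is a minimal sketch (my own toy version under squared loss, assuming scikit-learn; none of these names are from the post). Each round fits a shallow tree to the negative gradient of the loss at the current ensemble, i.e., the residuals, and takes a small step $latex \alpha$.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

F = np.zeros_like(y)             # current ensemble prediction H
alpha, trees = 0.1, []
for _ in range(100):
    residual = y - F             # negative gradient of squared loss at H
    h = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F += alpha * h.predict(X)    # small step in function space
    trees.append(h)

print(np.mean((y - F) ** 2))     # training MSE shrinks with more rounds
```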

Lipschitz continuous (function can’t change too quickly)

Always forget the definition of this one. Wiki gives a very good explanation. It is just a continuous function with the additional constraint that the slopes of its secant lines are bounded in magnitude (i.e., the function can’t change too quickly); differentiability is not required.
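
For the record (the standard definition, not spelled out in the excerpt): $latex f$ is Lipschitz continuous with constant $latex K$ if $latex |f(x_1) - f(x_2)| \le K |x_1 - x_2|$ for all $latex x_1, x_2$. For example, $latex f(x) = |x|$ satisfies this with $latex K = 1$ by the triangle inequality, even though it is not differentiable at $latex 0$.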

Decision Tree

Decision tree is a simple algorithm: it recursively splits the data on the best out of all possible features. A deeper tree will have high variance and a shallower tree will have high bias. Naturally, we want to find the smallest tree that fits all the data (we call a tree with zero training error consistent). However, in general this…
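
A minimal sketch of the depth trade-off (my own illustration, assuming scikit-learn; the dataset and numbers are arbitrary): shallow trees underfit, unbounded depth drives training error to zero but hurts the test score.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10, None):   # None grows until training error is zero
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```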

Random Forest

It is basically bagging applied to decision trees. Say we bootstrap $latex M$ datasets from the original dataset. We then train a decision tree on each bootstrapped dataset; to ensure diversity and reduce complexity, only $latex k \approx \sqrt{d}$ dimensions are considered at each split, where $latex d$ is the original dimension. The nice thing about random forests is that it…
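
A minimal sketch (my own, assuming scikit-learn; parameter values are illustrative): `max_features="sqrt"` is the $latex k \approx \sqrt{d}$ rule above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # M bootstrapped trees
    max_features="sqrt",   # try only k ~ sqrt(d) features per split
    oob_score=True,        # evaluate on out-of-bag samples for free
    random_state=0,
).fit(X, y)
print(forest.oob_score_)
```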

Bagging

The goal of bagging is to avoid overfitting (high variance). Instead of training one model, we can bootstrap the dataset and train multiple models. The output is then simply the average (for regression) or the majority vote (for classification) of the outputs of the trained models. There are at least…
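
A minimal sketch of bagging for regression (my own toy version, assuming scikit-learn and NumPy; nothing here is from the post): resample the training set with replacement, fit one model per replicate, and average the predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=200)

models = []
for _ in range(50):                         # 50 bootstrap replicates
    idx = rng.integers(0, len(X), len(X))   # sample with replacement
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Regression: average the individual predictions.
y_hat = np.mean([m.predict(X) for m in models], axis=0)
print(np.mean((y - y_hat) ** 2))
```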

Transductive inference

Came across a presentation that mentioned transductive learning (in contrast to inductive learning). I didn’t know this terminology before. Wiki gives a very good introduction. The term was coined by Vapnik in the 90s; he believed transduction is better. The original term came from psychology: inductive reasoning comes from logic, while transductive reasoning came…