
$ \newcommand{\v}{\vec} \newcommand{\hv}[1]{\hat{\vec #1}} $

Equivalence of the different formulations of ridge regression

We have presented four different formulations of ridge regression:
  1. Ad hoc: $\hv \beta (\lambda) = (X'X+\lambda I) ^{-1} X'\v y$ for some $\lambda \geq 0.$
  2. Bayesian: Posterior mean under the prior $\v \beta \sim N_p(\v 0, \tau^2 I)$ and the model $\v y \mid \v \beta \sim N_n(X\v \beta , \sigma^2 I)$ for some $\tau > 0.$
  3. Soft bound: Minimizer for $$ \| \v y - X\v \beta \|^2 + \kappa \|\v \beta \|^2 $$ for some $\kappa \geq 0.$
  4. Hard bound: Minimizer for $$ \| \v y - X\v \beta \|^2 \text{ subject to } \|\v \beta \|^2\leq \delta $$ for some $\delta \geq 0.$
We now explain why these are all equivalent.
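
For concreteness, here is a minimal numpy sketch of the ad hoc formula (the data, dimensions, and value of $\lambda$ below are made up for illustration). Note that $\lambda = 0$ recovers the ordinary least squares estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))                          # synthetic design matrix
y = X @ np.arange(1.0, p + 1) + rng.standard_normal(n)   # synthetic response

def ridge(X, y, lam):
    """Ad hoc ridge estimator: (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(ridge(X, y, 1.0))                      # coefficients shrunk towards 0
print(np.allclose(ridge(X, y, 0.0),          # lam = 0 gives least squares
                  np.linalg.lstsq(X, y, rcond=None)[0]))
```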

Ad hoc and Bayesian

The posterior density of $\v \beta $ is proportional to the product of the prior density and the model density. Direct computation shows that the posterior is again normal, with mean $$ (X'X + (\sigma^2/\tau^2) I)^{-1} X'\v y, $$ which is exactly the ad hoc form with $\lambda = \sigma^2/\tau^2.$ This establishes the equivalence between the ad hoc and Bayesian formulations.
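
As a sanity check, the following sketch (with arbitrary synthetic data and variances $\sigma^2, \tau^2$) computes the posterior mean directly by conditioning the joint Gaussian of $(\v \beta , \v y)$ and compares it with the ad hoc formula at $\lambda = \sigma^2/\tau^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
sigma2, tau2 = 1.0, 2.0                 # noise and prior variances (arbitrary)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)              # any response vector works for the check

# Gaussian conditioning: E[beta | y] = cov(beta, y) cov(y)^{-1} y,
# with cov(beta, y) = tau2 * X' and cov(y) = tau2 * X X' + sigma2 * I.
post_mean = tau2 * X.T @ np.linalg.solve(tau2 * X @ X.T + sigma2 * np.eye(n), y)

# Ad hoc ridge estimator with lambda = sigma2 / tau2.
ridge = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(p), X.T @ y)

print(np.allclose(post_mean, ridge))    # True
```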

Ad hoc and soft bound

The target function in the soft bound formulation is $$ \| \v y - X\v \beta \|^2 + \kappa \|\v \beta \|^2 = \v \beta ' (X'X + \kappa I) \v \beta - 2(X'\v y)'\v \beta + \text{ constant}. $$ Differentiate with respect to $\v \beta $ and equate to $\v 0$ to arrive at the ad hoc formulation with $\lambda = \kappa.$
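
The stationarity condition can also be confirmed numerically. The sketch below (synthetic data, arbitrary $\kappa$) minimizes the soft bound objective with a generic optimizer and compares the result with the closed form:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
kappa = 3.0

# Minimize the soft bound objective directly.
obj = lambda b: np.sum((y - X @ b) ** 2) + kappa * np.sum(b ** 2)
numeric = minimize(obj, np.zeros(p)).x

# Closed form from the stationarity condition (the ad hoc estimator).
closed = np.linalg.solve(X.T @ X + kappa * np.eye(p), X.T @ y)

print(np.allclose(numeric, closed, atol=1e-5))  # True, up to optimizer tolerance
```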

Soft bound and hard bound

The hard bound formulation is the same as the least squares method except for an additional constraint. If the least squares estimate already satisfies the constraint, then the hard bound formulation returns the least squares estimator itself, which is a special case of the ad hoc formulation (with $\lambda = 0$).

If the least squares estimator lies outside the hard bound constraint disc, then the constrained minimizer must lie on the boundary of the disc (since the target function is convex and its unconstrained minimum lies outside the disc). So we may replace the constraint $\|\v \beta \|^2\leq \delta$ with $\|\v \beta \|^2= \delta.$ Now we can use the Lagrange multiplier technique, which works with the modified target function: $$ \| \v y - X\v \beta \|^2 + \lambda (\|\v \beta \|^2 - \delta). $$ For fixed $\lambda$, this differs from the soft bound target function (with $\kappa = \lambda$) by just the constant $-\lambda\delta.$ This establishes the equivalence between the soft bound and the hard bound formulations.
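
The correspondence between $\delta$ and $\lambda$ can also be traced numerically: $\|\hv \beta (\lambda)\|^2$ decreases as $\lambda$ grows, so a bisection over $\lambda$ finds the soft bound solution sitting exactly on the hard bound boundary. (A sketch; the data and the choice of $\delta$ below are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
y = X @ np.arange(1.0, p + 1) + rng.standard_normal(n)

def ridge(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

delta = 0.5 * np.sum(ridge(0.0) ** 2)    # a bound the LS estimator violates

# ||ridge(lam)||^2 is decreasing in lam: bisect for the boundary solution.
lo, hi = 0.0, 1e6
for _ in range(100):
    mid = (lo + hi) / 2
    if np.sum(ridge(mid) ** 2) > delta:
        lo = mid                          # still outside the disc: increase lam
    else:
        hi = mid                          # inside the disc: decrease lam

print(np.sum(ridge((lo + hi) / 2) ** 2), delta)  # the two norms agree
```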
