Least squares method

Set up: We have bivariate data $(x_1,y_1),...,(x_n,y_n).$ Suppose that the scatterplot shows a linear pattern, and we want to fit a straight line of the form $y = \alpha + \beta x$ to the data. We want our line to pass "as close as possible" to all the points. This is a rather vague specification, and there are a number of ways to make it precise. The most popular among them is the least squares approach.

Suppose that we want to predict the value of $y$ for $x = x_i$ using the equation $y = \alpha + \beta x.$ The predicted value would be $\hat y_i = \alpha + \beta x_i.$ We measure the (unsigned) discrepancy between $\hat y_i$ and $y_i$ by the squared difference $$ (y_i-\hat y_i)^2 = (y_i - \alpha - \beta x_i)^2. $$ Then the total error is $$ \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 = S(\alpha,\beta),\text{ say.} $$ We want to choose $\alpha,\beta $ so that $S(\alpha,\beta)$ is minimised. This is called the least squares approach. We shall now outline two ways to minimise $S(\alpha,\beta).$
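To make the criterion concrete, here is a minimal R sketch that evaluates $S(\alpha,\beta)$ for two candidate lines. The data vectors and the function name `S` are made up for illustration and do not come from any real data set.

```r
# Illustrative data (made up): the points lie exactly on y = 1 + 2x
x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)

# Total squared error S(alpha, beta) for the line y = alpha + beta * x
S <- function(alpha, beta) {
  sum((y - alpha - beta * x)^2)
}

S(1, 2)  # the line through all four points: total error 0
S(0, 2)  # same slope, intercept off by 1: each point contributes 1
```

A line that misses the points gives a larger value of `S`, which is exactly what the least squares approach penalises.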

Calculus technique

First we differentiate $S(\alpha,\beta)$ partially with respect to $\alpha $ and $\beta $ and equate the partial derivatives to zero. This gives two equations $$ \frac{\partial S}{\partial \alpha} = -2\sum(y_i-\alpha - \beta x_i) = 0, $$ and $$ \frac{\partial S}{\partial \beta } = -2\sum x_i(y_i-\alpha - \beta x_i) = 0. $$ Remember that our unknowns are $\alpha$ and $\beta,$ while the $x_i$'s and $y_i$'s are all known. So these are two linear equations in two unknowns. In matrix form these are $$ \left[\begin{array}{cc} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{array}\right]\left[\begin{array}{c}\alpha\\\beta \end{array}\right] = \left[\begin{array}{c}\sum y_i\\ \sum x_i y_i \end{array}\right]. $$ Here the coefficient matrix is nonsingular if and only if $\frac 1n\sum x_i^2-(\overline x)^2\neq 0,$ that is, if and only if the $x_i$'s are not all equal. This condition is natural because, otherwise, all the points lie on the same vertical line, and the slope of a vertical line is undefined.
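The matrix form can be solved directly in R. The following is a minimal sketch on made-up data; the names `A`, `b` and `est` are chosen here for illustration only.

```r
# Illustrative data (made up)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Coefficient matrix and right-hand side of the two linear equations
A <- matrix(c(n,      sum(x),
              sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
b <- c(sum(y), sum(x * y))

# The matrix is nonsingular exactly when the x-values are not all equal
stopifnot(mean(x^2) - mean(x)^2 != 0)

est <- solve(A, b)  # est[1] is the solution for alpha, est[2] for beta
est
```

For this data the solution is $\alpha = 2.2$ and $\beta = 0.6,$ up to floating point rounding.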

Solving, we get $$ \hat \beta = \frac{\frac 1n\sum x_i y_i- \overline x\,\overline y}{\frac 1n\sum x_i^2-(\overline x)^2 }, $$ and then $\hat \alpha $ may be obtained from $$ \overline y = \hat \alpha + \hat \beta \overline x, $$ which is just the first equation divided by $n.$ Now, equating the first derivatives to zero only ensures a stationary point. We still do not know whether it is a maximum, a minimum or something else and, even if it is a minimum, whether it is a global minimum or just a local one. Second derivative tests will help resolve the first question, but not the second. We shall not discuss this any further here, because we do not yet have the necessary mathematical tools at our disposal.
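Although a proof is deferred, the claim can at least be probed numerically. The sketch below, on made-up data, computes $\hat\alpha$ and $\hat\beta$ from the formulas (with sums divided by $n,$ i.e. written as means) and compares $S$ at the stationary point with $S$ at a few nearby lines. The perturbation size 0.5 is an arbitrary choice, and a check of this kind is only suggestive, not a proof.

```r
# Illustrative data (made up)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form solution of the two equations
beta_hat  <- (mean(x * y) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Total squared error for the line y = a + b * x
S <- function(a, b) sum((y - a - b * x)^2)

# Compare S at the stationary point with S at a few perturbed points
S_min  <- S(alpha_hat, beta_hat)
shifts <- expand.grid(da = c(-0.5, 0, 0.5), db = c(-0.5, 0, 0.5))
S_near <- mapply(function(da, db) S(alpha_hat + da, beta_hat + db),
                 shifts$da, shifts$db)
all(S_near >= S_min)  # TRUE here: no perturbed line does better
```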

Linear algebra approach

See this video (3:36 onwards).

For a proof of consistency of the normal equations see this video.

Using R

Let's say the $x$-values are stored in x, and the $y$-values in y. Then the following R code will fit a line:
lm(y ~ x)
You can overlay the least squares line on top of the scatterplot like this:
plot(x,y)
fit = lm(y ~ x)
abline(fit)
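The fitted intercept and slope can be read off with `coef`. As a rough check, assuming the made-up data below, they agree with the closed-form least squares estimates computed by hand:

```r
# Illustrative data (made up)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)
coef(fit)  # named vector: "(Intercept)" is alpha-hat, "x" is beta-hat

# Closed-form least squares estimates for comparison
beta_hat  <- (mean(x * y) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))  # TRUE up to rounding
```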