Numerical Integration

Last updated on: MON MAY 03 IST 2021

Numerical Integration

Suppose that we are given a function $f(x),$ and we are to compute $$ \int_a^bf(x)\,dx, $$ where $a,b$ are given numbers. One method is to first find an indefinite integral of $f(x):$ $$ F(x) = \int f(x)\,dx, $$ and then to compute $F(b)-F(a).$ This method depends on our ability to compute the indefinite integral $F(x).$ There are many situations where computing $F(x)$ is difficult or even impossible. In such cases, we resort to numerical integration or quadrature. A numerical integration method has the advantage that we do not need to find the indefinite integral, but it has the disadvantage that the answer may be approximate. There are many different methods of numerical integration. In this page, we shall learn two such techniques, both belonging to a family called Newton-Cotes quadrature.

Trapezium rule

First we split $[a,b]$ into a grid of equally spaced $x$-values, and evaluate $f(x) $ at those points:


Take a grid of $x$-values

Then we join the points by line segments:


Continuous piecewise linear approximation

Each piece is a trapezium:


This trapezium has area $\frac 12(y_{k-1}+y_k)\times\delta x$

We add the areas of all the trapeziums to approximate $\int_a^b f(x)\, dx.$

Trapezium rule Choose some positive integer $n.$ Split the interval $[a,b]$ into $n$ equal parts using the break points $$x_0,x_1,...,x_n,$$ where $x_i = a + i\delta x.$ Here $\delta x = \frac{b-a}{n}.$ Note that $x_0 = a $ and $x_n = b.$

Next, we compute $$ y_i = f(x_i)\mbox{ for } i=0,1,...,n. $$ Then the trapezium rule is the approximation $$\int_a^bf(x)\,dx\approx \frac{\delta x}{2}\{y_0+2(y_1+\cdots+y_{n-1})+y_n\}.$$

EXAMPLE: Compute $$ \int_1^2\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx, $$ by trapezium rule using $n=5.$ We split the interval $[1,2]$ into 4 equal parts and compute $y_i$'s.

i     y
-----------
0    0.2420
1    0.1826
2    0.1295
3    0.0863
4    0.0540

So by applying trapezium rule the integral is approximately $$ {0.25\over2}\times[0.2420 + 2\times(0.1826+0.1295+0.0863) +0.0540] = 0.1366. $$ It is instructive to compare this with the actual value, which is 0.1359.

To use a finer grid, let's harness the power of R. Suppose we want 14 subintervals (i.e., 15 poits in all, including the end points):

n = 14+1
dx = (2-1)/14
x = seq(1,2,len=n)
y = dnorm(x)

I just cheated, by using the readymade normal pdf that is already available in R. Of course, I could have used the more verbose:

y = 1/(sqrt(2*pi)) * exp(-x*x/2)

Anyway, I now need to partition the points into three parts:

the extreme points (just 2 of them)
the interior points ($15-2=13$ such)

dx*(y[1]+y[n] + 2*sum(y[2:(n-1)]))/2

EXERCISE: Write an R function to implement this method:

trapint = function(f,a,b,n) {
  ...
}

EXERCISE: We want to approximate $$ \int_0^2 e^{-t} t^{3.4}\, dt. $$ This is the incomplete gamma function evaluated at 2. Keep on applying trapezoidal rule with increasing $n$ until the approximation with two consecutive values of $n$ match up to the first 5 decimal places.

Simpson's rule

In trapezium rule we interpolated by straight lines, i.e., polynomials of degree $\leq 1.$ If we use polynomials of degree $\leq 2$, then we may expect better accuracy. To fit such a polynomial, we need three points. So we split the interval into an even number of subintervals, and fit a parabola (i.e., polynomial of degree $\leq 2$) over two consecutive subintervals, i.e., first parabola over subintervals 1 and 2, the next parabola over subintervals 3 and 4, etc.


$y=\sin x$ shown in black. Parabolas in red

We may work out the exact formulae for the fitted polynomials, then integrate them, and add. But there's a simpler method.

Focus on a single pair of consecutive subintervals, say $[x_0,x_1]$ and $[x_1,x_2].$ We can see that the answer is going to be like $\delta x\times(a y_0 + b y_1 + c y_2).$ (Why?)

If our integrand were indeed a polynomial of degree $\leq 2$, then this should give us the exact answer. In particular, this should give us the exact answer if the integrand were $1$ or $x$ or $x^2.$ This will give us three equations in three unknowns. Solving them you'll get $a = c=\frac 13$ and $b=\frac 43.$

Simpson's rule As in trapezium rule we need to subdivide the interval $[a,b]$ into $n$ equal parts. But for Simpson's rule $n$ needs to be an even number, say $2k.$ Evaluate $f(x)$ at the subdivision points to get $$ y_0,y_1,...,y_{2k}. $$ But now use the following formula $$\begin{eqnarray*} \int_a^bf(x)\,dx & \approx & \frac{\delta x}{3}\left[(y_0+y_{2k})\right.\\ & & + 2(y_2+y_4+\cdots+y_{2(k-1)}) \\ & & + \left. 4(y_1+y_3+\cdots+y_{2k-1} )\right]. \end{eqnarray*}$$

EXAMPLE: Now let us compute the integral from the last example using Simpson's rule. We shall again use $n=4.$ This time the value is $$\begin{eqnarray*} \frac{0.25}{3}\times\left[\right. \times (0.2420+0.0540)\\ & + & \left.2\times(0.1295) + 4\times(0.1826 + 0.0863)\right] = 0.1359, \end{eqnarray*}$$ which is correct up to 4 decimal places. Notice how Simpson's rule gives more accurate value here than the trapezium rule, though we have used the same $n$ in both methods.

The R version of Simpson's rule is quite simple. We start out just as for Simpson's rule:

dx = (2-1)/14
x = seq(1,2,len=15)
y = dnorm(x)

Now we need to partition the points into three groups:

the two extreme points: $x_0$ and $x_n$
the mid points: $x_1,...,x_{n-1}$
the boundary points (except the extreme two): $x_2,...,x_{n-2}.$

Since R starts its indices from 1, we have to be careful:

ext = c(1, 15) # A typo has been corrected here. Earlier I wrongly wrote 1:15 instead of 1,15.
mid = seq(2,14,2)
bdry = seq(3,13,2)

Make sure you understand these. The third argument in seq is the step size.

(dx/3) * (sum(y[ext]) + 4*sum(y[mid]) + 2*sum(y[bdry]))

Monte Carlo integration

This is a rather different approach that is conceptually very simple. Here we consider a definite integration problem as a problem of finding the area of a region. The approach is best explained by an example.

EXAMPLE: Suppose that we want to find the value of $\pi.$ We consider this as the problem of finding the area of a unit circle. First, we bound the circle in a simpler region, say a square, as shown below.


Circle in square

Now pretend that this square is a target board for dart throwing, and a child is throwing darts randomly at the board in a way that each point is as likely to be hit as any other point. Then the chance of a random dart landing in any given region is the area of that region divided by the area of the square (which is $(1+1)^2 = 4$). A typical throw of 16 darts may produce a result like this:


16 random throws

Here the event "hitting the circle" has occured for 10 out of 16 cases, so the sample proportion is $p_{16}=\frac{10}{16} = \frac 58.$ By laws of large numbers, we expect $$ \frac \pi4\approx \frac 58, $$ or $\pi\approx 2.5.$ This is of course a rather poor approximation, but the accuracy improves with increasing number of throws. As "randomly dart-throwing children" may not be easily available, we shall employ a computer for that purpose. We set up a coordinate system as follows.


Coordinate system

A random dart hit is now $(X,Y),$ where $X,Y$ are independent $Unif(0,1)$ random variables. Checking if the dart has hit the disc is simply checking whether $X^2+Y^2 < 1.$

Let's explore using R:

x = runif(1000, min=-1, max=1)
y = runif(1000, min=-1, max=1)
hit = (x*x + y*y < 1)
plot(x,y,col=hit+1,pch=20)
mean(hit)

A few points about the code:

hit is a 0-1 variable. If you print it, you'll get a sequence of TRUE's and FALSE's.
When we compute mean(hit) we are finding mean of the 0's and 1's, i.e., the proportion of 1's.
The col parameter is set to hit+1, i.e., a sequence of 1's and 2's. In R, colour 1 means black, and colour 2 means red. Colour 0 means "don't plot".

This produces a plot like this:


Hit points are shown in red

We shall now use the idea to approximate an integral of the form $\int_a^b f(x)\, dx.$ Since we shall be approximating integrls using probabilities, we have to make sure that $f$ does not change sign in the interval $[a,b].$ Let's assume that $f(x)\geq 0$ over the entire interval. Let $M>0$ be some known upper bound for $f.$ Then the graph may be put inside a rectangle as follows:


Graph in rectangle

Again we throw darts randomly at the rectangle and find the proportion of darts hitting the shaded region. Mathematically, we generate $X,Y$ independently with $X\sim Unif(a,b)$ and $Y\sim Unif(0,M).$ Then we check the proportion of cases for which $Y < f(X).$

The following R code snippet implements this idea for $f(x) = x^2$ over $[0,1]$ with upper bound $B=1.5.$

x = runif(1000)
y = 1.5 * runif(1000)
hit = y < x*x
plot(x,y,col=hit+1,pch=20)
mean(hit)

Try this for yourself. You may not really like the precision. Here is the plot I got:

EXERCISE: Can you work out the variance of the estimator? Is the estimator unbiased?

This technique may be used to approximate the area of any region that may be bounded in a box and for which we may test containment. It easily extends to higher dimensions (volume, hypervolume etc).

Comments

To post an anonymous comment, click on the "Name" field. This will bring up an option saying "I'd rather post as a guest."

Table of contents

Numerical Integration

Trapezium rule

Simpson's rule

Monte Carlo integration

Comments