It often happens that we have to solve a nonlinear equation,
$$
f(x)=0
$$
that we cannot easily solve analytically. A simple such example is $x = \cos x.$ Here $f(x) = x-\cos x.$
Any solution to $f(x)=0$ is
called a root or zero of $f(x).$ In this page we
shall learn two methods to approximately compute zeros of nonlinear functions.
Both methods will be iterative in nature, i.e., we shall generate
a sequence $x_0,x_1,x_2,...$ that (hopefully) converges to the
required root.
The Newton-Raphson method is a very popular method that usually converges rapidly. It solves
the equation $f(x) = 0,$
assuming that we can compute $f'(x).$ The iterations start with an
initial guess $x_0$ and proceed as
$$
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.
$$
There are two equivalent ways to think about the Newton-Raphson
iterations. The first is using the following diagram:
[Figure: approximating $f(x)$ by the tangent at $x_k$]
Here we draw the tangent through $(x_k,f(x_k))$ and use this as a
local approximation of $f(x).$ The point where the tangent hits the
$x$-axis is taken as $x_{k+1}.$
The second way to look at the same thing is to approximate $f(x)$ using
Taylor approximation:
$$f(x) \approx f(x_k)+f'(x_k)(x-x_k).$$
Now if we want $f(x)=0$ then we need $f(x_k)+f'(x_k)(x-x_k) = 0,$
or
$$
x = x_k -\frac{f(x_k)}{f'(x_k)}.
$$
This motivates defining
$$
x_{k+1} = x_k -\frac{f(x_k)}{f'(x_k)}.
$$
EXAMPLE:
Let us solve $\cos(x)=x$ using the Newton-Raphson method starting with
$x_0=1.$ Here $f(x) = \cos(x)-x,$ and hence $f'(x) =
-\sin(x)-1.$ So the Newton-Raphson iteration is
$$
x_{k+1} = x_k + \frac{\cos(x_k)-x_k}{\sin(x_k)+1}.
$$
A few iterations are as follows.

  k    x_k
  --------------
  1    0.7503639
  2    0.7391129
  3    0.7390851
  4    0.7390851

We already see convergence.
The following J code lets you explore this.
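As a sketch of the same computation in Python (the function name `newton`, the tolerance, and the iteration cap below are our own choices, not taken from the original J code):

```python
import math

def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:   # stop once the update is negligibly small
            break
    return x

# Solve cos(x) = x, i.e. f(x) = cos(x) - x, starting from x0 = 1.
root = newton(lambda x: math.cos(x) - x,
              lambda x: -math.sin(x) - 1,
              x0=1.0)
print(root)  # about 0.7390851, matching the table above
```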
It is possible to use the Newton-Raphson method to solve a system of
nonlinear equations:
$$\begin{eqnarray*}
f_1(x_1,...,x_n)& = & 0\\
& \vdots & \\
f_n(x_1,...,x_n) & =& 0
\end{eqnarray*}$$
Note that the number of unknowns equals the number of equations. We can
write this using vector notation as
$$
\ff(\xx) = \z,
$$
where $\xx = (x_1,...,x_n)'$ and $\ff(\xx) =(f_1(\xx),...,f_n(\xx))'.$
The Newton-Raphson iteration for this system is
$$
\xx_{k+1} = \xx_k - (D\ff(\xx_k))^{-1}\ff(\xx_k),
$$
where $D\ff(\xx)$ is the Jacobian matrix with $(i,j)$-th entry
given by
$$
\frac{\partial f_i(\xx)}{\partial x_j}.
$$
EXAMPLE:
Let us solve the system
$$\begin{eqnarray*}
xy+x^2-y^3-1 = 0\\
x+2y-xy^2 -2 =0
\end{eqnarray*}$$
Here $f_1(x,y) = xy+x^2-y^3-1$ and $f_2(x,y) = x+2y-xy^2 -2.$ So
the Jacobian matrix is
$$
D\ff(\xx) = \left[\begin{array}{cc}y+2x & x-3y^2 \\
1-y^2 & 2-2xy
\end{array}\right].
$$
This has inverse given by
$$
(D\ff(\xx))^{-1} = \frac{1}{(y+2x)(2-2xy)-(x-3y^2)(1-y^2)}
\left[\begin{array}{cc}2-2xy & 3y^2-x\\
y^2-1 & y+2x
\end{array}\right].
$$
The following table shows a few sample iterations.
f=:3 :'...' creates a
function $f(y).$ The explicit formula for $f$ is
written in place of the ....
'r s'=:y will split $y\equiv(y_0,y_1)$ into
two components, and make r equal to $y_0$
and s equal to $y_1.$
To write single quotes inside single quotes use two single
quotes ''.
[ is the function $(x,y)\mapsto x.$ It
evaluates $y$ and then $x$ and then
returns $x.$
b %. A solves the system $A\vec x = \vec
b$ in the least squares sense assuming $A$ to be full
column rank.
For $n=2$ it was easy to invert the matrix analytically. For higher
dimensions we should not explicitly invert the Jacobian matrix. Instead,
at each step we should solve the linear system
$$
(D\ff(\xx_k))\, \yy = \ff(\xx_k)
$$
for $\yy$ and then set $\xx_{k+1} = \xx_k - \yy.$ We shall learn such solution techniques in the next page.
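For the example system above, this idea can be sketched in Python with NumPy (the page itself uses J; the starting point and tolerance below are our own choices):

```python
import numpy as np

def F(v):
    """The system: f1 = xy + x^2 - y^3 - 1, f2 = x + 2y - xy^2 - 2."""
    x, y = v
    return np.array([x*y + x**2 - y**3 - 1,
                     x + 2*y - x*y**2 - 2])

def DF(v):
    """Jacobian matrix of F, with (i,j)-th entry dF_i/dx_j."""
    x, y = v
    return np.array([[y + 2*x, x - 3*y**2],
                     [1 - y**2, 2 - 2*x*y]])

v = np.array([1.2, 0.8])  # initial guess (our choice, not from the page)
for _ in range(100):
    # Solve DF(v) d = F(v) rather than forming the inverse Jacobian.
    d = np.linalg.solve(DF(v), F(v))
    v = v - d
    if np.linalg.norm(F(v)) < 1e-8:
        break
print(v)  # close to the exact root (1, 1)
```

One can check directly that $(x,y)=(1,1)$ solves both equations, so the iterates should approach that point.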
While the Newton-Raphson method is very efficient, it requires us to compute the derivative of $f.$ Thus, it is unsuitable
in cases where $f$ is not differentiable or where the derivative is too difficult to compute. Now we shall learn
a method, the bisection method, that may be used to solve the (nonlinear) equation
$$
f(x)=0,
$$
where $f(x)$ is continuous, but not necessarily differentiable.
The method is conceptually much simpler than the Newton-Raphson method. To understand it
suppose that the graph of $f(x)$ is as shown below.
[Figure: $f(x)$ has a zero at $a$]
Notice that here the graph of $f(x)$ cuts the $x$-axis. This is a requirement for the bisection method.
Suppose also that we know two numbers $l_0$ and $r_0$ such that
$f(x)$ has different signs at $x=l_0$ and $x=r_0.$
Then, by the intermediate value theorem for continuous functions, we know that
$f(x)$ must have at least one zero in the interval $(l_0,r_0).$
To find such a zero, we proceed by stepwise guessing. Our first guess is
the midpoint of $(l_0,r_0),$ i.e.,
$$m_0 = \frac{l_0+r_0}{2}.$$
If $f(m_0)=0$
then we are done! Otherwise, we look at the two halves $[l_0,m_0]$ and $[m_0,r_0].$ Exactly one of them must
have opposite signs of $f$ at the two ends. Call this half $[l_1,r_1].$
For instance, if $\mathrm{sign}(f(m_0))\neq \mathrm{sign}(f(l_0))$ then we take
$l_1=l_0$ and $r_1=m_0.$
By the intermediate value theorem, we know that $f$ must have a zero in $(l_1,r_1).$
Now repeat the process, i.e., take $m_1$ as the midpoint of $(l_1,r_1),$ and take the
appropriate half as $(l_2,r_2).$ Proceeding in this
way we get $(l_k,r_k)$ for
$k=1,2,3,...$ until the length of the interval is small enough.
EXAMPLE:
Let us apply the bisection method to solve the equation $\cos(x)=x.$
This is the same as finding a zero of
$$
f(x) = \cos(x)-x.
$$
It is easy to see that $f(0) = 1$ and $f(\frac{\pi}{2}) =
-\frac{\pi}{2}.$ Since these have opposite signs we can start the
bisection method with
$$
l_0=0~~~\mbox{ and }~~~ r_0 = \frac{\pi}{2} = 1.5708.
$$
Our first guess is
$$
m_0 = \frac{l_0+r_0}{2} = 0.7854.
$$
Proceeding like this we get the following table.
We proceed until the interval is short enough, i.e.,
$(r_k-l_k)< \epsilon$ for some specified $\epsilon.$ In the
above table we have stopped once the $l_k$ and $r_k$ are equal
up to the first two decimal places. Thus, we see that the answer is 0.74
up to the first two decimal places.
The following J code explores this.
0&> is "less than 0". If $f(x,y)$ is some
function, then 5&f is the function $y\mapsto
f(5,y)$ and f&5 is $y\mapsto f(y,5).$
Here > is the greater than function, which returns 0
or 1.
-. is negation. It turns 0 into 1 and vice versa.
(c * n) + (-.@c * p) is a mathematical way to specify
if-else.
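For readers following along without J, the bisection loop can be sketched in Python as well (the tolerance `eps` and function name are our choices):

```python
import math

def bisection(f, l, r, eps=1e-6):
    """Repeatedly halve [l, r], keeping the half where f changes sign."""
    assert f(l) * f(r) < 0, "f must have opposite signs at the endpoints"
    while r - l > eps:
        m = (l + r) / 2
        if f(m) == 0:
            return m          # lucky exact hit
        if f(l) * f(m) < 0:   # sign change in the left half
            r = m
        else:                 # sign change in the right half
            l = m
    return (l + r) / 2

# Solve cos(x) = x on (0, pi/2), as in the example above.
root = bisection(lambda x: math.cos(x) - x, 0.0, math.pi / 2)
print(round(root, 2))  # 0.74
```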
EXERCISE:
Use the bisection method to solve $2e^x-2x-3 = 0$ for $x\in(0,2).$
A certain trait in rabbits is controlled by a pair of alleles, $a$ and $A.$ Each rabbit receives one of these
from the father and another from the mother. Thus, the possible pairs are $aa,$ $aA$ and $AA$ (order is
immaterial). The probability that a parent gives an $a$ to the offspring is $p$ (unknown). So the probability
of an $A$ is $q = 1-p.$ The father's contribution is independent of the mother's, and so the probabilities of
$aa,$ $aA$ and $AA$ in the offspring are, respectively, $p^2,$ $2pq$ and $q^2.$ Our
aim is to estimate $p.$ Unfortunately, it is impossible to detect the pair an offspring has. It is only possible to
detect if an offspring has at least one $A$, i.e., whether $aa$ or $\{aA, AA\}.$ The probabilities are,
respectively, $p^2$ and $q^2+2pq.$ In a random sample of 100 offspring, only 23 are without any $A.$
The probability of this is
$$L(p)=p^{46}\big(q^2 +2pq\big)^{77}.$$
The value of $p\in(0,1)$ for which this is the maximum is called the maximum likelihood estimator of $p.$ Find
it.
The file data.txt has $n=996$ random numbers that are generated from the density
$$f(x; p, a) = \frac{ a^p}{\Gamma(p)} x^{p-1}e^{-a x},\quad x>0$$
for unknown constants $p, a > 0.$ The principle of maximum likelihood estimation suggests estimating
$p,a $ by maximising
$$L(p,a) = \prod_{i=1}^n f(x_i; p,a),$$
where $x_1,...,x_n$ are the data in the file. Perform this estimation, and check your answer
graphically by overlaying the graph of $f(x;p,a)$ on the histogram of the data.
[Hint: You might find the R functions gamma, dgamma and
digamma useful here.]