

$\newcommand{\v}[1]{{\mathbf #1}}$ Joint distribution

Quick primer on multivariate calculus (part 1)

Video for this section

We are going to use certain results from multivariate calculus that you will learn rigorously in the Analysis 3 course. For now, we shall only learn some definitions and results from multivariate calculus.

Graph

When we work with $f:{\mathbb R}\rightarrow{\mathbb R}$ we often think about its graph which we visualise as a curve. When we deal with $f:{\mathbb R}^2\rightarrow{\mathbb R}$ we visualise its graph as a surface.

Continuity

For $f:{\mathbb R}\rightarrow{\mathbb R}$ continuity means its graph has no break. Similarly, $f:{\mathbb R}^2\rightarrow{\mathbb R}$ is called continuous when its graph is an unbroken surface, with no hole, cut or gap. More rigorously, you can think of continuity in terms of limits:

$f:{\mathbb R}^2\rightarrow{\mathbb R}$ is continuous at $\v a$ means, whenever $\v x\rightarrow \v a$ we have $f(\v x)\rightarrow f(\v a).$

Differentiability

We say that $f:{\mathbb R}\rightarrow{\mathbb R}$ is differentiable at some point $a$ if the graph is smooth above $x=a$ (i.e., it may be well-approximated by a straight line passing through $(a,f(a))$), and the line is not vertical. This line is called the tangent to the curve at that point. Any such line has equation of the form $y= f(a)+m\cdot(x-a).$ This $m$ is called the derivative of $f$ at $a.$

Similarly, $f:{\mathbb R}^2\rightarrow{\mathbb R}$ is called differentiable at some point $(a,b)$ if the surface is smooth over that point (i.e., it is well-approximated by a plane passing through $(a,b,f(a,b))$, which is not vertical). Any such plane has equation of the form $z= f(a,b)+m_1\cdot(x-a)+m_2\cdot (y-b).$ The pair $(m_1,m_2)$ (which is commonly considered as a $1\times 2$ matrix) is called the derivative of $f$ at $(a,b).$

It turns out that if $f$ is differentiable at $(a,b),$ then $m_1 = \frac{\partial f}{\partial x}$ and $m_2 = \frac{\partial f}{\partial y}$ at $(a,b).$

$\frac{\partial f}{\partial x}$ is obtained by differentiating $f(x,y)$ wrt $x$ while keeping $y$ fixed. Similarly for $\frac{\partial f}{\partial y}.$

EXAMPLE 1:  If $f(x,y) = xy^2+y + e^x,$ then $\frac{\partial f}{\partial x} = y^2+e^x.$ ■

Just the existence of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ is not enough to guarantee the differentiability of $f.$ However, if the partial derivatives are also continuous over a neighbourhood of $(a,b),$ then $f$ must be differentiable at $(a,b).$

Mixed partials

We can also talk about the mixed partial derivatives $\frac{\partial^2 f}{\partial y\partial x}$ and $\frac{\partial^2 f}{\partial x\partial y}.$

Here $\frac{\partial^2 f}{\partial y\partial x}$ means $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x} \right),$ and $\frac{\partial^2 f}{\partial x\partial y}$ means $\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y} \right).$

EXAMPLE 2:  If $f(x,y) = xy^2+y + e^x,$ then $\frac{\partial^2 f}{\partial y\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial y}(y^2+e^x) = 2y.$

Also $\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y} \right) = \frac{\partial}{\partial x}(2xy+1) = 2y.$ ■

Notice that they turn out to be equal in this example. This is mostly the case. There are pathological examples, where they are unequal. However, for all the cases we shall need they will be equal.
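If you ever want to double-check such partial derivative computations, base R's symbolic differentiation function D() can do it for you. This is only a sanity-check sketch for Examples 1 and 2, not part of the theory.

    # Symbolic partial derivatives of f(x,y) = x*y^2 + y + exp(x) (Examples 1 and 2)
    f <- expression(x * y^2 + y + exp(x))
    D(f, "x")             # y^2 + exp(x)
    D(f, "y")             # x * (2 * y) + 1
    D(D(f, "x"), "y")     # 2 * y, i.e. d^2 f / (dy dx)
    D(D(f, "y"), "x")     # 2 * y, i.e. d^2 f / (dx dy) -- equal, as noted above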

Problem set 1

EXERCISE 1:  For each of the following functions find $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial^2 f}{\partial y\partial x}$ and $\frac{\partial^2 f}{\partial x\partial y}.$

  1. $f(x,y) = e^{-x^2-y^2+2x}.$
  2. $f(x,y) = \frac xy$
  3. $f(x,y) = \sin x+\cos y.$
  4. $f(x,y) = xy.$

Quick primer on multivariate calculus (part 2)

Video for this section

Iterated integrals

Just as we can differentiate $f(x,y)$ wrt a single variable at a time, we can integrate it wrt a single variable at a time, as well. This is called an iterated integral. The integrand is a function of two variables, $x,y.$ Each integral is done wrt one variable. When you do the inner integral, you treat the variable for the outer integration as a constant.

EXAMPLE 3:  $\int_0^1\int_0^{y^2} xy\,dxdy = \int_0^1\left. \frac{x^2y}{2} \right|_0^{y^2} dy = \frac 12\int_0^1 y^5\,dy = \frac{1}{12}.$ ■

Just as a single-variable integral may be thought of as an area, an iterated integral in two variables may be thought of as a volume. The iterated integral above gives the volume under the surface $z=xy$ over the region shown below:

Here we have integrated first wrt $x$ (the inner integral) and then wrt $y$ (the outer integral). We could have done it the other way around: then the iterated integral would have been $$\int_0^1 \int_{\sqrt{x}}^1 xy\, dydx.$$ Check that this also gives the same answer.

In this example, both the iterated integrals give the same answer. This is the case for a very general class of integrands (including all nonnegative integrands). However, there are pathological examples where they may not be equal. In our course we shall always assume them to be equal.
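If you want to check such computations numerically, base R's integrate() can evaluate the iterated integral of EXAMPLE 3 in both orders. This is just an illustrative sketch, not part of the theory.

    # Numerical check of Example 3: both orders of integration give 1/12
    inner_x <- function(y) sapply(y, function(yy)
      integrate(function(x) x * yy, lower = 0, upper = yy^2)$value)
    integrate(inner_x, lower = 0, upper = 1)$value   # ~ 0.0833 = 1/12
    inner_y <- function(x) sapply(x, function(xx)
      integrate(function(y) xx * y, lower = sqrt(xx), upper = 1)$value)
    integrate(inner_y, lower = 0, upper = 1)$value   # ~ 0.0833 = 1/12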

Problem set 2

EXERCISE 2: Find $\int_0^1 \int_{\sqrt{x}}^1 xy\, dydx.$

EXERCISE 3: What is the volume under the graph of $f(x,y) = x^2+y$ over the region $[0,1]\times[1,3]?$ Try both orders of integration ($x$ followed by $y$, and also $y $ followed by $x$).

Joint density

Video for this section

Just as we had encountered joint distribution while learning about discrete random variables, we have the concept of joint probability density, as well.

Definition: Let $X,Y$ be jointly distributed random variables. We say that they have joint probability density $f:{\mathbb R}^2\rightarrow[0,\infty)$ if $$\forall a \leq b, c \leq d~~P\big( (X,Y)\in[a,b]\times[c,d] \big) = \int_a^b \int_c^d f(x,y)\, dydx.$$
If you are new to this "integral inside integral" notation, it is called an iterated integral.

EXAMPLE 4:  $\int_0^1\int_0^1 xy^2\,dxdy = \int_0^1\left[\int_0^1 xy^2\,dx\right]dy = \int_0^1\left[y^2\int_0^1 x\,dx\right]dy =\int_0^1\frac 12y^2\,dy =\frac 16.$ ■

To visualise a joint density function, think of its graph as a surface hanging like a roof over the $xy$-plane. Then, for any rectangle in the $xy$-plane, the probability of $(X,Y)$ being inside that rectangle is the volume of the "tent" with the rectangle as its floor, and the surface as its roof. Indeed, thanks to the probability axioms, we can use this "volume of tent" idea for floors of shapes other than rectangles as well (e.g., countable unions/intersections of rectangles, and their complements).

The following theorem is not unexpected.

Theorem If $f:{\mathbb R}^2\rightarrow[0,\infty)$ is a joint density of some $(X,Y)$, then $\int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y)\, dxdy = 1.$

Proof: This is because the double integral equals $P(X\in{\mathbb R},\,Y\in{\mathbb R})=1.$ [QED]

EXAMPLE 5: If $f(x,y) = \left\{\begin{array}{ll}c&\text{if }x^2+y^2\leq 1\\ 0&\text{otherwise.}\end{array}\right.$ is a density, then find $c.$

SOLUTION: The total volume under the density is the volume of the cylinder with unit radius and height $c.$ This volume is $\pi c.$ So we need $\pi c = 1,$ i.e., $c = \frac 1\pi.$ ■

EXAMPLE 6: Find $c\in{\mathbb R}$ such that $f(x,y) =\left\{\begin{array}{ll}c(x+y)&\text{if }0\leq x,y\leq 2\\ 0&\text{otherwise.}\end{array}\right. $ is a density.

SOLUTION: We need $\int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y)\, dx dy = 1,$ i.e., $$\int_0^2\int_0^2 c(x+y)\, dx dy = 1.$$ Now $$\int_0^2\int_0^2 c(x+y)\, dx dy = c\int_0^2\left[\int_0^2 (x+y)\, dx\right] dy = c\int_0^2\left[ \frac 12x^2+xy\right]_0^2 dy=c\int_0^2( 2+2y)\, dy=8c.$$ So we need $8c=1$ or $c = \frac 18.$ ■
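A quick numerical sanity check of this computation in R (illustrative only, not part of the solution):

    # Integral of (x+y) over [0,2] x [0,2] should come out as 8, so c = 1/8
    inner <- function(y) sapply(y, function(yy)
      integrate(function(x) x + yy, 0, 2)$value)
    integrate(inner, 0, 2)$value   # 8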

Problem set 3


EXERCISE 4: Find $c\in{\mathbb R}$ such that $f(x,y) =\left\{\begin{array}{ll}cxy&\text{if }0\leq x,y\leq 2\\ 0&\text{otherwise.}\end{array}\right. $ is a density.


EXERCISE 5: If $ax+by$ is a density on the unit square, what are the possible values for $a,b?$


EXERCISE 6: Find $c\in{\mathbb R}$ such that $f(x,y) =\left\{\begin{array}{ll}ce^{-x-y}&\text{if }0\leq x,y< \infty\\ 0&\text{otherwise.}\end{array}\right.$ is a density.


EXERCISE 7: Find $c\in{\mathbb R}$ such that $f(x,y) =\left\{\begin{array}{ll}cye^{-x}&\text{if }0\leq x,y< \infty\\ 0&\text{otherwise.}\end{array}\right. $ is a density.

[Hint]

Impossible since the integral is divergent.


EXERCISE 8: Find $c\in{\mathbb R}$ such that $f(x,y) =\left\{\begin{array}{ll}cxy&\text{if }(x,y)\in[-1,1]\times[0,1]\\ 0&\text{otherwise.}\end{array}\right. $ is a density.

[Hint]

Impossible, $f(x,y)$ takes negative values.


EXERCISE 9: [hpsjoint1.png]


EXERCISE 10: [hpsjoint2.png]

Computing probability using iterated integrals

Video for this section

So far we have been finding volumes under the joint density graph using geometry. This works only for very simple shapes. For more complicated cases we need to use iterated integrals.

EXAMPLE 7:  Let $(X,Y)$ have density $f(x,y) = \left\{\begin{array}{ll}x+y&\text{if }0\leq x,y\leq 1\\ 0&\text{otherwise.}\end{array}\right..$ Find $P(Y\leq X^2).$

SOLUTION: The random point $(X,Y)$ always lies in the unit square. Our set of interest is shown in red below.
We need to integrate the density over this set. In other words, we are trying to find the volume of the tent with the density as its roof and the red region as its floor. This may be computed as follows: $$\int_0^1 \left[\int_0^{x^2} (x+y)\, dy\right] dx = \int_0^1 \left[ xy+\frac 12y^2 \right]_0^{x^2} \, dx=\int_0^1 x^3+\frac 12x^4\, dx = \frac 14 + \frac{1}{10} = \frac{7}{20}.$$ We could have done it the other way around, too: $$\int_0^1 \left[\int_{\sqrt y}^1 (x+y)\, dx\right] dy = \cdots.$$ This should also lead to the same answer (check!). ■
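Here is a small R sketch (illustrative only) that checks the answer $\frac{7}{20}=0.35$ by doing the iterated integral numerically:

    # P(Y <= X^2) for the density x+y on the unit square; expected 7/20 = 0.35
    inner <- function(x) sapply(x, function(xx)
      integrate(function(y) xx + y, 0, xx^2)$value)
    integrate(inner, 0, 1)$value   # 0.35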

Problem set 4

EXERCISE 11: Let $(X,Y)$ have joint density $f(x,y)=\left\{\begin{array}{ll}cxy&\text{if }x,y\in[0,1],\,x\leq y\\ 0&\text{otherwise.}\end{array}\right..$ Find $P(Y< \sqrt{X}).$

EXERCISE 12: Let $(X,Y)$ have joint density $f(x,y)=\left\{\begin{array}{ll}c(x+y)&\text{if }x,y\in[0,1]\\ 0&\text{otherwise.}\end{array}\right..$ Find $P\left(Y< \frac 12\right).$

EXERCISE 13: Let $X,Y$ be IID $Unif(0,1).$ Find $P(X^2\leq Y \leq X).$

EXERCISE 14: If $(X,Y)$ has joint density $e^{-(x+y)}$ for $x,y>0,$ (and 0 else), then find $P(X^2+Y^2<1).$ Leave the answer in terms of a single-variable integral.

Joint CDF

Video for this section

We have already learned the definition of joint CDF in the last semester:

Definition: CDF If $X,Y$ are jointly distributed random variables, then their joint cumulative distribution function is defined as $F:{\mathbb R}^2\rightarrow[0,1]$, where $$F(x,y) = P(X\leq x,\, Y\leq y).$$
This definition applies whether $X,Y$ are discrete or continuous, and whether or not they have a density.

Note that for any given $(x,y),$ the value of the CDF, $F(x,y)$ is the probability that the random point $(X,Y)$ lies in the infinite rectangle lying south-west of $(x,y):$
Since the CDF is defined in terms of probability, we can compute it by geometry in simple cases, and by iterated integrals in more complicated cases.

EXAMPLE 8: Let $(X,Y)$ have uniform distribution over the unit square. Find its CDF, $F(x,y).$

SOLUTION: The values of $F(x,y)$ over certain regions of ${\mathbb R}^2$ should be clear, as shown below.
The unit square is shown in red
The red square is the floor of the tent. Since its area is 1, and the roof of the tent is flat and horizontal, the height must be $\frac 11=1$ to keep the total volume $1.$ So to find $F(x,y)$ for $(x,y)$ in the red region we just divide the area of the shaded rectangle by the area of the red square.
This gives $F(x,y) = xy.$

Similarly, if $(x,y)$ is in the blue region, we need to consider only the red part of the shaded rectangle.
This gives $y.$

Similar consideration shows $F(x,y) = x$ over the green part.

So the CDF is $$F(x,y) = \left\{\begin{array}{ll}xy&\text{if }0< x,y\leq 1\\ x&\text{if }0<x\leq 1, y>1\\ y&\text{if }0<y\leq 1, x>1\\ 0&\text{if }x\leq 0\mbox{ or } y\leq 0\\ 1&\text{if }x, y>1\\ \end{array}\right.. $$
The graph of the CDF

In this example we could avoid integration because the distribution was uniform. The next example is more general.

EXAMPLE 9: Let $(X,Y)$ have density $f(x,y)=x+y$ over the unit square. Find its CDF, $F(x,y).$

SOLUTION: The red-blue-green break up remains the same here as in the last example, as the support of the distribution is the unit square. The values (0 and 1) of the CDF over the white regions are also as before.

For $(x,y)$ in the red region, $$F(x,y) = \int_0^x\int_0^y (u+v)\, dv\,du = \int_0^x\left(uy+\frac 12y^2\right)du = \frac 12x^2y+\frac 12xy^2.$$ Similarly, work out the values for the blue and green regions. ■
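A quick numerical spot-check of this formula in R at one arbitrarily chosen point, say $(x,y)=(0.6,0.8)$ (illustrative only):

    # F(0.6, 0.8) should equal (x^2*y + x*y^2)/2 = 0.336
    x0 <- 0.6; y0 <- 0.8
    inner <- function(u) sapply(u, function(uu)
      integrate(function(v) uu + v, 0, y0)$value)
    integrate(inner, 0, x0)$value      # numerical CDF value
    (x0^2 * y0 + x0 * y0^2) / 2        # closed form: 0.336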

Problem set 5

EXERCISE 15: Compute the remaining parts of the CDF in the example above.

EXERCISE 16: Find the CDF of $(X,Y)$ if the joint density is $f(x,y) = \left\{\begin{array}{ll}e^{-x-y}&\text{if }x,y>0\\ 0&\text{otherwise.}\end{array}\right.$

Joint density from CDF

Video for this section

Finding the CDF from the density requires quite a bit of effort. But going the other way around is a lot easier.

Suppose that you are given a CDF, $F(x,y)$ for a distribution having a density. Then let $$f(x,y) = \frac{\partial^2}{\partial x\partial y} F(x,y)=\frac{\partial^2}{\partial y\partial x} F(x,y).$$ For $(x,y)$ where the partial derivatives fail to exist, set $f(x,y) = 0$ (or any arbitrary non-negative value). This $f(x,y)$ will be a density for CDF $F(x,y).$

EXAMPLE 10:  Let our CDF be $$F(x,y) = \left\{\begin{array}{ll}xy&\text{if }0< x,y\leq 1\\ x&\text{if }0<x\leq 1, y>1\\ y&\text{if }0<y\leq 1, x>1\\ 0&\text{if }x\leq 0\mbox{ or } y\leq 0\\ 1&\text{if }x, y>1\\ \end{array}\right.. $$ You are told that there is a density corresponding to it. Find one such density.

SOLUTION: Since we are about to differentiate wrt both $x$ and $y,$ the parts of $F(x,y)$ that do not involve both the variables must vanish. So we need to work with only the $xy$ part, which after the two differentiations would yield $1.$ So a density is $f(x,y) = \left\{\begin{array}{ll}1&\text{if }0<x,y<1\\ 0&\text{otherwise.}\end{array}\right. $ ■
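As a sanity check, base R's symbolic D() applied twice to the $xy$ piece indeed returns the constant density $1$ (illustrative only):

    D(D(expression(x * y), "x"), "y")   # 1, the uniform density on the unit square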

Problem set 6

EXERCISE 17: Find the joint CDF of $(X,Y)$ if $X\sim Bern(1/2)$ and $Y\sim Unif(0,1)$ and they are independent.

EXERCISE 18: Let $F(x,y)=\min\{x,y\}$ for $0\leq x,y\leq 1$ be the joint CDF of $(X,Y).$ Find $P\left(X\leq \frac 12, Y\leq \frac 12\right).$

EXERCISE 19: If $(X,Y)$ have joint density $c(x^2+y)$ over the unit square, then find the joint CDF.

Properties of joint distribution: Non-decreasing

Video for this section

Theorem Let $F(x,y)$ be a bivariate CDF. Then
  1. for each fixed value of $y$, the function $x\mapsto F(x,y)$ is non-decreasing.
  2. for each fixed value of $x$, the function $y\mapsto F(x,y)$ is non-decreasing.

Proof: Fix any $y.$ Fix any $x_1 < x_2.$ Then $F(x_2,y)-F(x_1,y) = P(X\leq x_2, Y\leq y)-P(X\leq x_1, Y\leq y)=P(x_1< X\leq x_2, Y\leq y)\geq0.$

Hence the first result. Similarly for the other. [QED]

Theorem Let $F(x,y)$ be a bivariate CDF. Then $\forall x,y\in{\mathbb R}~~\forall a,b \geq 0~~ F(x,y)-F(x,y-b)-F(x-a,y)+F(x-a,y-b)\geq 0.$

Proof: Let $A = \{x-a < X \leq x,\, y-b < Y \leq y\}$, $B = \{x-a < X \leq x,\, Y \leq y-b\}$, $C = \{ X \leq x-a,\, y-b < Y \leq y\}$, and $D = \{ X \leq x-a,\, Y \leq y-b\}.$

Then $$\begin{eqnarray*} F(x,y) & = & P(A)+P(B)+P(C)+P(D),\\ F(x,y-b) & = & P(B)+P(D),\\ F(x-a,y) & = & P(C)+P(D),\\ F(x-a,y-b) & = & P(D). \end{eqnarray*}$$ So $F(x,y)-F(x-a,y)-F(x,y-b)+F(x-a,y-b)=P(A)\geq 0.$ [QED]

This property is stronger than the non-decreasing properties mentioned earlier.

Problem set 7

EXERCISE 20: Let $F(x,y)$ be CDF of $(X,Y).$ Then express $$\lim_{a,b\rightarrow0+} (F(x,y)-F(x-a,y)-F(x,y-b)+F(x-a,y-b))$$ as the probability of some familiar event.

EXERCISE 21: For a univariate CDF $F(x)$, the non-decreasing property was $\forall x\in{\mathbb R}~~\forall a>0~~F(x)-F(x-a)\geq 0.$ The proof was to note that this is $P(X\in(x-a,x]).$

For bivariate CDF $F(x,y)$ the non-decreasing property is $\forall x,y\in{\mathbb R}~~\forall a,b \geq 0~~ F(x,y)-F(x,y-b)-F(x-a,y)+F(x-a,y-b)\geq 0.$

The proof is to equate the left hand side to $P((X,Y)\in(x-a,x]\times(y-b,y]).$

Generalise this for trivariate CDFs. Drawing a picture would help. Remember the inclusion-exclusion principle.

Properties of joint distribution: Limits at $\pm\infty$, right continuity

Video for this section

Theorem Let $F(x,y)$ be a bivariate CDF. Then
  1. as $\min\{x,y\}\rightarrow \infty$, we have $F(x,y)\rightarrow 1$.
  2. as $\min\{x,y\}\rightarrow -\infty$, we have $F(x,y)\rightarrow 0$.

Proof: To show $$\forall \epsilon>0~~\exists M\in{\mathbb R}~~\forall x,y~~(\min\{x,y\}>M\Rightarrow F(x,y)>1-\epsilon).$$

Take any $\epsilon>0.$

Let $A_n\subseteq\Omega$ be defined as $A_n=\{X\leq n,\, Y\leq n\}.$

Then $A_n$'s increase and $\cup_n A_n = \Omega.$

So $P(A_n)\rightarrow 1.$ i.e., $F(n,n)\rightarrow 1$ as $n\rightarrow \infty.$

Hence $\exists M\in{\mathbb N}~~F(M,M)> 1-\epsilon.$

Choose this $M.$

Take any $x,y$ with $\min\{x,y\} > M.$

Then $F(x,y) \geq F(M,y) \geq F(M,M) > 1-\epsilon,$ as required.

This completes the proof of the first result.

The second result has a similar proof. [QED]

Theorem If $F(x,y)$ is the CDF of some $(X,Y)$, then $F$ is "north-east continuous" i.e., at each $(a,b)\in{\mathbb R}^2$ if $x_n\downarrow a$ and $y_n\downarrow b$, then $F(x_n,y_n)\rightarrow F(a,b).$

Proof: Let $A_n=\{X\leq x_n,\, Y\leq y_n\}$ and $A=\{X\leq a,\, Y\leq b\}.$

Since $x_n\downarrow a$ and $y_n\downarrow b$, we have $A_n\downarrow A.$

Hence the theorem follows by continuity of probability. [QED]

Problem set 8

EXERCISE 22: Let $(X,Y)$ have joint CDF $F(x,y).$ Let $x_n\uparrow a$ and $y_n\uparrow b.$ Then is it true that $F(x_n,y_n)\uparrow F(a,b)$?

EXERCISE 23: Let $(X,Y)$ have joint CDF $F(x,y).$ Find $\lim_{n\rightarrow \infty} F(x_n,y_n)$ in each of the following cases. Express the limit as the probability of some event in terms of $X,Y$, whenever possible.

  1. $x_n\rightarrow \infty, y_n\rightarrow \infty.$
  2. $x_n\rightarrow \infty, y_n\rightarrow -\infty.$
  3. $x_n\rightarrow -\infty, y_n\rightarrow -\infty.$
  4. $x_n\rightarrow -\infty, y_n\rightarrow \infty.$
  5. $x_n\equiv a, y_n\rightarrow \infty.$
  6. $x_n\equiv a, y_n\rightarrow -\infty.$
  7. $x_n\rightarrow \infty, y_n\equiv b.$
  8. $x_n\rightarrow -\infty, y_n\equiv b.$
  9. $x_n\uparrow a, y_n\uparrow b.$
  10. $x_n\downarrow a, y_n\uparrow b.$
  11. $x_n\uparrow a, y_n\downarrow b.$
  12. $x_n\downarrow a, y_n\downarrow b.$

Univariate vs multivariate CDF

Video for this section

Let $X$ be a random variable with CDF $F.$ Then the following two statements are equivalent:
  1. $F$ is continuous everywhere.
  2. $\forall a\in{\mathbb R}~~P(X=a)=0.$
Consider the corresponding statements in the bivariate scenario.
Theorem Let $(X,Y)$ have joint CDF $F(x,y).$ Consider the statements
  1. $F$ is continuous everywhere.
  2. $\forall (a,b)\in{\mathbb R}^2~~P(X=a,\,Y=b)=0.$
Here the first statement implies the second statement, but the converse is not true in general.

Proof: Let $(a,b)\in{\mathbb R}^2$ and $a_n\uparrow a$ and $b_n\uparrow b.$ We have $$F(a,b)-F(a_n,b)-F(a,b_n)+F(a_n,b_n)=P(X\in(a_n,a],\, Y\in(b_n,b]).$$ As $n\rightarrow \infty,$ the left hand side tends to $0,$ since $F(x,y)$ is continuous at $(a,b).$ Also the events $\{X\in(a_n,a],\,Y\in(b_n,b]\}\downarrow \{X=a,\, Y=b\}.$

So we have $P(X=a,\, Y=b)=0,$ as required.

A counterexample for the converse is discussed in the exercise below. [QED]

Problem set 9

EXERCISE 24: Let $X\sim Bernoulli\left(\frac 12\right)$ and let $Y$ have density $$f(x)=\left\{\begin{array}{ll}1&\text{if }x\in[0,1]\\ 0&\text{otherwise.}\end{array}\right.$$ Let $X$ and $Y$ be independent random variables. Write down and sketch the CDFs $F_X(x)$ and $F_Y(y)$ of $X$ and $Y.$ Their joint CDF is $F(x,y)=P(X\leq x,\, Y\leq y) =P(X\leq x)P( Y\leq y) = F_X(x)F_Y(y).$ Find it and fill in the cells below with appropriate formulae for $F(x,y).$ One cell has already been filled in for you.

Is it continuous everywhere? What is $P(X=a,\,Y=b)$ for any given $(a,b)?$

Univariate CDFs are nondecreasing functions, and hence can have only countably many discontinuities.
[Because...]
You can put rationals in the gaps.
However, for bivariate or higher dimensional CDFs, the situation is drastically different.

EXERCISE 25:  There are different ways to approach a point in ${\mathbb R}^2.$ The following diagram shows some of them.

$(a,b)$ is the point at the centre.
In each case find $\lim_{(x,y)\rightarrow(a,b)} F(x,y).$ In each case the limit will be one of
$P(X < a,\, Y< b)$, $P(X \leq a,\, Y< b)$, $P(X < a,\, Y\leq b)$ and $P(X \leq a,\, Y\leq b).$

EXERCISE 26: (Continuation of the last exercise) In exactly three of the cases above we must have $\lim_{(x,y)\rightarrow(a,b)} F(x,y) = F(a,b).$ Which three?

EXERCISE 27: (Continuation of the last exercise) Argue that $F(x,y)$ is discontinuous at $(a,b)$ if and only if $P(X < a,\, Y< b) < P(X \leq a,\, Y\leq b).$

EXERCISE 28: (Continuation of the last exercise) Argue that $F(x,y)$ is discontinuous at $(a,b)$ if and only if $P(X \leq a,\, Y= b \mbox{ or }X = a,\, Y\leq b)>0.$

EXERCISE 29: (Continuation of the last exercise) Sketch the set $\{X \leq a,\, Y= b \mbox{ or }X = a,\, Y\leq b\}$ in the $XY$-plane for $(a,b) = (1,2)$ and also for $(a,b) = (1,3).$ Argue that either $F(x,y)$ has no discontinuity, or has uncountably many discontinuities.

Marginals

Video for this section

We can find the distribution of $X$ and $Y$ separately given the joint distribution of $(X,Y).$ The distributions of $X$ and $Y$ separately are called their marginal distributions. The term "marginal" is actually redundant. "Distribution of $X$ " is the same as "marginal distribution of $X$ ". The term is used just to contrast with "joint".
Theorem Let the joint CDF of $(X,Y)$ be $F(x,y).$ Let the marginal CDFs of $X$ and $Y$ be, respectively, $F_X(x)$ and $F_Y(y).$ Then $$F_X(x) = \lim_{y\rightarrow\infty} F(x,y)\quad\text{and}\quad F_Y(y) = \lim_{x\rightarrow\infty} F(x,y).$$

Proof: The event $\{X\leq x,\,Y\leq y\}$ increases to $\{X\leq x\}$ as $y\rightarrow \infty$ and to $\{Y\leq y\}$ as $x\rightarrow \infty.$

Applying continuity of probability, we get the result. [QED]

If $(X,Y)$ has a joint density, then we can obtain (marginal) densities of $X$ and $Y$ as follows.
Theorem If $(X,Y)$ has a joint density $f(x,y)$, then a marginal density of $X$ is given by $$f_X(x) = \int_{-\infty}^\infty f(x,y)\, dy$$ and a marginal density of $Y$ by $$f_Y(y) = \int_{-\infty}^\infty f(x,y)\, dx$$ provided these are continuous and $\forall x\in{\mathbb R}~~\int_{-\infty}^x f_X(t)\, dt = F_X(x)$ and $\forall y\in{\mathbb R}~~\int_{-\infty}^y f_Y(t)\, dt = F_Y(y).$

Proof: Enough to show that $\forall a\leq b\in{\mathbb R}~~P(a\leq X\leq b) = \int_a^b f_X(x)\, dx.$

Take any $a\leq b\in{\mathbb R}.$

Then $$P(a\leq X\leq b) = P(a\leq X\leq b,\, -\infty < Y < \infty) = \int_a^b \int_{-\infty}^\infty f(x,y)\, dy\, dx = \int_a^b f_X(x)\, dx,$$ as required.

Similarly for $f_Y(y).$ [QED]
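As an illustration (not part of the proof), here is a small R sketch computing the marginal density of $X$ numerically when $f(x,y)=x+y$ on the unit square; the answer should be $f_X(x)=x+\frac 12$:

    # Marginal density of X for f(x,y) = x + y on the unit square
    f_X <- function(x) sapply(x, function(xx)
      integrate(function(y) xx + y, 0, 1)$value)
    f_X(c(0.2, 0.5, 0.9))        # 0.7 1.0 1.4, i.e. x + 1/2
    integrate(f_X, 0, 1)$value   # 1, as a density should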

Problem set 10


EXERCISE 30: [hpsjoint4.png]


EXERCISE 31: [hpsjoint5.png]


EXERCISE 32: [hpsjoint6.png]

A subtle difference between joint PDF and PMF

Video for this section

Note that if $X$ and $Y$ are jointly distributed discrete random variables, then immediately we are assured of having their joint PMF. But not so in the case of densities. Even if $X$ and $Y$ each has its own density, still $(X,Y)$ may fail to have a joint density.

EXAMPLE 11: Suppose $X$ has density $f(x)=\left\{\begin{array}{ll}1&\text{if }x\in(0,1)\\ 0&\text{otherwise.}\end{array}\right.$ and $Y = X.$ Then show that $(X,Y)$ does not have a joint density.

SOLUTION: Here the CDF of $(X,Y)$ is $$ F(x,y)=P(X\leq x,\, Y\leq y) = P(X\leq\min\{x,y\}) = \left\{\begin{array}{ll}0&\text{if }\min\{x,y\}<0\\ \min\{x,y\}&\text{if }0\leq \min\{x,y\} < 1\\ 1&\text{if }\min\{x,y\} \geq 1\\\end{array}\right.. $$ Hence, if $(X,Y)$ indeed had a joint density, then a joint density would be given by $f(x,y)$, where $$f(x,y) = \frac{\partial^2}{\partial x\partial y} F(x,y).$$ But this mixed partial derivative is $0$ everywhere off the line $x=y$ (and it does not exist on that line), so $f$ would integrate to $0$ rather than $1.$ Hence $(X,Y)$ cannot have a joint density. ■

However, if $(X,Y)$ has a joint density, then both $X$ and $Y$ must also have (marginal) densities.

Problem set 11

EXERCISE 33: If $X$ has density as above, then does $(X,X^2)$ have a joint density?

EXERCISE 34: Does there exist a CDF such that if $X$ has that CDF, then $(X,X)$ has a joint density?

Independence

Video for this section

We already know the following general definition of jointly distributed random variables being independent:
Definition: Independence Let $X_1,...,X_n$ be jointly distributed random variables. We say they are (mutually) independent if for all $\{i_1,...,i_k\}\subseteq \{1,...,n\}$ and any $B_1,...,B_k\subseteq{\mathbb R}$ we have $$P(X_{i_1}\in B_1, ..., X_{i_k}\in B_k) = P(X_{i_1}\in B_1)\times\cdots\times P( X_{i_k}\in B_k).$$
Incidentally, it is not enough to have $P(X_i\in B_i, X_j\in B_j) = P(X_i\in B_i)P( X_j\in B_j)$ for all $i\neq j.$ If only this holds, then we call $X_1,...,X_n$ only pairwise independent, which is weaker than mutual independence.

So in particular if $X,Y$ are independent, then $$\forall x,y\in{\mathbb R}~~P(X\leq x,\, Y\leq y) = P(X\leq x)\times P(Y\leq y).$$ In other words, the joint CDF factors into the marginal CDFs: $$\forall x,y\in{\mathbb R}~~F(x,y) = F_X(x)F_Y(y).$$ We had mentioned last semester that CDF characterises the entire distribution (i.e., if we know the probabilities of all events of the form $\{X\leq x\},$ then we can work out $P(X\in B)$ for every event $B$). So the next theorem is anticipated.
Theorem Two jointly distributed random variables $X,Y$ are independent if and only if $$\forall x,y\in{\mathbb R}~~F(x,y) = F_X(x)F_Y(y).$$
This is the general case. Now, if there is a joint density, then that can be factored into marginal densities, as well:
Theorem Two jointly distributed random variables $X,Y$ having joint density $f(x,y)$ are independent if and only if $$\forall x,y\in{\mathbb R}~~f(x,y) = f_X(x)f_Y(y),$$ for some marginal densities $f_X$ and $f_Y.$

Proof: If part: For any $x,y\in{\mathbb R}$ we have $$F(x,y) = P(X\leq x,\,Y\leq y) = \int_{-\infty}^y\int_{-\infty}^x f(u,v)\, du\,dv =\int_{-\infty}^y\int_{-\infty}^x f_X(u)f_Y(v)\, du\,dv = \left[\int_{-\infty}^xf_X(u)\,du\right]\times\left[\int_{-\infty}^y f_Y(v)\,dv\right] = F_X(x)F_Y(y).$$

Only if part: Let $X,Y$ be independent. Let $f_X$ and $f_Y$ be densities for $X$ and $Y.$ Then for any $[a,b]$ and $[c,d]$ we have $$\int_a^b\int_c^d f_X(x)f_Y(y)\,dy\,dx =\int_a^b f_X(x) \, dx \int_c^d f_Y(y)\,dy = P(X\in[a,b])P(Y\in[c,d]) = P(X\in[a,b],\,Y\in[c,d]).$$ Hence $f_X(x)f_Y(y)$ is a joint density for $(X,Y).$ [QED]

As in the discrete case, here also we have the result that if $X,Y$ are independent, and $E(X), E(Y)$ are finite, then $E(XY)$ is finite and equals $E(X)E(Y).$ The proof is straightforward using factorisation of the joint density.

Problem set 12

EXERCISE 35: We toss two fair coins independently, and define 3 random variables $X,Y,Z$ based on the outcomes as follows. $X=1$ or $0$ according as the first toss shows head or not. Similarly, $Y=1$ or $0$ according as the second toss shows head or not. $Z=X$ if $Y=1$, else $Z=1-X.$ Show that $X,Y,Z$ are pairwise independent, but not mutually independent.

EXERCISE 36: If two independent random variables $X,Y$ have marginal densities $f(t) = \lambda e^{-\lambda t}$ for $t>0$ (and 0 else), then find the joint density of $(X,Y).$

EXERCISE 37: $(X,Y)$ is distributed uniformly over the unit disc in ${\mathbb R}^2.$ Are $X,Y$ independent?

EXERCISE 38: If the joint density of $(X,Y)$ is of the form $f(x)g(y),$ then show that $X$ and $Y$ must be independent. Also show that $f_X\propto f$ and $f_Y\propto g.$

EXERCISE 39: If $(X,Y)$ are independent, then is it true that the joint CDF is the product of the marginal CDFs?

EXERCISE 40: Let $(X,Y)$ have joint density $f(x,y) = g(x) h(y),$ where $g(\cdot)$ and $h(\cdot)$ are not necessarily density functions. Find marginal densities of $X$ and $Y$, and show that $X$ and $Y$ must be independent.

Expectation using joint density

Video for this section

Suppose $X,Y$ are jointly distributed. We often need to find the expectation of $h(X,Y)$ for some given function $h(x,y).$ For this we can of course employ the definition by first defining a new random variable, $Z=h(X,Y)$. This will take us back to the univariate set up, that we already know to handle. But a simpler alternative exists.
Theorem Let $(X,Y)$ have joint density $f(x,y).$ If $h(X,Y)$ is a non-negative random variable, then $$E(h(X,Y)) = \int_{-\infty}^\infty\int_{-\infty}^\infty h(x,y)f(x,y)\, dx\,dy.$$ This always exists (though may be $\infty$).
This theorem is the obvious generalisation of the univariate density case, and, like that one, will be proved once we learn about the Lebesgue integral later in this course. If $h(X,Y)$ can take both positive and negative values, then we proceed in the usual way.
Theorem Let $(X,Y)$ have joint density $f(x,y).$ Let $h(X,Y)$ be a random variable. Let $$\begin{eqnarray*} h(x,y)_+ & = & \max\{h(x,y),0\},\\ h(x,y)_- & = & \max\{-h(x,y),0\}. \end{eqnarray*}$$ Then $$E(h(X,Y)) = E(h(X,Y)_+)-E(h(X,Y)_-),$$ unless both the expectations on the rhs are $\infty$, in which case $E(h(X,Y))$ is undefined.

Proof: This follows immediately from the general definition of expectation. [QED]

Again, as in the univariate density case, we have a simpler formula for the special case where $E(|h(X,Y)|) < \infty.$
Theorem Let $(X,Y)$ have joint density $f(x,y).$ If $h(X,Y)$ is a random variable with $E(|h(X,Y)|) < \infty,$ then $$E(h(X,Y)) = \int_{-\infty}^\infty\int_{-\infty}^\infty h(x,y)f(x,y)\, dx\,dy.$$ This must be finite.

We have the definition of covariance and correlation as before: $$cov(X,Y) = E[(X-E(X))(Y-E(Y))] = E(XY)-E(X)E(Y),$$ and $$cor(X,Y) = \frac{cov(X,Y)}{\sqrt{V(X)V(Y)}}.$$ The Cauchy-Schwarz inequality is also the same: $$cov(X,Y)^2\leq V(X) V(Y),$$ where equality holds if and only if $V(aX+bY) = 0$ for some $a,b\in{\mathbb R}$ not both zero. The proof that we showed in Probability I was general. An immediate consequence is that $-1\leq cor(X,Y) \leq 1.$ Also $cor(X,Y)=1$ if and only if $V(Y-aX) = 0$ for some $a>0.$ Similarly, $cor(X,Y)=-1$ holds if and only if $V(Y-aX) = 0$ for some $a < 0.$ Check if you remember the proofs.
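As an illustration of these formulas, here is a small R sketch (illustrative only) that computes $cov(X,Y)$ for the density $f(x,y)=x+y$ on the unit square via iterated integrals; the exact value is $-\frac{1}{144}$:

    # E(h(X,Y)) = double integral of h(x,y) f(x,y), here with f(x,y) = x + y on the unit square
    Eh <- function(h) {
      inner <- function(x) sapply(x, function(xx)
        integrate(function(y) h(xx, y) * (xx + y), 0, 1)$value)
      integrate(inner, 0, 1)$value
    }
    EX  <- Eh(function(x, y) x)       # 7/12
    EY  <- Eh(function(x, y) y)       # 7/12
    EXY <- Eh(function(x, y) x * y)   # 1/3
    EXY - EX * EY                     # cov(X,Y) = -1/144 ~ -0.0069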

Problem set 13

EXERCISE 41:  Let $(X,Y)$ be uniformly distributed over $S=\{(x,y)~:~0\leq x\leq 1,~x\geq y\geq0\}.$

  1. Sketch the set $S$ as a shaded subset of ${\mathbb R}^2.$
  2. Write down a joint density for $(X,Y).$
  3. Evaluate $cov(X,Y).$

EXERCISE 42:  Find $E(X^2Y)$ when $(X,Y)$ has joint density $$f(x,y) = \left\{\begin{array}{ll}x+y&\text{if }0< x,y < 1\\ 0&\text{otherwise.}\end{array}\right.$$

Expectation of product of independent random variables

Video for this section

Last semester we had worked with simple random variables, and had seen the result that if $X,Y$ are independent simple random variables, then $E(XY)=E(X)E(Y).$ This result is useful and can be generalised to other random variables, as well. However, for a general random variable expectation may be infinite or undefined. In order to avoid $\infty\times 0$ situations, we restrict ourselves to only the random variables with finite expectations. Then we have the following theorem.
TheoremIf $X,Y$ are independent, and $E(X)$, $E(Y)$ both are finite, then $E(XY)$ must also be finite, and $E(XY)=E(X)E(Y).$

Proof: While the theorem holds for general random variables, we shall prove it here only for the case when $X,Y$ both have densities, $f_X(x)$ and $f_Y(y)$, say.

Since $X,Y$ are independent, hence $(X,Y)$ must have joint density $f(x,y) = f_X(x)f_Y(y).$

We have to work with expectation of $XY$, which may take both positive and negative values. So we cannot immediately apply the integration formula for expectation.

Here this result will come to our help: for a random variable $Z$ we have $E(Z)$ finite if and only if $E(|Z|)< \infty.$ We had seen its proof in Probability I (easy proof: $Z=Z_+-Z_-$ and $|Z|=Z_++Z_-$). The advantage of working with $E(|Z|)$ instead of $E(Z)$ is that $|Z|$ is non-negative, and hence always has a well-defined expectation (though possibly $\infty$).

We shall first show that $E(|XY|)< \infty,$ which will show that $E(XY)$ is finite, and will let us apply the integration formula.

Since $|XY|$ is non-negative, hence we can use the integration formula for it: $$\begin{eqnarray*} E(|XY|) & = & \int_{-\infty}^\infty\int_{-\infty}^\infty |xy| f(x,y)\, dx\, dy\\ & = & \int_{-\infty}^\infty\int_{-\infty}^\infty |xy| f_X(x)f_Y(y)\, dx\, dy\\ & = & \int_{-\infty}^\infty |x|f_X(x)\,dx\int_{-\infty}^\infty |y| f_Y(y)\, dy\\ & = & E(|X|)E(|Y|)<\infty. \end{eqnarray*}$$

Now we are entitled to use the integration formula for $E(XY),$ and so the same logic as above gives $$\begin{eqnarray*} E(XY) & = & \int_{-\infty}^\infty\int_{-\infty}^\infty xy f(x,y)\, dx\, dy\\ & = & \int_{-\infty}^\infty\int_{-\infty}^\infty xyf_X(x)f_Y(y)\, dx\, dy\\ & = & \int_{-\infty}^\infty xf_X(x)\,dx\int_{-\infty}^\infty y f_Y(y)\, dy\\ & = & E(X)E(Y). \end{eqnarray*}$$ [QED]

Two points to be noted about this theorem:
  1. The converse is not true. We have seen counterexamples even in the case of simple random variables.
  2. If $X,Y$ are jointly distributed, and both $E(X)$ and $E(Y)$ are finite, even then $E(XY)$ may fail to be finite, or may even fail to exist. The exercises below give some counterexamples.
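Here is a quick Monte Carlo illustration of the theorem in R (the particular distributions are arbitrary choices, not part of the text):

    # E(XY) vs E(X)E(Y) for independent X ~ Exp(2) and Y ~ Unif(0,1)
    set.seed(1)
    x <- rexp(1e6, rate = 2)
    y <- runif(1e6)
    mean(x * y)        # ~ 0.25
    mean(x) * mean(y)  # ~ 0.25 = (1/2) * (1/2)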

Problem set 14

EXERCISE 43: Real analysis tells us that $\sum \frac{1}{n^3} < \infty.$ Call this $c.$ We can manufacture the following PMF from this: $$p(x) = \left\{\begin{array}{ll}\frac{1}{c x^3}&\text{if }x\in{\mathbb N}\\ 0&\text{otherwise.}\end{array}\right.$$ Let $X$ be a random variable with this PMF. Let $Y = X.$ Show that $E(X), E(Y)$ are both finite, though $E(XY)$ is not finite.

EXERCISE 44: Play with the last exercise to come up with a counterexample where $E(X), E(Y)$ are both finite, but $E(XY)$ does not exist.

[Hint]

Let $X$ take value $(-1)^n n$ with probability $\frac{1}{cn^3}$ where $c = \sum \frac{1}{n^3} < \infty.$ Take $Y = |X|.$

EXERCISE 45:  Let $X$ have density $f(x) =\left\{\begin{array}{ll}\frac 12&\text{if }|x| < 1\\ 0&\text{otherwise.}\end{array}\right. $ Let $Y = X^2.$ Are $X,Y$ independent? Show that $E(XY) = E(X)E(Y).$

Expectation, variance of random vectors

Video for this section

If $X_1,...,X_n$ are jointly distributed random variables, then $\v X=(X_1,...,X_n)'$ is called a random vector, and is usually considered as an $n\times 1$ column vector. We define $E(\v X)$ as $$E(\v X) = \left[\begin{array}{ccccccccccc}E(X_1)\\\vdots\\E(X_n) \end{array}\right].$$ The motivation is again from statistical regularity. If you take many IID replications of $\v X$ and average them, then (under very general conditions) the average will converge to $E(\v X).$

The case of dispersion is slightly trickier. We shall start with the main definition and provide the motivation later.
Definition: Dispersion matrix Let $\v X$ be a random vector with components $X_1,...,X_n.$ Then its dispersion matrix or variance matrix or variance-covariance matrix is defined as the $n\times n$ symmetric matrix with $(i,j)$-th entry $cov(X_i,X_j).$ In particular, its $i$-th diagonal entry is $V(X_i).$
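Here is a small simulation sketch in R (with an arbitrary toy random vector) showing how the empirical covariance matrix of many IID replications approximates the dispersion matrix:

    # Empirical dispersion matrix of IID replications of (X1, X2)
    set.seed(42)
    x1 <- rnorm(1e5)
    x2 <- x1 + rnorm(1e5)    # so V(X1) = 1, V(X2) = 2, cov(X1, X2) = 1
    cov(cbind(x1, x2))       # close to the matrix with rows (1, 1) and (1, 2)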

Problem set 15

EXERCISE 46: If $X_1,...,X_n$ are IID with mean 2 and variance 5, and $\v X = (X_1,...,X_n)'$, then find $E(\v X)$ and $V(\v X).$

EXERCISE 47: If $X_1\sim Binom\left(10,\frac 13\right)$ and $X_2=10-X_1$ and $\v X = (X_1,X_2)'$ then find $E(\v X)$ and $V(\v X).$

EXERCISE 48: Let $\v X = (X_1,X_2,X_3)'$ have $$V(\v X) = \left[\begin{array}{ccccccccccc} 3 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 5 \end{array}\right].$$ Find $cor(X_1,X_3).$

EXERCISE 49: In the last problem, also find $V(X_1-3X_2)$ and $cov(X_1+X_2,X_3).$

Motivation behind the definition of dispersion matrix

Video for this section

To motivate the definition of the dispersion matrix, consider the bivariate case. Let the two components of our random vector be $X$ and $Y.$ If we take many IID replications, we get points like $(X_1,Y_1),...,(X_n,Y_n).$ Think of these as a scatterplot.
If you look at this cloud of points from position A, then the points appear more scattered than when we look from B. This is an interesting feature of multivariate dispersion: it depends on how you look at it. A good measure of dispersion should not depend on the direction you are looking from; rather it should capture the comprehensive picture, from which we should be able to work out the dispersion from any desired direction. To achieve this, imagine a ruler placed on the scatterplot with its 0 mark at the origin. Parallel rays of light are shining perpendicularly down on the ruler from both sides, casting shadows of the points on the ruler:
Light rays (shown in red) are dropping perpendicularly on the ruler
Then each bivariate point reduces to a single number along the scale, and we may compute variance of the numbers to measure the dispersion when looking from that particular direction. To quantify the placement of the ruler, imagine a unit vector $\v u$ along the ruler from its 0 mark (at the origin) reaching up to its 1 mark.
Projecting a typical point perpendicularly on the ruler
Then a point $\v v \equiv (X,Y)$ will project to the vector $$\frac{\v v'\v u}{\v u'\v u} \v u,$$ shown in blue. Measured in units of $\v u$, this will show up at the mark $\frac{\v v'\v u}{\v u'\v u}$ of the ruler.

Now, $\frac{\v v'\v u}{\v u'\v u} = pX+qY$ for some $p,q\in{\mathbb R}.$ Then the variance of $pX+qY$ will be $p^2 V(X)+q^2 V(Y) + 2pq\, cov(X,Y),$ which may be written as (check!) $$\left[\begin{array}{ccccccccccc}p & q \end{array}\right]\left[\begin{array}{ccccccccccc}V(X) & cov(X,Y)\\cov(X,Y) & V(Y) \end{array}\right]\left[\begin{array}{ccccccccccc}p\\q \end{array}\right].$$ Here $p,q$ are controlled by the ruler. Notice that the matrix in the middle does not involve $p,q.$ Thus, it contains information about dispersion for every possible way of placing the ruler. This matrix is indeed the dispersion matrix we defined above.
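Here is a small R sketch (with an arbitrary toy scatterplot) verifying that the variance of the projected points is exactly the quadratic form $\v u'\Sigma\v u$ in the (empirical) dispersion matrix:

    # Variance along the ruler direction u equals u' Sigma u
    set.seed(7)
    X <- rnorm(1e5); Y <- 0.5 * X + rnorm(1e5)
    Sigma <- cov(cbind(X, Y))
    u <- c(cos(pi/6), sin(pi/6))      # unit vector along the ruler
    var(u[1] * X + u[2] * Y)          # variance of the projected points
    drop(t(u) %*% Sigma %*% u)        # same number from the dispersion matrix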

Here we talked about 1-dim projection of 2-dim. In general, we do 1-dim projection of $n$-dim data. To see this visually on a computer screen you may like to consider 2-dim projection of 3-dim data. Run this R code on this data set to get a 3D scatterplot that you can turn with your mouse. Depending on how you turn it, the points will appear to be closely/loosely clustered.

Problem set 16

EXERCISE 50: Consider the toy bivariate data set $(1, 2), (3, 4), (2.1, 3.1), (4, 5).$ Draw the scatterplot. Imagine that we are looking down as shown. Guess the variance as seen from that direction. Check your guess by actual computation.

EXERCISE 51: Let the dispersion matrix of $(X,Y)$ be $\left[\begin{array}{ccccccccccc}1 & 0\\0 & 2 \end{array}\right]$. Find $\theta\in [0,\pi)$ such that $V(\cos (\theta) X + \sin(\theta) Y)$ is maximum. When is the variance minimum?

Properties of expectation, variance of random vectors

Video for this section

The following facts are immediate from the definition.
Theorem If $E(\v X)=\v\mu$ and $V(\v X) = \Sigma,$ then for any matrix $A_{m\times n}$ and any vector $\v b_{m\times 1}$ we have
  1. $E(A\v X+\v b) = A\v\mu+\v b$,
  2. $V(A\v X+\v b) = A\Sigma A'$

Proof: Let $\v Y = A\v X+\v b.$ Then $\v Y = (Y_1,...,Y_m)'$, where $Y_i = \sum_j a_{ij}X_j + b_i.$ Here I have denoted the $(i,j)$-th entry of $A$ by $a_{ij}.$

Now compute $E(Y_i)$ and $cov(Y_i,Y_j)$ directly to get the result.

By the way, the $(i,j)$-th entry of $A\Sigma A'$ is $\sum_r\sum_s a_{ir} \sigma_{rs} a_{js}.$ [QED]

Theorem Any dispersion matrix is NND. In other words, if $\Sigma_{n\times n}=V(\v X)$, then $\forall \v\ell\in{\mathbb R}^n~~\v\ell'\Sigma\v\ell\geq 0.$

Proof: By the last theorem, $\v\ell'\Sigma\v\ell=V(\v\ell'\v X)\geq 0.$ [QED]

The converse is also true:
Theorem If $\Sigma_{n\times n}$ is any NND matrix, then it is $V(\v X)$ for some random vector $\v X_{n\times 1}.$

Proof: Let $U_1,...,U_n$ be independent with $\forall i~~V(U_i)=1.$ Then $\v U= (U_1,...,U_n)'$ has $V(\v U) = I_n.$

Since $\Sigma$ is NND, hence $\Sigma = AA'$ for some $A_{n\times n}.$

Let $\v X = A\v U.$ Then $V(\v X) = A I_n A' = AA' = \Sigma.$ [QED]
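This construction is easy to carry out on a computer. Here is an R sketch (with an arbitrarily chosen NND matrix) that uses the Cholesky factorisation to get $\Sigma=AA'$:

    # Manufacturing a random vector with a prescribed dispersion matrix Sigma
    Sigma <- matrix(c(4, 2,
                      2, 3), nrow = 2, byrow = TRUE)
    A <- t(chol(Sigma))                    # Sigma = A %*% t(A)
    U <- matrix(rnorm(2 * 1e5), nrow = 2)  # V(U) = I_2
    X <- A %*% U                           # V(X) = A I A' = Sigma
    cov(t(X))                              # empirical check, close to Sigma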

Problem set 17

EXERCISE 52: Show that for $\left[\begin{array}{ccccccccccc}a & b\\b & c \end{array}\right]$ to be a dispersion matrix, a necessary condition is that $b^2\leq ac.$ Is it a sufficient condition?

EXERCISE 53: Show that $V(\v X)$ is singular if and only if $P(a_1X_1+\cdots+a_n X_n=c)=1$ for some constants $a_i$'s and $c$ such that not all $a_i$'s are zero.

EXERCISE 54: If $\v X=(X,Y)'$ and $V(\v X)$ is singular, then what will a scatterplot of replications from $\v X$ look like? Here we are running the random experiment underlying $\v X$ repeatedly, getting $(X_1,Y_1), (X_2,Y_2),...,(X_n,Y_n),$ and plotting these $n$ points as a scatterplot. Your job is to identify some geometric pattern in the plot.

Conditional density (intuition)

Video for this section

So far distributions with densities behave very similarly to the discrete distributions, with integration replacing summation. But we cannot follow the same path for conditional distribution. If $(X,Y)$ are jointly discrete then we defined the conditional PMF of $X$ given $Y=y$ as $x\mapsto P(X=x|Y=y) = \frac{P(X=x,Y=y)}{P(Y=y)},$ and we did this only for those $y$ for which $P(Y=y)>0.$

But if $(X,Y)$ has a joint density, then $P(Y=y)$ is always 0. So we employ a little trick. For a discrete random variable $X$ with PMF $f(x)$ we have $\forall a\in{\mathbb R}~~f(a) = P(X=a).$ But had $f(x)$ been a density, then we could not write this anymore, since $\forall a\in{\mathbb R}~~P(X=a)=0.$ However, if $f$ is continuous at $a$, then we could think $$P(X\approx a) = P\left(X\in\left(a-\frac \epsilon2,a+\frac \epsilon2\right) \right) \approx f(a) \epsilon.\hspace{1in} \mbox{(*)}$$ Similarly, if $(X,Y)$ has joint density $f(x,y)$, which is continuous at $(a,b)$ we can say $$\begin{eqnarray*} P(X\approx a,\, Y\approx b) & = & P\left(X\in \left(a-\frac \epsilon2, a+\frac \epsilon2\right) ,\, Y\in \left(b-\frac \epsilon2,b+\frac \epsilon2\right) \right)\\ & \approx & f(a,b)\epsilon^2. \end{eqnarray*}$$ So instead of working with $P(X=a | Y=b)$ we shall work with $$P(X\approx a | Y\approx b) = \frac{P(X\approx a,\, Y\approx b)}{P(Y \approx b)} \approx \frac{f(a,b) \epsilon^2}{f_Y(b) \epsilon} = \frac{f(a,b)}{f_Y(b)} \epsilon.$$ The similarity between this and (*) immediately leads us to define the conditional density of $X$ given $Y=y$ as $$f_{X|Y}(x,y) =\frac{f(x,y)}{f_Y(y)}.$$ Since we are dividing by $f_Y(y)$, we naturally want $f_Y(y)>0.$ But $f_Y$, being a density, can be given an arbitrary non-negative value at any fixed $y.$ To uniquely specify $f_Y(y)$ we naturally assume continuity of $f_Y$ at that point. So we arrive at the following definition.
Definition: Conditional density Let $(X,Y)$ have joint density, $f(x,y).$ Then we define conditional density of $X$ given $Y=y$ as $$f_{X|Y}(x,y) =\frac{f(x,y)}{f_Y(y)}.$$ Here we are assuming that $y$ is such that $f_Y$ is continuous and positive at $y.$
It is obvious that this is a density, since it is non-negative, and $\int_{-\infty}^\infty f_{X|Y}(x,y)\, dx = \frac{\int_{-\infty}^\infty f(x,y)\, dx}{f_Y(y)}=1.$ The most glaring difference between conditional PDF and conditional PMF is that the conditional PDF is not a conditional probability, since $P(Y=y)=0.$ Due to the same reason, $\int_a^bf_{X|Y}(x,y)\, dx$ does not give $P(X\in [a,b]|Y=y),$ as $P(Y=y)=0.$
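Here is a small R sketch (with an arbitrarily chosen toy density, not one from the exercises) checking numerically that the conditional density indeed integrates to 1 in $x$:

    # f(x,y) = (2/3)(x + 2y) on the unit square; f_{X|Y}(x,y) = f(x,y)/f_Y(y)
    f  <- function(x, y) (2/3) * (x + 2 * y)
    fY <- function(y) sapply(y, function(yy)
      integrate(function(x) f(x, yy), 0, 1)$value)
    fXgivenY <- function(x, y) f(x, y) / fY(y)
    integrate(function(x) fXgivenY(x, 0.3), 0, 1)$value   # 1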

Problem set 18

EXERCISE 55: Find conditional density of $X$ given $Y=\frac 12$ if the joint density of $(X,Y)$ is

$$f(x,y) = \left\{\begin{array}{ll}x+y&\text{if }0< x,y < 1\\ 0&\text{otherwise.}\end{array}\right.$$

Conditional density (rigour)

Video for this section

We defined conditional density in a heuristic way. However, the theorem of total probability is still valid perfectly rigorously:

Total probability: $\int_c^d \int_a^bf_{X|Y}(x,y)f_Y(y)\, dxdy=P(X\in [a,b], Y\in[c,d]).$

Proof: This is obvious from the definition of $f_{X|Y}(x,y):$ $$\begin{eqnarray*} \int_c^d \int_a^bf_{X|Y}(x,y)f_Y(y)\, dx\,dy & = & \int_c^d \int_a^b\frac{f(x,y)}{f_Y(y)}f_Y(y)\, dx\,dy\\ & = & \int_c^d \int_a^b f(x,y)\,dx\,dy\\ & = & P(X\in [a,b], Y\in[c,d]).\end{eqnarray*}$$ [QED]

It is this theorem that justifies the definition of conditional PDF.

As in the discrete case, here also we have concepts of conditional expectation, conditional variance etc.

Definition: If $(X,Y)$ has a joint density $f(x,y),$ then $E(X|Y=y) = \int_{-\infty}^\infty x\,f_{X|Y}(x,y)\, dx$ and $$V(X|Y=y) = E((X-E(X|Y=y))^2|Y=y).$$
The tower property also works as before, as does the relation between conditional and unconditional variances:
Theorem If $(X,Y)$ has a joint density, then
  1. $E(X) = E(E(X|Y)).$
  2. $V(X) = E(V(X|Y)) + V(E(X|Y))$.

Proof: Enough to show the first, since the second follows from it (as we have already seen last semester).

Let $f(x,y)$ be a joint density of $(X,Y).$ Then $$E(X|Y=y) = \int_{-\infty}^\infty xf_{X|Y}(x,y)\, dx = \frac{\int_{-\infty}^\infty xf(x,y)\, dx}{f_Y(y)}.$$

So $$E(E(X|Y)) = \int_{-\infty}^\infty\frac{\int_{-\infty}^\infty xf(x,y)\, dx}{f_Y(y)}f_Y(y)\, dy = \int_{-\infty}^\infty\int_{-\infty}^\infty xf(x,y)\, dx\, dy =E(X),$$ as required. [QED]
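Here is a quick simulation illustration in R (the hierarchical model is an arbitrary toy choice): if $Y\sim Exp(1)$ and $X|Y=y\sim N(y,1)$, then the theorem predicts $E(X)=E(Y)=1$ and $V(X)=E(V(X|Y))+V(E(X|Y))=1+1=2.$

    # Simulation check of the tower property and the variance decomposition
    set.seed(3)
    y <- rexp(1e6)
    x <- rnorm(1e6, mean = y, sd = 1)
    mean(x)   # ~ 1
    var(x)    # ~ 2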

Problem set 19

EXERCISE 56: Let $(X,Y)$ be uniformly distributed over the triangle $\{(x,y)~:~0\leq x \leq y,\, 0\leq y\leq 1\}.$ What is a conditional density of $X$ given $Y=y$? First try to guess, and then check your guess from the definition.

EXERCISE 57: Let $X|Y=y$ have density $f_{X|Y}(x,y) = \left\{\begin{array}{ll}c_y x^2&\text{if }x\in[0,y]\\ 0&\text{otherwise.}\end{array}\right.$, where $c_y$ is free of $x.$ Let $Y$ be uniformly distributed over $[0,1]$. Find $f_{Y|X=x}(y,x).$

EXERCISE 58: If $(X,Y)$ has joint density $f(x,y)=\left\{\begin{array}{ll}x+y&\text{if }0\leq x,y\leq 1\\ 0&\text{otherwise.}\end{array}\right.,$ then find $E(X|Y=y)$ and $V(Y|X=x).$

Exchangeable distribution

If $X_1, X_2, X_3$ are IID, then the joint distribution of $(X_1,X_2,X_3)$ is the same as that of $(X_2,X_3,X_1)$ or $(X_1,X_3,X_2)$ or any other permutation of the three random variables. This "invariance under permutation" property is called exchangeability, and is found in many joint distributions other than the IID set up.

Definition: Exchangeable We say that the jointly distributed random variables $X_1,...,X_n$ are exchangeable if for any permutation $\pi$ of $(1,...,n)$ the joint distribution of $(X_1,...,X_n)$ is the same as that of $(X_{\pi(1)},...,X_{\pi(n)}).$
Here is a non-IID example.

EXAMPLE 12:  In a box we have 10 balls, 4 of which are black, the rest being light magenta (with a tinge of yellow on one side). 2 balls are drawn one by one using SRSWOR. Let $X_i=$ the indicator of the $i$-th selected ball being black ($i=1,2$). Then show that $X_1,X_2$ are exchangeable.

SOLUTION:
$$\begin{array}{c|cc} & X_2=0 & X_2=1\\ \hline X_1=0 & \frac{6\times5}{10\times9} & \frac{6\times4}{10\times9}\\ X_1=1 & \frac{4\times6}{10\times9} & \frac{4\times3}{10\times9} \end{array}$$
Since this matrix is symmetric, the result follows. ■
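A quick simulation check of this table in R (illustrative only):

    # SRSWOR of 2 balls from 4 black (1) and 6 others (0); empirical joint PMF of (X1, X2)
    set.seed(10)
    draws <- replicate(1e5, sample(c(rep(1, 4), rep(0, 6)), size = 2))
    table(draws[1, ], draws[2, ]) / 1e5   # close to the symmetric matrix above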

Obviously such brute force computation will be infeasible if the number of random variables increases. So you will need to proceed more systematically to answer the next problem.

EXERCISE 59:  We have $n$ balls $m$ of which are dark purple (the rest being of a nondescript colour). We draw an SRSWOR of $k$ balls. Let $X_i=$ the indicator of the $i$-th selected ball being dark purple. Show that $X_1,...,X_k$ are exchangeable.

EXERCISE 60: Consider Polya's urn scheme (5 black 5 white to start with, 1 ball drawn at each step, replaced and 1 more ball of the observed colour added). Let $X_i=$ indicator of the $i$-th drawn ball being black. Show that $X_1,X_2,...,X_n$ are exchangeable for $n\in{\mathbb N}.$

Theorem If $X_1,...,X_n$ are exchangeable, then for any $\{i_1,...,i_k\}\subseteq \{1,...,n\}$ the joint distribution of $(X_{i_1},...,X_{i_k})$ depends only on $k,$ and not on $i_1,...,i_k.$

Proof: Let $F(x_1,...,x_n)$ be the joint CDF of $(X_1,...,X_n).$

Let $\pi$ be any permutation of $\{1,...,n\}$ with $\pi(1)=i_1, ..., \pi(k)=i_k.$ Then by exchangeability $F(x_1,...,x_n)$ is the joint CDF of $(X_{\pi(1)},...,X_{\pi(n)})$ as well.

Then the joint CDF of $(X_{i_1},...,X_{i_k})$ is $F(x_1,...,x_k,\infty,...,\infty),$ which is free of $i_1,...,i_k,$ as required. [QED]

Exchangeable random variables allow for symmetry arguments. The next problem is one example.

Problem set 20

EXERCISE 61:  If $X_1,...,X_n$ are exchangeable positive random variables with finite expectations, then find $E((X_1+X_2)/(X_1+\cdots+X_n)).$

EXERCISE 62: Three dice are rolled and their outcomes are called $X_1,X_2$ and $X_3.$ Let $Y_1 = X_1+X_2,$ $Y_2 = X_2+X_3,$ and $Y_3 = X_3+X_1.$ Is $(Y_1,Y_2,Y_3)$ exchangeable? Justify your answer.

EXERCISE 63: A box contains 10 balls numbered 1 to 10. A ball is drawn at random, and its number noted. Without replacing the ball, another ball is drawn at random from the rest, and its number is also noted. If the two numbers are $X$ and $Y$, respectively, then is $(X,Y)$ exchangeable?

EXERCISE 64: (Continuation of the last exercise) Solve the last problem if at each step the ball with number $i$ on it is selected with probability proportional to $i.$

Joint distribution of mixed type

So far we have been considering joint distributions of $(X,Y)$ where either both $X$ and $Y$ were discrete, or both had densities. It is possible to work with joint (and conditional) distributions when one is discrete and the other has density. Let us start with an example.

EXAMPLE 13:  I pick a random coin. So its probability of head is also a random variable (just as the height of a randomly selected person is considered random). Let $\Pi$ denote this random variable. Now I toss this coin 5 times. Let $X$ be the number of heads. Then what is the joint distribution of $(\Pi,X)?$ Also find the conditional distribution of $\Pi$ given $X=x.$

SOLUTION: If I tell you the value of $\Pi$ (say $\Pi=0.5$), then clearly $X\sim Binom(5,0.5).$

So, in general, $X|\Pi=p\sim Binom(5,p).$

Hence the joint distribution has a probability density-cum-mass function (which is often called just a density with some abuse of notation): $$g(p,x) = \left\{\begin{array}{ll}f(p)\binom{5}{x} p^x(1-p)^{5-x}&\text{if }p\in(0,1),~~x\in\{0,1,...,5\}\\ 0&\text{otherwise,}\end{array}\right.$$ where $f$ denotes the density of $\Pi.$ To find $P(\Pi\in A,\, X\in B)$ just integrate this over $p\in A$ and sum over $x\in B.$

Conditional density of $\Pi$ given $X=x$ is $g(p, x) / P(X=x),$ where $P(X=x) = \int_{-\infty}^\infty g(p,x)\, dp.$ ■
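Here is a small R sketch illustrating this, under the extra assumption (not made in the example) that $\Pi\sim Unif(0,1)$; in that case $P(X=x)=\frac 16$ for each $x\in\{0,1,...,5\}.$

    # Pi ~ Unif(0,1), X | Pi = p ~ Binom(5, p)
    set.seed(5)
    p <- runif(1e6)
    x <- rbinom(1e6, size = 5, prob = p)
    table(x) / 1e6                                        # each value ~ 1/6
    integrate(function(p) dbinom(2, 5, p), 0, 1)$value    # P(X = 2) = 1/6 from g(p,x)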

Miscellaneous problems


EXERCISE 65: [hpsjoint3.png]


EXERCISE 66: [hpsjoint7.png]


EXERCISE 67: [hpsjoint8.png]


EXERCISE 68: [hpsjoint9.png]


EXERCISE 69: [hpsjoint10.png]


EXERCISE 70: [hpsjoint11.png]

Here $X_i = U_{(i)}$ in our notation, and $R=U_{(n)}-U_{(1)}$ is the range of the $U_i$'s.

I think this problem should better be attacked after learning about order statistics in the next chapter.


EXERCISE 71: [rossipmjoint1.png]


EXERCISE 72: [rossipmjoint2.png]


EXERCISE 73: [rossipmjoint3.png]


EXERCISE 74: [rossipmjoint4.png]


EXERCISE 75: [rossipmjoint5.png]


EXERCISE 76: [rossipmjoint6.png]


EXERCISE 77: [rossipmjoint8.png]


EXERCISE 78: [rossipmjoint9.png]


EXERCISE 79: [rossipmjoint10.png]


EXERCISE 80: [rossipmjoint11.png]


EXERCISE 81: [rossipmjoint12.png]


EXERCISE 82: [rossipmjoint13.png]