Histogram

We have seen how barplots of proportions show statistical regularity. Incidentally, there is a name for such a proportion: relative frequency. The actual number of times an outcome occurs is called its frequency. For instance, if you toss a coin 100 times and get 45 heads and 55 tails, then the frequencies are 45 and 55, while the relative frequencies are 0.45 and 0.55. Often we like to show the frequencies as a table:
Value   Frequency
Head    45
Tail    55
Total   100
It is called a frequency distribution table. Similarly, you can construct a relative frequency distribution table by replacing each frequency with the corresponding relative frequency.
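As a minimal sketch, the two tables for the coin example can be computed as follows. The list `tosses` is made-up data chosen only to match the counts quoted above (45 heads, 55 tails); the variable names are illustrative.

```python
from collections import Counter

# Hypothetical data matching the example in the text: 45 heads, 55 tails.
tosses = ["Head"] * 45 + ["Tail"] * 55

freq = Counter(tosses)                           # frequency of each value
n = sum(freq.values())                           # total number of tosses
rel_freq = {v: c / n for v, c in freq.items()}   # relative frequency of each value

# Print frequency and relative frequency side by side.
print(f"{'Value':<6}{'Freq':>5}{'RelFreq':>9}")
for value in ("Head", "Tail"):
    print(f"{value:<6}{freq[value]:>5}{rel_freq[value]:>9.2f}")
print(f"{'Total':<6}{n:>5}{sum(rel_freq.values()):>9.2f}")
```

The relative frequencies always add up to 1, just as the frequencies add up to the sample size.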

As we have already mentioned, the relative frequency distribution table (or equivalently its graphical representation, the barplot) converges as the sample size goes to infinity. The limit is called the probability mass function (pmf).
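The convergence can be watched in a small simulation. This is only a sketch: it assumes a fair coin (probability of heads 0.5), and the helper name `relative_frequency_of_heads` and the seed are arbitrary choices, not part of the text.

```python
import random

def relative_frequency_of_heads(n, p=0.5, seed=0):
    """Toss a coin with P(head) = p a total of n times; return the proportion of heads."""
    rng = random.Random(seed)                        # fixed seed for reproducibility
    heads = sum(rng.random() < p for _ in range(n))  # each toss is a head with probability p
    return heads / n

# The relative frequency should settle near p as n grows.
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency_of_heads(n))
```

For small sample sizes the printed proportions fluctuate noticeably; for large ones they stay close to 0.5, the pmf value assumed for heads.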

Here the outcomes were discrete, taking only finitely many values. Often we encounter continuous outcomes, which can take any value in an interval, like the height of a person. The barplot technique cannot be used directly in these cases.

So we discretise the continuous outcome into a finite number of intervals, or bins, before applying the barplot technique.

We start with a frequency distribution for the discretised data, where the bins play the role of values. This is called a frequency distribution for grouped data. The bins are called classes. They are adjacent to each other, and we need some convention to decide which class gets each boundary point. A typical example could be
[0,1], (1,2], (2,4], (4,6].
We generally work with data that take values in a bounded interval $(a,b)$.

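Then we compute the relative frequencies. Finally we divide the relative frequency of a class by the class width to get the relative frequency density for that class. A barplot for this is called a histogram, which is a powerful graphical device to harness statistical regularity. Dividing by the width makes the area of each bar equal to the relative frequency of its class, so classes of unequal width can be compared fairly.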
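The steps above can be sketched for the classes [0,1], (1,2], (2,4], (4,6]. The data values below are invented for illustration, and the boundary convention (each class contains its right end point, and the first class also contains 0) is the one suggested by that example.

```python
from bisect import bisect_left

edges = [0, 1, 2, 4, 6]                          # class boundaries
data = [0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6]      # hypothetical observations in [0, 6]

def class_index(x):
    """Index of the class containing x: classes are right-closed, the first also includes 0."""
    return max(bisect_left(edges, x) - 1, 0)

# Frequency of each class.
counts = [0] * (len(edges) - 1)
for x in data:
    counts[class_index(x)] += 1

n = len(data)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
rel_freq = [c / n for c in counts]                      # relative frequency
density = [r / w for r, w in zip(rel_freq, widths)]     # relative frequency density

for lo, hi, c, r, d in zip(edges, edges[1:], counts, rel_freq, density):
    print(f"{lo}-{hi}: frequency={c}, relative frequency={r:.2f}, density={d:.3f}")

# The total area of the histogram bars equals 1.
total_area = sum(d * w for d, w in zip(density, widths))
```

Here the classes of width 2 get bars half as tall as a class of width 1 with the same relative frequency, which is exactly the correction that dividing by the class width provides.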

Data from a distribution

We say that $X_1,\ldots,X_n$ constitute a random sample from a distribution if they are the outcomes of repeated independent trials of the same random experiment, and their barplot or histogram converges to that distribution as $n \to \infty$. We also say that $X_1,\ldots,X_n$ are independently and identically distributed (IID) from that distribution.
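As an illustrative sketch, we can draw IID samples of increasing size and watch the histogram settle. The choice of the uniform distribution on (0,1), the ten equal-width bins, and the helper name `histogram_density` are all assumptions made for this example. For simplicity the bins here are closed on the left, unlike the convention in the earlier example.

```python
import random

def histogram_density(sample, a, b, k):
    """Relative frequency densities over k equal-width bins of [a, b)."""
    width = (b - a) / k
    counts = [0] * k
    for x in sample:
        i = min(int((x - a) / width), k - 1)   # bin index, clamped at the last bin
        counts[i] += 1
    n = len(sample)
    return [c / n / width for c in counts]

rng = random.Random(1)                         # fixed seed for reproducibility
for n in (100, 10000, 100000):
    sample = [rng.random() for _ in range(n)]  # n IID draws, uniform on [0, 1)
    dens = histogram_density(sample, 0.0, 1.0, 10)
    print(n, [round(d, 2) for d in dens])
```

For the uniform distribution the limiting histogram is flat at height 1, so the printed densities should move closer to 1 in every bin as the sample size increases.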