Whatever data we collect is like a cup of water from a vast ocean. The cup of water is all that we have to base our inference on, but it is not the water in the cup that we want to draw inference about. The target of our inference is the entire ocean. The statistical term for the cup of water is a sample, and the ocean is called the population. The very term population conjures up the vision of the totality of all the people living in a country. While this is indeed a very important example, statistics uses the term "population" in a broader sense. Suppose that I toss a coin. This is a random experiment that I can repeat as many times as I like. A statistician likes to think of this as drawing "head"s and "tail"s from an infinite population consisting of many, many "head"s and "tail"s. Since the population is infinite, we cannot really say that the chance of obtaining a "head" is the total number of heads divided by the population size. Instead, we pretend that God is handing out the "head"s and "tail"s randomly with certain probabilities. The "population" then is not just an infinite set, it is the entire random experiment. This approach may appear a bit weird at first, and may take some time to digest. But that's how you should learn to think in order to study statistics. The idea of statistics is to repeat the experiment a large number of times (or, equivalently, to draw a large sample from the population) and use statistical regularity to learn about the random experiment (or, equivalently, the population).
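To make statistical regularity concrete, here is a minimal simulation sketch of the coin example. The probability `p_head = 0.5`, the seed and the sample sizes are assumptions chosen purely for illustration; in a real problem the probability is exactly the unknown we are trying to learn.

```python
import numpy as np

# Assumed for illustration only: in practice this probability is unknown.
p_head = 0.5

rng = np.random.default_rng(0)

# Repeat the random experiment many times: True means "head".
tosses = rng.random(100_000) < p_head

# Relative frequency of heads in the first n tosses, for growing n.
running_freq = {n: tosses[:n].mean() for n in (10, 100, 1_000, 10_000, 100_000)}

for n, freq in running_freq.items():
    print(f"n = {n:>7}: proportion of heads = {freq:.4f}")
```

For small samples the proportion of heads can be far from `p_head`, but it settles down as the sample grows. That settling down is what lets a large sample tell us about the random experiment.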
EXAMPLE: If we measure the amount of dust or suspended particulate matter (SPM) in the air every day at the same location, we see random fluctuations in the values. Clearly, the values on successive days are not independent, because some of the dust from one day is still in the air on the next. Here is one way to statistically model the data:
Let $\epsilon_t$ denote the amount of fresh SPM generated on day $t.$ We assume that the $\epsilon_t$'s are IID from some random experiment. We link these with the observed data as follows: $$ X_t = \epsilon_t + \theta_1 X_{t-1} + \theta_2 X_{t-2}. $$ This has the interpretation that the amount of SPM today is the residual SPM from the last two days plus the fresh SPM generated today. The constants $\theta_1$ and $\theta_2$ are the fractions determining how much the SPM of the last two days influences today's SPM. This model has three unknowns: the random experiment from which the $\epsilon_t$'s were generated, $\theta_1$ and $\theta_2.$ The job of the statistician is to collect lots of $X_t$'s (i.e., measure $X_t$ over many days) and then somehow use statistical regularity to find these unknown quantities.

Where did we get this model from? Is there any theory that SPM indeed behaves in this way? Not really. It is just a model, a mathematically simple way to approximate the random behaviour of the $X_t$'s. In statistics we start by assuming some such model, and estimate the unknown quantities based on the data. Then we compare the actual data with the fitted model. If the fitted model explains the behaviour of the data well, then we are happy, else we look for some other model. This is much like fitting a polynomial to a scatterplot. We start by fitting a straight line: $y = \alpha + \beta x,$ i.e., by choosing the values of $\alpha$ and $\beta$ that give the best possible fit (according to some suitable criterion). Then we draw this best line on the scatterplot, and decide if it is a good fit. The best fit need not be a good fit, just as the best swimmer in India is not necessarily a good swimmer by Olympic standards. If our best fitting line is indeed a good fit, we are happy.
Otherwise we look for a different model, say all polynomials of the form $\alpha + \beta x + \gamma x^2.$ Again, the same ritual follows: we pick those values for the parameters $\alpha$, $\beta$ and $\gamma$ that give the best fit (within this class of models), and check its goodness-of-fit. This is the general statistical workflow:
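As a rough illustration of this fit-then-check loop, the sketch below applies it to the SPM model $X_t = \epsilon_t + \theta_1 X_{t-1} + \theta_2 X_{t-2}.$ Everything specific in it is an assumption made for the sake of a runnable example: the data are simulated rather than measured, the $\epsilon_t$'s are taken to be exponential, the true values $\theta_1 = 0.5$ and $\theta_2 = 0.3$ are invented, least squares is just one possible fitting criterion, and the lag-1 autocorrelation of the residuals is just one possible goodness-of-fit check. The helper names `simulate_ar2`, `fit_ar` and `lag1_autocorr` are hypothetical. Because the fresh SPM has a nonzero mean, the regression includes an intercept to absorb it.

```python
import numpy as np

def simulate_ar2(theta1, theta2, n, rng, burn_in=200):
    """Simulate X_t = eps_t + theta1*X_{t-1} + theta2*X_{t-2} with IID eps_t."""
    # Assumed noise law: nonnegative "fresh SPM" with mean 1.
    eps = rng.exponential(scale=1.0, size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = eps[t] + theta1 * x[t - 1] + theta2 * x[t - 2]
    return x[burn_in:]  # drop the start-up phase

def fit_ar(x, p):
    """Least-squares fit of X_t on an intercept and its previous p values."""
    y = x[p:]
    lags = [x[p - k : len(x) - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(len(y))] + lags)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, resid  # coef = [intercept, theta_1, ..., theta_p]

def lag1_autocorr(resid):
    """Correlation between consecutive residuals; near 0 if nothing is left over."""
    return np.corrcoef(resid[:-1], resid[1:])[0, 1]

rng = np.random.default_rng(1)
x = simulate_ar2(theta1=0.5, theta2=0.3, n=5000, rng=rng)

# Step 1: try a simpler model that uses only yesterday's value.
coef1, resid1 = fit_ar(x, p=1)
r1_ar1 = lag1_autocorr(resid1)

# Step 2: the two-day model from the text.
coef2, resid2 = fit_ar(x, p=2)
r1_ar2 = lag1_autocorr(resid2)

print("one-day model: residual lag-1 autocorrelation =", round(r1_ar1, 3))
print("two-day model: residual lag-1 autocorrelation =", round(r1_ar2, 3))
print("two-day model: estimated theta_1, theta_2 =", np.round(coef2[1:], 3))
```

The one-day model is the best fit within its class, yet its residuals are still visibly correlated, so it is not a good fit. Moving to the two-day model leaves residuals that look uncorrelated, and its estimates land close to the values used in the simulation.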