[Home]

Table of contents


The R code for the Mendel model done in class was a bit tough for many of you. That is understandable. In this page I shall discuss some R techniques used in it.

Plot and locator

The locator() function allows you to interactively select a point from a plot. To see its effect start a new plot:
plot(1:10, 1:10)
This just creates scatterplot with the points ($(k,k)$ for $k=1,...,10.$ Now issue the command
locator(1)
The prompt will go away, indicating that the program is running waiting for input from you. Visit the plot window, take your mouse over the plot region. You will notice the mouse cursor changing to a crosshair. Click at some point. Immediately the R prompt will return and coordinates (wrt the axis used in the plot) of the click point will be shown on screen. The 1 passed to locator tells it to return after a single click. If you want to click thrice, then pass 3. Then after three clicks a list will be returned with two fields in it: x and y, each a vector of length 3. Here is how you access these:
p = locator(3)
p$x   #All the x values
p$y   #All the y values
p$x[1] #x component of the first click
p$y[1] #y component of the first click
The function points adds points to an existing plot. Points outside the current plot region are silently ignored.
plot(1:10,1:10)
points(5,6.col='red')
points(c(2,4,2), c(3,9,1), col='blue') #Plots (2,3), (4,9) and (2,1)

EXERCISE:  Use the plot, locator and points functions to draw a point at a clicked point.

EXERCISE:  Repeat the above exercise, but for two clicks.

If you use locator() with specifying the number of clicks, then it allows any number of clicks (you have to terminate by right-clicking). The length function will return the length of a vector. For example,
x = c(3,4,-1,0,0)
length(x)   # returns 5. 

EXERCISE:  Use locator() without specifying the number of clicks. Click 5 times (and terminate by right-clicking). Then use length to find the number of clicks.

For loop

If someVector is any vector, then
for ( x in someVector ) {
   blah blah using x
}
will carry out blah blah using x for each entry x in someVector. For instance,
myVector = 3,4,7,-9,0)) {
for (x in myVector) {
  print(sin(x))
}
will print the sine of all the values in the vector. Of course, you could have achieved the same effect using just:
sin(myVector)
This latter form is also more efficient. It automatically loops over all the entries. It is called an implicit loop, as opposed to an explicit loop, using for. Explicit loops should be avoided if implicit loops suffice. Explicit loops are needed only when repeating some multi-step complicated computation.

In our case, we want to run a simulation many times. So use a loop like for( i in 1:1000) {...} that will repeat the ... part 1000 times.

EXERCISE:  Use explicit loop to print the squares of the first 10 natural numbers. Then use implicit loop to achieve the same.

Often we need to save the result of the computation after each pass of the loop. To take a trivial example, suppose we again want to compute the sine value for each entry in a vector, but instead of printing the values on screen, we want to save them in another vector, y:
myVector = c(3,4,-9, 0, 3 , 7)
y = numeric(6) # 6 is the length of myVector
for (i in 1:6) {
  y[i] = sin(myVector[i])
}
The line y = numeric(6) creates a numeric vector of length 6, containing only zeroes. You also have used y = rep(0,6) to acieve the same effect. If you want to avoid explicit mention of 6, then you should write y = numeric(length(myVector)) or y = rep(0,length(myVector)).

This trivial example is only for illustration purpose. In this example the following implicit loop is preferable
y = sin(myVector)

EXERCISE:  Pick variable name that is not currently in use, say ttt. Now execute the following code:

for( i in 1:10) {
   ttt[i] = sin(i) 
}
The error message will tell you why we need to start with a line like ttt = numeric(10) or ttt = rep(0,10).

Random experiment

All random experiments with a finite sample space can be performed using the sample function. Here is the most general form that need:
sample(c(1,1.5,2,3),2,rep=T,prob=c(3,2,5,6))
Let`s understand this carefully. The first argument is a vector giving the sample space. Here it is $\{1,1.5,2,3\}.$ This could also be {Male, Female}, in which case the first input to sample would have been c('Male','Female'). The second input is the number of times you want to carry out the experiment. The third input rep=T says that the sampling is done with replacement. The last input gives the probabilities (upto a normalising factor which R will compute automatically).

This is the most general form. In most applications, however, the sample space will be like $\{1,...,n\}$ for some $n.$ Also the probabilities will all be the same. Then we simply write:
sample(n,2,rep=T)
instead of the more elaborate
sample(1:n,2,rep=T,prob=rep(1,n)) # Here rep(1,n) means (1,...,1) n times
On some rare ocassions, we shall need to draw a random sample without replacement, like drawing two persons randomly from $n$ persons (the two persons must be distinct):
sample(n,2)

EXERCISE:  Roll a biased die 100 times, where $P(i) = P(7-i) \propto i$ for $i=1,2,3.$ Do not compute the constant of proportionality explicitly. Find the mean.

The sample function is particularly suitable when the probabilities are all equal, or are specified in some patternless way. If, however, the probabilities vary according to some standard distribution, then you will be better off using some function specific to that distribution. R has most of the standard distributions built in. To carry out those random experiments the function always starts with the letter r followed by an abbreivation of the distribution, like rbinom for binomial, rpois for Poisson, and so on. You need to look up the help of those functions to learn about how the specify the parameter values. For instance, commonly we talk about the Binomial($n,p$)distribution. R calls $n$ the size and $p$ the prob.

The following lines create a barchart based on the relative frequency distribution of a $Poisson(5)$ random sample of size $1000.$
barplot(table(rpois(1000,lambda=5))/1000)

EXERCISE:  Make a similar barplot based on a random sample of size 100 from a Binomial($10,0.3$) distribution. What is the sample mean?

Counting number of cases satisfying some given condition

Suppose that we have a vector, x, some of entries are positive. We may count all those using the following technique:
sum(x > 0)
Details of why it works were given in the R tutorial. Here > is a Boolean operator. Other Boolean operators include: < , <=, > , >=, ==, !=. You may combine Boolean operations using & (and), | (or) and ! (not). For instance, to count all entries lying in (3,10] we use:
sum(x > 3 & x <= 10)
There is another Boolean operator %in% which is the "belongs to" operator. For instance, 4 %in% c(3,6,4,1) returns TRUE, while 6 %in% c(4,5,2) returns FALSE.

EXERCISE:  Roll a die 1000 times. Count the number of times you get an odd number. You may use %in% or you may use the modulo operator & (e.g., 5&3 is 2).

EXERCISE:  I roll two dice 100 times each:

x = sample(6,100,rep=T)
y = sample(6,100,rep=T)
Count the number of times the sum of the two outcomes exceed 7.

Structure of a simulation code

We shall combine the techniques discussed above to create the basic structure of a simulation code. You have a random experiment, and you want to do some math with its outcome, and save the result. You repeat this a large number of times (say $N$) to obtain a long vector of results. Finally look at the results through the lens of statistical regularity, i.e., draw histogram or compute mean, or whatever.

The basic structure is
result = numeric(N)
for(i in 1:N) {
  #Perform th random experiment once
  #Do the math with the outcome
  result = #Whatever you want to save
}
hist(result)
mean(result)
For instance, we may toss a coin 100 times in each pass, and save the number of heads obtained. If we repeat this 10 times, then we shall end up with a vector nhead of length 10, where nhead[i] stores the number of heads in the $i$-th batch of 100 tosses. Here is a code to do this:
nhead = numeric(1000)
for( i in 1:1000) {
   tosses = sample(c("Head","Tail"), 100, rep=T)
   nhead = sum(tosses=="Head")
}
mean(nhead)
table(nhead) # frequency distribution
barplot(table(nhead))

EXERCISE:  A Roulette wheel in a casino consists of a rotating wheel on which a ball rolls. The wheel has some boxes attached to its periphery, marked 0,1,2,...,32. When the wheel stops rolling, the ball drops in one of the boxes. The outcome of this random experiment is the number written on the box. I have bought a ticket priced Rs 100 to play this. I shall get Rs 500 if the number is divisible by 5. Simulate to find my expected payoff per game.

EXERCISE:  Use simulation to approximate the expected value of $(X-\bar X)^3$ where $X\sim Poisson(3).$

EXERCISE:  Use simulation to approximate the expected value of $(X-\bar X)^3$ where $X\sim Poisson(\lambda)$ for $\lambda \in [1,5].$ Make a plot of the expected value against $\lambda.$

[Hint: Take a grid of values of $\lambda$ in $[1,5]$ and apply the last exercise for each.