Each group of 5 or 6 students will do a single project. They have
to either come up with their own project idea (to be ratified by
me), or they have to choose one from the following list of
projects. The same project may be chosen by more than one group. You
may use R, Python, Java, C or C++. No project-specific
library is allowed.
Each project group should submit a single report in PDF
along with the code etc. (separately, via email, GitHub link, etc.). The
report must contain exploration of the data. There
will be a project presentation (about 20 to 30 minutes per project
group), where each member has to present a part. No separate viva
(except questions from the classmates and me during the presentation).
The deadline for submitting the project is the last
day before the semestral exam week starts. Submit the report (problem
description, algorithm, findings etc.) and the code. Do not include code
in your report.
The projects are often tough, while your expertise in
statistics is still rudimentary. So do not despair if your
results are not satisfactory. Your performance will be judged in
terms of how much sincere effort you have put into it. Doing a
Google search, and implementing a sophisticated algo will fetch
less credit than trying on your own and possibly coming up with an
inferior technique.
Chroma-keying: See
this fun
video to understand chroma-keying or green-screening as
it is popularly called. Your task will be to work with still
photos with a green (or whatever colour you wish) background,
and to remove it keeping the foreground as intact as
possible. The photos used MUST be taken by the project group
(not downloaded).
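As a naive starting point only, the idea of "remove whatever is clearly green" can be sketched as below. The sketch assumes the photo has already been read into a list of rows of (r, g, b) tuples. The function names and the threshold `margin` are hypothetical choices made for illustration, and the 2x2 image is made up. A real photo will need something better near the edges of the foreground.

```python
def green_mask(pixels, margin=40):
    """Flag pixels whose green channel exceeds both red and blue by `margin`.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    margin: hypothetical threshold, to be tuned on your own photos.
    """
    return [[(g - max(r, b)) > margin for (r, g, b) in row] for row in pixels]


def remove_background(pixels, margin=40, fill=(255, 255, 255)):
    """Replace flagged background pixels by `fill`; keep the rest untouched."""
    mask = green_mask(pixels, margin)
    return [
        [fill if is_bg else px for px, is_bg in zip(row, mask_row)]
        for row, mask_row in zip(pixels, mask)
    ]


# Made-up 2x2 image: green screen in the left column, foreground on the right.
image = [
    [(20, 200, 30), (210, 160, 140)],
    [(35, 180, 50), (90, 100, 80)],
]
result = remove_background(image)
```

A fixed threshold like this is the simplest possible rule. Part of the project is to see where it fails and to do better.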
Denoising audio: Record yourself talking in a room
with mild background noise (e.g., with an electric fan or noisy
AC on). Occasionally pause while talking. Your job is to
automatically process the recorded audio to identify these
pauses, and to make the pauses absolutely silent.
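One crude way to detect pauses is to cut the recording into short frames and silence every frame that is much quieter than the recording as a whole. The sketch below assumes the audio is already available as a list of numbers. The names `silence_pauses`, `frame_len` and `ratio` are hypothetical, and the test signal is synthetic (a sine wave standing in for speech), not a real recording.

```python
import math
import random


def silence_pauses(samples, frame_len=400, ratio=0.5):
    """Zero out every frame whose RMS is below ratio * (overall RMS)."""
    overall = math.sqrt(sum(x * x for x in samples) / len(samples))
    out = list(samples)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < ratio * overall:
            out[start:start + len(frame)] = [0.0] * len(frame)
    return out


# Synthetic stand-in for a recording: "speech" (a sine wave), then a
# "pause" (faint noise), then "speech" again. This is not real audio.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * t / 40) for t in range(800)]
pause = [rng.uniform(-0.05, 0.05) for _ in range(800)]
signal = speech + pause + speech
cleaned = silence_pauses(signal)
```

With real speech the loudness varies a lot within a sentence, so a single global threshold may cut off soft syllables. Choosing the frame length and the threshold from the data is where the statistics comes in.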
Classifying iris flowers: Come up with the ``best
possible'' classifier that will classify the iris flowers using
Anderson's iris data. Coming up with a reasonable definition of
``best possible'' is a major part of the challenge.
Missile launch planning: This impressive project is
somewhat marred by the fact that the missile involved is just a rubber
band. The launcher consists of a ruler along which the band is
pulled to some known extent before being released. Your job is
to collect data and come up with a formula that tells you how
much to stretch in order to reach a given distance. You must
also give some error bound.
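To make "formula plus error bound" concrete, here is a minimal sketch that assumes the distance is roughly linear in the stretch, which your own data may or may not support. It fits a least-squares line, inverts it to get the stretch for a target distance, and reports a rough half-width of two residual standard deviations. The helper names are hypothetical, and the numbers are simulated placeholders, not measurements.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit y = a + b*x; also return the residual SD."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(rss / (n - 2))


def stretch_for(target, a, b, s):
    """Stretch aimed at `target`, plus a rough half-width (2 residual SDs)."""
    return (target - a) / b, 2 * s / abs(b)


# Simulated placeholder data (NOT measurements): stretch and distance in cm,
# three launches at each stretch from 5 cm to 15 cm.
rng = random.Random(1)
stretch = [cm for cm in range(5, 16) for _ in range(3)]
distance = [20 * cm - 30 + rng.gauss(0, 5) for cm in stretch]

a, b, s = fit_line(stretch, distance)
centre, half_width = stretch_for(170, a, b, s)
```

The half-width here ignores the uncertainty in the fitted coefficients themselves, so treat it as a first approximation and not as a proper interval.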
Distance estimation: This one will require an app
that you need to download from here
(updated for Android 11+). The app records the ambient audio
for a fixed period (say 2 secs), and then shows the standard
deviation of the values. Hopefully that is the loudness of the
sound. Now stand at different known distances from a noise source, and
see the value shown on the app. Your job is to come up with a
formula to estimate the distance from any given app reading. You
must provide some suitable error bar.
Skew angle estimation: Any QR code or bar code reader has to
do a skew angle estimation, i.e., estimate the angle at which the object is
tilted w.r.t. the camera. Take a page of parallel lines, and
photograph them from different angles. Try to come up with an
estimation technique. Check by rotating the image by that angle.
Develop an online casino: Learn about at least 5
different gambling schemes used in real casinos, and implement
them using simulation. There should be an option to compute the
long term average and standard deviation for each of them.
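As an illustration of the "long term average and standard deviation" option, the sketch below simulates one simple scheme: a unit stake on red at a European roulette wheel (37 pockets, 18 of them red, even-money payout). The function names and the number of rounds are hypothetical choices. Other schemes can be plugged in as functions with the same shape.

```python
import random
import statistics


def bet_on_red(rng):
    """Net payoff of a unit stake on red: 18 of the 37 pockets win."""
    return 1 if rng.randrange(37) < 18 else -1


def long_run(game, n_rounds, seed=0):
    """Simulate `game` repeatedly; return (average payoff, SD of payoff)."""
    rng = random.Random(seed)
    payoffs = [game(rng) for _ in range(n_rounds)]
    return statistics.mean(payoffs), statistics.stdev(payoffs)


mean, sd = long_run(bet_on_red, 100_000)
```

For this bet the exact expected payoff is $-1/37$ per unit staked, so the simulated average should settle near that value as the number of rounds grows.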
Riffle shuffle: Simulate the riffle shuffle using the
model done in class. Then plot the probabilities of various
events as the number of shuffles increases to see how many
shuffles are needed to reach complete mixing.
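The sketch below assumes that the model done in class is the Gilbert-Shannon-Reeds (GSR) model; if a different model was used, the `riffle` function has to change accordingly. It estimates the probability of one example event, namely that the original top card is still on top, after 1, 2, ..., 10 shuffles. The function names, the number of trials and the choice of event are illustrative assumptions, and the plotting is left out.

```python
import random


def riffle(deck, rng):
    """One GSR riffle shuffle: binomial cut, then take the next card from
    either pile with probability proportional to the pile's current size."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))  # Binomial(n, 1/2)
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        a, b = len(left) - i, len(right) - j
        if rng.random() < a / (a + b):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


def prob_top_stays(max_shuffles=10, trials=2000, n=52, seed=0):
    """Estimate P(original top card is still on top) after 1..max_shuffles."""
    rng = random.Random(seed)
    hits = [0] * max_shuffles
    for _ in range(trials):
        deck = list(range(n))
        for k in range(max_shuffles):
            deck = riffle(deck, rng)
            hits[k] += deck[0] == 0
    return [h / trials for h in hits]


probs = prob_top_stays()
```

Under complete mixing this probability would be $1/52$, so plotting `probs` against the number of shuffles shows how quickly this particular event approaches its limiting value. Other events may mix at a different pace.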
Shannon's mind reading game: Claude Shannon, the
father of information theory, is credited with inventing this
magic trick. An ordered deck is riffle shuffled thrice. A card is
picked at random, and inserted back randomly. The deck is then
handed back to the magician, who finds the card. The trick is
that there will be just one singleton rising sequence with a
``high probability''. This singleton must be the chosen
card. Use simulation to find the probability of success of this
trick.
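A possible skeleton for the simulation is sketched below. It rests on several assumptions that you should check against what was done in class: the shuffle is the GSR riffle shuffle (implemented here via the equivalent "binomial cut, then a uniformly random interleaving" description), a rising sequence is a maximal run of consecutive card values that appear in increasing order of position, and the magician declares success only when there is exactly one rising sequence of length one and it is the chosen card. All function names are hypothetical.

```python
import random


def riffle(deck, rng):
    """GSR riffle shuffle: binomial cut, then a uniformly random interleaving."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))
    left_slots = set(rng.sample(range(n), cut))  # positions taken by the left pile
    left, right = iter(deck[:cut]), iter(deck[cut:])
    return [next(left) if k in left_slots else next(right) for k in range(n)]


def rising_sequences(deck):
    """Split the card values 0..n-1 into maximal rising sequences."""
    pos = {card: k for k, card in enumerate(deck)}
    seqs = [[0]]
    for v in range(1, len(deck)):
        if pos[v] > pos[v - 1]:
            seqs[-1].append(v)  # v lies below v-1 in the deck: same sequence
        else:
            seqs.append([v])    # v lies above v-1: a new sequence starts
    return seqs


def trick_succeeds(rng, n=52, shuffles=3):
    """Play the trick once; True if the unique singleton is the chosen card."""
    deck = list(range(n))
    for _ in range(shuffles):
        deck = riffle(deck, rng)
    chosen = deck.pop(rng.randrange(n))     # pick a card at random
    deck.insert(rng.randrange(n), chosen)   # put it back at a random place
    singletons = [s[0] for s in rising_sequences(deck) if len(s) == 1]
    return singletons == [chosen]


def success_rate(trials=2000, seed=0):
    rng = random.Random(seed)
    return sum(trick_succeeds(rng) for _ in range(trials)) / trials
```

Calling `success_rate()` gives one estimate of the success probability under these assumptions. How that estimate changes with the definition of rising sequence, with the way the card is reinserted, and with the number of shuffles is worth exploring in the report.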
Separating a mixture:
Consider this random experiment: toss a fair coin; if it lands heads, then
generate from $N(\mu_1,\sigma^2_1)$, else generate
from $N(\mu_2,\sigma^2_2).$
The data consist of IID outcomes of this random experiment. Devise some way to estimate
$\mu_1,\mu_2,\sigma^2_1$ and $\sigma^2_2$ from the
data. Implement in R.
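One standard option (by no means the only one, and not necessarily the one you should settle on) is the EM algorithm with the mixing weights fixed at $1/2$. The sketch below is in Python purely for illustration, while the project itself asks for R. The function names, the starting values and the number of iterations are assumptions, and the data are simulated with known parameters so that the estimates can be checked.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_mixture(data, n_iter=100):
    """EM for an equal-weight mixture of two normals.

    Returns (mu1, var1, mu2, var2), with component 1 started on the left.
    """
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    # Crude start: means of the lower and upper halves, common overall variance.
    mu1, mu2 = sum(xs[:half]) / half, sum(xs[half:]) / (n - half)
    mean_all = sum(xs) / n
    var1 = var2 = sum((x - mean_all) ** 2 for x in xs) / n
    for _ in range(n_iter):
        # E-step: P(component 1 | x); the equal weights 1/2 cancel out.
        resp = []
        for x in xs:
            p1, p2 = normal_pdf(x, mu1, var1), normal_pdf(x, mu2, var2)
            resp.append(p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
        # M-step: weighted means and variances.
        w1 = sum(resp)
        w2 = n - w1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / w2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / w1
        var2 = sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / w2
    return mu1, var1, mu2, var2


# Simulated data with known parameters: N(0, 1) on heads, N(6, 1.5^2) on tails.
rng = random.Random(1)
data = [rng.gauss(0, 1) if rng.random() < 0.5 else rng.gauss(6, 1.5)
        for _ in range(2000)]
mu1, var1, mu2, var2 = fit_mixture(data)
```

The two components in this simulation are well separated, which makes the problem easy. It is worth checking how any method you devise behaves when the means are close or the variances are very different.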
Applying discrete arc sine law for sojourn times on various sports