How I make my educational videos

Overview

I use my Samsung Galaxy J7 Nxt mobile phone to shoot them. I do not use any separate audio recorder (because I do not have any). I use only the front camera, because it has fixed focus. I mount the phone on a tripod in landscape orientation while shooting. The tripod is an invaluable tool.

I use a variety of free tools to create the superimposed pictures: Inkscape (for 2D drawings), GIMP (occasionally, for touching things up) and ArtOfIllusion (for photorealistic 3D animations/stills). These are all artistic tools, and are often not enough for the mathematical objects I need to create. In that case I write my own programs to create the basic shapes, which are then loaded into one of these tools for the artistic finishing touches. My programs are mostly written in R and J. All the screencasts are done in Kazam (no audio, since my laptop audio input does not work).

After shooting the parts I use the free software kdenlive to do the post-processing.

Details

My videos are nowhere even close to what a good educational video should be. Even then, the process to produce them is somewhat intricate. I shall split up the details in a number of sections.

Design

A good educational video, in my opinion, should have 5 characteristics. I decided upon these after comparing various online educational videos (BBC, Khan Academy, 3Blue1Brown, FilmMakersIQ and others). These guidelines are for educational video makers like myself, who do not have much control over the topic to be presented, and cannot afford expensive visuals (a close-up video of a blue whale deep in the Pacific does not need my guidelines to make itself popular!).
  1. The video should show the presenter talking to the viewer. This is because human facial expressions and gestures constitute one of the most potent languages known to mankind. What is more, most of us actually enjoy reading this language. Simply by talking right into the camera, the presenter can convey so much more information than is possible with voice alone. If there is one reason why Khan Academy videos suck, while BBC videos don't, this is it. Also, it is not the beauty of the presenter's face that attracts the viewer. It is the natural gestures that go with normal oral communication (eye contact, hand movement, facial expressions) that count. So a screencast with a talking face that does not look at the audience does not help at all.
  2. The video must show variation. The video occupies only an insignificantly small rectangle in front of the viewer. There are plenty of interesting things happening before the viewer's eyes outside that rectangle: birds flying, trees moving, people moving around. The content of the video has to compete with all these distractions to keep the viewer's attention glued to itself. Nobody likes to stare at the same dull old scene all the time. So it is important to vary the visuals.
  3. The video must be well punctuated. Just as a book should have chapters and sections and subsections and bulleted items for easy navigation, so should an educational video. Suitable audio jingles, fades and change of background should be used for this purpose.
  4. A video must be restful to the eyes. Educational videos like the ones from 3Blue1Brown are based on catchy themes that naturally attract the viewer. But when you want to cover typical coursework in a video, you cannot always rely on the topic being catchy. Often you'll need to drag your audience through details that are not so appetizing. When you suspect that this is the case, your video must move slowly, allowing longer fades, holding the visuals longer on screen, and occasionally injecting soothing visuals like birds chirping.
  5. A video must use Foley and reverse-Foley effects. A Foley effect means "you hear what you see". If a paper is crumpled, you must hear the corresponding sound. Most often the real sound is so feeble that it is not audible. Then a similar sound must be added later during editing. This apparently silly thing adds unbelievably to the charm of a video. However, even more important for an educational video is what I have called the reverse-Foley effect: the audience must see what they hear. Information flows to the viewer via two channels, audio and video. It is all too easy for the presenter to get carried away by a topic, and just keep on talking. Then more information is channeled via the audio, while the video merely shows the presenter standing. Such an imbalance puts strain on the audience, leading to quick fatigue. A good video must distribute the information more or less evenly over the audio and video channels.

Three types

An educational video typically uses three types of parts, distinguished by the density of the information presented.

For the first type, it is enough to show the presenter in close-up (or in medium shot, to show hand gestures). No visual other than the body language is generally called for.

For the second type, the best way is to show the presenter in front of a black/white board, and record the entire teaching session.

The third type is the most interesting, and offers the maximum in terms of what an educational video can do. The presenter keeps on talking while various diagrams, animations etc. are superimposed in a synchronized manner. This is where the editing phase becomes tough. But if you can manage it, it is surely worth it. Indeed, the aim of an educational video maker should be to avoid the second type as much as possible, and replace it with the third type.

Scripting

Making a video is a somewhat long process, so careful planning or scripting helps. This includes deciding what to say, what to show, which camera angles and backgrounds to use, etc. While scripting may look like a good idea (and professional video makers cannot think of video making without a script), it is nevertheless a source of trouble in itself. Still, a script has definite advantages. The most important of these is the ability to do a "coverage shoot", which I shall discuss later. But all in all, I generally prefer to work without a script.

I start by chalking out the following points:

Shooting

The shooting part, surprisingly, is the least troublesome phase of the whole process of making an educational video. It is done almost in real time (e.g., if I am shooting a 30 min video, I'll typically finish the shooting in 45 min or so). This is partly because I use spontaneous flow instead of following a script. So the experience is more like acting on stage than acting in a movie. The speed is also partly due to the fact that the fan has to remain switched off during the shoot (for the sake of audio quality), and I do not enjoy sweating in front of the camera.

Here is how I do the shooting.

First I mount the phone on the tripod. Landscape mode, front camera. Then I think about what I want to say in the first shot. I keep it simple: one idea per shot. If the idea changes, then so must the shot. That provides a natural punctuation. I look into the camera (hard to do, as my eyes like to look at my image on the phone, and not at the camera lens, which is a tiny inconspicuous dot near the margin). I stand pretty close to the camera, so that my voice is clearly picked up. This causes my head to look more bulbous than it actually is (Hey, now you know that I am much more handsome than I look in the videos).

Once I believe that I have finished my first idea, I stop recording (stop, not pause, because I like to keep my files small, as it helps me avoid loading problems later).

Once my first shot is over, I quickly think about a natural continuation to the next idea, and shoot that in a different setting (different camera angle, different corner of the room etc). I think that I should use some 4 different settings over the whole video. Each setting should be reserved for one type of idea, e.g., motivation, derivation, critical thinking. But I have not tried out this type-to-setting mapping in any of my videos yet.

Coverage shoot

This is a smart idea that I hit upon while working, before I learned that it is called a "coverage shoot". It means shooting the same thing from multiple angles. Then later you can mix those different angles during editing. You see this all the time in movies: a dialogue between A and B is partly shown over A's shoulder, partly over B's shoulder, and partly from the side. However, in an educational video of the type discussed here, the presenter has to look straight at the audience (i.e., into the camera) all the time. That does not leave much scope for exciting coverage shoots. There are two exceptions:

Screencasts

I do all my screencasts using the free software Kazam. Since my laptop audio input does not work, I record the audio separately using the phone. In fact, here I do something like a coverage shoot. I keep the video camera of my phone focused on me, while I run Kazam on my laptop. Then I say "O..K" very clearly, while typing the letters on screen. During editing, this helps me to sync the camera video with the screencast.
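
The sync itself is simple arithmetic: note the time of the "O..K" marker in each recording, and shift one clip by the difference. Here is a minimal Python sketch of that calculation. The function name and the example timestamps are hypothetical, and it assumes both marker times have already been read off by hand, with both clips starting at the same point on the timeline.

```python
def sync_offset(marker_in_camera, marker_in_screencast):
    """Return how many seconds the screencast must be shifted on the timeline
    so that its marker lines up with the marker in the camera footage.

    A positive result means "move the screencast later";
    a negative result means "move it earlier".
    """
    return marker_in_camera - marker_in_screencast


# Hypothetical example: "O..K" is heard 12.4 s into the camera clip,
# while the letters appear 3.9 s into the screencast.
offset = sync_offset(12.4, 3.9)
print(f"Shift the screencast by {offset:+.1f} s")
```

Once the two clips are aligned at the marker, they stay aligned for the rest of the take, provided both recordings run at their true speed.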

Also, since I have both the footage of my face and the screencast, I can move between them during editing. But I have found that making the screencast 50% transparent and superimposing it on the footage of my face gives the best of both worlds.
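
For the curious, 50% transparency amounts to averaging the two layers pixel by pixel. The Python sketch below shows the usual "normal" alpha blend for a single RGB pixel. That the editor computes exactly this is an assumption, and the function name and pixel values are made up for illustration.

```python
def blend_pixel(top, bottom, opacity=0.5):
    """Blend two RGB pixels, each a tuple of three ints in 0..255.

    opacity is the weight given to the top layer; 0.5 corresponds to
    the 50% transparency mentioned above.
    """
    return tuple(
        round(opacity * t + (1.0 - opacity) * b)
        for t, b in zip(top, bottom)
    )


screencast_pixel = (255, 255, 255)  # hypothetical: white screencast background
face_pixel = (101, 61, 41)          # hypothetical: a pixel from the camera footage
print(blend_pixel(screencast_pixel, face_pixel))
```

A white screencast background therefore brightens the face footage without hiding it, which is why both layers stay readable.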

PIPs

PIP (or Picture In Picture) stands for all the little pictures/animations that are superimposed on the main video. They are what makes an educational video stand out. They are essential for the discussion shots. I prefer to use PIPs with transparent background and 50% transparency. This, I believe, makes them merge better with the video, and also saves screen space, as I need not find a separate screen space for myself. I use different techniques to suit different requirements: ArtOfIllusion is an artists' program. Often I need to model objects that are complicated but have simple mathematical descriptions. One example is the soap film frame (or the film itself). Then I first create an .obj file, which is a super simple format for describing 3D objects. It is just a list of points followed by a list of triangular faces. Such a file may be created using a text editor (or output from R or J). Then I import it into ArtOfIllusion, and choose the artistic details (camera angle, lighting, texture, colour etc).
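
To show how little there is to an .obj file, here is a minimal sketch that builds one for a surface z = f(x, y) sampled on a square grid. It is written in Python purely for illustration (the same thing can be done in R or J), and the function names and the saddle shape are hypothetical examples, not one of the models from the videos.

```python
def grid_surface_obj(f, n=10, size=1.0):
    """Return the text of an .obj file for the surface z = f(x, y),
    sampled on an (n+1) x (n+1) grid over [-size, size] x [-size, size]."""
    lines = []

    # Vertex list: one "v x y z" line per grid point, row by row.
    for i in range(n + 1):
        for j in range(n + 1):
            x = -size + 2.0 * size * i / n
            y = -size + 2.0 * size * j / n
            lines.append(f"v {x:.6f} {y:.6f} {f(x, y):.6f}")

    def idx(i, j):
        # OBJ vertex indices start at 1, not 0.
        return i * (n + 1) + j + 1

    # Face list: each grid cell is split into two triangles.
    for i in range(n):
        for j in range(n):
            a, b = idx(i, j), idx(i + 1, j)
            c, d = idx(i + 1, j + 1), idx(i, j + 1)
            lines.append(f"f {a} {b} {c}")
            lines.append(f"f {a} {c} {d}")

    return "\n".join(lines) + "\n"


def saddle(x, y):
    # Hypothetical example shape: a simple saddle surface.
    return x * x - y * y


obj_text = grid_surface_obj(saddle, n=2)
print(obj_text)
```

Saving the returned text under a name such as saddle.obj (a made-up file name) gives a file that a 3D program with an OBJ importer should be able to read. A larger n gives a smoother surface at the cost of more triangles.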

Editing

I use the free software kdenlive for all the editing. Editing for me mostly means dumping all the shots on the timeline, and chopping off the extra bits at the two ends (where I extend my hand to start or stop recording). Occasionally, I need to remove a little coughing or faltering. Since my distance from the camera is not always the same, the audio volume tends to differ from shot to shot. I manually adjust the volume to achieve consistency.
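
The volume matching above is done by hand, but the idea can also be expressed numerically: measure the loudness of each shot and compute the factor that brings it to a common level. The Python sketch below uses the root-mean-square (RMS) level as the loudness measure. That choice, the function names and the toy sample data are all assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gains_to_match(shots, reference=0):
    """Return one gain factor per shot, chosen so that every shot's RMS
    level matches that of shots[reference].

    Note: a completely silent shot (RMS of zero) would cause a division
    by zero; this sketch does not handle that case.
    """
    target = rms(shots[reference])
    return [target / rms(shot) for shot in shots]


# Hypothetical toy data: the second shot was recorded from farther away,
# so its samples are half as large as those of the first shot.
near = [0.4, -0.4, 0.4, -0.4]
far = [0.2, -0.2, 0.2, -0.2]
print(gains_to_match([near, far]))
```

Each factor is the multiplier to apply to that shot's volume. RMS is only a rough stand-in for perceived loudness, so a final check by ear is still needed.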

This much is pretty easy. The hard part is to insert the PIPs.

For this I first play a shot in my editor, carefully marking out all the time points where PIPs are to be added. So I get a list of time points together with brief descriptions of what I intend to put there. Then I write an R program to generate all the

I generally steer clear of special effects, because they are rather time consuming and end up creating an alienated environment that is not desirable in an educational video. Here are the few special effects that I do use occasionally: