Checkpoint Alfa: Theoretical foundation

Mathematics and statistics are the core foundations of data science theory. There are numerous topics to cover, so I’ll start with those listed below and add more along the way as required. A lot of the material is already familiar to me, but some topics, like Bayesian inference, are new and will require more study.

Linear Algebra

You cannot mention linear algebra and not immediately think of Professor Gilbert Strang from the Massachusetts Institute of Technology (MIT). For the three of you who have never heard of him, here’s a seven-minute interview recorded in August 2019 to get to know him better.

MIT has kindly made his entire undergraduate 18.06 Linear Algebra course available on its OpenCourseWare (OCW) website. The video lectures are especially useful.

Fair warning though: there are 34 lectures with an average duration of 45 minutes, so that’s roughly a 26-hour commitment just to watch all the videos. To put this in context, it’s just slightly longer than watching all six movies from The Hobbit and The Lord of the Rings trilogies.

Piece of cake? Well, maybe two pieces.

Multivariable Calculus

Sticking with MIT OCW, why not go through the 18.02 Multivariable Calculus video lectures as well? 35 lectures x 45 minutes means another 26 hours of exciting mathematics. Alternatively, you could watch all ten Star Wars movies back-to-back, with time in between to eat, shower and even take a nap.

Optimisation

Changing scenery and moving west to Stanford University, we find video lectures by Professor Stephen Boyd on Convex Optimisation. I’ve watched the first few lectures and Professor Boyd seems like a funny guy, and that always helps get me through heavy material.

Part I comprises 19 lectures x 75 minutes = roughly 24 hours, and Part II comprises 18 lectures x 75 minutes = roughly 23 hours, for a grand total of about 46 hours before rounding each part. That is probably how long it would take to binge-watch all the movies from the Marvel Cinematic Universe.
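For anyone who wants to check my viewing-time arithmetic, here is a minimal Python sketch. The lecture counts and per-lecture durations are the approximate figures quoted in this post, not official course data, and the helper name viewing_hours is just something I made up for illustration.

```python
# (course label, number of lectures, approximate minutes per lecture),
# all taken from the rough figures quoted in this post.
courses = [
    ("18.06 Linear Algebra", 34, 45),
    ("18.02 Multivariable Calculus", 35, 45),
    ("Convex Optimisation, Part I", 19, 75),
    ("Convex Optimisation, Part II", 18, 75),
]


def viewing_hours(lectures, minutes_each):
    """Convert a lecture count and a per-lecture duration into hours."""
    return lectures * minutes_each / 60


total = 0.0
for name, lectures, minutes_each in courses:
    hours = viewing_hours(lectures, minutes_each)
    total += hours
    print(f"{name}: {hours:.2f} hours")

print(f"Total: {total:.2f} hours")
```

Since the durations are averages, the totals are ballpark figures rather than exact running times.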

Exploratory Data Analysis (EDA)

EDA is an approach to summarising the main characteristics of data in order to gain understanding and insight.

This involves calculating basic statistics like the mean and standard deviation, visualising distributions using histograms and scatterplots, and much more.
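To make that concrete, here is a minimal sketch using only the Python standard library. The scores list is made-up data purely for illustration, and the text histogram is a crude stand-in for a real plot; in practice you would more likely reach for a plotting library.

```python
import statistics
from collections import Counter

# Made-up sample: hypothetical exam scores, used only for illustration.
scores = [52, 61, 67, 67, 70, 73, 75, 78, 84, 93]

# Basic summary statistics.
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

print(f"mean={mean:.1f} median={median:.1f} stdev={stdev:.1f}")

# A crude text histogram: bucket each score into a bin of width 10
# (50-59, 60-69, ...) and draw one '#' per observation.
bins = Counter((s // 10) * 10 for s in scores)
for start in sorted(bins):
    print(f"{start:>3}-{start + 9:<3} {'#' * bins[start]}")
```

Even on ten numbers, comparing the mean with the median and eyeballing the bins gives a quick feel for the centre and spread of the data, which is the whole point of EDA.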

Many of these techniques are obvious, but in order to pick up best practices, there is no better start than to learn from the person who pioneered the field, John Tukey. Amazon has limited quantities of his 1977 classic text, Exploratory Data Analysis, so act fast.

Bayesian Inference

I’m new to the Bayesian world, and googling around seems to indicate that “Bayesian Data Analysis” by Andrew Gelman et al. is the book to read. But a large number of hits also point to “Doing Bayesian Data Analysis” by John K. Kruschke, affectionately known as the “puppy book”. I know which one I’ll read first!

Clearly a lot of material to go through and absorb, but building a strong theoretical foundation is always a good start to a learning journey.