Kaggle started in 2010 as a platform offering machine learning competitions and has grown to become a leading data science and machine learning community.
It has leveraged its competition roots to offer adjacent services, including a hosted platform for Jupyter Notebooks, an active discussion forum, a user-contributed dataset repository, and even learning material and job postings.
Kaggle was acquired by Google in March 2017, which is usually a sign that they are now a big fish in a big pond.
I was planning to sign up for Kaggle further down in my data science learning journey, only after I had built up sufficient foundational knowledge and skills. However, I decided to take a quick look under the hood, and found that it was surprisingly easy to get started.
Everyone who signs up starts out as a Novice.
You then progress up the food chain to become a Contributor, Expert, Master and finally reach the summit to become a Grandmaster.
There are four categories where you can progress, and each is tracked individually. The main focus of Kaggle is its Competitions, so that’s where many Kagglers focus their attention. But you can also gain recognition by submitting interesting Datasets, creating useful Notebooks and contributing in Discussions.
Progress is tracked through a medal system and moving up the ladder is based on the total number of Bronze, Silver and Gold medals you acquire.
Each category has its own definition of Bronze, Silver and Gold. Competition medals are based on how many people take part and where you rank, while Dataset, Notebook and Discussion medals are based on votes by the community for each submission.
I like the way Kaggle has designed their progression system and how they’ve expanded beyond just organising competitions.
For those new to data science, diving into the deep end by straightaway competing against experienced and practising data scientists may be too daunting, and could be a reason why many are hesitant to even sign up and get started on Kaggle.
For my part, I had planned to join Kaggle only after I’d traveled further down the road. But after understanding what Kaggle had to offer and how they designed their system, I decided that it was useful to participate as early as possible.
And taking the first step up the ladder from Novice to Contributor couldn’t have been easier.
It literally takes at most 20 minutes.
Seriously.
After signing up and providing basic information including a phone number where they can send you an SMS to verify your account, you just have to complete four simple tasks.
“Make 1 comment” and “Cast 1 upvote” are trivial and can be completed in less than a minute. Just go to the Discuss section of Kaggle and fire away.
“Run 1 kernel” just requires you to go to the Notebook section, create a new hosted Jupyter notebook, write some simple code and run it. Since I had already created a basic notebook to test my Anaconda installation, I simply copied and pasted the five lines of Python code into Kaggle and ran it there. Two minutes, tops.
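If you don’t have a snippet lying around, almost anything that runs will do. The sketch below is not the code I used; it is just a hypothetical example of the kind of throwaway cell that works, printing a few facts about the Python environment using only the standard library. The function name `environment_summary` is my own invention for illustration.

```python
import platform
import sys


def environment_summary():
    """Return a few facts about the Python environment running this cell."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }


# Print one line per fact so the cell produces visible output when run.
for key, value in environment_summary().items():
    print(f"{key}: {value}")
```

Paste it into a new notebook cell, run it, and the task should be ticked off.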
“Make 1 competition or task submission” sounds a bit more daunting, and I believe it’s the step that’s holding back many Novices from progressing to being Contributors. On the surface, it sounds like you’re expected to participate in an actual live competition. In reality, it’s much easier. You’re just expected to submit one set of results for any competition, including the “Titanic: Machine Learning from Disaster” practice competition.
Start by playing the 6:36-minute introductory YouTube video by the infectiously enthusiastic Rachael Tatman, Kaggle Data Scientist. Watch her video and you’ll know what I mean.
And then follow along with the well-written Titanic Tutorial by Alexis Cook, Head of Kaggle Learn.
In “Part 1: Get started”, you’ll get to the section “Your first submission” within five minutes, where she guides you on how to download and submit a sample competition results file. Submitting that file happens to fulfill the “Make 1 competition or task submission” requirement.
Once you do that, your profile will be upgraded to Contributor status across all four categories!
The right navigation bar on the Home page will also start tracking the medals you’d need to progress to the next level for each category.
You should of course continue with the Titanic Tutorial and aim to reach as high a score as possible, but submitting your first set of results would already have moved you one tier up the progression ladder.
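For the curious, a results file for the Titanic competition is just a small CSV. The sketch below shows roughly what one looks like, assuming the two-column layout (`PassengerId`, `Survived`) used by the sample file in the tutorial. The helper name `build_submission`, the three passenger ids and the all-zero predictions are placeholders I made up for illustration, not real model output.

```python
import csv
import io


def build_submission(passenger_ids, predictions):
    """Return CSV text with a header and one id/prediction row per passenger."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])  # assumed column names
    writer.writerows(zip(passenger_ids, predictions))
    return buffer.getvalue()


# Placeholder ids and a naive "nobody survived" baseline, for illustration only.
example_ids = [892, 893, 894]
example_csv = build_submission(example_ids, [0] * len(example_ids))
print(example_csv)
```

A real submission would have one row for every passenger in the competition’s test set, with predictions coming from your model rather than a constant.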
Every additional move you make after that will get you one step closer to being a Grandmaster and topping the leaderboards.
If you add up the Competition numbers across all tiers from Novice to Grandmaster, you can infer the following distribution.
Tier | Number | Percentage | Percentile |
---|---|---|---|
Grandmaster | 188 | 0.1% | 99.9 |
Master | 1,449 | 1.0% | 98.9 |
Expert | 5,815 | 4.1% | 94.8 |
Contributor | 55,777 | 39.0% | 55.8 |
Novice | 79,869 | 55.8% | – |
Total | 143,098 | 100.0% | – |
Some interesting observations:
- More than half of the 143,098 people who have signed up to Kaggle don’t spend the extra 20 minutes of effort to progress from Novice to Contributor
- Reaching Expert puts you among the top 5.2% of all Kagglers
- Reaching Master puts you among the top 1.1% of all Kagglers
- Reaching Grandmaster puts you among the top 0.1% of all Kagglers
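The arithmetic behind these figures is simple enough to check yourself. Here is a minimal sketch that takes the Competition counts from the table and works out each tier’s share plus the cumulative "top X%" figure quoted in the bullets. The function name `tier_distribution` is my own; the counts are the ones listed above.

```python
# Competition-tier counts as listed in the table, highest tier first.
tier_counts = {
    "Grandmaster": 188,
    "Master": 1449,
    "Expert": 5815,
    "Contributor": 55777,
    "Novice": 79869,
}


def tier_distribution(counts):
    """Return (tier, share %, cumulative top %) rows in the order given."""
    total = sum(counts.values())
    rows = []
    running = 0
    for tier, count in counts.items():
        running += count  # everyone at this tier or higher
        rows.append((tier, 100 * count / total, 100 * running / total))
    return rows


for tier, share, top in tier_distribution(tier_counts):
    print(f"{tier:<12} {share:5.1f}%  top {top:5.1f}%")
```

The cumulative column is what gives the 5.2% and 1.1% figures: each one is the sum of the shares of that tier and every tier above it.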
This is of course over-simplifying things when it comes to actual rankings, because Kaggle has designed a ranking system using points that decay over time. Rankings are provided only for those in the Expert tier and above.
The good news is that Kaggle will display both your current and highest-ever achieved rank on your profile, so you’ll be able to maintain some recognition even if you’ve been inactive over an extended period of time.
So, what’s next for me after reaching the Contributor tier?
The obvious one is to continue with the Titanic Tutorial and improve my classification model to achieve the best competition result possible.
After that, there are two additional “Getting Started” competitions that I can continue practising on: House Prices: Advanced Regression Techniques and Digit Recognizer.
One step above in difficulty will be the “Playground” competitions targeted at newcomers, for example: New York City Taxi Trip Duration, Dogs vs. Cats and Leaf Classification.
In parallel, I’ll also start exploring the material in Kaggle Learn and weave it into my self-training curriculum.
A meaningful milestone to aim for will be to take part in actual competitions and earn at least two Bronze medals in order to progress to the Expert tier. Let’s see how long that will take me.
Definitely a lot of interesting things to explore within Kaggle, and I’m glad that I brought forward the Kaggle checkpoint of my learning journey.
All image credits: Kaggle