Picking up Python through Kaggle Learn

One unexpected benefit of joining Kaggle was the discovery of an introductory Python course on Kaggle Learn. It’s free to use and consists of short tutorials and hands-on notebook exercises that highlight the key aspects of the language.

The course focuses on data science applications and is aimed at learners who already have some coding experience.

There are seven lessons in total, and they can comfortably be completed within a day. They are therefore not a comprehensive deep-dive into the language, but they do provide a good foundation on which to build further Python experience and skill.

The lessons, created by Colin Morris, Kaggle Data Scientist, are:

  • Hello, Python: A quick introduction to Python syntax, variable assignment and numbers (Tutorial, Notebook)
  • Functions and Getting Help: Calling functions and defining our own, and using Python’s own built-in documentation (Tutorial, Notebook)
  • Booleans and Conditionals: Using booleans for branching logic (Tutorial, Notebook)
  • Lists: Lists and the things you can do with them, including indexing, slicing and mutating (Tutorial, Notebook)
  • Loops and List Comprehensions: For and while loops, and a much loved Python feature: list comprehensions (Tutorial, Notebook)
  • Strings and Dictionaries: Working with strings and dictionaries, two fundamental Python data types (Tutorial, Notebook)
  • Working with External Libraries: Imports, operator overloading and survival tips for venturing into the world of external libraries (Tutorial, Notebook)

There were a few aspects of Python that I found particularly interesting.

List Comprehensions

Consider the user-defined function below, which counts the negative numbers in a list using a for loop.

def count_negatives(nums):
    """
    Return the number of negative numbers in the given list.
    >>> count_negatives([5, -1, -2, 0, 3])
    2
    """
    n_negative = 0
    for num in nums:
        if num < 0:
            n_negative = n_negative + 1
    return n_negative

The same functionality can be achieved with a list comprehension, resulting in more compact code.

def count_negatives(nums):
    return len([num for num in nums if num < 0])

Or alternatively,

def count_negatives(nums):
    # Note: Python calculates something like True + True + False to be equal to 2.
    return sum([num < 0 for num in nums])
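The sum() version works because bool is a subclass of int in Python, so True and False behave as 1 and 0 in arithmetic. A minimal sketch of that, plus a variant that passes a generator expression (no square brackets) to sum() so that no intermediate list is built; the name count_negatives_gen is hypothetical and only keeps the variants apart.

```python
# bool is a subclass of int, so True counts as 1 and False as 0
print(issubclass(bool, int))  # True
print(True + True + False)    # 2

def count_negatives_gen(nums):
    # A generator expression feeds sum() one boolean at a time
    # instead of building the full list of booleans first
    return sum(num < 0 for num in nums)

print(count_negatives_gen([5, -1, -2, 0, 3]))  # 2
```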

String Manipulation

String methods such as str.split() and str.rstrip() make string manipulation straightforward, as in the example below.

Notice the useful ability to chain methods, as in word.rstrip('.,').lower(), which strips any trailing '.' and ',' characters from each word and then converts it to lower case, all within the same line of code.

def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword.
    Returns a list of the index values into the original list for all documents
    containing the keyword.

    Example:
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    found_index = []
    # enumerate() yields each document's own position, so duplicate documents
    # are reported correctly (doc_list.index(doc) always returns the first match)
    for i, doc in enumerate(doc_list):
        words = doc.split()
        # strip trailing periods/commas and lower-case each word before comparing
        processed = [word.rstrip('.,').lower() for word in words]
        if keyword.lower() in processed:
            found_index.append(i)
    return found_index
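Splitting each document into words is what restricts the search to whole words. A short comparison on the docstring's own example, using a hypothetical naive_search that simply tests for a substring, shows the difference: the substring test also matches "Casinoville". The hypothetical whole_word_search follows the same word-by-word approach as word_search.

```python
doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]

def naive_search(doc_list, keyword):
    # Hypothetical substring test: matches the keyword anywhere, even inside a longer word
    return [i for i, doc in enumerate(doc_list) if keyword.lower() in doc.lower()]

def whole_word_search(doc_list, keyword):
    # Word-by-word comparison, as in word_search
    return [i for i, doc in enumerate(doc_list)
            if keyword.lower() in [w.rstrip('.,').lower() for w in doc.split()]]

print(naive_search(doc_list, 'casino'))       # [0, 2]
print(whole_word_search(doc_list, 'casino'))  # [0]
```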

Note again the list comprehension, which keeps the code compact yet still readable.

processed = [word.rstrip('.,').lower() for word in words]
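In that comprehension, rstrip('.,') treats its argument as a set of characters rather than a suffix: it removes any run of trailing '.' and ',' characters and leaves leading or interior ones alone. A few quick checks, with illustrative strings:

```python
word = 'Casino.,'
stripped = word.rstrip('.,')            # 'Casino': every trailing '.' or ',' is removed
normalised = word.rstrip('.,').lower()  # 'casino': chained, strip first, then lower-case
leading = '.Casino'.rstrip('.,')        # '.Casino': leading characters are left alone
interior = 'Casino.com'.rstrip('.,')    # 'Casino.com': only the end of the string is stripped
```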

str.split() and str.join() make it easy to change date formats, although the syntax of str.join(), which is called on the separator string rather than on the list, takes some getting used to.

datestr = '1956-01-31'
year, month, day = datestr.split('-')
'/'.join([month, day, year])

>>> '01/31/1956'
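Wrapping the conversion in a function makes the roles clear: split() is called on the date string, while join() is called on the separator and receives the list of pieces. A small sketch, where reformat_date is a hypothetical name and the input is assumed to always be in 'YYYY-MM-DD' form:

```python
def reformat_date(datestr, sep='/'):
    """Convert 'YYYY-MM-DD' into 'MM/DD/YYYY' (or another separator)."""
    # split() breaks the string on '-', giving three strings
    year, month, day = datestr.split('-')
    # join() is a method of the separator string, not of the list
    return sep.join([month, day, year])

print(reformat_date('1956-01-31'))       # 01/31/1956
print(reformat_date('1956-01-31', '.'))  # 01.31.1956

# join() only accepts strings, so numbers must be converted first
print('-'.join(str(n) for n in [1956, 1, 31]))  # 1956-1-31
```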

str.format() makes it easy to build and format complex strings, as in the example below.

planet = 'Pluto'
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
#      2 significant figures   3 decimal places, as a percent   separate with commas
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)

>>> "Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."
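The format specifications are easy to misread: {:.2} without a type letter means two significant figures, which is why Pluto's mass appears as 1.3e+22, whereas adding f fixes the number of decimal places. A few checks on the specifiers used above, with illustrative input values:

```python
pi = 3.14159
sig_figs = '{:.2}'.format(pi)         # '3.1': two significant figures
decimals = '{:.2f}'.format(pi)        # '3.14': two decimal places
big = '{:.2}'.format(1.303e22)        # '1.3e+22': large values switch to exponent notation
percent = '{:.3%}'.format(0.0021818)  # '0.218%': multiplied by 100, three decimal places
commas = '{:,}'.format(52910390)      # '52,910,390': comma thousands separators
```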

Dictionaries

Dictionaries are a built-in Python data structure for mapping keys to values. A simple example from the tutorial is shown below, where ‘one’, ‘two’ and ‘three’ are keys and 1, 2 and 3 are their corresponding values.

numbers = {'one':1, 'two':2, 'three':3}
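Values are looked up, added and tested by key. A short sketch of the basic operations on the same numbers dictionary:

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

one = numbers['one']           # 1: look up a value by its key
numbers['eleven'] = 11         # assigning to a new key adds an entry
has_two = 'two' in numbers     # True: 'in' checks the keys
has_1 = 1 in numbers           # False: 1 is a value, not a key
missing = numbers.get('four')  # None: get() avoids the KeyError that numbers['four'] raises

# items() yields (key, value) pairs
for name, value in numbers.items():
    print(name, value)
```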

My instinct is that dictionaries will come in very handy for storing, manipulating and accessing complex data. The example below, also from the tutorial, builds a dictionary from a list using a dictionary comprehension. It hints at the usefulness of this data type, though a deeper dive is needed to understand its full potential.

planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial

>>> {'Mercury': 'M',
 'Venus': 'V',
 'Earth': 'E',
 'Mars': 'M',
 'Jupiter': 'J',
 'Saturn': 'S',
 'Uranus': 'U',
 'Neptune': 'N'}
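One property worth knowing for that deeper dive is that keys must be unique, so inverting planet_to_initial loses information: Mercury and Mars share the initial 'M', and the later entry overwrites the earlier one. A sketch of the naive inversion next to a grouping alternative (initial_to_planet and initial_to_planets are hypothetical names):

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}

# Naive inversion: 'M' is assigned twice, so only the last planet ('Mars') survives
initial_to_planet = {initial: planet for planet, initial in planet_to_initial.items()}

# Grouping instead: each initial maps to a list of all matching planets
initial_to_planets = {}
for planet, initial in planet_to_initial.items():
    initial_to_planets.setdefault(initial, []).append(planet)

print(initial_to_planet['M'])   # Mars
print(initial_to_planets['M'])  # ['Mercury', 'Mars']
```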

There are 13 other courses in the Kaggle Learn catalogue, ranging from machine learning to data visualisation to natural language processing.

Based on my experience with the Python course so far, I expect the other courses to provide a baseline working knowledge of key data science topics. They are a practical and efficient way to kickstart any learning journey, and I plan to complete as many of them as possible.

Thank you, Kaggle Learn!
