One of my first tasks was to set up a programming environment on my personal laptop so that I could start coding as quickly as possible. For data science purposes, you can’t go wrong choosing the Anaconda Individual Edition.
It’s open source, free for individuals, has more than 20 million users worldwide and comes with all the key components and packages needed to get started in less than 30 minutes.
Installing Anaconda is very straightforward: choose your platform (Windows, macOS or Linux), download the installer and accept the default options during installation.
After the installation is complete, you can launch Anaconda Navigator and get started immediately. But it’s worth taking the time to watch the “Individual Edition Tutorial” created by Anaconda on their “Getting Started” page.
You’ll have to provide your email address and some personal information, and Anaconda will send you a link to a recorded online webinar. During the short tutorial (11 minutes 31 seconds), you’ll get an introduction to, and demos of, Anaconda Navigator, Jupyter and Conda Package Manager.
Anaconda Navigator is a simple GUI that allows you to access and manage the various applications and open source packages that come pre-installed with the distribution. These include integrated development environments (IDEs) like Jupyter, Spyder and RStudio, and packages like pandas, scikit-learn and matplotlib, among numerous others.
Jupyter provides a notebook-like interface for you to code and display output (including visualisations) in a narrative manner that is conducive to describing and explaining results to third parties, which is why it is my IDE of choice. Its debugging functionality is not as good as that of traditional IDEs, but it gets the job done.
Conda Package Manager allows you to find, install and manage the thousands of software packages available for download. It lets you set up different environments for different projects, and gives you granular control over the specific versions of programming languages and packages used in each environment.
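Since each environment can carry its own versions, it helps to be able to check which one a notebook is actually running in. Here is a minimal sketch of such a check, not something taken from the tutorial. The function name describe_environment is my own, and I am assuming the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; if it is missing, the sketch simply reports None.

```python
import os
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Report the Python version, install location and package versions.

    describe_environment is a hypothetical helper written for this post.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "prefix": sys.prefix,  # folder of the interpreter in use
        # Set by conda on activation (assumption: may be absent elsewhere).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "packages": versions,
    }


info = describe_environment(["pandas", "scikit-learn", "matplotlib"])
for key, value in info.items():
    print(key, "->", value)
```

Running this in two different environments should show different prefixes and, if you pinned different versions, different package numbers.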
Here are some screenshots from the tutorial, but it’s best if you watch and listen to the full tutorial.
As you watch the tutorial, it is useful to follow along and replicate each step. Pay particular attention to the part from 03:40 to 05:04, where a new Jupyter Notebook is created and sample Python code is entered and executed to read a CSV file (using pandas), display the first few lines of the data and generate a scatterplot (using matplotlib).
In addition to writing your first “data science program”, it also serves as a check that your installation was successful and your programming environment was set up properly.
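For reference, a stand-in for that exercise might look like the sketch below. It is not the code from the tutorial: the column names and values are made up for illustration, and the CSV lives in a string so that the snippet runs without a file on disk.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
# With a real file, pass its path to pd.read_csv instead.
csv_text = """height_cm,weight_kg
150,52
160,58
170,69
180,77
190,88
"""

# Read the CSV into a DataFrame and look at the first few rows.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head(3))

# Draw a scatterplot of one column against the other.
fig, ax = plt.subplots()
ax.scatter(df["height_cm"], df["weight_kg"])
ax.set_xlabel("height_cm")
ax.set_ylabel("weight_kg")
ax.set_title("Sample scatterplot")
```

In a Jupyter Notebook the plot should appear below the cell once it runs. If both the table and the chart show up, pandas and matplotlib are installed and working.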
And that’s all you need to get started: 5 minutes to download and install Anaconda Individual Edition, 12 minutes to watch the tutorial and 3 minutes to create a Jupyter Notebook and replicate the sample Python code, which adds up to 20 minutes.
So why did I say 30 minutes earlier?
Well, the extra 10 minutes was to figure out how to fix an annoying issue with the display resolution of Anaconda Navigator. You may or may not face the same issue, but if you do, you’ll know what I mean.
When I first launched Anaconda Navigator, the default display settings resulted in parts of the screen being cut off, i.e. the display was too big. I will name this the Papa Bear setting. The developers built in a switch (File > Settings > Enable high DPI scaling) which could be unchecked, but doing so made the display too small, i.e. the Mama Bear setting.
Apparently this has been a long-standing issue, but the Anaconda developers’ response was: “Thank you for reporting this issue. Unfortunately, this is unlikely to be fixed soon in Navigator itself. However, you can do most everything you can do in Navigator on the command line.”
It took me about 10 minutes to search online for others who had faced the same problem, figure out how they had solved it and test the fix myself.
Basically, the workaround was to keep the “Enable high DPI scaling” setting checked (i.e. the default setting), exit, and then re-launch Anaconda Navigator from Anaconda Prompt instead of clicking on its icon in Windows. In Anaconda Prompt, you first set an environment variable (QT_SCALE_FACTOR) and then launch Anaconda Navigator from the command line. I settled on a value of 0.8 by trial and error, checking which value fit my screen best.
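I did this by typing commands into Anaconda Prompt, but the same idea can be sketched in Python for anyone who prefers a small launcher script. Treat it as an illustration rather than the official fix. The helper names navigator_env and launch_navigator are my own, and I am assuming the anaconda-navigator command is on your PATH, as it would be inside Anaconda Prompt.

```python
import os
import subprocess


def navigator_env(scale_factor=0.8):
    """Return a copy of the current environment with QT_SCALE_FACTOR set.

    0.8 is simply the value that suited my screen; yours may differ.
    """
    env = os.environ.copy()
    env["QT_SCALE_FACTOR"] = str(scale_factor)
    return env


def launch_navigator(scale_factor=0.8):
    """Start Anaconda Navigator with the scaled display settings.

    Assumes the anaconda-navigator command can be found on PATH.
    """
    return subprocess.Popen(
        ["anaconda-navigator"],
        env=navigator_env(scale_factor),
    )
```

Calling launch_navigator() should open Navigator at 80% scale, and launch_navigator(0.9) tries a slightly larger display. Because the variable is only set on a copy of the environment, other programs are not affected.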
This happens to be a good example of how to find solutions (or workarounds) to future coding issues that you will inevitably face, and to appreciate the challenges when using open source software.
Don’t get me wrong.
Open source software and packages are amazing and provide the common person ready access to feature-rich and battle-tested code that would otherwise take too long to build individually.
But it does require patience and the acceptance that sometimes, it is what it is. And that, my friend, is how a lot of code in the world is written.
Image credits: Anaconda