Setting Up the Python Data Science Environment

The data science quest begins with a single step. Mine started by installing the right software.

If you want to start exploring radiology informatics in Python, you’ll first need to set up your environment. Anaconda, from Continuum Analytics (now Anaconda, Inc.), is free and one of the best ways to get started.

Anaconda

Anaconda is a Python distribution and package manager for data science that makes it easy to get your computer set up the right way. Anaconda’s own documentation has detailed download instructions, but I’ll provide an abbreviated set of steps here.

First, download the appropriate version for your operating system. Go through the steps to install.

I chose the Python 3.5 version for 64-bit Windows. You should pick the one that works for your computer.


Let’s install this just for the local user for now.


Adding Anaconda to PATH and registering it as the default Python are both good ideas, although seasoned Python developers may have their own preferred settings.
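Once the installer finishes, it can be worth confirming which Python your system actually runs. The snippet below is a minimal sketch, not part of the Anaconda installer: the function name describe_python is my own, and the conda-meta folder check is only a common heuristic for spotting a conda-managed installation.

```python
import os
import shutil
import sys


def describe_python():
    """Return basic facts about the interpreter running this code."""
    return {
        # Full path of the interpreter executing this script.
        "executable": sys.executable,
        # Version as a short string such as "3.5.2".
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # First `python` found on PATH, or None if PATH has no such entry.
        "python_on_path": shutil.which("python"),
        # Heuristic (an assumption, not a guarantee): conda-managed
        # installations keep a "conda-meta" folder under sys.prefix.
        "in_conda_env": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print("{}: {}".format(key, value))
```

If the executable path points into your Anaconda folder, the PATH and default-Python options did their job.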

You have just set up a powerful data analysis platform! Anaconda is best thought of as a package management system for data science: through it you can keep all your analytic packages up to date in one place.
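To see whether the packages you plan to use are available, a quick check from Python can help. This is a rough sketch under my own assumptions: check_packages is a hypothetical helper, and the three package names are just examples of libraries Anaconda normally bundles. It only handles top-level package names.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Example list for illustration; edit it to match your own work.
    wanted = ["numpy", "pandas", "matplotlib"]
    for name, found in check_packages(wanted).items():
        print("{:<12} {}".format(name, "installed" if found else "missing"))
```

Anything reported as missing can usually be added through Anaconda itself.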

Jupyter

I highly recommend the Jupyter notebook for data science because it keeps the source code and the output in the same place and allows you to share them easily. You’ll see that many posts in this blog are written using Jupyter as the backend.

First, let’s launch Anaconda Navigator. I’m assuming a Windows 10 environment, but with minimal adjustments this works just as well on Mac or Linux. In Windows you can find it in the Start Menu.

Anaconda comes with a lot of very cool tools, but we’ll be working with Jupyter. Click Launch to run Jupyter.

After a command-line window runs some initialization steps, a web browser will launch. Jupyter is now up and running!

Jupyter runs a local web server and uses a local browser as a way to access it. This might seem a little awkward at first, but it has many advantages, including cross-platform compatibility, minimalist design, and remote access with minor configuration changes.
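Because Jupyter is just a web server on your own machine, you can check for it like any other local service. Here is a small sketch, assuming the default address 127.0.0.1 and Jupyter’s default port 8888 (if that port is busy, Jupyter may pick another one). The helper name is_listening is mine, and the check only tells you that something accepts connections on the port, not that it is Jupyter.

```python
import socket


def is_listening(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is assumed here; use the port shown in your browser's address bar.
    state = "reachable" if is_listening() else "not reachable"
    print("Local server on port 8888 is {}".format(state))
```

Run it from a separate Python session while the notebook server is open.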

You are now all set to start some analytical goodness! (What, did you think it was going to be more complicated than that?)

Howard Chen
Associate Informatics Officer at Cleveland Clinic Imaging Institute
(Howard) Po-Hao Chen, MD MBA is the Associate Informatics Officer at the Cleveland Clinic Imaging Institute and a musculoskeletal radiology subspecialist. He has an interest in data-driven radiology, quality improvement, and innovation. Howard has an MD and MBA from Harvard University, and he completed fellowships in musculoskeletal radiology, nuclear medicine, and clinical imaging informatics at the University of Pennsylvania in June 2018.
