Quick Tutorial for Jupyter

Jupyter is a powerful data analysis notebook that runs in a browser. In a previous post, I went over how to set it up as part of the Anaconda distribution.

In this post we will explore the basic buttons and options available in Jupyter.

Creating a new notebook

Once launched, you may see some sample notebooks (depending on your installation), as well as a set of buttons.

Let’s create a new folder in the current directory to house our notebooks. Click on New and select Folder.

The new folder will show up as “Untitled Folder,” so check the box next to its name and click Rename to change it to My Projects.

Click on the folder’s title to enter it. Here we can create a new notebook in any language for which a kernel is installed. We’ll create one using Python 3: click on New again and select Python 3.

Aside: Jupyter uses “kernels” to support programming languages, and you can install additional kernels to add more languages. For instance, IRkernel adds support for R. Once you become more comfortable, you can create notebooks in a variety of languages this way.
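If you are curious which kernels your own installation already has, a minimal sketch along these lines may help. It assumes the jupyter_client package (normally installed alongside Jupyter) is importable; list_installed_kernels is a hypothetical helper name used only for illustration.

```python
def list_installed_kernels():
    """Return a dict mapping kernel names to their resource directories."""
    try:
        # Assumption: jupyter_client is available in this Python environment.
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        # Jupyter's client library is not installed here, so report nothing.
        return {}
    return KernelSpecManager().find_kernel_specs()


# Print one line per kernel that Jupyter can find.
for name, path in sorted(list_installed_kernels().items()):
    print(name, "->", path)
```

On a typical Anaconda setup this should list a Python 3 kernel; an entry for R would appear only after IRkernel has been installed and registered.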

Your First Python Program

The Jupyter notebook is useful in that you can start writing code immediately, without worrying about logistics such as creating source files or running them from a terminal.

Let’s write a simple program that prints out your name.

Type the following code into the empty box (called a cell):

my_name = "Howard"
print("Hello", my_name)

Then press Ctrl-Enter to run it. The output, Hello Howard, appears directly below the code.

Making Edits

So my name is Howard, but yours is probably not. To quickly modify the program and see the new output, change Howard to your name and then press Ctrl-Enter again.

You should see an immediate change in the output.  Jupyter allows you to rapidly change code and see its effects.
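One way to make this kind of quick edit even easier is to wrap the greeting in a small function, so that only a single value needs to change between runs. This is a sketch rather than part of the original program; greet is a hypothetical helper name.

```python
def greet(name):
    """Build the same greeting the first program prints, for any name."""
    return f"Hello {name}"


my_name = "Howard"  # change this value, then press Ctrl-Enter to re-run
print(greet(my_name))
```

Each time you edit my_name and re-run, the output below the code updates to match.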

You can also click on the notebook’s “Untitled” title at the top of the page to rename it.

Your First Data Plot

Jupyter also allows you to integrate visualizations directly into the notebook.  Let’s make one right now using a package called matplotlib.

Create a new cell (the box that holds a block of code) by clicking on the + button. If an empty cell already appears below your first one, you can simply click in it and start typing.

Then, type in this code:

%matplotlib inline
from matplotlib import pyplot as plt
# These numbers I just made up for this exercise.
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
cxr_vol_in_thousands = [10.2, 18.3, 31.9, 92.5, 180.6, 343.7, 485.3]
# create a line chart, years on x-axis, volume on y-axis
plt.plot(years,
         cxr_vol_in_thousands,
         color='blue',
         marker='o',
         linestyle='solid')

As you type longer code like this, you will notice some of Jupyter’s subtler editing features, such as syntax coloring and automatic bracket completion.

In later posts we will go over how to use these specific commands. This post focuses on getting you familiar with the basic functions in Jupyter.

Notice that the notebook keeps all the old code you entered and incorporates both the new code and its graphical output all in one place.
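As a small extension of the plot above, the sketch below adds axis labels and a title so the chart explains itself. It reuses the same made-up numbers; the label and title wording are assumptions chosen for illustration. In the notebook, the %matplotlib inline line from the earlier cell stays in effect, so the figure should appear below the cell when it runs.

```python
from matplotlib import pyplot as plt

# Same made-up numbers as in the plot above.
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
cxr_vol_in_thousands = [10.2, 18.3, 31.9, 92.5, 180.6, 343.7, 485.3]

# Create a figure with a single set of axes and draw the same line chart.
fig, ax = plt.subplots()
ax.plot(years, cxr_vol_in_thousands, color="blue", marker="o", linestyle="solid")

# Illustrative labels; adjust the wording to describe your own data.
ax.set_xlabel("Year")
ax.set_ylabel("Volume (thousands)")
ax.set_title("Made-up volume by year")
```

Working with the ax object instead of calling plt.plot directly is a style choice; both approaches draw the same line.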

Conclusion

Jupyter is a powerful web-based notebook that is indispensable for the data explorer and data scientist. I highly recommend using it for rapid data exploration. This blog focuses on radiology informatics and data science, so it will explore these tools and the programming language only peripherally, in favor of digging into the data (the fun part). If you want to dig deeper into Jupyter, consider the full documentation.

Howard Chen
(Howard) Po-Hao Chen, MD MBA is a radiology chief resident at Hospital of the University of Pennsylvania. He has an interest in data-driven radiology, quality improvement, and innovation.

Howard will finish training with fellowships in musculoskeletal radiology and nuclear medicine in June 2018 from University of Pennsylvania.
