Azure Notebooks in Preview Status – And It’s Pretty Awesome

The hardest thing about learning data science is getting started, and the hardest thing about getting started is learning how to set up all the tools. What if you could get it all set up, with storage space, computing time, and a programming environment, all for free?

Microsoft missed the boat on mobile devices (although Surface is making a real comeback), but it is determined not to make the same mistake with the cloud.

The Redmond-based technology giant is investing seriously in data analytics, machine learning, and the data revolution. Microsoft is getting in the game early, and it is doing well, at least according to Forrester. Its latest offering? A free data analytics platform for the masses.

Stemming from the excellent IPython project, Jupyter is an open source, web-based notebook platform that has since added support for other data analysis languages such as R and Scala, and works with platforms such as Hadoop and Spark. For the Python-minded data scientist, Jupyter is one of the most powerful tools (and here’s how to install it on your computer). A Jupyter notebook is not unlike other standard notebooks such as OneNote or Evernote: you can write down standard text and insert images where appropriate, and the software takes care of the backend. Jupyter takes the concept further: you can send blocks of code to a “kernel,” and the notebook will retrieve the output and place it back in the notebook, even if it is a graph, table, or nonstandard text output. If you care more about manipulating data than about the organization of the source code, as most scientists do, Jupyter is an excellent choice.
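To make the "send code to a kernel, get output back" idea concrete, here is a toy sketch in plain Python. It is not how Jupyter is actually implemented (a real kernel runs as a separate process and exchanges messages with the notebook); the `run_cell` function and `kernel_state` dictionary are hypothetical names made up for illustration. It only shows the two ideas that matter: output is captured and handed back, and variables persist from one cell to the next.

```python
import contextlib
import io


def run_cell(source, namespace):
    """Execute one block of code and return whatever it printed.

    Toy stand-in for a notebook kernel: a real Jupyter kernel is a
    separate process, not a function call like this one.
    """
    buffer = io.StringIO()
    # Redirect anything the cell prints into an in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


# One shared dictionary plays the role of the kernel's memory, so a
# variable defined in one cell is still available in the next.
kernel_state = {}

first_output = run_cell("values = [1, 2, 3]\nprint(sum(values))", kernel_state)
second_output = run_cell("print(max(values))", kernel_state)

print("Cell 1 output:", first_output.strip())
print("Cell 2 output:", second_output.strip())
```

The second cell can use `values` even though it was defined in the first one, which is the same behavior you rely on when running notebook cells one at a time.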

The difficulty with setting up your own computing platform is two-fold:

  1. You have to install all the prerequisites yourself. Jupyter may be a powerful development platform, but you need to install Python and manage its packages separately. There are ways to make the process easier, but you are ultimately responsible for the inner plumbing of the computing system.
  2. Your platform is stuck on your computer. Jupyter is great for collaboration, but sharing requires proper configuration, which usually ends with either exposing your computer's IP address to the internet or investing in a web server.

Azure Notebooks are essentially cloud-based Jupyter notebooks, but this alone means a great deal. Microsoft Azure takes care of the maintenance and package installation (pandas, nltk, and scikit-learn, among others, come pre-installed). These notebooks are not bound to your computer and are accessible from any web browser, making them a great tool for project collaboration. All of this comes at the convenient price point of free.
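If you want to confirm which packages a given notebook environment actually provides, a quick check like the following can be pasted into a cell. This is a minimal sketch using only the Python standard library; the `missing_packages` helper is a hypothetical name, and the package list simply mirrors the ones mentioned above (note that scikit-learn is imported under the name `sklearn`).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages mentioned in the text.
# scikit-learn is imported as "sklearn".
packages = ["pandas", "nltk", "sklearn"]

missing = missing_packages(packages)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

The same cell works on a local Jupyter installation, which makes it a simple way to compare a hosted environment with your own.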

I have not attempted to run very large datasets on Azure Notebooks, although I suspect production-level big data analysts will find the free offering insufficient. For the rest of us who explore datasets and create visualizations for scientific and academic work, Azure Notebooks can save a lot of headaches. I find the notebooks responsive and the environment very similar to a standard Jupyter installation running on a local computer. Microsoft also offers a variety of sample notebooks exploring tools and features I did not even know existed.

All of the Python quests and some of the R quests in the Radiology Data Quest exercises can be explored through a Jupyter notebook, so you no longer have an excuse: dive in and geek out with us!

Howard Chen
(Howard) Po-Hao Chen, MD MBA is a radiology chief resident at the Hospital of the University of Pennsylvania. He has an interest in data-driven radiology, quality improvement, and innovation.

Howard will finish training with fellowships in musculoskeletal radiology and nuclear medicine in June 2018 at the University of Pennsylvania.
