NLP Basics – Preparing Radiology Report for Tokenization

We all want order in life, but working with radiology reports computationally is sometimes like giving an Easter bunny to a baby: it gets really messy really fast.

A metaphor where chocolate, babies, and NLP somehow all fit in the same sentence.  Source: Twistitle.com

Indeed, radiology reports are structurally messy. There is a lot of structured metadata that tells you about the examination, but ultimately the radiologist’s knowledge is encoded in one long stream of text. Within a given practice, reports are often organized in as many ways as there are radiologists.

Natural language processing (NLP) is the art of turning this mess into insight.

Diagnostic radiology reports are considered unstructured data, and one of the first steps to gain insight from any diagnostic radiology report is to figure out its structure.  With an annotated structure, the radiology report is like a box of chocolates: when you know what you’re gonna get, it’s just much better.

Annotated box of chocolates. Sorry, Ma Gump – sometimes you do wanna know what you’re gonna get.  Source: Collegehumor

In this quest, we will go over techniques you can employ in data analytics projects to automatically extract the history, findings, and impression (or any other section) from a report. This is the first step toward analyzing specific sections of the text. You can download the CSV file used in this quest and follow along.
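As a rough preview of what section extraction can look like, here is a minimal sketch using Python's built-in `re` module. It is not necessarily the approach this quest takes. The header names, the `split_sections` helper, and the sample report are all assumptions made up for illustration; real reports will use whatever headers your practice uses, and the text would come from the CSV file rather than a hard-coded string.

```python
import re

# Assumed section headers; real reports vary by practice and by radiologist.
SECTION_HEADERS = ("HISTORY", "FINDINGS", "IMPRESSION")


def split_sections(report, headers=SECTION_HEADERS):
    """Map each header found in the report to the text that follows it.

    Hypothetical helper for illustration only.
    """
    # A header counts only at the start of a line and must end with a colon.
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(re.escape(h) for h in headers) + r")[ \t]*:",
        flags=re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(report))
    sections = {}
    for i, match in enumerate(matches):
        start = match.end()
        # A section runs until the next header, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).upper()] = report[start:end].strip()
    return sections


# Made-up sample report, not taken from the quest's CSV file.
sample_report = """HISTORY: Knee pain after a fall.
FINDINGS: No acute fracture. Small joint effusion.
IMPRESSION: Small joint effusion without fracture."""

sections = split_sections(sample_report)
print(sections["IMPRESSION"])
```

A report that does not follow the "header, then colon" convention would come back with missing keys, which is itself a useful signal that the report needs a different rule.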

To follow along, set up your computer using the following Python tutorials: Alternatively, start a free Jupyter notebook from Azure Notebooks.
Howard Chen
Associate Informatics Officer at Cleveland Clinic Imaging Institute
(Howard) Po-Hao Chen, MD, MBA, is the Associate Informatics Officer at the Cleveland Clinic Imaging Institute and a musculoskeletal radiology subspecialist. He has an interest in data-driven radiology, quality improvement, and innovation. Howard has an MD and MBA from Harvard University, and he finished training with fellowships in musculoskeletal radiology, nuclear medicine, and clinical imaging informatics at the University of Pennsylvania in June 2018.
