NLP Basics – Preparing Radiology Reports for Tokenization
We all want order in life, but working with radiology reports computationally is sometimes like giving an Easter bunny to a baby: it gets really messy really fast.
Indeed, radiology reports are structurally messy. There is a lot of structured metadata that tells you about the examination, but ultimately the radiologist’s knowledge is encoded in one long stream of text. Within a single practice, reports may be organized in as many ways as there are radiologists.
Natural language processing (NLP) is the art of turning this mess into insight.
Diagnostic radiology reports are considered unstructured data, and one of the first steps to gain insight from any diagnostic radiology report is to figure out its structure. With an annotated structure, the radiology report is like a box of chocolates: when you know what you’re gonna get, it’s just much better.
In this quest, we will go over techniques you can employ in data analytics projects to automatically extract the history, findings, and impression (or any other section) from a report. This is the first step towards being able to analyze specific sections of the text. You can download the CSV file used in this quest and follow along.
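To make the goal concrete, here is a minimal sketch of section extraction using Python's built-in `re` module. It assumes each section starts on its own line with a header such as `HISTORY:`, `FINDINGS:`, or `IMPRESSION:`. The header names, the `split_sections` function, and the sample report are all made up for illustration; real reports vary widely, and this is not necessarily the approach the rest of this quest takes.

```python
import re

# Assumed header names; adjust to match the conventions in your own reports.
SECTION_PATTERN = re.compile(
    r"^[ \t]*(HISTORY|FINDINGS|IMPRESSION)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)

def split_sections(report_text):
    """Return a dict mapping lowercase section names to their text."""
    sections = {}
    matches = list(SECTION_PATTERN.finditer(report_text))
    for i, match in enumerate(matches):
        name = match.group(1).lower()
        start = match.end()
        # A section runs until the next header, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        sections[name] = report_text[start:end].strip()
    return sections

# Hypothetical report text, invented for this example.
sample_report = """HISTORY: Cough and fever.
FINDINGS: The lungs are clear. No pleural effusion.
IMPRESSION: No acute cardiopulmonary disease."""

sections = split_sections(sample_report)
print(sections["impression"])
```

A report that lacks one of these headers simply produces a dict without that key, so downstream code should check for missing sections rather than assume all three are present.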