Last month Rob High, the CTO of Watson, announced on his blog a new package that allows users to leverage some of Watson's tools from within the R framework.

In this post, we will install this package and use it to explore Watson's ability to automatically recognize text within images (English only, for now). We will use it to determine whether a complete abdominal US study includes images of all of the required organs and, if not, to tell us which one(s) are missing.

Installation of CognizeR

The CognizeR package is available for download directly from Github. You will note that, unlike other packages we have used before, this one is not yet available in the CRAN repository and must be installed manually. Alternatively, as we will do here, the devtools and curl packages allow installation directly from Github. The installation instructions are described in the Readme.md file.

  • Sign up for a free Bluemix account, which provides access to Watson's services. The first month is free. After that, a limited number of daily API calls to Watson's services are free, but high-volume users can pay for more access. Detailed instructions for how to sign up and get the appropriate credentials can be found in the links from the Readme.md file. We will use the Visual Recognition service for this project.
  • It is important to store the credentials needed to access the API. There are many ways to store passwords, including an R environment file. Here we are just going to save our credential key in an .Rdata file for easy use (a sketch for reusing the key in later sessions follows these installation steps).
In [1]:
IMAGE_API_KEY <- "********************"
save(IMAGE_API_KEY, file="key.Rdata")
  • Install curl and then the CognizeR package.
In [2]:
install.packages("devtools")
install.packages("https://github.com/jeroenooms/curl/archive/master.tar.gz", repos = NULL)
library("devtools")
devtools::install_github("ColumbusCollaboratory/cognizer")
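
Once the key has been saved, it can be restored in a later session instead of re-entering it. Below is a minimal sketch, not from the original post: load() recreates the IMAGE_API_KEY object from key.Rdata, and the Sys.getenv() line shows an alternative that reads the key from an entry in ~/.Renviron (the WATSON_IMAGE_API_KEY name is just an example, not something CognizeR expects).

# Restore the key saved earlier in key.Rdata; this recreates IMAGE_API_KEY
load("key.Rdata")

# Alternative sketch: keep a line such as WATSON_IMAGE_API_KEY=... in ~/.Renviron
# (hypothetical variable name) and read it at runtime instead of saving an .Rdata file
IMAGE_API_KEY <- Sys.getenv("WATSON_IMAGE_API_KEY")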

The Images

The Watson API for image recognition can analyze images in either jpg or png format. I extracted images from a complete abdominal US exam and manually removed all of the PHI to anonymize the images. You can download them here.

Let's look at an example image. To display an image in R we read it in using the png library (there is also a jpeg library) and display it using the grid.raster() function.

In [3]:
library("png")
library("grid")
image_text_path <- "./Images/"
# All of the original ultrasound images have filenames starting with "US"
image_list <- list.files(image_text_path, pattern="^US", full.names=FALSE)
grid.raster(readPNG(paste(image_text_path,image_list[1], sep="")))

We can see a bunch of text on the image. There is also a black stripe across the top of the image where the patient's PHI originally appeared.

Image Inversion

I spent some time playing with Watson's ability to extract the pertinent text from these US images prior to writing this post. Some aspect of the underlying algorithm appears to handle black text on a white background better than the reverse. Therefore, before we start the text analysis we will invert all of the images.

In [4]:
for (i in 1:length(image_list)) {
  # Read each image and invert its intensities (1 - pixel value),
  # saving the result with a "Neg" prefix in the same directory
  raw <- readPNG(paste(image_text_path, image_list[i], sep=""))
  writePNG(1-raw, target=paste(image_text_path, "Neg", image_list[i], sep=""))
}
image_list_neg <- list.files(image_text_path, pattern="^Neg", full.names=TRUE)
grid.raster(readPNG(image_list_neg[1]))

Watson

Now that we have everything set up correctly, we can run the images through Watson to see what it detects in each one. It is amazing how simple this API is: a single line of code interfaces with Watson's servers and returns the analysis.

In [5]:
library("cognizer")
image_text <- image_detecttext(image_list_neg, IMAGE_API_KEY)

The output of Watson's algorithm is well organized but not in a human-friendly format. If we dig into the data frame structure of the results for one of the images, we find the data of interest under images and then words.

In [6]:
str(image_text[[1]])
image_text[[1]]$images$words
List of 2
 $ images          :'data.frame':	1 obs. of  3 variables:
  ..$ image: chr "NegUS1.png"
  ..$ text : chr "fr [32nd]\nm2\nrs\ngen\n[try] pancreas"
  ..$ words:List of 1
  .. ..$ :'data.frame':	7 obs. of  4 variables:
  .. .. ..$ line_number: int [1:7] 0 0 1 2 3 4 4
  .. .. ..$ location   :'data.frame':	7 obs. of  4 variables:
  .. .. .. ..$ height: int [1:7] 20 20 18 16 20 24 24
  .. .. .. ..$ left  : int [1:7] 16 52 962 12 12 256 336
  .. .. .. ..$ top   : int [1:7] 76 76 77 100 220 740 740
  .. .. .. ..$ width : int [1:7] 32 56 23 32 40 48 128
  .. .. ..$ score      : num [1:7] 0.938192 0.000776 0.99011 0.979501 0.991375 ...
  .. .. ..$ word       : chr [1:7] "fr" "32nd" "m2" "rs" ...
 $ images_processed: int 1
Out[6]:
  line_number  location.height  location.x  location.y  location.width  score        word
1 0            20               16          76          32              0.938192     fr
2 0            20               52          76          56              0.000776208  32nd
3 1            18               962         77          23              0.99011      m2
4 2            16               12          100         32              0.979501     rs
5 3            20               12          220         40              0.991375     gen
6 4            24               256         740         48              0.0599906    try
7 4            24               336         740         128             0.986517     pancreas

This shows us that Watson found 7 words in the image.

  • The 1st column, line_number, is Watson's numbering of each line of text it found in the image. We are not going to use this piece of information.
  • The 2nd-5th columns are actually a single nested variable named "location" that defines the bounding box for each word, in pixels. Column 2 is the height of the box and column 5 is its width. Columns 3 and 4 give the upper-left x and y coordinates, respectively (the sketch after this list shows how to draw these boxes on the image).
  • Column 6 shows Watson's confidence that this is an English word, on a scale of 0 to 1.
  • Column 7 gives the word itself.
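
To make the location columns concrete, here is a minimal sketch (not part of the original analysis) that overlays Watson's word bounding boxes on the first inverted image using the grid package. It assumes the image_text and image_list_neg objects created above are available.

library("png")
library("grid")

img   <- readPNG(image_list_neg[1])
words <- image_text[[1]]$images$words[[1]]
h <- dim(img)[1]   # image height in pixels
w <- dim(img)[2]   # image width in pixels

# Viewport whose native coordinates are image pixels (origin at bottom-left)
grid.newpage()
pushViewport(viewport(xscale = c(0, w), yscale = c(0, h)))
# Stretch the image to fill the viewport so the boxes share its coordinates
grid.raster(img, width = unit(1, "npc"), height = unit(1, "npc"))

# Watson reports the top of each box measured from the top of the image,
# so convert to grid's bottom-up y axis before drawing
grid.rect(x      = unit(words$location$left, "native"),
          y      = unit(h - words$location$top, "native"),
          width  = unit(words$location$width, "native"),
          height = unit(words$location$height, "native"),
          just   = c("left", "top"),
          gp     = gpar(col = "red", fill = NA))
popViewport()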

This output structure is identical to what we would receive if we used Python, Node.js or any other language to query Watson. The difference is that it is normally returned as a JSON object instead of this R data structure. The gritty details are listed on Watson's API page https://www.ibm.com/watson/developercloud/visual-recognition/api/v3/#recognize_text
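
If you want to see something closer to the raw JSON that the service returns, the jsonlite package (not used in the original post) can serialize the parsed R structure back into JSON:

library("jsonlite")
# Render the results for the first image as pretty-printed JSON
cat(toJSON(image_text[[1]], pretty = TRUE, auto_unbox = TRUE))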

Checking for Completeness

In order to document completeness for an abdominal ultrasound study, the following organs must be imaged:

  • Liver
  • Gallbladder
  • Common Bile Duct
  • Pancreas
  • Spleen
  • Kidneys
  • Aorta
  • Inferior Vena Cava

We can loop through the text data returned by Watson to see if everything is present. First, we define the list of terms to search for in the images, recognizing that the terms are sometimes abbreviated. Of course, these search terms could be changed depending on the particulars of a radiology practice. We then create a vector with one element per organ, initialized to all 0s, to record whether each organ has been found.

In [7]:
terms <- list("liver",
             c("gallbladder","gb"),
             c("common bile duct","cbd"),
             "pancreas",
             "spleen",
             "kidney",
             c("aorta","ao"),
             c("inferior vena cava","ivc"))
found <- rep(0, length(terms))

Now we cycle through the images to determine if any organ labels are present. We start by looping over each image. Within that loop, we only search for the organs that have not been found in prior images (the todo variable). We then loop over our pre-defined terms for those organs and, if a term is found in the current image, set the corresponding found flag to 1.

In [8]:
for (im_text in image_text) {
  # Only look for organs that have not been found in a previous image
  todo <- which(found == 0)
  if (length(todo) > 0) {
    image_words <- im_text$images$words[[1]]$word
    for (t in todo) {
      organ <- terms[[t]]
      # match() returns NA for terms not present in this image's words
      result <- match(organ, image_words)
      if (length(which(!is.na(result))) > 0) {
        found[t] <- 1
      }
    }
  } else {
    break   # every organ has been found; no need to check the remaining images
  }
}

Now we finish by checking if all the terms have been found, and if not, output an error and the missing organs.

In [9]:
notfound <- which(found == 0)
if (length(notfound) > 0) {
  print("Exam not complete. Missing terms")
  for (i in notfound) {
    print(terms[[i]])
  }
} else {
  print("Complete Exam")
}
[1] "Exam not complete. Missing terms"
    "aorta" "ao"   
    "inferior vena cava" "ivc"               

It seems that the aorta and inferior vena cava were not found. However, we can look at image 31, and the text associated with it, to see that the labels were there but Watson's image recognition was not able to separate them based on the way they were displayed on the image.

In [10]:
image_text[[31]]$images$words[[1]]$word
grid.raster(readPNG(paste(image_text_path,image_list[31], sep="")))
Out[10]:
  1. 'fb'
  2. '35hz'
  3. 'gs'
  4. 'm2'
  5. 'sbxj'
  6. 'sbxj'
  7. 'o'
  8. '55'
  9. 'i'
  10. 'low'
  11. 'gerl'
  12. 'try'
  13. 'aoxiyc'

Conclusion

So how did Watson do? It correctly identified much of the text throughout the US images. We did have to invert them but that is trivial from a computational perspective.

However, Watson is looking for English words, so some abbreviations are either not recognized or are recognized incorrectly. We saw this with the abbreviations for the aorta and inferior vena cava. Similar issues arose with dates present in other images I fed to Watson. When this feature is expanded beyond English words, I think it will be very accurate.

One important consideration is that the Bluemix APIs, as currently implemented, are not HIPAA compliant. Microsoft also has some artificial intelligence services with relatively easy access that do claim to be HIPAA compliant, but I have not explored them yet.

In summary, this example has barely scratched the surface of the possibilities that Watson has to offer. It is exciting to have such a powerful tool integrated into the R framework. The ability to access cloud-based resources such as Watson will greatly expand the power available to data scientists.

Unfortunately, the text recognition feature showcased here has reverted to a closed beta status. The other aspects of the API, also compatible with this R plugin, continue to be available. It is not clear why the Bluemix team has chosen to move the text recognition back to closed beta, but there are many possible avenues for exploration with the other available functions.

In [11]:
paste("Author:", Sys.getenv("USER"))
paste("Last edited:", Sys.time())
R.version.string
Out[11]:

Author: Joe Wildenberg
Last edited: 2016-09-10 13:31:15
R version 3.2.2 (2015-08-14)