Images and Outcomes

Machine Learning and AI

The National Programme for IT (NPfIT), or Connecting for Health (CfH) as it became known, left at least one positive legacy: PACS (Picture Archiving and Communications Systems).

PACS is the radiology world's digital system for storing and distributing images. While most countries in the Western world are now entirely digital in their radiology imaging, this was emphatically the case in the UK by 2009, thanks in part to the National Programme.

The history of PACS, and the economic forces that brought it to fruition when they did, is an interesting subject in itself that I may write up one day. So is the accident of history that keeps the RIS (Radiology Information System), which manages scheduling and reports, as a separate system in the radiology world: a real information systems oddity that is worth a mention.

A Wealth of Data

In the UK, we have records of radiology procedures dating back 10, 15 or more years. These are really full records: digital images in a format that is rich in analysable data and easily correlated with the radiologists' reports. This is an absolute dream for any machine learning algorithm. The same is true in many Western countries, but there is an added NHS benefit. The radiology data practices in the NHS are, relatively speaking, fairly well standardised. There are standard terminologies and classifications. Of course there are variations, but on the whole the way one radiology department describes its data is very much the same as any other across the UK.

This is so ripe for machine learning: image analysis tools can be paired with report data and, further, we even have vast amounts of patient outcomes data that could be correlated.

The technical challenges are easily surmountable. There are challenges around the privacy of patient data and permissions for secondary use, but I think these could be addressed, and a significant and meaningful body of data turned into information through AI. The privacy and consent issues are dealt with elsewhere in these pages (for example in the Aggregate or Federate? discussion).

Approaching the Problem

There are so many ways to investigate this set of data with AI that it's hard to know where to start. However, let's look at a typical neural net learning approach. A neural net is a set of layered, weighted parameters that can be adjusted to find the closest match. The key is to have verifiable outcomes for the training data, so that the weights at each layer can be optimised to find matches. This optimisation is a form of regression (in practice usually gradient descent on an error measure, not a simple linear regression): it seeks the minimum notional distance between the object you're guessing and the object type you know. Most often, the machine needs to be taught by human input to determine the classification of outcomes. For example, a human could label a load of different sweet wrappers; the neural net could then optimise its outputs based on the image content of a set of random sweets pictures, and guess which picture is which sweet.

So that's:

Get a training dataset
Let the neural net learn on the data
Use the AI (and continue to learn)
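
The three steps above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not a radiology model: the "images" are replaced by two-number feature vectors drawn from two well-separated clusters, and the names make_toy_data, train_tiny_net and predict are hypothetical helpers invented for this sketch. The network has a single hidden layer and is trained by plain gradient descent.

```python
import numpy as np

def make_toy_data(n, seed):
    """Step 1: a labelled training set (two clusters standing in for two classes)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, n)
    centres = np.where(labels[:, None] == 1, 2.0, -2.0)
    features = centres + rng.normal(0.0, 1.0, (n, 2))
    return features, labels

def train_tiny_net(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Step 2: fit a one-hidden-layer net by gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass: hidden layer (tanh), then a sigmoid output.
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        # Backward pass: gradients of the mean cross-entropy loss.
        d_out = (p - y) / len(X)
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Step downhill on every weight.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    """Step 3: use the trained net on data it has not seen."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return (p.ravel() > 0.5).astype(int)

X_train, y_train = make_toy_data(200, seed=1)
net = train_tiny_net(X_train, y_train)
X_new, y_new = make_toy_data(100, seed=2)
accuracy = (predict(net, X_new) == y_new).mean()
```

The "continue to learn" part of the last step would amount to calling the training function again as newly labelled cases arrive; that is left out here to keep the sketch short.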

Here is the key: we already have a huge set of radiology images with known classified outcomes. The training dataset is enormous! Most often, it is the gathering of the training data that is the hardest step. In radiology, the image space is bounded (we don't have random pictures of cars to throw out), the data is in a high-quality format, and we have correlated reports (and outcomes).

Why do it?

In the following sections I'll outline some interesting studies that could be performed, just to show what a rich area for research this is. But why do it? The answer is clear - we have a shortage of radiologists in the UK. This medical specialty is being overwhelmed by the demand for diagnostic imaging. If machine reading and analysis can help alleviate this problem then this is worth doing.

This is nice but is it NICE?

You don't have to look far to find a proliferation of imaging A.I. In the main, these tools stem from the particular expertise of an individual or small group of people who have developed algorithms that work in a very narrow field, to give a very specific set of results. This is really the opposite of "looking at radiology imaging" as a whole. But it is understandable: people predicate the validity of their methods on their own esteemed reputations. There is nothing wrong with this per se, but it does lead to isolated examples and, for sure, variable quality. Many of the 'research programs' cited are small in sample size and limited in their academic scrutiny. Please assign this (my) assertion to the "Harsh but Fair" category.

What strikes me most is that the purveyors and distributors of these algorithms make big claims for their efficacy (both in terms of diagnostic ability and in economic terms).

If these were new drugs being introduced in the U.K. then they would be under the scrutiny of NICE. We need a NICE-like process for analytics and algorithms (whether programmed or generated through A.I.). If we do not monitor and manage this then what is to stop any bright spark making a claim and marketing it? I am so mistrustful of marketing.

The introduction and use of these supplementary processes for radiology reading need to be considered before diving in too deep.

Methods and Investigations

This section outlines a few concepts and ideas for interesting (and I think beneficial) research that could be followed up on. Each of these areas could be written up as a separate paper. These are just a few ideas and they will be filled out as this section develops.

Image spatial (multi) frequency analysis

With a judicious culling of zero spaces and any embedded text (I won't repeat this in every section), image pixels could be analysed over a set of spatial frequencies. A simple histogram of image content from this multi-frequency analysis could then be used as the base set of image data against the known radiology reports and known outcomes. I would be particularly interested to see if a noise characteristic led to any misdiagnoses.
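
One possible reading of the "simple histogram" idea is sketched below, under assumptions of my own: the image is a plain 2D array of pixel values, the frequency content comes from a 2D Fourier transform, and the histogram bins spectral energy by radial spatial frequency (in cycles per pixel). The function name radial_frequency_histogram and the choice of 16 bins are hypothetical, and the culling of zero spaces and embedded text is not attempted here.

```python
import numpy as np

def radial_frequency_histogram(image, n_bins=16):
    """Fraction of spectral energy in each band of radial spatial frequency."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()              # remove the constant (DC) term
    power = np.abs(np.fft.fft2(image)) ** 2   # energy at each 2D frequency
    # Frequency of every FFT sample along each axis, in cycles per pixel.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Sum the energy falling into equal-width rings of radial frequency.
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    hist, _ = np.histogram(radius, bins=edges, weights=power)
    total = hist.sum()
    return hist / total if total > 0 else hist

# A smooth pattern puts its energy in the lowest bins; a fine pattern does not.
x = np.arange(64)
smooth = np.tile(np.cos(2 * np.pi * 2 * x / 64), (64, 1))
fine = np.tile(np.cos(2 * np.pi * 24 * x / 64), (64, 1))
smooth_hist = radial_frequency_histogram(smooth)
fine_hist = radial_frequency_histogram(fine)
```

Each image is reduced to a short vector of numbers, which is the kind of compact input that could then be set against the reports and outcomes. A noisy image would show up as extra energy spread across the higher bins.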

Principal Component Analysis

This is a classic technique in facial recognition image analysis. Simply applying this to radiology images is the basis of many emerging AI or deep learning approaches to radiology imaging. Applying this to the large amount of historic data would surely be revealing.
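
As a minimal sketch of what principal component analysis does to a stack of images, assuming every image has the same dimensions and is already aligned (a big assumption for real radiology data): each image is flattened to a vector, the mean image is subtracted, and a singular value decomposition gives the principal components. The names pca_fit and pca_transform are hypothetical, and the test images here are synthetic.

```python
import numpy as np

def pca_fit(images, n_components):
    """Return the mean image, the top components and their share of the variance."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = S ** 2
    ratio = variance[:n_components] / variance.sum()
    return mean, Vt[:n_components], ratio

def pca_transform(images, mean, components):
    """Describe each image by its scores along the principal components."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    return (X - mean) @ components.T

# Synthetic stack: every image is a mix of two fixed patterns plus slight noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(2, 8, 8))
mix = rng.normal(size=(50, 2))
stack = np.tensordot(mix, patterns, axes=1) + 0.01 * rng.normal(size=(50, 8, 8))

mean, components, ratio = pca_fit(stack, n_components=2)
scores = pca_transform(stack, mean, components)   # shape (50, 2)
```

Because the synthetic images were built from two patterns, two components capture almost all of the variance. The scores are the compressed description of each image that could be set against reports and outcomes.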

Wavelets

Like spatial frequency analysis, wavelet analysis takes an image and reduces it to a number of contributions from a tuned set of orthogonal wavelets. The choice and tuning of the wavelet tells you something about the image. This research could simply be a repeat of the multi-spatial-frequency work (above), but a neural learning of the best wavelets to apply would also be very interesting.
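
To make the idea concrete, here is a sketch using the simplest orthogonal wavelet, the Haar wavelet, written directly in NumPy. The choice of Haar, the three levels, and the names haar_level and haar_energy_features are all my assumptions for illustration; nothing here learns or tunes the wavelet, which is the more interesting research question. The image sides are assumed to be divisible by two at every level.

```python
import numpy as np

def haar_level(image):
    """One level of the 2D orthonormal Haar transform."""
    a = np.asarray(image, dtype=float)
    if a.shape[0] % 2 or a.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    s = np.sqrt(2.0)
    # Pairwise sums and differences along each row...
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # ...then along each column, giving one smooth band and three detail bands.
    approx = (lo[0::2, :] + lo[1::2, :]) / s
    detail_a = (lo[0::2, :] - lo[1::2, :]) / s
    detail_b = (hi[0::2, :] + hi[1::2, :]) / s
    detail_c = (hi[0::2, :] - hi[1::2, :]) / s
    return approx, (detail_a, detail_b, detail_c)

def haar_energy_features(image, levels=3):
    """Share of image energy in each detail band per level, then the final approximation."""
    approx = np.asarray(image, dtype=float)
    energies = []
    for _ in range(levels):
        approx, details = haar_level(approx)
        energies.extend(float(np.sum(d ** 2)) for d in details)
    energies.append(float(np.sum(approx ** 2)))
    energies = np.array(energies)
    total = energies.sum()
    return energies / total if total > 0 else energies

rng = np.random.default_rng(0)
features = haar_energy_features(rng.normal(size=(16, 16)))   # 3 * 3 + 1 = 10 numbers
```

Because the transform is orthonormal, the band energies add up to the energy of the original image, so the feature vector is a true breakdown of the image content by scale and orientation.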

Multi Parameter Characterisations

There are many ways to parametrise (or compress) an image. All of them are revealing in their own way. It would be interesting to characterise images as a set of 'main contributions' in several image analysis techniques (say multi-frequency and principal component) and use this new set of main characterisations as the input to some machine learning. There is probably a way to consider which sets of image analysis work together the best (for instance, multi-frequency analysis and wavelets may be too similar, so why do both?).
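
One simple way to ask whether two characterisations are "too similar" is to correlate their features across the same set of images. The sketch below is just one assumed approach, with hypothetical names cross_correlation and redundancy_score: each row is an image, each column is a feature, and no column is constant. The feature tables are random stand-ins, not real image features.

```python
import numpy as np

def cross_correlation(features_a, features_b):
    """Pearson correlation between every column of A and every column of B."""
    A = np.asarray(features_a, dtype=float)
    B = np.asarray(features_b, dtype=float)
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return (A.T @ B) / len(A)

def redundancy_score(features_a, features_b):
    """Average, over B's features, of the strongest |correlation| with any feature in A."""
    corr = np.abs(cross_correlation(features_a, features_b))
    return float(corr.max(axis=0).mean())

rng = np.random.default_rng(0)
set_a = rng.normal(size=(200, 3))                                  # e.g. frequency features
near_copy = 2.0 * set_a[:, :2] + 0.01 * rng.normal(size=(200, 2))  # nearly the same information
unrelated = rng.normal(size=(200, 2))                              # genuinely new information

high = redundancy_score(set_a, near_copy)
low = redundancy_score(set_a, unrelated)

# Feature sets that add something new can simply be placed side by side.
combined = np.hstack([set_a, unrelated])
```

A score near one suggests the second technique adds little beyond the first. This only catches linear, feature-by-feature overlap, so a low score is not proof that two techniques are truly complementary.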