3 ways to ensure AI algorithms are really working


Artificial intelligence (AI), deep learning, and related technologies have emerged as solutions that can really make a difference in almost every medical field imaginable, from radiology to pharmaceuticals to diagnosis. But the question remains: How do we know that these algorithms are actually working?

Traditional AI consists of many human-crafted rules, and because human logic created those rules, they shouldn't be too difficult to comprehend through simple human reasoning.

However, deep learning, a subset of AI and one of its most promising technologies, is in many ways a radical departure from the traditional logic of machine learning. Deep learning takes AI to the next level through the use of complex neural networks, producing "black box" solutions whose results can be very difficult to interpret. Therefore, it is crucial that we establish methods to understand and validate the results of deep learning so we can harness its potential.

Elad Walach, co-founder and CEO of Aidoc.

Validating the results of deep learning presents a monumental challenge. The key question is: Is the algorithm generalizable? That is, if we tested the algorithm in a different setting, would it still perform the same?

Humans, by nature, are much better at generalizing. This means that if you set some human-made rules (as in traditional machine-learning approaches), you can expect those rules to work well enough in other settings. For example, if you set a threshold on image intensity (Hounsfield units in CT images), you should get roughly the same results no matter which scanner you use. But if the computer came up with millions of parameters, you can't really know whether changing the tube voltage on the scanner might change the whole analysis.
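To make that contrast concrete, here is a minimal, hypothetical sketch in Python (the function name, the 40-HU threshold, and the toy network are all illustrative, and PyTorch is assumed only for counting parameters): a single hand-set Hounsfield-unit rule behaves predictably because HU is a calibrated physical scale, while a network's learned parameters carry no such guarantee.

```python
import numpy as np
import torch.nn as nn

def rule_based_flag(ct_volume_hu: np.ndarray, threshold_hu: float = 40.0) -> np.ndarray:
    """Human-made rule: flag voxels above a Hounsfield-unit threshold.

    HU is calibrated to physics (water = 0 HU, air = -1000 HU), so this
    single hand-set parameter behaves roughly the same on any scanner.
    """
    return ct_volume_hu > threshold_hu

# A small 3D CNN as a stand-in for a deep-learning detector (illustrative only).
model = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(64, 1),
)
n_params = sum(p.numel() for p in model.parameters())
print(f"learned parameters: {n_params:,}")
# ~56,000 even for this toy network; production radiology models have
# millions, and nothing ties any single weight to scanner physics the
# way the HU threshold above is tied.
```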

Because gathering data is hard, some developers are tempted to take shortcuts by training algorithms on images from only a few medical centers, leading to limited data diversity. In such cases, the reported accuracy holds only for those types of clinical settings, while performance elsewhere remains unknown.

Different settings entail different scanners, different populations, and even different types and frequencies of disease -- for example, if you're building an algorithm on data from an outpatient facility, it won't necessarily work for a trauma center.

Therefore, with deep learning -- for which varied datasets are key to good results but hard to come by -- rigorous validation is crucial and can even mean the difference between life and death.

3 types of validation

We propose three types of validation to support radiology solutions based on deep learning and enable their beneficial adoption as soon as possible:

  1. Statistical: Also known as standalone performance testing. The idea behind statistical validation is to assess whether the algorithm's accuracy stays consistent across clinical settings with highly different parameters (for example, a trauma center versus a regular medical center, or different scanner types with different resolutions). In addition, it is best to single out a number of clinical settings and avoid using them for training the algorithm, utilizing them only for testing and validation -- since a golden rule of validation is determining whether the algorithm works "out of the box" in a new setting. A minimal sketch of this kind of site-held-out evaluation follows the list.
  2. Explanatory: Explanatory validation aims to expose exactly how the deep-learning algorithm works by focusing on where it doesn't work. For example, if the algorithm's accuracy is 96%, an important question to ask is: Where is it expected to fail? With specific types of data? In specific patients? In cases in which the clinical context matters more than the image alone? By researching where an algorithm fails, the general logic under the hood can be exposed, allowing clinicians to understand when to rely more on the solution's results -- and, perhaps more important, when to rely less (see the error-slicing sketch after this list). With this knowledge, specific hospitals can determine whether their particular conditions are well-suited to specific algorithms.
  3. Outcome: The ultimate manifestation of AI technology is impact on clinical outcomes. We believe that deploying deep-learning solutions in actual workflows is required to assess their true clinical impact. Unfortunately, peer-reviewed research is lacking in this space. That is why we believe it's important to put a major focus on this gap, working with leading medical centers on clinical studies to determine the impact of specific deep-learning algorithms on radiology workflow and patient outcomes. At the end of the day, it doesn't matter if you have an amazing algorithm -- what matters is whether you can move the needle on a clinically significant process.
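To illustrate the statistical validation in point 1, the sketch below (hypothetical site names and data layout; it assumes predictions and labels are already grouped by held-out site) checks whether accuracy stays consistent "out of the box" across settings never used in training.

```python
import numpy as np

def site_held_out_report(results_by_site: dict) -> float:
    """Per-site accuracy for clinical settings held out of training.

    results_by_site maps a held-out site (e.g., 'trauma_center_B') to a
    (predictions, labels) pair of equal-length binary NumPy arrays.
    Returns the across-site accuracy spread; a large spread is a red
    flag for generalizability even if pooled accuracy looks high.
    """
    accuracies = {}
    for site, (preds, labels) in results_by_site.items():
        accuracies[site] = float((preds == labels).mean())
        print(f"{site:<24} accuracy = {accuracies[site]:.3f}")
    spread = max(accuracies.values()) - min(accuracies.values())
    print(f"across-site spread       = {spread:.3f}")
    return spread

# Synthetic demo data: each site's predictions agree with the labels at
# a chosen rate, mimicking a model that travels poorly to trauma cases.
rng = np.random.default_rng(0)
def fake_site(n, agreement):
    labels = rng.integers(0, 2, n)
    flip = rng.random(n) > agreement
    return np.where(flip, 1 - labels, labels), labels

site_held_out_report({
    "outpatient_center_A": fake_site(500, 0.96),
    "trauma_center_B": fake_site(500, 0.88),
})
```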
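For the explanatory validation in point 2, one simple way to focus on where an algorithm doesn't work is to slice test-set errors by metadata. The sketch below uses pandas, with illustrative column names (not a fixed schema), to group failures by scanner model and patient age band.

```python
import pandas as pd

def failure_slices(df: pd.DataFrame) -> pd.Series:
    """Error rate sliced by acquisition and patient metadata.

    Assumes one row per test case with a 'correct' (bool) column plus
    metadata such as 'scanner_model' and 'age'; these field names are
    illustrative.
    """
    df = df.copy()
    df["error"] = ~df["correct"]
    df["age_band"] = pd.cut(df["age"], bins=[0, 40, 65, 120],
                            labels=["<40", "40-65", ">65"])
    # High cells in this table tell clinicians where to rely less on
    # the algorithm's output -- e.g., one scanner model or age group.
    return (df.groupby(["scanner_model", "age_band"], observed=True)["error"]
              .mean()
              .sort_values(ascending=False))
```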

Though healthcare AI is an exciting space with endless opportunities for improving patient lives, the key to generating value is developing working solutions rather than hypothetical ideas that demand heavy investment for few results. With solutions powered by deep learning, robust validation is necessary to determine whether they do in fact meet expectations. Without real rigor, the industry and patients themselves lose out.

Elad Walach is the co-founder and CEO of Aidoc, a healthcare AI start-up focused on using deep learning to relieve the bottleneck in medical image diagnosis. Walach began his career in the elite Israel Defense Forces' Talpiot technology program. He served as a researcher in the Israeli Air Force's algorithmic division, where he rose through the ranks to the position of algorithmic research leader and led several teams focused on machine learning and computer vision projects from inception to execution. Walach holds a Bachelor of Science in mathematics and physics from the Hebrew University of Jerusalem and a Master of Science in computer science with a focus on deep learning from Tel Aviv University.

The comments and observations expressed are those of the author and do not necessarily reflect the opinions of AuntMinnie.com.
