Almost every week, we see articles in the public or scientific media claiming that smart machines are now doing medical diagnosis. A very recent example appeared in the September 25th edition of Medical News with the title “Artificial intelligence as effective as professionals at diagnosing disease.”
The purpose of a new article I just published on arXiv is to clarify some misconceptions in this area and explain that Disease Labeling via Machine Learning is NOT quite the same as Medical Diagnosis.
Allow me to start with my experience and credentials. As some of you may remember, I was fortunate to be one of the pioneer mathematicians involved in developing Artificial Intelligence (AI)-based support software for medical diagnosis and treatment, including for Endocrinology/Infertility, Emergency and Critical Care (MEDAS), Space Medicine, and Arthritis. For several years my office was literally at the USC Institute of Critical Care Medicine, under the leadership of the late M.H. Weil, MD. By 1990, long after I moved on, MEDAS had reached a “90% level agreement with the gold standard diagnosis of the attending physician” and had grown to cover 350 disorders by means of 6000 features organized in hierarchical Bayesian networks. A number of my publications describe the mathematics and algorithms – along with medical examples – for topics such as diagnosing multiple co-existing disorders, clinician-oriented information acquisition, sensitivity analysis of Bayesian decisions to inaccuracies in conditional probabilities, and evaluating multi-membership classifiers. These and other articles also present my conceptual views on AI software to support medical decision making, based on lessons I learned from my physician and mathematician colleagues over many years of working closely with them.
First, let us distinguish point solutions, which address very limited sub-tasks in medical diagnosis such as EKG or MRI interpretation, from AI solutions that support the full workflow of medical diagnosis. AI point solutions are reaching very high levels of performance and accuracy, and they have tremendous potential for patient care in terms of both quality and cost. Clearly, however, reaching a patient diagnosis takes much more than one finding from an EKG or MRI, so using ‘medical diagnosis’ as the title for such solutions is an overstatement.
In terms of the full workflow of the physician’s job – diagnosis, prognosis, treatment – a key step in medical diagnosis is giving the patient a universally recognized label (e.g. Appendicitis), which essentially assigns the patient to one or more classes of patients with similar body failures. However, two patients with a high probability of the same disease label may still differ in their feature manifestation patterns, implying differences in the required treatments. Additionally, in many cases the labels of the primary diagnoses leave some findings unexplained. Diagnosis is not complete until every abnormal finding is clinically explained and the patient’s overall situation is clinically understood to the level that enables the best therapeutic decisions currently available. Medical diagnosis is only partially about probability calculations for label X or Y; it is about reaching a clinical understanding of the overall situation of the patient for the purpose of delivering the right treatment.
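To make the distinction concrete, here is a minimal, purely illustrative Python sketch. The disorders, findings, probability numbers, and the manifestation table are all invented placeholders, not any real system’s knowledge base: the point is only that a label can score highly and still leave an abnormal finding unaccounted for.

```python
# Illustrative sketch only: toy label probabilities plus a check for
# findings that the candidate labels do not account for. All names and
# numbers below are hypothetical.

# Hypothetical manifestation table: which findings each disorder can explain.
MANIFESTATIONS = {
    "appendicitis": {"rlq_pain", "fever", "elevated_wbc"},
    "gastroenteritis": {"diarrhea", "fever", "nausea"},
}

def unexplained_findings(abnormal_findings, label_probabilities, threshold=0.2):
    """Return findings not accounted for by any label above the threshold."""
    explained = set()
    for label, prob in label_probabilities.items():
        if prob >= threshold:
            explained |= MANIFESTATIONS.get(label, set())
    return set(abnormal_findings) - explained

# A high-probability label can still leave a finding clinically unexplained.
patient_findings = {"rlq_pain", "fever", "elevated_wbc", "hematuria"}
label_probs = {"appendicitis": 0.85, "gastroenteritis": 0.10}
print(unexplained_findings(patient_findings, label_probs))
# -> {'hematuria'}: labeling alone has not completed the diagnosis.
```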
Most contemporary machine learning models are data-centric, and the evidence so far suggests they can reach expert-level performance in the disease-labeling phase. Nonetheless, like any other mathematical technique, they have their limitations and scope of applicability. Primarily, data-centric algorithms are knowledge-blind: they lack the anatomy and physiology knowledge that physicians leverage to achieve a complete diagnosis. The article I just published advocates complementing data-centric algorithms with knowledge-based intelligence to overcome their inherent limitations as knowledge-blind algorithms. Machines can learn many things from data, but data is not the only source machines can learn from. Historical patient data only tells us what the possible manifestations of a certain body failure are; anatomy and physiology knowledge tells us how the body works and fails. Both are needed for complete diagnosis. A rough sketch of this complementarity follows.
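The sketch below is an assumption-laden illustration, not the article’s actual method: a data-driven label score is paired with a simple knowledge-based check, such as ruling out a label that is anatomically impossible for this particular patient. The patient fields, the rule, and the scores are hypothetical.

```python
# Illustrative sketch: combine a data-driven label score with a
# knowledge-based sanity check. All rules and values are hypothetical.

from dataclasses import dataclass

@dataclass
class Patient:
    findings: set
    history: set  # e.g. prior surgeries

def data_driven_score(patient: Patient, label: str) -> float:
    """Stand-in for any learned model's probability for a label."""
    return 0.85 if label == "appendicitis" and "rlq_pain" in patient.findings else 0.1

def knowledge_check(patient: Patient, label: str) -> bool:
    """Anatomy-based check a knowledge-blind model cannot make on its own."""
    if label == "appendicitis" and "appendectomy" in patient.history:
        return False  # no appendix, so the label is anatomically impossible
    return True

def supported_labels(patient: Patient, labels):
    return [(label, data_driven_score(patient, label))
            for label in labels if knowledge_check(patient, label)]

p = Patient(findings={"rlq_pain", "fever"}, history={"appendectomy"})
print(supported_labels(p, ["appendicitis", "gastroenteritis"]))
# Appendicitis is ruled out by knowledge despite its high data-driven score.
```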
With the understanding that medical diagnosis is only partially about probability calculations for the various labels, and that disorder labeling and medical diagnosis are not quite the same, the article proposes the Double Deep Learning approach, along with an initiative for a Medical Wikipedia for Smart Machines. This leads to AI diagnostic support solutions for complete diagnosis, beyond the limited data-only labeling solutions we see today. Would any ‘data-only’ AI professional agree to be treated by a doctor who admits to having forgotten everything he learned in medical school about anatomy and physiology, even when that doctor is supported by the ultimate ML software built on millions of patient records? AI for medicine will forever be limited until its intelligence also integrates anatomy and physiology.
Let me conclude with my view on the human/computer roles in medical diagnosis, as I expressed it in an article on AI for complex decision-making applications in business, military, or government: “… the emphasis so far in AI work has been on proving the machine’s ability to outperform the human, or at least be comparable. With business results as the focus of Business AI, the question is not whether the machine is smarter than the human or the opposite, but rather how to create a man-machine team that performs better than any individual team member alone. The emphasis should be on identifying and allocating to each team member those tasks in which he has a relative advantage over the other; meaning the focus should shift from what machines can do to what machines should do. This way they amplify each other’s abilities, leading to higher business value.” These points, almost verbatim, first appeared in 1980 in an IEEE article that described the MEDAS system we developed for Emergency and Critical Care and Space Medicine.
To summarize, a typical doctor-patient dialogue involves anatomical/physiological explanations of the patient’s situation along with statistical facts such as: “In most cases we see, ….” In response to the statistics, a patient may respond: “Doctor, it is me, Jim, you are now talking about, a statistic of 1. What is happening, or likely to happen, with me?” At this point, AI solutions based on data-only algorithms run into a dead end and cannot offer any intelligent support to the physician, beyond perhaps literature references to sources where he may find more information.