Artificial intelligence recommendations for the diagnosis and management of acute complaints during virtual primary care visits generally align with those of physicians and may even surpass them, although each approach shows distinct strengths, research suggests.
The study indicates that AI can enhance clinical decision-making in a virtual urgent care setting for common acute symptoms for which the AI system has previously shown high diagnostic accuracy.
Reporting in the Annals of Internal Medicine, Dan Zeltzer, PhD, from Tel Aviv University, and co-workers acknowledged that further investigation is needed to understand whether AI can help with more complex care needs.
Nonetheless, they proposed: “Thoughtful integration of AI into clinical practice, combining its strengths with those of physicians, could improve the quality of care.”
Yet in an editorial accompanying the study, Jerome Kassirer, MD, from Tufts University School of Medicine in Boston, maintained the need for vigilance.
“As recommendations emerge for the use of powerful AI programs in medicine, great caution should be the watchword,” he wrote.
“In contrast to other diagnostic and therapeutic methods, AI programs are an impervious black box,” he continued.
“When colleagues and I applied decision analysis in difficult clinical situations and a counterintuitive result emerged, we could climb back up the decision tree and recheck the assumptions that were the basis for the result.
“Given the unique data architecture and obscure internal connections used with AI, even the program’s developer may be unable to determine how a flawed judgment emerged.”
The researchers compared initial clinical recommendations made by the K Health Clinical AI platform with the final decisions of physicians at a virtual primary and urgent care clinic operated by Cedars-Sinai, where physicians are presented with AI-based intake, diagnosis, and management assistance.
The study included 461 real primary care visits that involved acute urinary, vaginal, respiratory, eye, and dental symptoms, for which AI has shown high diagnostic accuracy.
AI-guided medical intake took place via a structured chat. When its confidence was sufficient, the AI presented diagnosis and management recommendations in the form of prescriptions, laboratory tests, and referrals.
Physicians could review the AI-generated recommendations before making the final clinical decision, but the researchers had no way of knowing whether they actually did so.
Four expert adjudicators reviewed the 461 visits and, using a checklist to rate both the AI and physician recommendations, often rated the AI recommendations higher than those of the physicians.
In approximately two-thirds of cases, physicians made clinical decisions identical to the AI’s. In the remaining third, the AI recommendations were rated superior twice as often as they were rated inferior to the physicians’, and they were rated “potentially harmful” less frequently (2.8% vs 4.6%, respectively).
Overall, the AI diagnosis and management recommendations were more likely than the physicians’ decisions to be rated optimal, at 77.1% versus 67.1%.
AI was particularly strong at adhering to clinical guidelines, recommending appropriate laboratory and imaging tests, and recommending necessary in-person referrals.
Indeed, it outperformed physicians in avoiding unjustified empiric treatments and in recognizing key risk factors that could trigger a change in diagnosis or management.
On the other hand, physicians were particularly good at adapting to evolving or inconsistent patient narratives, in which information disclosed during the consultation differed from that provided in the chat intake questionnaire.
They also appeared to show better judgment in avoiding unnecessary referrals to the emergency department and in accurately diagnosing conditions requiring visual assessment.
The researchers noted: “Because the interface in use at the time did not optimize physician viewing of these recommendations and we do not know whether physicians used them, we believe our findings represent a conservative estimate of the potential for AI to improve care in this setting.”