A large language model (LLM) can measure the quality of primary care and flag areas for improvement among children prescribed medication for attention deficit hyperactivity disorder (ADHD), using the vast quantities of information stored in clinical notes.
The study, published in Pediatrics, showed that the open-source LLaMA model was able to assess how closely side effects were being monitored after the prescription of ADHD medications and how well clinicians adhered to practice guidelines.
“The model showed excellent performance in identifying notes that contained documentation of side effects inquiry,” said researcher Yair Bannett, MD, an assistant professor of pediatrics at Stanford.
Traditional methods for capturing clinical practice, such as chart reviews, are labor intensive and not conducive to real-time improvement, Bannett explained.
LLMs are a type of AI that offers the opportunity to assess quality of care at scale, which could lead to timely detection of targets for improvement.
Bannett and team chose to test this novel application of LLMs by applying the open-source LLaMA model to one element of care for ADHD, a highly prevalent condition commonly managed in primary care.
In a retrospective study, they looked at the medical records of 1201 children aged 6 to 11 years who were seen at 11 pediatric primary care practices in the same healthcare network. All children had at least one ADHD diagnosis and at least two prescriptions for ADHD medication.
Two clinicians reviewed the charts from a random sample of 119 participants to determine whether the 501 clinical notes for this group documented monitoring of side effects.
The model was then trained on 411 of these notes, around 80% of the sample, with the remaining 90 notes kept as a “holdout” set to check that the model could identify these inquiries.
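As a rough illustration of the kind of split described above, and not the authors’ actual pipeline, labeled notes might be partitioned into training and holdout sets along these lines; the file name and column names are hypothetical.

```python
# Minimal sketch of an 80/20 train/holdout split for clinician-labeled notes.
# Assumptions: a CSV with hypothetical columns "note_text" and "side_effects_documented"
# (1 = the note documents a side-effects inquiry, 0 = it does not).
import pandas as pd
from sklearn.model_selection import train_test_split

notes = pd.read_csv("annotated_notes.csv")  # hypothetical file of labeled notes

train_notes, holdout_notes = train_test_split(
    notes,
    test_size=0.2,                               # reserve ~20% of notes as a holdout set
    stratify=notes["side_effects_documented"],   # preserve the label balance in both sets
    random_state=42,
)

print(len(train_notes), "training notes;", len(holdout_notes), "holdout notes")
```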
The team then deployed the LLM on the remaining 1189 patients, whose charts contained 15,127 notes, and examined its performance in a “deployment test” of 363 notes.
LLaMA accurately classified notes that contained an inquiry into side effects, with a sensitivity of 87.2%, a specificity of 86.3%, and an area under the curve of 0.93 in the holdout test set.
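For readers who want to see how holdout metrics of this kind are typically derived, the sketch below computes sensitivity, specificity, and area under the curve from a binary classifier’s predictions; the label and score arrays are invented for illustration and are not the study’s data.

```python
# Sketch of computing sensitivity, specificity, and AUC for a binary note classifier.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # 1 = inquiry documented (made-up labels)
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])   # model's predicted probabilities (made-up)
y_pred = (y_score >= 0.5).astype(int)                           # threshold the scores at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate: documented inquiries the model caught
specificity = tn / (tn + fp)   # true negative rate: notes without an inquiry correctly ruled out
auc = roc_auc_score(y_true, y_score)

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, AUC={auc:.3f}")
```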
There was no model bias with respect to participants’ sex or insurance status, and characteristics were mostly similar whether or not a side effects inquiry was recorded.
Documented inquiries into side effects were significantly less common during telephone encounters than during in-clinic or telehealth encounters, at 51.9% versus 73.0%, respectively.
In seven of the 11 primary care practices (PCPs), more than half of the ADHD encounters were completed by phone, but only two of these practices regularly documented asking about side effects.
“When speaking with PCPs, we learned that only these two practices have a workflow of asking about side effects when receiving refill requests over the phone,” noted Bannett.
Inquiries were also more common in encounters following stimulant prescriptions than in those following nonstimulant prescriptions, at 61.4% versus 48.5%, respectively. Overall, 999 participants received stimulant medications, 54 received nonstimulant medications, and 148 received both types.
Bannett noted that, in his conversations with PCPs, many said they feel less confident about managing nonstimulant medications.
In a commentary article accompanying the research, Robert Grundmeier, MD, and Kevin Johnson, MD, both from the University of Pennsylvania, call it a “compelling study” that demonstrates the potential of LLMs to enhance quality of care assessments.
“Applying an open-source LLM to clinical notes from a community-based pediatric network provides a scalable solution for monitoring clinician adherence to guidelines on side effects management following ADHD medication prescriptions,” they maintain.
“This work advances our understanding of the role of artificial intelligence in automating and scaling clinical practice assessments, overcoming traditional limitations such as the labor-intensive nature of chart reviews.”