A team of Mass General Brigham researchers has developed one of the first
fully autonomous artificial intelligence (AI) systems capable of screening for
cognitive impairment using routine clinical documentation.
The system, which requires no human intervention or prompting after
deployment, achieved 98% specificity in real-world validation testing. Results
are published in npj Digital
Medicine.
Alongside the publication, the team is releasing Pythia, an open-source tool that enables any health care
system or research institution to deploy autonomous prompt optimization for
their own AI screening applications.
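The article does not describe Pythia's interface, but "autonomous prompt optimization" generally means scoring candidate prompts against labeled examples and keeping whichever performs best. A minimal sketch of that idea, with all names and structure being assumptions rather than Pythia's actual API:

```python
# Hypothetical illustration of autonomous prompt optimization.
# Every name here is an assumption for illustration, not Pythia's real API.

def evaluate(prompt, labeled_notes, classify):
    """Fraction of labeled notes the classifier labels correctly with this prompt."""
    correct = sum(
        1 for note, label in labeled_notes if classify(prompt, note) == label
    )
    return correct / len(labeled_notes)

def optimize_prompt(seed_prompt, labeled_notes, classify, revise, rounds=5):
    """Iteratively revise a prompt, keeping a revision only if it scores better."""
    best_prompt = seed_prompt
    best_score = evaluate(seed_prompt, labeled_notes, classify)
    for _ in range(rounds):
        candidate = revise(best_prompt)  # e.g., an LLM rewrites the prompt
        score = evaluate(candidate, labeled_notes, classify)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

The key property, which the article emphasizes, is that no human touches the loop: the system proposes, scores, and selects prompts on its own.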
"We didn't build a single AI model—we built a digital clinical team," said corresponding author
Hossein Estiri, Ph.D., director of the Clinical Augmented Intelligence (CLAI)
research group and associate professor of medicine at Massachusetts General
Hospital. "This AI system includes five specialized agents that critique
each other and refine their reasoning, just like clinicians would in a case
conference."
Challenges in cognitive impairment detection
Cognitive impairment remains significantly underdiagnosed in routine
clinical care, and traditional screening tools and cognitive tests are highly
resource-intensive to administer and difficult for patients to access. Yet
early detection has become increasingly critical, especially with the recent
approval of Alzheimer's disease therapies that are most effective when
administered early in the disease.
"By the time many patients receive a formal diagnosis, the optimal
treatment window may have closed," said co-lead study author Lidia Moura,
MD, Ph.D., MPH, director of Population Health and the Center for Healthcare
Intelligence in the Department of Neurology at Mass General Brigham.
How the AI system works
To better capture at-risk patients, the Mass General Brigham team developed
an AI system that runs on an open-weight large language model that can be
deployed locally within hospital information technology infrastructure. It
employs five agents that each serve different functions and work
collaboratively to make clinical determinations and refine them to address
errors and improve sensitivity and specificity.
These agents operate autonomously in an iterative loop, refining their
detection capabilities through structured collaboration until performance
targets are met or the system determines it has converged. No patient data are
transmitted to external servers or cloud-based AI services.
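The article does not detail the five agents' individual roles, so the following is only a generic sketch of the kind of iterative critique-and-refine loop described above, with placeholder agents standing in for the real ones:

```python
# Generic sketch of an iterative multi-agent refinement loop.
# The agents and data structures are placeholders, not the published design.

def run_screening_loop(note, agents, target_confidence=0.9, max_rounds=3):
    """Each agent in turn critiques and refines the running determination,
    looping until confidence reaches the target or rounds are exhausted."""
    determination = {"label": None, "confidence": 0.0, "rationale": []}
    for _ in range(max_rounds):
        for agent in agents:
            # Each agent sees the note plus the current determination,
            # and returns a refined determination (critique incorporated).
            determination = agent(note, determination)
        if determination["confidence"] >= target_confidence:
            break  # converged: the agents agree with high confidence
    return determination
```

The convergence check mirrors the article's description of the system iterating "until performance targets are met or the system determines it has converged."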
The study analyzed more than 3,300 clinical notes from 200 anonymized
patients at Mass General Brigham. By analyzing clinical notes produced during
regular health care visits, this innovative system can turn everyday
documentation into a chance to screen for cognitive issues, helping identify
patients who might need a formal assessment.
"Clinical notes contain whispers of cognitive decline that busy
clinicians can't systematically surface," said Moura. "This system
listens at scale."
Performance and limitations of the system
When the AI system and human reviewers disagreed, an independent expert
re-evaluated each case. Among the disagreement cases, the expert validated the
AI's reasoning 58% of the time—meaning the system was often making sound
clinical judgments that initial human review had missed.
"We expected to find AI errors. Instead, we often found the AI was
making defensible judgments based on the evidence in the notes," said
Estiri.
Analysis of cases in which the AI was incorrect revealed systematic
patterns: documentation limitations where cognitive concerns appeared only in
problem lists without supporting narrative, and domain knowledge gaps where the
system failed to recognize certain clinical indicators. The system excelled
with comprehensive clinical narratives but struggled with isolated data lacking
context.
Although the system achieved 91% sensitivity under balanced testing, its
sensitivity decreased to 62% under real-world conditions (with a prevalence of
33% positive cases), while specificity remained high at 98%. The researchers
reported these calibration challenges to provide transparency and guide future
efforts to improve clinical reliability.
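A quick back-of-envelope calculation shows what these figures mean for an individual screening result, using only the sensitivity (62%), specificity (98%), and prevalence (33%) stated above:

```python
# Predictive values implied by the reported real-world figures.
sensitivity, specificity, prevalence = 0.62, 0.98, 0.33

tp = sensitivity * prevalence              # true positives per patient screened
fn = (1 - sensitivity) * prevalence        # missed cases
tn = specificity * (1 - prevalence)        # correctly cleared patients
fp = (1 - specificity) * (1 - prevalence)  # false alarms

ppv = tp / (tp + fp)  # probability a flagged patient truly has impairment
npv = tn / (tn + fn)  # probability a cleared patient truly does not

print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # roughly 94% and 84%
```

In other words, the high specificity makes a positive flag quite reliable, while the lower real-world sensitivity means a negative result still misses a meaningful share of true cases.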
"We're publishing exactly the areas in which AI struggles," said Estiri. "The field needs to stop hiding these calibration challenges if we want clinical AI to be trusted."
Provided by Mass General Brigham
Source: Autonomous AI agents developed to detect early signs of cognitive decline
