Research · PubMed · Friday, April 3, 2026

Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools.

WHY IT MATTERS

If you're waiting for a diagnosis of a rare genetic disease, this research suggests that AI chatbots alone shouldn't replace traditional diagnostic tools. Your doctor should continue using established methods alongside any new technology.

Researchers tested whether artificial intelligence chatbots like ChatGPT could diagnose rare genetic diseases by comparing them with an established diagnostic tool called Exomiser. They evaluated seven AI models on 5,213 previously published patient cases. Even the best AI model was less accurate than Exomiser at identifying the correct disease.

Abstract: Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses, and their accuracy compared to existing diagnostic tools is not well characterized. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5213 previously published case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to seven LLMs, including four generalist models and three LLMs specialized for medical applications. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis firs…

Authors: Reese et al.
Journal: European Journal of Human Genetics (EJHG)
MeSH: Humans; Rare Diseases; Benchmarking; Diagnosis, Differential; Language; Large Language Models
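
The abstract compares tools by how highly each one ranks the correct diagnosis. As a rough illustration only, the sketch below computes a top-k accuracy from ranked lists of disease identifiers. It assumes that each tool's output has already been mapped to ontology IDs and that a hit means an exact ID match. The study's actual scoring rules are not described in this summary, and the function names, case labels, and IDs here are hypothetical placeholders, not real Mondo terms or study data.

```python
from typing import Dict, List, Optional


def rank_of_correct(ranked_ids: List[str], correct_id: str) -> Optional[int]:
    """Return the 1-based rank of correct_id in ranked_ids, or None if absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == correct_id:
            return position
    return None


def top_k_accuracy(
    predictions: Dict[str, List[str]], truth: Dict[str, str], k: int
) -> float:
    """Fraction of cases whose correct diagnosis appears within the top k."""
    if not truth:
        return 0.0
    hits = 0
    for case_id, correct_id in truth.items():
        # A case with no prediction list counts as a miss.
        rank = rank_of_correct(predictions.get(case_id, []), correct_id)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(truth)


# Hypothetical toy data: placeholder IDs, not real Mondo terms or study results.
truth = {
    "case1": "MONDO:A",
    "case2": "MONDO:B",
    "case3": "MONDO:C",
    "case4": "MONDO:D",
}
predictions = {
    "case1": ["MONDO:A", "MONDO:X"],             # correct at rank 1
    "case2": ["MONDO:X", "MONDO:B"],             # correct at rank 2
    "case3": ["MONDO:X", "MONDO:Y", "MONDO:C"],  # correct at rank 3
    "case4": ["MONDO:X"],                        # correct diagnosis missing
}

top1 = top_k_accuracy(predictions, truth, k=1)
top3 = top_k_accuracy(predictions, truth, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Running the same function over each tool's ranked output for the same cases would give directly comparable numbers, which is the kind of like-for-like comparison the abstract describes between the LLMs and Exomiser.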

Read the original at PubMed
artificial intelligence · genetic diagnosis · diagnostic accuracy · rare disease diagnosis · clinical decision support
