CACRC-Pan: A Concept Annotated Case Report Corpus and Processing Pipeline for Rare Pancreatic Diseases.
WHY IT MATTERS
This tool could help doctors diagnose rare pancreatic diseases faster and more accurately by automatically identifying key symptoms in patient records, potentially reducing the years of diagnostic delay that patients with these conditions typically experience.
Researchers built a collection of annotated medical case reports, together with a computer tool that reads such reports to help identify rare pancreatic diseases. The tool learns to recognize symptoms described in the reports and then suggests which disease the patient might have. Six rare pancreatic diseases are included: autoimmune pancreatitis (types 1 and 2), cystic fibrosis, hereditary chronic pancreatitis, pancreatic sufficiency, and Shwachman-Diamond syndrome.
Abstract: We introduce CACRC-Pan, a corpus of case reports for six rare pancreatic diseases (AIP1-2, CF, HCP, PS and SDS) annotated at two concept levels: token-level symptom labels (HPO-coded spans) and document-level disease labels. This dual annotation enables both standard natural language processing (NLP) tasks and studies on concept bottleneck models, linking interpretable symptom concepts to diagnostic outcomes. We also provide a baseline NLP pipeline combining PubMedBERT for symptom extraction with a lightweight voting classifier for disease inference. Results demonstrate the effectiveness of this simple, interpretable approach, and all resources are released openly to support reproducible research on rare pancreatic diseases.
Authors: Mulimbi et al.
Journal: Studies in health technology and informatics
MeSH: Natural Language Processing; Humans; Pancreatic Diseases; Rare Diseases; Electronic Health Records; Data Mining; Vocabulary, Controlled
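The abstract does not spell out how the voting classifier turns extracted symptoms into a disease label. As a minimal sketch, assuming a simple scheme in which each distinct HPO concept found by the extractor casts one vote for every disease whose symptom profile contains it, the disease-inference step could look like the following. The disease profiles, the placeholder concept IDs (`HPO_A` to `HPO_E`) and the function name `vote_disease` are hypothetical and not taken from CACRC-Pan; the authors' classifier may instead be an ensemble, such as scikit-learn's `VotingClassifier`, over symptom features.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical disease -> symptom-concept profiles. The placeholder IDs stand in
# for real HPO codes and are NOT taken from the CACRC-Pan corpus.
DISEASE_PROFILES: Dict[str, Set[str]] = {
    "AIP1": {"HPO_A", "HPO_B"},
    "CF": {"HPO_B", "HPO_C", "HPO_D"},
    "SDS": {"HPO_D", "HPO_E"},
}


def vote_disease(
    extracted: List[str], profiles: Dict[str, Set[str]]
) -> Tuple[Optional[str], Counter]:
    """Assumed voting rule: each distinct extracted concept votes once for
    every disease whose profile contains it; the most-voted disease wins."""
    votes: Counter = Counter()
    for concept in set(extracted):  # repeated mentions of a symptom count once
        for disease, profile in profiles.items():
            if concept in profile:
                votes[disease] += 1
    if not votes:
        return None, votes  # no known symptom concept was extracted
    # Ties are broken alphabetically for determinism (an assumption).
    best = min(votes, key=lambda d: (-votes[d], d))
    return best, votes


if __name__ == "__main__":
    # Concepts a symptom extractor (e.g. a PubMedBERT tagger) might have returned.
    prediction, votes = vote_disease(["HPO_B", "HPO_D", "HPO_D", "HPO_C"], DISEASE_PROFILES)
    print(prediction)     # CF
    print(votes["CF"])    # 3
```

Because the prediction depends only on the extracted concepts, the returned vote counts show which symptoms drove each decision, which is the interpretability property a concept bottleneck model is meant to provide. Raw counts favor diseases with larger profiles; dividing each count by the profile size is one simple alternative.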