Session: (0176–0195) Healthcare Disparities in Rheumatology Poster I: Lupus
0185: Classifying Individuals with Rheumatic Conditions as Financially Insecure Using Electronic Health Record Data and Natural Language Processing: Algorithm Derivation and Validation
Boston Children's Hospital, Milton, MA, United States
Disclosure information not submitted.
Mia Chandler¹, Tianrun Cai², Leah Santacroce², Sciaska Ulysse², Katherine Liao² and Candace Feldman², ¹Boston Children's Hospital, Milton, MA, ²Brigham and Women's Hospital, Boston, MA
Background/Purpose: Social determinants of health (SDoH) such as financial insecurity contribute to disparities in rheumatic disease care and outcomes but are not routinely captured in structured electronic health record (EHR) data (e.g., ICD-10 billing codes). SDoH described in clinical notes are not readily extractable and therefore cannot be easily incorporated into research studies. We used natural language processing (NLP) to extract terms related to financial insecurity, and machine learning models to develop and validate an algorithm that identifies individuals with this critical SDoH.
Methods: We randomly selected 600 patients from 20,395 with rheumatic or musculoskeletal conditions enrolled in an integrated care management program (iCMP) between 1/1/12 and 10/18/21. iCMP provides care for medically and psychosocially complex patients. The study team (social epidemiologists, pediatric and adult rheumatologists, bioinformaticians) defined the construct "financial insecurity" using nominal group technique. Reviewers (MTC, SU, CHF) operationalized this definition through manual EHR review to establish the gold standard. Individuals in separate training and validation cohorts were classified as having definite, possible, or no financial insecurity. We constructed a context-driven lexicon of terms for financial insecurity using data from PubMed, the Unified Medical Language System, and previous EHR reviews (Table 1). All available notes were then processed using NLP with the context-driven lexicon. We developed models using logistic regression, LASSO regression, and random forest, trained on cases of financial insecurity identified by EHR review (definite alone, or definite and possible combined), and determined the performance metrics for each model.
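The lexicon-matching step described above can be illustrated with a minimal sketch. It is not the study's NLP pipeline, which the abstract does not describe at this level of detail. The three lexicon terms, the example notes, and the names `LEXICON`, `PATTERNS`, and `count_lexicon_mentions` are all hypothetical; the actual terms are those listed in Table 1. The sketch only shows how per-patient term counts might be derived from free-text notes for later use as model features.

```python
import re
from collections import Counter, defaultdict

# Hypothetical lexicon terms, for illustration only (not the study's Table 1).
LEXICON = ["financial hardship", "cannot afford", "food stamps"]

# One case-insensitive pattern per term, anchored on word boundaries.
PATTERNS = {
    term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for term in LEXICON
}

def count_lexicon_mentions(notes):
    """Count lexicon mentions per patient.

    notes: iterable of (patient_id, note_text) pairs.
    Returns {patient_id: Counter mapping term -> number of mentions};
    patients with no mentions are omitted.
    """
    counts = defaultdict(Counter)
    for patient_id, text in notes:
        for term, pattern in PATTERNS.items():
            n = len(pattern.findall(text))
            if n:
                counts[patient_id][term] += n
    return dict(counts)

# Made-up example notes (not real patient data).
notes = [
    ("p1", "Patient reports financial hardship and cannot afford copays."),
    ("p1", "Continues to receive food stamps; cannot afford medication."),
    ("p2", "No concerns raised at this visit."),
]
features = count_lexicon_mentions(notes)
```

This sketch does plain string matching only. It does not handle negation (for example, "denies financial hardship") or other context, which a context-driven approach would need to address.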
Results: Of the 600 selected patients, we excluded 62 who lacked notes, a clear rheumatologic diagnosis, or confirmation of iCMP enrollment, leaving 538 for analysis. We processed 245,142 notes from the training (N=366) and validation (N=172) cohorts. Financial insecurity was present among 100 individuals (27%) in the training cohort and 63 (37%) in the validation cohort (Table 2). The three model types (logistic regression, LASSO, random forest) performed similarly in classifying the presence of financial insecurity, with logistic regression models achieving the overall highest positive predictive value (PPV) of 0.98 (Table 3). The logistic regression models had specificities ranging from 0.94-0.98, sensitivities ranging from 0.27-0.54, and PPVs of 0.89-0.91. LASSO regression models had specificities ranging from 0.98-0.99, sensitivities of 0.20-0.29, and PPVs of 0.90-0.95. The random forest models had specificities ranging from 0.96-0.98, sensitivities of 0.29-0.48, and PPVs of 0.90-0.94.
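For readers less familiar with the reported metrics, the following sketch shows how sensitivity, specificity, and PPV are computed from a confusion matrix. The counts are invented for illustration and are not taken from Table 3; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp: gold-standard positives the model flagged
    fn: gold-standard positives the model missed
    fp: gold-standard negatives the model flagged
    tn: gold-standard negatives the model did not flag
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
    }

# Invented counts (NOT study data): 100 true cases and 100 non-cases.
metrics = classification_metrics(tp=40, fp=5, tn=95, fn=60)
```

With these invented counts, sensitivity is 40/100 = 0.40, specificity is 95/100 = 0.95, and PPV is 40/45, or about 0.89. The same pattern appears in the reported results: a model can miss many true cases (low sensitivity) while the patients it does flag are nearly all true cases (high PPV).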
Conclusion: Using a context-driven general lexicon for financial insecurity, NLP enabled the development of algorithms to classify individuals with terms or phrases indicative of financial insecurity in free-text EHR notes. These models with high positive predictive values could be leveraged to identify patients with this SDoH for future health equity interventions.
Table 1: Terms and Phrases Selected for Use in Models to Classify Individuals as Ever Financially Insecure Based on Their Presence in Clinical Notes
Table 2: Baseline Characteristics of Patients with Rheumatic or Musculoskeletal Conditions in the Training and Validation Cohorts (N=538)
Table 3: Model Performance for Definite Financial Insecurity and for Definite and Possible Financial Insecurity Combined
M. Chandler: None; T. Cai: None; L. Santacroce: None; S. Ulysse: None; K. Liao: UCB, 2; C. Feldman: BMS Foundation, 5, Curio Bioscience, 12 (My husband is one of the founders and will receive equity but has not received anything to date), OM1, Inc., 2, Pfizer, 5.