Feng Xie, Philip Chung, Jonathan D. Reiss, Erico Tjoa, Davide De Francesco, Thanaphong Phongpreecha, William Haberkorn, Dipro Chakraborty, Alan Lee Chang, Tomin James, Yeasul Kim, Samson Mataraso, Camilo Espinosa, Liu Yang, Chi-Hung Shu, Lei Xue, Eloïse Berson, Neshat Mohammadi, Sayane Shome, S Momsen Reincke, Marc Ghanem, Ivana Maric, Brice Gaudilliere, Martin S Angst, Karl Sylvester, Gary M Shaw, Lawrence S Prince, David K Stevenson, Nima Aghaeepour
Stanford University and Stanford University School of Medicine.
United States
Lancet Digital Health
Lancet Digit Health 2025;
DOI: 10.1016/j.landig.2025.100926
Abstract
Background: Early identification and monitoring of neonatal morbidities are critical for timely interventions that can prevent complications, optimise resource use, and support families. Although traditional tools based on tabular data and biomarkers are beneficial, they are limited in their ability to assess the risk of morbidities in newborns. In this study, we developed NeonatalBERT, a pre-trained large language model (LLM) that estimates the risk of neonatal morbidities from clinical notes.
Methods: This prognostic study investigated retrospective primary and external cohorts from two different quaternary-care academic medical centres in the USA: Stanford Health Care and Beth Israel Deaconess Medical Center. NeonatalBERT was initially pre-trained on clinical notes from the primary cohort and then fine-tuned separately for both cohorts. NeonatalBERT was also compared against other existing LLMs, such as BioBERT and Bio-ClinicalBERT, as well as traditional machine learning and logistic regression models using tabular features. NeonatalBERT was evaluated on 19 neonatal morbidities (respiratory distress syndrome, bronchopulmonary dysplasia, pulmonary haemorrhage, pulmonary hypertension, atelectasis, aspiration syndrome, intraventricular haemorrhage, periventricular leukomalacia, neonatal seizures, other CNS disorders, patent ductus arteriosus, cardiovascular instability, sepsis, candidiasis, anaemia, jaundice, necrotising enterocolitis, retinopathy of prematurity, and death) for the primary cohort and ten for the external cohort (respiratory distress syndrome, bronchopulmonary dysplasia, pulmonary haemorrhage, intraventricular haemorrhage, patent ductus arteriosus, sepsis, jaundice, necrotising enterocolitis, retinopathy of prematurity, and death). For each outcome, the area under the receiver operating characteristic curve, area under the precision-recall curve (AUPRC), and F1 scores were evaluated.
Findings: 32 321 newborns were included in the primary cohort, including 27 411 in the primary training set (mean gestational age 38·64 weeks [SD 2·30]; 13 056 [47·6%] female and 14 355 [52·4%] male newborns) and 4910 in the primary testing set (mean gestational age 38·64 [2·13] weeks; 2336 [47·6%] female and 2574 [52·4%] male newborns). Additionally, 7061 newborns were included in the external cohort, including 5653 in the external training set (1567 [27·7%] premature and 4086 [72·3%] term births; 2614 [46·2%] female and 3039 [53·8%] male newborns) and 1408 in the external testing set (383 [27·2%] premature and 1025 [72·8%] term births; 624 [44·3%] female and 784 [55·7%] male newborns). In the primary cohort, the mean AUPRC over 19 outcomes was 0·291 (95% CI 0·268-0·314) for NeonatalBERT, 0·238 (0·217-0·259) for Bio-ClinicalBERT, 0·217 (0·197-0·236) for BioBERT, and 0·194 (0·177-0·211) for the traditional model using tabular data. In the external cohort, NeonatalBERT had a mean AUPRC of 0·360 (0·328-0·393), outperforming the other models, whose mean AUPRC ranged from 0·224 to 0·333.
Interpretation: Based on validation using two large-scale US datasets, NeonatalBERT effectively estimates the risk of neonatal morbidities from unstructured clinical notes of newborns. The promising results from this study show the potential of NeonatalBERT to enhance neonatal care and streamline hospital operations.
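The evaluation described in the Methods (AUROC, AUPRC, and F1 per outcome, with AUPRC then averaged over outcomes) can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract does not state how AUPRC was estimated, which threshold was used for F1, or how the confidence intervals were obtained, so the average-precision estimator, the 0.5 threshold, and the toy labels and scores below are all assumptions made for illustration; the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Tied scores count as half a win (Mann-Whitney U formulation).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision (assumed estimator, not from the source)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total, tp, i = 0.0, 0, 0
    while i < len(ranked):
        # Treat all cases sharing one score as a single threshold step.
        j, tp_here = i, 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp_here += ranked[j][1]
            j += 1
        tp += tp_here
        # Recall gained at this step, weighted by the precision reached at it.
        total += (tp_here / n_pos) * (tp / j)
        i = j
    return total


def f1(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 at a fixed threshold; 0.5 is an assumption, the abstract gives none."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(
    outcomes: Dict[str, Tuple[List[int], List[float]]]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return per-outcome metrics and the unweighted mean AUPRC over outcomes."""
    per_outcome = {
        name: {"auroc": auroc(y, s), "auprc": auprc(y, s), "f1": f1(y, s)}
        for name, (y, s) in outcomes.items()
    }
    mean_auprc = sum(m["auprc"] for m in per_outcome.values()) / len(per_outcome)
    return per_outcome, mean_auprc


# Toy data only: two hypothetical outcomes, four newborns each.
toy_outcomes = {
    "outcome_a": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "outcome_b": ([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.8]),
}

if __name__ == "__main__":
    metrics, mean_auprc = evaluate(toy_outcomes)
    for name, values in metrics.items():
        print(name, values)
    print("mean AUPRC:", mean_auprc)
```

Because a model with no discriminative ability has an expected AUPRC close to the outcome's prevalence, low absolute AUPRC values are expected for rare morbidities, and comparisons between models on the same outcomes are more informative than the raw numbers.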
Category
Mechanical and Computer Models of Pulmonary Vascular Disease and Therapy
Class I. Persistent Pulmonary Hypertension of the Newborn
Class III. Pulmonary Hypertension Associated with Lung Disease
Age Focus: Pediatric Pulmonary Vascular Disease
Fresh or Filed Publication: Fresh (PHresh). Less than 1-2 years since publication
Article Access
Free PDF File or Full Text Article Available Through PubMed or DOI: Yes
