A comprehensive study on machine learning models combining with oversampling for bronchopulmonary dysplasia-associated pulmonary hypertension in very preterm infants

Dan Wang, Shuwei Huang, Jingke Cao, Zhichun Feng, Qiannan Jiang, Wanxian Zhang, Jia Chen, Shelby Kutty, Changgen Liu, Wenyu Liao, Le Zhang, Guli Zhu, Wenhao Guo, Jie Yang, Lin Liu, Jingwei Yang, Qiuping Li
Seventh Medical Center of PLA General Hospital. Second School of Clinical Medicine, Southern Medical University. Hunan Children’s Hospital. Tsinghua University. Qingdao Women and Children’s Hospital. Tianjin Central Hospital of Gynecology Obstetrics. Guangdong Women and Children Hospital. Johns Hopkins School of Medicine. BNU-HKBU United International College. Tsinghua University.
China and United States

Respiratory Research
Respir Res 2024; 25:
DOI: 10.1186/s12931-024-02797-z

Abstract
Background: Bronchopulmonary dysplasia-associated pulmonary hypertension (BPD-PH) remains a devastating clinical complication that seriously affects the therapeutic outcome of preterm infants. Hence, early prevention and timely diagnosis before pathological changes occur are key to reducing morbidity and improving prognosis. Our primary objective is to use machine learning techniques to build predictive models that can accurately identify BPD infants at risk of developing PH.
Methods: The data used in this study were collected from the neonatology departments of four tertiary-level hospitals in China. To address the issue of imbalanced data, an oversampling algorithm, the synthetic minority over-sampling technique (SMOTE), was applied to improve the model.
Results: Seven hundred sixty-one clinical records were collected in our study. Following data pre-processing and feature selection, 5 of the 46 features were used to build models: duration of invasive respiratory support (days), severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH. Four machine learning models were trained for prediction, and one model was ultimately selected after comprehensive comparison. The model achieved 93.8% sensitivity, 85.0% accuracy, and an AUC of 0.933. A logistic regression score greater than 0 was identified as a warning sign of BPD-PH.
Conclusions: We comprehensively compared different machine learning models and ultimately obtained a well-performing prognostic model that can support pediatric clinicians in making an early diagnosis and formulating a better treatment plan for pediatric patients with BPD-PH.
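
Illustrative sketch (not from the article): the two techniques named in the abstract, SMOTE oversampling and a logistic regression score with a cutoff of 0, can be outlined in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy data, the coefficients, and the intercept are all hypothetical; the abstract does not report the fitted formula. The only fact relied on is that a logistic regression score (log-odds) of 0 corresponds to a predicted probability of 0.5.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic minority samples by
    interpolating between a sample and one of its k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def logistic_score(features, coefs, intercept):
    """Linear predictor (log-odds) of a logistic regression model."""
    return intercept + sum(c * x for c, x in zip(coefs, features))


def probability(score):
    """Logistic (sigmoid) transform of the linear score."""
    return 1.0 / (1.0 + math.exp(-score))


# Toy minority-class samples (hypothetical, two features each).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6)

# Hypothetical coefficients for the five selected features, in the order:
# invasive support (days), BPD severity, VAP, pulmonary hemorrhage, early PH.
coefs = [0.1, 0.8, 1.0, 1.2, 1.5]
intercept = -3.0
patient = [20, 2, 1, 0, 1]  # hypothetical infant
score = logistic_score(patient, coefs, intercept)
at_risk = score > 0  # same decision as probability(score) > 0.5
```

Because each synthetic sample lies on a line segment between two real minority samples, oversampling of this kind adds no values outside the observed range of the minority class. Thresholding the score at 0 is equivalent to thresholding the predicted probability at 0.5, which is one way to read the warning sign described in the Results.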

Category
Class III. Pulmonary Hypertension Associated with Lung Disease
Diagnostic Testing for Pulmonary Vascular Disease. Risk Stratification

Age Focus: Pediatric Pulmonary Vascular Disease

Fresh or Filed Publication: Fresh (PHresh). Less than 1-2 years since publication

Article Access
Free PDF File or Full Text Article Available Through PubMed or DOI: Yes
