In large epidemiological studies, a comprehensive analysis of a disease process often involves several statistical sub-models that describe different aspects of the covariates and the disease outcome. A final prediction model for the disease can be constructed by incorporating all the influential covariates and sub-models, so that meaningful clinical interpretations can be obtained. Existing statistical machine learning methods lack a systematic approach for incorporating these covariates and sub-models in a biologically meaningful way. We describe a knowledge-guided machine learning (KGML) procedure for constructing a comprehensive statistical model that predicts the distributions of time-to-event outcomes from longitudinal covariates. This procedure combines several statistical machine learning approaches with biomedical knowledge established in the literature. We apply the procedure to the Coronary Artery Risk Development in Young Adults (CARDIA) study and show that it leads to novel insights into the effects of longitudinal risk factors on the distributions of incident cardiovascular disease (CVD). We also demonstrate the appropriateness of the procedure through a simulation study.