Jue Hou, of the Department of Biostatistics at Harvard T.H. Chan School of Public Health and candidate for a faculty position in the Division of Biostatistics, will present:
“Winning Strategy in Real-world Evidence — Intelligent Medicine with Electronic Health Records”
Abstract: Electronic Health Records (EHR) are a rich resource for research in medicine and health. Routinely collected EHR data facilitate large-scale real-world studies with shorter study cycles and less labor than traditional studies. To realize EHR’s potential in supplementing traditional studies, we need scalable tools to extract clinical information from EHR and downstream analyses that are robust to imprecise extraction. In this talk, I present two of my recent projects on developing methods for such robust analyses. To anchor the analysis, gold-standard labels are generated by manual annotation for a small subset whose size is limited by the cost of annotation. The problem can be formulated as a semi-supervised surrogate-assisted (SAS) learning setting, in which the variable of interest is captured by imprecise surrogates, with accurate labels available only for a small subset. The first project addressed risk prediction under high-dimensional regression with an imprecise response. Utilizing the large unlabeled data with predictive surrogates, we developed SAS estimation and inference for individual risk under virtually no sparsity assumption regarding the risk model and the number of labels. In an application to genetic risk prediction of type 2 diabetes mellitus, our method demonstrated significant improvement over the supervised benchmark that uses labeled data alone. The second project addressed estimation of the average treatment effect with imprecise treatment and response. We characterized the efficiency under semi-supervised learning beyond classical regularity and proposed a convenient semi-supervised multiple machine learning (SMMAL) framework for efficient SAS estimation. Simulations showed that SMMAL is as reliable as the supervised benchmark while being more efficient. In both projects, the proposed methods do not require any model specification involving the imprecise surrogates.
All are welcome.