Presented by Changfu Chen
Master's Candidate in Biostatistics
Plan B Adviser: Dr. Jue (Marquis) Hou
We consider electronic health record (EHR) studies in which exact event times are unavailable, only current status information is observed for a limited labeled subset, and patient characteristics may be high-dimensional. Semiparametric transformation models provide a flexible framework for risk prediction in such settings, but variable selection and estimation become challenging. The proposed method jointly estimates the regression coefficients and an unknown monotone transformation function, and a batched stagewise strategy reduces computation by performing multiple small coefficient updates between transformation updates. This preserves the slow-learning regularization effect of stagewise procedures while substantially lowering computational cost. Regularization is controlled through an L∞-type gradient threshold, which is tuned by cross-validation or a one-standard-error (1SE) rule. Simulations demonstrate competitive estimation, selection, and prediction performance, together with favorable scalability in high-dimensional settings. We illustrate the proposed method by developing a genetic risk prediction model for rheumatoid arthritis (RA) using data from the All of Us Research Program.
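The batched stagewise idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the presenter's implementation. It assumes a logistic link, so that P(T ≤ C | X) = sigmoid(H(C) + Xβ), represents H by its values at the sorted monitoring times, and uses a crude gradient step followed by a running maximum as a stand-in for the monotone transformation update. The function name `batched_stagewise` and the parameters `lam` (gradient threshold), `eps` (step size), `batch`, `n_outer`, and `h_step` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function (the link is an assumption of this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))


def batched_stagewise(X, C, delta, lam=0.05, eps=0.01,
                      batch=20, n_outer=200, h_step=0.5):
    """Sketch of batched stagewise fitting for current status data.

    X     : (n, p) covariate matrix
    C     : (n,) monitoring times
    delta : (n,) current status indicators, 1 if the event occurred by C
    Returns (beta, h, order), where h[k] approximates H at the k-th
    smallest monitoring time and order = argsort(C).
    """
    n, p = X.shape
    order = np.argsort(C)
    Xs = X[order]
    ds = np.asarray(delta, dtype=float)[order]
    beta = np.zeros(p)
    h = np.zeros(n)

    for _ in range(n_outer):
        # Batch of small coefficient updates with the transformation held fixed.
        for _ in range(batch):
            resid = sigmoid(h + Xs @ beta) - ds
            grad = Xs.T @ resid / n          # gradient of the mean negative log-likelihood
            j = int(np.argmax(np.abs(grad)))
            if abs(grad[j]) <= lam:          # L-infinity gradient threshold reached
                break
            beta[j] -= eps * np.sign(grad[j])

        # One transformation update per batch: a gradient step on each value of H,
        # then a running maximum to restore monotonicity in the monitoring time.
        # (Assumed stand-in; an isotonic regression step is another option.)
        resid = sigmoid(h + Xs @ beta) - ds
        h = np.maximum.accumulate(h - h_step * resid)

    return beta, h, order
```

In this sketch a larger `lam` stops the coefficient updates earlier and so yields a sparser `beta`, which is why a threshold of this kind can serve as the tuning parameter in cross-validation. A larger `batch` means fewer transformation updates per coefficient update, which is where the computational saving would come from.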


