Doctoral candidate in Biostatistics, Elise Palzer, will present:
“Multi-source Data Decomposition and Prediction for Various Data Types”
PhD Advisers: Eric Lock and Sandra Safo
Abstract: Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Some recent methods seek to uncover underlying structure and relationships within and/or between the data sources, while others seek to build a predictive model for an outcome using all sources. However, existing methods that do both are limited because they either (1) consider only the data structure shared by all datasets while ignoring structure unique to each source, or (2) extract underlying structures first without considering the outcome. We propose a method called supervised joint and individual variation explained (sJIVE) that can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. Furthermore, we extend sJIVE to allow for binary and/or count data and to incorporate sparsity using a method called sparse exponential family sJIVE (sesJIVE). Our R package, sup.r.jive, implements sJIVE, sesJIVE, and a previous method called JIVE-Predict, with easy-to-use summary and visualization tools to make these methods more accessible.
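For readers who prefer symbols, the two goals named in the abstract (a joint-plus-individual decomposition and a linear prediction model built on the same structures) can be sketched as follows. The notation is assumed for illustration and does not appear in the abstract: X_k is the data matrix for source k, S_J and S_k are joint and individual scores, U_k and W_k are loadings, and the weight eta balancing the two terms is a hypothetical tuning parameter. This is one plausible way to write such a model, not necessarily the exact formulation presented in the talk.

```latex
% Assumed notation (not given in the abstract):
%   X_k      : p_k x n data matrix for source k = 1, ..., K (same n subjects)
%   S_J      : r_J x n joint scores shared by all sources
%   S_k      : r_k x n individual scores specific to source k
%   U_k, W_k : joint and individual loadings for source k
%   y        : 1 x n outcome; theta_1, theta_{2k} : regression coefficients
\begin{align}
  X_k &= U_k S_J + W_k S_k + E_k, \qquad k = 1, \dots, K, \\
  y   &= \theta_1 S_J + \sum_{k=1}^{K} \theta_{2k} S_k + e .
\end{align}

% A simultaneous fit could minimize a weighted sum of reconstruction and
% prediction error, with an assumed weight 0 < \eta < 1:
\begin{equation}
  \min_{U_k, W_k, S_J, S_k, \theta}\;
  \eta \sum_{k=1}^{K} \bigl\lVert X_k - U_k S_J - W_k S_k \bigr\rVert_F^2
  + (1 - \eta) \Bigl\lVert y - \theta_1 S_J - \sum_{k=1}^{K} \theta_{2k} S_k \Bigr\rVert_2^2 .
\end{equation}
```

Under this sketch, setting the weight close to 1 would emphasize reconstructing the data sources, while a weight close to 0 would emphasize predicting the outcome. Estimating the scores and the coefficients together is what distinguishes a supervised approach from a two-step approach that extracts structure first and fits the regression afterward.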