Charles Spanbauer, a Postdoctoral Associate at the University of Minnesota and candidate for a faculty position in the Division of Biostatistics, will present:
“Bayesian Regression Tree Models for Nonparametric Instrumental Variable Regression With High Dimensionality Applied to Genome Wide Causal Inference”
Abstract: Bayesian Additive Regression Trees (BART) have recently emerged as a leading method for obtaining high-quality, reliable predictions in a variety of fields, including finance, engineering, economics, and biomedical research, particularly statistical genetics. Owing to this popularity, researchers have developed many extensions of BART to handle increasingly complex data and modeling situations, including causal inference. In that spirit, we develop a BART-based model for flexible nonparametric instrumental variable (IV) regression. The flexibility of BART allows this method to model a nonlinear exposure-outcome relationship appropriately. In addition, any measured covariates that interact with the exposure can also be modeled, so that the causal relationship can be individualized. One particularly interesting application of this method is transcriptome-wide association studies (TWAS). Traditional genome-wide association methods correlate gene expression data with an outcome. However, expression levels are well known to be confounded by environmental factors. TWAS is an IV method that uses genetic sequencing data, in the form of single-nucleotide polymorphisms (SNPs), as instruments to ascertain the true causal effect of expression level, decoupling the expression effect from any environmental confounders. Beyond the methodology itself, we present the extensions and strategies needed to apply this method to TWAS. These include how to obtain a sparse exposure signal in a high-dimensional setting, how to compare competing models using leave-one-out cross-validation and the pseudo Bayes factor, and how to control multiplicity across the entire genome using the Bayesian false discovery rate. Finally, we present simulation studies and the results of applying our method across the entire genome to hippocampal volume data from the Alzheimer’s Disease Neuroimaging Initiative.
All are welcome.