Edward McFowland III, of the Information & Decision Sciences Department at the University of Minnesota, will present:
Abstract: The randomized experiment is an important tool for inferring the causal impact of an intervention. The most common analysis conducted in this context is the estimation of the average treatment effect (ATE). However, the recent literature on heterogeneous treatment effects demonstrates the utility of estimating the marginal conditional average treatment effect (MCATE), i.e., the average treatment effect for a subpopulation of respondents who share a particular subset of covariates. Additionally, the literature proposes the use of data mining methods to estimate the MCATEs that exist in the data, whose number grows exponentially in the number of covariates. However, each proposed method makes its own set of (restrictive) assumptions about the intervention's effect, the underlying data generating processes, and which subpopulations (MCATEs) to explicitly estimate. Moreover, the majority of the literature provides no mechanism, beyond manual inspection, to identify which subpopulations are the most affected, and offers few guarantees on the correctness of the identified subpopulations. Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame the affected subpopulation discovery challenge as a pattern detection problem in which we maximize a nonparametric scan statistic (a measure of distributional divergence) over all subpopulations, while being parsimonious in which specific subpopulations' effects to estimate. Furthermore, we identify the subpopulation that experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effect or the underlying data generating process.
In addition to the algorithm, we provide finite sample statistical bounds on its error and detection power, and provide sufficient conditions for detection consistency, i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in a real-world dataset from a well-known program evaluation study.
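For readers new to the MCATE, the sketch below shows what a single subpopulation's treatment effect looks like in a randomized experiment, using the standard difference-in-means estimator restricted to units that share given covariate values. It is not TESS, which instead maximizes a nonparametric scan statistic over subpopulations. The data layout (dicts with a treatment flag `t` and an outcome `y`), the covariate names, and the function name `mcate` are all hypothetical.

```python
from statistics import mean


def mcate(rows, covariates):
    """Difference-in-means estimate of the average treatment effect among
    rows whose covariates match every (name, value) pair in `covariates`.

    Hypothetical layout: each row is a dict with covariate columns, a
    treatment indicator "t" (1 = treated, 0 = control), and an outcome "y".
    An empty `covariates` dict selects everyone, i.e., the overall ATE.
    """
    sub = [r for r in rows if all(r[k] == v for k, v in covariates.items())]
    treated = [r["y"] for r in sub if r["t"] == 1]
    control = [r["y"] for r in sub if r["t"] == 0]
    if not treated or not control:
        raise ValueError("subpopulation needs both treated and control units")
    return mean(treated) - mean(control)


# Toy (made-up) experiment with two covariates.
rows = [
    {"age": "young", "sex": "F", "t": 1, "y": 5.0},
    {"age": "young", "sex": "F", "t": 0, "y": 2.0},
    {"age": "young", "sex": "M", "t": 1, "y": 3.0},
    {"age": "young", "sex": "M", "t": 0, "y": 3.0},
    {"age": "old", "sex": "F", "t": 1, "y": 4.0},
    {"age": "old", "sex": "F", "t": 0, "y": 4.0},
]

print(mcate(rows, {}))                            # overall ATE: 1.0
print(mcate(rows, {"age": "young", "sex": "F"}))  # one MCATE: 3.0
```

Even with two binary covariates there are nine possible subpopulations (each covariate fixed to a value or left free), which illustrates why exhaustively estimating every MCATE becomes impractical as the number of covariates grows.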
A social tea will be held at 3:00 p.m. in A434 Mayo. All are welcome.