
Data Thinning (and beyond) to Avoid Double Dipping

Wednesday, February 25 @ 3:00 pm - 4:00 pm CST

Location: Virtual

Presented by Anna Neufeld
Assistant Professor of Statistics
Williams College

While classical statistical methods are designed for testing hypotheses about pre-specified models, the reality of modern science is that analysts often explore their data before coming up with models and hypotheses of interest. We refer to the practice of using the same data to generate and then test a hypothesis, or to fit and then evaluate a model, as double dipping. Standard statistical procedures can give invalid results when they are applied in settings that involve double dipping. Often, we avoid double dipping by splitting our observations into a training set and a test set. While this sample splitting approach is straightforward and easy to understand, it is generally not applicable in unsupervised settings. Motivated by unsupervised problems that arise in the analysis of single-cell RNA sequencing data, we propose data thinning, an alternative to sample splitting that splits each observation in a dataset into two independent pieces. We show that this method provides an elegant solution to our motivating problems under certain distributional assumptions, and we discuss extensions that can be used when those assumptions are not met.
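As a rough illustration of "splitting each observation into two independent pieces," the sketch below covers only the simplest case: Poisson counts thinned binomially. If X ~ Poisson(lam), drawing X1 given X from Binomial(X, eps) and setting X2 = X - X1 yields independent X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam). This is a minimal sketch, not the speaker's implementation; the function names (`poisson_sample`, `thin_poisson`, `correlation`), the values lam = 5 and eps = 0.5, and the use of Knuth's method to simulate Poisson data are all hypothetical choices made for the example.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (simulation only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def thin_poisson(x, eps, rng):
    """Hypothetical helper: split a count x into (x1, x2), x1 ~ Binomial(x, eps), x2 = x - x1.

    If x ~ Poisson(lam), then x1 ~ Poisson(eps * lam) and
    x2 ~ Poisson((1 - eps) * lam), and x1 and x2 are independent.
    """
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1


def correlation(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(2024)
lam, eps, n = 5.0, 0.5, 20000  # illustrative values, not from the talk

counts = [poisson_sample(lam, rng) for _ in range(n)]
pieces = [thin_poisson(x, eps, rng) for x in counts]
first = [p[0] for p in pieces]   # e.g. used to choose a model or clusters
second = [p[1] for p in pieces]  # e.g. used to test or evaluate that choice

mean_first = sum(first) / n       # should be close to eps * lam
mean_second = sum(second) / n     # should be close to (1 - eps) * lam
corr = correlation(first, second)  # should be close to 0

print(f"mean(x1) = {mean_first:.3f}, mean(x2) = {mean_second:.3f}, corr = {corr:.3f}")
```

Because the two pieces are independent, a hypothesis or model chosen using `first` can be tested or evaluated on `second` without double dipping, and, unlike sample splitting, every observation contributes to both pieces.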

A seminar tea will be held at 2:45 p.m. in University Office Plaza, Room 240.

Audience for this event: All are welcome


Contact

We strive to host inclusive and accessible events. To request accommodations or additional information, please contact biostats@umn.edu.

© 2015 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer.