Presented by Anna Neufeld
Assistant Professor of Statistics
Williams College
While classical statistical methods are designed for testing hypotheses about pre-specified models, the reality of modern science is that analysts often explore their data before coming up with models and hypotheses of interest. We refer to the practice of using the same data to generate and then test a hypothesis, or to fit and then evaluate a model, as double dipping. Problems arise when standard statistical procedures are applied in settings that involve double dipping. Often, we avoid double dipping by splitting our observations into a training set and a test set. While this sample splitting approach is straightforward and easy to understand, it is generally inapplicable in unsupervised settings. Motivated by unsupervised problems that arise in the analysis of single-cell RNA sequencing data, we propose data thinning, an alternative to sample splitting that splits each observation in a dataset into two independent pieces. We show that this method provides an elegant solution to our motivating problems under distributional assumptions and discuss extensions that can be used when those assumptions are not met.
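To give a concrete sense of the idea, the following is a minimal illustrative sketch (not the speaker's implementation) of thinning for Poisson-distributed counts, which often serve as a model for single-cell RNA sequencing data: if X ~ Poisson(lambda) and X_train | X ~ Binomial(X, eps), then X_train and X_test = X - X_train are independent Poisson variables with means eps*lambda and (1 - eps)*lambda. The matrix dimensions and the value of eps below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: a small count matrix standing in for
# single-cell RNA-seq data, with each entry modeled as Poisson.
X = rng.poisson(lam=5.0, size=(100, 20))

eps = 0.5  # fraction of each count routed to the "training" piece

# Poisson thinning: draw X_train | X ~ Binomial(X, eps).
# Then X_train ~ Poisson(eps * lam), X_test ~ Poisson((1 - eps) * lam),
# and the two pieces are independent of each other.
X_train = rng.binomial(X, eps)
X_test = X - X_train
```

Under this sketch, X_train could be used to generate a hypothesis or fit a model (for example, clustering the cells), and X_test could be used to test or evaluate it without double dipping.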
A seminar tea will be held at 2:45 p.m. in University Office Plaza, Room 240.


