Efficient Estimation Under Data Fusion
Presented by Sijia (Lucy) Li, Ph.D.
Postdoctoral Research Fellow
Harvard T.H. Chan School of Public Health
* Candidate for faculty position in the Division of Biostatistics and Health Data Science
The rapid expansion of available data has facilitated the use of data fusion, which allows researchers to combine information from many data sources to obtain valid summaries of a target population of interest. For example, technology companies integrate massive unlabeled data with a small amount of labeled data to make accurate predictions. In education, policymakers leverage multiple datasets generated by different current policies to evaluate a new policy. In public health, researchers merge randomized trial data with observational data to transport causal conclusions about treatment effects to different populations. Due to the considerable number of open problems in this area, it is of interest to develop a general framework and approach that would allow researchers to tackle data fusion problems without limiting themselves to specific parameters of interest, data structures, or numbers of data sources. In this talk, I will introduce our proposed unified data fusion framework and approach using tools from semiparametric efficiency theory.
A seminar tea will be held at 1:15 p.m. in University Office Plaza, Room 116. All are welcome.