  • This event has passed.

BHDS Plan B Presentation with Cheng-Chang Wu

May 29, 2024 @ 10:00 am - 11:00 am CDT

Location: Hybrid (in person at University Office Plaza, Room 240, or via Zoom)

Comparative Evaluation of Missing Data Imputation for Omics Data

Presented by Cheng-Chang Wu
Master's Candidate in Biostatistics

Plan B Adviser: Eric Lock

Abstract: Missing data is a common challenge in the analysis of many molecular “omics” datasets (e.g., genomics, metabolomics, proteomics). This challenge is particularly significant in metabolomics and other mass spectrometry-based technologies, primarily due to the prevalence of informative missingness, such as data Missing Due to Limit of Detection (MLOD). This study evaluated two widely used imputation methods, SoftImpute and KNNimpute, across a variety of parameter settings, for estimating missing values in simulated datasets under diverse conditions. These conditions included varying data dimensions, signal-to-noise ratios, missing data proportions, and missingness mechanisms (Missing Completely at Random and MLOD). Additionally, the methods were applied to a real metabolomics dataset from the MILK-OMICS study.
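
To illustrate the two missingness mechanisms named in the abstract, the following is a minimal Python sketch, not the presenter's code. It assumes a Gaussian low-rank-plus-noise simulation and models MLOD as a per-column quantile threshold; the function names, dimensions, and proportions are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_low_rank(n_rows, n_cols, rank, noise_sd, rng):
    """Low-rank signal plus Gaussian noise (an assumed simulation design)."""
    u = rng.normal(size=(n_rows, rank))
    v = rng.normal(size=(rank, n_cols))
    return u @ v + noise_sd * rng.normal(size=(n_rows, n_cols))

def mask_mcar(x, prop, rng):
    """Missing Completely at Random: drop entries independently of their values."""
    out = x.copy()
    out[rng.random(x.shape) < prop] = np.nan
    return out

def mask_mlod(x, prop):
    """Missing due to limit of detection: drop entries below a per-column limit.

    The limit is taken as the `prop` quantile of each column (an assumption).
    """
    limits = np.quantile(x, prop, axis=0)
    out = x.copy()
    out[x < limits] = np.nan
    return out

rng = np.random.default_rng(0)
x = simulate_low_rank(100, 20, rank=3, noise_sd=0.5, rng=rng)
x_mcar = mask_mcar(x, 0.2, rng)   # missingness unrelated to the values
x_mlod = mask_mlod(x, 0.2)        # missingness concentrated in the low tail
```

Under MCAR the observed values remain a representative sample of each column, whereas under MLOD every missing value is smaller than every observed value in its column, which is what makes the missingness informative.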

We show that the SVD-based SoftImpute consistently outperforms the neighborhood-based KNNimpute across all simulations, likely due to its ability to capture the low-rank structure inherent in the simulated data. SoftImpute’s performance is further influenced by the tuning approach and parameter settings adopted. Notably, nuclear norm regularization proves effective in providing stable solutions and mitigating overfitting, especially in scenarios with weak signal strength. While fine-tuning the rank of the approximation may yield superior results, rank tuning is unstable without nuclear norm penalization. These findings underscore the importance of parameter settings and tuning approaches for SoftImpute to ensure optimal imputation accuracy. Although simulations highlight SoftImpute’s advantages, the application to real metabolomics data reveals limitations of current methods in handling missingness induced by detection limits. Nonetheless, these insights can guide the development of more effective imputation strategies suited to the challenges of omics data analysis.
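
The interplay between nuclear norm penalization and rank restriction can be sketched with an iterative soft-thresholded SVD, the general idea behind SoftImpute. This is a simplified illustration in NumPy, not the softImpute package or the presenter's implementation; the function name `soft_impute`, the parameters `lam` and `max_rank`, and the zero initialization (which assumes roughly centered data) are assumptions made for this example.

```python
import numpy as np

def soft_impute(x, lam, max_rank=None, n_iter=200, tol=1e-6):
    """Fill NaNs in `x` by iterating a soft-thresholded SVD.

    lam      -- shrinkage applied to singular values (nuclear norm penalty)
    max_rank -- optional cap on the rank of the approximation
    """
    observed = ~np.isnan(x)
    filled = np.where(observed, x, 0.0)  # start with zeros in the gaps
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)     # soft-threshold the singular values
        if max_rank is not None:
            s[max_rank:] = 0.0           # optionally restrict the rank
        low_rank = (u * s) @ vt
        # Keep observed entries, replace missing ones with the current estimate.
        new_filled = np.where(observed, x, low_rank)
        change = np.linalg.norm(new_filled - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = new_filled
        if change < tol:
            break
    return filled

# Small demonstration on a noise-free rank-2 matrix with 20% MCAR missingness.
rng = np.random.default_rng(1)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 30))
missing = rng.random(truth.shape) < 0.2
x_missing = truth.copy()
x_missing[missing] = np.nan

imputed = soft_impute(x_missing, lam=0.5, max_rank=2)
rmse = np.sqrt(np.mean((imputed[missing] - truth[missing]) ** 2))
```

Setting `lam` close to zero with a fixed `max_rank` corresponds to pure rank tuning, while a larger `lam` with no rank cap relies on the nuclear norm penalty alone. In practice both would be chosen by some form of validation on held-out observed entries.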

Keywords: Missing data imputation, Omics data, Metabolomics, SoftImpute, KNNimpute, Simulations, Low-rank approximation, Missing due to limit of detection, Nuclear norm regularization

Audience for this event: All are welcome

Contact

We strive to host inclusive and accessible events. To request accommodations or additional information, please contact biostats@umn.edu

© 2015 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer.