Researchers from the University of Minnesota (UMN) School of Public Health (SPH) and the Masonic Institute for the Developing Brain are developing a new tool that will significantly enhance genomic research at the University. The project is named the UMN Genomic Data Commons (GDC), and its developers envision it as a centralized hub for sharing and harmonizing genomic datasets. The UMN GDC will focus on developing three specific areas:
- A centralized place to store local and publicly available genomic data, pre-processed, harmonized, and integrated according to principles developed by the National Institutes of Health (NIH) data commons initiative.
- A web interface for end users to access basic summary information about these datasets and submit requests for data analysis.
- Analytic pipelines that perform a range of genomic analyses using the GDC’s integrated datasets.
The ambitious project is funded by a UMII Seed Grant through the UMN Office of the Vice President for Research.
SPH Professor Saonli Basu, who will lead the project, described the GDC as an innovative way to provide a local genomic data hub and set of analytic tools that will be useful for a variety of researchers across the University.
“The GDC will be beneficial for University researchers who would like to access and analyze local and public databases for research, collaboration, and new grant applications, and for students and early-stage investigators who seek preliminary data for their dissertations or their first independent grants,” Basu said. “Moreover, the GDC’s analysis pipelines would assist many researchers and students who require assistance with large-scale genomic analysis.”
Basu said the GDC’s ability to harmonize genomic data will be an invaluable tool for researchers. “Reproducibility is a key component of scientific discovery,” she noted. “We often need to harmonize genomic data across different studies to account for the differences in sequencing techniques or in study design strategies. By providing local researchers an opportunity to access curated and harmonized datasets through our UMN GDC, we hope it will encourage collaboration among UMN researchers and facilitate reproducible research.”
To create the GDC, the team will leverage the talents and resources of several University units, including the Minnesota Supercomputing Institute (MSI), the Masonic Institute for the Developing Brain (MIDB), the Medical School, and the U of M Informatics Institute (UMII). A steering committee with faculty members from the SPH Division of Biostatistics, the Medical School, MIDB, UMII, and MSI will participate in developing the GDC, prioritize project tasks, and ensure appropriate use of the datasets. Students will also play a role in forming the GDC: SPH Biostatistics PhD students supported by the genomics training program, for example, are developing pipelines to conduct genomic analyses.
Researchers estimate it will take two years to develop the first local repository of genomic data. As for sharing the results, Basu said the team plans to make the resource “as open as possible.” “We intend to maximize the sharing of information to the extent allowed by the data use agreements for the datasets. In general, all metadata, summary results, and aggregated analyses will be available without the need for specific authorization,” she said.