Gene function prediction by a combined analysis of gene expression data and protein-protein interaction data

J Bioinform Comput Biol. 2005 Dec;3(6):1371-89. doi: 10.1142/s0219720005001612.

Abstract

Predicting the biological functions of genes is an important problem in basic biology research and has applications in drug discovery and gene therapy. Previous studies have shown that either gene expression data or protein-protein interaction data alone can be used to predict gene functions. In particular, clustering of gene expression profiles has been widely used for gene function prediction. In this paper, we first propose a new method for gene function prediction using protein-protein interaction data, which facilitates combination with prediction results based on clustering gene expression profiles. We then propose a new method that combines the prediction results from the two data sources by weighting the evidence provided by each. Using protein-protein interaction data downloaded from the GRID database and published gene expression profiles from 300 microarray experiments for the yeast S. cerevisiae, we show that this combined analysis provides better predictive performance than either data source alone in a cross-validated analysis of the MIPS gene annotations. Finally, we propose a logistic regression method that is flexible enough to combine information from any number of data sources while maintaining computational feasibility.
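
The abstract does not give the form of the logistic regression model, so the following is only a minimal sketch of the general idea: one evidence score per data source is fed into a logistic regression that outputs the probability that a gene has a given function. The feature names (an expression-based score and an interaction-based score), the toy values, the plain gradient-descent fit, and the helper names fit_logistic and predict_proba are all assumptions made for illustration and are not taken from the paper.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(x, weights, bias):
    # Probability that a gene with evidence vector x has the function.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical toy data: each row is (expression_score, interaction_score)
# for one annotated gene; label 1 means the gene has the function.
train_x = [(0.9, 0.8), (0.8, 0.3), (0.7, 0.9),
           (0.2, 0.1), (0.3, 0.4), (0.1, 0.2)]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(train_x, train_y)
print("weights:", weights, "bias:", bias)
print("P(function | strong evidence):", predict_proba((0.85, 0.7), weights, bias))
print("P(function | weak evidence):", predict_proba((0.15, 0.1), weights, bias))
```

Adding a further data source in this sketch only means adding one more column to each feature row, which is the sense in which a regression formulation extends to any number of sources. The fitted weights then play the role of per-source evidence weights.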

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Databases, Factual
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Fungal / physiology*
  • Models, Biological
  • Protein Interaction Mapping / methods*
  • Saccharomyces cerevisiae / metabolism*
  • Saccharomyces cerevisiae Proteins / metabolism*
  • Signal Transduction / physiology*

Substances

  • Saccharomyces cerevisiae Proteins