A parametric joint model of DNA-protein binding, gene expression and DNA sequence data to detect target genes of a transcription factor

Pac Symp Biocomput. 2008:465-76.

Abstract

This paper is concerned with predicting the regulatory targets of a transcription factor (TF). We propose and study a joint model that simultaneously combines DNA-protein binding, gene expression, and DNA sequence data. A parametric mixture model is used for unsupervised learning, and it can also be extended to semi-supervised learning. We applied the method to an E. coli dataset to identify the target genes of LexA; this analysis, along with applications to simulated data, demonstrated the potential gains of jointly modeling multiple types of data over using only one type of data.
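To make the idea of a joint parametric mixture model concrete, the following is a minimal sketch and not the model specified in the paper. It assumes that each gene is summarized by one numeric score per data type (binding, expression, sequence), that the scores are conditionally independent Gaussians given the gene's target status, and that larger scores indicate targets. The function name `fit_joint_mixture` and all parameter names are hypothetical, chosen only for illustration. A two-component mixture is fitted by EM without labels, which corresponds to the unsupervised setting described above.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_joint_mixture(rows, n_iter=200, tol=1e-8):
    """Fit a two-component diagonal Gaussian mixture by EM (illustrative only).

    rows: list of equal-length score tuples, one per gene, e.g.
          (binding score, expression score, sequence score).
    Returns (pi, means, variances, post), where post[i] is the estimated
    probability that gene i belongs to component 1 (assumed "target").
    """
    n, d = len(rows), len(rows[0])
    cols = [sorted(r[j] for r in rows) for j in range(d)]
    # Assumption: targets score high, so component 1 starts near the top decile
    # and component 0 (background) starts near the lower quartile.
    means = [[c[n // 4] for c in cols], [c[(9 * n) // 10] for c in cols]]
    centre = [sum(c) / n for c in cols]
    var0 = [sum((x - m) ** 2 for x in c) / n + 1e-6 for c, m in zip(cols, centre)]
    variances = [var0[:], var0[:]]
    pi = [0.9, 0.1]
    post = [0.0] * n
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability of the target component for each gene.
        ll = 0.0
        for i, r in enumerate(rows):
            lp = [
                math.log(pi[k])
                + sum(log_normal_pdf(r[j], means[k][j], variances[k][j]) for j in range(d))
                for k in (0, 1)
            ]
            top = max(lp)
            norm = top + math.log(sum(math.exp(v - top) for v in lp))
            post[i] = math.exp(lp[1] - norm)
            ll += norm
        # M-step: weighted updates of mixing proportions, means and variances.
        for k in (0, 1):
            w = [p if k == 1 else 1.0 - p for p in post]
            nk = sum(w) + 1e-12  # small constant guards against an empty component
            pi[k] = nk / (n + 2e-12)
            for j in range(d):
                m = sum(wi * r[j] for wi, r in zip(w, rows)) / nk
                means[k][j] = m
                variances[k][j] = (
                    sum(wi * (r[j] - m) ** 2 for wi, r in zip(w, rows)) / nk + 1e-6
                )
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, means, variances, post


if __name__ == "__main__":
    # Simulated scores (not real data): 300 background genes and 30 targets.
    random.seed(0)
    genes = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(300)]
    genes += [tuple(random.gauss(3.0, 1.0) for _ in range(3)) for _ in range(30)]
    pi, means, variances, post = fit_joint_mixture(genes)
    print("estimated target proportion:", round(pi[1], 3))
```

Passing a single column of scores instead of all three gives a single-data-type fit under the same assumptions, which is one simple way to compare joint and separate modeling on simulated data.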

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Bacterial Proteins / metabolism
  • Base Sequence
  • Computational Biology
  • DNA / genetics*
  • DNA / metabolism*
  • DNA, Bacterial / genetics
  • DNA, Bacterial / metabolism
  • Databases, Genetic
  • Databases, Protein
  • Escherichia coli Proteins / metabolism
  • Gene Expression
  • Genes, Bacterial
  • Models, Biological*
  • Models, Genetic
  • Protein Binding
  • Regulon
  • Serine Endopeptidases / metabolism
  • Transcription Factors / metabolism*

Substances

  • Bacterial Proteins
  • DNA, Bacterial
  • Escherichia coli Proteins
  • LexA protein, Bacteria
  • Transcription Factors
  • DNA
  • Serine Endopeptidases