Fourth IEEE International Conference on Data Mining (ICDM'04)
Dependencies between Transcription Factor Binding Sites: Comparison between ICA, NMF, PLSA and Frequent Sets
Brighton, United Kingdom
November 1-4, 2004
ISBN: 0-7695-2142-8
Heli Hiisilä, Helsinki University of Technology, Finland
Ella Bingham, Helsinki University of Technology, Finland
Gene expression in eukaryotes is regulated by transcription factors, molecules that attach to binding sites in the DNA sequence. These binding sites are short stretches of DNA, usually located upstream of the gene they regulate. Because the binding sites play an important role in gene expression, it is of interest to characterize them.
In this paper we look for dependencies and independencies between these binding sites using independent component analysis (ICA), non-negative matrix factorization (NMF), probabilistic latent semantic analysis (PLSA) and the method of frequent sets. The data consist of human gene upstream regions and the possible binding sites listed in a biological database. Results on the upstream regions of baker's yeast (S. cerevisiae) are also briefly discussed for comparison.
ICA, NMF and PLSA are latent variable methods that decompose the observed data into smaller components. Of these, ICA and NMF were originally designed for continuous data; we show that they can also be used successfully on discrete DNA data. PLSA and the method of frequent sets were developed for discrete data sets.
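To make the idea of decomposing discrete data concrete, here is a minimal sketch of NMF applied to a 0/1 matrix. It assumes the data are encoded as a binary gene-by-binding-site matrix (rows are genes, columns are candidate binding sites, an entry is 1 if the site occurs in the gene's upstream region); the abstract does not state the exact encoding. It uses the Lee-Seung multiplicative updates for the squared Frobenius error, which is one common NMF variant and not necessarily the one used in the paper. The toy matrix and the function name `nmf` are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Hypothetical helper: Lee-Seung multiplicative updates that
    (locally) minimize ||X - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps  # nonnegative random start
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Made-up binary data (assumed encoding): X[i, j] = 1 if candidate
# binding site j occurs in the upstream region of gene i.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
], dtype=float)

W, H = nmf(X, k=2)
# Each row of H is a nonnegative "part": a group of sites that tend to co-occur.
for r, row in enumerate(H):
    print("component", r, "sites", np.flatnonzero(row > 0.5 * row.max()))
```

Nothing in the updates requires continuous input, which is the sense in which a method built for continuous data can still be run on 0/1 data; interpreting the components remains a separate question.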
The above methods reveal partially overlapping sets of possible binding sites such that the binding sites within a set are dependent on each other. The method of frequent sets and NMF give a good overview of the most common data structures, whereas with ICA and PLSA we find large sets that are surprisingly frequent. That is, sets of very frequently occurring possible binding sites can be found near hundreds or thousands of genes, and some interesting but less frequent sets also co-occur surprisingly often.
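The method of frequent sets looks for groups of binding sites that occur together near many genes. The following is a minimal sketch of a generic level-wise (Apriori-style) search, offered as an illustration rather than the paper's exact procedure. It assumes each gene's upstream region is represented as the set of candidate sites found in it; the gene names, site labels, the `frequent_sets` function and the `min_support` threshold are all hypothetical.

```python
from itertools import combinations

def frequent_sets(transactions, min_support):
    """Hypothetical helper: return every itemset contained in at least
    min_support transactions, mapped to its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    current = [frozenset([i]) for i in items]  # level 1 candidates
    k = 1
    while current:
        # Count how many transactions contain each candidate set.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-sets into k-sets; keep a candidate only if all
        # of its (k-1)-subsets are frequent (the Apriori pruning rule).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in frequent
                                   for s in combinations(u, k - 1)):
                candidates.add(u)
        current = list(candidates)
    return result

# Made-up data: candidate binding sites found upstream of each gene.
upstream = {
    "geneA": {"S1", "S2", "S3"},
    "geneB": {"S1", "S2"},
    "geneC": {"S1", "S2", "S4"},
    "geneD": {"S3", "S4"},
}
fs = frequent_sets(upstream.values(), min_support=3)
for s, n in sorted(fs.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), n)
```

With a threshold of 3 genes, only S1, S2 and the pair {S1, S2} qualify. Raising or lowering `min_support` trades off between a few very common sets and many rarer ones, which mirrors the distinction the abstract draws between very frequent sets and less frequent but surprising co-occurrences.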
Citation:
Heli Hiisilä, Ella Bingham, "Dependencies between Transcription Factor Binding Sites: Comparison between ICA, NMF, PLSA and Frequent Sets," icdm, pp. 114-121, Fourth IEEE International Conference on Data Mining (ICDM'04), 2004