Sixth IEEE International Conference on Data Mining (ICDM'06)
GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space
Hong Kong
December 18-22, 2006
ISBN: 0-7695-2701-9
Huahai He, University of California, Santa Barbara, USA
Ambuj K. Singh, University of California, Santa Barbara, USA
We propose a technique for evaluating the statistical significance of frequent subgraphs in a database. A graph is represented by a feature vector that is a histogram over a set of basis elements. The set of basis elements is chosen based on domain knowledge and consists generally of vertices, edges, or small graphs. A given subgraph is transformed to a feature vector and the significance of the subgraph is computed by considering the significance of occurrence of the corresponding vector. The probability of occurrence of the vector in a random vector is computed based on the prior probability of the basis elements. This is then used to obtain a probability distribution on the support of the vector in a database of random vectors. The statistical significance of the vector/subgraph is then defined as the p-value of its observed support. We develop efficient methods for computing p-values and lower bounds. A simplified model is further proposed to improve the efficiency. We also address the problem of feature vector mining, a generalization of itemset mining where counts are associated with items and the goal is to find significant sub-vectors. We present an algorithm that explores closed frequent sub-vectors to find significant ones. Experimental results show that the proposed techniques are effective, efficient, and useful for ranking frequent subgraphs by their statistical significance.
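The pipeline described in the abstract (feature vector, probability of occurrence in a random vector, distribution of support, p-value) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following are assumptions made only for illustration: a random vector is built from a fixed number of draws (`size`), the count of each basis element is treated as an independent binomial variable with its prior probability, and the support in a database of random vectors is treated as binomial. All function and parameter names are hypothetical and are not taken from the paper.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation (small n only)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def is_sub_vector(x, y):
    """x occurs in y if every count in x is at most the matching count in y."""
    return all(xi <= yi for xi, yi in zip(x, y))

def support(x, database):
    """Number of database vectors that contain x as a sub-vector."""
    return sum(1 for y in database if is_sub_vector(x, y))

def occurrence_probability(x, priors, size):
    """Assumed model: P(x occurs in a random vector) as a product of
    independent per-element binomial tails (an illustrative simplification)."""
    prob = 1.0
    for count, prior in zip(x, priors):
        prob *= binom_tail(size, prior, count)
    return prob

def p_value(x, priors, size, db_size, observed_support):
    """Assumed model: support ~ Binomial(db_size, P(x occurs));
    the p-value is P(support >= observed_support)."""
    p_occ = occurrence_probability(x, priors, size)
    return binom_tail(db_size, p_occ, observed_support)

# Hypothetical toy data: histograms over three basis elements.
db = [[2, 1, 0], [1, 1, 1], [3, 0, 2], [2, 2, 0]]
x = [2, 1, 0]
priors = [0.5, 0.3, 0.2]   # assumed prior probabilities of the basis elements
obs = support(x, db)        # 2 of the 4 vectors contain x
print(obs, p_value(x, priors, size=4, db_size=len(db), observed_support=obs))
```

A smaller p-value means the observed support is less likely under the random model, which is the sense in which sub-vectors (and the subgraphs they represent) could be ranked. The direct summation in `binom_tail` is only suitable for small inputs; a numerically stable binomial survival function would be needed for databases of realistic size.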
Citation:
Huahai He, Ambuj K. Singh, "GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 885-890, 2006