ISSN: 0162-8828

Marco Saerens, Université catholique de Louvain (UCL), Belgium

Masashi Shimbo, Nara Institute of Science and Technology, Takayama

Amin Mantrach, Yahoo! Labs, Barcelona

Silvia Garcia-Diez, Université catholique de Louvain (UCL), Belgium

Mathieu Senelle, Université catholique de Louvain (UCL), Belgium

François Fouss, Université catholique de Louvain (UCL), Belgium

ABSTRACT

This work introduces a novel nonparametric density index defined on graphs, the Sum-over-Forests (SoF) density index. It is based on a clear and intuitive idea: high-density regions in a graph contain a large number of low-cost trees with high outdegrees, while low-density regions contain few. Therefore, a Boltzmann probability distribution is defined on the countable set of forests in the graph, so that large (high-cost) forests occur with low probability while small (low-cost) forests occur with high probability. The SoF density index of a node is then defined as the expected outdegree of this node over the set of forests, thus providing a measure of density around that node. Using the matrix-forest theorem and a statistical physics framework, it is shown that the SoF density index can be computed in closed form through a simple matrix inversion. Experiments on artificial and real data sets show that the proposed index performs well at finding dense regions in graphs of various origins.
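As a small illustration of the matrix-forest theorem mentioned in the abstract, the sketch below checks one of its standard consequences on tiny graphs: for an unweighted, undirected graph with Laplacian L, det(I + L) equals the number of spanning rooted forests. This is not the SoF index itself, which additionally weights forests by a Boltzmann distribution and takes an expected outdegree; the function names and the example graphs are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import combinations


def laplacian_plus_identity(n, edges):
    """Return I + L for an unweighted, undirected graph on nodes 0..n-1."""
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][u] += 1
        m[v][v] += 1
        m[u][v] -= 1
        m[v][u] -= 1
    return m


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_rooted_forests(n, edges):
    """Brute force: enumerate acyclic edge subsets, then pick one root per tree."""
    total = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            parent = list(range(n))
            acyclic = True
            for u, v in subset:
                ru, rv = find(parent, u), find(parent, v)
                if ru == rv:  # this edge would close a cycle
                    acyclic = False
                    break
                parent[ru] = rv
            if not acyclic:
                continue
            sizes = {}
            for node in range(n):
                root = find(parent, node)
                sizes[root] = sizes.get(root, 0) + 1
            ways = 1
            for size in sizes.values():
                ways *= size  # any node of a tree can serve as its root
            total += ways
    return total


# Hypothetical example graph: a triangle on three nodes.
triangle = [(0, 1), (1, 2), (0, 2)]
forest_count = count_rooted_forests(3, triangle)
det_value = determinant(laplacian_plus_identity(3, triangle))
```

For the triangle, both `forest_count` and `det_value` come out to 16: one empty forest, six single-edge forests (three edges, two root choices each), and nine rooted spanning trees. The brute-force enumeration is exponential in the number of edges and is only meant as a cross-check on small graphs.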

INDEX TERMS

Indexes, Equations, Vegetation, Probability distribution, Physics, Correlation, Mining methods and algorithms, Trees, Graph Theory, Discrete Mathematics, Mathematics of Computing, Data mining

CITATION

Marco Saerens, Masashi Shimbo, Amin Mantrach, Silvia Garcia-Diez, Mathieu Senelle, François Fouss, "The Sum-over-Forests Density Index: Identifying Dense Regions in a Graph," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, doi:10.1109/TPAMI.2013.227