2011 IEEE 27th International Conference on Data Engineering
Mining large graphs: Algorithms, inference, and discoveries
Hannover, Germany
April 11-April 16
ISBN: 978-1-4244-8959-6
Citation:
U Kang, Duen Horng Chau, Christos Faloutsos, "Mining large graphs: Algorithms, inference, and discoveries," 2011 IEEE 27th International Conference on Data Engineering, pp. 243-254, 2011. IEEE Computer Society, Los Alamitos, CA, USA. DOI: http://doi.ieeecomputersociety.org/10.1109/ICDE.2011.5767883
How do we find patterns and anomalies in graphs with billions of nodes and edges, which do not fit in memory? How can we use parallelism for such terabyte-scale graphs? In this work, we focus on inference, which often corresponds, intuitively, to "guilt by association" scenarios. For example, if a person is a drug abuser, that person's friends probably are too; if a node in a social network is male, his dates are probably female. We show how to do inference on such huge graphs through our proposed HAdoop Line graph Fixed Point (HA-LFP), an efficient parallel algorithm for sparse billion-scale graphs, using the Hadoop platform. Our contributions include (a) the design of HA-LFP, observing that it corresponds to a fixed point on a line graph induced from the original graph; (b) scalability analysis, showing that our algorithm scales up well with the number of edges, as well as with the number of machines; and (c) experimental results on two private graphs, as well as two of the largest publicly available graphs: the Web Graphs from Yahoo! (6.6 billion edges and 0.24 terabytes) and the Twitter graph (3.7 billion edges and 0.13 terabytes). We evaluated our algorithm using M45, one of the 50 fastest supercomputers in the world, and we report patterns and anomalies discovered by our algorithm, which would otherwise be invisible.
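The line-graph view in contribution (a) can be illustrated on a toy graph. The sketch below is not the authors' Hadoop implementation: it runs on a single machine and assumes a belief-propagation-style message update, which the abstract does not spell out. The names `line_graph` and `propagate`, the `homophily` parameter, and the two-state priors are all hypothetical choices made for illustration. Each directed edge of the original graph becomes a node of the induced line graph, and the messages on those nodes are updated repeatedly until they stop changing, that is, until a fixed point is reached.

```python
from math import prod


def line_graph(edges):
    """Induce a directed line graph from an undirected edge list.

    Each directed edge (i, j) becomes a line-graph node. Its predecessors
    are the directed edges (k, i) with k != j, i.e. the messages that
    flow into i from everyone except j.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    preds = {e: [] for e in directed}
    for (k, i) in directed:
        for (i2, j) in directed:
            if i == i2 and k != j:
                preds[(i2, j)].append((k, i))
    return preds


def propagate(edges, priors, homophily=0.9, tol=1e-9, max_iter=100):
    """Iterate messages on the line graph until they reach a fixed point.

    priors maps every node to [P(state 0), P(state 1)].
    homophily is an assumed probability that neighbours share a state.
    """
    preds = line_graph(edges)
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]
    msgs = {e: [0.5, 0.5] for e in preds}  # start from uniform messages

    for _ in range(max_iter):
        new = {}
        for (i, j), incoming in preds.items():
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    total += (priors[i][xi] * psi[xi][xj]
                              * prod(msgs[e][xi] for e in incoming))
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalise each message
        delta = max(abs(new[e][x] - msgs[e][x]) for e in new for x in (0, 1))
        msgs = new
        if delta < tol:  # messages no longer change: fixed point
            break

    # Combine each node's prior with all messages arriving at it.
    beliefs = {}
    for i in priors:
        raw = [priors[i][x] * prod(msgs[(k, t)][x] for (k, t) in msgs if t == i)
               for x in (0, 1)]
        z = sum(raw)
        beliefs[i] = [v / z for v in raw]
    return beliefs


# Toy "guilt by association" example: a path 0 - 1 - 2 where only
# node 0 has an informative prior.
toy_edges = [(0, 1), (1, 2)]
toy_priors = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}
toy_beliefs = propagate(toy_edges, toy_priors)
```

In this toy run the evidence at node 0 spreads to nodes 1 and 2 and weakens with distance. At billion-edge scale, one pass over the line-graph nodes would presumably map onto a Hadoop job, but that mapping is not shown here.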
