2011 IEEE 11th International Conference on Data Mining Workshops (2011)
Dec. 11, 2011 to Dec. 11, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.84
Understanding how nodes interconnect in large graphs is an important problem in many fields. We wish to find the nodes that connect two given nodes or two groups of source nodes. To find these connecting nodes in huge graphs, we have devised a highly parallelized variant of a k-shortest paths algorithm that leverages the power of the Hadoop distributed computing system and the HBase distributed key/value store. We show how our system enables previously unobtainable graph analysis by finding these connecting nodes in graphs of one billion nodes or more on modest commodity hardware in a time frame of just minutes.
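To make the notion of a "connecting node" concrete, the following is a minimal single-machine sketch and not the paper's Hadoop/HBase implementation. It swaps in a simpler technique: a breadth-first search from each source group, after which a node is kept if the sum of its distances to the two groups is within a chosen slack of the shortest group-to-group distance. The function names, the `slack` parameter, and the adjacency-dictionary graph format are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import deque


def multi_source_bfs(graph, sources):
    """Return hop distances from the nearest node in `sources` (unweighted graph)."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def connecting_nodes(graph, group_a, group_b, slack=0):
    """Hypothetical helper: nodes lying on near-shortest routes between two groups.

    A node qualifies when dist_to_a + dist_to_b <= shortest + slack.
    This is an illustrative stand-in, not the k-shortest paths variant
    described in the abstract.
    """
    dist_a = multi_source_bfs(graph, group_a)
    dist_b = multi_source_bfs(graph, group_b)
    reachable = dist_a.keys() & dist_b.keys()
    if not reachable:
        return set()  # the two groups are not connected
    shortest = min(dist_a[v] + dist_b[v] for v in reachable)
    sources = set(group_a) | set(group_b)
    return {
        v
        for v in reachable
        if v not in sources and dist_a[v] + dist_b[v] <= shortest + slack
    }


# Toy undirected graph: a-x-b is the shortest route, a-y-z-b is one hop longer.
toy_graph = {
    "a": ["x", "y"],
    "x": ["a", "b"],
    "y": ["a", "z"],
    "z": ["y", "b"],
    "b": ["x", "z"],
}
print(connecting_nodes(toy_graph, ["a"], ["b"]))           # only the shortest route
print(connecting_nodes(toy_graph, ["a"], ["b"], slack=1))  # also the longer route
```

Raising `slack` plays a role loosely analogous to raising k in a k-shortest paths search, in that it admits nodes on progressively longer routes. A distributed version would have to partition both the frontier and the distance tables across workers, which this sketch does not attempt.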
Hadoop, distributed computing, shortest paths, BFS, algorithm
A. Rahman, A. Levine, C. McCubbin and B. Perozzi, "Finding the 'Needle': Locating Interesting Nodes Using the K-shortest Paths Algorithm in MapReduce," 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), Vancouver, Canada, 2011, pp. 180-187.