2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2018)
Washington, DC, USA
May 1, 2018 to May 4, 2018
Recently, large-scale networks attract significant attention to analyze and extract the hidden information of big data. Toward this end, graph embedding is a method to embed a high dimensional graph into a much lower dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, our RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real-time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we also build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.
Big Data, data encapsulation, distributed processing, embedded systems, graph theory, network theory (graphs), sampling methods, vectors
W. Liu, H. Li and B. Xie, "Real-Time Graph Partition and Embedding of Large Network," 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 2018, pp. 432-441.