2016 IEEE 32nd International Conference on Data Engineering (ICDE) (2016)
Helsinki, Finland
May 16, 2016 to May 20, 2016
ISBN: 978-1-5090-2020-1
pp: 1410-1413
Jia Yu, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, 85281, United States
Jinxuan Wu, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, 85281, United States
Mohamed Sarwat, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, 85281, United States
ABSTRACT
This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend the regular Apache Spark RDD to support geometrical and spatial objects with data partitioning and indexing. The Spatial Query Processing Layer executes spatial queries (e.g., Spatial Join) on SRDDs. The dynamic status of SRDDs and spatial operations is visualized by the GEOSPARK monitoring map interface. We demonstrate GEOSPARK using three spatial analytics applications (spatial aggregation, autocorrelation, and co-location) to show how users can easily define their spatial analytics tasks and efficiently process them on large-scale spatial data with interactive performance.
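To make the idea of a partitioned spatial join concrete, the following is a minimal single-machine sketch in plain Python. It is not GEOSPARK's API or its actual algorithm; the uniform grid, the function names (`cell_of`, `partition_points`, `partition_rects`, `spatial_join`), and the `cell_size` parameter are all assumptions made for illustration. Points and rectangles are assigned to grid cells, and the containment test runs only between objects that share a cell, which is the general effect that spatial data partitioning is meant to achieve.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a coordinate to the (hypothetical) uniform grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points: List[Point], cell_size: float) -> Dict[Cell, List[int]]:
    """Assign each point index to exactly one grid cell."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[cell_of(x, y, cell_size)].append(i)
    return grid


def partition_rects(rects: List[Rect], cell_size: float) -> Dict[Cell, List[int]]:
    """Assign each rectangle index to every grid cell it overlaps."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for j, (x0, y0, x1, y1) in enumerate(rects):
        cx0, cy0 = cell_of(x0, y0, cell_size)
        cx1, cy1 = cell_of(x1, y1, cell_size)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                grid[(cx, cy)].append(j)
    return grid


def spatial_join(points: List[Point], rects: List[Rect],
                 cell_size: float = 1.0) -> Set[Tuple[int, int]]:
    """Return (rect_index, point_index) pairs where the rectangle contains the point.

    Only objects that fall in the same grid cell are compared.
    """
    point_grid = partition_points(points, cell_size)
    rect_grid = partition_rects(rects, cell_size)
    result: Set[Tuple[int, int]] = set()
    for cell, point_ids in point_grid.items():
        for j in rect_grid.get(cell, []):
            x0, y0, x1, y1 = rects[j]
            for i in point_ids:
                x, y = points[i]
                if x0 <= x <= x1 and y0 <= y <= y1:
                    result.add((j, i))
    return result
```

Counting the pairs per rectangle index in the result would give a simple form of spatial aggregation. In a cluster setting each grid cell could be processed by a separate worker, but how GEOSPARK distributes and indexes its partitions is not specified in this abstract.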
INDEX TERMS
Spatial databases, Sparks, Correlation, Monitoring, Heating, Vegetation, Data analysis
CITATION

J. Yu, J. Wu and M. Sarwat, "A demonstration of GeoSpark: A cluster computing framework for processing big spatial data," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 2016, pp. 1410-1413.
doi:10.1109/ICDE.2016.7498357