Abstract—Data science methods and approaches address all stages of transition from data to knowledge and action. Visualization of this data is essential for human understanding of the subject under study, analytical reasoning about it, and generating new knowledge. Geographic data science deals with data that incorporates spatial and, often, temporal elements. The articles selected for this special issue represent a mix of theoretical approaches and novel applications of geographic data science.
Keywords—computer graphics; computer graphics research; geographic data science; data visualization; geographic information science; visual analytics
Data science is an emerging area of work concerned with the task of extracting useful information and gaining insight from large data collections. Methods that scale to big data in terms of volume, variety, velocity, and veracity are of particular interest in data science. Data science methods and approaches address all stages of transition from data to knowledge and action, including data acquisition, cleaning and processing, information extraction, integration and representation, data analysis, and knowledge extraction and explanation.
Visualization of data is essential for human understanding of the subject under analysis, analytical reasoning about it, and generating new knowledge. Interactive visual interfaces support human cognitive processes by allowing analysts to look at a subject from different perspectives and at different scales and levels of detail, link diverse pieces of information, and direct and control the work of computational analytical tools. Therefore, visual analytics approaches play an important role in data science.
More specifically, geographic data science deals with data that incorporates spatial and, often, temporal elements. In 2010, Gennady Andrienko and his colleagues defined the research agenda for spatiotemporal visual analytics, pointing out the unique properties of space and time that necessitate specific approaches to analyzing data with spatial and temporal components.1 Spatial and temporal dependence (autocorrelation) enables many operations, including interpolation and extrapolation, which can fill gaps in incomplete data and derive plausible estimates beyond the areas and/or time periods represented in the available data; integration of information of different types and from different sources through references to common locations and/or time units; and spatial and temporal inference.
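To make the link between spatial dependence and gap filling concrete, the sketch below uses inverse distance weighting (IDW), one common interpolation technique chosen here purely for illustration; it is not a method proposed by the articles in this issue. The function name `idw_estimate` and the default `power=2.0` are assumptions. Nearby observations receive larger weights, which is only justified when values at close locations tend to be similar.

```python
import math


def idw_estimate(target, samples, power=2.0):
    """Estimate a value at `target` (x, y) from (x, y, value) samples.

    Hypothetical helper: inverse distance weighting, which assumes that
    nearby locations have similar values (positive spatial dependence).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target coincides with an observed location
        weight = 1.0 / distance ** power  # closer samples count more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Fill a gap between two stations: the estimate leans toward the nearer one.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate((0.5, 0.0), stations))
```

Heterogeneity and barriers between places, discussed next, are exactly the situations in which such a distance-only weighting can mislead.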
The effects of spatial and temporal dependence are not absolute, however. Geographic space consists of places with diverse properties, and spatial dependence is weakened by this heterogeneity and by natural or artificial barriers that often exist between places. In 2017, a strategic paper by Alan MacEachren called for geo-visual analytics approaches for defining and characterizing places based on multiple heterogeneous and interconnected data types and sources.2
Time can be considered a linearly ordered set of moments or intervals as well as a system of recurring time cycles: daily, weekly, annual, and domain-specific cycles. Therefore, temporal dependence is more complex than just a correlation between close time moments along a timeline because it also includes correlations between corresponding positions in different time cycles. For example, there may be more similarity between the mornings of different days than between the morning and noon of the same day. As with barriers in space, temporal dependence can also be interrupted by various events. Thus, we must properly take into account both the existence of spatial and temporal dependences and the possibility of their distortion or interruption.
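The cyclic component of temporal dependence can be seen by comparing autocorrelation at different lags. The sketch below assumes an idealized hourly series with a purely sinusoidal daily cycle (a synthetic assumption, not real data) and a hypothetical `autocorrelation` helper. Lag 24 compares the same hour on consecutive days, while lag 6 compares, say, 6:00 with 12:00 on the same day.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Two weeks of hourly values following an idealized daily cycle (synthetic).
hourly = [math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]

same_hour_next_day = autocorrelation(hourly, 24)  # corresponding cycle positions
morning_vs_noon = autocorrelation(hourly, 6)      # six hours apart, same day
print(same_hour_next_day, morning_vs_noon)
```

In this idealized series the lag-24 value is close to 1 and the lag-6 value is close to 0, mirroring the morning-versus-noon example in the text. Real data add noise, weekly and annual cycles, and interruptions by events.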
Traditionally, geographic analysis has relied strongly on visual (mostly cartographic) representations, but traditional approaches are now severely challenged by the volume, variety, complexity, dynamics, and other properties of the data requiring analysis. Advances in sensor and positioning technologies in recent years have facilitated an unprecedented growth in the collection of spatially and temporally referenced data. Examples of big geographic data sources include aerial and terrestrial laser scanning, remote sensing imagery, weather data, data streams from geosensor networks, and tracks of various objects moving by land, sea, and air. The massive volumes of collected data contain complex yet implicit spatial, temporal, and semantic interrelations that are waiting to be uncovered and made explicit.
One of the most challenging problems in geographic data science is the need to assess the quality, suitability, and distribution of the data available for analysis.3,4 Data-quality issues, structure, and feature relationships can often be revealed by appropriate visualizations. In spatiotemporal data, this might involve the detection of misaligned temporal resolutions, temporal regularity or irregularity, the presence of temporal gaps, varying spatial resolutions, the presence of spatial gaps, issues concerning the identities of moving objects, properties related to the data-collection method, positioning errors, and others. There is a pressing need for theoretical and methodological development in this direction. A related issue is the visualization of data uncertainty in space5 and time.6
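Two of the temporal checks listed above, detecting gaps and detecting irregular sampling, can be computed before any visualization. The sketch below is a minimal illustration under stated assumptions: the sensor is expected to report at a fixed `expected_step`, and an interval more than 50% longer than that step (the hypothetical `tolerance` default) counts as a gap. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def find_temporal_gaps(timestamps, expected_step, tolerance=0.5):
    """Return (before, after) pairs where consecutive records are too far apart."""
    ordered = sorted(timestamps)
    limit = expected_step * (1 + tolerance)  # longest interval still accepted
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > limit
    ]


def sampling_intervals(timestamps):
    """Distinct intervals between consecutive records; more than one means irregular."""
    ordered = sorted(timestamps)
    return {curr - prev for prev, curr in zip(ordered, ordered[1:])}


# A sensor expected every 10 minutes, with the 00:30 and 00:40 readings missing.
start = datetime(2017, 1, 1, 0, 0)
readings = [start + timedelta(minutes=m) for m in (0, 10, 20, 50, 60)]
print(find_temporal_gaps(readings, timedelta(minutes=10)))
print(sampling_intervals(readings))
```

The detected gaps and interval sets are the kind of summary that could then be shown on a timeline so an analyst can judge whether the data are suitable for the intended analysis.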
A common approach in geographic data science is the integration of multiple data sets characterized by different spatial and temporal references, at multiple scales and resolutions. Several articles in this special issue propose specific data transformations that allow us to address challenging practical problems. Generally, the articles selected for this special issue represent a mix of theoretical approaches and novel applications of geographic data science.
“Typology of Uncertainty in Static Geolocated Graphs for Visualization” by Tatiana von Landesberger, Sebastian Bremm, and Marcel Wunderlich addresses the important problem of visualizing uncertain geographic information, specifically considering geolocated graphs. The authors consider major types of graph uncertainty (node and edge uncertainty) and their interplay (uncertainty of node-edge relationships and structural implications of uncertainty).
The article “ANALYTiC: An Active Learning System for Trajectory Classification” by Amílcar Soares Júnior, Chiara Renso, and Stan Matwin follows the active learning paradigm for semantic labeling of trajectories. Starting with a manually labeled set of trajectories and their features, the proposed method identifies examples that are hard to label automatically and presents them visually to the user. User input then drives the subsequent semantic labeling process.
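For readers unfamiliar with active learning, one generic way to pick examples that are "hard to label automatically" is least-confidence sampling: query the items whose most probable label has the lowest predicted probability. The sketch below illustrates only that generic strategy; it is not necessarily the selection strategy used in ANALYTiC. The function name, trajectory IDs, labels, and probabilities are all hypothetical.

```python
def select_queries(predictions, k):
    """Return the k trajectory IDs whose top label probability is lowest.

    `predictions` maps a trajectory ID to {label: probability} produced by
    some classifier trained on the currently labeled trajectories.
    """
    confidence = {tid: max(probs.values()) for tid, probs in predictions.items()}
    return sorted(confidence, key=confidence.get)[:k]


# Hypothetical classifier output for four unlabeled trajectories.
predictions = {
    "t1": {"walk": 0.90, "bus": 0.10},
    "t2": {"walk": 0.55, "bus": 0.45},
    "t3": {"walk": 0.30, "bus": 0.70},
    "t4": {"walk": 0.50, "bus": 0.50},
}
print(select_queries(predictions, 2))
```

The selected trajectories would be shown to the user for labeling, the classifier retrained, and the loop repeated.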
In the article “Impact of Spatial Scales on the Intercomparison of Climate Scenarios,” Wei Luo, Michael Steptoe, Zheng Chang, Robert Link, Leon Clarke, and Ross Maciejewski propose a coordinated multiple views interface for exploring differences between outputs of different climate modeling scenarios. A map interface allows the user to select specific areas of interest. Hierarchical clustering highlights differences between scenarios with respect to attribute groups.
“Urban Space Explorer: A Visual Analytics System for Understanding Urban Planning” by Alireza Karduni, Isaac Cho, Ginette Wessel, William Ribarsky, Eric Sauda, and Wenwen Dou proposes a coordinated multiple views interface for exploring topics in social media posts in relation to the population distribution and the locations of points of interest. Assuming that social media contributors travel on the street network, the proposed system infers the transportation modes, estimates travel times, and assesses traffic flows. Potential users (urban planners) found the multiscale analysis of places to be a useful feature. They expressed interest in the analysis of multiple related geographic layers, such as weather or car accidents.
Finally, in the article “Name Profiler Toolkit,” Feng Wang, Brett Hansen, Ryan Simmons, and Ross Maciejewski describe an approach that supports the exploration of spatial distributions of first and last names, their co-occurrence, and relationships with additional data, such as wealth distribution. The system's strength lies in fusing data from multiple sources, which helps users investigate interesting place-name-age-income relationships.
We expect these articles to help showcase some of the new strands of research in geographic data science. Further, we hope this special issue will stimulate widespread interest among scientists and practitioners in exploring this emerging and exciting field.