Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, which significantly distinguishes cross-domain data fusion from the traditional data fusion studied in the database community. This paper not only introduces the high-level principles of each category of methods, but also gives examples in which these techniques are used to handle real big data problems. In addition, this paper positions existing works in a framework, exploring the relationships and differences between different data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.

Academics and researchers worldwide continue to produce large numbers of scholarly documents, including papers, books, and technical reports, along with associated data such as tutorials, proposals, and course materials. The abundance of data sources enables researchers to study scholarly collaboration at a very large scale. The ever-increasing diversity of disciplines and complexity of research problems, particularly in multi-disciplinary research, requires collaboration. Besides the traditional venues of collaboration, where scholars typically meet annually at conferences or meetings, the Internet provides a wide range of platforms for scholars to engage with other scholars. These new platforms include academic search engines such as Google Scholar, social media sites such as ResearchGate and Mendeley, more interactive social sites such as Twitter and Facebook, and Wiki-style virtual collaboration sites. These services allow scholars to share academic resources, exchange opinions, follow each other’s research, keep up with current research trends, and build their professional networks. Researchers increasingly realize that scholarly achievements should not merely be the final published articles. The datasets used in a study and many other intermediary results are equally important for supporting research. Therefore, a set of rapidly developing research topics, such as research data management, data curation and stewardship, and data sharing policy, is becoming important for research communities. This special issue aims at bringing together researchers with diverse interdisciplinary backgrounds interested in scholarly big data.

In the past decade, we have witnessed the greying of society and the escalating costs of medical management, which have been the number one concern of most governments. This has heightened the need for preventive healthcare practices that help to anticipate and prevent the onset of illnesses. On the other hand, with the help of advanced medical devices and social networking services, medical data can be acquired, shared, and delivered more conveniently. As such, the medical field is entering a big data era. When applied to big medical data applications, many of the existing tools and systems for medical data analytics become questionable. However, big medical data itself has in turn provided unique opportunities for better wellbeing.

At the same time, there is a realization that an essential part of long-term healthcare is adopting a good lifestyle that involves proper exercise and diet. Many companies marketing wearable health sensor products therefore also offer mobile health apps that provide first-order analytics to monitor and track personal lifestyles. However, the sensing data and the low-level analytics are typically used in isolation, without integration with medical knowledge or environmental data such as weather and pollution. In addition, there are strong links between personalized health sensor data and knowledge of critical illnesses such as diabetes, depression, or arthritis, as the long-term care of these illnesses is related to proper activity and diet. The integration of these sources would usher in a new era of personalized wellness that enables the system and users to work collaboratively towards better wellness and lifestyles.

This special issue aims to link big medical data to sensor and environmental data to support better personalized health and user mobility, especially with respect to critical illnesses. Originality and impact on society, in combination with the innovative technical aspects of the proposed solutions, will be the major evaluation criteria.

Urbanization’s rapid progress has modernized people’s lives but also engendered big challenges, such as air pollution, increased energy consumption, and traffic congestion. Tackling these challenges seemed nearly impossible years ago given the complex and dynamic settings of cities. Nowadays, sensing technologies and large-scale computing infrastructures have produced a variety of big data in urban spaces, e.g., human mobility, air quality, traffic patterns, and geographical data. The big data contain rich knowledge about a city and can help tackle these challenges when used correctly.

Urban computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to tackle the major issues that cities face, e.g., air pollution, increased energy consumption, and traffic congestion [1][2]. Urban computing connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytics models, and novel visualization methods to create win-win-win solutions that improve the urban environment, human life quality, and city operation systems. Urban computing also helps us understand the nature of urban phenomena and even predict the future of cities.

Data is becoming an increasingly decisive resource in modern societies, economies, and governmental organizations. Big Data is an emerging paradigm encompassing various kinds of complex and large-scale information beyond the processing capability of conventional software and databases. Various technologies are being discussed to support the handling of big data, such as massively parallel processing databases, scalable storage systems, cloud computing platforms, Hadoop, and Spark. Because the application data involved in a distributed environment is multisource, massive, heterogeneous, and dynamic, one of the most important requirements of Big Data is carrying out complex computing processes on petabyte (PB)-level, and even exabyte (EB)-level, data. Therefore, large-scale, scalable Big Data Infrastructure, with corresponding programming language support and software models for efficient processing in distributed environments such as the cloud, is in demand.

In this special issue, we invite articles on innovative research that addresses the challenges of Big Data Infrastructure with emerging computing platforms such as heterogeneous clouds, hybrid architectures, Hadoop, or Spark, with emphasis on the real-time requirements imposed by emerging Big Data applications such as sensing data, e-commerce data, business transactions, and web logs.

