IEEE Transactions on Big Data

From the January-March 2015 issue

Methodologies for Cross-Domain Data Fusion: An Overview

By Yu Zheng

Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, which significantly distinguishes cross-domain data fusion from the traditional data fusion studied in the database community. This paper not only introduces the high-level principles of each category of methods but also gives examples in which these techniques are used to handle real big data problems. In addition, it positions existing works in a framework, exploring the relationships and differences between data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.


Editorials and Announcements


  • We're pleased to announce that Qiang Yang, head of the Huawei Noah's Ark Research Lab and a professor at the Hong Kong University of Science and Technology, has accepted the position of inaugural Editor-in-Chief beginning 1 Jan. 2015.


Call for Papers

Special Issue on Urban Computing

Submission deadline: November 30, 2015.

Urbanization’s rapid progress has modernized people’s lives but has also engendered big challenges, such as air pollution, increased energy consumption, and traffic congestion. Given the complex and dynamic settings of cities, tackling these challenges would have seemed nearly impossible years ago. Nowadays, sensing technologies and large-scale computing infrastructures have produced a variety of big data in urban spaces, e.g., human mobility, air quality, traffic patterns, and geographical data. These data contain rich knowledge about a city and, when used correctly, can help tackle these challenges.

Urban computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to tackle the major issues that cities face, e.g., air pollution, increased energy consumption, and traffic congestion [1][2]. Urban computing connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytics models, and novel visualization methods to create win-win-win solutions that improve the urban environment, human quality of life, and city operation systems. Urban computing also helps us understand the nature of urban phenomena and even predict the future of cities.

Special Issue on Big Data Infrastructure

Submission deadline: December 20, 2015.

Data is becoming an increasingly decisive resource in modern societies, economies, and governmental organizations. Big Data is an emerging paradigm encompassing various kinds of complex, large-scale information beyond the processing capability of conventional software and databases. Various technologies have been proposed to support the handling of big data, such as massively parallel processing databases, scalable storage systems, cloud computing platforms, Hadoop, and Spark. Because application data in distributed environments are multisource, massive, heterogeneous, and dynamic, Big Data computing must often be carried out on petabyte (PB)- or even exabyte (EB)-scale data through complex computing processes. Therefore, large-scale, scalable Big Data Infrastructure, with corresponding programming language support and software models for efficient processing in distributed environments such as the cloud, is in demand.

In this special issue, we invite articles on innovative research that addresses the challenges of Big Data Infrastructure with emerging computing platforms such as heterogeneous clouds, hybrid architectures, Hadoop, or Spark, with emphasis on meeting the real-time requirements imposed by emerging Big Data applications involving sensing data, e-commerce data, business transactions, web logs, and the like.

General Call for Papers

TBD Call-for-Papers Flyer Version 1. (PDF)

TBD Call-for-Papers Flyer Version 2. (PDF)


TBD is financially cosponsored by:

IEEE Computer Society, IEEE Communications Society, IEEE Computational Intelligence Society, IEEE Sensors Council, IEEE Consumer Electronics Society, IEEE Signal Processing Society, IEEE Systems, Man, & Cybernetics Society, IEEE Systems Council, IEEE Vehicular Technology Society


TBD is technically cosponsored by:

IEEE Control Systems Society, IEEE Photonics Society, IEEE Engineering in Medicine & Biology Society, IEEE Power & Energy Society, IEEE Biometrics Council