Software Engineering for Big Data Systems
Guest Editors' Introduction • Ian Gorton, Ayse Basar Bener, and Audris Mockus • April 2016
Translations by Osvaldo Perez and Tiejun Huang
We edited a special issue on “Software Engineering for Big Data Systems” for the March/April 2016 issue of IEEE Software magazine. The issue focused on big data’s implications for software engineering and five categories of design requirements for building such systems:
- pervasive distribution;
- write-heavy workloads;
- variable request workloads;
- computation-intensive analytics; and
- high availability.
Because big data systems are pervasively distributed, their designers must consider quality attributes such as reliability, scalability, transparency, and performance. Such systems should be designed for fault tolerance, high consistency, high availability, and the ability to adapt to contextual changes. They also require new techniques for trading off read optimization against write optimization when handling heavy or variable request workloads. Computationally intensive analytics make it more challenging to design cost-effective, large-scale platforms that handle transactional processing and heavy analytics loads simultaneously.
As an extension of that special issue, we present the April 2016 theme for Computing Now. For this special issue, we selected five relevant articles from the IEEE Computer Society Digital Library (CSDL) published in conferences or journals over the past year and a half. Although many more articles addressed specific technical issues in detail, we selected review and position papers for a broad overview of multiple issues.
In this Issue
In "Common Pitfalls of Benchmarking Big Data Systems," Gwen Shapira and Yanpei Chen discuss big data system benchmarking. Performance benchmarking is important for predicting real-life big data system behavior, but the authors point out that it is useful to industry only if done properly. Reflecting on their experience, they show some of the pitfalls and errors that arise when benchmarking is not done rigorously.
Andrea Rosa, Lydia Y. Chen, and Walter Binder’s “Understanding Unsuccessful Executions in Big-Data Systems” explores why big data systems fail. The authors review the literature on unsuccessful executions (terminations) in big data systems, identify the gaps in it, highlight the importance of better understanding the causes of such failures, and present their views on how to address those gaps.
"Embrace the Challenges: Software Engineering in a Big Data World," by Kenneth Anderson, talks about the challenges of designing big data systems. Based on his experience since 2009, Anderson examines the technical and management challenges and shares approaches to overcoming them.
In "Research Directions for Engineering Big Data Analytics Software," Carlos Otero and Adrian Peter examine where research on such software should head as these systems become increasingly mainstream. They discuss a new paradigm for big data systems and its design requirements, and recommend that researchers tackle these problems by building architectures, tools, frameworks, scalable algorithms, and self-adaptive learning systems.
Finally, Ian Gorton and John Klein’s "Distribution, Data, Deployment: Software Architecture Convergence in Big Data Systems" discusses the trade-offs among distributed data, software, and deployment architectures. Using a distributed healthcare system as an illustrative example, the authors systematically select and apply a sequence of tactics to address design concerns in big data systems.
The articles in this month’s theme present an overview of challenges in this important emerging area. We invite you to read them and add your comments. For further information, many other articles are available to subscribers in the CSDL.
Ian Gorton is a Professor and Director of Computer Science at Northeastern University, Seattle. He has a PhD in Computer Science from Sheffield Hallam University. His main technical interests include software architecture and scalability. Contact him at i;email@example.com.
Ayse Basar Bener is a professor of Mechanical and Industrial Engineering at Ryerson University. She has a PhD in Information Systems from the London School of Economics. Her main technical interests include big data applications in software analytics, software measurement, software economics, and software quality. Contact her at ayse.bener [at] ryerson.ca.
Audris Mockus is the Ericsson-Harlan D. Mills Chair Professor at the University of Tennessee, Knoxville. He has a PhD in Statistics from Carnegie Mellon University. His interests lie at the intersection of quantitative methods and software development, such as sound techniques for analyzing operational data produced in the course of creating or using software. Contact him at firstname.lastname@example.org.