Data Storage Evolution
The computer industry has long treated storage as a peripheral element. Now, data and data storage are attracting the most attention in systems design. Computer engineers invented storage hierarchy to balance performance with cost per byte processed and stored. Additionally, structuring data helps the computing thread to reach the data element quickly, trading off storage capacity with compute power.
The industry has traditionally been compute-centric, but as end users become more empowered, creative application developers are moving data to the center of their system designs. For instance, cloud-computing platforms have invented new ways of storing data, moving away from the beaten path. End users are creating more data than ever before in newer formats, and enterprise users are demanding the storage and analytics they get from the public Internet. For Computing Now this month, I have selected a set of articles representing the factors that are influencing this new thinking in the design of storage systems.
With the emergence of cloud computing connected to myriad interaction devices and applications, we are living through the next wave of IT ubiquity and pervasiveness. It is dramatically changing the way we do business, entertain ourselves, and connect with others: essentially, how we live our lives. The emergence of RAID in the 1980s is an early instance of creating a common intellectual framework and taxonomy in storage systems design, marrying end-user needs and the shifting landscape of available building blocks. New end-user demands and the evolution of technology are now causing similar tectonic movements toward new ways of building computing systems, integrating storage management and highly flexible connectivity. The creation of larger, persistent main memory systems is motivating system developers to evaluate the need to structure data differently or even do away with structuring as we know it.
Empowered end users cause application systems to evolve at tremendous speeds and continuously create new requirements for interoperation. For instance, a social networking site user can add content and pointers from a website simply by dragging and dropping. The evolution of mashups that combine data and functionality from multiple sources is another example of this new design paradigm. This is leading to the evolution of the user experience, along with computation and data management.
New sources of data are driving dramatic growth in the volume of data being generated and transported. They include digitized documents, multimedia, and, more recently, data from sensors attached to the physical world. While bandwidth is increasing rapidly, latency remains a fundamental constraint on connectivity. Storage distribution, efficient data transportation, and application partitioning are the main methods of working around latency and are likely to remain so for years. Finally, protecting data from inappropriate use and accidental destruction is becoming more complex than ever.
Thus, data and storage are becoming the central theme in the design of next-generation IT systems. Overall, storage system trends indicate a major shift of concerns and a new era of innovation opportunities driven by end-user demands and the capabilities of a new generation of technology building blocks.
In “From Microprocessors to Nanostores: Rethinking Data-Centric Systems,” Parthasarathy Ranganathan calls for a rethinking of the basic building blocks of computing systems influenced by emerging data-oriented workloads. Anthony Cleve, Tom Mens, and Jean-Luc Hainaut describe the challenges of managing the evolution of data-intensive software systems in “Data-Intensive System Evolution.” In “Multiparadigm Data Storage for Enterprise Applications,” Debasish Ghosh describes strategies for storing data with richer application semantics that simplify the programming models for access. In “Data Stream Management Systems for Computational Finance,” Badrish Chandramouli and his coauthors present data stream management systems (DSMS) that process and issue long-running queries over temporal data streams. Bo Sheng, Qun Li, and Weizhen Mao discuss optimization issues in dealing with large volumes of data from sensor networks and storage node placement in “Optimize Storage Placement in Sensor Networks.” Finally, “Cost-Aware Virtual USB Drive” by Young Jin Nam and his coauthors is an interesting investigation combining mobile access and public cloud storage.
“RAID: A Personal Recollection of How Storage Became a System” by Randy H. Katz in IEEE Annals of the History of Computing recalls the context that led to the invention of RAID. The article describes what was perhaps the first systemic view on storage, leading to the creation of a common intellectual framework and taxonomy. Katz’s insights are still relevant to the modern-day storage engineer.
The author is director of R&D in Hewlett Packard’s Storage Works Division in Bangalore, India, and a visiting professor at the International Institute of Information Technology, Bangalore. He’s also Computing Now’s regional liaison to IEEE Computer Society activities in India. Contact him at s.nagarajan at computer dot org.
Related Multimedia: Data-Centric Systems in the Exascale Era
Computing Now editor in chief Dejan Milojicic talks with HP’s Alistair Veitch and Partha Ranganathan about how the recent increase in unstructured data, coupled with paradigm shifts in storage and communication, is leading to revolutionary thinking about how computers are built.