Era of Agile and Always-Available Data Storage
Guest Editor's Introduction • Sundara Nagarajan • March 2013
Translated by Osvaldo Perez and Tiejun Huang
Storage, the only component of the data center that continues to grow in both size and number, is a fascinating area of computing. Enterprise IT leaders continue to seek maximum efficiency in organizing and operating their data centers. As IT has become the business technology of modern enterprises, the unavailability of data for even a short duration has become unacceptable. Meanwhile, IT workers are tasked daily with solving increasingly sophisticated business problems whose central theme is often a massive amount of data. In day-to-day life, only a small amount of data is hot (data to which the user needs immediate access, such as emails that arrived today), whereas most data is cold (not needed immediately, such as emails that arrived two years ago). That said, a small portion of the cold data can suddenly turn hot on demand, which introduces difficult engineering hurdles at massive scale, in both data volume and user count.
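The hot/cold distinction is typically operationalized by access recency. A minimal sketch of such a tiering policy follows; the 30-day threshold and the function name are illustrative assumptions, not drawn from any particular product:

```python
from datetime import datetime, timedelta

# Assumed cutoff for illustration; real systems tune this per workload.
HOT_THRESHOLD = timedelta(days=30)

def classify(last_access: datetime, now: datetime) -> str:
    """Label a data object 'hot' or 'cold' by how recently it was accessed."""
    return "hot" if now - last_access <= HOT_THRESHOLD else "cold"

now = datetime(2013, 3, 1)
print(classify(datetime(2013, 2, 20), now))  # recently read email -> hot
print(classify(datetime(2011, 3, 1), now))   # two-year-old email -> cold
```

A production policy would also handle the "cold turns hot" case, for instance by promoting an object back to fast media on its next access.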
This presents a set of challenging problems for storage researchers and engineers: How do we ensure that the storage infrastructure is scalable, efficient, and reliable, with no access disruptions even during upgrades and maintenance? How do we access hot data as fast as possible while storing cold data as inexpensively as possible? How do we ensure that data is managed as a continuum throughout its life cycle? Storage industry leaders and startups continue to invest talent and resources to take on this array of real-world problems.
Changing Landscape for Data Management
Currently, storage users can choose to incur capital expenditures to build their own storage infrastructure or source pay-per-use storage as a service. The underlying management in both cases calls for multitenant capabilities — to be able to isolate groups of users completely and deliver those tenants secured access and assured service levels. Sophisticated economic models and brokering services are letting CIOs and IT managers comprehend the dynamic nature of capital and operating expenses to establish the most desirable total cost of ownership (TCO).
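The capex-versus-opex trade-off behind these TCO decisions can be sketched with a simple model. The figures, function name, and flat-opex assumption below are purely illustrative:

```python
def tco(capex: float, annual_opex: float, years: int,
        discount_rate: float = 0.0) -> float:
    """Total cost of ownership: upfront capital plus (optionally discounted)
    operating expenses over the planning horizon."""
    return capex + sum(annual_opex / (1 + discount_rate) ** y
                       for y in range(1, years + 1))

# Build-your-own infrastructure: large capital outlay, modest operations.
build = tco(capex=500_000, annual_opex=50_000, years=5)   # -> 750000.0
# Storage as a service: no capital outlay, pay-per-use operations.
rent = tco(capex=0, annual_opex=140_000, years=5)         # -> 700000.0
```

Real brokering services layer demand forecasts, service-level penalties, and discounting onto this basic comparison, which is what makes the dynamics hard to reason about without tooling.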
Advances in storage component technologies (flash/solid-state drives, storage-class memory, and continually improving disk drives) are presenting new options for storage system vendors. Users want solutions that hide the complexity of component choice, data placement, and data protection. Within this ever-growing sector, broadly referred to as enterprise storage, IT research firm IDC predicts scale-out architectures will be the fastest-growing segment in 2013. [source] Since the 1970s, system engineers have sought clustered architectures that can run indefinitely and scale to large sizes. Clustered architectures enabled high-end mainframes and expensive fault-tolerant systems; innovations in storage system architecture that pair commodity components with cluster interconnects are now making such capabilities affordable. These systems offer a highly agile, scalable, virtualized storage infrastructure that can operate nondisruptively.
Modern storage systems incorporate a wealth of algorithms and architectural innovations developed over decades. They are characterized by a significant set of base technologies that are essential for building products, though no longer sufficient to differentiate them. These technologies include strong data-protection algorithms at the disk-layout level and resilient file systems; storage-efficiency technologies such as deduplication and compression; and file, file-system, or volume snapshots and clones. Clustered architectures provide significant additional capabilities that minimize disruption due to faults, upgrades, or routine maintenance. Evidence now shows that mainstream storage architectures are clustered and that they combine well with commodity components to deliver the best customer value.
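Deduplication is a good example of these base technologies. The core idea is content addressing: chunks are named by a cryptographic digest of their contents, so identical chunks are stored only once. The following is a minimal in-memory sketch; the class name, fixed-size chunking, and dictionary-backed store are illustrative assumptions (production systems use variable-size chunking and persistent, reference-counted stores):

```python
import hashlib

class DedupStore:
    """Content-addressed chunk store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store each unique chunk once
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

store = DedupStore(chunk_size=4)
store.write("a.txt", b"abcdabcdabcd")  # three identical chunks, one stored
store.write("b.txt", b"abcdxyz")       # reuses "abcd", adds "xyz"
```

Snapshots and clones exploit the same principle in reverse: because chunks are immutable and shared, a point-in-time copy is just a new digest list.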
This Month's Theme
For this issue of Computing Now, I gathered a set of articles that examine the trends in modern enterprise data-management architectures and clustered storage solutions.
In "IaaS Cloud Architecture: From Virtualized Data Centers to Federated Infrastructures," Rafael Moreno-Vozmediano, Rubén Montero, and Ignacio Llorente present a good overview of the anatomy of cloud infrastructure, as well as explain the concept of the cloud OS and its role in operating a modern enterprise data center.
Lee Garber's "Converged Infrastructure: Addressing the Efficiency Challenge" is a Computer news story about the integration of data-center components: compute servers, storage, and networking. Such converged infrastructure architectures can include components from a single vendor or from different best-of-breed providers. The article presents the opportunities and challenges in realizing flexible, off-the-shelf converged infrastructure systems: a data center in a box.
Boris Grot and his colleagues present an extensive analysis of a scale-out solution with low-power processors to achieve optimal TCO in "Optimizing Data-Center TCO with Scale-Out Processors." The article defines TCO as "an optimization metric that considers the costs of real estate, power delivery and cooling infrastructure, hardware-acquisition costs, and operating expenses." This excellent study will have far-reaching impact on storage system architecture.
Our next article, Yaoguang Wang and colleagues' "HAaaS: Towards Highly Available Distributed Systems," discusses an important demand on contemporary storage architecture: high availability. The article presents a solution that uses shared storage to solve some important state-management issues.
We end with "A Novel Solution of Distributed File Storage for Cloud Service," a conference paper by Yu Zhang, Weidong Liu, and Jiaxing Song that discusses properties of cloud services and presents significant data on building a shared storage service.
To augment the articles in this month's theme, I asked Tim Russell, vice president of data lifecycle ecosystem solutions at NetApp, about the changes and challenges he's seeing with clustered storage architectures as deployed by enterprise users. As leader of a team that develops solutions for managing data from creation through long-term retention, and previously leading a product strategy group that identified and assessed long-term market trends, Russell constantly interacts with large users of storage systems. In this exclusive video response, he shares his vision on enterprise storage systems.
I also spoke with Rajkumar Buyya, Professor of Computer Science and Software Engineering; Future Fellow of the Australian Research Council; and Director of the Cloud Computing and Distributed Systems Laboratory at the University of Melbourne, Australia. In his video response to my questions, Buyya speaks about his work and research perspectives in the clustered systems domain.
Enterprise IT consumers — employees, customers, suppliers, and partners — are demanding a pervasive IT experience in their fast-paced business and personal lives. For CIOs and enterprise IT technologists, business service disruption is not an option. Rapidly changing technology and delivery models are driving them to rethink approaches to delivering business technology to enterprise consumers. Data-management architecture is key to the solution, and clustered storage systems are providing the foundation for modern business technology in the enterprise.
Sundara Nagarajan is a technical director with NetApp and a visiting professor at International Institute of Information Technology in Bangalore, India. He's Computing Now's regional liaison to IEEE Computer Society activities in India. Contact him at s.nagarajan at computer dot org.