Issue No.03 - May/June (2006 vol.10)
Published by the IEEE Computer Society
Murray Woodside, Carleton University
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIC.2006.49
Application-level quality of service (QoS) is the Achilles' heel of services offered over the Internet. The articles in this special issue cover various aspects of this complex problem, while exposing the challenges we have yet to overcome.
Application-level quality of service (QoS) is the Achilles' heel of services offered over the Internet. Given multitier services' complexity and the dynamic nature afforded by on-the-fly service composition, even planning an adequate initial service deployment is difficult. Once in operation, a burst of load can quickly swamp a service and its database, and services' ability to react is often inadequate. The articles in this special issue cover various aspects of this complex problem, while exposing the challenges we have yet to overcome. Although QoS covers performance (delay and capacity), dependability, security, and many other properties in principle, this special issue focuses on performance and, to a lesser extent, dependability.
Many distributed system projects fail or are delivered late because they can't meet QoS goals. Discovering a serious shortfall during load testing just prior to deployment can be disastrous. Solutions based on upgrading the hardware infrastructure are expensive and don't necessarily solve problems caused by poorly chosen software components and inadequately designed software architectures. Instead, developers need methods for designing QoS capabilities into distributed systems. As with other design problems, such methods must rely on a combination of experience and some kind of QoS model.
QoS results from the interaction of user demands, system behavior under these demands, and the resources the behavior requires. Poor understanding of any one of these factors leads to uncontrolled QoS and inefficient resource use. The intensity of the workload placed by users might be unpredictable. Therefore, QoS assurance plans must control demand by denying access to users during overloads, overprovision resources, or expand resources on the fly during overloads. Online adaptation of resource levels (and also service behavior) is one solution to uncertainty regarding the three QoS factors, and is the subject of initiatives in autonomic and adaptive systems. Making these management solutions work efficiently also requires knowledge of the other two factors: applications' behavior and their resource demands. We can model all three factors, but in a new application, it's often application behavior that's least understood.
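To make the interplay of the three factors concrete, the following minimal sketch combines an arrival rate (user demand), a per-request service demand (application behavior), and a server count (resources). It assumes a single-queue M/M/1 model for response time and a simple utilization cap for admission control; both are illustrative assumptions, and all function names and the 0.8 target are hypothetical rather than taken from any article in this issue.

```python
def utilization(arrival_rate, service_demand, servers=1):
    """Utilization Law: busy fraction = arrival rate x demand per request / servers."""
    return arrival_rate * service_demand / servers


def mm1_response_time(arrival_rate, service_demand):
    """Mean response time under an assumed M/M/1 model; infinite once saturated."""
    u = utilization(arrival_rate, service_demand)
    if u >= 1.0:
        return float("inf")
    return service_demand / (1.0 - u)


def admitted_rate(offered_rate, service_demand, max_utilization=0.8):
    """Admission control: accept at most the rate that keeps utilization at the target.

    max_utilization is a hypothetical tuning knob, not a recommended value.
    """
    cap = max_utilization / service_demand
    return min(offered_rate, cap)


if __name__ == "__main__":
    demand = 0.1  # seconds of service per request (assumed)
    for offered in (5.0, 9.0, 20.0):
        accepted = admitted_rate(offered, demand)
        print(offered, accepted, mm1_response_time(accepted, demand))
```

Even this toy model shows why application behavior matters: the service demand appears in every formula, so an error in estimating it distorts both the predicted delay and the admission threshold.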
Many projects depend on performance testing at integration time, but the record is full of costly failures (poorly documented, given that very few people advertise their mistakes), sometimes leading to product abandonment. QoS design-in methods are a risk-reduction technique based on either modeling the application, as with software performance engineering (SPE), or on the use of well-understood components, as in prediction-enabled component technology (PECT).
Designing to QoS Specifications
Three broad approaches to QoS design might be termed evolution, precision design, and flexible design. Most designers depend on evolution and on their experience — that is, they use performance features that have worked in the past. Evolution provides some certainty but is a poor guide to the use of new technology. Performance difficulties in the introduction of multithreading in the 1990s and Enterprise JavaBeans more recently illustrate the pitfalls of evolution.
Precision design exploits models of workload, behavior, and resources to obtain predictable QoS. This benefit is particularly important in critical systems; however, the models require precise knowledge of the three QoS factors, which might require expensive and time-consuming effort. Schedulability analysis is an example of a precision technique [1]. Finally, flexible design adds features to deal with workload uncertainty and create scalable software that (given sufficient resources) can operate efficiently for a wide range of loads [2]. Although this type of design is based on a looser description of the workload, efficiency still depends on knowledge and models of the application's behavior and its resource use.
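As one illustration of what a schedulability check can look like, the sketch below applies the classic Liu and Layland utilization bound for rate-monotonic scheduling of independent periodic tasks. The choice of this particular test is an assumption made for illustration; the text above doesn't say which analysis is meant, and the function names are hypothetical. The test is sufficient but not necessary, so a task set that fails it might still be schedulable.

```python
def rm_utilization_bound(n):
    """Liu-Layland bound for n periodic tasks under rate-monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def passes_rm_test(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs in the same time unit.

    Returns True when total utilization is within the bound (sufficient condition).
    """
    if not tasks:
        return True  # nothing to schedule
    total = sum(c / t for c, t in tasks)
    return total <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    light = [(1, 4), (1, 5), (2, 10)]   # assumed example task set
    heavy = [(2, 4), (2, 5), (2, 10)]   # assumed overloaded task set
    print(passes_rm_test(light), passes_rm_test(heavy))
```

The test needs exact execution times and periods for every task, which is the kind of precise knowledge that makes precision design costly.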
In this Issue
In "Managing End-to-End QoS in Distributed Embedded Applications," Praveen Sharma and her colleagues describe their experience with a component framework specifically designed for QoS control. The authors draw on this experience to develop a middleware QoS-management approach based on the composition of QoS components. Application components are encapsulated in qoskets, which system middleware can manage to provide adaptive capability. These components provide QoS monitoring, some decision-making capability, and actuation (the ability to change what they do). In this article, they're applied to a system of radio-controlled surveillance aircraft.
In "Increasing QoS in Selfish Overlay Networks," Bruno Gusmão Rocha, Virgilio Almeida, and Dorgival Guedes describe how overlay networks provide application-level routing on top of an underlying Internet routing substrate. In an ideal world, all nodes in an overlay network cooperate to provide network resources. In practice, some nodes exhibit selfish behavior and use services without providing anything in return. These free-riding nodes must be identified and isolated. The article presents a novel approach to identifying free-riders, assessing their impact on overall network QoS, and discouraging selfish behavior. The proposed approach is based on a reputation mechanism in which nodes with higher reputations are more likely to have their requests served, whereas those with lower reputations have a lower probability of obtaining overlay resources. The authors model the interaction between overlay nodes as a noncooperative game in which each node's strategy is to select nodes with which to connect in order to maximize benefit. The authors evaluate the proposed reputation-based mechanism using simulation.
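The general idea of reputation-weighted service can be sketched in a few lines. This is a minimal illustration, not the authors' mechanism: the linear update rule, the 0.5 starting score, the 0.1 step, and the use of the score directly as a service probability are all assumptions, and the class and method names are hypothetical.

```python
import random


class ReputationTable:
    """Tracks a reputation score in [0, 1] per node (assumed scale)."""

    def __init__(self, initial=0.5, step=0.1):
        self.initial = initial  # score given to unknown nodes (assumed)
        self.step = step        # amount added or removed per observation (assumed)
        self.scores = {}

    def get(self, node):
        return self.scores.get(node, self.initial)

    def record(self, node, contributed):
        """Raise the score when a node provides resources, lower it when it free-rides."""
        delta = self.step if contributed else -self.step
        self.scores[node] = min(1.0, max(0.0, self.get(node) + delta))

    def should_serve(self, node, rng=random):
        """Serve a request with probability equal to the requester's reputation."""
        return rng.random() < self.get(node)


if __name__ == "__main__":
    table = ReputationTable()
    for _ in range(3):
        table.record("node_a", contributed=True)
        table.record("node_b", contributed=False)
    print(table.get("node_a"), table.get("node_b"))
```

Under these assumptions, a node that never contributes drifts toward a score of zero and is eventually refused, while a consistent contributor is almost always served.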
A movement toward precision-design techniques for distributed systems is visible in "QoS Assessment via Stochastic Analysis," by Simona Bernardi and José Merseguer. It describes a process for deriving performance models directly from design models in the Unified Modeling Language (UML) to provide rapid evaluation based on software design. The authors derive a stochastic model for the response time of a reliability framework from its behavior specified in UML, and they explore its performance in different alarm conditions.
Service-oriented computing is becoming a common paradigm for supporting distributed applications that rely on Internet-available services. The final article of this issue, "Reliability Prediction for Service-Oriented Computing Environments," by Vincenzo Grassi and Simone Patella, offers an architecture for predicting composed systems' reliability. The authors consider atomic services (that is, services that don't invoke other services) and composite services, which might need to call on other services according to some service-usage profile. Service providers are considered autonomous and can adopt three types of policies with respect to revealing how they select services to invoke: no transparency, partial transparency, and total transparency. The article provides an algorithm for computing the composite service's predicted reliability in a distributed manner. The authors implemented their architecture in a grid computing platform.
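A simplified view of how a composite service's reliability depends on the services it calls can be sketched as follows. This is not the authors' algorithm: it assumes independent failures, a fixed number of invocations per callee standing in for the usage profile, and an acyclic call graph. The catalog layout and all names and numbers are hypothetical.

```python
def predicted_reliability(service, catalog):
    """Probability that a request to `service` completes without failure.

    catalog maps a service name to a dict with:
      "internal": reliability of the service's own code
      "uses":     optional {callee: number_of_invocations} (assumed usage profile)
    Assumes independent failures and no cycles in the call graph.
    """
    spec = catalog[service]
    result = spec["internal"]
    for callee, calls in spec.get("uses", {}).items():
        # Every invocation of the callee must succeed for the request to succeed.
        result *= predicted_reliability(callee, catalog) ** calls
    return result


if __name__ == "__main__":
    catalog = {
        "db": {"internal": 0.99},      # atomic service
        "auth": {"internal": 0.999},   # atomic service
        "checkout": {"internal": 0.995, "uses": {"db": 2, "auth": 1}},
    }
    print(predicted_reliability("checkout", catalog))
```

In this sketch each recursive call could be answered by the provider of the callee, which hints at why the computation lends itself to distribution; a provider that reveals nothing about its own dependencies would simply publish a single reliability figure.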
We hope that the articles in this special issue will help raise awareness of an important and often overlooked cause of failures in distributed applications. The trend toward increased use of "third-party" software components and the increased adoption of service-oriented architectures (SOAs) will exacerbate the issues discussed here. Possible solutions might include the design of QoS-aware software systems [3] and the use of autonomic computing techniques.
Murray Woodside is a distinguished research professor in the Department of Systems and Computer Engineering at Carleton University, Ottawa, Canada. He has a PhD in control engineering from Cambridge University and until recently held the OCRI-NSERC Industrial Research Chair in Performance Engineering of Real-time Software. Contact him at email@example.com.
Daniel A. Menascé is a professor of computer science and the associate dean for research and graduate studies at the Volgenau School of Information Technology and Engineering at George Mason University. He has a PhD in computer science from UCLA. Menascé is a fellow of the ACM and the recipient of the 2001 A.A. Michelson Award from the Computer Measurement Group. Contact him at firstname.lastname@example.org.