Issue No.02 - March/April (2004 vol.8)
pp: 34-44
Athman Bouguettaya , Virginia Tech
ABSTRACT
In order for Web services to expand across the Internet, users need to be able to efficiently access and share Web services. The authors present a query infrastructure that treats Web services as first-class objects. The query infrastructure evaluates queries through the invocations of different Web service operations. Because efficiency plays a central role in such evaluation, the authors propose a query optimization model based on aggregating the quality of Web service (QoWS) parameters of different Web services. The model adjusts QoWS through a dynamic rating scheme and multilevel matching. The rating provides an assessment of Web services' behavior. Multilevel matching allows the expansion of the solution space by enabling similar and partial answers.

The Web brought connectivity to a wealth of hitherto-inaccessible information sources. Although powerful search engines help us sift through the information glut, the ever-increasing amount of accessible information has made quality information search an arduous task. The Web's lack of semantics keeps computers from being able to automatically process information. To solve this problem, researchers envisioned the Semantic Web. 1 Web services — sets of related functionalities that consumers can programmatically access and manipulate through the Web — provide a key to enabling the Semantic Web. The ability to efficiently access and share Web services is an important step toward the full deployment of the "Service Web."
Web services can be tied to specific data sources, or they can be generic enough to operate with a wide range of sources. They can reside in legacy or newly developed systems that work with databases and other services. In fact, much information "hides" behind Web services. Using Web services consists generally of invoking operations by sending and receiving messages. However, a complex application, such as a travel package that accesses diverse Web services, needs an integrated and efficient way to manipulate and deliver Web services' functionalities. A comprehensive middleware platform would require techniques such as service description, discovery, querying, composition, monitoring, security, and privacy 2 to assist in managing autonomous and heterogeneous Web services.
We propose a novel infrastructure that offers complex and optimized query facilities for Web services. In a nutshell, users submit declarative queries that the infrastructure resolves through the combined invocations of different Web service operations. Queries target Web services and the information flow the services exchange during the invocation of their operations. Our query model facilitates the formulation and submission of queries and their transformation into actual Web service operations invocations. The optimization model uses quality of Web service (QoWS) parameters to meet users' requirements.
Scenario: Planning a Trip
An oft-cited example in the context of Web services is of a tourist — say, Ravi — who is looking for a good travel plan (see Figure 1). Ravi's plan includes traveling from Washington, DC, to Miami, from Miami to San Francisco, and then back to Washington. He will spend five days in Miami and two in San Francisco. He will also rent a car while in Miami.

Figure 1. Web services example. Ravi's quest for a travel package takes three steps. He must locate (a) a ticket, (b) accommodations, and (c) a rental car.

Typically, Ravi would start searching for airline sites using a Web search engine (Figure 1a), which would return numerous answers. After selecting a few airlines' Web sites and querying them, Ravi might select one that seemed to offer the best deal. Next, he would use a Web directory with information about accommodations (Figure 1b). After a few queries, he would decide which hotels to stay at during his trip. In the third step, he would browse and query some known car rental Web sites (Figure 1c) until he again selected the one that seemed to offer the best deal. Finally, Ravi would use his notes to calculate the travel package's total cost and decide to make the purchase. Returning to the Web sites, he would make the reservations for the airline tickets, car rental, and hotel.
This scenario highlights the difficulties that lay users have to go through to conduct complex queries on the Web. In addition to braving a time-consuming and difficult process, Ravi is likely to miss out on better deals. In essence, he needs a system that lets him efficiently query Web services. Ravi would then need only to express his needs for information through simple declarative queries over a well-defined interface. Our goal is to propose such a generic approach for optimally querying Web services.
Web Services Querying
Web services' primary use has so far consisted of invoking operations by sending and receiving messages. Although this meets the requirements of simple applications, complex applications that access diverse Web services need an integrated way to manipulate and deliver Web services' functionalities. In addition, as more companies start to deploy Web services, many will compete by offering similar functionalities — different airlines offering the same connection between two cities, for example. However, the Web services will differ in the way that they offer these functionalities (requiring different input and output parameters, for instance) and the conditions for using them (different levels of QoWS). In addition, satisfying users' requests might not require returning exact answers. Indeed, users might be satisfied by alternative or "partial" answers.
As a major step in addressing these challenges, we propose a new approach to querying Web services and the information flow during the invocation of their operations. Depending on the query, its resolution might lead to the invocation of various Web service operations and the combination of their output results. Our approach emphasizes the optimization of the query-resolution process while allowing partial answers.
A Query Model for Web Services
To facilitate the interactions between users and Web services, we need to represent the space of Web services within a given application domain in a generic way. We define a set of domain-specific operations called virtual operations — so-called because they don't belong to any actual Web service — for use within a three-level query model:

• At the top, the query level consists of a set of relations that give users an interface for formulating and submitting declarative queries directed to Web services. Different mapping rules define different sets of relations over the virtual operations (such as flight schedules and rental car listings).

• The virtual level consists of the Web service-like operations a particular application domain typically offers (finding hotel room rates and checking flight availability, for example).

• The concrete level represents the space of Web services offered on the Web — the potential candidates for answering queries. Because the Web services are a priori unknown, the query model needs to discover them and match them to the virtual operations appearing in a query.

In Figure 2, we illustrate this query model for the travel application domain. For example, Ravi could get available rooms and rental cars in a given city and throughout a given period by simply formulating a query that uses the relations Rooms(City, Start, End, Rate) and Cars(City, Start, End, Model, Rate). The query model then maps these two relations to their corresponding virtual operations BookRoom and RentACar. The matching process then matches these virtual operations to various operations from the concrete level.

Figure 2. Three-level query scheme, using the travel domain as an example. (a) Relations from the query level express the user's search. (b) The query model transforms those relations to bear on virtual operations from the virtual level. (c) The model then matches virtual operations to concrete operations on the concrete level.

From Relations to Virtual Operations
Relations at the query level define a specific view of the application domain. We represent them as conjunctive queries over virtual operations, thus presenting the virtual operations just as if they were relations. More precisely, let $\mathcal{R}$ be the set of relations the query level defines and let $\mathcal{V}$ be the set of virtual operations. For any relation $R_i \in \mathcal{R}$,

$$R_i(x_1, \ldots, x_n) \;\text{:-}\; \bigwedge_j Vop_j(y_{j1}, \ldots, y_{jm_j}), \qquad Vop_j \in \mathcal{V},$$

where the $x_i$ are the attributes of $R_i$, and the $y_{ji}$ are the corresponding operation's input and output variables. This definition means that to get the tuples of $R_i$, we need to invoke the different operations $Vop_j$. It does not mandate any order or limit concurrency among these operations. An example of a mapping rule is
Airlines(DepartureCity, ArrivalCity, AirlineNames, WebSites) :-
    InquireAirlines(DepartureCity, ArrivalCity, AirlineNames),
    GetWebSites(AirlineNames, WebSites)
The above mapping rule defines the relation Airlines through two virtual operations: InquireAirlines returns the list of airlines operating between two cities, and GetWebSites returns the Web sites for a list of airlines. The query infrastructure can obtain tuples of Airlines by invoking the operation InquireAirlines and then GetWebSites.
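As a rough illustration of how such a mapping rule could be evaluated, the sketch below chains two stand-in functions in the order the rule implies. The function bodies, airline names, and URLs are invented for the example; in the infrastructure described here, each virtual operation would be resolved to a discovered Web service operation instead.

```python
# Hypothetical stand-ins for the two virtual operations (made-up data).
def inquire_airlines(departure_city, arrival_city):
    routes = {("Miami", "Baltimore"): ["AirOne", "AirTwo"]}
    return routes.get((departure_city, arrival_city), [])


def get_web_sites(airline_names):
    sites = {"AirOne": "http://airone.example", "AirTwo": "http://airtwo.example"}
    return {name: sites[name] for name in airline_names if name in sites}


def airlines(departure_city, arrival_city):
    """Return tuples of Airlines(DepartureCity, ArrivalCity, AirlineNames, WebSites)."""
    names = inquire_airlines(departure_city, arrival_city)
    # GetWebSites consumes AirlineNames, so it can only run after InquireAirlines.
    sites = get_web_sites(names)
    return [(departure_city, arrival_city, n, sites[n]) for n in names if n in sites]


tuples = airlines("Miami", "Baltimore")
for row in tuples:
    print(row)
```

The ordering here comes only from the data dependency on AirlineNames; the mapping rule itself does not prescribe it.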
Users can directly use virtual operations to access Web services, but the use of relations has two benefits. For one thing, it lets users formulate and submit database-like queries in a natural way. For another, it provides a view tailored for a particular group of users interested in some specific part of the service space.
Virtual Operations Representation
Virtual operations represent functionalities a particular application domain typically offers. For any virtual operation appearing in a query, we need to locate the relevant Web service operations. This requires a semantic description as well as syntactic attributes. We assume that business partners would agree on a common ontology for describing concrete Web services and virtual operations. Our example adopts the community-based description of Web services defined in our prototype Web Digital Government System, 3 which we describe in the "Implementation" section.
Each operation, whether virtual or concrete, is semantically described through its function and category. Function contains two attributes:

• Functionality represents the business functionality provided by the operation (booking and listing, for example).

• Synonyms contains a list of alternative functionality names for the operation (for instance, reservation is a synonym of booking).

Category also contains two attributes:

• Domain gives the area of interest of the operation (for example, flight, accommodation, and tour).

• Synonyms contains a list of alternative domain names for the operation, analogous to the synonyms defined for Function.

To invoke any operation, we need to set values for its input variables. In formulating queries, users are free to specify any type of conditions on the different variables. This can lead to a scenario in which the system can't invoke operations because some input variables have missing values. Having virtual operation descriptions specify their input variables' potential values would solve the problem, though this is not possible for every input variable. The system could then expand an input variable into all of its possible values that satisfy any condition appearing in the query.
The following quintuple formally represents each virtual operation:
Vop = (In, Out, Domains, Category, Function)
where In is the set of input variables; Out is the set of output variables; Domains is a set of pairs (x, range), where x is an enumerable variable appearing in In and range is the set of all possible values for x; Category describes the domain of interest; and Function describes the business function. The following is an example of a virtual operation with all of its attributes:
Fares = (In, Out, Domains, Category, Function)
where In = (DepartureCity, ArrivalCity, DepartureDate), Out = (DepartureDate, ArrivalDate, ArrivalTime, Price), Domains = {(DepartureDate, [all dates between today's date and November 30, 2004])}, Category = {Flight,Trip}, and Function = {Listing, Fares}.
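One possible in-memory representation of this quintuple is sketched below, together with the expansion of an enumerable input variable under a query condition. The class name, field names, the `expand` helper, and the fixed value used for "today" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class VirtualOperation:
    name: str
    inputs: tuple
    outputs: tuple
    domains: dict        # enumerable input variable -> list of possible values
    category: frozenset
    function: frozenset


def date_range(start, end):
    """All dates from start to end, inclusive."""
    return [start + timedelta(days=d) for d in range((end - start).days + 1)]


today = date(2004, 11, 27)  # fixed here so the example is reproducible
fares = VirtualOperation(
    name="Fares",
    inputs=("DepartureCity", "ArrivalCity", "DepartureDate"),
    outputs=("DepartureDate", "ArrivalDate", "ArrivalTime", "Price"),
    domains={"DepartureDate": date_range(today, date(2004, 11, 30))},
    category=frozenset({"Flight", "Trip"}),
    function=frozenset({"Listing", "Fares"}),
)


def expand(vop, variable, condition):
    """Values of an enumerable input variable that satisfy a query condition."""
    return [v for v in vop.domains.get(variable, []) if condition(v)]


# A query condition such as DepartureDate >= 2004-11-29 expands to two values.
candidates = expand(fares, "DepartureDate", lambda d: d >= date(2004, 11, 29))
print(candidates)
```

Each value in `candidates` would correspond to one invocation of the operation with DepartureDate bound.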
Multilevel Matching for Virtual Operations
As providers begin competing in their Web services offerings, differences will occur in the required input, returned output, QoWS, and so on. For example, a weather service's forecast details, freshness of information, and fees to access the service might vary depending on the provider. This means it will not always be possible to find an exact match for a given virtual operation. Instead of finding only concrete operations that match exactly the virtual operations appearing in a query, a more flexible matching scheme that allows virtual and concrete operations' attributes to differ might still satisfy users' needs.
We define a function similar to check whether two attributes appearing in two operations are the same: similar(x, y) is True if x and y correspond to the same concept with respect to the common ontology defined in the application domain. For any two operations op1 and op2, In(op1) = In(op2) if

• In(op1) and In(op2) have the same number of variables, and

• $\forall x \in \mathrm{In}(op1)$ (resp. $\mathrm{In}(op2)$), $\exists y \in \mathrm{In}(op2)$ (resp. $\mathrm{In}(op1)$) such that similar(x, y) is True.

We define Out(op1) = Out(op2) similarly. We also define $\mathrm{In}(op1) \subset \mathrm{In}(op2)$ and $\mathrm{Out}(op1) \subset \mathrm{Out}(op2)$ in a similar way, where the first set is a subset of the second one.
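The comparison of variable sets through similar can be sketched as follows. The toy ontology (a plain dictionary from attribute names to concepts) and the helper names `covered`, `same_vars`, and `subset_vars` are assumptions for illustration; a shared domain ontology would play this role in practice.

```python
# Toy ontology: attribute name -> concept (all entries are made up).
ONTOLOGY = {
    "DepartureCity": "origin", "From": "origin",
    "ArrivalCity": "destination", "To": "destination",
    "DepartureDate": "departure_date", "Date": "departure_date",
}


def similar(x, y):
    """True if x and y denote the same concept in the toy ontology."""
    return ONTOLOGY.get(x, x) == ONTOLOGY.get(y, y)


def covered(xs, ys):
    """Every variable in xs has a similar variable in ys."""
    return all(any(similar(x, y) for y in ys) for x in xs)


def same_vars(xs, ys):
    """Set equality: same number of variables and mutual coverage."""
    return len(xs) == len(ys) and covered(xs, ys) and covered(ys, xs)


def subset_vars(xs, ys):
    """Subset relation (non-strict here): xs is covered by ys."""
    return covered(xs, ys)


print(same_vars(("DepartureCity", "ArrivalCity"), ("To", "From")))
print(subset_vars(("From",), ("DepartureCity", "ArrivalCity")))
```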
Let Vop and Concop be virtual and concrete operations, respectively. We identified four different matching levels obtained by varying the way we compare attributes of virtual and concrete operations:

• exact match,

• overlapping match,

• partial match, and

• partial and overlapping match.

In an exact match, the concrete operation matches the virtual operation in all attributes. Two operations match exactly if they have the same input and output variables, as well as the same Category and Function.
An overlapping match relates to an operation offering functionalities close to those of the virtual operation. Two operations overlap if they have the same input and output variables, and their Category and Function overlap. Let us assume a virtual operation
Fares = (In, Out, Domains, Category, Function)
where In = (DepartureCity, ArrivalCity, DepartureDate), Out = (DepartureDate, ArrivalDate, ArrivalTime, Price), Domains = {(DepartureDate, [all dates between today's date and November 30, 2004])}, Category = {Flight,Trip}, and Function = {Listing, Fares}.
Then the concrete operation
AirfareTickets = (In, Out, Category, Function)
where In = (DepartureCity, ArrivalCity, DepartureDate), Out = (DepartureDate, ArrivalDate, ArrivalTime, Price), Category = {Flight, Trip, Tour, Tourism}, and Function = {Listing, Fares, Quotes} provides an overlapping match.
A partial match corresponds to the case where input and output attributes of the two operations do not coincide. Two operations Vop and Concop match partially if they have the same Category and Function, and $\mathrm{Out}(Concop) \subset \mathrm{Out}(Vop)$ or $\mathrm{In}(Concop) \subset \mathrm{In}(Vop)$. An example of the first subset relationship is an operation that does not return all the output attributes the virtual operation expects. An example of the second subset relationship is two operations fare1(<Travel Information>, IsStudent, Cost) and fare2(<Travel Information>, Cost). Both operations return fares for a trip specified by <Travel Information>, but the second one does not take into account whether the requester is a student, which might lower the price.
A partial and overlapping match combines the overlapping and partial matches. Two operations Vop and Concop match partially and by overlap if $\mathrm{Out}(Concop) \subset \mathrm{Out}(Vop)$ or $\mathrm{In}(Concop) \subset \mathrm{In}(Vop)$, and their Category and Function attributes overlap.
We assign a matching degree to each level that quantifies the precision of the matching. The matching degree directly affects the query results' quality. The above levels receive matching degrees of 1, 3/4, 1/2, and 1/4, respectively. The values are arbitrary, existing mainly to distinguish the different matching levels.
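The four levels and their degrees can be sketched as a single classification function. This is a simplified illustration under two assumptions that the text does not spell out: attribute names are compared as plain sets (as if already normalized through the ontology), and two Category or Function sets "overlap" when they share at least one element. The dictionary layout is likewise hypothetical.

```python
from fractions import Fraction


def match_degree(vop, concop):
    """Matching degree of concop against vop, or None if no level applies."""
    same_io = vop["in"] == concop["in"] and vop["out"] == concop["out"]
    # Subset test on outputs or inputs; the equal case is caught by same_io first.
    partial_io = concop["out"] <= vop["out"] or concop["in"] <= vop["in"]
    same_sem = (vop["category"] == concop["category"]
                and vop["function"] == concop["function"])
    # Assumption: overlap means a non-empty intersection.
    overlap_sem = (bool(vop["category"] & concop["category"])
                   and bool(vop["function"] & concop["function"]))
    if same_io and same_sem:
        return Fraction(1)       # exact match
    if same_io and overlap_sem:
        return Fraction(3, 4)    # overlapping match
    if partial_io and same_sem:
        return Fraction(1, 2)    # partial match
    if partial_io and overlap_sem:
        return Fraction(1, 4)    # partial and overlapping match
    return None


fares = {
    "in": {"DepartureCity", "ArrivalCity", "DepartureDate"},
    "out": {"DepartureDate", "ArrivalDate", "ArrivalTime", "Price"},
    "category": {"Flight", "Trip"},
    "function": {"Listing", "Fares"},
}
airfare_tickets = dict(fares,
                       category={"Flight", "Trip", "Tour", "Tourism"},
                       function={"Listing", "Fares", "Quotes"})
fare1 = {"in": {"TravelInformation", "IsStudent"}, "out": {"Cost"},
         "category": {"Flight"}, "function": {"Fares"}}
fare2 = dict(fare1, **{"in": {"TravelInformation"}})

print(match_degree(fares, airfare_tickets))  # overlapping match
print(match_degree(fare1, fare2))            # partial match
```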
Web Services Query Optimization
Several service execution plans using different Web services can potentially resolve the same query. Thus we need to set appropriate criteria for selecting the best among all possible execution plans. A key feature in distinguishing between competing Web services is their QoWS, 4 which encompasses several quantitative and qualitative parameters that measure how well the Web service delivers its functionalities.
The objective of the optimization process is to maximize (or minimize) the value of each of the following QoWS parameters:

• Latency represents the average time for an operation to return results after its invocation.

• Fees are units of money that a consumer of a Web service must pay to invoke operations.

• Availability represents the probability that a Web service is available. Large values mean high availability, and small values indicate low availability.

We want to minimize the negative parameters (latency and fees) and maximize the positive parameter (availability). For normalization of QoWS parameters, positive parameters range from 0 to 1, and negative parameters are greater than or equal to 1.
Web Services Ratings
Fluctuations can occur over a Web service's lifetime, so it might not always fulfill advertised QoWS parameters. In general, most users can accept small differences between delivered and advertised values. However, large differences indicate that the Web service is suffering a performance degradation in delivering its functionalities. For that reason, our system would monitor QoWS parameters for invoked Web services — essentially, measuring the QoWS parameters' fluctuations and providing an assessment or rating for the Web service. Such rating plays an important role in the optimization process.
Each time a user selects and invokes a Web service, we measure the gap between the promised and delivered QoWS. This QoWS distance is the sum of these differences (or their inverse) for all QoWS parameters. More precisely, let $pQ_i$ (resp. $dQ_i$) ($1 \le i \le n$) be the value of the promised QoWS (resp. delivered QoWS) for $QoWS_i$. Pos and neg are the sets of QoWS parameters to maximize and minimize, respectively. QoWSdist is the QoWS distance: each parameter in pos contributes the difference between its delivered and promised values, and each parameter in neg contributes the inverse of that difference, so that a negative distance indicates a service that delivered less than it promised.

Web services initially receive the highest rating. If a Web service has a QoWS distance below a certain negative threshold, its rating decreases. In contrast, if the QoWS distance has a value above a certain positive threshold, the rating increases. A simple rating of Web services would consolidate all QoWS fluctuations into a single formula. Alternatively, we could treat each service's QoWS separately by assigning each parameter its own rating. We could then either use each parameter's rating to weight the corresponding QoWS in the objective function or aggregate all ratings and use that to weight the whole formula.
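A minimal sketch of the monitoring step is given below. It is not the article's exact formula: dividing each gap by the promised value (so that parameters with different units can be summed), the threshold values, the step size, and the rating range of 0 to 1 are all assumptions chosen for illustration.

```python
def qows_distance(promised, delivered, pos, neg):
    """Signed gap between delivered and promised QoWS (relative gaps assumed)."""
    dist = 0.0
    for q in pos:   # higher is better: delivering more than promised is a gain
        dist += (delivered[q] - promised[q]) / promised[q]
    for q in neg:   # lower is better: delivering less than promised is a gain
        dist += (promised[q] - delivered[q]) / promised[q]
    return dist


def update_rating(rating, dist, neg_threshold=-0.2, pos_threshold=0.2, step=0.1):
    """Lower or raise the rating when the distance crosses a threshold."""
    if dist < neg_threshold:
        rating -= step
    elif dist > pos_threshold:
        rating += step
    return min(1.0, max(0.0, rating))  # assumed rating range: 0 to 1


promised = {"latency": 2.0, "fees": 10.0, "availability": 0.9}
delivered = {"latency": 3.0, "fees": 10.0, "availability": 0.81}
dist = qows_distance(promised, delivered,
                     pos={"availability"}, neg={"latency", "fees"})
rating = update_rating(1.0, dist)  # a service starts at the highest rating
print(dist, rating)
```

Here the service is slower and less available than advertised, so the distance is negative and falls below the assumed threshold, which lowers the rating.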
Several rating schemes already exist on the Web, such as the systems that Epinions.com, Bizrate.com, and eBay use. These rest mainly on user feedback, where consumers rate products, services, and providers from personal experience. These rating systems do not conduct any monitoring on Web services and operation invocations, as our query infrastructure would. However, we could take advantage of these systems to refine our own ratings. For example, the monitoring agent could communicate with them periodically to update its own ratings.
Objective Function
Whether Web services succeed in delivering their functionalities relies mainly on QoWS. Thus, the objective of the optimization process is to identify and employ Web services with the best values for those QoWS parameters. Two other parameters, rating (rating) and matching degree (md), refine the definition of success. We can therefore characterize any Web service operation involved in resolving a query by three measures: promised QoWS parameters, rating, and md.
Different approaches exist for computing the overall quality of an operation on the basis of its QoWS parameters. The Simple Additive Weighting method, widely used in decision making, usually reaches ranking results very close to those of more sophisticated methods. 5 This method comprises three basic steps:

1. Scale the different QoWS parameters to make them comparable.

2. Apply user-supplied weights for each parameter, if specified as part of the query.

3. Add the weighted and scaled QoWS parameters for each operation.

The operation's quality is the sum obtained in the last step.
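We define an objective function F that needs to be maximized for each concrete operation op involved in resolving a query. F combines the scaled and weighted QoWS parameters of op with its rating and matching degree. The scaling relies on $Q_i^{\max}$, the maximum value for the $i$th QoWS parameter for all concrete operations matching the same virtual operation, and on $Q_i^{\min}$, the corresponding minimum.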
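The three Simple Additive Weighting steps can be sketched as below. This is one plausible instantiation rather than the article's formula: the min-max scaling, the default weight of 1, and the choice to multiply the whole sum by rating and md (the "aggregate" option mentioned earlier) are assumptions, as are the candidate operations and their values.

```python
def scale(value, lo, hi, positive):
    """Min-max scaling to [0, 1]; larger is always better after scaling."""
    if hi == lo:
        return 1.0
    return (value - lo) / (hi - lo) if positive else (hi - value) / (hi - lo)


def objective(op, candidates, weights, pos, neg):
    """Weighted sum of scaled QoWS values, weighted by rating and md (assumed)."""
    total = 0.0
    for q in pos | neg:
        values = [c["qows"][q] for c in candidates]   # same virtual operation
        total += weights.get(q, 1.0) * scale(op["qows"][q],
                                             min(values), max(values), q in pos)
    return op["rating"] * op["md"] * total


pos, neg = {"availability"}, {"latency", "fees"}
candidates = [
    {"name": "A", "rating": 1.0, "md": 1.0,
     "qows": {"latency": 2.0, "fees": 10.0, "availability": 0.90}},
    {"name": "B", "rating": 0.8, "md": 0.75,
     "qows": {"latency": 1.0, "fees": 20.0, "availability": 0.99}},
]
scores = {c["name"]: objective(c, candidates, {}, pos, neg) for c in candidates}
best = max(scores, key=scores.get)
print(scores, best)
```

In this made-up case, operation B wins despite its lower rating and matching degree because it is best on two of the three parameters; user-supplied weights on fees would change the outcome.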
Query Processing and Optimization
Users submit conjunctive queries (conjunctions of relations and conditions) over relations from the query level. Queries have the following general form:

$$Q(X) \;\text{:-}\; \bigwedge_i R_i(X_i), \; \bigwedge_k C_k$$

where the $R_i$ are relations from the query level; $X$ and the $X_i$ are tuples of variables; and the $C_k$ represent conditions on variables appearing in the query. The form of those conditions is $C_k = x \; op \; c$, where $x$ is an input or output variable appearing in any $X_i$, $c$ is a constant, and $op$ is a comparison operator (equality or an inequality). Multiple occurrences of a variable express equality.
The query undergoes several transformations that result in a service execution plan, which basically defines a sequence of sets of operations. The query infrastructure can concurrently invoke operations within the same set. Optimization occurs at the Web service and data levels. The former concerns the choice of Web services and operations; the latter relates to the ordering of operations invocations, data routing, and data manipulation operations for collecting results. Data-level optimization needs to ensure that the obtained ordering is feasible: whenever an operation appears in the ordering (that is, must be invoked), all its input variables must be bound (have a value).
We have devised an algorithm that builds efficient service execution plans using a local selection approach for optimization: it selects the best concrete operation for each virtual operation appearing in a query (after applying the mapping rules). We assume that providers describe their Web services using WSDL augmented with the semantic attributes described in the "Virtual Operations Representation" section. We assume also that they publish them in UDDI service registries. Providers can advertise QoWS parameters using UDDI tModels. To ensure the obtained plan's feasibility, the algorithm builds sequences of virtual-operation invocations iteratively on the basis of the availability of input variables for each operation. The algorithm then discovers concrete operations, matches them to virtual operations, and assesses them using the objective function while ensuring that the resulting sequence is still feasible. The algorithm has three phases:

• initialization and query unfolding,

• virtual operations ordering, and

• service discovery and operations matching.

The initialization and query unfolding phase's main task is to collect all bound variables the algorithm obtains from the query either directly (equality conditions) or by using ranges (inequality conditions) defined in virtual operations. The algorithm also initializes the service execution plan to an empty set. It then unfolds the query to bear on virtual operations using the different mapping rules.
In the virtual operations ordering phase, the algorithm iteratively selects the virtual operations that it can invoke at a certain time in the sequence that represents the service execution plan. The algorithm makes its selection by determining, at each iteration, all virtual operations that have their input attributes bound (that is, whose variables the algorithm can replace with available constant values). Those selected virtual operations will eventually provide new bound variables through their output attributes. These would allow the algorithm to select other operations in the next iterations. If operations remain but none of them has its input variables bound, the query is not answerable. At the end of this phase, the service execution plan bears on virtual operations. For example, assume that a query Q contains the relation Airlines defined in the "From Relations to Virtual Operations" section:
Q(AirlineNames, WebSites) :-
    Airlines(DepartureCity, ArrivalCity, AirlineNames, WebSites),
    DepartureCity = "Miami",
    ArrivalCity = "Baltimore"
The algorithm replaces the relation Airlines with the virtual operations InquireAirlines and GetWebSites. Based on bound variables DepartureCity and ArrivalCity, the algorithm needs to select InquireAirlines first. The availability of the output attribute AirlineNames would then allow the algorithm to select GetWebSites next in the sequence.
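The ordering phase can be sketched as a simple fixed-point loop over bound variables. The function name and the dictionary encoding of virtual operations (name mapped to input and output variable lists) are assumptions for the example; each element of the returned plan is a set of operations that could be invoked concurrently.

```python
def order_virtual_operations(vops, bound):
    """Build a sequence of sets of virtual operations from available bindings.

    vops maps an operation name to (input variables, output variables).
    """
    bound = set(bound)
    remaining = dict(vops)
    plan = []
    while remaining:
        # Operations whose inputs are all bound can be invoked at this step.
        ready = [name for name, (ins, _) in remaining.items() if set(ins) <= bound]
        if not ready:
            raise ValueError("query is not answerable: unbound inputs for "
                             + ", ".join(sorted(remaining)))
        plan.append(ready)
        for name in ready:
            # Outputs of invoked operations become new bound variables.
            bound |= set(remaining.pop(name)[1])
    return plan


vops = {
    "GetWebSites": (["AirlineNames"], ["WebSites"]),
    "InquireAirlines": (["DepartureCity", "ArrivalCity"], ["AirlineNames"]),
}
plan = order_virtual_operations(vops, {"DepartureCity", "ArrivalCity"})
print(plan)
```

For the query Q, the conditions bind DepartureCity and ArrivalCity, so InquireAirlines is placed first and GetWebSites second.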
The service discovery and operations matching phase's goal is to replace virtual operations with concrete operations, while making sure that the obtained sequence is still feasible based on available bindings at each position in the sequence. The algorithm traverses the sequence and initiates a lookup for each single virtual operation. The lookup should return a concrete operation with the highest value for the objective function, as defined previously. It starts by looking for relevant Web services through UDDI service registries, using the virtual operation's Category and Function attributes to build a UDDI inquiry. The lookup searches the description of each returned Web service for operations that match the virtual operation using one of the four matching levels. For the previous query Q, several different providers could offer both virtual operations InquireAirlines and GetWebSites with different QoWS. The algorithm selects the concrete operation with the highest value for the objective function. Because a partial match can occur, the algorithm must check that there is no virtual operation in the sequence whose inputs depend on the missing outputs that a partial match allows. If the missing outputs are required, the algorithm executes the lookup again until it finds an appropriate concrete operation. The algorithm terminates when it has replaced all virtual operations in the sequence (service execution plan) with concrete operations, or when it determines that it can't match a virtual operation to any existing concrete operations.
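The feasibility check during local selection might look like the following sketch. It assumes candidates have already been discovered and scored with the objective function; the candidate names, scores, and the `required_outputs` argument (the outputs that later operations in the sequence depend on) are hypothetical.

```python
def select_concrete(candidates, required_outputs):
    """Pick the highest-scoring candidate that still provides every output
    needed by later operations in the plan; return None if none qualifies."""
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if required_outputs <= cand["out"]:
            return cand["name"]
    return None  # the virtual operation cannot be matched


# Candidates for a Fares-like virtual operation (made-up names and scores).
candidates = [
    {"name": "QuickFares", "score": 1.5,   # partial match: Price is missing
     "out": {"DepartureDate", "ArrivalDate", "ArrivalTime"}},
    {"name": "FullFares", "score": 1.2,
     "out": {"DepartureDate", "ArrivalDate", "ArrivalTime", "Price"}},
]

# A later operation needs Price, so the top-scoring partial match is skipped.
chosen = select_concrete(candidates, {"Price"})
# With no downstream dependency, the top-scoring candidate is acceptable.
chosen_free = select_concrete(candidates, set())
print(chosen, chosen_free)
```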
Implementation
To test our proposed query infrastructure, we implemented it on top of our prototype Web Digital Government (WebDG) system (Figure 3). 6 We developed WebDG at Virginia Tech to allow access to different e-government applications, such as Medicaid and benefits for pregnant women, as Web services. WebDG describes these Web services using WSDL and registers them in a local UDDI registry. Implemented across a network of Solaris workstations, it includes several social services applications written in Java and accessing various databases. We adopted Systinet's WASP UDDI Standard 3.1 as our UDDI toolkit and the Cloudscape 4.0 database as our UDDI registry.

Figure 3. The query infrastructure in WebDG. This figure shows the different components of the service query engine as implemented within the WebDG system for e-government services.

The service query engine, the part of WebDG that implements the query approach we present in this article, includes the following components:

• The service locator discovers WSDL descriptions by accessing the UDDI registry. Once the service locator discovers a service, the query engine invokes its operations through the SOAP binding stub, which uses the Apache SOAP API.

• The operation matchmaker interacts with the service locator to retrieve the services' descriptions. It determines the concrete operations to use in the service execution plan.

• The monitoring agent monitors Web service invocations. Its goal is to assess their behavior in terms of the delivered QoWS.

• The query optimizer, the query engine's central component, determines the best service execution plan based on QoWS, service ratings, and matching degrees.

• The execution engine takes over after the optimizer generates an efficient service execution plan. The execution engine enacts the service execution plan by actually invoking Web services using SOAP.

The service query engine receives Web service queries as conjunctive queries over relations defined for social services. It takes care of the correct and optimal execution of Web service queries in WebDG.
Conclusions
Researchers in both academia and industry are studying Web services because of their role in deploying the Semantic Web. To the best of our knowledge, our proposed query and optimization model is the first to provide complex query capabilities over Web services. It expresses queries over the query level and transforms them into a combination of virtual-operation invocations. Different matching modes match the virtual operations against concrete operations from actual Web services. The proposed model selects and combines appropriate operations based on relevance, QoWS, matching degrees, ratings, and feasibility.
We can extend our query infrastructure in several ways. Semantic-based optimization techniques for Web services would use intelligent agents to take advantage of the current context (the application's semantics) in order to enhance optimization. We could also cater to the dynamic and volatile nature of Web services by designing adaptive techniques that compensate for the effects of unpredictable run-time events on the efficiency of the service execution plan. Such adaptive techniques could, for example, replace a Web service that became unavailable with one that offers similar functionalities. The replacement strategy should not decrease the overall quality of the service execution plan.
We thank Brahim Medjahed and Abdelmounaam Rezgui for their valuable comments on this article. This research is supported by the US National Science Foundation under grant 9983249-EIA and the US National Library of Medicine/NIH under grant 1-R03-LM008140-01.

#### References

Mourad Ouzzani is a PhD candidate in the Computer Science department at Virginia Tech. His research interests include query optimization over Web services, digital government, and Web-based databases. Ouzzani received a BSc and an MSc in computer science from University of Science and Technology, Houari Boumediene, in Algiers, Algeria. He is a member of the IEEE, the IEEE Computer Society, and the ACM. Contact him at mourad@vt.edu.
Athman Bouguettaya is the program director of the Computer Science department and the director of the E-Commerce and E-Government Research Lab at Virginia Tech. His research interests are in Web databases and Web services. He is on the editorial board of the Distributed and Parallel Databases Journal, he is cochair of the 2004 IEEE RIDE Workshop, and he was a guest editor for a special issue of IEEE Internet Computing on database technology on the Web. Contact him at athman@vt.edu.