Flexible Routing in the Cloud
by George Lawton
Cloud computing gives businesses flexible resource access that lets them offer multitiered pricing for different classes of infrastructure services, such as CPU, storage, and total network throughput. However, network latency remains an uncontrolled variable in cloud computing because no tools yet exist to offer the same kind of flexibility for different network service classes.
At the Georgia Institute of Technology, researchers are working to develop a transit portal (TP) that will let cloud applications dynamically change network routes to meet a particular application's needs and a user's willingness to pay more for it. PhD student Vytautas Valancius is leading the TP project in coordination with Nick Feamster, an assistant professor at Georgia Tech; Jennifer Rexford, a professor at Princeton; and Akihiro Nakao, an associate professor at the University of Tokyo.
Today, cloud-based route assignments are very inefficient, said Valancius. "There are thousands of applications running in the Amazon data center, and every application uses the same route." TP will give users some control over network latency by letting them manage the routes by which traffic enters and leaves their applications. But much more work remains to be done before this type of flexible routing becomes widely available.
Driving Forces
Latency measures the lag that packets experience in traveling the network. In a perfect world, packets would travel to their destinations at the speed of light, 186,000 miles per second. But in the real world, physical, electronic, and topological characteristics of the route slow packets down.
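To make the speed-of-light bound concrete, the following minimal Python sketch computes the smallest possible one-way delay over a given distance, using the 186,000 miles per second figure above. The function name, the `velocity_factor` parameter, and the 2,500-mile example distance are illustrative assumptions, not part of the TP project. The two-thirds factor is only a common rule of thumb for signal speed in optical fiber.

```python
# Lower bound on one-way network latency from propagation delay alone.
# Real routes add queuing, switching, and detours on top of this bound.

SPEED_OF_LIGHT_MILES_PER_S = 186_000  # figure quoted in the article


def min_one_way_latency_ms(distance_miles: float, velocity_factor: float = 1.0) -> float:
    """Return the minimum one-way delay in milliseconds.

    velocity_factor is the fraction of the speed of light at which the
    signal travels (1.0 for a vacuum; roughly 2/3 is a rule-of-thumb
    assumption for optical fiber).
    """
    if distance_miles < 0:
        raise ValueError("distance_miles must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    seconds = distance_miles / (SPEED_OF_LIGHT_MILES_PER_S * velocity_factor)
    return seconds * 1000.0


# Hypothetical 2,500-mile path, in a vacuum and over fiber.
vacuum_ms = min_one_way_latency_ms(2_500)
fiber_ms = min_one_way_latency_ms(2_500, velocity_factor=2 / 3)
```

Under these assumptions, the vacuum bound comes to about 13 ms and the fiber estimate to about 20 ms each way. Anything a real route adds beyond that comes from the physical, electronic, and topological factors described above.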
Latency's effects are most significant when traffic between two parties is interdependent. It's not as important in large packet transfers, such as backing up a database. But it can be mission critical in some highly interactive applications. For example, some stock market trading firms are starting to move their computer trading facilities as physically close to the stock exchanges as possible to minimize the travel time. Flexible network routing would let companies buy a higher class of route when an application requires it and pay less for applications such as data backup.
TP could also be useful outside the cloud to large-scale Internet service providers who want control over how Internet traffic reaches their systems — especially for latency-sensitive applications such as game servers and content distribution. According to Andrew Warfield, associate professor of computer science at the University of British Columbia and technical director for storage and emerging technologies in Citrix Systems' virtualization management division, "The transit portal work would extend cloud-based infrastructures to allow people leasing computing in these facilities to have the same degree of control over routing that they would if they were hosting services in their own facilities."
Warfield believes that one of TP's most interesting aspects is its potential to give this degree of flexibility to smaller users. Cloud providers like Amazon's EC2 already let small customers lease virtual machines in geographically dispersed data centers around the world, he explained. "Transit portal would allow them to have a great deal of control over how traffic is delivered to these servers."
Large network operators typically have network points of presence in geographically dispersed locations, which are connected to other network operations by very expensive carrier-grade routers. A TP connects to special routers at multiple locations to mimic this functionality for smaller users.
Disaster recovery is another TP application area. For example, Warfield said that it could redirect traffic away from a failed virtual machine in the event of a power outage.
This work could also lead to a new class of virtual network providers, said Michael Cote, an industry analyst with Redmonk. "Virtual networks, and TP in particular, significantly lower the barrier for a service provider to deploy its own network and to easily reconfigure or reprovision the network when needs or traffic demands change."
How Transit Portal Works
Today, large network operators can control network routing for applications such as gaming, content distribution, or failover. But this kind of flexibility comes at the cost of acquiring a network footprint and IP address space, negotiating contracts with ISPs, and installing and configuring routers. To address these challenges, the TP project replaces a data center’s border router with a portal that gives services the illusion of direct upstream connectivity.
The project currently has TP research installations at test sites in Georgia, New Jersey, and Wisconsin in the US, as well as in Japan. Valancius said they are making the code available for other researchers who would like to participate.
A TP enables smaller networks to use anycast for dynamically reassigning IP addresses as needed. Traditionally, anycast was only available to users with network routers in multiple networking exchanges around the world by establishing DNS-server replicas and peering in multiple locations. With TP, specialized virtual routers work in conjunction with each application to manage how network traffic is routed.
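The idea of matching each application to a route that fits its latency need and its owner's willingness to pay can be sketched as a toy selection policy. This is not the TP implementation, and every name, latency, and price below is hypothetical. It only illustrates the kind of per-application decision that flexible routing would make possible.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    upstream: str        # hypothetical upstream ISP label
    latency_ms: float    # measured latency over this route (made-up values)
    price_per_gb: float  # made-up price for this class of route


@dataclass(frozen=True)
class AppPolicy:
    name: str
    max_latency_ms: float    # the application's latency need
    max_price_per_gb: float  # what the user is willing to pay


def choose_route(policy: AppPolicy, routes: Sequence[Route]) -> Optional[Route]:
    """Pick the cheapest route that meets the latency need within budget.

    Ties on price go to the lower-latency route. Returns None when no
    route satisfies both constraints.
    """
    eligible = [
        r for r in routes
        if r.latency_ms <= policy.max_latency_ms
        and r.price_per_gb <= policy.max_price_per_gb
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.price_per_gb, r.latency_ms))


# Hypothetical upstream routes available at one site.
ROUTES = [
    Route("ISP-A", latency_ms=20.0, price_per_gb=0.12),
    Route("ISP-B", latency_ms=45.0, price_per_gb=0.05),
    Route("ISP-C", latency_ms=90.0, price_per_gb=0.02),
]

game = AppPolicy("game-server", max_latency_ms=30.0, max_price_per_gb=0.15)
backup = AppPolicy("nightly-backup", max_latency_ms=500.0, max_price_per_gb=0.03)

game_route = choose_route(game, ROUTES)
backup_route = choose_route(backup, ROUTES)
```

In this sketch the latency-sensitive game server ends up on the fast, expensive route, while the backup job takes the slow, cheap one. A real deployment would also need live route measurements and the policy safeguards discussed in the next section.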
Many Challenges
Several technical and business challenges must be addressed before virtual routing becomes commonplace. Valancius said the current research makes it too burdensome to learn the routes and figure out which ones are best. Tools for gathering and sharing information about route performance must be developed.
"We need a better understanding of the economics of interconnection," Feamster added. Infrastructure providers will need better accounting mechanisms for determining how to charge users for these resources.
Another concern is giving users too much network control. According to Warfield, the TP research has to "address the policy-enforcement issues around allowing relatively more novice users to safely reconfigure BGP [border gateway protocol] advertisements without introducing errors or instability into the broader Internet."
Nor will TP solve all latency problems. Cote said that even if a cloud provider could guarantee a certain service level on its own network, "there are so many other network segments between the cloud data center and the end users that other things could happen to reduce the effect. If you're a cloud customer paying this premium for a better network, and some part of the network outside of the cloud provider's control screws up, you'll probably want your money back, even if it's not the cloud provider's fault."
In the long run, Cote expects this kind of capability to benefit the cloud market. "While cloud is initially sold as being cheap and simple," he said, "there's much theoretic usefulness in it having as many levers and dials to turn as possible to customize your exact cloud-based services to the money you want to spend."
More broadly, the TP work opens the door for all sorts of dynamic infrastructure services that live in the cloud, said Warfield. "I hope that we see other approaches similar to this, not just for networking but also for applications like storage, databases, and high availability. I also hope that cloud providers see systems like this as useful and provide the necessary APIs for researchers to develop innovative, low-level services within the cloud."
To learn more about TP, visit http://valas.gtnoise.net/tp.
George Lawton is a freelance correspondent in Guerneville, California. You can contact him via his website http://www.glawton.com.