DCN Gets Ready for Production
The global R&D community will gain a new network scheduling and networking research tool when the Dynamic Circuit Network (DCN) goes into production this fall. The technology has been in use by Internet2, ESnet (formerly the Energy Sciences Network), and GÉANT (the pan-European research and education network) for the past several years. DCN's official launch will include a new name: Internet2 ION (interoperable on-demand network).
DCN lets researchers and network engineers automatically provision dedicated circuits across networks to support large data transfers and other bandwidth-intensive applications. Users reserve network circuits through a Web interface that talks to the underlying routers via the Inter-Domain Controller (IDC) protocol. Current DCN implementations are built on top of ESnet's On-demand Secure Circuits and Advance Reservation System (Oscar) and Dynamic Resource Allocation via GMPLS Optical Networks (Dragon), developed by Mid-Atlantic Crossroads (MAX), an advanced networking consortium of 13 higher-education and research institutions in the Washington, DC, region of the US.
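The reservation model is easy to picture in code. The sketch below is illustrative only: the field names and endpoint identifiers are hypothetical and do not reproduce the actual IDC or Oscar message schema, but they capture the kind of request a user submits through the Web interface, which an inter-domain controller then relays along each network on the path.

```python
# Hypothetical sketch of a DCN-style circuit reservation; field names and
# endpoint URNs are invented for illustration, not taken from the IDC protocol.
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta

@dataclass
class CircuitRequest:
    """A circuit reservation as a user might express it."""
    src_endpoint: str      # edge port at the source network (hypothetical URN)
    dst_endpoint: str      # edge port at the destination network
    bandwidth_mbps: int    # guaranteed bandwidth for the circuit
    start: datetime        # beginning of the reserved window
    end: datetime          # end of the reserved window
    description: str = ""

def to_idc_message(req: CircuitRequest) -> dict:
    """Flatten the request into a dict resembling what a controller
    would pass to the networks along the end-to-end path."""
    msg = asdict(req)
    msg["start"] = req.start.isoformat()
    msg["end"] = req.end.isoformat()
    return msg

# A researcher reserving 1 Gbps for a two-hour transfer window:
start = datetime(2009, 10, 1, 12, 0)
req = CircuitRequest(
    src_endpoint="urn:example:esnet:site-a",       # hypothetical
    dst_endpoint="urn:example:internet2:site-b",   # hypothetical
    bandwidth_mbps=1000,
    start=start,
    end=start + timedelta(hours=2),
    description="LHC data transfer",
)
print(to_idc_message(req)["bandwidth_mbps"])  # → 1000
```

In the real system, each controller along the 5-to-7-network path would check whether it can honor the bandwidth and time window before the circuit is confirmed end to end.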
Eric Boyd, Internet2's deputy technical officer, said, "While this is a valuable technology, it is only valuable as far as it gets a widely deployed footprint. Many of the research efforts in this area had a small footprint. We wanted to get something that would allow you to provision circuits across many networks."
The goal is to make DCN technology available across research networks worldwide. In the research and education space, the end-to-end paths of interest typically traverse five to seven networks. Boyd said network administrators who know how to provision a virtual LAN (VLAN) on top of their routers have the expertise to connect their network to the DCN infrastructure. Once the IDC software is installed, researchers can automatically provision circuits across a larger DCN.
The IDC protocol enables equipment from different manufacturers to plug into a DCN. For example, ESnet runs its network on Juniper routers, while Internet2's is built with Ciena gear. Boyd said, "If you can build a VLAN by hand, this allows you to do it automatically in software."
Chin Guok, ESnet's principal investigator for Oscar, explained, "In today's routing, traffic takes the best path, but if all of the traffic goes over the best path, you get congestion. We allow you to traffic-engineer alternate paths that isolate the bandwidth and traffic. It allows you to do workflow scheduling so you have network guarantees to utilize the compute resources you have booked. We tell researchers to fire at will, which is not what you typically tell people on a best-effort routed network—it would look like a denial-of-service attack. But when they have the bandwidth and service guarantees, they have predictability and can provide redundancy as needed."
This effort could also open up R&D for various networking technologies across a wide area. Boyd said, "There are a lot of different technologies a researcher might want to try implementing. This is one tool in the toolkit that allows them to investigate and play around with these."
Guok said, "The whole idea is to change the paradigm of networks where you have circuits, so you can define the technology you want at your service. If you wanted to run strange protocols like InfiniBand or if you had specific requirements like Sonet, now the network can provide that dynamically for you."
DCN can also help reduce the costs of large-scale high-bandwidth networks. Guok said that one analysis found the cost of circuit-switching equipment was about 20 percent of the cost of traditional routing. He said, "When we looked at the data, it made sense to look at using dynamic circuits for applications with huge data movements and offload them from our normal IP network."
DCN in Action
Shawn McKee, a research scientist at the University of Michigan, is also director of the US Atlas Great Lakes Tier-2 Center (AGL-T2), one of the key US facilities that will analyze data from the Large Hadron Collider (LHC) in Geneva, Switzerland. McKee said they've used DCN in demonstrations and as a proof of principle for Atlas, one of the LHC experiments.
He noted, "We have had a bit of a chicken-and-egg problem in that DCN was not available when the Atlas architecture and software were being developed. The physicists and software engineers working on the overall computing framework for Atlas were forced to treat the network like a black box: put bits into the network at the source location and hope they were available at the destination when you needed them. Having any kind of dialogue with the network was not part of the design options available originally. Recent physics-network projects like UltraLight, Terapaths, and LambdaStation have focused on integrating the network as part of our computing infrastructure."
In high-energy physics, researchers have designed and built large, complex detectors to capture the results of particle collisions in great detail. Most of the time, the collisions produce "ordinary" events; only on rare occasions do they reveal new and interesting physics phenomena. Because the detectors produce so much data and the interesting events are so rare, researchers face a problem akin to searching for needles in haystacks: they must process petabytes of data on supercomputers around the world to look for new physics results. This requires a vast amount of storage to hold the data as well as a very large number of processors to reconstruct, validate, filter, and scan it.
Because of the complexity and scale of the LHC and the experiments (detectors) that are built there, physicists have developed very large, globally distributed collaborations to access the data. They've adopted grid computing as a means to harness globally distributed resources (storage, computers, and networks) to try to meet these needs.
McKee said, "The opportunity offered by DCN can greatly simplify what we can expect from the network. Given the vast amounts of data, the large number of physicists involved, and the distributed nature of storage and processors, we need some mechanism to help us manage and coordinate our data flows between storage and computers. In the default best-effort network, it's very difficult to predict the time-to-completion for a multiterabyte data set copy. Many things impact that data flow across wide-area networks, and events can even disrupt the data transfer, causing it to fail. If that data is the source for a number of analysis tasks, the scheduling of those tasks is contingent on the data available at the cluster that will run the analysis jobs."
Because DCN creates point-to-point dynamic circuits, it can provide a well-defined connection of a specific bandwidth between a source and destination and, therefore, a well-defined time-to-completion for any given data set. Researchers can use that predictable transfer capability to schedule their use of processors and storage I/O much more effectively. Also, they don't have to worry about sharing the circuit with other traffic, so they can even use alternative transport technologies to fully utilize the bandwidth DCN provides.
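The predictability argument comes down to simple arithmetic: on a reserved circuit, time-to-completion is just data size divided by the guaranteed rate, whereas on a best-effort network the achieved rate, and hence the completion time, is unknowable in advance. A back-of-the-envelope sketch (the numbers are illustrative, not from the article):

```python
# Time-to-completion on a dedicated circuit is deterministic:
# size / guaranteed bandwidth (ignoring protocol overhead).

def transfer_time_hours(size_terabytes: float, bandwidth_gbps: float) -> float:
    """Hours needed to move `size_terabytes` of data over a circuit
    with a guaranteed rate of `bandwidth_gbps`."""
    size_bits = size_terabytes * 1e12 * 8      # TB -> bits
    rate_bits_per_s = bandwidth_gbps * 1e9     # Gbps -> bits/s
    return size_bits / rate_bits_per_s / 3600  # seconds -> hours

# A 10-TB data set over a dedicated 1-Gbps circuit:
print(round(transfer_time_hours(10, 1.0), 1))  # → 22.2 hours
```

With that figure in hand, an Atlas site could schedule its analysis jobs to start roughly a day after a 10-TB transfer begins, which is exactly the kind of coordination a best-effort network cannot support.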
A related use for DCN is the ability to prioritize critical data flows, explained McKee. "Given the large number of physicists and resources involved in Atlas, there will be many competing activities that our infrastructure must support. Some of these activities are central for Atlas as a whole. Creating the standard reconstructed data sets for all physicists to use is one example. If a problem in the first version of this reconstruction was found, necessitating a huge reprocessing effort with commensurate data movement, we would want to ensure that this happens at the highest priority."
The biggest challenge has been making the system as simple as possible to use. Getting all the corresponding tools in place to help debug problems and monitor dynamic circuits will also be a challenge. But McKee believes these can be addressed because DCN only needs to scale across a group of high-impact users rather than across the whole Internet.
DCN is only as valuable as it is connected. As McKee noted, "DCN is much more valuable if all of those sites you need to interact with support it. Getting the underlying DCN capabilities accessible to most Internet2 (and ESnet) sites is the big challenge, but I think things are moving in the right direction."