New Product Uses Light Connections in Blade Server
by George Lawton
A company has delivered the first version of a product that uses light signals, instead of cables and switches, to connect blade-server nodes. Lightfleet has sold its Beacon prototype, a 32-node server, to Microsoft Research and completed the installation. Beacon uses the company's Direct Broadcast Optical Interconnect (DBOI).
"The key innovation is the broadcast simultaneity," said Lightfleet CEO John Peers. "This is not a switch in the traditional sense. It's a permanent continuous interconnect. There's no switch so there’s no congestion. Instead of having to create and maintain a pathway for a connection, all of the connections are maintained all of the time."
Because DBOI uses broadcast light signals to communicate, it reduces power consumption, latency, and data skew compared with other interconnect technologies. More important, it simplifies memory sharing among multiple nodes in a cluster, blurring the line between symmetric multiprocessing (SMP) and a traditional cluster supercomputer.
According to Lightfleet senior fellow Bill Dress, "We get the benefits of the computing power of an SMP but without the limits of the bus and the OS single point of failure."
Interconnect technologies are fundamental to system performance. Even if a server has fast chips and a large amount of memory, slow interconnects will reduce overall performance for many types of modeling and simulation applications. "DBOI enables an architectural shift, rather than just providing a faster interconnect," noted Bob Laliberte, senior analyst for Enterprise Strategy Group, an industry analysis firm.
How DBOI Fits In
One big challenge in high-performance computing (HPC) systems lies in optimizing the communications between multiple processors. High-end SMP systems use specially designed (and expensive) buses and crossbar switches that let multiple processing cores share an operating system and memory.
Alternatively, multiple commodity-grade compute blades can be clustered using interconnect technologies such as Ethernet, InfiniBand, or Fibre Channel. Each blade has a wire or fiber connecting to a centralized switch, which passes the messages along to other blades. However, this approach can send messages only from one point to another. If a particular message needs to go to multiple blades, the switch has to make copies and forward each one to the appropriate destination. Another solution is to send the message to a special address in an unreliable broadcast mode that reaches all nodes, and let the application software on each receiving host determine whether the message is meant for it.
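The replication cost of the switched approach can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real switch: the function names and the counting rule (one hop into the switch, then one hop per copy) are assumptions made for the example.

```python
def switch_replicate(message, destinations):
    """Model a central switch forwarding one message to several blades.

    Returns one (destination, copy) pair per destination, because a
    point-to-point fabric has to deliver each copy separately.
    """
    return [(dest, dict(message)) for dest in destinations]


def link_transmissions(num_destinations):
    """Count link traversals: one hop into the switch plus one per copy."""
    if num_destinations < 1:
        raise ValueError("need at least one destination")
    return 1 + num_destinations


if __name__ == "__main__":
    copies = switch_replicate({"payload": "update"}, [1, 2, 3])
    print(len(copies), "copies made by the switch")
    print(link_transmissions(31), "link traversals to reach 31 other blades")
```

Under this simple model, the work grows linearly with the number of destination blades.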
In contrast, DBOI broadcasts all of the traffic from one blade to all the other blades simultaneously. Each blade is connected via a PCIe backplane to the DBOI system, and a laser beam modulates the DBOI broadcast data from the host. The beam bounces off the system’s rear surface, and all 32 nodes can read it simultaneously. At this point, the individual nodes' receiving hardware determines whether the node is participating in this message. If so, it passes the message to the host; if not, it discards the message.
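The receive-side decision described above, in which every node sees the broadcast and its hardware either passes the message to the host or discards it, can be sketched as follows. The sketch is hypothetical: the `Node` class, the `participants` field, and the message layout are assumptions for illustration, since the article does not describe Lightfleet's actual packet format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    delivered: list = field(default_factory=list)  # messages passed to the host
    discarded: int = 0                             # messages dropped by the filter

    def on_broadcast(self, message):
        # Assumed filter rule: the message names the nodes participating in it.
        if self.node_id in message["participants"]:
            self.delivered.append(message["payload"])
        else:
            self.discarded += 1


def broadcast(nodes, sender_id, participants, payload):
    """Deliver one transmission to every node except the sender."""
    message = {"sender": sender_id, "participants": set(participants), "payload": payload}
    for node in nodes:
        if node.node_id != sender_id:
            node.on_broadcast(message)


if __name__ == "__main__":
    nodes = [Node(i) for i in range(32)]
    broadcast(nodes, sender_id=0, participants={3, 7}, payload="block-42")
    print(nodes[3].delivered, nodes[5].delivered, nodes[5].discarded)
```

The sender transmits once regardless of how many nodes participate. The filtering happens independently at each receiver.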
In the prototype Microsoft implementation, each of four DBOI modules contains eight transmitters and 32 receivers handling communications for a set of eight compute nodes. The 32 separate receivers in each set allow a node to spatially differentiate between the signals from the different nodes' transmitters.
Electronics underneath the transceiver identify and forward packets destined for a particular node. Each node transmits slightly faster than a single PCIe lane but uses all available PCIe lanes to move data between the DBOI module and host. In the prototype, each host uses a four-lane PCIe 1.1 interface for a gross data rate to and from the host of 10 gigabits per second (Gbps). Future versions will move to eight lanes of PCIe 2.0 or 3.0, increasing the host bandwidth. In the logical configuration, each host monitors all 32 channels simultaneously, but the protocol electronics forward only the messages that an individual host needs to receive.
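The host-bandwidth figures can be checked with simple arithmetic. The sketch below uses the per-lane signaling rates and line encodings from the PCIe specifications (2.5 GT/s and 5 GT/s with 8b/10b encoding for PCIe 1.1 and 2.0, 8 GT/s with 128b/130b for PCIe 3.0). The function names are illustrative, and the calculation ignores packet and protocol overhead, so the results are upper bounds per direction rather than measured DBOI throughput.

```python
# Per-lane signaling rate (GT/s) and line-code efficiency for each generation.
PCIE_GENERATIONS = {
    "1.1": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def gross_gbps(generation, lanes):
    """Raw signaling rate across all lanes, in Gbps."""
    rate, _ = PCIE_GENERATIONS[generation]
    return rate * lanes


def encoded_gbps(generation, lanes):
    """Rate left after line-code overhead, in Gbps."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * lanes * efficiency


if __name__ == "__main__":
    for generation, lanes in [("1.1", 4), ("2.0", 8), ("3.0", 8)]:
        print(f"PCIe {generation} x{lanes}: "
              f"{gross_gbps(generation, lanes):.1f} Gbps gross, "
              f"{encoded_gbps(generation, lanes):.1f} Gbps after encoding")
```

For the prototype's four-lane PCIe 1.1 link, `gross_gbps("1.1", 4)` gives 10.0, which matches the gross figure quoted above.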
The transmitter uses direct-modulation techniques on a laser. A set of lenses causes each beam to arrive at a different location on the optical side of the DBOI module. The optics create a space-division multiplexer that lets receivers differentiate the individual signals. Because the signals don't interfere with each other, the communication path remains open.
DBOI differs from optical networking technologies such as Intel's Light Peak, because the broadcast optics operate in a defined space rather than over a point-to-point fiber-optic connection.
Specific DBOI Improvements
DBOI could change the programming model for HPC applications. Its broadcast capability supports a publish/subscribe model, giving every node equal access to the data. Its similarities to a shared-memory SMP system simplify programming across HPC clusters in comparison with the Message Passing Interface (MPI). Programmers don't have to account for the data-arrival skew, so they can concentrate on the application’s logical operation.
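A publish/subscribe model over a broadcast medium might look like the following minimal sketch. It is not Lightfleet's programming interface, which the article does not describe. The `BroadcastBus` class, its method names, and the topic strings are all hypothetical, and the loop merely stands in for a single optical broadcast that every node receives.

```python
from collections import defaultdict


class BroadcastBus:
    """Toy publish/subscribe layer on top of an assumed broadcast medium."""

    def __init__(self):
        self._subscriptions = defaultdict(set)  # node_id -> subscribed topics
        self._inboxes = defaultdict(list)       # node_id -> received items

    def subscribe(self, node_id, topic):
        self._subscriptions[node_id].add(topic)

    def publish(self, topic, data):
        # The publisher sends once; each node keeps only topics it subscribed to.
        for node_id, topics in self._subscriptions.items():
            if topic in topics:
                self._inboxes[node_id].append((topic, data))

    def inbox(self, node_id):
        return list(self._inboxes[node_id])


if __name__ == "__main__":
    bus = BroadcastBus()
    bus.subscribe(1, "prices")
    bus.subscribe(2, "prices")
    bus.subscribe(3, "risk")
    bus.publish("prices", 101.5)
    print(bus.inbox(1), bus.inbox(2), bus.inbox(3))
```

The publisher never names its recipients, so adding a subscriber does not change the sending code.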
"You can't do broadcasting with any other interconnect," said Brian Garrett, vice president of ESG Labs. "It wasn't easy for every node to have a shared-memory view of a big problem before. In general, you passed messages around."
DBOI also reduces the data-arrival skew across system nodes. This improves performance when synchronizing multiple processes, for example, in a relational database. In a traditional interconnect, the path each message travels can change, which can affect the precision of data-block synchronization. With DBOI, the message-travel distance never changes, and the data skew is a few nanoseconds, compared with microseconds in an InfiniBand system.
Lightfleet's Peers estimates that the prototype design consumes one-third the power of comparable blade systems. By eliminating an external switch, DBOI eliminates all the power required to move signals down copper wires. In addition, it reduces the communications overhead of sending the same data to multiple nodes.
Initial DBOI implementations will focus on HPC applications that involve considerable communication across nodes, such as financial simulations on Wall Street. As the prices come down, the technology could be ported to high-end servers as a way to increase speed and lower costs. Eventually, Peers expects the technology to find its way into routers and other communications-intensive equipment.
George Lawton is a freelance technology writer based in Guerneville, California. Contact him at http://glawton.com.