Ordered Round-Robin: An Efficient Sequence Preserving Packet Scheduler
December 2008 (vol. 57 no. 12)
pp. 1690-1703
Jingnan Yao, Cisco Systems, San Jose
Jiani Guo, Cisco Systems, San Jose
Laxmi Narayan Bhuyan, University of California, Riverside, Riverside
With the advent of powerful network processors (NPs) in the market, many computation-intensive tasks such as routing table look-up, classification, IPSec, and multimedia transcoding can now be accomplished more easily in a router. An NP consists of a number of on-chip processors that carry out packet-level parallel processing operations. Ensuring good load balancing among the processors increases throughput. However, such multiprocessing also gives rise to increased out-of-order departure of processed packets. In this paper, we first propose an Ordered Round Robin (ORR) scheme to schedule packets in a heterogeneous network processor, assuming that the workload is perfectly divisible. The processed loads from the processors are ordered perfectly. We analyze the throughput and derive expressions for the batch size, scheduling time, and maximum number of schedulable processors. To effectively schedule variable-length packets in an NP, we propose a Packetized Ordered Round Robin (P-ORR) scheme by applying a combination of deficit round robin (DRR) and surplus round robin (SRR) schemes. We extend the algorithm to handle multiple flows based on a fair scheduling of flows depending on their reservations. Extensive sensitivity results are provided through analysis and simulation to show that the proposed algorithms satisfy both the load balancing and in-order requirements for parallel packet processing.
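
To make the deficit round robin ingredient mentioned in the abstract concrete, the following is a minimal sketch of DRR-style dispatch of variable-length packets to heterogeneous processors. It is not the paper's P-ORR algorithm, which combines DRR with SRR and derives the batch sizes analytically. The function name `drr_dispatch`, the per-processor byte `quanta`, and the sample packet sizes are assumptions introduced only for illustration.

```python
from collections import deque


def drr_dispatch(packet_sizes, quanta):
    """Assign packets, in arrival order, to processors with DRR-style byte quotas.

    packet_sizes: packet lengths in bytes, in arrival order (hypothetical input).
    quanta: bytes credited to each processor per round (hypothetical values; a
            faster processor would be given a larger quantum).
    Returns one list of packet indices per processor.
    """
    if not quanta or any(q <= 0 for q in quanta):
        raise ValueError("each processor needs a positive quantum")

    pending = deque(enumerate(packet_sizes))
    deficits = [0] * len(quanta)
    assignment = [[] for _ in quanta]

    while pending:
        # Visit the processors in a fixed round-robin order.
        for proc, quantum in enumerate(quanta):
            if not pending:
                break
            deficits[proc] += quantum
            # Hand over head-of-line packets while this processor has credit.
            while pending and pending[0][1] <= deficits[proc]:
                index, size = pending.popleft()
                deficits[proc] -= size
                assignment[proc].append(index)
    return assignment


if __name__ == "__main__":
    # Processor 0 is credited twice as many bytes per round as processor 1.
    print(drr_dispatch([300, 700, 200, 500, 400], [1000, 500]))
```

Because packets are always taken from the head of the arrival queue and the processors are visited in a fixed order, each processor receives contiguous runs of the input sequence. A collector that visits the processors in the same order can therefore restore the original sequence, which is the general idea behind order-preserving round-robin dispatch.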

Index Terms:
Scheduling and task partitioning, Parallel Architectures, Processor Architectures, Computer Systems Organization, Load balancing and task assignment, Multiple Data Stream Architectures (Multiprocessors), multi-processor scheduling
Jingnan Yao, Jiani Guo, Laxmi Narayan Bhuyan, "Ordered Round-Robin: An Efficient Sequence Preserving Packet Scheduler," IEEE Transactions on Computers, vol. 57, no. 12, pp. 1690-1703, Dec. 2008, doi:10.1109/TC.2008.88