Performance Analysis of a Distributed Question/Answering System
June 2002 (vol. 13 no. 6)
pp. 579-596

The problem of question/answering (Q/A) is to find answers to open-domain questions by searching large collections of documents. Unlike information retrieval systems, which are very common today in the form of Internet search engines, Q/A systems do not retrieve documents, but instead provide short, relevant answers located in small fragments of text. This enhanced functionality comes at a price: Q/A systems are significantly slower and require more hardware resources than information retrieval systems. This paper proposes a distributed Q/A architecture that enhances the system throughput through the exploitation of interquestion parallelism and dynamic load balancing, and reduces the individual question response time through the exploitation of intraquestion parallelism. Inter- and intraquestion parallelism are both exploited using several scheduling points: one before the Q/A task is started and two embedded in the Q/A task. An analytical performance model is introduced. The model analyzes both the interquestion parallelism overhead generated by the migration of questions and the intraquestion parallelism overhead generated by the partitioning of the Q/A task. The analytical model indicates that both question migration and partitioning are required for a high-performance system: intraquestion parallelism leads to significant speedup of individual questions, but it is practical only up to about 90 processors, depending on the system parameters. The exploitation of interquestion parallelism provides a scalable way to improve the system throughput. The distributed Q/A system has been implemented on a network of 16 Pentium III computers. The experimental results indicate that, at high system load, the dynamic load balancing strategy proposed in this paper outperforms two other traditional approaches. At low system load, the distributed Q/A system reduces question response times through task partitioning, with factors close to the ones indicated by the analytical model.
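The claim that intraquestion parallelism pays off only up to a bounded number of processors can be illustrated with a toy cost model. The sketch below is not the analytical model of the paper: it assumes, purely for illustration, that a question has a sequential part `t_seq`, a perfectly divisible part `t_par`, and a partitioning overhead that grows linearly with the number of processors (`overhead_per_proc`). All names and parameter values are hypothetical.

```python
def speedup(n, t_seq, t_par, overhead_per_proc):
    """Speedup of one question split across n processors (toy model).

    Assumption (not from the paper): the time on n processors is the
    sequential part, plus the divisible part shared evenly, plus a
    partitioning overhead paid once per additional processor.
    """
    t_one = t_seq + t_par
    t_n = t_seq + t_par / n + overhead_per_proc * (n - 1)
    return t_one / t_n


def best_processor_count(t_seq, t_par, overhead_per_proc, n_max=1024):
    """Return the processor count in 1..n_max with the highest toy speedup."""
    return max(
        range(1, n_max + 1),
        key=lambda n: speedup(n, t_seq, t_par, overhead_per_proc),
    )


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the shape of the curve.
    params = dict(t_seq=1.0, t_par=100.0, overhead_per_proc=0.04)
    n_best = best_processor_count(**params)
    print("best processor count:", n_best)
    print("speedup at best count:", round(speedup(n_best, **params), 2))
    print("speedup at 4x best count:", round(speedup(4 * n_best, **params), 2))
```

Under these assumptions the speedup rises, peaks, and then falls as the overhead term starts to dominate, which is the qualitative behavior the abstract describes. The location of the peak depends entirely on the chosen parameters, just as the paper's figure of about 90 processors depends on its system parameters.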

Index Terms:
Distributed question answering, load balancing, migration, partitioning.
Citation:
Mihai Surdeanu, Dan I. Moldovan, Sanda M. Harabagiu, "Performance Analysis of a Distributed Question/Answering System," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 6, pp. 579-596, June 2002, doi:10.1109/TPDS.2002.1011413