Issue No.06 - June (2002 vol.13)
<p>The problem of question/answering (Q/A) is to find answers to open-domain questions by searching large collections of documents. Unlike information retrieval systems, which are common today in the form of Internet search engines, Q/A systems do not retrieve documents, but instead provide short, relevant answers located in small fragments of text. This enhanced functionality comes with a price: Q/A systems are significantly slower and require more hardware resources than information retrieval systems. This paper proposes a <it>distributed Q/A architecture</it> that enhances the system throughput through the exploitation of <it>interquestion parallelism</it> and <it>dynamic load balancing</it> and reduces the individual question response time through the exploitation of <it>intraquestion parallelism</it>. Inter- and intraquestion parallelism are both exploited using several scheduling points: one before the Q/A task is started and two embedded in the Q/A task. An analytical performance model is introduced. The model analyzes both the <it>interquestion parallelism overhead</it> generated by the migration of questions and the <it>intraquestion parallelism overhead</it> generated by the partitioning of the Q/A task. The analytical model indicates that both question migration and partitioning are required for a high-performance system: Intraquestion parallelism leads to significant speedup of individual questions, but it is practical only up to about 90 processors, depending on the system parameters. The exploitation of interquestion parallelism provides a scalable way to improve the system throughput. The distributed Q/A system has been implemented on a network of 16 Pentium III computers. The experimental results indicate that, at <it>high system load</it>, the dynamic load balancing strategy proposed in this paper outperforms two other traditional approaches. At <it>low system load</it>, the distributed Q/A system reduces question response times through task partitioning, with factors close to the ones indicated by the analytical model.</p>
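The two forms of parallelism named in the abstract can be illustrated with a minimal sketch. It is not the paper's scheduler or its analytical model: the function names, the greedy least-loaded placement rule, and the toy cost formula (work divided by processor count plus an overhead that grows linearly with processor count) are all assumptions chosen only to show the general idea that questions can be spread across nodes and that partitioning a single question stops paying off once overhead dominates.

```python
import heapq


def assign_questions(costs, num_nodes):
    """Interquestion parallelism (hypothetical rule): send each question
    to the node with the smallest accumulated load."""
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = []
    for cost in costs:
        load, node = heapq.heappop(heap)          # least-loaded node
        placement.append(node)
        heapq.heappush(heap, (load + cost, node))
    loads = [0.0] * num_nodes
    for load, node in heap:
        loads[node] = load
    return placement, loads


def partitioned_time(work, processors, overhead_per_processor):
    """Intraquestion parallelism (assumed toy model, not the paper's):
    evenly split work plus overhead linear in the processor count."""
    return work / processors + overhead_per_processor * processors


def best_processor_count(work, overhead_per_processor, max_processors):
    """Processor count that minimizes the toy response time."""
    return min(
        range(1, max_processors + 1),
        key=lambda p: partitioned_time(work, p, overhead_per_processor),
    )


if __name__ == "__main__":
    placement, loads = assign_questions([5, 3, 2, 4], num_nodes=2)
    print(placement, loads)
    print(best_processor_count(work=100.0, overhead_per_processor=1.0,
                               max_processors=64))
```

In this toy model, adding processors beyond the minimizing count makes a single question slower, while assigning whole questions to lightly loaded nodes keeps scaling with the number of nodes. The specific limits reported in the abstract come from the authors' own model and system parameters, which this sketch does not reproduce.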
Distributed question answering, load balancing, migration, partitioning.
Mihai Surdeanu, Dan I. Moldovan, Sanda M. Harabagiu, "Performance Analysis of a Distributed Question/Answering System", IEEE Transactions on Parallel & Distributed Systems, vol.13, no. 6, pp. 579-596, June 2002, doi:10.1109/TPDS.2002.1011413