[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32]. The methods in previous studies construct all forms of a test so that they satisfy the same test constraints (e.g., the number of test items and the amount of test information), which ensures that all forms are of equivalent quality. Van der Linden and Boekkooi-Timminga [6] proposed a sequential method of constructing test forms that uses linear programming to minimize the fitting errors to the test constraints. The items used to construct a test form were removed from the item bank, and the next test form was then constructed from the remaining items. This method was called “sequential construction.” However, it had a serious problem: the fitting errors increased as the number of constructed test forms increased.

[16] and Armstrong et al. [14] proposed methods that simultaneously constructed all test forms to minimize the differences in fitting errors among the test forms. The former used linear programming and the latter used network-flow programming. Although the differences in fitting errors among the test forms were minimized, the computational costs of these methods increased exponentially as the size of the item bank or the number of test constraints increased.

[19] proposed a big-shadow-test (BST) method that sequentially constructed test forms by minimizing the difference in fitting errors between the currently constructed test form and the remaining set of items in the item bank. Although BST mitigated the problem of computational costs, it did not fundamentally solve it.

[30] formalized the construction of test forms as a maximum set-packing problem that maximizes the number of test forms under nonoverlapping constraints (i.e., no two test forms share an item; an item shared by two forms is called an overlapping item). However, nonoverlapping conditions prevent the generation of a sufficiently large number of test forms from an item bank. That is, they prevent the item bank from being used effectively. To solve this problem, Ishii et al. [20] applied a maximum clique technique to the construction of multiple test forms. This method guaranteed the maximum number of test forms with overlapping items. However, its computational costs also increased exponentially as the item bank grew, which made the method difficult to apply in practice. Thus, although BST cannot guarantee the maximum number of test forms, it is still the most useful method in practice. However, it is difficult to use BST with overlapping constraints, so BST cannot use the item bank effectively.
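The clique view of overlapping test forms can be illustrated with a small sketch. It assumes that each candidate test form is a set of item IDs and that two forms are compatible when they share at most `max_overlap` items; these names and the brute-force search are illustrative assumptions, not the algorithm of [20]. A set of mutually compatible forms is a clique in the compatibility graph, and the exhaustive search shows why the cost grows exponentially with the number of candidates.

```python
from itertools import combinations


def compatible(form_a, form_b, max_overlap):
    """Two forms are compatible if they share at most max_overlap items."""
    return len(form_a & form_b) <= max_overlap


def max_compatible_set(forms, max_overlap):
    """Exhaustively find a largest set of pairwise-compatible forms.

    This is a maximum clique in the compatibility graph; the search
    tries subsets from largest to smallest, so it is exponential.
    """
    indices = range(len(forms))
    for size in range(len(forms), 0, -1):
        for subset in combinations(indices, size):
            if all(compatible(forms[i], forms[j], max_overlap)
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical candidate forms built from item IDs 1..6.
forms = [
    frozenset({1, 2, 3}),
    frozenset({3, 4, 5}),
    frozenset({5, 6, 1}),
    frozenset({1, 2, 4}),
]
```

With these hypothetical forms, forbidding any overlap leaves a single usable form, whereas allowing one shared item per pair yields three forms, which is the sense in which overlapping constraints use the item bank more effectively.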

[20] and [30] using a random search algorithm. That is, the method cannot guarantee the maximum number of test forms, but it guarantees it asymptotically or approximately. In addition, this method can handle overlapping constraints.

[33] and Borovska [34] used a Genetic Algorithm (GA), and Pirim et al. [35] used a tabu search. Moreover, some studies [36], [37], [38] have compared the efficiencies of random search algorithms on various optimization problems such as engineering optimization problems, a traveling salesman problem, and complex combination problems. The results of these studies revealed that the Bees Algorithm (BA) provided the most accurate solutions with the lowest computational time compared with simulated annealing (SIM), GA, and the Ant Colony algorithm (ANT). Consequently, BA is a strong candidate for alleviating the trade-off in constructing multiple test forms, since the problems employed in [36], [37], [38] and the construction of multiple test forms are all combinatorial optimization problems classified as NP-hard.

1. It alleviates the trade-off between computational costs and differences in fitting errors.

2. It approximately maximizes the number of test forms with overlapping constraints.

[6], [13], [14], [16], [19], [30].

(1)

(2)

(3)

[6] proposed a method that sequentially constructs test forms using linear programming to minimize the following fitting errors:

(4)

[16] proposed a method using linear programming that simultaneously constructed multiple test forms and minimized the differences in fitting errors.

(5)

(6)

[19] proposed a big-shadow-test method using linear programming that sequentially constructs test forms by minimizing the differences in fitting errors between the currently constructed test form and the set of items remaining in the item bank. They called the set of remaining items the “shadow test form.”

(7)

[30] proposed a method that formalizes the construction of multiple test forms, with the aim of maximizing the number of test forms from an item bank, as a maximum set-packing problem. Although this method guaranteed the maximum number of test forms from an item bank, no items were allowed to overlap between test forms. This prevented the generation of a sufficiently large number of test forms from the item bank. Consequently, nonoverlapping conditions prevented the item bank from being used effectively.

[20] applied the maximum clique technique to the construction of multiple test forms. Nevertheless, the computational costs increased exponentially as the data size increased. That is, this method is difficult to implement in practice.

1. To construct equivalent test forms, the traditional methods minimize the differences in fitting errors among all forms. However, the differences in fitting errors decrease only as the computational costs increase. That is, there is a trade-off between the differences in fitting errors among the test forms and the computational costs.

2. The maximum number of test forms from an item bank cannot be guaranteed, and overlapping constraints are difficult to implement. That is, the item bank cannot be used effectively in practice.

[20] and [30] using a random search algorithm. That is, the proposed method cannot guarantee the maximum number of test forms, but it guarantees it asymptotically or approximately. In addition, the proposed method can handle overlapping constraints.

[33] and Borovska [34] used GA, and Pirim et al. [35] used a tabu search.

[36] compared GA and BA using engineering optimization problems. Wong et al. [37] compared ANT, GA, and BA using a traveling salesman problem. Pham et al. [38] used complex combination problems to compare ANT, SIM, GA, and BA. The results of these studies revealed that BA provided the most accurate solutions with the lowest computational time. Accordingly, BA is a strong candidate for alleviating the trade-off in constructing multiple test forms, since the problems employed in [36], [37], [38] and the construction of multiple test forms are all combinatorial optimization problems classified as NP-hard.

[39]. Honey bees live in a hive where they store the honey that they have foraged. They communicate the locations of food sources to their hive mates by performing a so-called “waggle dance.” The duration of this dance is proportional to the quantity of food at the source. Through this behavior, large groups of bees are recruited to forage at sources that contain large quantities of food, which reduces the time each individual needs to forage for food.

[37], which proposed BA for solving the traveling salesman problem.

[37] for solving the traveling salesman problem in Fig. 1. First, artificial bees generate the initial population of routes (solutions) using a random search technique [40] to find the cities they will visit next. The initial population is then evaluated by measuring the total length of each route (fitness value). Second, the artificial bees iteratively improve the initial population. That is, routes from the initial population are selected according to selection probabilities that are inversely proportional to the total lengths of the routes. After that, artificial bees are recruited to improve the selected routes. This recruitment is modeled on the waggle dance of honey bees. The numbers of recruited artificial bees are inversely proportional to the total lengths of the selected routes. Then, the artificial bees generate a new population using a neighborhood-search technique [41] in which they find shorter routes under the influence of the selected routes. That is, the artificial bees select cities using selection probabilities that are inversely proportional to the distances between cities, and the selection probabilities of cities that are included in the selected routes are higher than those of the other cities. The process in the second step is iterated until the stopping criterion is met.
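The selection and recruitment rules described above can be sketched as follows. This is a minimal illustration that assumes route lengths are already computed; the function names, the rounding of bee counts, and the use of `random.choices` are assumptions made for the sketch and are not taken from [37].

```python
import random


def selection_probabilities(route_lengths):
    """Probabilities inversely proportional to total route length."""
    weights = [1.0 / length for length in route_lengths]
    total = sum(weights)
    return [w / total for w in weights]


def recruit_counts(route_lengths, n_bees):
    """Share n_bees among routes in proportion to inverse length.

    Counts are rounded down, so a few bees may remain unassigned.
    """
    probs = selection_probabilities(route_lengths)
    return [int(n_bees * p) for p in probs]


def select_routes(routes, route_lengths, k, rng=random):
    """Draw k routes (with replacement); shorter routes are more likely."""
    probs = selection_probabilities(route_lengths)
    return rng.choices(routes, weights=probs, k=k)
```

For example, with route lengths of 10, 20, and 40, the shortest route receives roughly four times the selection probability and four times as many recruited bees as the longest one.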

**Step A.** The BA for Step A is outlined in Fig. 2. In this step, each artificial bee constructs one test form by sequentially selecting items that satisfy the test constraints and minimize the fitting errors until the test form is complete. As mentioned above, the test constraints in this step concern only individual test forms. The first group of artificial bees constructs test forms using a random search, and the later groups of artificial bees improve the constructed test forms using a neighborhood search. In more detail, Step A is divided into the following five steps:

1. The item-selection probability of each remaining item is inversely proportional to the fitting error of the constructed test form if this form includes the remaining item.

2. The item-selection probability of each remaining item becomes zero if no test constraints are satisfied.
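The two rules above can be sketched as a small weighting function. The sketch assumes that rule 2 means an item that would violate the test constraints is never selected, and it takes the fitting-error and feasibility checks as caller-supplied functions; `fitting_error`, `is_feasible`, and the small constant `eps` are hypothetical names introduced only for this illustration.

```python
def item_selection_probabilities(candidates, fitting_error, is_feasible,
                                 eps=1e-9):
    """Return one selection probability per candidate item.

    fitting_error(item): fitting error of the form if item were added.
    is_feasible(item):   True if adding item keeps the constraints satisfied.
    eps avoids division by zero when a fitting error is exactly 0.
    """
    weights = []
    for item in candidates:
        if is_feasible(item):
            # Rule 1: weight inversely proportional to the fitting error.
            weights.append(1.0 / (fitting_error(item) + eps))
        else:
            # Rule 2: infeasible items get zero probability.
            weights.append(0.0)
    total = sum(weights)
    if total == 0.0:
        # No feasible item remains; the form cannot be extended.
        return [0.0] * len(candidates)
    return [w / total for w in weights]
```

An all-zero result signals that no remaining item can be added, which a caller could treat as the end of construction for that test form.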

[22] method as

(14)

1. The rules in Step A-1.

2. If each remaining item is included in the selected test form, the selection probability of this item is higher than the selection probabilities of the other items.
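The second rule can be sketched as a re-weighting of the Step A-1 probabilities. The multiplicative factor `boost` is an assumption made for this illustration; the source only states that items in the selected test form receive a higher probability, and its exact formula is not reproduced here.

```python
def neighborhood_probabilities(base_probs, candidates, selected_form,
                               boost=2.0):
    """Raise the probability of items that belong to the selected form.

    base_probs:    probabilities from the random-search rules (Step A-1).
    selected_form: set of item IDs in the form being improved.
    boost:         hypothetical factor (> 1) applied to those items.
    """
    raw = [
        p * (boost if item in selected_form else 1.0)
        for item, p in zip(candidates, base_probs)
    ]
    total = sum(raw)
    if total == 0.0:
        return raw
    return [r / total for r in raw]
```

Items with zero base probability stay at zero after re-weighting, so the constraint rule of Step A-1 is preserved.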

[37] as

(15)

1. The fitting errors of the test forms are smaller than the smallest fitting error in the system memory.

2. The test forms are not the same as the stored test forms in the system memory.

**Step B.** The BA for Step B is outlined in Fig. 3. In Step B, the largest and most equivalent set of test forms, which minimizes the difference in fitting errors between test forms and maximizes the number of test forms, is extracted from the collection of test forms from Step A. The difference in fitting errors between test forms is indicated by the standard deviation of the fitting errors. The test constraints satisfied in this step concern the relationships among test forms. Each artificial bee in the first group extracts a set of test forms by sequentially selecting them to minimize the standard deviation of fitting errors using a random search until it cannot find any more available test forms. The later groups of artificial bees improve the extracted sets of test forms using a neighborhood search. In more detail, Step B is divided into the following five steps.

(22)

1. The rule in Step B-1.

2. If each remaining test form is included in the selected set of test forms, the form-selection probability of this form is higher than the form-selection probabilities of the other forms.
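As a rough illustration of form selection in Step B, the sketch below weights each remaining test form by the inverse of the standard deviation of fitting errors that would result from adding it to the set. The inverse-proportional rule, the use of the population standard deviation, and the constant `eps` are assumptions made by analogy with Step A; the source formula is not reproduced here.

```python
from statistics import pstdev


def form_selection_probabilities(chosen_errors, candidate_errors, eps=1e-9):
    """Return one selection probability per remaining test form.

    chosen_errors:    fitting errors of forms already in the set.
    candidate_errors: fitting errors of the remaining forms.
    A candidate that keeps the spread of fitting errors small gets
    a larger probability.
    """
    weights = []
    for error in candidate_errors:
        spread = pstdev(chosen_errors + [error])
        weights.append(1.0 / (spread + eps))
    total = sum(weights)
    return [w / total for w in weights]
```

A neighborhood search could then raise the probabilities of forms that belong to the selected set, in the same way as for items in Step A.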

[37] as

(23)

[42]. The system had six computer nodes, comprising one server and five workers, and each node was equipped with a 2.5-GHz quad-core Intel processor. The workers had a total of 20 processor cores.

[43] in the remaining experiments.

[19], the GA for constructing multiple test forms proposed by Sun et al. [32], and a GA based on a two-step test construction to demonstrate its accuracy and speed in constructing multiple test forms. The GA of Sun et al. simultaneously constructed multiple test forms to minimize the fitting errors and the difference in the fitting errors. Although some experiments in [32] showed that this GA could construct multiple equivalent test forms quite well, the implemented test constraints and item banks were too simple for actual application. Here, we compared it with the proposed method. Moreover, we developed the two-step GA based on the two-step test construction described in Section 4.3, in which BA is replaced by GA, to compare the performance of BA and a general GA under the same conditions. BA, BST, and the two GAs were used to construct multiple test forms that minimize the fitting errors, indicated by the sum of the absolute differences (SADs) between the expected test information function and the test information functions of the constructed test forms at five levels of ability, and that minimize the difference in fitting errors, indicated by the standard deviation of the SADs of the constructed test forms. The test information function described in this paper is based on the two-parameter logistic model of IRT.

Table 1. Distributions of Item Parameters

Table 2. Details on Test Constraints

1. The differences in fitting errors of the extracted sets of test forms are not lower than the smallest difference in fitting errors of the stored sets of test forms in the system memory.

2. The fitting errors of the extracted sets of test forms are not lower than the smallest fitting error of the stored sets of test forms in the system memory.

[44]. The parallel-computing environment implemented for BST consisted of nine computer nodes, comprising one server and eight workers. Each node was equipped with a 2.5-GHz quad-core Intel processor. The workers had a total of 32 processor cores.

Table 3. Results for Accuracy and Speed of Constructing Multiple Test Forms

Table 4. Results for Parameter Tuning for Accuracy of Constructing Multiple Test Forms

Table 5. Results for Processor Cores Related Performance

[20] proposed a method that guarantees the maximum number of constructed test forms while allowing overlapping items. However, the computational time of the method increases exponentially as the size of the item bank increases. In this experiment, the method could not provide the guaranteed maximum number of test forms from the item bank in a reasonable time. BST, for its part, requires a huge computational time when overlapping constraints are permitted. Therefore, we performed this experiment using only the proposed method.

Table 6. Results for Overlapping Test Construction

[45].

*P. Songmuang is with the Faculty of Human Sciences, Waseda University, 2-579-15, Mikashima, Tokorozawa-shi, Saitama-ken, 359-1192, Japan. E-mail: pokpong@aoni.waseda.jp.*

*M. Ueno is with the Department of Social Intelligence and Informatics, Graduate School of Information Systems, The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan. E-mail: ueno@ai.is.uec.ac.jp.*

*Manuscript received 23 Dec. 2009; revised 22 Mar. 2010; accepted 3 Aug. 2010; published online 27 Aug. 2010.*

*For information on obtaining reprints of this article, please send e-mail to: lt@computer.org, and reference IEEECS Log Number TLT-2009-12-0205.*

*Digital Object Identifier no. 10.1109/TLT.2010.29.*

#### References

**Pokpong Songmuang** received the BEng degree from Thammasat University in 2003, the MEng degree from Nagaoka University of Technology in 2006, and the PhD degree in computer science from the University of Electro-Communications in 2010. He is currently an assistant professor at Waseda University. His research interests include e-testing, data mining, and web technologies.

**Maomi Ueno** received the PhD degree in computer science from the Tokyo Institute of Technology in 1994. He has been an associate professor in the Graduate School of Information Systems at the University of Electro-Communications since 2007. He has also worked at the Tokyo Institute of Technology (1994-1996), Chiba University (1996-2000), and the Nagaoka University of Technology (2000-2007). He received best paper awards from the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2008), ED-MEDIA 2008, e-Learn2004, e-Learn2005, and e-Learn2007. His interests include e-learning, e-testing, e-portfolio, machine learning, data mining, Bayesian statistics, and Bayesian networks. He is a member of the IEEE.
