^{1}], [

^{2}]. The indecomposability of EFMs implies that the deletion of any single reaction contributing to an EFM completely disables it. A cut set (CS) is defined as a set of reaction deletions that eliminates all target EFMs to be killed. A CS is called a minimal cut set (MCS) if none of its subsets is a CS [

^{3}]. MCSs are of special interest for metabolic engineering, as they require the least effort when biologically implemented. It has been shown that CSs are especially useful for two types of tasks:

1. The calculation of all MCSs of an unconstrained system, which means that every determined MCS kills all EFMs of the investigated systems. Numerous applications of this method have been suggested, for example, evaluating structural robustness and fragility, identifying targets in rational drug design, and predicting phenotypes [ ^{4}].

2. EFMs can be utilized to split the functional units of a metabolic system into two groups: a) desired functions and b) unwanted functions [ ^{5}]. A constrained cut set (cCS) is defined as a set of reaction deletions that eliminates all unwanted EFMs and keeps—at least some of—the desired EFMs. Such a strategy can be used to optimize microbiological production hosts to efficiently produce substances of interest, for example, ethanol [ ^{6}]. This is achieved by assigning all EFMs that are involved in efficient ethanol production to the set of desired EFMs. All other EFMs are assigned to the set of unwanted EFMs. Consequently, the biological implementation of the computed MCSs of such a system results in an optimized microorganism. Therefore, MCSs are a valuable tool to identify and realize optimal gene intervention strategies, as has been shown by Trinh et al. [ ^{7}].

^{8}]. Even medium-sized networks with approximately 100 reactions can easily have several hundred million EFMs. The huge number of EFMs makes the computation of MCSs a challenge and results in very long runtimes that do not allow the use of MCSs for medium-scale or large-scale metabolic models. Recently, several methods have been developed to compute efficiently MCSs using hitting set algorithms [

^{4}], [

^{9}], [

^{3}], [

^{5}], [

^{10}], [

^{11}] and binary linear programming [

^{12}]. The performance evaluation of MCS computation algorithms is nontrivial and currently no methods are known that are output-polynomial [

^{10}]. In this study, we introduce an improved version of the hitting set approach that was originally reported by Haus et al. [

^{3}]. The approach of Haus et al. significantly increased the performance of MCS programs by utilizing Berge's algorithm [

^{13}].

1. the set of all EFMs to be killed,

2. the set of preminimal cut sets (preMCS), which kill all EFMs that have already been processed, and

3. the new preMCS candidates.

^{3}]. Consequently, a superset test is required to guarantee that all CSs are MCSs. Using a linear search approach to find subsets in the set of existing preMCS results in a search time of . Various other subset search approaches that scale better than the linear approach have been reported, for example, in [

^{14}] and [

^{15}] methods were reported that can identify subsets in time . To reduce the runtime spent in the subset search of Berge's algorithm, we implemented a binary bit pattern tree approach [

^{16}] that can be run in parallel on multiple CPU cores. The concept of binary bit pattern trees is not new, but we are not aware that they have been used in the context of MCS computation. By using our tree approach, the size of metabolic models that can be analyzed and genetically optimized by MCSs is pushed to new and significantly higher limits. Besides the importance of MCSs in metabolic engineering, the computation of MCSs is also an essential problem in discrete mathematics with many applications in computer science, artificial intelligence, and game theory [

^{17}]. These fields may also benefit from the idea of using binary bit pattern trees to speed up the computation of MCSs. In this study, we explain the main concept of our tree method and compare the performance of the tree method with a linear search algorithm.

^{13}] as demonstrated by Haus et al. [

^{3}]. In Berge's algorithm, the majority of the computation time is spent on the procedure that checks whether or not a new MCS candidate is a superset of an already determined preMCS [

^{3}] (see Table 2). If this is the case, then the new preMCS candidate is dismissed. Otherwise, the candidate is added to the list of preMCS.

Table 1. Implementation of the Superset Test Utilizing a Boolean AND-Operation

1. eliminating reactions that must not be knocked out, as this would result in the deletion of (too many) wanted modes,

2. removing duplicate modes,

3. removing modes that are supersets of other modes, and

4. combining duplicate reactions.

^{18}].

*regEfmtool*[

^{19}], [

^{20}]. The used network has been described in [

^{21}]. It models the central carbon metabolism of generic plant cells and consists of 78 reactions (33 reversible and 45 irreversible) and 55 internal metabolites. The network describes the following main pathways: TCA cycle, glycolysis, pentose phosphate, respiration, sucrose, and starch synthesis. All of them belong to one of these four compartments: vacuole, mitochondria, plastid, and cytosol. Transporters have been added for intercompartmental exchange. For the benchmark, we computed MCSs of the unconstrained system, which means that all EFMs of the system have to be killed. An unconstrained system has the effect that the preprocessing steps 1 to 3 mentioned earlier cannot be applied and, hence, the size of the system is only marginally reduced during preprocessing. Consequently, using an unconstrained system results in a high computational workload. The total number of MCSs of the chosen system is 2,815,375, where the minimum and maximum number of knockouts is 4 and 18, respectively (see Fig. 5). However, because of preprocessing step 4, which combines duplicate reactions, the Berge algorithm only needs to compute 93,009 MCSs. In a postprocessing step these 93,009 MCSs are expanded to the final number of 2,815,375. The concept of CS expanding is illustrated in Fig. 4, which shows two networks. Network (a) depicts the original toy system that has two EFMs and seven MCSs that remove both EFMs: , , , , , , and . Network (b) is the toy system after combining duplicate reactions, which only has three MCS: , , and . However, in a postprocessing step the three MCSs can be expanded to obtain the full set of MCSs of the original network. Using the compressed network instead of the full network results in a strong reduction of the program's total execution time [

^{18}].

^{3}]. The time lost managing the bit pattern tree accumulated to approximately 40 seconds. A part of these 40 seconds is spent in the candidate generation procedure, where invalid preMCSs are eliminated and, hence, must be removed from the tree. The other task of the tree management is to add valid candidates to the preMCS tree.

Table 2. Comparison of MCS Computation Runs with (a) Linear Search, (b) Tree Search with Random Bitorder, and (c) Tree Search with a Bit Order Derived from EFM Properties for a System with 114,614 EFMs

*tree—EFM*. Test case (b)

*tree—random*used a random bit order. For test case (b), five runs with varying bit orders were used and the results averaged. To obtain comparable results, the same random bit order was used for each set of runs with varying numbers of threads. Using the bit order derived from the reaction frequency resulted in a performance gain of approximately a factor 3 compared to the random bit order.

*P. pastoris*model has 15 million EFMs and roughly 1.8 million MCS, while our

*A. thaliana*contains only 1.7 million EFMs but has 3.7 billion MCSs. Note that we terminated the linear search runs of the two large systems, as the execution time was extraordinarily high. Based on the average time spent to search for a single candidate in the set of preMCSs, an extrapolated runtime was estimated. Table 3 shows that the tree approach outperforms the linear search algorithm. Moreover, the tree approach allows to study metabolic systems that could not be analyzed with a linear search.

Table 3. Comparison of MCS Computation with Linear Search and with Tree Search for Different Metabolic Networks Using 10 Parallel Threads

^{3}]. The bottleneck of the Berge algorithm is the search procedure that is required to guarantee that all computed MCSs are truly minimal. This is done by performing a test that verifies that a new MCS candidate is not a superset of any of the already determined preMCS [

^{5}]. A simple but slow approach to test a new candidate against all already existing preMCSs is to linearly iterate through the complete set of existing preMCSs and check if this set contains a subset of the candidate. To speed up the superset test, we stored the existing preMCSs in a binary tree structure. This tree structure can be used to skip a significant amount of the necessary comparison events between a candidate and the existing preMCSs. Consequently, the total runtime for the computation of the complete set of MCSs is strongly reduced, and hence, numerical analyses and gene intervention strategies of much bigger systems can be calculated by our approach. Note that the determination of the MCSs requires the calculation of the EFMs first. Even though significant progress has been made in recent years on the efficient computation of EFMs [

^{16}], [

^{22}], [

^{23}], in many situations the EFM computation is the limiting factor of MCS analyses. However, recently alternative approaches have been reported which are based on the duality of EFMs and MCSs [

^{3}], [

^{24}]. These approaches allow the computation of MCSs directly from the stoichiometric matrix without having to calculate the EFMs beforehand.

^{17}]. These fields might also benefit from the idea of using binary bit pattern trees to speed up the computation of MCSs.

# Acknowledgments

*C. Jungreuthmayer and J. Zanghellini are with the Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria, and with the Metabolic Modeling Group, Austrian Centre of Industrial Biotechnology (ACIB), Muthgasse 11/DG, 1190 Vienna, Austria. E-mail: christian.jungreuthmayer@acib.at.*

*M. Beurton-Aimar is with the Laboratoire Bordelais de Recherche en Informatique (LABRI), University of Bordeaux, 351, cours de la Liberation, Batiment A30, Bureau 310, Talence 33400, France.*

*Manuscript received 3 Apr. 2013; revised 9 July 2013; accepted 4 Sept. 2013; published online 18 Sept. 2013.*

*For information on obtaining reprints of this article, please send e-mail to: tcbb@computer.org, and reference IEEECS Log Number TCBB-2013-04-0108.*

*Digital Object Identifier no. 10.1109/TCBB.2013.116.*

#### References

| |||