Issue No. 06 - November/December (2004 vol. 21)

ISSN: 0740-7475

pp: 504-512

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MDT.2004.97

Shuo Sheng , Mentor Graphics

Michael S. Hsiao , Virginia Tech

With the growing complexity and size of designs, the VLSI design industry has seen an increasing need for formal verification. Although designers often use simulation to validate designs, simulation alone cannot guarantee verification completeness. Furthermore, designers frequently overlook corner cases during simulation. (A corner case is a case or condition that is very difficult to reach and activate through general simulation effort.) Formal verification, on the other hand, attempts to verify the underlying design without a full-blown simulation. A formal verification technique successfully adopted in industry is model checking, which builds a finite-state model of a system and verifies that a property holds in that model. The checking is an exhaustive state space traversal, which is guaranteed to terminate because the system is finite. The technical challenge in model checking is finding data structures and algorithms that can completely represent and manipulate large state spaces.

Symbolic unbounded model checking based on binary decision diagrams (BDDs) is complete but is vulnerable to space explosion. Bounded model checking based on branch-and-bound techniques such as satisfiability (SAT) solving or ATPG is memory efficient but can be incomplete. We are interested in the problem of applying branch-and-bound methods to unbounded model checking to create complete algorithms. By nature, such algorithms require the engine to enumerate all solutions of a given SAT or ATPG problem to perform quantification of input variables. This problem is known to be #P-complete. (The " Model checking" sidebar provides more background.)

We focus here on preimage computation using ATPG. We have developed an algorithm that enables the ATPG engine to quickly compute all solutions. It learns from the solutions it has already found and skips subspaces that contain repeated solution structures. Because the algorithm's learning is invoked by solutions already found, we call it a success-driven-learning algorithm. Experiments in which we computed preimages for several large benchmark circuits have demonstrated the effectiveness of our approach.

Overview

ATPG is widely used in testing, and in terms of branch-and-bound searching, it is equivalent to a SAT solver. ATPG has two attractive features: it uses the structural information of the circuit in decision making, and it handles

*X*states more easily than a SAT solver. A branch-and-bound algorithm's most important capability is that it learns to prune redundant search space. In contrast to popular conflict-driven-learning algorithms used by state-of-the-art SAT solvers^{1}to prune search space containing no solution, our algorithm identifies and prunes redundant search space where solutions overlap. Thus, it is efficient for all-solution SAT problems. Moreover, our algorithm transforms the ATPG decision tree into a reduced free BDD, which becomes the desired preimage set's representation at the end of the ATPG call. The free BDD significantly reduces the memory required to store solutions.The prototype ATPG algorithm we use as the basis of our algorithm is PODEM (path-oriented decision making),

^{2}a branch-and-bound decision-making procedure that employs the structural information of the circuit. A naive way of using such a decision procedure to enumerate all solutions is to enforce a backtrack whenever a solution is found, so that the algorithm can continue searching for the next solution until the entire search space is exhausted. Researchers have proposed various search-space-pruning techniques to improve search efficiency.^{3}^{,}^{4}However, these methods (such as conflict-driven learning, dependency-directed nonchronological backtracking, and conflict clauses) aim at pruning conflict spaces, which contain no solution. In other words, these algorithms learn from mistakes. In the all-solution SAT problem scenario, this is far from sufficient because many subspaces contain solutions and they can overlap heavily. They do not conflict with one another.We explain this in Figure 1, which shows multiple solutions found by PODEM for a preimage computation SAT problem. The figure shows part of a decision tree. A numbered circle denotes a decision node, which is either a state element or a primary input (PI) in a sequential circuit; the number is the decision node's gate ID. The left branch corresponds to the decision 0, and the right branch corresponds to 1. A triangle denotes a conflict subspace (where no solution exists). A rectangle marks a solution's terminal node. Each solution is defined as a cube characterized by the decision path from the root node to a terminal node. There are three solutions under the left branch of decision node 107, marked Solution 1, Solution 2, and Solution 3, in the order they were found. PODEM found solutions 2 and 3 simply by enforcing a backtrack and having ATPG continue after Solution 1 was found.

However, when the ATPG engine continued to search and found solutions 4, 5, and 6, we observed an interesting phenomenon: These solutions repeated the same partial assignments for decision nodes 93, 139, 9, and 101. This implies that the subspace immediately before the two 93 nodes (marked A and B) are somehow equivalent. In other words, earlier node assignments (5 = 0, 6 = 0, 149 = 0, 108 = 0, 107 = 0) and (5 = 0, 6 = 0, 149 = 0, 108 = 1) resulted in the same circuit state. Therefore, if we could learn from the first three solutions, we would be able to skip digging into subspace B and directly return solutions 4, 5, and 6. When we advanced the search further, this phenomenon appeared again. We found that the entire subtree under leftmost decision node 149 (within the large dotted circle) was repeated again under the other three 149 nodes on the right (denoted by the three small dotted circles). Therefore, if we could learn the structure in the area inside the large dotted circle and its corresponding search space, we could avoid digging into the complete decision tree for the other three by backtracking earlier and returning all related solutions. The savings in enforced backtracks gained by these solutions will be enormous when many of them have repeated structures. This is our foundation for constructing a learning algorithm that captures this phenomenon and improves searching efficiency for the all-solution SAT problem.

Search State Equivalence

In a high-performance ATPG engine or SAT solver, learning plays an important role: The knowledge learned is stored and used for pruning search space in the future—for example, knowledge is represented and stored in the form of conflict clauses.

^{3}In our approach, we define the knowledge as equivalent search states.As Figure 1 demonstrated, different complete solutions can share the same partial solution structure. Figure 2 illustrates this phenomenon again in an example circuit fragment. There are four PIs (decision nodes) in this circuit:

*a*,*b*,*c*, and*d*. OR gate*z*is the primary output. Assume that we wish to derive all solutions for the objective*z*= 1. We observe that two different partial PI assignments, (*a*= 0,*b*=*X*,*c*= 1) and (*a*= 0,*b*= 0,*c*=*X*), will result in the same internal circuit state, (*g*= 0,*f*= 1,*h*=*X*). Then, to satisfy objective*z*= 1, we must set*d*= 1 in both cases, corresponding to the repeated partial solution at the bottom of the decision tree. We characterize an equivalent search state by its cut set with respect to the objective. In this example, the cut set is simply (*g*= 0,*f*= 1,*d*=*X*), which is a unique internal circuit state.This cut set consists of two parts: internal specified gates (

*g*= 0,*f*= 1) and unspecified PI gates (*d*=*X*). They are obtained by the following procedure:1. Beginning from objective site *z* (currently unsatisfied), backtrace to PIs along all the *X*-paths in its fan-in cone.

2. Record every specified input to all unspecified gate outputs encountered in this depth-first search (in our example, there is only one *X*-path, *z*- *h*- *d*; *f* = 1 is the specified input of unspecified gate *h*; and *g* = 0 is the specified input of unspecified gate *z*).

3. Record the unspecified PIs at the end of each *X*-path ( *d* = *X* in the example).

After this depth-first search, all marked gates (specified gates and unspecified PIs) and their values (1, 0,

*X*) define a cut set of the circuit state (shown by the dashed line in Figure 2). We define this cut set with respect to the objective rather than to the entire circuit, and it actually characterizes the*X*-path's cone to the objective from the inputs. Cut sets are stored in a hash table managed as the knowledge database. The algorithm uses this database to determine whether it has encountered an equivalent search space, and if so, it skips this search space.You might wonder why partial assignments with the same cut set will have the same assignment for remaining variables. To explain this, we must look at the basic PODEM algorithm. PODEM always backtraces from the unjustified objective, along one

*X*-path to the PIs, and chooses one as its next decision variable. Therefore, as long as the*X*-path's cone (the cut set we defined) is the same and we follow the same heuristics in backtracing (for example, always using Sandia controllability/observability analysis program [SCOAP] measurement in PODEM), the remaining decision variables and their chosen order will be exactly the same. Note that a random decision order should not be used here, or the order of the remaining variables will not be the same. Concepts similar to the cut set are the head line in the FAN algorithm^{2}and Marques Silva and Sakallah's dynamic head line.^{4}Success-driven Learning

The success-driven-learning algorithm determines when to invoke search-state-equivalence learning and when to reuse it. Success-driven learning is built into a basic recursive PODEM algorithm that implicitly enumerates the entire search space. Figure 3 shows the success-driven-learning algorithm.

Unlike the conventional PODEM, which returns either success or fail, this algorithm does not have a return value because it always enforces a backtrack when a solution is found so that it can search for the next solution. However, it does remember how many solutions it finds under each node in the decision tree. We set up a success counter pair for each decision node in the decision stack. This pair,

*SC*^{0}*i*and*SC*^{1}*i*, counts the total number of solutions found for the 0 and 1 branches of node*i*in the decision tree. For example, in Figure 1, for leftmost decision node 93 (marked A),*SC*^{0}(93) = 0 and*SC*^{1}(93) = 3. The subroutine update_success_counter() at step 1 in Figure 3 is called every time a solution is found. It increments all the success counters in the current decision stack. Using the same example in Figure 1, when the solution at the 0 (left) branch of leftmost decision node 139 (below point A) is found, update_success_counter() increments*SC*^{0}(139),*SC*^{1}(93),*SC*^{0}(107),*SC*^{0}(108),*SC*^{0}(149),*SC*^{0}(6),*SC*^{0}(5), and all such counters above decision node 5 to the top of the decision tree. Next, when the solution at the left branch of succeeding decision node 9 is found, update_success_counter() performs the same update again, except that*SC*^{1}(139) rather than*SC*^{0}(139) is updated, because it is on the right branch of node 139. In addition,*SC*^{0}(9) is initialized to 1. This mechanism synchronizes all success counters as they are dynamically maintained.When a decision node pops out of the decision stack, the subspace below it has been fully explored. Therefore, the sum of this node's two success counters should indicate the final numbers of all solutions in the subspace below it. If any of them is greater than 0 (indicating that at least one solution exists), we perform search-state-equivalence learning by computing the corresponding cut set, and we delete the success counters in the subtree. Step 6 subroutines check_success_counter() and update_search_state_database() perform this operation. We compute the equivalent search space only when there is at least one success (solution); hence, we call this algorithm success-driven learning. Because we allocate success counters only for decision nodes active in the current decision stack, and because only PIs and flip-flops can be decisions in PODEM, the maximum number of success counters managed is 2 × (

*PI*s +*FF*s).At step 2 in the algorithm, the function computes the current search state and looks it up in the knowledge database. If it detects a hit, it immediately returns and the entire subspace below is pruned, with solution BDD updated. At steps 4 and 5, subroutine imply() performs logic simulation and implication when a new decision variable is assigned a specified value. If it detects a conflict, it increments counter nBackTrack and skips the exploration of the subspace below. At step 6, update_search_state_database() creates a new entry in the hash table if a new search state cut set is found. Subroutine update_solution_BDD() at steps 1 and 2 constructs the BDD so as to record every partial solution found by the function. This BDD grows from partial to complete when the entire search space has been explored and the last decision node has been popped out of the decision stack.

Constructing a BDD

To avoid space explosion while constructing a BDD from the ATPG decision tree, we need a graph representation of the preimage set. The underlying idea of the construction process is the observation that the decision tree in Figure 1 resembles a free BDD, although this BDD might not be canonical.

Figure 4 shows the BDD node data structure. The structure contains six members. The first is the decision node label, which identifies the PI. The second is the address of the left child. The third is the number of total solutions under the left branch (the value passed from success counter

*SC*^{0}during success-driven learning). The next two fields are for the right branch. The last member is the index to the hash table that identifies the search state cut set.The success-driven-learning algorithm builds the BDD in a bottom-up fashion. Whenever a new solution is found, a BDD node is created for the leaf decision node and its associated search state. The new BDD node's parent nodes are created when the ATPG engine backtracks to higher levels in the decision tree and detects solutions at lower levels by using success counters. Figure 5 shows the solution BDD constructed in chronological order for the solutions found in the decision tree in Figure 1. It also shows the updating of the two success counters at each node. For example, in Figure 5a, when the first node is constructed, the left branch contains one solution (1), and the right branch contains no solution (0). The BDD grows gradually from a single node in Figure 5a to a complete graph in Figure 5f. When the last node (the root node in Figure 5f) is constructed, the sum of its two success counters is the total number of solutions (12 + 12 = 24) in this BDD. When the algorithm detects a search state equivalence and reuses this knowledge, subroutine update_solution_BDD() links the current decision node to an existing BDD node, instead of creating a new BDD node, as the dashed edges in Figure 5 illustrate. This reduction saves significant space for storing the overlapping portion of solutions.

What decides the variable ordering of this free BDD? The ATPG decision procedure implicitly decides the variable ordering when it picks a new decision variable, which is dynamically generated by subroutine get_obj() at step 3 in function success_driven_podem() in Figure 3. Function get_obj() is a standard ATPG function that uses controllability and observability as heuristics to find next-decision PIs through backtracing along

*X*-paths. Unlike SAT solvers, which choose the variable that directly satisfies the largest number of clauses, ATPG's decision variable selection is guided more by the structural information of the circuit and the objective. Researchers have shown that such testability-based heuristics often yield a good variable ordering for constructing compact shared ROBDDs.^{1}Experimental Results

We implemented the success-driven-learning algorithm together with a basic PODEM in 5,000 lines of C code. We conducted ATPG experiments on a 1.7-GHz Pentium 4 machine with 512 Mbytes of RAM and a Mandrake Linux 8.0 operating system. We designed the experiments to evaluate the proposed method's effectiveness and compare its performance with Bingo, a Colorado University Decision Diagram-based BDD package. We conducted the Bingo experiments on a Sun Ultra-1 200-MHz machine with 512 Mbytes of RAM.

In the first set of experiments, we computed one-cycle preimages for some properties of ISCAS benchmark circuit s1423, which has 74 flip-flops and 754 gates. The properties we used are random conjectures of 10 state variables. Therefore, each property actually specifies a set of target states. Given a set of target states specified by such a property, we applied the success-driven-learning algorithm to compute all preimages that can reach any of these target states. We also imposed the constraint that the preimages must satisfy the same property. This is a typical step for checking liveness properties in model checking, where every state in the path must include such a property.

Table 1 reports results for four s1423 properties. The "No. of solutions" columns report the number of solution cubes found by the ATPG engine. If the ATPG engine cannot exhaust all solutions within the 100,000 backtrack limit (for example, for properties 1, 3, and 4 with the original ATPG method), the "No. of solutions" columns reflect the maximum number of solutions the ATPG engine can enumerate within that backtrack limit.

Table 1. Computing preimages for s1423 (74 flip-flops, 754 gates).

Table 1 shows that success-driven learning can significantly reduce the computation necessary to exhaust all solutions. For example, for properties 1, 2, and 4, the original ATPG exhausted all 100,000 backtracks while finding only a very small subset of all the solution cubes. On the other hand, ATPG with success-driven learning computed the preimage completely in fewer backtracks and shorter execution times. The BDD sizes for storing all these solutions were also very small.

Next, we computed one-cycle preimages for a larger benchmark circuit, s5378, with 3,043 gates and 179 flip-flops. Table 2 shows the results. This time, we compared three methods. Again, we imposed a backtrack limit of 100,000.

Table 2. Computing preimages for s5378 (179 flip-flops, 3,043 gates) and s38417 (1,636 flip-flops, 23,950 gates).

Table 2 shows that compared with the original ATPG, success-driven learning significantly reduced execution time (a speedup of about two to three orders of magnitude) for finding all solutions, and was more memory efficient than Bingo. For example, for property 2, both the original ATPG and success-driven learning found all 7,967 solutions; however, the original ATPG took more than 18,000 backtracks and 22.6 seconds, while success-driven learning took only 200 backtracks, 0.42 seconds, and 10 Mbytes of memory to do the job. Bingo took 14.7 seconds and 27 Mbytes of memory. Bingo's BDD size is 230 nodes, whereas our general BDD contained 139 nodes.

For property 6, the original ATPG could not exhaust all solutions; it found 22,266 solutions before exceeding the 100,000 backtrack limit and took 60.49 seconds. ATPG with success-driven learning enumerated all 67,379,200 solutions in 1.03 seconds, using only 806 backtracks and 739 nodes to represent these solutions. Note that we didn't need a separate graph traversal to obtain those solution numbers—we needed only to add the two solution counters in our BDD's root node because they were synchronized to reflect the number of total solutions in the subtrees. Also, although ATPG trades time for space, our ATPG's performance (less than 2 seconds on a 1.7-GHz Pentium machine) was on the same order as Bingo's (about 14 seconds on a 200-MHz Sun UltraSparc). For all 10 properties, our ATPG completed the preimage computation using 10 Mbytes of memory for each property, whereas Bingo required 27 Mbytes for each property.

Finally, we performed a similar experiment for a property of circuit s38417, which contains 1,636 flip-flops and 23,950 gates. The bottom row of Table 2 shows the results. For this circuit, the 512-Mbyte memory limitation prevented Bingo from constructing the transition function, whereas our method successfully enumerated all 129,171,456 solutions within 0.77 seconds, using only 187 Mbytes of memory.

Conclusion

We believe our success-driven-learning ATPG method has the potential for solving large-scale hardware verification problems. To extend this work, we need an efficient way to iterate the preimage computation procedure in multiple cycles. Also, the method must address fixed-point checking, which is usually accomplished through a separate SAT check. Our ultimate target is a new unbounded model-checking framework based on ATPG.

We thank Koichiro Takayama at Advanced CAD Research, Fujitsu Labs of America, for providing the BDD experimental results. Our work was supported in part by NSF grant CCR-0196470 and by the New Jersey Commission on Science and Technology.

#### References

**Shuo Sheng**is a research and development staff engineer in the Design for Test Division of Mentor Graphics. His research interests include ATPG and formal verification. Sheng has a BS from Huazhong University of Science and Technology, Wuhan, China; an MS from Tsinghua University, Beijing, China; and a PhD in computer engineering from Rutgers University. He is a member of the IEEE and Sigma Xi.

**Michael S. Hsiao**is an associate professor in the Electrical and Computer Engineering Department of Virginia Tech. His research interests include design verification, testing, and architectural issues of digital systems. Hsiao has a PhD in electrical engineering from the University of Illinois at Urbana-Champaign.

| |||