Chair of Computer Architecture

University Freiburg

Lecture 05

Simple Translation of Formula into CNF

Formula formula2cnf_aux(Formula f) {
  if (is_cnf(f))
    return f;
  if (op(f) == AND) {
    l = formula2cnf_aux(left_child(f));
    r = formula2cnf_aux(right_child(f));
    return new_node(AND, l, r);
  } else {
    assert(op(f) == OR);
    l = formula2cnf_aux(left_child(f));
    r = formula2cnf_aux(right_child(f));
    return merge_cnf(l, r);
  }
}

formula2cnf_aux: This function recursively processes the formula. If the formula is already in CNF, it returns it. If the operation is AND, it recursively converts the left and right children to CNF and combines them with an AND operation. If the operation is OR, it processes the left and right children similarly but combines them using the merge_cnf function.

We assume that the input is in NNF. If there is an AND, then we take two conjuncts and calculate the CNF recursively. If there is an OR, then we compute the left and right child, and then we OR the two new CNFs.

l might be quadratic in left_child

r might be quadratic in right_child

With every recursive call where we hit OR, we square the size of the formula, which is even worse than exponential.

img/Untitled.png

Trivial case, we’re already done

img/Untitled%201.png

Here it gets more complicated, and we must use the algebraic distributivity law.

img/Untitled%202.png

In general:

Untitled

As a shorthand, we leave out the $\wedge$ symbol.

img/Untitled%204.png

img/Untitled%205.png

Here we have taken $l$ and $r$ from above and created a CNF, which expanded quadratically in size.

Merging two CNFs

Formula formula2cnf(Formula f) {
  return formula2cnf_aux(formula2nnf(f, 0));
}

Formula merge_cnf(Formula f, Formula g) {
  res = new_constant_node(TRUE);
  for (c = first_clause(f); c; c = next_clause(f, c))
    for (d = first_clause(g); d; d = next_clause(g, d))
      res = new_node(AND, res, new_node(OR, c, d));
  return res;
}

For binary tree-like formulas with ANDs and ORs, the algorithm has a worst-case time complexity of $\mathcal{O}(2^n)$, where $n$ is the depth of the tree. This is because, for every OR operation, we are squaring the size of the formula, leading to exponential growth.

img/Untitled%206.png

In practice, one way to potentially reduce the time and space complexity is to use a Directed Acyclic Graph (DAG) representation of the formula instead of a tree. This would allow sharing of common sub-formulas and could significantly reduce the formula size for certain input types. This, however, would require a more sophisticated implementation and is not represented in the provided pseudocode.

I will give you the contents of today’s lecture on SAT solving. Please create a “check your knowledge” section consisting of several questions that are meant for students to see if they have understood the material. You may also include small exercises and practical examples, if you think it is appropriate.

Today’s lecture was about:

img/Untitled%207.png

This computes the parity of 10 input bits. The function behind it is linear (if-then-else, the ternary operator). It is a BDD. There are nine if-then-else operators and nine negations. If you replace this with a tree, you get an exponential explosion and obtain $2^{10}=1024$ operators.

img/Untitled%208.png

img/Untitled%209.png

This is, again, an equivalence-checking task. Are the two circuits in these two boxes equivalent?

This thing is called a Miter. From “Satisfiability and Verification - From Core Algorithms to Novel Application Domains:”

Untitled

There was a paper about proof complexity, and the Tseitin transformation was a side note. This is one of the most important tricks for encoding problems into SAT. The basic idea is to use auxiliary variables; every sub-expression gets its variable.

The variable $o$ is an XOR, and it will only produce True if and only if its inputs differ. The constraints until pink $o$ are just setting up the semantics of the whole thing. It is satisfiable. The pink $o$ makes the real problem interesting. By setting this, the formula is only satisfied if the outputs of the yellow boxes are different.

The formula on the right does not have the same meaning as the circuit because it has additional variables. You can falsify the right formula by setting $a=c=\bot$, $x=\top$, while on the left, there is only $a, b, c$ , and $o$ ($\bot$ means False and $\top$ means True). If you project the formula on the right down to the inputs $a, b, c$ , and the output, they are equivalent.

We call this a “satisfiability preserving translation”: If we solve a newly generated problem, we can get a solution of the original one and vice versa. However, the logical formulas are not the same, which is why it is called “transformation.”

Transforming this formula to CNF, we only have to take care of individual parts because there is an AND between all individual parts.

img/Untitled%2011.png

img/Untitled%2012.png

Following is a demonstration of why Tseitin is useful. We used this website.

Untitled

It continues to go down. There are 256 clauses in total.

Untitled

Untitled

Untitled

Introducing a Tseitin variable, we have reduced the number of clauses from 256 to 24.

img/Untitled%2017.png

Untitled

A Tseitin encoding would introduce variables $x$, $y$ , and $z$. For the lower-left part, we obtain

\[(\lnot a \vee y) \\ (\lnot b \vee y) \\ (\lnot y \vee a \vee b),\]

which we leave to the reader to verify as an exercise. So in total, we obtain nine clauses and seven variables.

Untitled

The CNF now looks like this:

\[(\lnot x \vee a \vee b \vee c \vee d) \\ ( \lnot a \vee x) \\ (\lnot b \vee x) \\ (\lnot c \vee x) \\ (\lnot d \vee x),\]

and we have only five variables, as opposed to seven.

Untitled

This optimization can get complicated when a variable is used in multiple contexts, causing some loss in the simplicity of the transformation.

All optimizations discussed above can be simulated by pre-processing on a CNF, which makes their use questionable.

img/Untitled%2021.png

Untitled

An effective approach to handling the theory of floating point is to reduce it to the theory of bit-vectors. Bit-blasting is a method used to convert a high-level problem into an equivalent low-level problem that a SAT solver can handle. In this case, the high-level problem is the addition of two 4-bit numbers, and the low-level problem is a series of Boolean operations that a SAT solver can process.

Two 4-bit numbers ($x$ and $y$) are added to produce a 4-bit result. Each of these 4-bit numbers is represented as a bit vector.

The operation is broken down into a series of full adder operations. A full adder is a digital circuit that takes three bits (two bits to add and a carry-in bit) and produces two output bits: a sum bit ($s$) and a carry-out bit ($o$). This is done using XOR and AND logical operations. XOR is used to calculate the sum bit, while AND and OR operations are used to calculate the carry-out bit.

The full adder operation is then applied bit by bit, from the least significant bit 0 to the most significant bit 3. A carry-in bit ($c$) is passed from the previous full adder operation for each bit position. The initial carry-in bit ($c_0$) is false as no previous operation exists. The sum bit and carry-out bit are then computed for each position.

The output of this process is the 4-bit sum of the two input numbers, bit-blasted into a series of logical operations that a SAT solver can handle.

This technique is commonly used in formal verification, where checking the correctness of digital circuits or software is essential. By bit-blasting a problem, applying powerful SAT solver techniques to problems that might otherwise be difficult to handle becomes possible.

img/Untitled%2023.png

XOR as AIG

img/Untitled%2024.png

img/Untitled%2025.png

Check Your Knowledge

True or False