Lecture 14
Inprocessing started around 2010; preprocessing around 2006. Techniques from the '90s are considered classical and are of a more historical nature now. Both methods presented here did not work on CNF.
Stalmarck’s method works on a sea of triplets. A triplet connects a variable on the left-hand side to an operation over other variables on the right, similar to static single assignment in compilers. You describe the whole formula with such triplets. One could also use other domains, like integers or polynomials. The basic idea is to split the whole representation into triplets. This gives a functional representation (the right-hand side is computed and should be the same as the left-hand side), but it could be cyclic: the $z$ here could be used recursively in itself, and, for example, $e$ depends on it. So it’s not a circuit. It might be cyclic.
On the bottom, these guys had circuits.
In this case: recursive learning was working on the circuit, and Stalmarck’s procedure was working on triplets on the right. So far, we’ve only worked on CNF on the bottom. If we consider abstraction, CNF might be the most abstract because it can simulate others. The circuit cannot be cyclic (we could theoretically plug in the wire from end to beginning, but that’s not what recursive learning has in mind). In triplets, it’s simple: just put recursive dependency there.
Look at three representations: you might have different ideas of optimization and preprocessing there. So what kind of optimization is best for which kind of preprocessing? Most techniques work on CNF, but not all. And not clear how to use some techniques on CNF.
BCP over (in)equalities
What kind of simplification do we have on CNF? We had unit propagation, and we’ll see some other optimizations. If you have triplets, the only rules you have are the ones shown on top.
You have assignment $x=0$ and one triplet $z=(x \vee y)$. Then you get $z=y$.
The other way around, $x=y$ is not a triplet. It’s really an (in)equality, because $y$ could be a negative literal, e.g., $\lnot a$. Then you can also plug this information into a triplet: if, for instance, the right-hand side is $x \vee y$ and you know the equality $x = z$, then you get out $z \vee y$. If you had $x=0$ and the triplet $z = (x \oplus y)$, you’d get $z=y$.
You might plug in constants at the outputs or equalities between output and input. If you take a paper and write down all the rules, you’ll fill two pages, which is quite a bit.
This is the idea of triplet propagation. It has two things that are different.
- You have all different operators (e.g., complicated operators, XOR, if-then-else), and
- it can already natively not just propagate over constants (like $x=0$). You can also propagate if two variables are the same.
There’s something similar in SMT solving called the congruence closure algorithm, which can do this quickly. The claim is that you can do this constant propagation (which you can also do on CNF), and this equality propagation, at almost the same speed.
Biere’s feeling: this is a local minimum; one cannot do much more. If you add more propagation, it will be more expensive to implement. That’s why Stalmarck’s procedure is a pretty instructive algorithm.
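A few of these propagation rules can be sketched in code. This is a minimal illustration, not Stalmarck’s actual rule set (which fills two pages); the tuple-based triplet representation, the `env` map of known constants/equalities, and all function names are assumptions of the sketch.

```python
# Literals are nonzero ints; negation flips the sign. Constants are the
# strings TRUE/FALSE. env maps a literal to a constant or another literal.
TRUE, FALSE = "T", "F"

def simplify_or(z, a, b, env):
    """Simplify the triplet z = (a | b) under the known (in)equalities."""
    a, b = env.get(a, a), env.get(b, b)
    if a == FALSE:
        return ("eq", z, b)        # z = (0 | b)  ==>  z = b
    if b == FALSE:
        return ("eq", z, a)
    if a == TRUE or b == TRUE:
        return ("eq", z, TRUE)     # z = (1 | b)  ==>  z = 1
    if a == b:
        return ("eq", z, a)        # z = (a | a)  ==>  z = a
    if a == -b:
        return ("eq", z, TRUE)     # z = (a | ~a) ==>  z = 1
    return ("triplet", z, "or", a, b)

def simplify_xor(z, a, b, env):
    """Simplify z = (a ^ b); only two of the many rules, for illustration."""
    a, b = env.get(a, a), env.get(b, b)
    if a == FALSE:
        return ("eq", z, b)        # z = (0 ^ b)  ==>  z = b
    if a == b:
        return ("eq", z, FALSE)    # z = (a ^ a)  ==>  z = 0
    return ("triplet", z, "xor", a, b)
```

With `env = {1: FALSE}`, the triplet `z = (x | y)` written as `simplify_or(3, 1, 2, env)` collapses to the equality `z = y`, exactly the $x=0$, $z=(x \vee y) \Rightarrow z=y$ rule from the lecture.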
Structural rules
Structural rules are just hashing. Remember, we had AIGs and structural hashing, automatic detection of isomorphic subgraphs, and common subexpression elimination. Sometimes, it’s also called hash consing.
If you have two expressions, like $a \vee b$ here, and it is used twice, then we know that $x$ has to be equal to $y$. This may in turn make the right-hand side of another triplet equal to that of yet another one, and then you get equality of two more outputs. That’s why it’s called hash consing.
This was more of a hack. BCP over (in)equalities was very natural; it’s like two pages for all operators one can think of. Are there more? They added the structural rules on top, and it’s very natural because it occurs in many places, and structural hashing is important.
But then there’s also a generic semantic one, the first one in literature where this is done. The patent is mostly about this thing. Biere thinks it’s ridiculous to patent something like that.
Test rule
Intel bought the engine from the people who patented it, but used only the first two rules. The third one has a kind of quadratic or cubic flavor, which is not feasible on a multi-million-gate circuit, while the first two can be implemented so that they are almost linear in the size of the whole circuit. So it’s a good preprocessing step: you read the problem, apply these rules, and it costs about as much as just reading the circuit.
So what’s the test rule? We need to think about states. Think back to the DP solver: what’s the algorithm’s state? We had a control stack, this trail, but one important part which was blue: array of values. Map of variables to 0, 1, or undefined. What is the state here? It’s the same partially because we can assign variables to true or false. However, we can also have that the two variables are the same. That’s why this is capital $E$: it’s the assignment of variables to constants or other literals.
What does the test rule do? You have some global state $E$ (equalities). Then you assume two cases, $x=0$ and $x=1$. In each case we apply the rules on top (1 & 2) and get new equalities; the old ones are semantically still valid.
Semantically, you keep the old equations and derive new ones: here under $x=1$, maybe more; here on top under $x=0$, maybe we derive $z=y$, so that’s now in $E_0$. What does the test rule in Stalmarck’s paper say? Do the case split, derive something in both cases, and keep globally what was derived in both — the intersection of the two equality sets. You do a case split, check what holds in both cases, and if you find something, you can “lift it” outside of the case split. And you can do this recursively.
If you apply this rule with one more recursion level inside each branch, for all variables, until nothing changes, that is called 2-saturation, because you go two case-split levels deep. If you only go one level deep — try all variables $x$ and continue until nothing changes — it is called 1-saturation.
On top (BCP over (in)equalities), it’s called 0-saturation because you don’t apply the test rule at all. This one is used at companies because it’s very fast — almost linear time.
These structural rules (implemented with a hash table) are not strictly necessary, because they are subsumed by rule 3. But rule 3 is expensive, because you need to go over all $x$s. Rule 2 is almost constant time per application — accumulated, linear in the size of the formula. That’s why you add this rule explicitly: if you have a very generic rule (3) but there is an instance of it (2) that you can implement way faster, then doing (2) explicitly may be better because you get fast results.
The structural rule for $\wedge$: let’s derive this rule by applying the test rule. Here we have two cases. We have to apply the test rule on the inputs; on the outputs it will not work.
In logic, we’d have the equivalence class {0, a, x, y} (on the right). Internally, you’d pick a representative and would put pointers to 0.
$a$ and 1 are in one equivalence class, $b,x,y$ are in another equivalence class.
Implementation on a computer: you have your table of variables, and each has a pointer to its representative. Initially, they point to themselves, but as soon as you add an equality, you change this data structure. What if, for instance, you have such a connection to $z$, then $y=v$, and now you’d say $x=y$? How do you do this? You have an operation called “merge,” in the literature “union.” When we do this, we first find the representative ($z$ is the representative of $x$).
The representative of $z$ is $w$. Doing this traversal, we redirect the pointers directly to $w$ (called “pointer jumping”): while you traverse the list to find the representative, you go down again and overwrite the pointers (you go over the list twice).
Then you pick the smaller one of the two classes. Suppose $v$’s is smaller. Then you put a pointer from $v$ to $w$; the pointer from $y$ to $v$ can be deleted. This algorithm lets us work with equivalence classes — with partitions — extremely fast. It is called “union-find.”
The complexity of union-find is strange: $\log^*$, which grows much slower than $\log n$. This algorithm works on billions of elements because of its very low complexity. It lets us implement merging equivalence classes, intersecting them, and all the operations we had on the slide.
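The union-find structure just described can be sketched as follows — a minimal version with the two-pass pointer jumping in `find` and smaller-class-points-to-bigger in `union`; the class and method names are the usual ones, not from the lecture slides.

```python
class UnionFind:
    """Union-find with path compression ("pointer jumping") and union-by-size."""

    def __init__(self, n):
        self.parent = list(range(n))   # initially every node is its own representative
        self.size = [1] * n

    def find(self, x):
        # First traversal: find the representative.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second traversal: redirect all pointers directly to the representative.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:   # smaller class points to the bigger one
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
```

Merging the classes of two variables and asking for their representatives then works exactly like the merge/union operation on the slide.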
How do we merge $E_1$ and $E_0$? We go through $E_1$, through all its equivalence classes. In the first one, $a=1$; then we look up whether $a$ is also equal to 1 in $E_0$, and we see it is not, because there $a$ equals 0 and 1 is not even in the class. For the second one, $x=b$ and $y=b$ hold in both, so $x=y$. That’s why we add $x=y$ here, which shows that the test rule subsumes the structural rule.
Now we have $E \models x=y$.
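The intersection step of the test rule — keeping only equalities that hold in both branches — can be sketched with representative maps. Modeling $E_0$ and $E_1$ as functions from variable to representative is an assumption of this sketch, not the lecture’s data structure.

```python
def intersect_equalities(rep0, rep1, variables):
    """Intersection of two equivalence relations, each given as a function
    mapping a variable to its representative. Two variables remain equal
    only if they are equal under BOTH relations (the test rule)."""
    classes = {}
    merged = {}
    for v in variables:
        key = (rep0(v), rep1(v))              # same class in both cases?
        merged[v] = classes.setdefault(key, v)  # first variable seen with this key
    return merged  # maps each variable to its representative in the intersection

# x = y holds in both E0 and E1, but a = x only in E0:
E0 = {"x": "x", "y": "x", "a": "x"}
E1 = {"x": "x", "y": "x", "a": "a"}
merged = intersect_equalities(E0.get, E1.get, ["x", "y", "a"])
assert merged["x"] == merged["y"]   # x = y survives the intersection
assert merged["a"] != merged["x"]   # a = x does not
```

This mirrors the example above: $x=b$, $y=b$ hold in both branches, so $x=y$ is lifted out of the case split.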
You could not afford a quadratic algorithm on a circuit with 10 million gates. What’s shown here works on large instances because of its low complexity.
Union-find is also the idea behind congruence closure algorithms which is one of the most important parts of SMT solvers.
$k$ is the recursion depth of applying these rules. The idea is that you have a simple rule and limit its application to a certain depth — a general idea. It’s costly here: for $k=2$, it’s almost impossible for big instances with more than 1000 or 10000 gates or variables. In industrial usage, only 0-saturation is used, meaning the test rule is not used.
Learning can mean different things to us:
- ML domain
- Clause learning
- Here we have another usage of the term “learning.”
This circuit has 2 OR gates which feed into a common AND gate. This is a circuit that is not optimized.
What is learning? If, for instance, you put a true here (on the very left), then everything becomes true. This is called “learning”; learning was called “propagation” in the old days. Recursive learning: what happens if we plug in 0? We’re stuck, because we cannot propagate — these ORs do not propagate. The same is true for plugging in 0 at the output.
If we put in 0 at the end, then we only know that one of the two OR outputs to the left is 0, but we don’t know which one.
The basic idea now: same as Stalmarck, do a case split. We have cases 0 and 1.
If we look at the upper half, because 0 is the output of an OR, we have to put 0 to both its inputs too. For the 2nd test case, it’s symmetric.
So what did we do with Stalmarck? We take the intersection between these two cases, which is the yellow circle on the very left, and that has to hold no matter what. We do the case split, and in both cases, we get this value. That means the 0 on the right implies the 0 on the left. That’s what they called “recursive learning.”
This is particularly useful if you have redundancies in circuits or if you do some testing.
You could also do this for CNF. Exercise: do a Tseitin encoding of this thing and try to do the same on CNF as we did here on a circuit.
We’d need to give names. So we have nine clauses plus one unit clause for output. What we want to derive: $\bar{o} \Rightarrow \bar{b}$
Jan-Friso Groote: this is what his solver did (his preprocessor). He applied it to railway interlocking systems.
You can apply the same intuition here that we now saw already twice, this intersection. It’s explained on the slide “Recursive Learning” a little bit.
Remember, the DP procedure was the one where you do elimination. However, we don’t do it completely: if a variable has $n$ positive and $n$ negative occurrences, you get up to $n^2$ resolvents, so every time you eliminate a variable, your formula can grow quadratically. This happens in practice. The proposal: only eliminate if it doesn’t add more clauses than it removes.
Look at the CNF on the left. It has five clauses containing $x$ (we’re looking only at clauses in $x$ right now; everything else stays untouched). What does DP do? It computes all resolvents (on the right), takes out the clauses with $x$, and replaces them with the resolvents. This is exactly Fourier–Motzkin elimination: we eliminate a variable by combining its clauses in all ways.
The number of produced clauses is limited here. We go over all variables and only eliminate if the variable produces fewer resolvents than the number of clauses in which it occurs. Here, the variable occurs five times, and in principle you’d get six resolvents, so you would not eliminate it, because that would increase the size of the formula.
However, if you resolve clauses 1 and 4, the resulting resolvent is tautological and can therefore be dropped. The same applies to 2 and 4.
Clauses 1, 2, and 4 are just the Tseitin encoding of an AND gate. If you take the Tseitin encoding of a gate and resolve its clauses against each other, you always get tautological clauses. That means the Tseitin encoding already helps this idea of bounded variable elimination: the gate clauses resolved against each other all produce tautological resolvents; only resolving them against the clauses in which the gate output occurs elsewhere yields non-tautological resolvents. That’s also the reason why it is very common that many variables can be removed.
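The bounded-variable-elimination check can be sketched as follows — a minimal version, assuming clauses are sets of DIMACS-style integer literals; the function names are illustrative, not from any solver.

```python
def resolve(c, d, pivot):
    """Resolvent of clauses c (containing pivot) and d (containing -pivot).
    Returns None if the resolvent is tautological."""
    res = {l for l in c if l != pivot} | {l for l in d if l != -pivot}
    if any(-l in res for l in res):
        return None                          # tautology: drop it
    return frozenset(res)

def try_eliminate(cnf, var):
    """Eliminate `var` only if the number of non-tautological resolvents
    does not exceed the number of clauses removed; otherwise return None."""
    pos = [c for c in cnf if var in c]
    neg = [c for c in cnf if -var in c]
    limit = len(pos) + len(neg)
    resolvents = []
    for c in pos:
        for d in neg:
            r = resolve(c, d, var)
            if r is not None:
                resolvents.append(r)
                if len(resolvents) > limit:  # early abort: formula would grow
                    return None
    rest = [c for c in cnf if var not in c and -var not in c]
    return rest + resolvents
```

On the Tseitin clauses of an AND gate $x = a \wedge b$, all resolvents on $x$ are tautological, so the gate variable disappears for free — exactly the observation above.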
This has 3 million variables. This is from a French company that checks the interlocking systems of the French railway systems. Task: you want to give signals such that no trains are in the same part.
The p option is turned on here, and it’s pretty dramatic here.
- 1st line: they have a pretty redundant encoding because it’s automatically generated. You read some description of a railway track, then units are put in, and then over 30% of the variables go away by propagation.
- 2nd line: equivalent to literal substitution, tries to find components in the implication graph and removes 5% of the variables twice here.
- 3rd line: there are redundant binary clauses
- 4th line: do probing (not effective)
- 5th line (e): we see what variable elimination does for us (this is the variable elimination we just discussed). It also takes quite some time, 20 seconds here.
This takes an hour or so, the smallest they sent us. If you let it run for an hour, it goes down to 9% of variables through inprocessing and then solves it. It’s unsat. So the trains cannot be on the same track.
The reason why there’s such a reduction is because they have just a default Tseitin encoding.
These are 1 162 410 variables, and this variable elimination brought it down from 60 to 18, so millions of variables were removed.
How do you try the variables? You enumerate the resolvents of a variable and stop as soon as you hit the limit (the count of how often the variable occurs), because then you cannot eliminate it. This raises the question in which order you try the variables. Originally, this was done with a priority queue. In the latest solvers, Biere switched this off (possibly for many other reasons).
For a pure preprocessor (not inprocessing), you probably want to do this: you have a priority queue where you put in variables and, in essence, give the priority by the number of occurrences.
Variables which occur rarely have a higher chance of being eliminated, and whenever you eliminate a variable, all variables in the removed clauses should be tried again (if not yet eliminated), because their occurrence counts just dropped. That is the implementation which lets you eliminate 2 million variables in 20 seconds ⇒ reducing the search space by a factor of $2^{2\,000\,000}$.
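The scheduling idea can be sketched as a priority queue keyed on occurrence counts. This is a sketch under stated assumptions: `try_eliminate` is assumed to be a function returning the new CNF if eliminating the variable pays off, or `None` otherwise; stale heap entries are handled by re-checking the count on pop.

```python
import heapq
from collections import Counter

def schedule_eliminations(cnf, try_eliminate):
    """Try variables with the fewest occurrences first; after a successful
    elimination, re-queue the variables whose occurrence counts dropped."""
    occ = Counter(abs(l) for c in cnf for l in c)
    heap = [(n, v) for v, n in occ.items()]
    heapq.heapify(heap)
    while heap:
        n, v = heapq.heappop(heap)
        if occ.get(v, 0) != n:
            continue                    # stale entry; a fresher one is queued
        new_cnf = try_eliminate(cnf, v)
        if new_cnf is None:
            continue                    # would grow the formula; skip for now
        # Variables sharing a clause with v: their counts just dropped.
        touched = {abs(l) for c in cnf if v in map(abs, c) for l in c} - {v}
        cnf = new_cnf
        occ = Counter(abs(l) for c in cnf for l in c)
        for u in touched:               # give them another chance
            heapq.heappush(heap, (occ.get(u, 0), u))
    return cnf
```

Recomputing all counts after each elimination is of course too slow for real solvers, which update the counts incrementally; the point here is only the ordering and re-queueing.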
Something important in this context: if you do this DP-style elimination, a lot of redundancy is generated. Think of gates that somehow encode the logic of these trains, or the execution of a C program, etc.; the elimination is completely dumb and will produce clauses that are redundant in a certain sense. That’s why we combine it with subsumption of clauses and something called “strengthening” of clauses.
Why can’t we also do the opposite? On top: eliminate variables limiting the number of clauses. So at most, we allow the same number of clauses to be added as removed. This is what we saw in this industrial example. But we can also ask the other way: can we reduce variables to reduce the number of clauses?
They had one answer to this question — the only one we have at this point. This is state of the art, but not used in state-of-the-art solvers because it has an issue. Look at the example: clauses pairing $a$, $b$, $c$ with $d$, and the same three with $e$. If you have this pattern, you can introduce an $x$ for these two parts: $x$ encodes that we’re in the left part ($d$), and $\bar{x}$ that we’re in the right part ($e$). That reduces from 6 to 5 clauses.
We’re going from 5 to 4 on top, and on the bottom from 6 to 5. What if we went from 5 to 5 in either case? That, of course, we would not allow: we only apply this if it reduces the number of clauses.
$a,b,c,d,e$ could be whole clauses.
You need to find this pattern in your CNF, and if you find it, you introduce a new variable and do what we just discussed. But why is this not in state-of-the-art solvers? Very simple: where do you put this variable $x$? We’re in the middle of the SAT solver and need to introduce new variables. Then we might simplify the CNF by some other means, and the variable might disappear. The prime objects of our CNF are variables and clauses; now we’re inventing new ones and throwing them away. We’d need to garbage-collect these variables, which is tricky to implement, particularly with incremental SAT solving (which we haven’t discussed).
Practical usage of this algorithm: they applied it to a Bioinformatics problem and were able to shrink down the formula. The people from Bioinformatics did a dumb at-most-one encoding, though.
This is an at-most-one constraint. They occur very often. The naive encoding is quadratic.
Commander encoding on the right; these are like 16 variables, with an at-most-one constraint on the rows. So you get a recursive formulation. The commander encoding is not $n^2$ anymore.
Example bioinformatics: what the tool is doing here is taking the naive encoding and producing, in essence, a better encoding of roughly square-root-sized groups, similar to the commander encoding.
The bioinformatics had at-most in there all the time, and it was badly encoded. This technique recovered and optimized the encoding, they didn’t even do this on purpose, but it was a side effect.
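The two at-most-one encodings can be contrasted in a short sketch. The group size and the `fresh` generator of new variable ids are assumptions of this sketch; the pairwise encoding is standard, and the commander version follows the grouping idea described above.

```python
from itertools import combinations

def amo_pairwise(xs):
    """Naive at-most-one: one binary clause per pair, quadratic in len(xs)."""
    return [[-a, -b] for a, b in combinations(xs, 2)]

def amo_commander(xs, fresh, group=4):
    """Commander-style at-most-one: split into groups, give each group a
    commander variable, and recurse on the commanders."""
    if len(xs) <= group:
        return amo_pairwise(xs)
    clauses, commanders = [], []
    for i in range(0, len(xs), group):
        chunk = xs[i:i + group]
        c = next(fresh)                       # fresh commander variable
        commanders.append(c)
        clauses += amo_pairwise(chunk)        # at most one inside the group
        clauses += [[-x, c] for x in chunk]   # a true member forces its commander
    return clauses + amo_commander(commanders, fresh, group)

fresh = iter(range(100, 200))                 # unused variable ids (assumption)
naive = amo_pairwise(list(range(1, 17)))      # 16 variables: 120 clauses
better = amo_commander(list(range(1, 17)), fresh)
```

For 16 variables, the naive encoding needs 120 binary clauses, while the commander version stays well below that, which is exactly the kind of recovery the bounded-variable-addition technique achieved as a side effect.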
We also run the s option here, which is the subsumption of clauses, before elimination starts. The number of original clauses goes down only slightly by subsumption. But during variable elimination, we also apply subsumption all the time: whenever we get a new clause, we check whether it can subsume other clauses.
Self-subsuming resolution
Definition of subsumption: a clause subsumes another one if it is shorter and all of its literals are contained in the other one.
Consider the clauses
\[a \vee c \quad (1)\]
\[a \vee \bar{c} \vee d \vee f \quad (2)\]
\[a \vee \bar{c} \vee g \quad (3)\]
Resolution of (1) and (2) gives $a \vee d \vee f$ (4)
Resolution of (1) and (3) gives $a \vee g$ (5)
Any satisfying assignment of (4) will also satisfy (2), and any satisfying assignment of (5) will also satisfy (3). So we can discard (2) and (3).
Also, read this.
Fast (Self) Subsumption
Now the problem is: how do you do this for many clauses, like 3 million? In the previous demonstration, this didn’t even take a second.
So if you have 2 million clauses, you have to check among the other 2 million clauses if there’s one which is subsumed. How do you do this fast?
First, you connect all literals — a “dense representation” of the CNF: a container for each literal holding references to all clauses in which it occurs. We need this anyway for variable elimination. Then, to check whether one clause subsumes another, you take the smaller clause and try to find bigger clauses in which it is contained. This is called “backward subsumption”: take the smaller clause and search for bigger clauses.
Forward subsumption does the opposite: you get a new clause, e.g., a resolvent from your variable elimination, and check whether another clause already in the CNF subsumes it — then you don’t need to add it. In first-order theorem provers, forward subsumption is considered costly while backward subsumption is considered cheap.
- Backward: take a small clause and search for a bigger one
- Forward: take a big clause and search for a smaller one
The key insight to make it fast: when you try to find clauses that are subsumed by a candidate clause, you only need to search the occurrences of one of its literals. Pick the literal with the fewest occurrences: go over the candidate clause, check how often each literal occurs, pick the least frequent one, and then go through the clauses in which that literal occurs. For each of those clauses, check: does the candidate subsume it? There are two variants of this check:
- 1st variant: mark all the literals of the candidate clause; then, going through the other clause, check whether you find all $n$ literals in it ($n$ is the size of the candidate clause). You can improve this with multiple optimizations, e.g., Bloom filters: hash literals into bits, and then use bit masks to rule out subsumption quickly.
- 2nd variant: you sort the clauses. The advantage: your candidate clause is sorted and the other clause is sorted, so you can do a merge (as in merge sort) with early abort: as soon as you have to jump over a literal of the smaller clause — i.e., it is not in the other one — you can stop.
That’s why you can do a preprocessing pass of subsumption checks. You sort all clauses (sorting means the literals within each clause — think of DIMACS numbers sorted increasingly). That makes it feasible. Before Biere and others published the paper, people thought full subsumption was not feasible; with this trick, it is.
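The backward-subsumption check with the least-occurring-literal trick and the marking variant can be sketched as follows; the occurrence-list layout (literal → clause ids) and the function name are assumptions of this sketch.

```python
def backward_subsumed(candidate, occurrences, clauses):
    """Return ids of clauses subsumed by `candidate`.
    `occurrences` maps a literal to the ids of clauses containing it;
    `clauses` maps an id to its set of literals."""
    # Pick the literal of the candidate with the fewest occurrences:
    best = min(candidate, key=lambda l: len(occurrences.get(l, ())))
    marked = set(candidate)                   # "mark all the literals" variant
    subsumed = []
    for cid in occurrences.get(best, ()):
        other = clauses[cid]
        if other is candidate:                # don't report the clause itself
            continue
        if marked <= other:                   # every marked literal was found
            subsumed.append(cid)
    return subsumed
```

Only the occurrence list of one literal is scanned, which is why the least frequent literal is chosen: the shorter that list, the fewer full subset checks are needed.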
Now forward subsumption: how can you get its effect using only backward subsumption? We have our 2 million clauses and want to find the 400 subsumed ones. How? We disconnect all clauses, start with the largest ones, and add them back to the CNF. Adding means connecting the literals: you put the clause into the containers of its literals. While you do this, you check — with the least-occurring-literal trick we just discussed — whether the clause subsumes other clauses. Because you start with the largest ones, you will always find the bigger subsumed clauses. So you’re rebuilding the connections of the CNF, and while doing that, you find all subsumed clauses. This way, backward subsumption implements what you see here. The question is: is it worth it? You spend one second to remove just a tiny fraction of clauses. This means you could, in principle, do only backward subsumption.
However, during variable elimination it may be really useful to know whether a clause you’re adding is already subsumed, because that could immediately allow better variable elimination. That’s why in actual solvers Biere started to implement forward subsumption too, and it turns out that this step (s) is done by forward subsumption because it’s faster. This goes back to a pattern from data mining [BayardoPanda-SDM’11]: they took the backward subsumption algorithm described by Biere and applied it to data mining, where you have very similar questions over “item sets.” Think of shopping baskets: if one basket contains everything another basket has, you may want subsumption elimination there too — remove one of the two.
So this forward version of subsumption is way better for what Biere calls “bulk subsumption,” the one just shown: you go over all the millions of clauses once and remove all subsumed clauses. At the same time, backward subsumption is good during variable elimination: when you get a resolvent, you already have full occurrence lists anyhow — you’re in dense mode, connected to everything — and you check for other clauses which the new one subsumes.
Biere claims that the fastest one out there is probably CaDiCaL, refined in Kissat. One key aspect is the line “monitor variables in added and removed clauses,” which can only be explained by discussing incremental SAT solving. So how do you schedule later attempts at subsumption?
You see, there was one after 13 seconds and another after 52 seconds.
The one at 52 seconds was faster. So which clauses do you want to check there? In essence, you only want to check clauses for forward subsumption whose variables have all been added since the last attempt. So you monitor in a bit whether a variable has been added or removed, and depending on that, you only schedule part of the clauses.
So this one here doesn’t need as much time as before. Whenever subsumption is mentioned, there’s a “self” in it. Why?
The literals of a clause are connected with OR. In essence, we’re removing $c$. The effect of this self-subsuming resolution is not to remove a clause but to remove a literal. This shortens clauses; we also call it “strengthening” or “shrinking.”
Do you recall where this happened in CDCL? There, we resolve on the highest decision level to get to the first-UIP clause, and this terminates only because self-subsuming resolution is guaranteed to happen: you stop as soon as the resolvent (starting from the conflicting clause) has only one literal from the highest decision level left, which is only possible if at some point resolution removes at least one literal from the resolvent. The general rule looks like this:
That’s the generic rule. How do you implement it? That’s an insight they had while writing these papers: when you check whether a clause subsumes another one, you take the candidate clause, find its least-occurring literal, go through all the clauses containing that literal, and check whether each is a superset of the candidate clause — by the merge idea, or by counting how many marked literals you find in the other clause.
Now do the following trick in the implementation: instead of marking literals (or sorting them), you look only at the variables — take the absolute value of each literal. That’s almost it. You’d then find clauses containing a superset of the variables of the candidate clause. But you only allow this if at most one literal is flipped.
So you do the whole thing with variables instead of literals, but you also count how often you had to flip a variable to continue matching. As long as this count stays at most one, you continue, and at the end you check:
- If you never had to flip the variable, then the clause is subsumed.
- If you had to flip exactly once, you can remove the flipped literal from the subsumed clause and strengthen it this way.
While we check whether $D$ subsumes $C$, we compare variables, not literals — but we allow this exception (a variable matched with flipped sign) at most once. If it happens twice, nothing is subsumed or strengthened. At the bottom, there’s the counter-example for that.
This is probably useful if you don’t have variable elimination. But with variable elimination, it just makes it worse.
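The combined subsume-or-strengthen check with the at-most-one-flip rule can be sketched directly; clauses are sets of integer literals, and the function name is illustrative.

```python
def subsume_or_strengthen(d, c):
    """Check whether clause d subsumes c, or self-subsumes (strengthens) it.
    Compare on variables: every literal of d must appear in c, with at most
    one allowed to appear with flipped sign."""
    flipped = None
    for l in d:
        if l in c:
            continue                       # literal found as-is
        if -l in c and flipped is None:
            flipped = -l                   # first (and only allowed) flip
        else:
            return None                    # variable missing, or a second flip
    if flipped is None:
        return ("subsumed", c)             # d is a subset of c: drop c
    return ("strengthen", c - {flipped})   # remove the flipped literal from c
```

On the earlier example, resolving $a \vee c$ against $a \vee \bar{c} \vee d \vee f$: the check reports that $\bar{c}$ can be removed, strengthening the bigger clause to $a \vee d \vee f$.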
What’s a blocked clause? Similar to an autarky, it’s a clause you can remove, because removing it does not change the satisfiability of the formula: if the formula was unsatisfiable before, it remains unsatisfiable after removing the clause, and if it was satisfiable, removing a clause cannot make it unsatisfiable. If the formula is unsat and thus admits an unsat proof, a blocked clause can be removed because it cannot usefully take part in a resolution on its blocking literal.
Definition via example: you have $a \vee b \vee l$. You need to pick one literal to show that it’s a blocked clause; one suffices — in this case, $l$. We call it the “blocking literal” (even though “blocking” has another meaning: there’s also a “blocking clause,” which differs from a “blocked clause”). So “blocked clause” means the clause has a blocking literal, the $l$ here. What does it mean for $l$ to block this clause? You go through your formula and check all possible resolution candidates with this clause on this literal: all clauses containing $\bar{l}$. Then you try to resolve them. So in this example, try to resolve the left clause with the two candidates on the right. Like in the example with variable elimination, every such resolution produces a tautology.
Another way of thinking: if you take the clause on the left and want to make it false — assigning all its literals to false — then the resolution candidates on $l$ are doubly satisfied: they all contain $\bar{l}$, and through $a$ or $b$ they get true a second time. In principle, you can flip the value of $l$, because those clauses are already satisfied by $\bar{a}$ and $\bar{b}$; after flipping $l$, the clause on the left also becomes true. The proof of why you can remove this clause is at the bottom.
Why is “doubly satisfied” an interesting concept? The clause on the left can only make a difference after removal if it is falsified by a satisfying assignment of the rest of the formula. So all the other clauses are satisfied, but the one on the left is not. How can we fix this? That was exactly the argument: because the clauses on the right are all doubly satisfied, we can flip $l$ — and this is the only place where flipping $l$ could matter, since those are all the clauses containing $\bar{l}$. On the right, flipping makes no difference because they stay satisfied, and the clause on the left becomes satisfied, so we now have an assignment for the simplified formula plus the clause we removed.
Another way of looking at this: if you have a satisfying assignment for the rest of the formula after removing the clause on the left, we don’t know the status of that clause. If it happens to be satisfied by the assignment, we’re done. If it’s falsified, we need to fix it — and that’s the trick with flipping the literal.
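The blocked-clause check itself is short: every resolvent on the blocking literal must be tautological. A minimal sketch, with clauses as sets of integer literals:

```python
def is_blocked(clause, lit, cnf):
    """Is `lit` a blocking literal for `clause` in `cnf`? True iff every
    resolvent of `clause` on `lit` with a clause containing -lit is a
    tautology."""
    assert lit in clause
    for other in cnf:
        if other is clause or -lit not in other:
            continue
        resolvent = (clause - {lit}) | (other - {-lit})
        if not any(-l in resolvent for l in resolvent):
            return False               # a non-tautological resolvent exists
    return True
```

For the lecture’s example $a \vee b \vee l$ with candidates $\bar{l} \vee \bar{a} \vee \ldots$ and $\bar{l} \vee \bar{b}$, every resolvent contains a complementary pair, so the clause is blocked on $l$ and can be removed.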
Circuit simplification: given an AIG (i.e., a circuit), we want to encode it to CNF, but before we do, we want to simplify it.
Cone of influence (COI): the most important one. If you check software, there’s the corresponding concept of slicing: if a variable cannot influence the property you want to check (e.g., it is only read in the program), you can, of course, remove it. In hardware, an abstract version of a circuit looks like an (inverted) ice-cream cone: the output is on top, and the inputs are at the bottom. Everything in this cone must be part of your CNF encoding. But if two cones overlap, you can remove the part that is not in the cone of the property. This is called cone-of-influence reduction. When this was invented, people said: before you do any translation into SAT, model checking, or any formal proving, you should focus on the part that can influence the property — and for that, we don’t need a formal tool; a static dependency analysis suffices.
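The cone-of-influence computation is just a backward reachability traversal from the property output. A minimal sketch, where the circuit is assumed to be given as a fan-in map (gate → list of its inputs):

```python
def cone_of_influence(fanins, output):
    """Return the set of nodes that can influence `output`: everything
    reachable backwards through the fan-in relation."""
    cone, stack = set(), [output]
    while stack:
        g = stack.pop()
        if g in cone:
            continue                    # already visited
        cone.add(g)
        stack.extend(fanins.get(g, ())) # inputs have no fan-ins themselves
    return cone
```

Gates outside the returned set — the part of an overlapping cone not feeding the property — can be dropped before the CNF encoding.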
Monotone input reduction (MIR): assume you have a circuit with a single property output, and an input that influences the output only monotonically: every path from the output to this input passes an even number of negations. How do we implement this on an AIG? For every input, we want to know whether it occurs in even phase, odd phase, or both on paths to the output. This relates to what we discussed at the beginning with NNF; it has that flavor, but it is not trivial. You can compute this, and if you have a monotone input and the output is, for instance, a bad-state detector that you want to set to true, then you can set a positively monotone input to true and a negatively monotone input to false, propagate, and thereby simplify your circuit.
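The phase computation can be sketched as a DFS that carries the parity of negations seen so far. This is a minimal illustration under my own edge encoding (each child edge is a pair `(child, negated)`), not the lecture's actual AIG data structure:

```python
# Hedged sketch: which phases (0 = even, 1 = odd number of negations) an
# input occurs in below the output -- the precondition for MIR.
# The (child, negated) edge representation is my own illustration of an AIG.

def phases(children, output):
    """Map each node to the set of parities on paths from `output` to it."""
    seen = {}
    def walk(node, parity):
        if parity in seen.setdefault(node, set()):
            return  # already explored this node under this parity
        seen[node].add(parity)
        for child, negated in children.get(node, ()):
            walk(child, parity ^ negated)
    walk(output, 0)
    return seen

# out = AND(x, NOT g), g = AND(y, NOT x): both paths to x have even parity,
# so x is positively monotone; y occurs only with odd parity.
children = {"out": [("x", 0), ("g", 1)], "g": [("y", 0), ("x", 1)]}
ph = phases(children, "out")
print(ph["x"], ph["y"])  # {0} {1}
```

An input whose phase set is `{0}` can be set to true (for an output forced to true), one with `{1}` to false; an input with `{0, 1}` is not monotone and must stay.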
Non-shared inputs elimination (NSI): take any boolean expression or, more generally, any expression, for instance adders. Suppose one part of the circuit feeds into the rest through a single output, and furthermore this part is a tree: it has no shared nodes, nothing DAG-like in it (DAG = directed acyclic graph). Most operators can produce every possible output value. This is not entirely true for all of them: think of the division operator for bit-vectors and ask whether it can produce the output 111...1 for unsigned integers. But in principle, for almost all operators, you can generate all possible values. Therefore you can always set the inputs of such a tree-shaped subcircuit so that its single output (if the tree were used anywhere else, it would be shared, since you could then extend the tree) takes any desired value, and you can remove the complete tree: the operators are surjective in the value of the tree. To implement this, you need to find the parts of the circuit that are tree-like.
From this paper:
The fanout (fanin, resp.) of a gate is the number of parents (children, resp.) the gate has.
Plaisted-Greenbaum polarity-based encoding: we already talked about this. There you only use the constraints that point downwards, but you don’t need to have them upward if a node only occurs positively, and the other way around, you only take those upwards if it only occurs negatively.
Tseitin encoding: we know what this is.
The blocked clauses rule simulates COI, MIR, NSI, and PG. You apply the blocked clauses rule until completion, which is even confluent: every CNF has a unique simplified CNF after applying the rule exhaustively, and if you start with a Tseitin encoding of a circuit, you simulate all of the above. Biere tried to implement these circuit optimizations directly in a SAT solver but had bugs. You need to combine them, so it is better to realize them as rules inside the SAT solver on the CNF level.
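The blocked-clause test itself is simple to state on CNF. A minimal sketch, using my own encoding of clauses as frozensets of integer literals (negative integer = negated variable):

```python
# Hedged sketch: is clause C blocked on literal lit w.r.t. formula F?
# C is blocked on lit if every resolvent with a clause containing -lit
# is a tautology. The frozenset-of-ints encoding is my own illustration.

def is_blocked(clause, lit, formula):
    assert lit in clause
    for other in formula:
        if -lit in other:
            resolvent = (clause - {lit}) | (other - {-lit})
            if not any(-l in resolvent for l in resolvent):
                return False  # found a non-tautological resolvent
    return True

# (a | b) is blocked on b here: the only resolvent (a | -a | c) is a tautology.
F = [frozenset({1, 2}), frozenset({-2, -1, 3})]
print(is_blocked(frozenset({1, 2}), 2, F))  # True
```

Blocked clause elimination repeats this test and removes blocked clauses until fixpoint, which is where the confluence claim applies.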
On the bottom right there is the Tseitin encoding; you can do these three optimizations, then apply the preprocessing on the CNF, and everything simplifies. On top, the PG encoding is simplified as well. In essence, the picture tells you that if you do blocked clause elimination, you get the same result as all these circuit-level optimizations. The only remaining question is whether you can implement this fast. It turns out it can be implemented rather cheaply; it is in the code of Biere's solvers, and it simulates the whole picture.
The only problem is that Plaisted-Greenbaum is not good, and this is not completely understood. Remember that PG can potentially remove half of the clauses of the encoding (adding models, so there is less restriction), but it is bad for current solvers, and there is no theoretical explanation for that. In practice, doing PG makes things worse, although it was originally invented as an optimization. Remember what we started with today: Biere added these backward clauses, and some of them are redundant in this sense; they could even be blocked clauses. It is good to tell the solver that these are interesting blocked clauses, and it is good to add them instead of removing them. So PG is something one should not do; unfortunately, blocked clause elimination simulates PG. Therefore, this picture on the slide is nowadays mostly of theoretical interest, at least the left part. Biere does not know how to do blocked clause elimination (BCE) while avoiding the PG effect.
In Kissat and Satch, there is no blocked clause elimination anymore.
Literature overview of what happened after they did these blocked clauses
RUP (reverse unit propagation) idea: it came up in the context of proof checking. Remember the clause minimization algorithm; you might produce bugs when you implement it. How to avoid that? The proposal: for every clause we learn, we check with a separate SAT solver, which can do nothing but this vivification-style propagation, that the clause is implied. We write this as $F \vdash_1 C$: you take $F$, negate the clause $C$, and then propagate (the subscript 1 means unit propagation); this should give you a conflict. Then you can say that $C$ is implied by $F$ via unit propagation.
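The check $F \vdash_1 C$ can be sketched in a few lines: assume the negation of $C$, run unit propagation, and report whether a conflict arises. This is a naive, quadratic illustration of my own (real checkers use watched literals), with clauses as lists of integer literals:

```python
# Hedged sketch of the RUP check F |-1 C: negate C, unit propagate,
# look for a conflict. Naive illustration, not production proof checking.

def rup(formula, clause):
    assignment = {-l for l in clause}  # assume the negation of C
    changed = True
    while changed:
        changed = False
        for c in formula:
            unassigned = [l for l in c if -l not in assignment]
            if not unassigned:
                return True            # clause falsified: conflict, C is implied
            if len(unassigned) == 1 and unassigned[0] not in assignment:
                assignment.add(unassigned[0])  # unit clause: propagate it
                changed = True
    return False  # propagation terminated without conflict

F = [[1, 2], [-2, 3]]
# Assuming -1 and -3 forces 2 from [1, 2], which falsifies [-2, 3]:
print(rup(F, [1, 3]))  # True
```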
[VanGelder'12] had the idea: why not make this a proof format for SAT solvers? Whenever we learn a clause, we dump it, and a proof checker takes the clause, negates it, and checks against the accumulated clauses and the original CNF whether it is implied. This turned out to be a very useful development. Marijn Heule, in particular, has many papers on this and produced very large proofs for big mathematical problems. Even more interesting: the checkers that read this sequence of clauses have been certified in theorem provers such as Isabelle, ACL2, or Coq, and therefore you can be sure that these proofs are correct.
If Marijn Heule claims that Schur number five has a certain value, the proof in this format is huge (2 petabytes), but there is a verified Isabelle or ACL2 checker that can check whether the proof is correct, so you can trust it. This contrasts with other mathematical problems solved by combinations of search and computer algebra, because here we have fully trusted proofs. Some work on the four-color theorem was also ported to Isabelle. There is an effort to take such results, which so far only had machine proofs, and turn them into something trustworthy by checking them with a theorem prover.
DRUP: one addition that is very important in practice — you also record deletion information, which makes proof checking faster.
Reading: Proof Traces for SAT Solvers
Here are the different formats; you can really generate proofs for big things nowadays and then check them with a certified checker. A line starting with "d" means this clause can be dropped at this point; then you learn a new clause, drop others, then learn a unit, and at the end learn the empty clause.
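The add/delete semantics of a DRUP trace can be sketched as a tiny replay loop. This is my own illustration with the RUP check of each added clause stubbed out (a real checker like drat-trim verifies every addition); lines are zero-terminated in the usual DIMACS style:

```python
# Hedged sketch: replaying a DRUP trace. Additions extend the clause set
# (a real checker verifies each one by RUP first), "d" lines delete
# clauses, which is what keeps the checker's propagation fast.

def replay_drup(cnf, trace):
    clauses = [frozenset(c) for c in cnf]
    for line in trace:
        toks = line.split()
        lits = frozenset(int(t) for t in toks if t not in ("d", "0"))
        if toks[0] == "d":
            clauses.remove(lits)       # deletion information
        else:
            # a real checker would verify `clauses |-1 lits` here
            clauses.append(lits)
            if not lits:
                return "proved UNSAT"  # the empty clause was derived
    return clauses

trace = ["3 0",        # learn the unit clause 3
         "d 1 2 0",    # delete the clause (1 or 2)
         "0"]          # learn the empty clause
print(replay_drup([[1, 2], [-1, 3]], trace))  # proved UNSAT
```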
An example of why removing blocked clauses does the same as Plaisted Greenbaum.
Resolution asymmetric tautologies (RAT): looks almost like blocked clauses. For blocked clauses, the condition is "all resolvents on $l$ are tautological"; now replace "tautological" with "implied by unit propagation", as in RUP. On the left, you had to check "doubly satisfied"; now we check: if we assume the negations of the literals in the resolvent, do we get a conflict? It is a generalization.
Example: take $a \vee l$ and look at the candidate clauses containing $\bar{l}$, e.g., $\bar{l} \vee b$. This is not blocked, because $b$ has nothing to do with $a$. But assume $a$ and $b$ are false (the negation of the resolvent $a \vee b$); then the clause $a \vee b$ on the right forces $a$, which conflicts with $a$ being set to false. That means again: if you drop this clause and later get a satisfying assignment for all the clauses on the right but not for this one, you can flip $l$.
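Putting the two pieces together, the RAT test is the blocked-clause loop with the tautology check replaced by a RUP check. A self-contained sketch in my own list-of-integer-literals encoding (the naive `rup` helper is repeated here so the block stands alone):

```python
# Hedged sketch: C has RAT on lit w.r.t. F if for every D in F with
# -lit in D, the resolvent of C and D is implied by unit propagation.

def rup(formula, clause):
    """Naive check F |-1 clause: assume its negation, propagate units."""
    assignment = {-l for l in clause}
    changed = True
    while changed:
        changed = False
        for c in formula:
            un = [l for l in c if -l not in assignment]
            if not un:
                return True  # conflict
            if len(un) == 1 and un[0] not in assignment:
                assignment.add(un[0])
                changed = True
    return False

def is_rat(clause, lit, formula):
    for other in formula:
        if -lit in other:
            resolvent = [l for l in clause if l != lit] + \
                        [l for l in other if l != -lit]
            if not rup(formula, resolvent):
                return False
    return True

# The lecture's example with a=1, l=2, b=3: candidate (-l | b) is not
# blocking, but (a | b) in F makes the resolvent (a | b) unit-implied.
F = [[-2, 3], [1, 3]]
print(is_rat([1, 2], 2, F))  # True
```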
DRAT: the official format in the SAT competition since the 2014 competition, but almost nobody produces real RAT steps. Most people emit DRUP, so in essence just implied clauses.
Can we do more than this RAT or RUP?
One thing that came out of this: instead of a single literal, take a subset of literals, all of which are flipped. There is a local version of RUP (without propagation), and a version that then propagates. What is the most general version of redundancy?
Then they came up with a notion that allowed them to solve pigeonhole problems with short proofs. It has the flavor of propagation: $F \mid_\alpha \models F \mid_\omega$. Here $\alpha$ is our partial assignment, the negation of the clause, and $\omega$ is a witness. In RAT or blocked clauses, this witness is the single literal $l$ (or the set of flipped literals); in general, it can assign variables outside the negation of the clause. This (ii) is the most natural way to write it down, and one can even prove that any redundancy criterion you can imagine for CNF has this property: you can always find such a witness that tells you "I can ignore this partial assignment because, without loss of generality, I can assume this one instead."
To make the above practical, replace the semantic version with unit propagation: $\vdash_1$.
You have a chessboard and remove two corners. Can you cover the remaining board with dominoes so that everything is covered? The answer is no. The human version, the mathematical proof: every domino you place on the board covers one black and one white square, no matter how you place it. But we removed two corners of the same color, so covering the whole board is impossible. It is a parity argument.
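The counting version of the parity argument can be written out in a few lines. A minimal sketch (the coloring by `(row + column) mod 2` is the standard chessboard coloring):

```python
# Hedged sketch: the parity argument for the mutilated chessboard.
# A perfect domino cover needs equally many squares of each color,
# since every domino covers one black and one white square.

def can_tile_by_parity(removed):
    """Necessary condition only: equal black/white counts remain."""
    colors = [(r + c) % 2 for r in range(8) for c in range(8)
              if (r, c) not in removed]
    return colors.count(0) == colors.count(1)

# Opposite corners have the same color, so 32 vs 30 squares remain.
print(can_tile_by_parity({(0, 0), (7, 7)}))  # False
print(can_tile_by_parity(set()))             # True
```

Note this is only the necessary condition the human proof uses; a SAT encoding of the actual placement constraints is what the CDCL solver struggles with below.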
If you encode this into SAT, you introduce variables for every field, saying: domino to the right, domino to the left, and so on. Dominoes are not allowed to overlap; two cannot touch the same field. You put these rather local constraints there, and if you give this to a CDCL solver, it will report a conflict for the situation shown. The learned clause on the top right is a subset of the one in the top middle; it removes large parts of the search space. Everything not covered on the top right is pruned, so CDCL is effective here, but not good enough, because there are still too many combinations.
It turns out that we can generate learned clauses that are PR clauses, as on the previous slide, and they express a without-loss-of-generality argument. The clauses say the following: if placing these two dominoes as in the bottom middle cannot be extended to a solution, then placing them as in the bottom right, rotated, is without loss of generality the same situation. You can encode these as clauses, and they are all PR clauses if you find the right $\omega$: the witness $\omega$ has to encode the without-loss-of-generality argument. The same works for the situations with three dominoes.
The PR solver from their paper solved this problem. It turned out the solver found these PR clauses, and after seeing them, they could manually construct the PR proof: for all pairs, you add these clauses, which are all PR. Adding them is very fast, and then a CDCL solver solves the instance immediately.
Here is the whole zoo of redundancy properties. On the top left is the semantic one; BC is blocked clauses, SBC set-blocked clauses, RS resolution subsumption, IMP semantic implication, and PR is our strongest one.
R is the strongest version of clause redundancy, but only as long as your constraints are clauses. If $\alpha$ were a general boolean function, it would get interesting again.
PR, for propagation redundant, is the strongest version they made practical.