Chair of Computer Architecture

University Freiburg

Lecture 13

img/Untitled.png

Remember, the DP procedure was the one where you do variable elimination. However, we don’t do it completely: if you recall, if you do DP elimination and a variable has $n$ positive and $n$ negative occurrences, you get up to $n^2$ resolvents, so every time you eliminate a variable, the formula can grow quadratically. This happens in practice. So we proposed to only eliminate a variable if doing so doesn’t add more clauses than it removes.

img/Untitled%201.png

Look at the CNF on the left. It has five clauses containing $x$ (we’re looking only at the clauses containing $x$ right now, everything else stays untouched). What does DP do? It produces all resolvents (on the right), takes out the clauses containing $x$, and replaces them with those resolvents. What we’re doing here is exactly Fourier-Motzkin elimination: we eliminate a variable by combining its clauses in all possible ways.

Here the number of produced clauses is limited. We go over all variables and try each one, but we only eliminate it if it produces no more non-tautological resolvents than there are clauses in which it occurs. So here, the variable occurs five times, and in principle you’d have six resolvents, so you’d not do it, because you’d increase the size of the formula.
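
To make the counting concrete, here is a minimal sketch in Python (my own helper names, not the lecture’s code; clauses are collections of DIMACS-style integer literals): enumerate the non-tautological resolvents on a variable and only eliminate it if that does not produce more clauses than are removed.

```python
def resolve(c, d, x):
    """Resolvent of clause c (containing x) and clause d (containing -x) on variable x."""
    return (set(c) - {x}) | (set(d) - {-x})

def is_tautology(clause):
    return any(-lit in clause for lit in clause)

def try_eliminate(clauses, x):
    """Bounded variable elimination: replace all clauses containing x or -x by their
    non-tautological resolvents, but only if that does not increase the clause count."""
    pos = [c for c in clauses if x in c]
    neg = [c for c in clauses if -x in c]
    resolvents = []
    for c in pos:
        for d in neg:
            r = resolve(c, d, x)
            if not is_tautology(r):
                resolvents.append(r)
                if len(resolvents) > len(pos) + len(neg):   # hit the limit: give up on x
                    return clauses, False
    rest = [c for c in clauses if x not in c and -x not in c]
    return rest + resolvents, True
```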

However, look at the resolvent of clauses 1 and 4: the resulting clause is tautological, therefore you can drop it. The same applies to the resolvent of 2 and 4.

Clauses 1, 2, and 4 are just the Tseitin encoding of an AND gate here. If you take the Tseitin encoding of a gate and resolve the clauses of that gate against each other, you always get tautological clauses. That means the Tseitin encoding already helps this idea of bounded variable elimination, because the gate clauses resolved against each other only produce tautological resolvents. Only if you resolve the gate clauses against the clauses in which the output of the gate occurs do you get non-tautological resolvents. That’s also the reason why it is so common that you can remove many variables.
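
As a concrete check (my own small example, not taken from the slide): the Tseitin clauses of an AND gate $x \leftrightarrow (a \wedge b)$ are

\[ (\bar{x} \vee a) \wedge (\bar{x} \vee b) \wedge (x \vee \bar{a} \vee \bar{b}) \]

Resolving $(\bar{x} \vee a)$ against $(x \vee \bar{a} \vee \bar{b})$ on $x$ gives $(a \vee \bar{a} \vee \bar{b})$, a tautology, and likewise for $(\bar{x} \vee b)$.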

img/Untitled%202.png

This instance has 3 million variables. It is from a French company that checks the interlocking systems of the French railway network. Task: set the signals so that no two trains end up in the same track section.

img/Untitled%203.png

The p option is turned on here, and the effect is pretty dramatic.

This one takes an hour or so; it is the smallest instance they sent us. If you let it run, inprocessing brings it down to 9% of the variables and then solves it. It’s unsat, so the trains cannot end up on the same track.

The reason there is such a reduction is that they use just a default Tseitin encoding.

img/Untitled%204.png

These are 1,162,410 remaining variables; variable elimination brought the count down by millions of variables.

img/Untitled%205.png

How do you try the variables? You enumerate the resolvents of a variable and stop as soon as you hit the limit (the number of clauses in which the variable occurs), because then you cannot eliminate it. This raises the question of in which order you try the variables. Originally this was done with a priority queue. In the latest solvers, Biere switched this off (possibly for many other reasons).

For a pure preprocessor (not inprocessing), you probably want to do it like this: you have a priority queue into which you put the variables, and, in essence, the priority is the number of occurrences.

Variables that occur rarely have a higher chance of being eliminated, and whenever you eliminate a variable, all the variables in the removed clauses should be tried again (if they have not been eliminated yet), because their occurrence counts just dropped. That is the implementation that lets you eliminate 2 million variables in 20 seconds ⇒ reducing the search space by a factor of $2^{2\,000\,000}$.
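
A sketch of that scheduling, assuming a routine like the `try_eliminate` sketched above (a real solver maintains occurrence lists instead of recounting them):

```python
import heapq

def eliminate_variables(clauses, variables, try_eliminate):
    """Schedule bounded variable elimination attempts, least-occurring variables first."""
    def occurrences(x):
        # Recounting keeps the sketch short; real solvers keep occurrence lists up to date.
        return sum(1 for c in clauses if x in c or -x in c)

    heap = [(occurrences(x), x) for x in variables]
    heapq.heapify(heap)
    while heap:
        count, x = heapq.heappop(heap)
        if count == 0:
            continue
        if count != occurrences(x):              # stale entry: requeue with the current count
            heapq.heappush(heap, (occurrences(x), x))
            continue
        neighbours = {abs(l) for c in clauses if x in c or -x in c for l in c} - {x}
        clauses, eliminated = try_eliminate(clauses, x)
        if eliminated:
            for y in neighbours:                 # their occurrence counts just dropped: retry
                heapq.heappush(heap, (occurrences(y), y))
    return clauses
```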

Something important in this context: if you do this DP-style elimination, a lot of redundancy is generated. Think of gates that encode the logic of these trains, the execution of a C program, etc. Then you eliminate things, and it is clear that this elimination is completely dumb and will produce redundant clauses in a certain sense. That’s why we combine it with subsumption of clauses and something called “strengthening” of clauses.

img/Untitled%206.png

Why can’t we also do the opposite? On top: eliminate variables while limiting the number of clauses, so at most as many clauses may be added as are removed. This is what we saw in the industrial example. But we can also ask the other way around: can we add variables to reduce the number of clauses?

They had one answer to this question, and it is the only one we have at this point. This is state of the art, but not used in state-of-the-art solvers because it has an issue. Look at the example: each of $a, b, c$ is paired with $d$ and again with $e$. If you see this pattern, you can introduce a new variable $x$ for these two parts: $x$ encodes that we’re in the left part ($d$), and $\bar{x}$ encodes that we’re in the right part ($e$). That reduces the formula from 6 to 5 clauses.

On top we’re going from 5 to 4 clauses, and on the bottom from 6 to 5. What if we went from 5 to 5 on top and from 5 to 5 on the bottom? That, of course, we’d not allow: we only use the lower part if it actually reduces the number of clauses.

$a,b,c,d,e$ could be whole clauses.
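
A reconstruction of the pattern (with one possible polarity choice for the fresh variable $x$): six clauses pairing each of $a, b, c$ with each of $d, e$ become five clauses, and eliminating $x$ again by resolution gives back exactly the original six.

\[ (a \vee d)(b \vee d)(c \vee d)(a \vee e)(b \vee e)(c \vee e) \;\leadsto\; (a \vee x)(b \vee x)(c \vee x)(\bar{x} \vee d)(\bar{x} \vee e) \]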

You need to find this pattern in your CNF, and if you find it, you introduce a new variable and do what we just discussed. But why is this not in state-of-the-art solvers? It’s very simple: where do you put this variable $x$? We’re in the middle of the SAT solver and need to introduce new variables. Then we might simplify the CNF by some other means, and the variable might disappear. The prime objects of our CNF are its variables and clauses, and now we’re inventing new ones and throwing them away. We’d need to garbage collect these variables, and that’s tricky to implement, particularly with incremental SAT solving (which we haven’t discussed).

Practical usage of this algorithm: they applied it to a bioinformatics problem and were able to shrink the formula. The bioinformatics people had used a dumb at-most-one encoding, though.

img/Untitled%207.png

This is an at-most-one constraint, and these occur very often. The naive encoding is usually quadratic in the number of clauses.
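
For reference, a sketch of the naive pairwise encoding, which needs $\binom{n}{2}$ clauses:

```python
from itertools import combinations

def at_most_one_pairwise(literals):
    """Naive AMO encoding: for every pair of literals, at least one must be false."""
    return [[-a, -b] for a, b in combinations(literals, 2)]

# Example: AMO over 4 variables gives 6 binary clauses.
print(at_most_one_pairwise([1, 2, 3, 4]))
```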

img/Untitled%208.png

Commander encoding on the right; these are like 16 variables. You state the at-most-one constraint over the rows and recurse, so you get a recursive formulation. The commander encoding is not $n^2$ anymore.
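
A simplified sketch of the recursive idea, reusing `at_most_one_pairwise` from above (the actual commander encoding differs in details such as group size and a two-sided commander constraint; `fresh` is a hypothetical helper handing out unused variable indices):

```python
def at_most_one_commander(literals, fresh, group_size=3):
    """Commander-style AMO: pairwise AMO inside small groups, one commander variable
    per group implied by every literal of its group, and AMO over the commanders."""
    if len(literals) <= group_size:
        return at_most_one_pairwise(literals)
    clauses, commanders = [], []
    for i in range(0, len(literals), group_size):
        group = literals[i:i + group_size]
        c = fresh()                           # new commander variable for this group
        commanders.append(c)
        clauses += at_most_one_pairwise(group)
        clauses += [[-l, c] for l in group]   # any true literal forces its commander
    return clauses + at_most_one_commander(commanders, fresh, group_size)

def make_fresh(start):
    counter = [start]
    def fresh():
        counter[0] += 1
        return counter[0]
    return fresh

# Usage: AMO over variables 1..9 with fresh variables starting above 100.
print(at_most_one_commander(list(range(1, 10)), make_fresh(100)))
```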

img/Untitled%206.png

Bioinformatics example: what the technique does here is take the naive encoding and, in essence, produce a better encoding, grouping roughly by the square root of the number of variables, similar to the commander encoding.

The bioinformatics instances had at-most-one constraints in them all the time, and they were badly encoded. This technique recovered and optimized the encoding; they didn’t even do this on purpose, it was a side effect.

img/Untitled%209.png

We also run the s phase here, which is subsumption of clauses. This run is before elimination starts, so the number of original clauses only goes down slightly by subsumption. But during variable elimination we also apply subsumption all the time: whenever we get a new clause, we check whether it subsumes others.

img/Untitled%2010.png

Self-subsuming resolution

Definition of subsumption: a clause subsumes another if it is shorter and all of its literals are contained in the other clause; the longer clause can then be removed.

Consider the clauses

\[a \vee c \, (1)\\ a \vee \bar{c} \vee d \vee f \, (2) \\ a \vee \bar{c} \vee g \, (3)\]

Resolution of (1) and (2) gives $a \vee d \vee f$ (4)

Resolution of (1) and (3) gives $a \vee g$ (5)

Any satisfying assignment of (4) will also satisfy (2), and any satisfying assignment of (5) will also satisfy (3). So we can discard (2) and (3)

Also, read this.

Fast (Self) Subsumption

Now the problem is: how do you do this when there are many clauses, like 3 million? In the previous demonstration, this did not even take a second.

So if you have 2 million clauses, you have to check among the other 2 million clauses if there’s one which is subsumed. How do you do this fast?

First, you connect all literals, a “dense representation of the CNF”: you have a container for each literal holding references to all clauses in which it occurs. We need this anyway for variable elimination. Then, when you want to check whether one clause subsumes another, you take the smaller clause and try to find bigger clauses that contain it. This is called “backward subsumption”: you take the smaller clause and search for bigger clauses.

Forward subsumption does the opposite: you get a new clause, e.g., a resolvent from variable elimination, and you check whether there is already a clause in the CNF that subsumes it; if so, you don’t need to add it. In first-order theorem provers, forward subsumption is considered costly while backward subsumption is considered cheaper.

The key insight to make it fast: when you try to find clauses that are subsumed by a candidate clause, you only need to search the occurrences of one of its literals. So you go over the candidate clause, check how often each literal occurs, pick the literal that occurs least frequently, and then go through the clauses in which that literal occurs. For each of those clauses you check: does the candidate subsume it? There are two variants of this check: mark the literals of the candidate and count how many marked literals you find in the other clause, or sort the literals and do a merge-style inclusion check.

That’s why you can do a kind of preprocessing for the subsumption check: you sort all clauses (sorting means the literals within each clause; think of DIMACS numbers sorted increasingly). That makes it feasible. Before Biere and others published the paper, people thought subsumption was not feasible at this scale; with this trick it is.
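
A sketch of the backward check under those assumptions, with occurrence lists mapping each literal to the clauses containing it (real implementations use marking or a merge-style walk over the sorted literals; the sketch just uses a plain subset test):

```python
def backward_subsumed(candidate, occ):
    """Return the clauses (other than `candidate`) that `candidate` subsumes.
    `occ` maps each literal to the list of connected clauses containing it."""
    cand = set(candidate)
    # Only clauses containing the rarest literal of the candidate can be subsumed by it.
    pivot = min(candidate, key=lambda lit: len(occ[lit]))
    return [c for c in occ[pivot]
            if c is not candidate and len(c) >= len(candidate) and cand.issubset(c)]
```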

Now forward subsumption: how can you get its effect using only backward subsumption? We have our 2 million clauses and want to find the 400 subsumed ones. How? We disconnect all clauses, start with the largest ones, and add them back to the CNF one by one. Adding means: you connect the literals, i.e., you put references to the clause into the literals’ containers. While you do this, you check, with the procedure we just described (going over the literal with the fewest occurrences), whether the clause being added subsumes other already-connected clauses. Because you start with the largest ones, the bigger subsumed clauses are always already connected when the subsuming clause arrives. So you rebuild the connections of the CNF, and while doing that, you find all subsumed clauses. This way you can use backward subsumption to implement what you see here. The question is: is it worth it? You spend one second to remove just a tiny fraction of clauses. In principle, this means you could do only backward subsumption.
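
A sketch of that bulk pass built only on the `backward_subsumed` check above:

```python
from collections import defaultdict

def bulk_subsumption(clauses):
    """Remove all subsumed clauses using only backward checks: reconnect clauses
    longest-first, so every subsumed (longer) clause is already connected when the
    subsuming (shorter) clause arrives and is checked."""
    occ = defaultdict(list)
    subsumed = set()
    for clause in sorted(clauses, key=len, reverse=True):
        for victim in backward_subsumed(clause, occ):
            subsumed.add(id(victim))
        for lit in clause:                     # connect the clause to its literals
            occ[lit].append(clause)
    return [c for c in clauses if id(c) not in subsumed]
```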

However, during variable elimination it may be really useful to know whether a clause you are adding is already subsumed, because that might immediately allow a better variable elimination. That’s why, in actual solvers, Biere started to implement forward subsumption too, and it turns out this phase (s) is done by forward subsumption because it’s faster. This goes back to a pattern from data mining [BayardoPanda-SDM’11]: they took the backward subsumption algorithm described by Biere and applied it to data mining, because in data mining you have very similar questions; there it is called “item set” mining. Think of shopping baskets: you also want to remove subsumed item sets, so if one basket contains everything another basket has, you may remove one of the two.

So the forward version of subsumption is way better for what Biere calls “bulk subsumption,” the one just shown: you want to go over all millions of clauses once and remove all subsumed clauses. At the same time, backward subsumption is good during variable elimination: when you get a resolvent, you already have full occurrence lists (you are in dense mode, connected to everything), so you check which other clauses this new clause subsumes.

Biere claims the fastest implementation out there is probably the one in Cadical, and it has been refined in KISSAT. One key aspect is the line “monitor variables in added and removed clauses,” which can only be fully explained once we discuss incremental SAT solving. So how do you schedule later attempts at subsumption?

img/Untitled%2011.png

You see, there was one after 13 seconds, and then another after 52 seconds.

img/Untitled%2012.png

The run at 52 seconds was faster. So which clauses do you want to check there? It’s clear that, in essence, you only want to check clauses for forward subsumption whose variables occurred in clauses added since the last time you tried this. So you monitor, in a bit per variable, whether the variable occurred in added or removed clauses, and depending on that you only schedule part of the clauses.

img/Untitled%2013.png

So this one here doesn’t need as much time as before.

Whenever subsumption is mentioned, there’s a “self” in it. Why?

img/Untitled%2014.png

The literals in a clause are connected by OR. In essence, we’re removing $c$. The effect of self-subsuming resolution is not to remove a clause but to remove a literal. This shortens clauses; it is also called “shrinking.”

Do you recall where this happened in CDCL? In CDCL this occurred because we resolve on the highest decision level until we reach the first UIP clause. This is only possible because self-subsuming resolution is guaranteed to happen: you stop as soon as the resolvent (you start with the conflicting clause, which is falsified under the current assignment) has only one literal from the highest decision level left, and that is only possible if at some point a resolution step removes at least one literal from the resolvent. The general rule looks like this:

img/Untitled%2015.png

That’s the generic rule. How do you implement it? That’s an insight they had while writing these papers: when you check whether a candidate clause subsumes another one, you find the literal of the candidate that occurs least often, go through all the clauses that contain it, and check whether each of those clauses is a superset of the candidate clause, either with the merge-sort idea or by counting how many marked literals you find in the other clause.

Now do the following trick in the implementation: instead of marking literals or sorting, you only look at the variables, i.e., you take the absolute value of each literal to get the variable. That’s almost it: you’d then find clauses whose variables are a superset of the variables in the candidate clause. But you only allow this if at most one variable is flipped.

So you do the whole thing with variables instead of literals, but you also count how often a variable appears with flipped sign. As long as this count stays at most one, you continue, and at the end you check:

img/Untitled%2016.png

While we check whether $D$ subsumes $C$, we only check variables, not literals, but we allow this exception, a variable matching with flipped sign, at most once. If it happens twice, $C$ is neither subsumed nor strengthened. On the bottom, there’s the counterexample for that.
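
A sketch of the combined check; the lecture describes it over variables with a flip counter, this sketch does the equivalent test directly on literals (clauses as lists of DIMACS integers):

```python
def subsumes_or_strengthens(d, c):
    """Check whether clause d subsumes c, or self-subsumes c on exactly one literal.
    Returns ('subsumes', None), ('strengthen', lit) where lit can be removed from c,
    or (None, None)."""
    lits_in_c = set(c)
    flipped = None
    for lit in d:
        if lit in lits_in_c:
            continue
        if -lit in lits_in_c:          # variable matches with opposite sign
            if flipped is not None:    # a second flip: neither subsumed nor strengthened
                return (None, None)
            flipped = -lit             # this literal of c could be removed
        else:
            return (None, None)        # a variable of d does not occur in c at all
    return ('subsumes', None) if flipped is None else ('strengthen', flipped)

# With the example above: (a ∨ c) strengthens (a ∨ ¬c ∨ d ∨ f) by removing ¬c.
print(subsumes_or_strengthens([1, 3], [1, -3, 4, 6]))
```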

img/Untitled%2017.png

This is probably useful if you don’t have variable elimination. But with variable elimination, it just makes it worse.

img/Untitled%2018.png

A classical concept that was not used in practice, but now we have two usages for it.

An autarky is a partial assignment with a certain property. Take, for instance, a state of the SAT solver where you haven’t assigned everything yet: maybe you made ten decisions and propagated, so 100 variables are assigned but 100,000 are left. Then you can go over your CNF and look at the clauses touched by the partial assignment, i.e., the clauses containing an assigned variable. These are called “touched.”

While you do this scan, you can also check whether these clauses are satisfied. If every touched clause also contains a literal that the partial assignment sets to true, i.e., is satisfied, then this partial assignment is called an autarky. You can think of it as completely independent of the rest. This has the nice effect that if you have an autarky, you can remove all the clauses it touches: they cannot be needed in a resolution proof of unsatisfiability, etc.

In the example on the slide, variables 1 and 3 do not occur in the right part, so you can set 1 and 3 to false and remove these three clauses. Also note: this is an AND gate where 1 is the output and 2 and 3 are the inputs. On the right there is a formula, and you put an AND gate on top of it: its output 1 is computed but never constrained, it is just a dangling output. The AND gate has another input, 3, which is not used anywhere else either. You can then completely ignore this AND gate.

Why was this used in the target phases paper? They had the idea of rephasing, which means that once in a while you completely reset the saved phases. That’s bad if the saved phases contain a partial assignment that satisfies part of the formula, for instance a disconnected component of your formula that is satisfied by the current assignment. It turns out that to detect such a situation, you need to compute the autarky.

There’s a simple algorithm for computing autarkies: look at all touched clauses, and if a clause is not satisfied, unassign all the variables in it; then other clauses that were satisfied only by the variables you just unassigned have to be reconsidered, and you keep doing this. In the end you reach a fixed point, which gives you the largest autarky inside your current assignment. It’s a polynomial algorithm.
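
A sketch of that fixed-point computation, with the assignment given as a set of literals assumed true (e.g. the saved phases):

```python
def largest_autarky(clauses, assignment):
    """Shrink `assignment` (a set of literals assumed true) to its largest
    sub-assignment that satisfies every clause it touches."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            touched = any(lit in assignment or -lit in assignment for lit in clause)
            satisfied = any(lit in assignment for lit in clause)
            if touched and not satisfied:
                # Unassign every variable of this touched-but-unsatisfied clause.
                assignment -= set(clause) | {-lit for lit in clause}
                changed = True
    return assignment
```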

If you have any assignment, in our case the saved phases, and you want to know whether it contains an autarky, you can compute the largest autarky inside it in polynomial time. Of course, a satisfying assignment of the whole formula is also an autarky, and an assignment only to variables that do not occur in the formula (don’t cares) is trivially an autarky as well. This was the first time they reported practical results with autarkies, but the concept is much older.

img/Untitled%2019.png

Two groups invented this independently at almost the same time.

HanSomenzi: they had a nice trick. They used tries (prefix trees): they store the CNF in a prefix tree (explained again at the bottom). But tries have issues; with the approach at the bottom you can get the benefits of both.

Trick: you have a clause $C$ and want to show that it is redundant, i.e., implied by the rest of the formula. Why is that important? If you know it is redundant, you can remove it, which can help variable elimination remove more variables. How do you show it is redundant? You take its literals one after the other, set each to false, and propagate, but you ignore $C$ itself during propagation, you assume it is not there. Now you want to know: can you actually derive this clause from the rest of the formula? We’re assuming the negation of the clause, and if this leads to a conflict, then the clause is implied and we don’t need it.

It’s very similar to how our learning worked. In learning, we said we want to learn a clause, and when we get into a conflict, we can always go back to our decisions and learn that clause. Here the decisions are just the negations of the literals in the clause you want to show redundant. Therefore, if taking these negations as decisions leads to a conflict, the clause is implied by the rest of the formula, so there is no need to keep it.

However, there is an interesting part. During this process you go through the clause literal by literal, assume each literal’s negation, and propagate. As soon as you hit a conflict, you know you can remove the clause. But what happens if you get to a literal that is already assigned during this process? For instance, you assume the first one, $l_1$, to be false, then move to the second one, but the second one is already false or true. These are the two cases:

Let’s assume our formula $F$ contains the clause $a \vee b \vee c$. Now we realize that $F \wedge \bar{a}$ implies $\bar{b}$ through unit propagation. This means we could, in principle, learn the clause $a \vee \bar{b}$.

img/Untitled%2020.png

Then we have an instance of self-subsuming resolution, and we can remove $b$.

img/Untitled%2021.png

Going back to the slides: this is the case where one of the later literals is already assigned to false (above, we had $\bar{b}$ through unit propagation), so that literal can be removed by self-subsuming resolution.

img/Untitled%2022.png

Choosing $\bar{a} \wedge \bar{b}$ yields $\bar{c}$, so we can remove $c$ by self-subsuming resolution.

So you go over the clause, assign each literal to false, and propagate. Before you assign a literal, you check whether it is already assigned: if it is assigned to true, you can remove the whole clause; if it is assigned to false, you can remove that literal from the clause.
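
A sketch of vivifying a single clause under exactly these rules, assuming a hypothetical `propagate(assumptions)` that runs unit propagation on the formula without this clause and returns the set of implied literals, or `None` on a conflict:

```python
def vivify_clause(clause, propagate):
    """Try to remove the clause entirely or at least drop literals from it.
    Returns None if the clause is redundant, otherwise the (possibly shorter) clause."""
    assumptions, kept = [], []
    for lit in clause:
        implied = propagate(assumptions)
        if implied is None:            # conflict: the assumed prefix already implies a
            return None                #   shorter clause subsuming this one, drop it
        if lit in implied:             # literal already true: a shorter implied clause
            return None                #   subsumes this one, drop the whole clause
        if -lit in implied:            # literal already false: self-subsuming resolution,
            continue                   #   remove this literal
        assumptions.append(-lit)       # otherwise assume its negation and go on
        kept.append(lit)
    return kept
```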

What [LuoLiXiaoManyàLü] proposed: why don’t we do this on learned clauses? The intuition is that learned clauses are relevant to your search, so shrinking them might benefit the search. They only do it (1) once for every learned clause and (2) for clauses with small LBD. This helped a lot, even though their implementation takes 30% of the runtime. If you think about it, this is very close to CDCL, not preprocessing. (1) and (2) together make it work: do it once for every learned clause and only for those which are interesting.

How can you improve on this? Not published, just in the system description. They have a prefix tree. The advantage of a prefix tree: you assign the literals of a shared prefix to false once, and that is valid for all clauses sharing the prefix, so you save propagation. Otherwise their implementation (first big bullet of the slide) does exactly what is written here: after you reach one of these cases, or nothing happens, you have to backtrack and remove all decisions, and the trie lets you share the propagation between clauses with a common prefix.

How can you do this without building tries? Tries use a lot of space, and if you want to apply this to learned clauses, you probably don’t want that. A trie node usually has children, potentially as many as there are characters in the alphabet. Another option is child and sibling pointers in a doubly linked list. For each literal you’d then have, on average, at least two (usually more like three) pointers: one literal is 4 bytes, and adding three 8-byte pointers per literal is another 24 bytes, so you turn 4 bytes into 28 bytes. This is huge.

Biere implemented this in Cadical. The trick is as follows: before you start the vivification process, you sort the literals in each clause by their number of occurrences globally, so the literals of all clauses are ordered in the same way. Then you sort the clauses as if you were building the trie, but you don’t build it; you keep the paths of the trie separate, one per clause. Now you can still go over one clause after another, make the assignments, and propagate. If you hit a conflict, you backtrack as much as needed to get out of the conflict; if you don’t hit a conflict, you make all the assignments to the end. Then you go to the next clause, and there you can figure out how much you have to backtrack: you only need to backtrack to the first position where the literals differ, exactly as in the trie. This is all in place, you can sort in place, and it’s one of the places where sorting is used in a SAT solver.
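
A sketch of just the ordering part (my simplification of the idea, not Cadical’s code): sort the literals of each clause by global occurrence count, sort the clauses consistently so shared prefixes become adjacent, and compute how far consecutive clauses agree, which tells you how far to backtrack.

```python
def vivification_order(clauses):
    """Reorder literals and clauses so that consecutive clauses share long prefixes,
    imitating the trie without building it."""
    occs = {}
    for clause in clauses:
        for lit in clause:
            occs[lit] = occs.get(lit, 0) + 1
    # One global literal order (by descending occurrence count, ties by literal value).
    rank = {lit: i for i, lit in enumerate(sorted(occs, key=lambda l: (-occs[l], l)))}
    reordered = [sorted(c, key=lambda lit: rank[lit]) for c in clauses]
    reordered.sort(key=lambda c: [rank[lit] for lit in c])
    return reordered

def shared_prefix(prev, cur):
    """Number of leading literals two consecutive clauses agree on, i.e. how many
    assumptions can be kept instead of backtracking to the root."""
    n = 0
    while n < len(prev) and n < len(cur) and prev[n] == cur[n]:
        n += 1
    return n
```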

This was the newest practical addition to preprocessing. You need almost nothing in the solver that is not there already; if you have a CDCL solver, there is not much implementation overhead.