Chair of Computer Architecture

University of Freiburg

Lecture 11

img/Untitled.png

Reference: should be [AudemardSimon’09]

Paper: made a big contribution to deciding which learned clauses are useful. You shouldn’t keep learned clauses forever. Why? When you learn, you add a clause, propagate, make decisions, and add another clause. Propagation, which goes over your CNF, is roughly linear in the number of clauses you have, and that’s already an understatement: it is only possible with very fast watching schemes (which we haven’t discussed yet). In principle, propagation has to visit these learned clauses, so the core work of the solver is linear in the number of clauses. If you learn (that’s the idea of CDCL), you keep accumulating clauses and get slower and slower. So if you keep all clauses, every newly learned clause increases the clause count, and you get a runtime quadratic in the number of conflicts. That’s true even in practice. In CompSAT, Biere completely ignored this argument.

You have these clauses; you need to propagate over them, yet they are redundant: they are not used in the current state of the search.

Memory is not a big deal these days; learned clauses merely increase the size of the formula. The real problem is that propagation gets slower and slower. For short-running problems you don’t even hit the memory limit, because the solver slows down first. That is no longer true if you want to solve some puzzle, say, and let it run for days or months.

Very old SAT solvers already had ideas here. In relSAT, the initial idea was not to learn long clauses at all; in SATO, a clause of size 20 or more was never kept. Another option is to throw clauses away eagerly after you have used them: when you backtrack over a variable forced by a learned clause, you delete that learned clause.

MiniSAT: just keep the recently used clauses, because they are the most important ones. In the clause move-to-front scheme from HaifaSAT, recently learned clauses sit at one end of the queue anyhow; whenever you bump a clause, you take it out and move it to that end, and you throw away a clause from the other end of the queue if it is not a reason for an assignment.

Maybe it’s necessary to keep a large number of clauses around, because there may be no proofs with a small memory footprint. MiniSAT does not keep a fixed number of clauses; it has a cache whose size is increased on demand. That’s better than keeping all clauses, and it still allows the solver to eventually keep as many clauses as needed to close the proof. “Least recently used” can also be approximated with a VSIDS-like score: the original MiniSAT implementation attaches an activity (a score) to each clause, and whenever a clause is used, its score is bumped. However, the effective alpha was extremely low, something like 1/10000, so clauses move towards the end only very slowly. This MiniSAT scheme was state of the art for about ten years.
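As a concrete picture, here is a minimal sketch of such a clause-activity scheme with hypothetical names (not MiniSAT’s actual code): the increment itself grows instead of all scores decaying, which has the same relative effect.

```c
/* Sketch of a MiniSAT-style clause-activity scheme (names and the decay
   value are assumed).  With a decay very close to 1 the effective alpha
   is tiny (the lecture mentions roughly 1/10000), so clauses change
   their rank in the cache only very slowly. */

struct clause { double activity; /* ... literals etc. ... */ };

static double cla_inc = 1.0;
static const double cla_decay = 0.999;        /* assumed value */

void bump_clause_activity (struct clause *c) {
  if ((c->activity += cla_inc) > 1e20) {      /* rescale to avoid overflow */
    /* for every learned clause d: d->activity *= 1e-20;  (loop omitted) */
    cla_inc *= 1e-20;
  }
}

void decay_clause_activities (void) {         /* called once per conflict */
  cla_inc /= cla_decay;
}
```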

Then came the paper [AudemardSimon’09], which invented the idea of the glucose level. They originally called it LBD (literals blocks distance).

The idea of Glucose is related to glue: it glues together literals. The intuition was that when you learn, you want to learn clauses that glue together previously learned clauses, and there are other learned clauses that glue together yet other learned clauses. Then you want to measure how strong this gluing is. Now comes the bad part of the terminology: the smaller the glue, the better it glues.

So what’s the glue? You simply look at the learned clause and count how many distinct decision levels occur in it. In the minimization example, we argued that the first-UIP clause is better than the one where you resolve until the end and pull in additional decision levels.

img/Untitled%201.png

This is a good example. The clause has a glue of 2, even though three decision levels occur in it. Why 2? A binary clause propagates after you assign one of its literals to false, so a binary clause gets a glue level of 1 here. This clause has five literals, but it almost acts like a ternary clause: after the two decisions ($d$, $g$) it would propagate. That’s why it gets a glucose level of 2. You want as few such levels in the clause as possible. To compute this, go over the literals in the clause and count the decision levels: $d$ is at level 1, $g, h, i$ are all at level 2, and $s$ is at level 4. Counting distinct levels gives 3, but the clause is reported with glue 2 (an off-by-one: effectively the level of the asserting literal is not counted).

It acts almost like a ternary clause. Why? Because $d$ and $g$ are implied by the decisions, and $d$ and $g$ force $s$ to false.
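A sketch of how a solver might compute the glue of a freshly learned clause by counting distinct decision levels; `levels` and `marked` are assumed helpers, and real solvers use time stamps instead of clearing the marks each time.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Glue (LBD) of a learned clause: the number of distinct decision levels
   among its literals.  'levels' maps a variable index to its decision
   level.  Note the lecture's example reports one less than this count,
   since the level of the asserting literal is effectively not counted. */
int compute_glue (const int *lits, int size, const int *levels,
                  bool *marked) {
  int glue = 0;
  for (int i = 0; i < size; i++) {
    int level = levels[abs (lits[i])];
    if (!marked[level]) { marked[level] = true; glue++; }
  }
  for (int i = 0; i < size; i++)             /* reset scratch marks */
    marked[levels[abs (lits[i])]] = false;
  return glue;
}
```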

img/Untitled%202.png

The previous clause is better than this one, because this one has a glue of 3: it needs 3 decisions before it propagates.

Observation by [AudemardSimon’09]: we should prefer $s$ over $r$, because $r$ has a larger glue than $s$.

Now what’s the idea? You order the clauses by this LBD: you have your cache of clauses, you sort them by LBD, and you throw away half of them. Before, MiniSAT just used the activity, a score bumped every time a clause is used, and sorted by that. LBD is better, but there is one additional trick they invented which is actually more important than LBD, although you need the concept of LBD for it. The basic idea: they keep all clauses which have extremely small LBD.

img/Untitled%203.png

$y$-axis: how often a learned clause was used. You can plug in several things for “used”, e.g., how often it was used in deriving another clause.

Clauses with a glue of 1 are kind of binary, and those with a glue of 2 kind of ternary. So clauses with a glucose level of 2 or less are kept forever.

Of the others, half are thrown away.

img/Untitled%204.png

In this loop here, while the result is neither sat nor unsat, we do lots of things, like restarting and switching between stable and focused mode; now we’re looking at reducing.

#ifndef: a preprocessor configuration option; you can completely switch off these parts.

img/Untitled%205.png

Reduce is a very simple function. It is guarded by a global statistic: a limit that triggers when the current number of conflicts exceeds a threshold (CONFLICTS).

img/Untitled%206.png

These are the reductions as they happen. The solver is doing something, then we have 600 conflicts, and then (the insight of the two French colleagues) the delta between consecutive limits is roughly 300 (not completely true in this solver, actually). So the limits form a first-order arithmetic series: the delta between them increases by around 300 each time. If you do the math, this means that if you throw away half of the learned clauses at each reduction, and every conflict gives a learned clause, then in the long run you only keep about the square root of the number of learned clauses. Whenever you hit the limit, you throw away half of the learned clauses (in this solver it’s even 75%). After, say, 10000 conflicts, you’d only keep around 100 clauses. Previous solvers increased this limit geometrically: the delta would grow by a factor of 2 every time. Then you keep a fixed fraction of the learned clauses over time. So with a geometric limit you get away with fewer reductions, but you actually keep a fixed fraction, while the arithmetic schedule gives square-root behavior.
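A quick back-of-the-envelope version of this argument: with reduce limits whose deltas grow by a constant $c \approx 300$, the $k$-th reduction happens after roughly

$$n_k = \sum_{i=1}^{k} i\,c = \frac{c\,k(k+1)}{2} \approx \frac{c}{2}\,k^2 \qquad\Rightarrow\qquad k \approx \sqrt{2\,n_k/c},$$

and the clauses alive right after the $k$-th reduction are a fraction of what accumulated during the last interval, which is of order $c\,k = O(\sqrt{n_k})$. With a geometrically growing limit, the last interval is instead a constant fraction of $n_k$, so you keep a fixed fraction of all learned clauses.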

img/Untitled%207.png

Whenever you see a “-” here, that’s a reduce. The option -v was set for verbose output.

img/Untitled%208.png

Limit from 300 to 600 to 1076

img/Untitled%209.png

This is the 11th time we call reduce, and we only have 43% as candidates. So how do we get from 100% of the learned clauses down to considering only 43%? We are going to reduce (delete) a certain fraction of the existing clauses. Originally, they deleted 50%; with a trick explained in a little bit, you can even throw away 75% of the learned clauses. So: keep all clauses with glue 2 or lower, and of the remaining ones throw away half (here 75%), except that some are not considered candidates at all.

img/Untitled%2010.png

Here, again: how do we get from 100% of the learned clauses to considering only 28%?

img/Untitled%2011.png

The trick is the “used” flag.

img/Untitled%2012.png

Each clause has a single bit which says “used”. What does it do? Whenever you do a reduction, you clear this used bit; only if you resolve with this clause during the next round (say from reduce 11 to reduce 12), i.e., use it in conflict analysis, is the bit set to true again.

Then only those clauses that were not used in the last round are candidates for deletion. And recall that we increase the reduce limit arithmetically, making it about 300 bigger every time. So you are saying: if in this growing interval you have not used this clause, only then is it a candidate to be thrown away; if you have used it, better keep it. That’s actually the most important trick. You keep all the glue-2 clauses and all used clauses; only the rest is sorted by glue.
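A minimal sketch of the single-bit variant, with illustrative names:

```c
#include <stdbool.h>

/* The 'used' bit is set when the clause participates in conflict
   analysis and cleared at every reduce, so only clauses that stayed
   idle for a whole reduce interval become deletion candidates. */

struct clause { bool used; unsigned glue; /* ... */ };

void used_in_conflict_analysis (struct clause *c) { c->used = true; }

bool reduce_candidate (const struct clause *c) {
  if (c->glue <= 2) return false;   /* low-glue clauses are kept forever */
  if (c->used)      return false;   /* used since the last reduce: keep  */
  return true;                      /* (reasons are also skipped in real code) */
}

void reset_after_reduce (struct clause *c) { c->used = false; }
```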

img/Untitled%2013.png

This is the sorting function for this. It would be faster with a radix sort, but this is only for those who know why: you’d merge the glue and the size into one 64-bit ranking function, and radix sort is way faster for sorting such keys.

So this is the reduce function: it sorts the clauses, keeps those at the beginning, and throws away those at the end. It uses a C-style comparison function, which returns -1 if p points to a clause pointer that is “smaller”; -1 means you keep the first one, the first one is better than the second. You sort first by the glue (which you compute during learning), then by size, and as a tie-breaker you use the id, which is a kind of timestamp of the clause. We prefer the younger clauses over the older ones.
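A hedged sketch of such a comparison function (names illustrative; the actual SATCH code differs in details). The comment also shows the kind of 64-bit key a radix sort would use instead.

```c
#include <stdint.h>
#include <stdlib.h>

struct clause { unsigned glue, size; uint64_t id; /* ... */ };

/* qsort-style comparator over clause pointers: "smaller" means better,
   i.e. keep.  Order by glue, then size, then prefer younger clauses
   (higher id).  A radix-sort variant would merge glue and size into one
   key, e.g.  uint64_t rank = ((uint64_t) c->glue << 32) | c->size;  */
static int better_to_keep (const void *p, const void *q) {
  const struct clause *c = *(const struct clause *const *) p;
  const struct clause *d = *(const struct clause *const *) q;
  if (c->glue != d->glue) return c->glue < d->glue ? -1 : 1;
  if (c->size != d->size) return c->size < d->size ? -1 : 1;
  if (c->id   != d->id)   return c->id   > d->id   ? -1 : 1; /* younger first */
  return 0;
}

/* usage: qsort (cands, n, sizeof (struct clause *), better_to_keep);
   then delete the tail, e.g. the worst 50-75% of the candidates. */
```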

img/Untitled%2014.png

If you go to the original code, it is extremely convoluted. Here it’s not that convoluted, but it is still behind configuration options, with about 3 versions: if NUSED is defined, no used flag is kept and this whole code goes away; or you use just a single bit; or you use 2 bits. What’s the difference? That goes back to Chanseok Oh.

img/Untitled%2015.png

Tier 1 clauses are always kept; tier 2 clauses get a second chance if they were not used; tier 3 clauses are always considered for collection.

Whenever you learn a tier 2 clause, its 2 bits are set to 2, and a tier 3 clause gets its 2 bits set to 1. The counter is then decreased at every reduction.

The tier flag is not as important as the used flag (which in effect is a two-tier scheme), but it gives some improvement. That’s why it’s implemented in SATCH.
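A sketch of the two-bit tier counter under assumed thresholds (the glue limits 2 and 6 are illustrative, not necessarily SATCH’s values):

```c
#include <stdbool.h>

/* Two-bit variant attributed to Chanseok Oh: tier-2 clauses get two
   chances, tier-3 clauses one; the counter drops at every reduce and
   is refreshed when the clause is used in conflict analysis. */

struct clause { unsigned used : 2; unsigned glue; /* ... */ };

void tier_on_learned (struct clause *c) {
  c->used = (c->glue <= 6) ? 2 : 1; /* tier 2 vs tier 3; glue <= 2 is tier 1 */
}

void tier_on_used (struct clause *c)   { c->used = 2; }           /* refresh */
void tier_on_reduce (struct clause *c) { if (c->used) c->used--; }

bool tier_candidate (const struct clause *c) {
  return c->glue > 2 && !c->used;   /* tier 1 clauses are never candidates */
}
```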

A strange thing about the word “glue”: small is better here. So low glue means high stickiness, which is not true in the real world.

./satch cnfs/prime4294967297.cnf

img/Untitled%2016.png

Already for this small example, the number of conflicts was 51000 and we reduced 32000 clauses. For longer runs this goes up dramatically. For small problems that are very combinatorial, within one hour you will remove 99% of the learned clauses. For larger industrial instances it’s more like 90%. It’s really remarkable: you learn all these clauses and keep each of them only for a short time, but you throw almost all of them away after a while. Reduce is one of the big improvements of the last 20 years, one that Biere would port back to a year-2000 solver.

You can interpret reduce as garbage collection or as a cache: we’re searching in the space of assignments, and our learned clauses remove assignments you have already searched; they overlap. So you put into this cache the assignments you have already refuted, but then you move somewhere else in the search space, and therefore you should clean the cache up.

img/Untitled%2017.png

Definition of LBD: think of it as a proxy for clause size.

One important rule of thumb in debugging: first try dirty, very fast, simple things (this is the spirit of restarting). The reason is the “heavy tail phenomenon” from statistics (high kurtosis). Suppose the runtime distribution is heavy-tailed; in debugging, think of the time spent hunting a bug. If you have spent days on one specific hypothesis, maybe another method would have found the bug immediately. So if there are different ways of doing something, some taking forever and some fast, you shouldn’t stubbornly stick to one method forever; you should switch between fast ones and slow ones. Statistically: as soon as a problem has a heavy-tailed runtime distribution, it’s good to abort frequently and restart.

This is an old idea, particularly in simulation, e.g., Monte Carlo simulation.

When doing machine learning or optimization, it is often good to run the same problem multiple times: on a cluster, say, run the software with 20 seeds. Because of the heavy-tail phenomenon of optimization, you are simulating restarts in parallel; then take the fastest run that gives you a solution. The heavy-tailedness then shows up in the quality of the solutions: run the algorithm 20 times, and if the optimization criterion is heavy-tailed, focusing on a single run might leave you stuck in a bad optimum, while one of the restarted runs might find a much better solution.

In SAT, people discovered this in the late 90s, focusing only on satisfiable instances. They originally looked at local search (start at a random point and make local moves towards the goal of satisfying all clauses). In this context, they tried restarts with different starting points. It was completely clear that you shouldn’t pick one starting point and let the solver make local moves forever, but restart once in a while. They measured the heavy-tailedness of the runtime distribution, and this gave a clear argument.

In SAT, it’s very simple what you should do: count the number of conflicts, and when it hits a certain limit, restart. But you must avoid running into the same dead end; you still need to guarantee termination. There are two ways to achieve this:

  1. Keep all learned clauses (or an increasing number of them), because they rule out parts of the search space; if you increase the number of kept clauses, you eventually have enough clauses to terminate (even if you restarted every 10000 conflicts), if termination is possible at all.
  2. Just increase the restart limit: first restart after 10000 conflicts, next time 10% later, and next time another 10% more. That’s what they did initially, a geometric schedule of restarts, but this is not considered good anymore (a sketch of this option follows the list).
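A minimal sketch of the second option, a geometric restart schedule; `backtrack_to_root` is an assumed solver hook and the numbers are illustrative:

```c
/* Restart once the conflicts since the last restart reach 'limit',
   then grow the limit by 10% so termination is still guaranteed. */

static unsigned conflicts_since_restart = 0;
static unsigned restart_limit = 10000;

extern void backtrack_to_root (void);    /* assumed solver function */

void after_conflict (void) {             /* call at every conflict */
  if (++conflicts_since_restart >= restart_limit) {
    backtrack_to_root ();
    conflicts_since_restart = 0;
    restart_limit += restart_limit / 10; /* geometric: +10% each time */
  }
}
```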

For local search, randomization may be the only thing you’re interested in. Random search just picks a global assignment and checks whether it satisfies the formula; if not, maybe you flip a literal, and then do the next flip. By randomly picking a global assignment, you already put lots of randomization in there.

From paper:

[BiereFröhlich] argued that the original implementation of Glucose restarts is actually a combination of simple moving averages with cumulative moving averages. They then proposed to use exponential moving averages, because of several desirable properties. In particular, exponential moving averages do not require the implementation of a queue and, more important, gradually smooth the influence of earlier values.

Last point: alternate between aggressive and non-aggressive restarts. This brings us back to the second sub-point of the first bullet: the real reason we restart nowadays is not the first, more intuitive one. If a formula is satisfiable and you waste lots of time in some unsatisfiable part of the search space, then it is good to restart, because you get a chance to reach the satisfiable part. That was the original motivation.

However, nowadays restarts, in particular very fast restarts, are needed for certain unsat instances. Why? The same motivation applies in a certain sense: you want to give the solver a chance to find short proofs. You don’t want to find a satisfying assignment, but you want a chance to find short proofs. While learning, you might get lost learning long clauses; maybe somewhere else you can get short proofs. Empirically, it is way more important to restart for unsat. Nowadays we can actually stop restarting for sat instances, but for unsat we really need it.

The most important recent addition to restarts is the last point, due to Chanseok Oh.

img/Untitled%2018.png

Used in the SMT solver at Microsoft, and also in SAT4J (package configuration for Eclipse).

$x$-axis: time, measured in the number of restarts.

$y$-axis: the delta in conflicts between two restarts.

So the actual number of conflicts is the integral of this curve; it is the schedule of the deltas between two restarts. It has a fractal shape: first we restart after 100 conflicts, then we increase by 10%, then we go back to 100. Then we go from 100 to 110, then 10% more, 121, then back to 100 again. There is an inner limit which increases exponentially, and an outer limit which also increases by 10%. Simple to implement, and it captures the essence, but it’s not the best scheme.

img/Untitled%2019.png

... is the CDCL loop, and you stop once the limit is hit (the inner one, initially 100). Then you increment it by 10% and re-run the CDCL loop for 10% more conflicts. However, if the inner limit becomes larger than the outer limit, you increase the outer limit and reset the inner one. You get really long runs without restarts, but once in a while really fast ones. Following this alternating pattern is actually good.
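A sketch of this inner/outer schedule, with illustrative base values:

```c
/* The inner limit grows by 10% per restart; when it passes the outer
   limit, the outer limit grows by 10% and the inner limit falls back
   to the base interval. */

static unsigned inner = 100, outer = 100;

unsigned next_restart_interval (void) {
  unsigned interval = inner;
  inner += inner / 10;         /* 100, 110, 121, ... */
  if (inner > outer) {
    outer += outer / 10;       /* outer envelope also grows by 10% */
    inner = 100;               /* restart the inner series */
  }
  return interval;             /* conflicts to run before the next restart */
}
```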

img/Untitled%2020.png

This really looks like a fractal now.

This came out of investigations into when you should restart in methods with heavy-tailed runtime distributions, from Monte Carlo research. These are called “Luby” restarts; Knuth calls the same sequence “reluctant doubling”.

On the $x$-axis we have the number of restarts; here there are around 70. On the $y$-axis we see values $2^n$, multiplied by a base interval. The lowest one is 512, the next 1024, then back to 512. Then 512, 1024, 2048, and so on. The highest one in this figure is 32 * 512.

It’s reluctant doubling: every time you hit the limit, next time you in principle wait twice as long, except that you’re reluctant and first go back through the previous scheme.

Compare this to geometric restarts (keep the outer interval at infinity and only go up exponentially). It could be, for instance, that you really need 1000 uninterrupted conflicts to solve the problem. With a geometric scheme you pay, in the worst case, the inverse of the percentage increase: at 10%, a constant factor of about 10 more work (which is not too bad, actually).

With Luby, we pay a logarithmic factor. If you’re willing to pay a logarithmic overhead before you can work uninterrupted for as long as needed, the Luby scheme is the one to use. And if you don’t know anything (how long you should try), Luby is a good guess, because you only pay the logarithm.

img/Untitled%2021.png

Code from the paper, hard to understand (Biere implemented it and plotted the picture above).

You call this recursive function at every restart, with $i$ being the index of the restart. It is called very rarely, only when you hit the restart limit.

img/Untitled%2022.png

Knuth’s version gets rid of the recursion. It uses a bit-twiddling hack.

You need state $(u_1, v_1)$: two 64-bit variables, initialized with $(1, 1)$.

At every restart you do this update to obtain $(u_{n+1}, v_{n+1})$; the & is really bit-wise and.

The reluctant-doubling value is just the second component of the resulting pair.
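In code, Knuth’s formulation looks like this (a sketch; SATCH’s actual code differs in details):

```c
#include <stdint.h>

/* Knuth's "reluctant doubling" formulation of the Luby sequence:
   start with (u, v) = (1, 1); the successive v values are
   1 1 2 1 1 2 4 1 1 2 1 1 2 4 8 ... and get multiplied by the base
   restart interval. */

static uint64_t u = 1, v = 1;

uint64_t next_luby (void) {
  uint64_t current = v;
  if ((u & -u) == v) { u++; v = 1; } /* '&' is bit-wise; -u two's complement */
  else v *= 2;
  return current;                    /* restart after current * base conflicts */
}
```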

img/Untitled%2023.png

SATCH code: $u$ and $v$ need to be kept as state; you can’t do this statelessly. It’s like Fibonacci: with state you can compute it in linear time, as opposed to exponential time without state.

The solver has two modes, stable and focused mode. In stable mode we do very few restarts. Nowadays, Luby restarts are actually considered slow restarts.

stable_restart_interval: here it’s 1000; previously it was 512. That was considered fast in the past, but nowadays it isn’t anymore. Below it’s faster, because restart_interval=1 and the interval grows with the logarithm of the number of restarts, which is way faster. Current research: you should switch between very slow and very fast restarts.

The part below (after #endif) is very important for unsat; the slow restarts above are important for sat instances.

./satch cnfs/prime4294967297.cnf

img/Untitled%2024.png

The pink part is where the solver does hardly any restarts: it went from 413 to 418 restarts while the conflicts went from 11,000 to 18,000.

img/Untitled%2025.png

In the focused (non-stable) part, we do very fast restarts: 600 restarts within these 7000 conflicts.

img/Untitled%2026.png

This is averaged: the number of conflicts per restart is 33, but only because of the stable phase (there you almost never restart, except for the reluctant doubling); in the other phase you do lots of restarts.

img/Untitled%2027.png

The main thing here: exponential moving averages. Now we’re not taking the average over scores (like involvement in conflicts) but over clause sizes; the LBD is an approximation of clause size.

Black dots at the large gap on the right: we’re making progress, the learned clauses are really small, it’s going well; but then the size of the clauses goes up and restarts are done.

Back then this was very similar to stock market analysis; nowadays think of the 7-day incidence for Corona.

img/Untitled%2028.png

A heuristic for buying and selling stock. The interesting points are where the short-term average crosses the long-term average.

img/Untitled%2029.png

img/Untitled%2030.png

In reality there’s a margin, because buying also has costs: 10% or 25%.

How to compute these curves? 2 classical ways:

img/Untitled%2031.png

7-day window: you keep 7 numbers; for the new average the newest value comes in and the oldest one is forgotten. This is called the (simple) “moving average”.

img/Untitled%2032.png

$s$ is the new incoming value, $f$ is the old average; the update is $f' = f + \alpha\,(s - f)$.

In ML terms, $(s - f)$ is called the “error”: you got $s$ but predicted $f$.

If your $f$ was too small, the update moves $f'$ further up; if $f$ was too big, $f'$ goes down.

The update in programming-language style is at the bottom. It describes the same update, just as we would write a = a + 1 in code instead of $a' = a + 1$ in mathematical terms.
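The same update in C, together with the bias-corrected read-out discussed further below (a sketch, names illustrative):

```c
#include <math.h>

/* f' = f + alpha * (s - f): exponential moving average update */
double ema_update (double f, double s, double alpha) {
  return f + alpha * (s - f);
}

/* ADAM-style bias correction: with beta = 1 - alpha and t updates so
   far, dividing by 1 - beta^t removes the bias towards the initial 0 */
double ema_corrected (double f, double alpha, unsigned t) {
  return f / (1.0 - pow (1.0 - alpha, (double) t));
}
```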

img/Untitled%2033.png

For SAT solvers, these are the values

img/Untitled%2027.png

Rule of thumb from stock market analysis: the inverse of $\alpha$ is within a factor of 2 of the corresponding moving-average window. For a 7-day incidence window you would take $\alpha = 2/(7+1)$, which gives almost the same curve.

Initially the green curve doesn’t look so good, and the blue one is also strange (we see that when zooming in). In machine-learning terms it is “biased towards the initial value”. So how do we initialize this thing?

If you initialize with 0, the slow moving average will ramp up really slowly, while the fast one will be on top immediately.

img/Untitled%2034.png

$\beta = 1 - \alpha$ here; you add a correction term which depends on $\alpha$. For a fast moving average the correction term is smaller, and for a slow one it is bigger; this way they stay close initially.

ADAM is an optimization method widely used for training in machine learning. In one part of the paper, they describe initializing these moving averages.

The highlighted part is about the initialization (bias correction) of the EMA.

img/Untitled%2035.png

We get two averages here, a fast one and a slow one.

glue is the LBD; think of it as the size of the learned clause.

Then you multiply it by a restart margin (1.1 in this case).

Then compare with the slow version times that margin. In stock market analysis: take the monthly average, add 10 percent, and if your current weekly average goes 10% above the monthly average, you buy the stock. In our case it’s the negative signal, we don’t buy: we restart if the fast average is 10% above the slow one.
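So the resulting restart test is just a comparison (a sketch; the margin value is the one from this slide):

```c
#include <stdbool.h>

/* Restart when the fast glue average exceeds the slow one by the
   margin (1.1 here, i.e. 10%; a later slide uses 25%). */
bool should_restart (double fast_glue, double slow_glue) {
  const double margin = 1.1;
  return fast_glue > margin * slow_glue;
}
```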

img/Untitled%2027.png

You will only see this trigger in the phase where you’re doing extremely fast restarts. You also see it with these grey lines: they’re extremely close. The averages inhibit restarts while you’re doing well; the other way around, if the clause size jumps up somehow, you do restarts.

Empirically this is really helpful, and everybody does it now. It was unclear 10 years ago, but now we have a much better picture of when we should restart.

img/Untitled%2036.png

This is a margin of 25% here (in the code it was 10%)

img/Untitled%2037.png

Knuth doesn’t like floating point.

Fixed-point arithmetic is used there, no need for floats. This has some advantages, especially on machines with slow floating-point units, which nowadays almost don’t exist.

Phase Saving

Phase saving: you pick a decision variable, and then independently of that you pick a value for the variable (either 0 or 1). For sat instances, this value selection is actually the most important part.

img/Untitled%2038.png

Trivially, the only thing that matters for sat instances is that you pick the right value for each variable. If we only want a solution (e.g., cryptanalysis, generating test cases for software), picking the right phase is essential.

DPLL: you need to pick a variable, and then a literal. That’s not completely true, since a few solvers have a heuristic that directly picks a literal, but the version where these two heuristics are separated is the more common one, and the phase heuristic is the most important one for sat: the order is irrelevant, you just need to pick the right values.

Phase saving only works if you combine it with rapid restarts

Luby picture: the $y$-axis is the number of conflicts you wait before doing a restart. The only additional parameter is the multiplication with a base interval; in the original paper it was 512, which back then was considered rapid.

Very fast restarts (e.g. every 1000 conflicts) and phase saving together gave huge improvement. And all this is just two lines of code.

Marijn Heule: when you restart, you look at the trail, analyze it, and try to figure out whether the trail would change if you restarted. You can do this by first computing what the next decision would be; if the heuristics would pick the same decisions as before, the same propagations would happen up to the point where a new decision triggers, so you can keep that part of the trail (you don’t have to backtrack to the top). This gives some reduction in the cost of a restart.

img/Untitled%2039.png

There are two heuristics: variable heuristics and phase heuristics.

This talk is about the phase heuristic: which value do you assign.

In MiniSAT, this heuristic always sets variables to false. It turns out this is better than setting them to true in many cases, and often better than other ways, for example computing a static phase. Only with phase saving did that change, so that you have something else.

2 parts which are new ideas in this talk:

  1. Rephasing: resetting the saved phases once in a while.
  2. The opposite effect: doing something even more aggressive than saving phases: maximizing the trail.

“Target phases”: targeting phases of previous partial assignments

img/Untitled%2040.png

New view: CDCL is supposed to maximize the trail. In the end you want this partial assignment; you make it bigger and bigger until you find a satisfying solution. This is a classical theme in AI, the generic idea being intensification vs. diversification in search.

In sat solvers we are nominally trying everything, going through the whole search space. However, that’s not completely true, because the method we use is a heuristic in the sense that we try to find short proofs. So our goal is to find short proofs if there are any. We’re still doing heuristic search, even though we cover the whole search space: there might be short proofs out there, but if you search badly, you will only find the big ones.

AI analogy: routing, what your GPS does; it’s imprecise because it won’t give you the very best route. In sat solving, the answer has to be precise, but in the most important part, the search for a short proof, we’re imprecise. That’s why all these ideas from AI on imprecise search really apply to our case too.

img/Untitled%2041.png

The two things we discussed before: rephasing saved phases (resetting them) and maximizing the trail. Maximizing the trail has the intensification flavor: we want to make the trail bigger and bigger until we hit a satisfying assignment. Rephasing does the opposite, because if you’re stubborn about intensification, you will not be good.

img/Untitled%2042.png

The new thing is target phases; rephasing is obvious. It works as follows: whenever you make an assignment, propagate, and do not reach a conflict, you have extended the trail to a larger consistent partial assignment. We call this our target assignment. If we then hit a conflict and backtrack, and make decisions again, we try to follow the target assignment saved before, hoping to reach an even bigger trail (more variables assigned without a conflict); if we do, we save that whole trail as the new target, and after the next backtrack we do the same thing again. In the implementation it’s more complicated, but this is the basic idea.

img/Untitled%2043.png

Hack track: take a solver and add hacks to make it better; hacks are small changes. Phase saving is an argument for hacks (2 lines can make a huge difference).

Hack to implement phase saving:

1st line: the first change is that as soon as a variable is assigned, you save its phase. Think back to the lecture where the solver data structures were discussed: there is an array for the partial assignment with X, 0, or 1 in it, the value array. Now we add a second array, saved, which is always overwritten by the current value but never reset to X (unassigned). So whenever you make a real assignment (set a variable to 0/false, say), you also write that value into the saved array.

2nd line: when you have to pick a value for a decision variable, you just go into the saved array and take the value from there.
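The two lines as a hedged C sketch; the array names and value encoding are illustrative, not SATCH’s actual ones:

```c
#define MAX_VARS (1 << 20)             /* illustrative bound */

signed char values[MAX_VARS];  /* 1 = true, -1 = false, 0 = unassigned */
signed char saved[MAX_VARS];   /* initialized once, e.g. all -1 (false);
                                  never reset on backtracking */

void assign (int var, signed char value) {
  values[var] = value;
  saved[var] = value;          /* line 1: save the phase on every assignment */
}

signed char decide_phase (int var) {
  return saved[var];           /* line 2: reuse the saved phase on decisions */
}
```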

You need to initialize the array with values. In every sat solver, you could just set everything to false; in some of Biere’s sat solvers, different optimizations were used.

Biere’s approach to initializing the array: count the occurrences of each variable, like in the DLIS heuristic (number of occurrences).

Untitled

If a variable occurs more often positively, you put a 1 there; if it occurs more often negatively, a 0. Seems natural.
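A sketch of this occurrence-based initialization, assuming the occurrence counts have been gathered elsewhere:

```c
/* Initialize the saved phases from occurrence counts (DLIS-like):
   more positive occurrences means initial phase true. */
void init_phases (int num_vars, const unsigned *pos_occs,
                  const unsigned *neg_occs, signed char *saved) {
  for (int v = 1; v <= num_vars; v++)
    saved[v] = pos_occs[v] >= neg_occs[v] ? 1 : -1;
}
```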

However, there were some instances generated by colleagues in Vienna (AI argumentation instances) where the problems were pretty hard unless you set everything to false. They had a solution setting everything to false, but otherwise, if you try something else, you just get lost.

In 2012 they submitted these to the competition, and the organizers of the next year selected benchmarks by running the previous year’s winners, trying to make the selection as fair as possible. They also ran Biere’s sat solver (he was in the top 3), and his solver couldn’t solve any of them, while all the other solvers solved these benchmarks instantly because they use 0 as the default value.

So these benchmarks made it into the competition, which for some years was then considered the gold standard for papers, just because Biere was clever with this initialization. The result of all this is that CaDiCaL has a “lucky phase” which simply checks whether 0000... or 1111... satisfies the formula and immediately returns if so. This was the fix to the fix, but it’s not the best. There are more things to do here, but initialization is not that important; the two lines above are way more important.

This phase saving gives you opportunity to think about initialization because you only have to do it once

img/Untitled%2045.png

Suppose this is our formula and we have somehow cut it in two (remember the min-cut heuristic). Now for some reason we get a lot of conflicts here (squiggly lines), but at some point we find a satisfying assignment for the left part (dots on the left). We will see later that the VSIDS heuristic will actually keep you in this part, so it is really separated. So the left part is sat, but not the whole formula; we still need the right part.

The decision heuristic picks a new variable, then another one, and maybe conflict analysis learns a unit which says “this variable has to be false/true”. Then you continue searching over other variables, maybe learning more units.

The point is: whenever you learn such a unit, what happens in CDCL? It forgets all of these assignments, because it jumps back, erases the whole trail, and then makes the unit variable true. And this happens every time you learn a unit.

What does phase saving give you? The solver looks at the saved phases of the variables. If you had a sat solution for this component, then when you go back and re-assign these variables, phase saving magically gives you exactly the assignment we saw before.

So phase saving has the effect that if your formula splits into connected components and you have a partial assignment satisfying one component, then without more implementation effort than the simple trick just described, you save that whole partial assignment. And you don’t need to know that the formula decomposes into two components, or 4, or 5.

So phase saving preserves the saved assignments of satisfied components. This also happens deeper in the search: you make some assignments and then spend lots of time in that part of the search. There too it can happen that one part of the whole CNF is already satisfied, so you never need to get conflicts there, because you know there is a sat assignment for that part, and it is disconnected from the part where, on that level, you learn implications. So the component argument goes down into the search.

Analogy: if you cut a tree with an axe, you will of course not try to hit it everywhere, but try to hit one particular spot all the time. That’s what phase saving gives you.

Rapid restarts: they figured that with phase saving you’re not wasting much time by doing restarts. If you restart, you erase the whole trail, but when you make decisions again and pick values from the last assignment, you’re just rebuilding the trail, except that you may take a shortcut, and this shortcut is then useful. That’s why this combination is good even for unsat formulas: what you learned may be useless or too verbose, and after restarting you try the same again, but maybe now you find a shortcut.

img/Untitled%2046.png

Partially obsolete because recent experiments show that not all of this is necessary

Resets: once in a while (in phases triggered by a geometric schedule, for instance) you reset the saved phases. That’s the diversification part: if you do search, you want to explore other alternatives once in a while, and that’s what we’re doing here.

We take the original phase (the solver has a default phase like false or true), and that is of course like going back to where you started.

img/Untitled%2047.png

Then you have the inverted phase, you just do the opposite. If everything was 0 by default previously, now it’s true

img/Untitled%2048.png

This is actually still needed. In essence: remember we have the current values (the current assignment), then the saved phases, then the target phases (a third array of values), and then this fourth one: the best phases ever seen. Best means the one where the trail was biggest, because that is the largest consistent partial assignment.

img/Untitled%2049.png

What you do here: you take a completely different algorithm, local search (not much touched in this lecture), where you make local changes to variables and try to minimize the number of unsatisfied clauses locally. This wasn’t used in practice on industrial instances for a long time, but now people seem to have figured out how to make use of local search in sat solving. Here we just use it to initialize the saved phases: run a local search to a local minimum of the number of unsatisfied clauses, take that assignment, and use it as the saved phases. That’s our reset.

img/Untitled%2050.png

img/Untitled%2051.png

“Flipped” means you just take the saved phases and flip everything, so you look at the opposite corner of the search space.

img/Untitled%2052.png

This was last year’s version: start off with the original phase, then with its inverted phase (lucky).

Competition: there are benchmarks that do factoring: take the product of two big prime numbers, build a circuit whose output is that product, and ask whether there are inputs to the multiplier circuit that produce this output. This is a hard sat formula. Now you can flip the literals in the circuit in such a way that only 0000... as input satisfies it; then every sat solver would solve it. You could also do the opposite and flip the literals so that 1111... is the satisfying input. This was not accepted because it was considered cheating: only Biere’s solver, which after the first reset tries 1111..., would have been able to solve those instances.

$\omega$ means you continue doing this forever.

img/Untitled%2053.png

The Glucose authors figured out that if you run their solver on sat instances, something is a little strange: the trail (the head of the trail) keeps going up. They looked at cryptographic instances and saw that their restart heuristic would still restart too often, so they tried to find a way to avoid restarting too often: they block restarts if they see that the trail size is increasing quickly and its growth keeps accelerating, because that means you’re getting closer and closer to a satisfying assignment. You can think of this as passive optimization in Glucose.

Biere: his idea is, in a sense, the active version of this Glucose optimization. We take the trail, which is supposed to get bigger and bigger, and we actively force this assignment until we find another trail that is even bigger; at that point we use the bigger trail as the new target. Think of it as an improvement of phase saving, but of a more global nature: with phase saving you save every assignment you make, while with target phases you stick to the one big partial trail you saw before and only change it when you find a bigger one. So it moves much more eagerly towards this big assignment.

img/Untitled%2054.png

img/Untitled%2055.png

A hat under a literal marks a decision.

This is an example showing what happens on the trail (marked “current”), in the target phases, and in the saved phases. One difference from what was said before is that in the saved phases you actually store true, false, or nothing if the variable has not been set yet.

A dot indicates that the value is not set yet.

The first thing you do is a decision: +. You check whether to decide it positively or negatively; in this case the original phase was set, so positively. Not only do we set it positively, we also record it in the saved phases (+ in saved). After this decision we propagate values, and they’re all negative, denoted by -. Then we take another decision, which again is positive, and we continue. After a while, we reach a conflict (this is a pigeonhole problem, which is unsatisfiable).

Just before conflict analysis, the trail (current) and the saved phases are identical. The first thing we do is save the phases into the target; that’s why the target and saved phases are identical. Then we backjump (visible by ....). The trail and saved phases are now different; the saved and target phases remember where we were before.

img/Untitled%2056.png

Then we continue and make a decision again, and look whether we have saved phases. In this case yes, so we take the assignment from there (set the variable negatively). At that point, the saved and target phases are still identical. We continue until the last example at the bottom: now there is a difference between the target and saved phases when we take a decision. This is the first point where they differ, so depending on which part of the search you are in, you will set this variable either positively or negatively.

untitled.png


Output of KISSAT

⬜ White part: focused mode. In this mode, we take the values given by the saved phases. Here we solve more unsat problems.

🟪 Purple part: stable mode. In this mode, we take the values given by the target phases. Here we solve more sat problems.

Trail column: the average height of the trail with respect to the total number of variables still left (the second-to-last column), i.e., the percentage of the trail which is consistent. You can see that with target phases in stable mode it’s higher and increasing: it goes from 43% to 44%. The exponential moving averages which compute this are kept separately for each mode; when you switch between focused and stable, you use a new set of EMAs. And you see that target phases really improve this; not dramatically here, but it goes up by 1%, so this is the effect of target phases.

You can also see that during that phase, the clauses that you learn (the conflict clauses) are actually much longer: 18 → 48.

img/Untitled%2057.png

This is the output from the previous slide with just the phases grepped out.

On the left (red letters) you can see the alternation of rephasing: start with original (O), then inverted (I), then alternating. The first phases last about one second (the column after the red letters), and the duration increases geometrically, so at the end a phase takes 5 seconds: we search more and more in one direction.

img/Untitled%2058.png

So we alternate between focused mode and stable mode. In stable mode, we change slowly and in particular we use target phases, so we really search in one direction. In focused mode, we’re more agile, we’re changing fast.

We do that in either geometrically increasing conflict intervals or in arithmetically increasing intervals. Rephasing happens within these phases.

img/Untitled%2059.png

img/Untitled%2060.png

The features here were deactivated one by one. At the bottom is the version with no phase saving at all (it just uses the original phase); this is the worst performing one. Above it (pink): phase saving, but without target phases and without rephasing; it solves more problems, faster. Above that are the versions with the features added. The best one (black) always uses target phases. The default version alternates between always-target and no target at all.

img/Untitled%2061.png

img/Untitled%2062.png

The default version performs best here: the default policy of switching between target phases and no target phases does better on unsat instances. The worst performing one is still the one without phase saving.

img/Untitled%2063.png

img/Untitled%2064.png

Phase saving is less important here than in KISSAT (the version without phase saving is not the bottom one).

img/Untitled%2065.png

If you always use target phases but never rephase (blue), it performs even worse than no phase saving. That’s interesting: it is very stubborn in one direction, and it is very hard to change direction because of how it works. Basically, you set everything to true in the beginning, and it’s very hard to make it move away from that direction.

img/Untitled%2066.png

The original implementation in Glucose didn’t work. Then Mathias implemented “bumping of reasons”, which seems completely orthogonal, and somehow it started working.

Now they have 5 SAT solvers with this Glucose scheme working, so they’re writing that up right now.

Conclusion

img/Untitled%2067.png

If you discard the idea of target phases and just do rephasing, you’d expect it to help, because you learn slightly different clauses which you can combine with inprocessing; but that’s not so clear in their benchmarks. So rephasing alone doesn’t seem to help, and it seems fragile: if you do it wrongly, it won’t work.

However, target phases really help. But they only help if you have rephasing too.

They now do believe that random walks and autarkies have a clear effect; they had bugs previously.

img/Untitled%2068.png

How do you implement the thing with $\omega$? Here: just put a schedule there, an array of functions, and then call each function in turn.

img/Untitled%2069.png

This function here just takes the initial phase, goes over all saved phases, and sets the assignments to the inverted initial phase.

img/Untitled%2070.png

img/Untitled%2071.png

What Mathias pointed out: you only overwrite if “best” is really set: if tmp is non-zero, then you set the saved phase. So you save the best trail you ever saw, and when you do this B rephasing, you use it. The functions are pushed onto a stack (of fixed size).

img/Untitled%2072.png

This will be $IO(BWO \, BWI)^\omega$

img/Untitled%2073.png

The reason this is all so complicated is that you can switch these options off individually.

Watches

ZChaff Occurrence Stacks

zchaff.png

Watches are references placed in clauses. They offer a more efficient method than connecting all literals of a clause: only two are connected. In the original SAT solver implementation, two references, the watches, were placed in each clause.

The precondition for placing these watches is maintaining an invariant: the two references must not point to literals that are both false. So each watched literal is either satisfied or unassigned.

The purpose of watches relates to propagation. Propagation involves going over all clauses where a literal occurs negatively. With watches, the solver only needs to visit the clauses watching the literal that just became false, which makes propagation far more efficient.

1.png

2.png

3.png

minisat.png

When a watched literal turns false, the solver re-establishes the invariant by finding another literal of the clause to replace the false one. This ensures that each clause is always watched by two literals of which neither is false; one of them may be satisfied or unassigned, but there should never be a clause whose two watches are both false (without the conflict being detected).

In the early days of SAT solving, watches were placed anywhere in the clause. The methodology has since evolved, and modern SAT solvers watch the first two literal positions of each clause. In essence, the clause stays unchanged in memory, but literals can be swapped around within the clause so that the watched ones sit at the front.

This modern approach of watching the first two positions preserves the original function and makes the system more straightforward.

The introduction of watches has significantly enhanced the speed of SAT solvers. By observing only two literals per clause, the stacks of occurrence pointers shrink dramatically: a clause of length 100 would contribute 100 occurrence references in the traditional scheme, but with watches only two of its literals are watched. This results in an order-of-magnitude speed improvement in propagation.
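To make the mechanism concrete, here is a much simplified sketch of visiting one clause during propagation under the modern convention (watches at positions 0 and 1); `value_of`, `assign_unit`, and the watch-list update are assumed solver infrastructure, not any particular solver’s API.

```c
#include <stdbool.h>

struct clause { int size; int lits[]; };    /* flexible array member */

extern int  value_of (int lit);             /* 1 true, -1 false, 0 unassigned */
extern void assign_unit (int lit, struct clause *reason);

/* 'false_lit' just became false; returns false on conflict */
bool visit_clause (struct clause *c, int false_lit) {
  if (c->lits[0] == false_lit) {            /* normalize: false watch in slot 1 */
    c->lits[0] = c->lits[1];
    c->lits[1] = false_lit;
  }
  if (value_of (c->lits[0]) > 0)            /* other watch satisfied: done */
    return true;
  for (int i = 2; i < c->size; i++)         /* search for a replacement watch */
    if (value_of (c->lits[i]) >= 0) {       /* non-false literal found */
      int tmp = c->lits[1];
      c->lits[1] = c->lits[i];
      c->lits[i] = tmp;
      /* here: move this clause from the watch list of 'false_lit'
         to the watch list of the new watch c->lits[1] */
      return true;
    }
  if (value_of (c->lits[0]) < 0)            /* both watches false */
    return false;                           /* conflict */
  assign_unit (c->lits[0], c);              /* clause is unit: propagate */
  return true;
}
```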

Also a good resource: https://school.a4cp.org/summer2011/slides/Gent/SATCP3.pdf

Watched literals may be patented: US Patent 7,418,369

From satlive.org (internet archive):

patent_satlive.png