Lecture 01
SAT solving is NP-complete. However, it can be done in polynomial space (look at the DPLL algorithm).
Motivation to use SAT
Are the original and optimized the same? We could use a test case. However, even if the test case passes, we cannot guarantee that the optimized and original circuit are functionally equivalent. SAT solving gives us a definite answer
Encoding
Encoding = translating a problem into SAT
It is enough to model the execution of $f$, $g$ , and $h$ by boolean functions, even if these functions perform complex tasks. It is similar to testing sorting algorithms; the use of 0 and 1 is enough
The boolean expression on the left and right must be equivalent
Now we have two formulas, and we want to check their equivalence. To do that, we must bring these formulas into a special format; a CNF. Getting a CNF is not trivial, and there are special algorithms.
In the formula above, you see the equivalence symbol. In practice, we do not check for equivalence but inequivalence. If the circuits are not equal, we also get a counterexample if using inequivalence.
There is a difference between checking SAT and whether it is a tautology. SAT solving is always a proof by contradiction.
Why do we use CNF? Because it is practical since we can use various rules, such as input propagation, resolution, and other heuristics. Moreover, the efficient Davis-Putnam-Logemann-Loveland (DPLL) algorithm, which is the basis for most modern SAT solvers, works on CNF.
Encoding Example: Graph Coloring
Graph coloring has many practical applications, such as in compilers, where it is used for register allocation. This has been a longstanding use for graph coloring, as typically, only a limited number of registers are available (e.g., 32). Still, a program may have more variables than this (e.g., 50). By assigning registers to variables like colors and considering their dependencies, graph coloring can be used to solve this problem. This practical application is why graph coloring is relevant and useful.
Consider a graph with three nodes (a, b, and c) and edges connecting them. Graph coloring refers to assigning colors to the nodes such that no two nodes connected by an edge have the same color. In this particular case, three colors are needed to accomplish this task.
Usually, we represent variables by integers
The initial encoding for SAT that we will perform only involves two colors, each represented by one bit. We should get unsat from this.
In SAT solvers, numbers represent variables corresponding to bits in binary form. In this case, we have three variables or three bits. To give meaning to these bits, we will assign them to two colors, for example, green and blue. The objective is to color each node with one of these two colors. To accomplish this, we can create Boolean variables for each node, which are mutually exclusive and can only take on 0 or 1. This will allow us to represent the color of each node using these Boolean variables.
They both should not be true:
\[\overline{x_1 \wedge x_2}\]They both should not be false:
\[\overline{\overline{x_1} \wedge \overline{x_2}}\]We perform De-Morgan and obtain
\[(\overline{x_1} \vee \overline{x_2}) \wedge (x_1 \vee x_2)\]Similarly, we add clauses 1-3 and 2-3. In the DIMACS format:
p cnf 3 6
-1 -2 0
1 2 0
-2 -3 0
2 3 0
-3 -1 0
3 1 0
There are three variables and six clauses, hence p cnf 3 6
The SAT solver will now tell us that this CNF is unsatisfiable, which makes sense. With this graph, not all nodes that share an edge may have different colors.
We need another technique for three colors. We introduce following encoding: All variables are two-digit integers. The first digit denotes the node, and the second digit the color. So literal 21 means that node two has color 1. Literal -21 means that node two does not have color 1.
This rules out that both 1 and 2 have the same color 1:
\[\overline{x_{11} \wedge x_{21}} \equiv \overline{x_{11}} \vee \overline{x_{21}}\]Following rule out that 1 and 2 have the same color 2 and 3:
\[\overline{x_{12} \wedge x_{22}} \equiv \overline{x_{12}} \vee \overline{x_{22}} \\ \overline{x_{13} \wedge x_{23}} \equiv \overline{x_{13}} \vee \overline{x_{23}}\]We must repeat this process two more times. In DIMACS:
-11 -21 0 c 1 and 2 may not have same color '1'
-12 -22 0 c 1 and 2 may not have same color '2'
-13 -23 0 c 1 and 2 may not have same color '3'
-21 -31 0 c 2 and 3 may not have same color '1'
-22 -32 0 c 2 and 3 may not have same color '2'
-23 -33 0 c 2 and 3 may not have same color '3'
-11 -31 0 c 1 and 3 may not have same color '1'
-12 -32 0 c 1 and 3 may not have same color '2'
-13 -33 0 c 1 and 3 may not have same color '3'
Using this technique, we obtain 33 variables. Not all are meant for us, but that does not matter.
p cnf 33 21
-11 -21 0 c 1 and 2 may not have same color '1'
-12 -22 0 c 1 and 2 may not have same color '2'
-13 -23 0 c 1 and 2 may not have same color '3'
-21 -31 0 c 2 and 3 may not have same color '1'
-22 -32 0 c 2 and 3 may not have same color '2'
-23 -33 0 c 2 and 3 may not have same color '3'
-11 -31 0 c 1 and 3 may not have same color '1'
-12 -32 0 c 1 and 3 may not have same color '2'
-13 -33 0 c 1 and 3 may not have same color '3'
11 12 13 0 c node 1 has at least one color
21 22 23 0 c node 2 has at least one color
32 32 33 0 c node 3 has at least one color
-11 -12 0 c 1 cannot have color '1' and '2' at the same time
-11 -13 0 c 1 cannot have color '1' and '3 at the same time
-12 -13 0 c 1 cannot have color '2' and '3' at the same time
-21 -22 0 c 2 cannot have color '1' and '2' at the same time
-21 -23 0 c 2 cannot have color '1' and '3' at the same time
-22 -23 0 c 2 cannot have color '2' and '3' at the same time
-31 -32 0 c 3 cannot have color '1' and '2' at the same time
-31 -33 0 c 3 cannot have color '1' and '3' at the same time
-32 -33 0 c 3 cannot have color '2' and '3' at the same time
From a SAT solver, we obtain
s SATISFIABLE
v 1 2 3 4 5 6 7 8 9 10 11 -12 -13 14 15 16 17 18 19 20 -21 22 -23 24 25 26 27
v 28 29 30 -31 -32 33 0
We can see that it assigned color 1 to node 1, color 2 to node 2, and color 3 to node 3.
Encoding Example: Sudoku
A Sudoku puzzle has a 9x9 grid with 9 possible numbers for each cell. To solve the puzzle using SAT, we must represent each cell as a variable requiring 729 variables. We use integers to represent these variables as three-dimensional encodings. The first two digits represent the row and column of the cell, and the third digit represents the number in the cell. For example, the variable 999 represents a cell in the ninth row and ninth column with the number 2.
To encode the Sudoku constraints, we must ensure that each row, column, and 3x3 block contains each number exactly once. We add “at least once” and “at most once” constraints.
To start the puzzle, we add the initial numbers as unary clauses, where each number is set to true for its corresponding cell. For example, if there is a 5 in row 3 and column 7, we set the variable 375 to true.
To get the variable number for a cell at row $x$, column $y$, and number $c$, we use the formula $100x + 10 y + c + 111$. Note that $x$, $y$ , and $c$ range from 0-9.
Please have a look at the code below for more detail. Try it out for yourself!
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
// clang-format off
int sudoku[9][9] = {
{ 0, 0, 2, 0, 5, 0, 0, 0, 6 },
{ 0, 0, 0, 0, 0, 4, 0, 7, 0 },
{ 1, 0, 8, 0, 9, 0, 0, 0, 0 },
{ 0, 0, 0, 7, 0, 6, 0, 8, 0 },
{ 7, 0, 6, 0, 0, 0, 2, 0, 4 },
{ 0, 8, 0, 5, 0, 2, 0, 0, 0 },
{ 0, 0, 0, 0, 1, 0, 3, 0, 8 },
{ 0, 9, 0, 4, 0, 0, 0, 0, 0 },
{ 8, 0, 0, 0, 2, 0, 4, 0, 0 },
};
// clang-format on
/*
- Note that in the context of Sudoku, "color" means "number"
- `x` means row, `y` means column and `c` means color
*/
int printing;
int clauses;
// We abuse integers for a three-dimensional encoding
// For example, variable 128 means in the first row, second column the color is
// 8 Variable 832 means in the eighth row, third column the color is 2 The first
// 110 variables are not used Variables **0 are not used
int lit(int x, int y, int c) { return 100 * x + 10 * y + c + 111; }
void unary(int a) {
if (printing)
printf("%d 0\n", a);
else
clauses++;
}
void binary(int a, int b) {
if (printing)
printf("%d %d 0\n", a, b);
else
clauses++;
}
int stack[9];
int size; // Tells us how many literals are on the stack
// Add something to the stack
void push(int a) {
assert(size < 9);
stack[size++] = a;
}
// Equivalent to clearning the stack
void clear() { size = 0; }
// This function adds to the CNF what is currently on the stack
// For rows, columns, blocks and colors (numbers) the constraints are all the
// same: A color must appear exactly once That we achieve by introducing an
// at-least-once and an at-most-once constraint, hence exactly once
void constraint() {
// At least one color per stack literal
if (printing) {
for (int i = 0; i != size; i++) printf("%d ", stack[i]);
printf("0\n");
} else
clauses++;
// At most one color per stack literal
for (int i = 0; i != size; i++)
for (int j = i + 1; j != size; j++) binary(-stack[i], -stack[j]);
// We have printed what we wanted to print, so we clear the stack now
clear();
}
void colors() {
for (int x = 0; x != 9; x++)
for (int y = 0; y != 9; y++) {
for (int c = 0; c != 9; c++) push(lit(x, y, c));
constraint();
}
}
void columns() {
for (int y = 0; y != 9; y++)
for (int c = 0; c != 9; c++) {
for (int x = 0; x != 9; x++) push(lit(x, y, c));
constraint();
}
}
void rows() {
for (int x = 0; x != 9; x++)
for (int c = 0; c != 9; c++) {
for (int y = 0; y != 9; y++) push(lit(x, y, c));
constraint();
}
}
void blocks() {
for (int bx = 0; bx != 9; bx += 3)
for (int by = 0; by != 9; by += 3)
for (int c = 0; c != 9; c++) {
for (int x = 0; x != 3; x++)
for (int y = 0; y != 3; y++) push(lit(bx + x, by + y, c));
constraint();
}
}
void constraints() {
colors();
columns();
rows();
blocks();
}
void instance() {
for (int x = 0; x != 9; x++)
for (int y = 0; y != 9; y++)
if (sudoku[x][y]) unary(lit(x, y, sudoku[x][y]));
}
void header() { printf("p cnf %d %d\n", 999, clauses); }
void parse_sudoku(char *filename) {
FILE *fp;
// open file
fp = fopen(filename, "r");
if (fp == NULL) {
printf("Error opening file.\n");
abort();
}
// read file and store in array
for (int i = 0; i < 9; i++) {
for (int j = 0; j < 9; j++) {
if (fscanf(fp, "%1d", &sudoku[i][j]) != 1) {
printf("Error reading file.\n");
abort();
}
}
}
// close file
fclose(fp);
}
int main(int argc, char *argv[]) {
// If the user provides a file, then read that one. Otherwise, use the default
// Sudoku board
if (argc > 1) parse_sudoku(argv[1]);
// We do not know how many clauses there will be
// So we start with a run where `printing` is 0
// In such a run, we only count the clauses and do nothing else
for (printing = 0; printing != 2; printing++) {
if (printing) header();
constraints();
instance();
}
return 0;
}
Encoding Example: Schur
Look at https://www.cs.utexas.edu/~marijn/Schur/ for an introduction to the Schur problem
Look at the sequence of natural numbers and assign colors. We want to ensure that whenever these numbers add up to the last one, they do not have the same color. So we want to avoid monocolored solutions. Can we color a subset of natural numbers up to some bound such that all those solutions are not mono-colored?
Following is a C program that encodes the Schur problem into DIMACS.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char**argv) {
int n = argc > 1 ? atoi (argv[1]) : 5;
printf("p cnf %d 0\n", n); // FIXME
for (int i = 1; i <= n; i++) {
for (int j = i; j <= n; j++) {
int k = i + j;
if (k <= n)
printf("%d %d %d 0\n", i, j, k),
printf("%d %d %d 0\n", -i, -j, -k);
}
}
return 0;
}
$ ./schur 2|satch -f -q
s SATISFIABLE
v -1 2 0
True: blue, False: red
We must assign: 1 2
$ ./schur 3|satch -f -q
s SATISFIABLE
v -1 2 3 0
We must assign: 1 2 3
$ ./schur 4|satch -f -q
s SATISFIABLE
v 1 -2 -3 4 0
We must assign: 1 2 3 4
The problem is symmetric, so we could swap the color
$ ./schur 5|satch -f -q
s UNSATISFIABLE
Conclusion: Shur(2)=5, because it is unsat for 5
Encoding Example: Pythagorean Triples Problem
Look at https://www.cs.utexas.edu/~marijn/ptn/ for an introduction
Wikipedia link: https://en.wikipedia.org/wiki/Boolean_Pythagorean_triples_problem?oldformat=true
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char ** argv) {
int n = argc > 1 ? atoi (argv[1]) : 7825;
printf("p cnf %d 0\n", n); // FIXME
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= n; j++) {
int k = sqrt (i*i + j*j);
if (k <=n && i*i + j*j == k*k)
printf("%d %d %d 0\n", i, j, k),
printf("%d %d %d 0\n", -i, -j, -k);
}
}
return 0;
}
This is an assignment calculated for all numbers up to 7824
White ones do not matter; they can be either colored blue or red
The solution for unsat has 5 Petabytes in size
People do not believe P and co-NP coincide
In practice: we do not care
In theory: if NP = co-NP, then there would always be short proofs: we guess an assignment, and checking if guessing is correct takes polynomial time (polynomial checking for everything that’s unsat)
Check Your Knowledge
- What is a CNF? Write the following as CNF:
- XOR
- If-then-else
- Bi-implication
- In SAT solving, do we usually check whether a formula is a tautology?
- What is the time complexity of SAT in general? What is the space complexity of SAT?
- Suppose we were to check whether the following two bits of C code were equivalent using a SAT solver. How would we approach the problem?
- Explain the Schur(2) problem! How would you create a SAT encoding for it?
- Explain the Pythagorean Triples problem! How would you create a SAT encoding for it?