Most software developers have never heard about SAT solvers, but they are amazingly powerful and the algorithms behind them are actually very easy to understand. The whole concept blew my mind when I learned about how they worked. This will be the first post in a series about SAT solving where we’ll end up building a simple solver in Java later on.
In this first part of the series we’ll cover:
- Look at what SAT is
- Explain the SAT DIMACS format
- Solve a puzzle: Peaceable Queens
What is SAT?
SAT is short for ‘satisfiability’, what people mean when they say SAT is: Boolean satisfiability problem.
It turns out you can write a lot of ‘problems’ in a specific way as a special boolean formula. These problems use boolean variables
that can be either true
or false
. When a variable is assigned true or false, it is called a literal
.
An example of a boolean expression:
(A or B or ~C) and (B or C) and (~B) and (~A or C)
Conjunctive normal form (CNF)
The statement above is written in a particular way, it is called the conjunctive normal form
(or CNF). All boolean statements can be rewritten/transformed into this form.
First we’ll need to learn some more terminology. The variables have a label here, for example A
. There is also the option to have a NOT A, this is written as ~A
.
The line above can be broken up into four pieces:
Each line is called a ‘clause’. Those clauses are connected together using AND
statements. A clause
can only contain variables (or negated variables) joined together by OR
.
Because each clause will have OR
between the variables
and the end of the clause is always an AND
statement, those things can be removed from the statement, turning it into:
DIMACS format
If we do one more step you’ll end up with a format called DIMACS
, which is supported by almost all SAT solvers. The last step is to replace the name of variable A
with 1
and B
becomes 2
etc. The NOT
statement is just writing the number negative, so ~C
becomes -3
.
Now add a 0
to the end of each line (as separator) and add a header which explains the amount of variables (= 3) and the amount of clauses (= 4) and you get:
This is a valid input file for almost all SAT solvers. You can run this for example in minisat:
And print the output:
Now we can see what a SAT solver actually does. It takes a boolean expression and tries to find a ‘solution’. It tries to find a particular combination of literals
so all the clauses are true
. If it is able to find one, it stops and returns SAT
. When it is unable to find a valid assignment it returns UNSAT
, the problem can’t be solved.
Most modern SAT solvers will not only return UNSAT
, they will also return “proof” of the UNSAT, a listing of all the clauses that conflict with eachother and cause the UNSAT. This is very powerful and can be used to verify the correctness of the solvers.
In our case we have a valid SAT
result: 1 -2 -3
.
When we assign A=true, B=false, C=false
we satisfy our entire boolean expression.
How can we use this?
The example above is very abstract with A’s and B’s. Lets look at some other problems that can be solved by SAT solvers.
A lot of ‘problems’ can be turned into big CNF
statements which means a lot of problems can be solved by a general SAT solver! Which is kind of amazing to me, translate a hard problem into SAT/CNF and a (rather simple) general purpose solver can solve it!
Calling SAT solvers ‘rather simple’ isn’t a mistake. Sure, some implementations are very complicated pieces of finely tuned optimized software. But the algorithms and general ideas behind SAT solving are surprisingly simple. In the next coming blogpost(s), once I write them, I’ll show you how to make one in Java.
Most tutorials use a SAT solver to solve Sudoku puzzles. It is an easy problem to understand and it shows the power of SAT. It involves translating a Sudoku puzzle and all its rules into one big CNF file. Next the SAT solver will calculate a solution.
Some other (random) things SAT solvers can be used for:
- Sudoku
- N-queens
- Hardware/software verification
- Scheduling (sport events)
- Cryptanalysis
- Routing/planning problems
Peaceable queens problem
Instead of doing what most tutorials do (writing a boolean expression for Sudoku’s) I want to do something different.
The challenge: Find solutions for a puzzle from a recent Numberphile video: Peaceable queens.
The problem is:
Given a N x N chess board, what is the highest amount of black and white queens you can place so they don’t attack each other?
How can we encode such a problem into one big DIMACS
file?
A chess board
First we need a way to describe our board. It is a chess board and can hold queens either white or black. So I decided to encode this using a variable for each square, first NxN white queens, next NxN black queens.
We don’t have to put anything in our DIMACS
file yet, but the SAT output
will contain these variables set to TRUE or FALSE.
White or black pieces
Now it is time to encode all the ‘rules’ of this puzzle. First a very simple rule: If a square has a white queen it CAN’T have a black queen.
Let’s take the first square, top left. If 1
is TRUE
we have a white queen, if 10
is TRUE
we have a black queen there. So we want either 1 or 10, not both. How do we encode a rule for this in our SAT input?
Simple right?
How does this work? For this clause
to be TRUE
we must have ‘NOT 1’ or ‘NOT 10’, so either one of them or both must be FALSE
. Excellent!
Rows, columns, diagonals
To have a valid solution we must add a lot more rules, for example a row can only have either a white queen or black queen or none.
How do we do this? Let’s look at row 1,2,3
/ 10,11,12
(the top row).
First we’re going to introduce two new variables
, I’ll just continue using free numbers, 20
and 21
. Those will represent the top row for white and black.
Let’s analyse this. The first clause
says we must have NOT 1
or the white row 20
is TRUE
. Next we state the same for NOT 2
and NOT 3
. So if we have a white piece on row 1 2 3
means 20
will be TRUE
. The same is done for black, if there is a black piece on the top row 21
will be TRUE
.
Now we can encode the final rule for the top row just like we did with the white/black queen on the same square:
This can be done for each row, column and diagonal.
As you can imagine, this will quickly become too large to write manually. So to enumerate all possibilities I wrote a piece of Java code to generate a DIMACS
file for me instead.
This happens a lot when using SAT solvers, most of the time you’ll want to create a piece of software to output all the rules in DIMACS
for a solver to solve.
At least N-queens
If we run the SAT solver with the generated file above we encounter a small problem. It will solve very quickly by placing NO queens at all. Very ‘peaceable’ indeed, no queens are attacking.
We must tell the solver that we want to have a certain number of queens set to true. This turned out to be harder than I thought. There are algorithms online that do at most N
booleans. I decided to implement the LTseq method from this scientific paper.
Having rules like this quickly explodes the amount of clauses needed. But LTseq
seemed to be the best one.
This is what I’ve ended up with:
How can we use at most N
for this puzzle? Well in case of N=10 we want to have both white and black to have at least
14 queens.
So I switched at least 14
of white and black squares TRUE
to be at most (N*N - 14)
of white and black squares FALSE
.
This is still not very good because with (NxN - 14) we have a very large k
, thus making a LOT of clauses. I have the feeling this can be done much better, but up to now I haven’t found a proper way. Perhaps forcing the sequential counter to overflow (instead of protecting it)? This didn’t seem to work for me. If you have SAT experience and know more about encoding problems, let me know!
I ended up doing the thing that works:
And it worked like a charm. I’m now able to ‘solve’ N=10 in under two seconds using modern SAT solver glucose. When I generate my ‘rules’ it comes down to about a staggering 17336
variables in 34518
clauses (far from optimal?).
But glucose is able to find a valid solution for N=10, k=14 in 1.2 seconds.
Below is the output where I’ve omitted most variables, we are only interested in the first 200 (10 x 10 x 2) that represent the board:
And if you change this back into a chess board we can see valid solution for N=10 with 14 queens each!
I’ve uploaded all the code for this series to GitHub and called it: Boolshit
You can find the code to generate the Peaceable Queens puzzle here.
I hope this gives you an idea on how powerful SAT solvers are and how you can encode a problem into DIMACS
format.
In the next blogpost we’ll write an actual SAT solver from scratch in Java! Although it won’t be a very fast one…