SAT solving: Introduction to SAT (part 1)

Most software developers have never heard about SAT solvers, but they are amazingly powerful and the algorithms behind them are actually very easy to understand. The whole concept blew my mind when I learned about how they worked. This will be the first post in a series about SAT solving where we’ll end up building a simple solver in Java later on.

In this first part of the series we’ll cover:

• Look at what SAT is
• Explain the SAT DIMACS format
• Solve a puzzle: Peaceable Queens

What is SAT?

SAT is short for ‘satisfiability’, what people mean when they say SAT is: Boolean satisfiability problem.

It turns out you can write a lot of ‘problems’ in a specific way as a special boolean formula. These problems use boolean `variables` that can be either `true` or `false`. When a variable is assigned true or false, it is called a `literal`.

An example of a boolean expression:

`(A or B or ~C) and (B or C) and (~B) and (~A or C)`

Conjunctive normal form (CNF)

The statement above is written in a particular way, it is called the `conjunctive normal form` (or CNF). All boolean statements can be rewritten/transformed into this form.

First we’ll need to learn some more terminology. The variables have a label here, for example `A`. There is also the option to have a NOT A, this is written as `~A`.

The line above can be broken up into four pieces:

Each line is called a ‘clause’. Those clauses are connected together using `AND` statements. A `clause` can only contain variables (or negated variables) joined together by `OR`.

Because each clause will have `OR` between the `variables` and the end of the clause is always an `AND` statement, those things can be removed from the statement, turning it into:

DIMACS format

If we do one more step you’ll end up with a format called `DIMACS`, which is supported by almost all SAT solvers. The last step is to replace the name of variable `A` with `1` and `B` becomes `2` etc. The `NOT` statement is just writing the number negative, so `~C` becomes `-3`.

Now add a `0` to the end of each line (as separator) and add a header which explains the amount of variables (= 3) and the amount of clauses (= 4) and you get:

This is a valid input file for almost all SAT solvers. You can run this for example in minisat:

And print the output:

Now we can see what a SAT solver actually does. It takes a boolean expression and tries to find a ‘solution’. It tries to find a particular combination of `literals` so all the clauses are `true`. If it is able to find one, it stops and returns `SAT`. When it is unable to find a valid assignment it returns `UNSAT`, the problem can’t be solved.

Most modern SAT solvers will not only return `UNSAT`, they will also return “proof” of the UNSAT, a listing of all the clauses that conflict with eachother and cause the UNSAT. This is very powerful and can be used to verify the correctness of the solvers.

In our case we have a valid `SAT` result: `1 -2 -3`.

When we assign `A=true, B=false, C=false` we satisfy our entire boolean expression.

How can we use this?

The example above is very abstract with A’s and B’s. Lets look at some other problems that can be solved by SAT solvers.

A lot of ‘problems’ can be turned into big `CNF` statements which means a lot of problems can be solved by a general SAT solver! Which is kind of amazing to me, translate a hard problem into SAT/CNF and a (rather simple) general purpose solver can solve it!

Calling SAT solvers ‘rather simple’ isn’t a mistake. Sure, some implementations are very complicated pieces of finely tuned optimized software. But the algorithms and general ideas behind SAT solving are surprisingly simple. In the next coming blogpost(s), once I write them, I’ll show you how to make one in Java.

Most tutorials use a SAT solver to solve Sudoku puzzles. It is an easy problem to understand and it shows the power of SAT. It involves translating a Sudoku puzzle and all its rules into one big CNF file. Next the SAT solver will calculate a solution.

Some other (random) things SAT solvers can be used for:

• Sudoku
• N-queens
• Hardware/software verification
• Scheduling (sport events)
• Cryptanalysis
• Routing/planning problems

Peaceable queens problem

Instead of doing what most tutorials do (writing a boolean expression for Sudoku’s) I want to do something different.

The challenge: Find solutions for a puzzle from a recent Numberphile video: Peaceable queens.

The problem is:

Given a N x N chess board, what is the highest amount of black and white queens you can place so they don’t attack each other?

How can we encode such a problem into one big `DIMACS` file?

A chess board

First we need a way to describe our board. It is a chess board and can hold queens either white or black. So I decided to encode this using a variable for each square, first NxN white queens, next NxN black queens.

We don’t have to put anything in our `DIMACS` file yet, but the SAT `output` will contain these variables set to TRUE or FALSE.

White or black pieces

Now it is time to encode all the ‘rules’ of this puzzle. First a very simple rule: If a square has a white queen it CAN’T have a black queen.

Let’s take the first square, top left. If `1` is `TRUE` we have a white queen, if `10` is `TRUE` we have a black queen there. So we want either 1 or 10, not both. How do we encode a rule for this in our SAT input?

Simple right?

How does this work? For this `clause` to be `TRUE` we must have ‘NOT 1’ or ‘NOT 10’, so either one of them or both must be `FALSE`. Excellent!

Rows, columns, diagonals

To have a valid solution we must add a lot more rules, for example a row can only have either a white queen or black queen or none.

How do we do this? Let’s look at row `1,2,3` / `10,11,12` (the top row).

First we’re going to introduce two new `variables`, I’ll just continue using free numbers, `20` and `21`. Those will represent the top row for white and black.

Let’s analyse this. The first `clause` says we must have `NOT 1` or the white row `20` is `TRUE`. Next we state the same for `NOT 2` and `NOT 3`. So if we have a white piece on row `1 2 3` means `20` will be `TRUE`. The same is done for black, if there is a black piece on the top row `21` will be `TRUE`.

Now we can encode the final rule for the top row just like we did with the white/black queen on the same square:

This can be done for each row, column and diagonal.

As you can imagine, this will quickly become too large to write manually. So to enumerate all possibilities I wrote a piece of Java code to generate a `DIMACS` file for me instead.

This happens a lot when using SAT solvers, most of the time you’ll want to create a piece of software to output all the rules in `DIMACS` for a solver to solve.

At least N-queens

If we run the SAT solver with the generated file above we encounter a small problem. It will solve very quickly by placing NO queens at all. Very ‘peaceable’ indeed, no queens are attacking.

We must tell the solver that we want to have a certain number of queens set to true. This turned out to be harder than I thought. There are algorithms online that do `at most N` booleans. I decided to implement the LTseq method from this scientific paper.

Having rules like this quickly explodes the amount of clauses needed. But `LTseq` seemed to be the best one.

This is what I’ve ended up with:

How can we use `at most N` for this puzzle? Well in case of N=10 we want to have both white and black to have `at least` 14 queens. So I switched `at least 14` of white and black squares `TRUE` to be `at most (N*N - 14)` of white and black squares `FALSE`.

This is still not very good because with (NxN - 14) we have a very large `k`, thus making a LOT of clauses. I have the feeling this can be done much better, but up to now I haven’t found a proper way. Perhaps forcing the sequential counter to overflow (instead of protecting it)? This didn’t seem to work for me. If you have SAT experience and know more about encoding problems, let me know!

I ended up doing the thing that works:

And it worked like a charm. I’m now able to ‘solve’ N=10 in under two seconds using modern SAT solver glucose. When I generate my ‘rules’ it comes down to about a staggering `17336` variables in `34518` clauses (far from optimal?).

But glucose is able to find a valid solution for N=10, k=14 in 1.2 seconds.

Below is the output where I’ve omitted most variables, we are only interested in the first 200 (10 x 10 x 2) that represent the board:

And if you change this back into a chess board we can see valid solution for N=10 with 14 queens each!

I’ve uploaded all the code for this series to GitHub and called it: Boolshit

You can find the code to generate the Peaceable Queens puzzle here.

I hope this gives you an idea on how powerful SAT solvers are and how you can encode a problem into `DIMACS` format.

In the next blogpost we’ll write an actual SAT solver from scratch in Java! Although it won’t be a very fast one…