Testing a chess engine

Testing a chess engine

About a week ago I decided to try and write a chess engine. I’ve encountered bitboards before, and I really liked working with them. Most references I found had to do with chess engines, so I decided to have a go.

The single most important and time consuming aspect of building a chess engine is legal move generation. In all situations, be able to generate all legal moves that can be made on the board. At first this seems pretty straight forward, all pieces can move and attack in certain ways. But when you get to specific rules like castling and en-passant things get really tricky.

But how do you know for sure your engine works and gets the right results? The many chess engine developers around the world found a great solution for this problem. Something I can only describe as universal integration-tests! They call it “perft” (from performance test). The first thing they do it create a particulair situation on the chess board. This can be described like in FEN notation.
For example:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
    (initial chess starting position)

All chess engines use this small language and understand how to set up their internal board. The next step is to generate all possible moves. From the example FEN above all possible moves are just 20 moves. But why stop here? From these 20 moves calculate all legal moves the opponent can make (also 20) which results in 400 moves. Continue doing this and you’ll end up with a table like this:

Depth
Nodes
Captures
E.p.
Castles
Promotions
Checks
Checkmates
1
20
0
0
0
0
0
0
2
400
0
0
0
0
0
0
3
8902
34
0
0
0
12
0
4
197281
1576
0
0
0
469
8
5
4865609
82719
258
0
0
27351
347
6
119060324
2812008
5248
0
0
809099
10828

As you can see, the numbers in these tables quickly become huge. This will ensure that a lot of situations are tested. Using these tables you can check if your program outputs the same values, and thus complies to all the rules. There are a lot of these tables online, using a FEN for starting position, and a table with all nodes that can be generated.

The next problem is, when your numbers don’t add up, how do you find the move 6 levels deep that goes wrong? Well, some engines can help you with that. Personally I used ROCE (Roman’s Own Chess Engine). His engine has a “divide” function. First you set the board in a certain position using the FEN, and then you call the divide function, for example “divide 6”. Now it shows a table like this:

move : nodes
    e2e4 1249
    e2e3 2356
    f2f3 5356
    f2f4 6436
    (etc)

This lists each node at level 1, and after that each number of nodes it results in at depth 6. If you also have this function in your own chess engine you can compare the numbers. Now you can pinpoint which of the first nodes contains the error. Do this move and try divide 5. This will guide you all the way to the specific move that is (or isn’t) created! This was a huge help, and I love the way chess engine developers devised a way to have a kind of universal integration-tests which will point out the most commonly made bugs. You can take these tables, load them in a automatic test and keep running them every nights to see if things are still working like it should.

How did my engine end up? Well, it worked perfect in the sense that it could generate all the good legal moves. Then I added a simple evaluation function and it could play chess. After that I implemented a simple search algorithm (alpha-beta using negamax) and it could beat two simple other chess engines and myself. And of course, after a week, I lost interest again and writing a blogpost about it is usually the nail to the coffin of most of my projects.

So, what should I tackle next? I’m looking forward to having a new AZsPCs competition, but I think this might take a while…

Update: A couple of readers have pointed out that these tests are obviously not unit tests, but rather integration or acceptance tests. You are completely right. I’ve called them Unit tests because I used JUnit and they run in the automatic build. But they do test integration..!