Generating de Bruijn sequences and Lyndon words
Not so long ago I encountered something called the de Bruijn sequence. For now I’ll only use this for an alphabet of (0,1), binary. But everything said here could also be applied to other alphabets. In this post I’ll describe what this sequence is, and how you can generate them, using Lyndon words.
What is a de Bruijn sequence?
Well, it is a sequence (again, in this case binary) which contains all combinations/permutations of a specific length. And it does this only once.
For example: B(2,4)
- 000100110101111 (000)
This sequence contains all possible permutations you can make in binary of length 4. The last part (000) is optional, and needed it you do not want the sequence looped. As you can see, the length of this sequence is 16. All sequences will have length 2^N.
How do you construct these sequences?
The next thing I wanted to know is how to construct these sequences. The first hit I got was from the MathWorld website. It states:
Surprisingly, it turns out that the lexicographic sequence of Lyndon words of lengths divisible by N gives the lexicographically smallest de Bruijn sequence (Ruskey).
It seems that the next step is generating Lyndon words. Lyndon words are the lexographically smallest rotation of a word. I haven’t yet found a proper way to do this (I know there are) so here is what I do:
And here is what I used to generate the smallestLyndonRotation (Java):
How does it work? Well, for example the input is: 0010.
It loops over all rotations (using some bit-masking):
And finds the smallest lexographically (0001) and returns this.
The above pseudo code will spit out the following numbers:
These are all the Lyndon words for N=4.
Splitting, reducing the Lyndon words
Now if we put the Lyndon words together we get:
- 000000010011010101111111 (our sequence)
- 0000100110101111 (de Bruijn sequence)
We aren’t quite there yet… but there is some resemblance. There is one more step left, we have to reduce/split the Lyndon words into its smallest unique part.
Lets take the Lyndon words again:
- 0000 -> 0 (4x)
- 0001 -> 0001
- 0011 -> 0011
- 0101 -> 01 (2x)
- 0111 -> 0111
- 1111 -> 1 (4x)
Results in: 0000100110101111, the sequence!
And that is it! If we’ve created the smallest possible de Bruijn sequence for B(2,4).
Here are some more sequences: