Generating de Bruijn sequences and Lyndon words

Not so long ago I encountered something called the de Bruijn sequence. For now I’ll only use this for an alphabet of (0,1), binary. But everything said here could also be applied to other alphabets. In this post I’ll describe what this sequence is, and how you can generate them, using Lyndon words.

What is a de Bruijn sequence?

Well, it is a sequence (again, in this case binary) which contains all combinations/permutations of a specific length. And it does this only once.
For example: B(2,4)

  • 000100110101111 (000)

This sequence contains all possible permutations you can make in binary of length 4. The last part (000) is optional, and needed it you do not want the sequence looped. As you can see, the length of this sequence is 16. All sequences will have length 2^N.

How do you construct these sequences?

The next thing I wanted to know is how to construct these sequences. The first hit I got was from the MathWorld website. It states:
Surprisingly, it turns out that the lexicographic sequence of Lyndon words of lengths divisible by N gives the lexicographically smallest de Bruijn sequence (Ruskey).

Lyndon word..?

It seems that the next step is generating Lyndon words. Lyndon words are the lexographically smallest rotation of a word. I haven’t yet found a proper way to do this (I know there are) so here is what I do:

Pseudo:

lastLyndon = -1
for all possible N's {
    currentSmallestLyndon = smallestLyndonRotation(N)
    if( currentSmallestLyndon > lastLyndon ) {
        lastLyndon = currentSmallestLyndon
        print currentSmallestLyndon
    }
}

And here is what I used to generate the smallestLyndonRotation (Java):

private BigInteger smallestLyndonRotation(BigInteger input, int size) {
	BigInteger lowestForm = input;
	BigInteger mask = BigInteger.ONE;
	for(int i = 1;i<size;i++) {
		BigInteger possibleLowerForm = (input.and(mask).shiftLeft(size-i)).or(input.and(mask.negate()).shiftRight(i));
		mask = mask.or(mask.shiftLeft(1));
		if(possibleLowerForm.compareTo(lowestForm) == -1) {
			lowestForm = possibleLowerForm;
		}
	}
	return lowestForm;
}

How does it work? Well, for example the input is: 0010.
It loops over all rotations (using some bit-masking):

  • 0010
  • 0100
  • 1000
  • 0001

And finds the smallest lexographically (0001) and returns this.

The above pseudo code will spit out the following numbers:

  • 0000
  • 0001
  • 0011
  • 0101
  • 0111
  • 1111

These are all the Lyndon words for N=4.

Splitting, reducing the Lyndon words

Now if we put the Lyndon words together we get:

  • 000000010011010101111111 (our sequence)
  • 0000100110101111 (de Bruijn sequence)

We aren’t quite there yet… but there is some resemblance. There is one more step left, we have to reduce/split the Lyndon words into its smallest unique part.

Lets take the Lyndon words again:

  • 0000 -> 0 (4x)
  • 0001 -> 0001
  • 0011 -> 0011
  • 0101 -> 01 (2x)
  • 0111 -> 0111
  • 1111 -> 1 (4x)

Results in: 0000100110101111, the sequence!

And that is it! If we’ve created the smallest possible de Bruijn sequence for B(2,4).

Here are some more sequences:

  1. 01
  2. 0011
  3. 00010111
  4. 0000100110101111
  5. 00000100011001010011101011011111
  6. 0000001000011000101000111001001011001101001111010101110110111111
  7. 00000001000001100001010000111000100100010110001101000111100100110010101001011100110110011101001111101010110101111011011101111111
  8. 00000000100000011000001010000011100001001000010110000110100001111000100010011000101010001011100011001000110110001110100011111001
    00101001001110010101100101101001011110011001101010011011100111011001111010011111101010101110101101101011111011011110111011111111
  9. 00000000010000000110000001010000001110000010010000010110000011010000011110000100010000100110000101010000101110000110010000110110
    00011101000011111000100011000100101000100111000101001000101011000101101000101111000110011000110101000110111000111001000111011000111
    10100011111100100100101100100110100100111100101001100101010100101011100101101100101110100101111100110011100110101100110110100110111
    10011101010011101110011110110011111010011111110101010110101011110101101110101110110101111110110110111110111011110111111111