Levenshtein Distance Challenge: Causes

Levenshtein Distance Challenge: Causes

Today I’ve been playing around with the Levenshtein distance. The Levenshtein distance is a number which measures the ‘distance’ between two strings. For example, the distance between “test” and “rest” is one.

The challenge

A Levenshtein distance of one is the key element in a challenge I’ve been reading about. I first encountered it on williamedwardscoder’s blog.

The problem description:

Two words are friends if they have a Levenshtein distance of 1. That is, you can add, remove, or substitute exactly one letter in word X to create word Y. A word’s social network consists of all of its friends, plus all of their friends, and all of their friends’ friends, and so on. Write a program to tell us how big the social network for the word “causes” is, using this word list. Have fun!

Java solution (8.1 sec)

After some Googling and tweaking I decided to make an implementation based on the Trie structure. How this helps is excellently described by Steve Hanov. I’ve also had a peek in another Java based Trie implementation by Ximus.

I’ve been able to get the code below run in 8.1 seconds, which is pretty good. But I’ve read that there are Java implementations running in just 4 seconds…!? Maybe based on Levenshtein Automata?

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class FriendlyCauses {

	public static void main(String[] args) throws Exception {
		FriendlyCauses s = new FriendlyCauses();
		s.find();
	}

	TrieNode[] root = new TrieNode[26];

	public void find() throws Exception {
		long timeBefore = System.currentTimeMillis();
		File wordList = new File("word.list");
		BufferedReader reader = new BufferedReader(new FileReader(wordList));

		int wordsAdded = buildTrie(reader);
		System.out.println("Done building the Trie; " + wordsAdded + " words added.");
		findFriends("causes");
		System.out.println("Amount of friends found: " + countWords(root));
		System.out.println("Total time: " + (System.currentTimeMillis() - timeBefore) + "ms");
	}

	/**
	 * This method builds the complete trie with all the words.
	 */
	public int buildTrie(BufferedReader reader) throws IOException {
		int wordsAdded = 0;
		while (reader.ready()) {
			String next = reader.readLine();
			int start = next.charAt(0) - 97;
			TrieNode current = root[start];
			if (current == null) {
				root[start] = current = new TrieNode();
			}
			for (int i = 1; i < next.length(); i++) {
				int nextIndex = next.charAt(i) - 97;
				if (current.children == null) {
					current.children = new TrieNode[26];
				}
				if (current.children[nextIndex] == null) {
					TrieNode deeper = new TrieNode();
					current.children[nextIndex] = deeper;
				}
				current = current.children[nextIndex];
			}
			current.word = next;
			wordsAdded++;
		}
		return wordsAdded;
	}

	private class TrieNode {
		// The children:

		TrieNode[] children;
		// If this node is the end of a word, store the word here:

		String word;
		// This boolean indicates that the current node is a friend.

		boolean isFriend;
	}

	// Temporary buffer of friends to investigate further:

	public List<String> friendsToFind = new LinkedList<String>();

	public void findFriends(String word) {
		friendsToFind.add(word);

		while (friendsToFind.size() > 0) {
			String friend = friendsToFind.remove(0);

			int size = friend.length();

			byte[] currentRow = new byte[size + 1];
			for (int i = 0; i <= size; i++) {
				currentRow[i] = (byte) i;
			}

			// Search the Trie for the given friend:

			for (int i = 0; i < 26; i++) {
				if (root[i] != null) {
					searchTrieForWord(root[i], i + 97, friend, currentRow);
				}
			}
		}
	}

	/**
	 * Walk the complete Trie again finding all the friends
	 */
	private int countWords(TrieNode[] ts) {
		int amount = 0;
		for (int i = 0; i < 26; i++) {
			TrieNode t = ts[i];
			if (t != null) {
				if (t.isFriend) {
					amount++;
					// Print/add t.word

				}
				if (t.children != null) {
					amount += countWords(t.children);
				}
			}
		}
		return amount;
	}

	/**
	 * Recursively walk the Trie calculating the Levenshtein distance to the
	 * given word. When a friend is found, mark it and add it to a list. The new
	 * friend needs to be processed further on.
	 */
	private void searchTrieForWord(TrieNode node, int letter, String word, byte[] previousRow) {

		int size = previousRow.length;
		byte[] currentRow = new byte[size];
		currentRow[0] = (byte) (previousRow[0] + 1);

		int minElement = currentRow[0];

		for (int i = 0; i < size - 1; i++) {
			int newCurrentRowValue = currentRow[i] + 1;
			newCurrentRowValue = Math.min(newCurrentRowValue,
					previousRow[i + 1] + 1);

			if (word.charAt(i) == letter) {
				newCurrentRowValue = Math.min(newCurrentRowValue,
						previousRow[i]);
			} else {
				newCurrentRowValue = Math.min(newCurrentRowValue,
						previousRow[i] + 1);
			}
			if (newCurrentRowValue < minElement) {
				minElement = newCurrentRowValue;
			}
			currentRow[i + 1] = (byte) newCurrentRowValue;
		}

		// If the Levenshtein distance is 1 and the node at current depth is a

		// word (and not yet friend):

		if (currentRow[size - 1] == 1 && node.word != null && !node.isFriend) {
			node.isFriend = true;
			friendsToFind.add(node.word);
		}

		if (minElement <= 1 && node.children != null) {
			for (int i = 0; i < node.children.length; i++) {
				TrieNode child = node.children[i];
				if (child != null) {
					searchTrieForWord(child, i + 97, word, currentRow);
				}
			}
		}
	}
}

Orchard Planting: Greatest Common Divisor

Orchard Planting: Greatest Common Divisor

The Orchard Planting contest from infinite search space is over. So it is time for a quick write-up.

The rules are simple, on a grid of integers, place N points on the grid to get as much 4 points on a line and never more then 4 points on a line.

My big break-through was when I figured out a way to improve the calculation speed of a solution, and make it possible to extend existing solutions (going back and forwards). To do this I used a unique vector (greatest common divisor vector) which is the same for all point on the same line:

//Calculate unique vector(hash):

doubleIntVector(int x1, int y1, int x2, int y2) {
    int stepx = x2-x1;
    int stepy = y2-y1;
    int gcd = gcd(abs(x2-x1), abs(y2-y1));
    if(gcd > 0) {
        stepx /= gcd;
        stepy /= gcd;
    }

    if(stepx < 0) {
        return (10000 * -stepx) + -stepy;
    } else if(stepx == 0 && stepy < 0){
        return -stepy; //this looks curious... but fixed a bug

    } else {
        return (10000 * stepx) + stepy;
    }
}

Now we can evaluate the points:

for point A in points 0....N {
    create list vectorsA 
    for point B in points A+1....N {
         vectorA.add( doubleIntVector(A.x, A.y, B.x, B.y) )
    }
    sort(vectorsA)
    //if vectorsA has three similair vectors you have 4 points on a line! 

    //(3> = invalid and 2 = candidate for extending!)

}

If a point has three vectors that are the same, we have a line with four points! This can be checked easily if you sort the vectors and go through them once.

Also adding and removing points becomes very easy. A lot of the GCD calculations can be cached. To remove a point, just remove the vectors it made. And to add a point, calculate all the new vectors. So in the end it basically all boils down to a lot of GCD calculations and sorting.

Was this the fastest way to calculate solutions in this contest? I don’t know, but I was really pleased when I figured it out. With a better algorithm for picking possible numbers (instead of hill-climbing) and some more processor power I bet I could have ended a bit higher up the hill.

Also: Keep an eye out for the next contest, it is going to be an interesting one! January the 13th.


Parleys: Devoxx talk published

Parleys: Devoxx talk published

The guys at Devoxx/Parleys have already processed all the talks and post-processed them. So my talk it now available at Parleys.com.
There is one drawback though, the talk is currently for subscribers only. If you don’t have one you can only watch the first two (very nervous) minutes.

If you’ve attended Devoxx you will get a free subscription in the email.

(p.s. The intro movie for all the Devoxx ‘11 talks is made by me as well!)


Neil DeGrasse Tyson and Stephen Colbert

Neil DeGrasse Tyson and Stephen Colbert

Most internet videos are a waste of time, but this one isn’t! This is an interview of Stephen Colbert (out of his normal character) with Neil DeGrasse Tyson (Hayden Planetarium director, TV science host).


The project manager and his pregnant wife

The project manager and his pregnant wife

Adding resources

Today I’ve (again) seen this quote on Twitter:

“A project manager is someone who thinks that 9 pregnant women can create a baby in 1 month”

This obviously isn’t the case. The story demonstrates that adding more people to a team won’t (necessarily) make the team more effective. Because some processes can’t be cut into smaller pieces and taken up by more people. It will even cause a bit of overhead, more opinions and thus, more time.

Our feature baby (for real)

What about one woman?

Everybody knows 9 women can’t create a baby in 1 month… so it isn’t fair to make that comparison. How about one pregnant woman and the project manager?

Four weeks into the ‘project’ the project manager has his monthly meeting with his pregnant wife:

Month #1
Project manager: “How is everything going?”
Wife: “Oh, just fine, no problems at all”
Project manager: “Is there anything I can help you with?”
Wife: “No, currently not, everything is going like it is supposed to”
Project manager: “When do you think this baby-project is going to be completed?”
Wife: “Well, it takes 9 months, so 8 months from now!”

Month #2
Project manager: “Hey, is everything still going fine?”
Wife: “Yes, still feeling fine.”
Project manager: “Is there anything I can help you with?”
Wife: “No, nothing that I can think of right now.”

Month #3
Project manager: “Hey, still working on that baby?”
Wife: “Yes, I’m starting to have a bit of morning sickness now…”
Project manager: “Oh, can I help you with that?”
Wife: “Not really, it is just something I have to live with”
Project manager: “How does this affect the release date?”
Wife: “It doesn’t, I think… but I really wish we could release a bit sooner!”

Month #4
Project manager: “Hey, still having that morning sickness issue?”
Wife: “A little bit, but it is almost gone now”
Project manager: “Still on track for the release date?”
Wife: “Yes, 5 months from now!”

Month #5
Project manager: “How is it going? Morning sickness issue resolved?”
Wife: “Yes, but I need icecream and pickles.”
Project manager: “I’m on it. Any updates about the release date?”
Wife: “ICECREAM!”

Month #6
Project manager: “Anything I can do for you right now?”
Wife: “Yes, I think you can start building that baby room”
Project manager: “When do you need it, when is the baby ready?”
Wife: “Three months from now.”

Month #7
Project manager: “The baby room is almost ready, how is that baby going?”
Wife: “Just fine, just very tired. Two more months.”

Month #8
Project manager: “I’ve got some more pickles for you!”
Wife: “You know I hate pickles :-( I’m tired…”
Project manager: “Sorry, how is the release coming along?”
Wife: “One more month.”

Month #9
Project manager: “Is the baby there yet?”
Wife: “No, not yet, I’m tired but fine”
Project manager: “But…? What about the release?”
Wife: “Go away.”

Month #9 (plus one week)
Project manager: “What is the delay? Where is my baby?”
Wife: “It isn’t here yet.”
Project manager: “How come? How did this happen!? I kept asking about the release date and you kept saying it would be done a week ago!?”
Wife: “Get out.”

Month #9 (plus two weeks)
Wife: “I’m ready! I’ve got this lovely baby for you.”
Project manager: “Great, where did it go wrong with that planning?”
Project manager: “That babyroom has been ready for weeks now, and no baby!”
Project manager: “How can we keep this from happening next time? Scrum? Lean? Agile? RUP?”
Wife: “Screw you.”

This story also has an important message in it, I’ll just leave that as an exercise for the reader.


Bytebeat: Algorithmic Symphonies

Bytebeat: Algorithmic Symphonies

A couple of days ago I first encountered “bytebeat”. This is a new hype revolving algorithmic music and sounds. The basic idea is this:

main(t) {
  for(t=0;;t++)
    putchar( t * (((t>>12) | (t>>8) ) & ( 63 & (t>>4))));
}

This simple loop has one variable, t (time). And every iteration we use t to calculate some output. Next, we take the output and pipe it into 8-bit 8-kHz PCM channel (for example /dev/audio!).

A couple of examples in a YouTube video:

In Java you could do the following:

import javax.sound.sampled.*;

public class AudioPump {

	public static void main(String[] args) throws Exception {

		AudioFormat format = new AudioFormat(8000f, 8, 1, false, false);
		DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
		SourceDataLine soundLine = (SourceDataLine)AudioSystem.getLine(info);
		soundLine.open(format, 32000);
		soundLine.start();

		byte[] buffer = new byte[8];
		int t = 0;

		while(true) {
			for(int n = 0; n < buffer.length; n++) {
				buffer[n] = (byte)f(t++);
			}
			soundLine.write(buffer, 0, buffer.length);
		}
	}

	private static int f(int t) {
		return (t>>7|t|t>>6)*10+4*(t&t>>13|t>>6);
	}
	//return (t*(t>>5|t>>8))>>(t>>16);

	//return t*(((t>>12)|(t>>8))&(63&(t>>4)));

}

These simple one-liners are able to produce amazing sounds and music!

It becomes even easier to play around with bytebeat if you use one of the many Javascript implementations.
Here is one (including the best song I’ve created): http://t.co/9oognysS

The computer/synthetic sounds are also good for dubstep.

For more information on this subject:
- http://canonical.org/~kragen/bytebeat/
- http://countercomplex.blogspot.com/2011/10/algorithmic-symphonies-from-one-line-of.html


Math doodling: Vi Hart

Math doodling: Vi Hart

As you might know I’m a big fan of math contests and basically just math in general. But only as long as it doesn’t involve long/hard equations! Math is all around us, we just don’t see it because we’ve been taught that math equals equations.

There is one person that keeps amazing me with her math related blog: Vi

vihart



Devoxx 2011: Speaking at Devoxx

Devoxx 2011: Speaking at Devoxx

I’m back home from Devoxx, I’m still alive (kind of) and it was great!

First of all, I had never been to Devoxx before, so it was all a new experience for me. And second, I was speaking at this conference, which is my first international talk! In this blogpost I’ll try to describe how it is, talking at Devoxx.

Before the talk, other sessions

There were a lot of good talks on Devoxx this year, mostly about HTML5, Android and new languages (like Clojure, Scala, Fantom, Ceylon). But to prepare for the talk I decided to stick with the sessions which had the well known international speakers (Joshua Bloch, Mark Reinhold, Brian Goetz, Chet Haase and of course the JavaPosse guys). Instead of learning new technical stuff I focused more on learning presentation skills instead.

The first day I also brought my camera with me, and created the following video (which is currently featured at the Devoxx.com frontpage):

Friday

My own session was planned on Friday, the last day of the conference, in the last slot of the day. Initially I didn’t like this spot (especially since it was the day after the Noxx party!), but when I saw the line of people waiting for popular sessions like Joshua Bloch’s and/or the JavaPosse I was very glad I wasn’t scheduled at the same time as them.

When I arrived at the venue Friday morning I noticed the room I was scheduled in was empty and being torn down… that didn’t look promising! But I quickly found out that ‘room 6’ was now in cinema room 4. And until the last moment the screens on the floor didn’t show my session and the session (from David Geary) before me. So I didn’t have a lot of hope, only my own colleagues would probably be able to find it…

Then suddenly David Geary was done with his talk (HTML5 Game Development: The most fun you can have in a browser), time flew by… time to go to the front of the stage and prepare the laptop. When connecting the laptop I saw a huge line of people, queuing to leave the room… sigh. So I went to the sound-guy, getting a microphone strapped to my head, and warned the cameraman that I have the habit of walking around like bored zoo animal. He was playing some World of Warcraft in between the sessions, so I challenged him: you can practice your aim on me instead the coming hour.

Then I looked back up at the audience, and to my pleasant surprise, the room was loaded. I quickly snapped this picture:
Devoxx audience

Wow! And it kept getting more and more crowded, in the end people were sitting on the steps because all the seats were taken. Then the timer was suddenly at 59:50, my talk had begun! The microphone was open and I started talking to the audience. Up to this point I wasn’t very nervous, but now I had to talk to all these people, in a foreign language (English). After the first joke landed, and the people reacted like I hoped, my confidence returned. During the talk I had to explain something with a bright orange stick, and before I knew it I was waving the stick around like a nerd jedi. Suddenly I remembered the poor cameraman, but one quick glance at the screen behind me and I could see he was doing an awesome job tracking my every move.

When the countdown reached 18 minutes, I was pretty much done with the talk, but I was anticipating a lot of questions and I had an anti-patent rant ready. For the next 13 minutes I answered a lot of good questions. Then I decided it was time, I took the liberty (being the last talk) to thanks the audience and tell them Devoxx was now over. Something like: “This is the end of our Devoxx for this year! Have a safe trip home, and hopefully I’ll see you all again next year!”

Afterwards a lot of people congratulated me, and I’ve seen some great tweets about the talk. It felt great and recommend it to everybody: Just do it!

The slides can be found here: http://www.slideshare.net/royvanrijn/what-shazam-doesnt-want-you-to-know

And the talk will be freely available (with video, slides and screen capture!) on Parleys.com some time in the future (could take a while, months).


Code quality: Confirmation

Code quality: Confirmation

Last week I wrote a blogpost on rewriting code and improving quality. And now we have some great news that will support my claims. To verify our project and convince management we are doing it the right way, we’ve send a copy of our code to TÜViT.

TÜViT

Most people from Europe will know TÜV. They test and certify cars, elevators and even nuclear power plants. They also verify IT security, quality, infrastructure, products, processes and their requirements. TÜViT is accredited by organisations and official bodies for the areas of IT security and IT quality.

The team that made it possible (and some management stealing our thunder)

HaMIS

Their verdict couldn’t have been better, last week we’ve received a banner with 5 out of 5 stars! Just 5% of the projects that are submitted to get tested get 5 stars. More information can be found in the official press release and in the Automatiserings Gids (Dutch).

Agile architecture and refactoring

For some reason getting these 5 stars makes me very proud. Normally I don’t care much about certification, they usually focus on the wrong issues. We know we are doing the right stuff, it feels good, and we’re making good progress. And this is just a perfect verification from an external source, this will also assure management we are on the right track.


Great code is written twice (or more)

Great code is written twice (or more)

code
The last couple of years more and more people have been moving towards Agile development. These techniques aren’t new, most we’re devised in the 80s or 90s. But finally these days programmers and (more importantly) business consultants, architects and clients have learned to love and embrace Agile development.

Evolving requirements

It has now become common knowledge that you can’t write down all the requirements before you start the project. These requirements have to be written down in an evolutionary matter. In short cycles/sprints we build pieces of the program and also iteratively extract the latest requirements from our clients. This is all based on the principle of evolution. Like everything in life, you get the best with incremental steps towards improvement.

Evolving code!

But why stop there? Today most programmers have developed a mindset in which requirements have to be extracted evolutionary. But they seem to forget their own work!? They still think their frameworks and their architecture has to be solid at the start of their project. Also, once code is written, it is done.. right?

Wrong. In my experience all great code is written at least twice. The first time you are too busy interpreting the requirements and figuring out what it should do. Sure, when you see certain patterns, we know that we need to extract methods and shift responsibilities around. The code you’ll end up can be pretty good, but it isn’t great.

In our current project, almost all important features have been rewritten from scratch multiple times. And slowly but surely the code is becoming better and better. Once you’ve made the 3rd or 4th addition to a certain piece of code, or you’ve fixed another bug, you’ll know where the code smells. You’ll feel like avoiding that piece of code, and you are glad when you are done fiddling with it. What do I do when I get this feeling? I delete the code.

But… but… then you have to start all over again!?

Wrong again! Sure, there is nothing left in the IDE, the code is gone, maybe some tests survived. You do have a solid understanding what the code should do. You also know what the code looked like before, so you know where it hurt/smelled in the past! With this knowledge you’ll write much better, maybe even great code! Sure, we could have kept the code and played with it some more, refactoring methods and responsibility… but there is nothing like grabbing the opportunity to start over again, and do it better.

Again, like all things in life: To make something great, it has to go through multiple evolutionary iterations. This is true for your requirements, your architecture and also your code.

Doing it twice, twice as slow?

When I tell people that in my opinion all code needs to be written at least twice, they are afraid the project will take double the time. But this is far from true. Here is why:

  1. The second time you write the code, it’ll only take a fraction of the time it took initially.
  2. After rewriting the quality of the code is significantly higher, improving maintainability, extendability and also programming velocity.

Good luck, keep rewriting and evolve your code! And if you think rewriting doesn’t work… we have proof!


Programming Contest: Orchard Planting

Programming Contest: Orchard Planting

Yes! Another new programming contest: Orchard Planting

The problem:

On a grid you have to place trees (dots). If there are four trees in a line, you’ll get a point. But: more then 4 trees in a line is illegal. What are the best solutions for N=11 to 60 trees?

Up to now I haven’t found a good method of constructing solutions yet, but I do have a very fast scorer which is a bit faster than O(N^2). There are some ideas brewing, which I obviously won’t disclose here (maybe after the contest).

I’m looking forward to competing with you, good luck all!


Contacts lost after update to IOS5 (iPhone)

Contacts lost after update to IOS5 (iPhone)

Yesterday I decided to (finally) upgrade to IOS 5. This new firmware for the iPhone/iPod Touch and iPad adds a lot of new features, including iCloud, iMessage and more.

icloud

At first the upgrade seemed to be a huge success, everything worked and I loved the fact that all my apps were being updated runtime, all at once. But then I realized that all my contacts were gone! I could see all the names in the ‘missed calls’ section, but it said “No contacts” when calling/texting. I tried the “Recovery” option, but nothing seemed to help…

Until I tried this:

  • Go to Settings
  • iCloud
  • Disable ‘Contacts’ (slide)
  • At this moment, all the contacts on my phone appeared again!
  • Re-enable ‘Contacts’
  • Now iCloud asks if you want to combine the results, yes we do!

This restored all my contacts and synchronized them with iCloud. It seems that (for some reason) the initial merge with iCloud failed or something.
If you have this problem too, please let me know if this fix helps.


Meet me at Devoxx 2011

Meet me at Devoxx 2011

This year I’ll be presenting at Devoxx in Antwerp, Belgium. In my opinion it currently is the best Java conference in the world. It’ll be my first talk in English.

Devoxx

The titel of the talk is: What Shazam doesn’t want you to know.

During the talk I’ll be explaining how Shazam works and how you can recreate this in Java yourself. But that’s not all.. I’m also including the grand challenge of teaching everybody how the Fourier Transformation works, without going into math (I’m math-illiterate).

Wish me luck, I’m really looking forward to it!

Are you coming to Devoxx? Or do you have hints/tips for me, let me know!


Bitcoin mining

Bitcoin mining

A couple of months ago I’ve started using and mining bitcoins. If you don’t know what bitcoin is yet, please watch the following video:

Mining for bitcoins has become harder and harder. But as a pro, the price of the bitcoin has also increased quite a lot. Since I’ve started mining I’ve mined about 40 bitcoins (in pools). This is now worth about 9 dollars per bitcoin: 360 dollars!

Mostly I’ve used my GPU for mining, this gives me the greatest speed. I found the Diablo Miner works best in my case, I even pushed some improvements into the codebase to let it work through proxies. But since a couple of days I’ve had a bitcoin miner in Java applet form running on this website! This is currently giving me about the same amount of bitcoins. Still, it is only about 0.01 bitcoin per day. Which comes down to 10 dollar cents a day. But still this is more then I earn for ads!

I’ve removed the bitcoin applet from my main page (it can scare visitors if the CPU runs to 100%). But you can still use it on this website:

SCRIPT DISABELED

So, don’t hesitate to hang around here for a while, with this page open :-) Lets see how much bitcoins my visitors can generate!

Oh, to see the current ‘exchange rate’ of the Bitcoin, you can always view it on Mt Gox.