Music Matching (part deux)

After a comment on my previous blog entry (about creating a shazam clone) I started tinkering again.

Somebody asked: Could this be used to detect duplicate songs in my mp3 collection!?

That is exacly what I just tried!

The results

Here are some examples:

Duplicate found: 01 - everything in its right place.mp3.song matches with D:\data\v2\01-radiohead-everything_in_it’s_wrong_place-h8me.mp3.song and score: 134
(note: This is a remix of the original, with some rap mixed in)

Duplicate found: 01 - Joy Division - Exercise One.mp3.song matches with D:\data\v2\114 - Joy Division - Exercise One (From Still).mp3.song and score: 255
(note: Yes, duplicate!)

Duplicate found: 01 The District Sleeps Alone Tonight.mp3.song matches with D:\data\v2\The Postal Service - The District Sleeps Alone Tonight.mp3.song and score: 636
(note: Yes, duplicate!)

Duplicate found: 01-AudioTrack 01.mp3.song matches with D:\data\v2\06.Richard cheese -Id like a virgin- Butterfly.mp3.song and score: 144
(note: Yes, duplicate, the second was a radio snippet with jingle in the song)

Duplicate found: 01-editors-papillon_edit.mp3.song matches with D:\data\v2\02-editors-papillon_instrumental_edit.mp3.song and score: 382
(note: Almost a duplicate, the second is an instrumental version of the first)

Duplicate found: 01-the_strokes-you_only_live_once-ser.mp3.song matches with D:\data\v2\01-the_strokes-you_only_live_once.mp3.song and score: 450
(note: Yes, duplicate!)

Duplicate found: 01-the_wombats-backfire_at_the_disco.mp3.song matches with D:\data\v2\Backfire @ The Disco [Promo Version].mp3.song and score: 493
(note: Yes, almost duplicate, the second is a radio-promo announcement with jingle in it)

Conclusion

With a bit of tinkering this algorithm could be used to make a tool to detect duplicate songs, even if the mp3’s aren’t similair. Even live versions and instrumental versions are detected if you lower the threshold.