Music Matching (part deux)

After a comment on my previous blog entry (about creating a Shazam clone), I started tinkering again.

Somebody asked: Could this be used to detect duplicate songs in my mp3 collection!?

That is exactly what I just tried!

The results

Here are some examples:

Duplicate found: 01 - everything in its right matches with D:\data\v2\01-radiohead-everything_in_it’ and score: 134
(note: This is a remix of the original, with some rap mixed in)

Duplicate found: 01 - Joy Division - Exercise matches with D:\data\v2\114 - Joy Division - Exercise One (From Still) and score: 255
(note: Yes, duplicate!)

Duplicate found: 01 The District Sleeps Alone matches with D:\data\v2\The Postal Service - The District Sleeps Alone and score: 636
(note: Yes, duplicate!)

Duplicate found: 01-AudioTrack matches with D:\data\v2\06.Richard cheese -Id like a virgin- and score: 144
(note: Yes, duplicate, the second was a radio snippet with jingle in the song)

Duplicate found: matches with D:\data\v2\ and score: 382
(note: Almost a duplicate, the second is an instrumental version of the first)

Duplicate found: matches with D:\data\v2\ and score: 450
(note: Yes, duplicate!)

Duplicate found: matches with D:\data\v2\Backfire @ The Disco [Promo Version] and score: 493
(note: Yes, almost a duplicate, the second is a radio-promo announcement with a jingle in it)


With a bit of tinkering, this algorithm could be used to build a tool that detects duplicate songs, even if the MP3 files themselves differ. Even live versions and instrumental versions are detected if you lower the threshold.
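
To give an idea of how the threshold comes into play, here is a minimal sketch in Python. It is not the code I actually ran: the function name find_duplicates, the tuple layout, and the threshold values 200 and 100 are all assumptions made up for illustration. It only covers the last step, filtering already-computed match scores, and uses three of the scores listed above as sample data.

```python
from typing import List, Tuple

# (query file, best-matching file, match score). Hypothetical layout.
Match = Tuple[str, str, int]


def find_duplicates(matches: List[Match], threshold: int) -> List[Match]:
    """Keep only matches scoring at or above the threshold, best first."""
    kept = [m for m in matches if m[2] >= threshold]
    return sorted(kept, key=lambda m: m[2], reverse=True)


# Sample data: three of the scores reported in the results above.
matches: List[Match] = [
    ("01 - Joy Division - Exercise",
     "114 - Joy Division - Exercise One (From Still)", 255),
    ("01 The District Sleeps Alone",
     "The Postal Service - The District Sleeps Alone", 636),
    ("01-AudioTrack", "06.Richard cheese -Id like a virgin-", 144),
]

# Assumed threshold of 200: reports only the two clean duplicates.
for query, candidate, score in find_duplicates(matches, threshold=200):
    print(f"Duplicate found: {query} matches with {candidate} and score: {score}")

# Lowering the (assumed) threshold to 100 also lets the radio snippet through.
loose = find_duplicates(matches, threshold=100)
```

A stricter threshold leaves fewer false positives to check by hand, while a lower one starts catching the radio snippets and alternate versions. Where the right cut-off lies depends on the collection, so treat these numbers as placeholders.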