Yesterday a friend of mine pointed me to the following thread:
Giving.github is a project that tries to bring people together. There are a lot of programmers in the world looking for fun projects, and there are just as many scientists, charities and other people with problems to solve!
Be sure to check it out, maybe you can help somebody as well.
Mara-Naboisho Lion Project
One of the problems posted to giving.github has something to do with Lion collars. The Mara-Naboisho Lion Project has a couple of collars and some files, but no idea what the data inside the files mean. Their initial description was:
The Mara-Naboisho Lion Project has asked for our help decoding and plotting the positions of lions fitted with GPS collars.
They receive SMSes from the collars:
These are presumably SMS PDUs, although simple online decoders don’t seem to make much sense of them. There’s the ASCII text “IRIDIUM” in them, presumably being the SMSC or sender or something string, as IRIDIUM is a satellite telephone system and presumably the bearer when the collars send up their positions.
What format is it, and how do we decode it?
Not SMS PDU
This is where I started analyzing, and I quickly ruled out that the hex-code is pure PDU (which can easily be translated). It seems to be some dialect/own protocol.
The first major breakthrough was realizing that there is one big header that both files have in common, and the real message starts with 0A. And the two messages after that have a repeating pattern:
This sure adds some structure to the files. I also guessed the first file has actually 6 entries and the second file has only 5 entries.
First field deciphered
The second breakthrough came through Twitter. I posted this problem in a tweet and after a couple of hours I got a reply that there was progress on the Stack Overflow question.
This was DStibbe’s reply:
Perhaps the ‘13’ is not part of the headers but the message instead?
The messages would then be:
It does seem like a log, with the message for Aug 06 (guessing) having 6 entries and the second from Aug 07 having 7.
What would a collar record? I’m guessing longitude,latitude and timestamp?
Chances are high that the first eight digits represent the timestamp. They are incremental from bottom to top.
Eg. 13ef348a = 334443658.
334443658 seconds = 10 years, 8 months, 6 days, 23:00:58
This matches the date in the title of the first sms: Collar07854_100806 210058.SMS
It even matches the timestamp minus 2 hrs in the title!
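The arithmetic in this reply can be checked with a few lines of Java. This is only a sketch: the epoch of 2000-01-01 00:00:00 UTC is an assumption, chosen because it makes the value line up with the file name, and the class and method names are made up for the example.

```java
import java.time.LocalDateTime;

final class CollarTimestamp {
    // Assumption: the collar counts seconds from 2000-01-01 00:00:00 (UTC).
    static final LocalDateTime EPOCH = LocalDateTime.of(2000, 1, 1, 0, 0, 0);

    /** Interprets eight hex digits as a number of seconds since the assumed epoch. */
    static LocalDateTime decode(String hex) {
        long seconds = Long.parseLong(hex, 16);
        return EPOCH.plusSeconds(seconds);
    }

    public static void main(String[] args) {
        System.out.println(Long.parseLong("13ef348a", 16)); // the raw second count
        System.out.println(decode("13ef348a"));             // the decoded date and time
    }
}
```

Under that assumption the decoded value lands on the date and time in the file name, which suggests the file name is in UTC and the 23:00:58 above is the same moment in a UTC+2 timezone.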
At this point progress stalled a bit; we couldn’t get the other bytes translated. There seemed to be certain integer values, 0039 d0ed for example. This probably isn’t floating point (which we expected for latitude/longitude/altitude), because IEEE 754 numbers keep the sign and exponent in the leading bits: 1 sign bit plus 8 exponent bits for a 32-bit float, or 11 exponent bits for a 64-bit double. When a value starts with 00 the exponent is almost empty, so as a float it would be an absurdly tiny number.
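To make that argument concrete, here is a small sketch that reads the same four bytes both ways. It assumes the value is a big-endian 32-bit word; the class name is hypothetical.

```java
final class FloatOrInt {
    /** Reinterprets a 32-bit pattern as an IEEE 754 single-precision float. */
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int raw = 0x0039d0ed;              // the bytes 00 39 d0 ed from the message
        System.out.println(raw);           // read as a plain integer
        System.out.println(asFloat(raw));  // read as a float: a subnormal, far too small to be a coordinate
    }
}
```

Because all exponent bits are zero, the float interpretation is smaller than `Float.MIN_NORMAL`, while the integer interpretation is an ordinary seven-digit number.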
So we asked the project for some more information, for example about the recorded data (where/when) and the manufacturer of the collars.
They gave us this KML file, and this company name: Vectronic-Aerospace.
Lat/lon, I’ve got you!
The biggest part we wanted to decipher was the lat/lon location (that is what the researchers want to have). The breakthrough came after reading a PDF file from Vectronic-Aerospace’s website. It has some screenshots of their original software (why aren’t they using that free software?!). In these screenshots I noticed integer coordinates very close to the integers I had managed to get from the hex data! They were labelled ECEF, which stands for Earth-Centered, Earth-Fixed: plain X, Y, Z coordinates measured from the centre of the earth. Until now my guess was that the SMS files had lat/lon coded in binary, but they contain XYZ information instead.
After a quick google I managed to find some code to translate from ECEF coordinates to lat/lon and it gave me the following location:
When plotted in Google Maps it gave me a location in Germany. At first I thought it was another dead end, but then I looked here. It turns out it was the EXACT location of Vectronic Aerospace, the creators of the collar! Problem solved.
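For reference, an ECEF to lat/lon conversion can be sketched as follows. This is not the code I found online; it is a minimal version using Bowring's closed-form approximation, and it assumes the collar coordinates are in metres on the WGS84 ellipsoid. The class and method names are made up for the example.

```java
final class Ecef {
    static final double A = 6378137.0;               // WGS84 semi-major axis in metres
    static final double F = 1.0 / 298.257223563;     // WGS84 flattening
    static final double B = A * (1.0 - F);           // semi-minor axis
    static final double E2 = F * (2.0 - F);          // first eccentricity squared
    static final double EP2 = (A * A - B * B) / (B * B); // second eccentricity squared

    /** ECEF metres to {latitude in degrees, longitude in degrees, height in metres}. */
    static double[] toGeodetic(double x, double y, double z) {
        double p = Math.hypot(x, y);
        double theta = Math.atan2(z * A, p * B);
        double s = Math.sin(theta);
        double c = Math.cos(theta);
        double lat = Math.atan2(z + EP2 * B * s * s * s, p - E2 * A * c * c * c);
        double lon = Math.atan2(y, x);
        double n = A / Math.sqrt(1.0 - E2 * Math.sin(lat) * Math.sin(lat));
        double h = p / Math.cos(lat) - n;            // not usable right at the poles
        return new double[] { Math.toDegrees(lat), Math.toDegrees(lon), h };
    }

    /** The inverse direction, handy for checking the conversion with a round trip. */
    static double[] fromGeodetic(double latDeg, double lonDeg, double h) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double n = A / Math.sqrt(1.0 - E2 * Math.sin(lat) * Math.sin(lat));
        return new double[] {
            (n + h) * Math.cos(lat) * Math.cos(lon),
            (n + h) * Math.cos(lat) * Math.sin(lon),
            (n * (1.0 - E2) + h) * Math.sin(lat)
        };
    }

    public static void main(String[] args) {
        double[] xyz = fromGeodetic(52.5, 13.4, 50.0); // an arbitrary test point
        double[] back = toGeodetic(xyz[0], xyz[1], xyz[2]);
        System.out.println(back[0] + ", " + back[1] + ", " + back[2]);
    }
}
```

The three integers from a collar record would go in as `x`, `y` and `z`; the first two values of the result are what you paste into a map.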
There are still some mysteries, for example… what are the last bytes? Looking at the screenshots in the PDF I suspect it would be main battery voltage, backup voltage and temperature. The project also probably wants to integrate the collar files with the website. And there are of course a lot more giving.github.com problems to solve!
With the sudden surge of visitors from Reddit (Hello!) I decided to take a look at Google Analytics. And two things caught my eye:
First of all, Google Analytics ‘Realtime’ is awesome, you can see minute by minute how many people are on your website and what they are looking at, where they come from (internetwise and geographically).
Second, my website gets the right type of visitors, I’m so proud of you guys and girls…
I fired up the profiler to see what was causing the Java code to be so slow, and it turned out the method it spent most time in was Math.pow(). Other slow methods were Math.acos(), cos(), sin() etc. It turns out that the Math library isn’t very fast, but there is an alternative, FastMath. Apache Commons has implemented a faster Math library for commons-math. Let’s see what changing Math.* to FastMath.* does to the performance:
Warning: This isn’t the same as Math.pow/FastMath.pow!
The slowest method in the program now is FastMath.acos. From high school I know that acos(x) can also be calculated as atan(sqrt(1-x*x)/x), at least for positive x. So I created my own version of acos. When benchmarking the different methods, Math.acos(), FastMath.acos() and FastMath.atan(FastMath.sqrt(1-x*x)/x), the result is again surprising:
The custom acos() function is a bit faster than FastMath.acos() and a lot faster than Math.acos(). Using this function in the Mandelbulb renderer gives us the following metric:
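The identity behind the custom acos() can be sketched like this. It uses the standard `Math` class instead of FastMath so it runs without extra libraries, and it adds the corrections the plain formula needs for zero and negative input; the class and method names are hypothetical and this is not the renderer's actual code.

```java
final class AcosSketch {
    /** acos(x) for x in [-1, 1], computed through atan and sqrt. */
    static double acosViaAtan(double x) {
        if (x == 0.0) {
            return Math.PI / 2;                  // avoids dividing by zero
        }
        double a = Math.atan(Math.sqrt(1.0 - x * x) / x);
        return x > 0 ? a : a + Math.PI;          // atan is negative for negative x
    }

    public static void main(String[] args) {
        for (double x = -1.0; x <= 1.0; x += 0.25) {
            System.out.println(x + ": " + acosViaAtan(x) + " vs " + Math.acos(x));
        }
    }
}
```

If you know your input is always positive, the two checks can be dropped, which is the kind of shortcut a generic library cannot take.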
Well, surprisingly, in this case it is. With the code 100% the same, using arrays as vectors and Math.*, the code actually runs faster in my browser!
Edit 2: People have been asking me: What could have been done to make it faster in Java? And, why is it slow?
Well the answer is twofold:
1) The Math libraries are made for ‘double’ in Java. Having a power() method work with doubles is much harder than working with just integer numbers. The only way to optimize this would be to overload the methods with int-variants. This would allow much greater speeds and optimizations. I think Java should add Math.pow(float, int), Math.pow(int, int) etc.
2) All the Math libraries have to work in all situations, with negative numbers, small numbers, large numbers, zero, null etc. They tend to have a lot of checks to cope with all those scenarios. But most of the time you’ll know more about the numbers you put in… For example, my fastPower method will only work with positive integers larger than zero. Maybe you know that the power will always be an even number…? This all means that the implementation can be improved. The problem is, this can’t be easily achieved in a generic (math) library.
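To illustrate the second point, a power function that only accepts positive integer exponents can be as small as the sketch below. It uses exponentiation by squaring; it carries the fastPower name from the text but is not necessarily the same implementation.

```java
final class FastPower {
    /** base^exp for integer exp > 0, using exponentiation by squaring. */
    static double fastPower(double base, int exp) {
        if (exp <= 0) {
            throw new IllegalArgumentException("exponent must be a positive integer");
        }
        double result = 1.0;
        while (exp > 0) {
            if ((exp & 1) == 1) {
                result *= base;   // this bit of the exponent is set
            }
            base *= base;         // square for the next bit
            exp >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fastPower(2.0, 10));
        System.out.println(fastPower(1.5, 3));
    }
}
```

An exponent of 8 needs only a handful of multiplications this way, with none of the special-case handling `Math.pow(double, double)` has to do.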
This afternoon Will (the friend I mentioned in this previous post) showed up again on Google Talk.
If you have a bit of time:
The code can be found here: https://github.com/royvanrijn/mandelbulb.js
I’ve already included one improvement by Will himself, scanlines to smooth out the rendering (good for impatient people!).
Last Friday a friend of mine was talking about ray marching, the Mandelbulb and programming his own 3D fractal engine. He also kind of challenged me to do the same… So I picked up the challenge and set to work on my own 3D (CPU only) ray marching fractal engine (in Java). It was a very steep learning curve for a programmer with limited math knowledge, but I’m pretty pleased with the first results!
My first 3d engine (tm)
On Sunday I had the first things ready: lighting (Blinn-Phong) and soft shadows. But I still had no perspective built in (all rays travelled in the same direction) and I had no way to change the view point/camera position:
The next step was to render something other than spheres and cubes, and to get the camera position under control. That breakthrough came Monday evening:
My first mandelbulb! The perspective is still flat and it misses detail, we’ve also rendered through the camera/near field.
So I made some more improvements and managed to render a nice wallpaper for myself:
Yesterday I played around with optimizing the code a little bit (less memory usage, already twice as fast as it was, but still slow). I’ve also added some glow and the ability to add ‘distance fog’. And I’ve recorded my first movie, just to show some moving fractals:
(yes, it has a glitch with the pink mandelbulb, can’t be bothered to fix…)
This afternoon I was reading about rotoscoping. Rotoscoping is a technique where an artist traces over filmed footage, frame by frame, to animate a drawn (cartoon) character. The end result is highly realistic. The technique dates back to around 1915, was used for the ink cartoon series Out of the Inkwell, and has been used in many cartoons since.
But what about video games?
It turns out that Prince of Persia was one of the first computer games to use rotoscoping. That game brings back a lot of good memories for me, and probably for everybody my age.
After a quick search I found this little gem of a video:
<iframe type="text/html" width="640" height="390" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen src="http://www.youtube.com/embed/WAjRNU3DbSY"></iframe>
The person in the video is David Mechner. He is the younger brother of Jordan Mechner, the creator of Prince of Persia.
Jordan used the clips shown above to rotoscope the movements of the main character in the game.
I had never realised why I loved Prince of Persia so much in my youth. But now I think it is probably because of rotoscoping! The animations are so realistic and lifelike that they brought a whole new level of realism into video games.
Here is another video of Jordan, showing even more rotoscoping in Prince of Persia:
<iframe type="text/html" width="640" height="390" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen src="http://www.youtube.com/embed/gC3WEwSJoHs"></iframe>
And the icing on the cake, a video about rotoscoping in Mortal Kombat 1:
<iframe type="text/html" width="640" height="390" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen src="http://www.youtube.com/embed/CUBAKk64FS8"></iframe>
A couple of days ago I received my Arduino Mega in the mail, together with a breadboard, some plug wires and an Arduino starter kit (with some resistors, capacitors, and a couple of sensors).
The Arduino is a handy little board with a microcontroller. It allows you to connect your computer to the microcontroller using USB and easily write a program and upload it to the board. I’m a very experienced programmer with almost no experience in electronics.
I know what a resistor is and what it does, but I have no idea how/when to use which resistor. But I’d love to learn more, maybe even build some toy robots in the end. A final goal would be to have a working humanoid with 20+ servos, but that goal might be a bit out of reach…
The first thing I did after unboxing the Arduino was plugging it into my laptop and browsing to the arduino.cc website. There you can download the Arduino IDE.
In the ‘Learning’ section (http://arduino.cc/en/Tutorial/HomePage) you’ll find everything you need to get started. There are diagrams on how to wire everything into the breadboard, and there are also code examples.
Some stuff I made using the ‘Learning’ page:
- Blinking LED (the hardware Hello World, yay!)
- Slowly fading blinking LED
- Knight rider, KITT LED bar
- Tiny keyboard with 5 buttons
- Light reactive theremin
And some projects that aren’t on the example page, but could be made using the things I’ve learned:
- Extended keyboard with 5 buttons and LEDs!
- Light reactive LED bar (showing the amount of light in LEDs)
- Knock-sensor, LED shows when it ‘feels/hears’ a knock
All the above was made in a single evening! And there is much much more that you could make with the Arduino. I haven’t even started using servos and/or DC motors. The only problem left is lack of imagination!
For a programmer the code is very easy to understand, and the Arduino Programming Language used is closely related to the C/Java family.
The language has borrowed a lot of syntax from C, but there are some tricky issues (in the eyes of a programmer). For example:
This little piece of code above gives unpredictable results because ‘stringThree’ never got an initial value before you started concatenating different data types. Instead you’ll have to do:
Weird…! But I really suggest trying out the Arduino, give it a chance and you’ll probably like it and learn a lot about electronics and microcontrollers.
One of the things he mentioned was: If you don’t use caching, you are an idiot.
Where do websites cache?
There are multiple tiers where caching of websites is done, and is useful.
The best cache you can have is the cache inside the browser. If the browser knows it has the latest version of a page, it can just read it from disk. There is absolutely no reason to go online.
The second type of cache would be the proxy cache. As you would have guessed this is a proxy and it does caching. It sits between the user/browser and the internet gateway. This cache sees all the requests and stores pages that can be cached. If another user requests a webpage that hasn’t changed it can provide the page instantly.
Reverse proxy cache
You could also have a cache between the internet and the content providing server. If the server processes the request it might need to access databases and maybe other slow resources to build up the webpage. The resulting page can then be cached on the providing side in a “reverse” proxy cache. All subsequent requests can just be served from the cache, as long as the page is still fresh.
Making pages cacheable
If you maintain a website, or you create web applications, you should be aware of caching. After Stefan’s rant, I’m completely convinced of that. If you don’t do anything, every request will go over the internet and hit the server. There are HTML ways to control caching (META tags etc.), but these are unreliable, because proxy caches usually never look inside the HTML, so they shouldn’t be used (!). So what could we do?
When sending a page back to the user you are able to set some HTTP headers, and “Expires” is one of them.
This indicates that the current page is valid until the timestamp. Then it ‘Expires’. Easy!
The only problem is generating the timestamp, which can be a bit tricky. You’ll have to be sure the time on your system is set correctly. And the next time you update the page, you also have to update the timestamp!
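Generating that timestamp is mostly a matter of getting the date format right. A minimal sketch in Java, assuming the usual HTTP date layout in GMT; the class and method names are made up for the example:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class ExpiresHeader {
    // HTTP dates are always in GMT, with a two-digit day and English names.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /** Builds an Expires header line that is valid until now + ttl. */
    static String expiresAt(Instant now, Duration ttl) {
        return "Expires: " + HTTP_DATE.format(now.plus(ttl));
    }

    public static void main(String[] args) {
        System.out.println(expiresAt(Instant.now(), Duration.ofHours(1)));
    }
}
```

Passing the current time in as a parameter keeps the clock dependency visible, which is exactly the part that goes wrong when the system time is off.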
With HTTP 1.1 there is a new class of headers called “Cache-Control”. These headers are more powerful than the Expires header.
To enable caching using Cache Control headers you can set:
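The “max-age” is the time in seconds (not milliseconds) that the current page stays fresh. Adding “must-revalidate” tells the cache that once the page is stale it must check with the server before serving it again, so it cannot quietly keep using the old copy. If you don’t want an object to be cached you can use: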
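As an illustration of both cases, the header lines could be built like this. It is a sketch with hypothetical helper names; `no-store` is used here as the directive that tells caches not to keep a copy at all.

```java
import java.time.Duration;

final class CacheControlHeader {
    /** Cacheable for the given duration; max-age is expressed in whole seconds. */
    static String cacheFor(Duration maxAge) {
        return "Cache-Control: max-age=" + maxAge.getSeconds() + ", must-revalidate";
    }

    /** Tells browsers and proxies not to store the response. */
    static String neverCache() {
        return "Cache-Control: no-store";
    }

    public static void main(String[] args) {
        System.out.println(cacheFor(Duration.ofHours(1)));
        System.out.println(neverCache());
    }
}
```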
Refreshing cached data
The two methods described above will tell the cache if the content is cacheable. But what happens when the max-age or Expires timestamp expires? There are smarter ways to update the cache instead of getting the latest content from the server.
Websites should always set the response header called “Last-Modified”. This is a timestamp of the moment a webpage last changed.
When a cache has expired (max-age or Expires) and has to get a new version from the server it can set the request header “If-Modified-Since” and include the timestamp.
If the content on the server hasn’t been changed it’ll reply “304 Not Modified”. The cache can now keep the cached version.
With HTTP 1.1 there is also an improved method of doing the “Last-Modified”. Instead of using a timestamp (which is error prone), they’ve introduced the “ETag”. This is a tag that is completely customisable. Most of the time it will just be a hash of the content. The server sets the ETag as response header:
When a cache can no longer use the cached version (due to max-age or Expires) it will ask the server:
The term “If-None-Match” isn’t very clear, but it means “if-etag-changed-since” and works the same way as “If-Modified-Since”. When the ETag is the same the server will reply “304 Not Modified”; it won’t send the content back.
When you are working on a web application you could just add an ETag which is the MD5 of the returning content. If the content is the same, you don’t have to send the content over the line. The only drawback to this method is that you still need to generate the entire reply to calculate the MD5 hash to see if the content has changed…! But sometimes you’ll know in advance if the content has been changed.
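The MD5 approach can be sketched in a few lines of Java. The class and method names are hypothetical, and a real application would hook this into whatever framework produces the response:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class EtagSketch {
    /** Returns a quoted ETag value: the MD5 of the response body in hex. */
    static String etagFor(String body) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : md5.digest(body.getBytes(StandardCharsets.UTF_8))) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** 304 when the client's If-None-Match equals the current ETag, else 200. */
    static int statusFor(String body, String ifNoneMatch) {
        return etagFor(body).equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        String page = "<html>hello</html>";
        String tag = etagFor(page);
        System.out.println("ETag: " + tag);
        System.out.println(statusFor(page, tag));           // unchanged content
        System.out.println(statusFor(page + "!", tag));     // changed content
    }
}
```

Note that `statusFor` still needs the full body to compute the hash, which is the drawback mentioned above.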
I’m using WordPress and I’ve found the excellent plugin “W3 Total Cache”.
It will involve a bit of tweaking, because only you can decide which stuff should be cached. But I think it worked out great, press F5 right now and you’ll probably be reading this from the browser cache.
The last two days I’ve been competing in a competition called Ludum Dare.
This is a short, 48 hour contest. In this time you have to build an entire game based on a theme given at the start of the 48 hours. It is a good exercise in planning, scaling, hacking, imagining and just having fun! I really enjoyed it, and recommend you join LD24 four months from now.
For this game I decided to stick with Java. To make it playable for as many people as possible I decided to make an Applet. It can easily become a standalone app, or maybe an Android app…!
I loved the old point-and-click games, from Dirty Larry to The Day of the Tentacle, from Monkey Island to Gobli(iiii)ns. So that was settled.
The big pro:
Not a lot of physics or game code.
The big con:
I’d have to brush up my paint skills because point-and-click adventures are filled with graphics and animation!
One big factor in games is music, and for this contest I took some midi control code I made some years ago. This was turned into a procedurally generated music generator. Every time you play you’ll hear something new.
With visitors coming to see our little baby girl on Sunday I decided to end early.
Here is my result, have fun playing the game: Itty-bitty botty!
The Real Katie
Today I stumbled upon the following blogpost:
The Real Katie - Lighten Up
Katie talks about the sexist jokes and remarks she regularly gets in the IT/programming world, and she is sick and tired of hearing “Come on, lighten up”.
The post is moving and shows how easy it is to offend people, not by a single remark, but by hundreds of similar remarks heard before.
Not an IT problem
There is one point I don’t agree with though. I don’t think it is fair to call this an IT/programmers problem.
Let me explain:
Obviously there are a lot of jerks, assholes and plain rude people around. Most of them are men, some are women. They pick on easy targets, the minorities. Sometimes the minority is a heavy male co-worker, sometimes it is the rare female programmer.
I fully agree, we should call the bullies out more. We all should do something about this problem. Her blogpost has opened my eyes to that problem again. But this isn’t an IT problem… it is a minority and rude people problem, a global social problem. It happens in all professions. Katie is just unlucky to be the minority in the field of work she loves.
More women in IT
Also, I do agree that we could use more women in the IT world. At a young age we should teach boys and girls that there is no such thing as boy-jobs and girl-jobs, and both should learn the joy of programming! If we do that the problem of women being a minority in the IT world will disappear.
BUT that won’t solve the global social problem of assholes picking on minorities.
That is something we are all responsible for.
There will always be minorities and there will always be rude people (male or female). This is something we can’t change. You can however call them out and disapprove of the behavior.
Our project is doing Scrum, and one of the main aspects of Scrum is having everything clearly visible. A great example is the scrumboard, a huge whiteboard filled with Post-It notes.
Post-It notes are perfect for this; small enough to be easy to handle; sticky enough so you can post them almost everywhere. I truly believe that without the Post-It note, Scrum wouldn’t be possible and probably wouldn’t even exist!
All this makes Arthur Fry the real hero of the Agile movement. After somebody at 3M messed up a batch of glue, Arthur decided to add that glue to a piece of paper, creating the first Post-It notes.
This invention is much more important than the toilet, or wheel, or penicillin…! Celebrate and make the world aware of this unlikely hero, join us and celebrate Arthur Fry Day, this March 16th!
Moments ago this tweet caught my eye:
Devoxx 2011: "What Shazam doesn't want you to know!" by @royvanrijn is now freely available @ http://parleys.com/d/2869
That means everybody can now watch my talk without any subscription! If you want to learn how algorithms like Shazam work, be sure to watch this talk. It might be easier to understand than my blog post a year ago.
Without further ado:
Today I’ve been playing around with the Levenshtein distance. The Levenshtein distance is a number which measures the ‘distance’ between two strings. For example, the distance between “test” and “rest” is one.
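For readers who have not seen it before, the distance can be computed with the classic dynamic programming table. A minimal sketch that keeps only two rows of the table in memory (class and method names are made up for the example):

```java
final class Levenshtein {
    /** Minimum number of single-character inserts, deletes or substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;                        // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(prev[j - 1] + cost,
                         Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] swap = prev;
            prev = cur;
            cur = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("test", "rest"));
    }
}
```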
A Levenshtein distance of one is the key element in a challenge I’ve been reading about. I first encountered it on williamedwardscoder’s blog.
The problem description:
Two words are friends if they have a Levenshtein distance of 1. That is, you can add, remove, or substitute exactly one letter in word X to create word Y. A word’s social network consists of all of its friends, plus all of their friends, and all of their friends’ friends, and so on. Write a program to tell us how big the social network for the word “causes” is, using this word list. Have fun!
Java solution (8.1 sec)
After some Googling and tweaking I decided to make an implementation based on the Trie structure. How this helps is excellently described by Steve Hanov. I’ve also had a peek in another Java based Trie implementation by Ximus.
I’ve been able to get my code to run in 8.1 seconds, which is pretty good. But I’ve read that there are Java implementations running in just 4 seconds…!? Maybe based on Levenshtein Automata?
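As a rough illustration of the trie idea, here is a minimal sketch. It is not the 8.1 second implementation and it is not tuned for speed; the class and method names are made up for the example. Every trie node extends the Levenshtein table by one row, and a branch is abandoned as soon as no cell in its row is 1 or lower.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WordNetwork {
    /** One trie node; 'word' is set only where a dictionary word ends. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String word;
    }

    private final Node root = new Node();

    WordNetwork(Iterable<String> words) {
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.word = w;
        }
    }

    /** All dictionary words at Levenshtein distance exactly 1 from target. */
    List<String> friends(String target) {
        List<String> out = new ArrayList<>();
        int[] firstRow = new int[target.length() + 1];
        for (int i = 0; i < firstRow.length; i++) {
            firstRow[i] = i;
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), target, firstRow, out);
        }
        return out;
    }

    private void walk(Node node, char c, String target, int[] prev, List<String> out) {
        int[] row = new int[prev.length];
        row[0] = prev[0] + 1;
        int min = row[0];
        for (int i = 1; i < row.length; i++) {
            int subst = prev[i - 1] + (target.charAt(i - 1) == c ? 0 : 1);
            row[i] = Math.min(subst, Math.min(row[i - 1] + 1, prev[i] + 1));
            min = Math.min(min, row[i]);
        }
        if (node.word != null && row[row.length - 1] == 1) {
            out.add(node.word);
        }
        if (min <= 1) {   // otherwise nothing deeper in this branch can be a friend
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), target, row, out);
            }
        }
    }

    /** Breadth-first walk over friends of friends; the count includes the start word. */
    int networkSize(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            for (String f : friends(queue.poll())) {
                if (seen.add(f)) {
                    queue.add(f);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        WordNetwork net = new WordNetwork(List.of("cat", "cot", "cog", "dog", "bird", "cats"));
        System.out.println(net.friends("cat"));
        System.out.println(net.networkSize("cat"));
    }
}
```

For the real challenge you would load the word list from the file and call `networkSize("causes")`.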
The Orchard Planting contest from infinite search space is over. So it is time for a quick write-up.
The rules are simple: on a grid of integers, place N points to get as many lines with exactly 4 points as possible, and never more than 4 points on a line.
My big breakthrough was when I figured out a way to improve the calculation speed of a solution, and make it possible to extend existing solutions (going back and forwards). To do this I used a unique vector (greatest common divisor vector) which is the same for all points on the same line:
Now we can evaluate the points:
If a point has three vectors that are the same, we have a line with four points! This can be checked easily if you sort the vectors and go through them once.
Also adding and removing points becomes very easy. A lot of the GCD calculations can be cached. To remove a point, just remove the vectors it made. And to add a point, calculate all the new vectors. So in the end it basically all boils down to a lot of GCD calculations and sorting.
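The idea can be sketched as follows. This is a simplified version with made-up names: it counts equal vectors in a HashMap instead of sorting them, and it recomputes everything instead of caching GCDs or updating incrementally.

```java
import java.util.HashMap;
import java.util.Map;

final class OrchardLines {
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    /** Reduces (dx, dy) by its GCD and fixes the sign, so one line gives one key. */
    static long direction(int dx, int dy) {
        int g = gcd(Math.abs(dx), Math.abs(dy));
        dx /= g;
        dy /= g;
        if (dx < 0 || (dx == 0 && dy < 0)) {
            dx = -dx;
            dy = -dy;
        }
        return ((long) dx << 32) | (dy & 0xffffffffL);
    }

    /** Number of lines with exactly four points, or -1 if a line has more than four. */
    static int countFourPointLines(int[][] pts) {
        int hits = 0;
        for (int i = 0; i < pts.length; i++) {
            Map<Long, Integer> seen = new HashMap<>();
            for (int j = 0; j < pts.length; j++) {
                if (j != i) {
                    seen.merge(direction(pts[j][0] - pts[i][0], pts[j][1] - pts[i][1]),
                               1, Integer::sum);
                }
            }
            for (int n : seen.values()) {
                if (n > 3) {
                    return -1;   // five or more points on one line: invalid solution
                }
                if (n == 3) {
                    hits++;      // this point plus three others share a line
                }
            }
        }
        return hits / 4;         // every four-point line was counted once per point
    }

    public static void main(String[] args) {
        int[][] pts = { {0, 0}, {1, 1}, {2, 2}, {3, 3}, {0, 1} };
        System.out.println(countFourPointLines(pts));
    }
}
```

The points are assumed to be distinct; two identical points would give a zero vector and a division by zero.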
Was this the fastest way to calculate solutions in this contest? I don’t know, but I was really pleased when I figured it out. With a better algorithm for picking possible numbers (instead of hill-climbing) and some more processor power I bet I could have ended a bit higher up the hill.
Also: Keep an eye out for the next contest, it is going to be an interesting one! January the 13th.
The guys at Devoxx/Parleys have already processed and post-processed all the talks. So my talk is now available at Parleys.com.
There is one drawback though, the talk is currently for subscribers only. If you don’t have a subscription you can only watch the first two (very nervous) minutes.
If you’ve attended Devoxx you will get a free subscription by email.
(p.s. The intro movie for all the Devoxx ‘11 talks is made by me as well!)