Maybe it’ll be part of JDK 9, maybe it won’t… but people are working hard on creating a REPL tool/environment for the Java Development Kit (JDK). More information on the code and the project is available as part of OpenJDK: project Kulla.
Some of you might not be familiar with the terminology ‘REPL’, it is short for Read Evaluate Print Loop. It is hard to explain exactly what it does, but very easy to demonstrate:
The idea is that you have a place you can enter Java and run it, without main method, class structure. You can build instances, and alter them while you type. For example you’ll able to do Swing development while you type:
Now we have a visible frame, you can drag it around, resize it etc.
Suddenly our frame has a panel, and the panel has an empty button! You can prototype, do live coding and you have instant feedback.
Now the button has text, but pressing it still does nothing…
And there we go, a final simple lambda creates a working “Hello World!”-button.
It is also possible to load a repl script from file, allowing you to share, store and run scripts. This is done using the ‘/load’ and ‘/save’ commands. You can even ‘/dump’ all the created classes in a directory.
I’m very curious how people will be using the REPL in the future, some use cases:
If you want to try out Kulla, it took me literally 20 minutes to get up and running on my MacBook. Just follow the instructions on AdoptOpenJDK, but instead use http://hg.openjdk.java.net/kulla/dev/ as codebase. After building the JDK, go to ./langtools/repl and look at the README.
The post below is the content from my 2014 J-Fall and Devoxx Ignite presentations. You can check you the slides here:
We all do testing
In this day and age you aren’t considered a real Java developer if you are not writing proper unit tests.
We all know why this is important:
- Instant verification our code works.
- Automatic future regressions tests.
But how do we know we are writing proper tests? Well most people use code coverage to measure this. If the percentage of coverage is high enough you are doing a good job.
What is a test?
First let’s look at what a test actually is:
- Instantiate classes, setup mocks.
- Invoke something.
- Assert and verify the outcome.
Which steps are measured with code coverage? Only steps 1 and 2. And what is the most important thing for a test? It is the third and final step, the assertion, the place where you actually check if the code is working. This is completely ignored by code coverage!
I’ve seen companies where management looks at code coverage reports, they demand the programmers to write 80+ or 90+% coverage because this proves the quality is good enough. What else is a common thing in these organisations? Tests without any real assertion. Tests written purely to boost coverage and please management.
So code coverage says absolutely nothing about the quality of our tests? Well, it does tell you one thing. If you have 0% coverage, you have no tests at all, if you have 100% coverage you might have some very bad tests.
Luckily there is help around the corner, in the form of mutation testing. In mutation testing you create thousands of ‘mutants’ of your codebase. So what is this mutant you might ask? A mutation is a tiny singular change in your codebase.
For each mutant the unit tests are run and there are a couple of possible outcomes:
If you are lucky a test will fail. This means we have ‘killed’ our mutant. This is a positive thing, we’ve actually checked that the mutated line of code is correctly asserted by a test. Now we immediately see the advantage of using mutation testing, we actually verify the assertions in our tests.
Another possible outcome is that our mutant has survived, meaning no test fails. This is scary, it means the logic we’ve changed isn’t verified by a test. If someone would (accidentally) make this change in your codebase, the automatic build won’t break.
In Java (and other languages as well) there are frameworks for doing mutation testing. One of the most complete and modern frameworks for doing mutation testing in Java is called PIT. The generation of mutants and the process running the tests is fully automated and easy to use, just as easy as code coverage. There are Maven, Ant, Gradle, Sonarqube, Eclipse and IntelliJ plugins available!
What about performance?
Using mutation testing isn’t a silver bullet and it doesn’t come without any drawbacks. The major disadvantage is performance. This is the reason it never took off in the 1980s. At that time it would take an entire evening to run all your unit tests, so people could only dream of creating thousands of mutants and running the tests again. Luckily CPU’s have become a lot faster, and PIT has other tricks to speed up the process.
One thing PIT does is that it uses code coverage! Not as a measurement of the quality of your tests but as a tool. Before creating the mutants PIT calculates the code coverage of all unit tests. Now when PIT creates a mutant of a particular line of code it looks at the tests covering that line. If a line is only covered by three unit tests, it only runs those three tests. This greatly decreases the amount of tests needed to run for each mutation.
There are other tricks as well, for example PIT can track the changes in your codebase. It doesn’t need to create mutants if the code isn’t edited.
Code coverage is a horrible way of measuring the quality of your tests. It only says something about the invocations but nothing about the actual assertions. Mutation testing is much better, it gives an accurate report on the quality and you can’t ‘game’ the statistics. The only way to fake mutation coverage is to write real tests with good assertions.
Check it out now: http://pitest.org
Below is a build-log on how to build Commander Keen: Keen Dreams (which was recently released on Github) on OS/X using DOSBox and a shareware version found online.
Download and install DOSBox
Create a new folder, this will be your DOSBox mount-point.
Copy all the contents of the directories DISK1/DISK2/DISK3 from tasm5.zip to \TEMP
Install Borland C++ 3:
Copy all the contents to \BORLANDC.
Copy all the source files from Commander Keen to \KEEN.
Fire up DOSBox, mount the mount folder to C.
Put the following paths to PATH:
Go into C:\TEMP and run the installer to install TASM.
Go into directory C:\KEEN\STATIC and run ‘make.bat’
In the directory C:\KEEN and run ‘BC’ to start Borland C++.
To change the Borland directories to the correct path go to: Options > Directories and change the paths to C:\BORLANDC\*.
Compile and run! It creates the binary for me KDREAMS.EXE.
But sadly, when I run the executable it says “Can’t open KDREAMS.MAP!” :-(
It turns out you’ll need to own the game’s actual content before you can run this source code.
Thankfully a shareware version can be downloaded here: keendm.zip. This corresponds with the 1.01S version of the released source code, which is also released here.
Copy the missing files (SHL, MAP, AUD, etc) from the shareware version and play your own compiled Commander Keen!
Pretty interesting to see, I’ve already worked with him on improving the algorithm to generate these mosaics in the past. But next he set me a challenge: Find your own game thumbnail, it is in there somewhere!
This is the screenshot of my game, used as thumbnail:
So I went though the thumbnails, one time.. and a second time… then I decided to solve it like a real programmer:
How does it work? Well it is pretty simple:
- Input #1: mosaic.jpg
- Input #2: Amount of thumbnails width and height
- Input #3: screenshot.png
- The program resizes my game screenshot to the thumbnail size.
- Next it loops over all sub-images of the mosaic.
- For every sub-thumbnail: Calculate the error (+=Math.abs(mosaicPixelValue - screenshotPixelValue) for each color, for each pixel)
- Store the location of the thumbnail with the smallest error!
That is it, solved it in 10 minutes of coding!
(and another 10 minutes to make the visual conformation and make the animated gif).
Last weekend the 30th Ludum Dare competition took place. For those us you unknown with Ludum Dare, this is a very short international game programming contest. You are allowed to use any tool or language but there are strict rules:
- The theme is revealed at the start (and the game must match this theme).
- You get 48 hours, nothing more or less.
- Every image, sprite, song and/or sound effect in the game should be made within these 48 hour.
- The result is open source (but you pick the license).
- You work alone.
(There is also a ‘Jam’ version where you can work in teams, can do closed source, can use images/sounds and you have 72 hours)
The theme this year was ‘Connected Worlds’. This is a pretty broad term, so I started to think. How about a world where the main character is on one planet, and his love is on another planet. The planets are tantalisingly close (nearly touching) but out of reach. Our hero has to build a rocket to reach his love.
Drawing drawing drawing…
The decision to make a point-n-click game has a huge impact on how I get to spend my time during the contest. This type of game needs a lot of images, sprites and of course fun puzzles. In the end I think I’ve been busy drawing 90% of the time (on paper and in GIMP) and maybe 10% actual programming. After 48 hours my hands were cramped up from all the drawing instead of typing heh (I need a digital draw-tablet).
In the future somebody will inevitably invent a teleport, no doubt about that.
But how will it work?
The most likely way to teleport would be to digitalise yourself. Some yet undiscovered very high resolution MRI/CT scanner will scan every atom in your body and send this over to the receiver. This atom printer will build up your entire body again.
However, during a work lunch discussion, I came up with some scary fundamental problems with teleportation.
What would happen if, during transmission, we get a failure? We don’t want to end up with a failed teleport… which would mean the person getting teleported is dead.
That is why we need to have some kind of two-phase commit. First we digitalise the person, we send this over the line, build the person up on the other side. Once this process has been completed, we ‘delete’ the original copy. Because we don’t want to end up with thousands of clones.
Wait… what? Delete?
What would it feel like stepping into the teleporter? First a copy of you is made, this copy eventually walks out of the receiving end. But what happens to you? You’ll step into a machine, which makes a copy, and disposes you! You’ll be exterminated, killed, pushing up the daisies, your metabolic processes will be history, kicking the bucket, you’ll be an ex-parrot.
Lets not dwell too long on the loss of the old you. Of course YOU are also the one walking out of the teleporter, where nothing has happened but a successful teleport.
But is that really… you?
What defines you?
Are you just a selection of atoms clumped together?
If we make an exact copy, is that still you?
Did you know that (according to some research) every year almost 98% of the atoms currently in your body will be replaced? That would mean that a year from now, you will just be 2%… you!
Conclusion: I hope they won’t invent a teleporter while I’m alive.
The other thing I did while I was in London was volunteer and film at the first UK-based Devoxx4Kids.
Here is my video that sums it all up:
It was awesome being there, watching the kids play and learn at the same time. The volunteers were absolutely amazing (a lot of them!) and the atmosphere was very relaxed.
I’ve been filming again at Devoxx UK, here is the final cut:
Brings back so many great memories.
Including the night in the pub when ‘we’ won against Spain, 5-1
A year ago Devoxx crossed an ocean for the first time. After all the events in Belgium (Antwerp) and France (Paris) a new satellite event was launched in the UK (London). And here I am again, in a London hotel room, after two days of Devoxx UK.
Day 1: Getting started
With a silly one hour jetlag (which shouldn’t be a thing, but is…) I was awake and at the venue very early. Slowly but surely people started flooding in on the exhibition floor. It quickly became clear there were more visitors than last year. On the exhibition floor a lot of smaller local startups were present. But surprisingly I haven’t had any sales pitch at all, this is something other conferences can learn from. Only genuinely interesting people talking about content.
There was also a corner where Devoxx UK had invited cool upcoming hardware like the NFC ring and Crazyflie tiny quadcopter. These projects quickly generated a crowd of people shouting: “Shut up and take my money!” The problem was, they didn’t have anything for sale… just an “you can order online”. This is a bit of a shame, I’m pretty sure they are now missing at least a dozen of impulse purchases. They should do something about this next year.
Day 1: The keynote
The awesome thing about Devoxx is… the lack of sponsored talks and keynotes. At least, that is how it seems to be. I really haven’t seen any session that even had a hint of sponsor.
This year the keynote speaker was Dan North and he talked about some of his personal experiences. He talked about moments where colleagues did or said things that really affected him and his career. Things you say and do at work might have a lot more influence than you can imagine. The best part of his story (in my opinion) was how a colleague tricked him into pair programming. If you keep asking “hey buddy, can you help me with this?” eventually your ‘buddy’ is just sitting next to you all the time.
During the keynote two amazing artists from Smartup Visuals created a drawing of Dan, later they hand customised conference t-shirts for the visitors.
Day 1: The sessions
As ‘Devoxx videographer’ I can’t always just pick a session and settle down. Most of the time I walk around and only settle down where there is a good opportunity. The first session I attended was by Venkat Subramaniam who talked about the new Lambda expressions in Java. He is a great clear speaker, to the point.
After filming some more I settled in room 2 with Dick Wall for the second session. His talk is named: “What have the monads ever done for us?”… and as you can guess it is about lambdas as well. His talk was a bit more theoretical, naming all the different theoretical objects and patterns (like monoids, monads and functors). Good talk, great speaker, best lambda related talk I’ve seen yet.
The third sessions I had the chance to see was by James McGivern. He’s a programmer with a math background and he has very similar interests as I do. His talk was about ECC (Elliptic curve cryptography) versus RSA, explaining how these security algorithms and the math involved work. I absolutely loved his talk, everything was explained so clearly it reminded me of Numberphile and Computerphile (two related YouTube channels I adore).
Instead of following more tracks I got distracted filming the Crazyflie quadcopter and shooting them from the sky using an automatic NERF gun which can only be fired wearing the NFC ring. I also had a nice discussion about the future of affordable hardware and 3d printing with Dick Wall. The thing we did that evening was to attend the IBM Sensor hacking challenge.
In this hands on session we had to form teams of 4-6 people, each team got an Arduino and a set of 30+ sensors which can be attached. The challenge was to build the coolest piece of hardware with this. There was only one rule though: We had to use WebSphere to communicate with the Arduino. This made absolutely no sense at all to me… we ended up having to write JSP pages that send signals to the Arduino to read/write from the sensors. Even worse was the deploy cycle (should not be needed, but was): stop websphere, kill hanging processes, reboot Eclipse (!), start websphere again, deploy new pages.
The winning team was the Crazyflie crew. They took their quadcopter, fired up the Arduino IDE (boo!) and made the firmware in the Arduino language. In the end they could fly the quadcopter with a joystick attached to the Arduino. Clearly this wasn’t according to the only rule we had, but it was just too cool and had to win.
Day 2: Sessions
The second day of Devoxx UK started with a session by two fellow Dutchies, Regina and Linda. They talked about a pattern they found to change things in projects. They all did this in a “Punch and Judy show”-style, which didn’t quite work in my opinion, it was very rusty and read out. Although the underlying message itself is simple and sound, they started calling everything a pattern. It is a good thing to organise brown bag lunch sessions, sure, but please don’t call this a “brown bag pattern”.
The second session of the day was “Is Your Code Parallel-Ready?” by Maurice Naftalin. His session was also about the new lambdas in Java 8. It had a nice buildup and sketched a good problem, the only thing missing was the code of the solution. The solution was described/hinted but I’d love to have seen the actual code there for clarity. The speed of this session was very slow and I didn’t learn anything new, which is a bit of a waste of time.
Next I went to a panel of: Martijn Verburg, Ben Snipe, Stephen Colebourne & Ted Neward. They talked about the patent wars going on now between Google and Oracle. The final conclusion by Stephen was probably the best: Oracle and Google are dumb, dumber and dumbest; neither can stop now and they’ll likely settle.
The title “Modern Web Architecture, 2014 Edition”, by Ted Neward, is a bit misleading. Because 80% of the talk is a big history lesson of the origin of ‘the web’. The final 20% was more about common sense. Don’t get vendor locked in, think about creating a platform with multiple possible entry points, not just a website. Not really what I expected to hear, interesting nonetheless.
The final session was by Arun Gupta and Antonio Goncalves, they quickly went through 50 new features in Java EE 7. For some reason I’m not fond of the way Java EE is going. All the logic is being put in annotations. I predict a new term ‘annotation hell’ which is going to replace the ‘xml hell’ we had a couple of years ago. I’ve been warning about this since 2008, and it is getting worse and worse.
In the final keynote Martijn Verburg summarized the things he learned this Devoxx UK, some of the trends he noticed. There were a lot of lambda talks, maybe a bit too much. A couple of years ago there would have been a lot of alternative language talks (JRuby/Groovy/Scala/etc) but there weren’t a lot of those talks anymore.
Then Dick Wall hit the stage, and actually continued from what we’d been talking about on the first day: cheaper electronics. The Arduino is cool and kind of cheap, so is the Raspberry Pi… but for a real Internet of Things we need devices much smaller and much cheaper. It doesn’t have to do graphics for example! It can probably be a fraction of the cost. Oh and did you know Dick Wall’s dog has a Fitbit? (true story).
Finally Arun Gupta and Audrey Neveu talked about Devoxx4Kids, which is still gaining a lot of popularity all over the world. But we can always use more volunteers and new events!
Building for a Raspberry Pi in an emulator is just as slow as on the actual Pi. There is a slightly faster method involving chroot. But if you really want speed you’ll have to set up a cross compiler environment or try this other cross compiler setup.
Also: Links in the article below seem to be broken and it might not work anymore.
Original (outdated) article:
Today a colleague and I wanted to install gnuradio on a Raspberry Pi. This can than be combined with the amazing RTL-SDR dongle. This dongle this is a DVB-T USB stick, but can be turned into full software defined radio.
More information on that can be found here: http://www.rtl-sdr.com/about-rtl-sdr/.
When trying to compile gnuradio on the RPi (Raspberry Pi) we followed this description. But we quickly ran into a problem, compiling would take 20+ hours!
After running ‘make’ and grabbing a cup of coffee we set ourself a new goal, is it possible to emulate the RPi on our fast Macbook instead?
After following a couple of guides that didn’t work we finally managed to get Qemu up and running, this is what we did:
- Install and upgrade Xcode to 4.3 or above
- Install the latest version of Homebrew
Now we need to modify the Homebrew formula (which downloads and install qemu) to the correct version:
I’m using the osx$ prefix for commands that are executed on your OS X machine, pi$ for commands on the virtual Raspberry Pi.
Use the following file to get the working version 1.7.1 (other versions had SCSI problems): qemu.rb
After setting qemu.rb to the correct version you can install qemu:
Now check if qemu is installed correctly:
Now we need to download two thing:
- Linux kernel
- Raspbian image
To download the linux kernel:
Now we’ve downloaded the latest version of the raspbian image.
In our case: 2014-01-07-wheezy-raspbian.img
Now it is time to start the image in the emulator:
This first boot is a bit special because we only initialize /bin/bash. This is because we need to make two changes to the system:
We need to add a comment to this file:
Comment this line by placing a # in front of the line:
Now create the following file:
And put in the following content: 90-qemu.rules
Now we can stop the emulator and make one final change, the image file is a bit small and we need to increase the size before we continue:
From now on we can do a normal boot (save this command) by removing the “init=/bin/bash” part:
The last thing we need to do to get our virtual Raspberry Pi up and running is:
In this menu, you can “Expand filesystem” to make use of the increased image size (need to reboot afterwards).
Now you are ready to explore the raspberry pi without actually needing one.
Some problems we’ve encountered:
- qemu raspberry pi boot getting stuck in ‘scsi’ loop (fixed by using version 1.7.1)
- Disk size problems, resize didn’t work, expand filesystem didn’t work (fixed expanding and using ln -snf)
If you want to create an object in Java there is only one way: use a constructor.
Constructors in Java are different from all other methods. They don’t have a return type and can’t be invoked, only using the new keyword. Other tasks performed by constructors are:
- Initialisation of class variables
- Call the default contructor of the superclass if no constructor is present
- Initialisation of instance variables
- Execution of constructor body
Here is a common example of a constructor in Java:
If you want to create an immutable object in Java you need to have a constructor to initialise the final fields. Lets look at another example:
Immediately a problem becomes clear… if you read the code you have absolutely no idea what all the arguments mean. What is 200? What is 95.0? We have to look at the API or open the code of Person to see what arguments we can supply or have been supplied. Thankfully there is a design pattern that solves this problem, the Builder pattern.
The builder pattern is an ‘object creation software design pattern’. It can be used to create a immutable objects in a fluent, readable way.
Reading this code is much clearer, we don’t have to guess what the arguments mean, it is crystal clear. So we’ve got a good solution, right? Well, not really, I’ve got a big problem with the Builder pattern. For a simple immutable object as the Person above with just six fields we need the following code:
Woah… that is a stupendous amount of code just for a better readable immutable way of object construction. I don’t have a good solution for this problem, the Java language doesn’t have a good way to construct objects (especially immutable objects) in a readable manner. Constructors end up having a lot of confusing nameless arguments; or you’ll have to write a huge Builder class with a nice fluent interface.
The Java language is constantly evolving and there are now proposals to add ‘value types’ in Java. Reading through the proposal it seems the only way to construct the value type will be using the constructor, but I’m afraid this will quickly become a burden again. I’d love to have a better way to construct objects and (in the future) values, although I have no idea what it should look like. I’d love to have a fluent way of object creation without having to code a big Builder class, preferably in the language itself.
Would it be possible to change to language in a backwards compatible way to allow this?
One possibility would be to ‘steal’ from other languages, for example Moose Perl:
This has the readability advantage and has the flexibility of the Builder pattern (not needing dozens of overloaded constructors).
Would something like this be a good addition to the Java language?
As some readers might know, I sometimes play a programming game called Core War. This afternoon I was browsing through some old ‘Core Warrior’ newsletters which John Metcalf has collected here. When reading an article about an old successful warrior called ‘Thermite II’ I came across this tiny unsolved mystery:
When executing the code of the second warrior “Killing Hazy Shade Of Winter III” I noticed there is a false assumption in the email. The warrior doesn’t need to die in three cycles!
The first round it’ll execute:
LDP.B #0,#0 (and become LDP.B #0,#-1)
Then it executes:
JMP <-1 (to a location two higher than the warrior, which is usually empty code… but not always!)
The final line in Thermite’s code is a JMP instruction to the launch-instruction which loads Thermite’s bomber ‘Brand’. If John K W’s warrior was loaded at a position 100 or 101 from Thermite it would parasite into the bomber and become Brand before Thermite itself does. This is how the small seemingly suicidal warrior can kill Thermite.
After 18 years, Q.E.D. :-)
As input he is using thumbnails from the Ludum Dare game programming contest.
To make such a program you basically need four things:
- A collection of thumbnails/photos/images to place
- Divide the target mosaic image into a grid of tiles
- Have a scoring/measuring method for each tile
- Have a placement algorithm
Will is using all the thumbnails from the contest and a given target image. He then divides the target image into a grid. Next comes the important part, how do you measure how well a thumbnail matches a target tile? He is using a clever per-pixel RGB matching technique. The closer the color matches, the higher the score.
The final (important) step is placement. For each tile he takes the highest scoring thumbnail and assigns it to the tile. If you keep doing this (greedy) you’ll get a good recognisable result:
When he explained his algorithm (in gtalk) it reminded of the problem I encounter in almost all the Al Zimmermann programming competitions. Eventually all these algorithms boil down to search algorithms. You are looking for the best combination of images with the highest overall matching score (correctness/likeness). Instead of using just the greedy algorithm I suggested he’d try randomly swapping a couple of images and checking the score for improvements. This turned out to be a difference of night and day, check it out:
Of course there are still problems with this method, for example: you’ll quickly run into a local optima. Maybe you’ll have a much better image if A -> B and B -> C and C -> A. This will never be reached by swapping two tiles if the individual steps don’t have an improved score. This can be countered by swapping multiple pairs at once and hoping you’ll hit this correct pair.
There are a lot of other ‘smarter’ things you could do, for example always try to put a ‘best match’ on a particular tile and trying to fill the hole it created…. But for now adding this simple random swap is perfect!
Ever since the first time I heard about the Turing Test I’ve wanted to make my own chatbot. It started probably twenty plus years ago, when the only language I could program in was QBASIC. At that time I never got further than:
…. and now? ….
Since that first try (aged 10 or something) I’ve never tried to build another chatbot. But a couple of days ago I read a news article about the Loebner Prize, an annual event that tests the best chatbots in the world against some human judges, and it sparked my interest again.
I started researching the Loebner Prize winners and there seems to be three distinct groups/types of chatbots and algorithms:
- Template based bots
- Crowdsourced bots
- Markov Chain bots
Let me quickly describe how they work.
Template based bots
When somebody talks to the bot it quickly goes through all the templates and finds matches. This particular pattern will for example match:
What is your name?
My name is Chatbot.
These kinds of bots have enormous databases of AIML templates, mostly hand-crafted by skilled bot configurers. Although they work very good, they don’t seem very ‘smart’ and AI to me; so I’m not going to make another AIML-like bot.
One of the best (most human-like) chatbots is Cleverbot. But in my opinion it is one of the most simple bots around. It uses a nice trick and huge database. When it encounters a question it doesn’t know/understand, it just ignores it, and stores it. In another chat session, it’ll mimic and repeat that question to another human. The result is stored in the huge database for future conversations. As a result all the answers it gives are very human-like (because, well, they are human answers).
But there is obviously a huge drawback, for example at one moment the bot is pretending to be a 18 year old male, the next moment it claims to be a 40 year old female. Then it starts talking about how much it loves horses, the next moment it says it hates animals…
Markov Chain bots
To keep this (long) blog post within reasonable length I’m not going to elaborately explain how Markov Chains work. Markov Chain bots they store words in Markov Chains, let’s for example say we store chains of length three.
Now let’s imagine we are building a valid reply to a question and we already have “let’s store”… what can we do next? We go into the chain and walk the nodes until we find the two matching results (1) and (5). So the sentence can continue with “this” or “it”.
Famous bots like MegaHAL kind of work this way (see their explanation). Although this feels much more like real AI/knowledge to me, it also has drawbacks. For example you can’t say these bots are reasoning, they don’t understand the environment/context it is in.
A new attempt
I’ve made a list of what my ideal chatbot should be:
- Learn through conversation/reading text
- Not just repeat, but understand relations and concepts
- Have different scopes, global knowledge, conversational scope
Two days ago these ideas started to take shape in my head and I started writing the first code. The first goal is to make the bot able to read text and extract ‘knowledge’ from it.
The first thing I had to do was to break up the input text into pieces, for this I found a great open source framework called Apache OpenNLP. It recognises words and sentences, it detects verbs, nouns, pronouns, adjectives, adverbs etc.
The next thing I wanted to do was to turn all the nouns and verbs into their ‘base’. When storing relations in the bots memory I want to avoid duplicate entries, so for example “fish are animals” and “fish is an animal” is the same. For that purpose I’m using WordNet® in combination with the Java library JWNL.
Currently this is what the bot sees when given some input:
Understanding what kind of words are used and being able to transform them into the base-form will make it easier to store ‘knowledge’ and make sense of the world in the future.
Instead of learning how to use a real graph database (like Neo4J) I decided to build something myself. Normally this is a horrible choice, but I’m in it for the fun and to learn new skills. Although it is yet another distraction from the actual bot, after a couple of hours I’ve got the following unit test working:
There are already more facts possible to extract from this ‘knowledge database’ than just the plain information I put in.
The next step in making my bot is going to be data extraction. I’m probably going to make a small template language that (might!) look like this:
The template should match sentences like “fish are animals”, “roses are red”, “Roy is a handsome dude” and “Roy is an obvious liar”.
The bot should be able to store all these ‘facts’ and put them in my graph/relation database. With these data extraction templates it should be possible to build a large knowledge base with facts for the bot, for example just by parsing a Wikipedia dump.
Now I’m going to dive back into the code and continue my chatbot adventure, keep an eye out for part 2!
This week I’ve started running/jogging, and I’m using RunKeeper on my iPhone to track my progress.
RunKeeper stores all the GPS data from your runs. This data is displayed per run, including a nice map of your route. Most important data can be reached throught the web interface, things like pace (min/km or min/mile) and distance. The most rewarding thing in running is breaking your own records, and RunKeeper has a couple of records:
- Longest run (distance)
- Longest run (time)
- Most calories burned in a run
- Fastest average pace
As you can see, all those statistics are about single runs, what I’m missing are the following records:
Fastest 1 KM, 3 KM, 5K, 10K etc.
For example, when I’ve run a very fast 5.5 km run, I’d love to see this reflected as my 5K personal record, but right now it is lost because I’ve already done a slow 6 km run and a very fast 1 km sprint.
But luckily RunKeeper has a very useful option for us developers: Settings > Export data.
This results in a ZIP file with GPX files, raw GPS data with time and location!
The first thing I did was download the XSD and generate the JAXB Java code:
Now I can open the GPX files in Java and extract the GPS locations and time like this:
The next thing I did was translate the lat+long+time waypoints into ‘legs’ consisting of meters+seconds.
Those legs can be used to check for a record during longer runs, like this:
For example using a target of 5000 meters in a 5070 meter run, this analysis finds the following 5K times:
The information from RunKeeper website is:
- Distance: 5.07 km
- Time: 30:38
- Pace: 6:03
But when analysing the data more accurately, it could have said:
- New personal record 3K: 00:16:46
- New personal record 5K: 00:29:55
I couldn’t believe this feature wasn’t available in RunKeeper… but after a lot of Googling it turns out a lot of other people are looking for this ‘most requested’ feature! With a little bit of Java (100 lines of code) you can get a pretty good result: