Creating a chatterbot (Part 1)
Ever since the first time I heard about the Turing Test I’ve wanted to make my own chatbot. It started probably twenty plus years ago, when the only language I could program in was QBASIC. At that time I never got further than:
…. and now? ….
Since that first try (aged 10 or something) I’ve never tried to build another chatbot. But a couple of days ago I read a news article about the Loebner Prize, an annual event that tests the best chatbots in the world against some human judges, and it sparked my interest again.
I started researching the Loebner Prize winners and there seems to be three distinct groups/types of chatbots and algorithms:
- Template based bots
- Crowdsourced bots
- Markov Chain bots
Let me quickly describe how they work.
Template based bots
When somebody talks to the bot it quickly goes through all the templates and finds matches. This particular pattern will for example match:
What is your name?
My name is Chatbot.
These kinds of bots have enormous databases of AIML templates, mostly hand-crafted by skilled bot configurers. Although they work very good, they don’t seem very ‘smart’ and AI to me; so I’m not going to make another AIML-like bot.
One of the best (most human-like) chatbots is Cleverbot. But in my opinion it is one of the most simple bots around. It uses a nice trick and huge database. When it encounters a question it doesn’t know/understand, it just ignores it, and stores it. In another chat session, it’ll mimic and repeat that question to another human. The result is stored in the huge database for future conversations. As a result all the answers it gives are very human-like (because, well, they are human answers).
But there is obviously a huge drawback, for example at one moment the bot is pretending to be a 18 year old male, the next moment it claims to be a 40 year old female. Then it starts talking about how much it loves horses, the next moment it says it hates animals…
Markov Chain bots
To keep this (long) blog post within reasonable length I’m not going to elaborately explain how Markov Chains work. Markov Chain bots they store words in Markov Chains, let’s for example say we store chains of length three.
Now let’s imagine we are building a valid reply to a question and we already have “let’s store”… what can we do next? We go into the chain and walk the nodes until we find the two matching results (1) and (5). So the sentence can continue with “this” or “it”.
Famous bots like MegaHAL kind of work this way (see their explanation). Although this feels much more like real AI/knowledge to me, it also has drawbacks. For example you can’t say these bots are reasoning, they don’t understand the environment/context it is in.
A new attempt
I’ve made a list of what my ideal chatbot should be:
- Learn through conversation/reading text
- Not just repeat, but understand relations and concepts
- Have different scopes, global knowledge, conversational scope
Two days ago these ideas started to take shape in my head and I started writing the first code. The first goal is to make the bot able to read text and extract ‘knowledge’ from it.
The first thing I had to do was to break up the input text into pieces, for this I found a great open source framework called Apache OpenNLP. It recognises words and sentences, it detects verbs, nouns, pronouns, adjectives, adverbs etc.
The next thing I wanted to do was to turn all the nouns and verbs into their ‘base’. When storing relations in the bots memory I want to avoid duplicate entries, so for example “fish are animals” and “fish is an animal” is the same. For that purpose I’m using WordNet® in combination with the Java library JWNL.
Currently this is what the bot sees when given some input:
Understanding what kind of words are used and being able to transform them into the base-form will make it easier to store ‘knowledge’ and make sense of the world in the future.
Instead of learning how to use a real graph database (like Neo4J) I decided to build something myself. Normally this is a horrible choice, but I’m in it for the fun and to learn new skills. Although it is yet another distraction from the actual bot, after a couple of hours I’ve got the following unit test working:
There are already more facts possible to extract from this ‘knowledge database’ than just the plain information I put in.
The next step in making my bot is going to be data extraction. I’m probably going to make a small template language that (might!) look like this:
The template should match sentences like “fish are animals”, “roses are red”, “Roy is a handsome dude” and “Roy is an obvious liar”.
The bot should be able to store all these ‘facts’ and put them in my graph/relation database. With these data extraction templates it should be possible to build a large knowledge base with facts for the bot, for example just by parsing a Wikipedia dump.
Now I’m going to dive back into the code and continue my chatbot adventure, keep an eye out for part 2!