General chatbot architecture
The first thing to understand is how a chatbot works internally. Basically, given a user input, a chatbot returns a response. The principle is simple, but in practice things are not so easy.
Understanding what the user says
Assume that you are dealing with a travel chatbot and you ask the following:
I want to fly to Venice, Italy from Paris, France, on January 31
First, the chatbot needs to understand the input. There are two main techniques to achieve this: pattern matching and intent classification.
A pattern matching approach needs a list of possible input patterns. The input above could match a pattern such as:
I want to fly to <CITY> from <CITY> on <DATE>
The advantage of this approach is that the patterns are human-readable, so modeling the inputs can be fairly straightforward. The drawback is that the patterns are built manually: this is not a trivial task and it does not scale in many real use cases.
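As a minimal sketch of the pattern above, the placeholders can be approximated with a regular expression. The group names (destination, origin, date) and the function name are our own assumptions for illustration, and a real system would need many such patterns.

```python
import re

# Rough equivalent of "I want to fly to <CITY> from <CITY> on <DATE>".
# The placeholders are approximated with lazy wildcards (an assumption:
# no validation that the captured text really is a city or a date).
FLIGHT_PATTERN = re.compile(
    r"^I want to fly to (?P<destination>.+?) from (?P<origin>.+?),? on (?P<date>.+)$",
    re.IGNORECASE,
)


def match_flight_request(text):
    """Return the captured fields as a dict, or None if the pattern does not match."""
    match = FLIGHT_PATTERN.match(text.strip())
    return match.groupdict() if match else None


slots = match_flight_request(
    "I want to fly to Venice, Italy from Paris, France, on January 31"
)
print(slots)
```

Any input that deviates from the wording, such as "Book me a flight to Venice", returns None, which illustrates why hand-written patterns are hard to scale.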
An intent classification approach relies upon machine learning techniques. You need a set of examples to train a classifier that will choose, given a user input, among all the possible intents (e.g. buy a ticket, check flight status, get specific information, etc.).
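To make the idea concrete, here is a small sketch of an intent classifier: a naive Bayes model over word counts, written with the standard library only. The training phrases and intent names are invented for illustration, and a production system would use far more examples and a proper machine learning library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training examples: (user input, intent).
TRAINING_EXAMPLES = [
    ("I want to buy a ticket to Rome", "buy_ticket"),
    ("book a flight to Madrid", "buy_ticket"),
    ("I need a ticket for tomorrow", "buy_ticket"),
    ("is my flight on time", "check_status"),
    ("what is the status of flight 123", "check_status"),
    ("has the flight been delayed", "check_status"),
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count words per intent and examples per intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, intent_counts, vocabulary


def classify(text, model):
    """Return the intent with the highest log-probability (add-one smoothing)."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, n_examples in intent_counts.items():
        counts = word_counts[intent]
        total_words = sum(counts.values())
        score = math.log(n_examples / total_examples)
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING_EXAMPLES)
print(classify("I would like to book a ticket", model))
print(classify("is the flight delayed", model))
```

Unlike a pattern, the classifier can handle wordings it has never seen, as long as they share vocabulary with the training examples.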
In any case, for the example above the notions of city and date are crucial to understanding the input and returning an appropriate answer. The chatbot is probably going to search a database (or run an online query) to look up tickets from Paris to Venice on the given date. Thus, the chatbot first needs to perform information extraction on the input to extract the important entities: locations, airlines, airports, dates, etc.
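A very simple form of information extraction can be sketched with a list of known cities (a gazetteer) and a date pattern. The city list and the date format below are assumptions made for this example; real extractors rely on much larger resources or trained models.

```python
import re

# Hypothetical gazetteer: a real one would contain thousands of entries.
KNOWN_CITIES = {"venice", "paris", "boston"}

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Only matches dates written as "<Month> <day>" (an assumption for this sketch).
DATE_PATTERN = re.compile(rf"\b({MONTHS}) (\d{{1,2}})\b")


def extract_entities(text):
    """Return the cities and the first date found in the text."""
    words = re.findall(r"[A-Za-z]+", text)
    cities = [word for word in words if word.lower() in KNOWN_CITIES]
    date = DATE_PATTERN.search(text)
    return {"cities": cities, "date": date.group(0) if date else None}


entities = extract_entities(
    "I want to fly to Venice, Italy from Paris, France, on January 31"
)
print(entities)
```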
Classifying the input and extracting information from it are two key concepts that you have to keep in mind.
Responding to the user
Once the chatbot understands what the user says, it can choose or generate a response, based on the current input and the context of the conversation.
Static responses
The simplest way is to have a static response, possibly with a list of variants, for each user input. These static responses could be templates, such as “The flight time is <ft> hours”, where <ft> is a variable computed on the fly by the chatbot.
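A template with variants could be sketched as follows. We use Python's string.Template, so the variable is written $ft rather than <ft>; the second variant and the function name are our own additions for illustration.

```python
import random
from string import Template

# One list of variants per kind of response (the variants are illustrative).
RESPONSES = {
    "flight_time": [
        Template("The flight time is $ft hours."),
        Template("Your flight will take about $ft hours."),
    ],
}


def respond(kind, rng=random, **variables):
    """Pick one variant at random and fill in its variables."""
    template = rng.choice(RESPONSES[kind])
    return template.substitute(**variables)


print(respond("flight_time", ft=2))
```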
Dynamic responses
A different approach would be to use resources, such as a knowledge base, to get a list of potential responses, and then score them to choose the best one. This is particularly appropriate if your chatbot acts mainly as a question-answering system.
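The scoring step can be illustrated with a toy knowledge base and a word-overlap score (Jaccard similarity). The questions, answers and scoring function below are assumptions for this sketch; real systems typically use more robust retrieval and ranking methods.

```python
import re

# Hypothetical knowledge base of (question, answer) pairs.
KNOWLEDGE_BASE = [
    ("How many bags can I bring?", "You can bring one carry-on bag and one checked bag."),
    ("How do I change my flight?", "You can change your flight from the booking page."),
    ("Can I travel with my pet?", "Small pets are allowed in the cabin on some flights."),
]


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def best_response(user_input):
    """Return the answer whose question shares the most words with the input."""
    query = tokens(user_input)

    def score(entry):
        candidate = tokens(entry[0])
        return len(query & candidate) / len(query | candidate)

    _question, answer = max(KNOWLEDGE_BASE, key=score)
    return answer


print(best_response("How many bags am I allowed to bring?"))
```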
Generated responses
If you have a huge corpus of examples of conversations, you could use a deep learning technique to train a generative model that, given an input, will generate the answer. You will need millions of examples to reach a decent quality and sometimes the results are going to be unexpected, but it could be interesting and fun to test the approach and see what happens. This is an ongoing research subject, extremely promising and exciting.
Do not forget the context of the conversation
The current input alone is often not enough to give a correct answer to the user. To model and implement the logic of the chatbot, the notion of context is very important. For example, if the user types the following input:
How many bags can I bring with me?
The chatbot can only answer the question if it knows the details of the ticket. Typically, this information was previously stored in the context of the conversation. Of course, each chatbot has to model its own notion of context and decide the information that is important to remember.
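One possible way to model the context is a small key-value store attached to each conversation. The class, the fare classes and the baggage numbers below are hypothetical and only serve to show how the bag question depends on previously stored information.

```python
class ConversationContext:
    """Minimal per-conversation memory (a sketch, not a full dialogue state)."""

    def __init__(self):
        self._facts = {}

    def remember(self, key, value):
        self._facts[key] = value

    def recall(self, key, default=None):
        return self._facts.get(key, default)


# Hypothetical baggage rules, invented for the example.
BAGGAGE_ALLOWANCE = {"economy": 1, "business": 2}


def answer_bag_question(context):
    """Answer 'How many bags can I bring with me?' using the stored ticket details."""
    fare_class = context.recall("fare_class")
    if fare_class is None:
        return "Which ticket are you asking about?"
    return f"You can bring {BAGGAGE_ALLOWANCE[fare_class]} checked bag(s)."


context = ConversationContext()
print(answer_bag_question(context))   # no ticket known yet
context.remember("fare_class", "business")
print(answer_bag_question(context))   # the context now holds the ticket details
```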
Existing platforms
Before you can choose a platform, you must know what kind of chatbot you are trying to build: goal-oriented, conversational, or goal-oriented with strong conversational abilities.
A goal-oriented or transactional chatbot is the most frequent kind of chatbot for business. It helps users achieve tasks such as buying a ticket, ordering food or getting specific information.
A conversational chatbot is focused on having a conversation with the user. It does not need to deeply understand what the user says and does not have to remember all the context of the conversation; it just needs to emulate a conversation. What are conversational chatbots useful for? Well, entertainment could be one reason, but you could, for example, create a chatbot that replaces a classic FAQ and offers a more dynamic experience to the users.
Having clarified this, we can distinguish three families among the existing platforms:
- No programming platforms.
- Conversation-oriented platforms.
- Platforms backed by tech giants.
This is not a formal taxonomy but rather a way of grouping or categorizing the platforms.
No programming platforms
These platforms are aimed at non-technical users. It is usually easy to build a chatbot without programming skills and without machine learning or natural language processing expertise. The key idea is that the user does not have to worry about the technical details.
There is a plethora of no programming platforms and it would be impossible to list them all here. At Tryolabs we have tested some of them to get a taste of their pros and cons: Chatfuel, ManyChat, Octane Ai, Massively and Motion.ai.
The first thing to say is that they are all task-oriented, the most common example presented being “order a pizza”. We found that, even if at first glance they seem very similar, there are important differences in maturity, GUI usability and natural language processing power.
Pros
- You can develop a chatbot very quickly.
- They have a low learning curve.
- They are ideal for simple bots.
Cons
- There are a lot of platforms, with different levels of maturity and stability.
- Sometimes the GUIs are not so easy to understand and when the chatbot logic gets more complex, it becomes hard to handle.
- They have little or no natural language processing capabilities. For example, some platforms cannot perform information extraction. Therefore, given a phrase such as “I’m in Boston” they cannot extract the fact that the city of Boston (location entity) occurs.
- They do not seem appropriate for complex bots.
Conclusion
From our point of view, the no programming platforms lack the power needed for large scale commercial projects. The conversations cannot be very complex and it is usually not possible to integrate external resources, such as specific NLP and ML components.
However, they are really good platforms for small scale projects, typically to quickly add a chatbot functionality to a Facebook page, for example. So you may want to give them a try and see what they can do for you.
Conversational platforms
The main goal here is to allow the user to have a conversation with the bot, without considering a task-oriented scenario. These platforms typically use specification languages such as AIML (Artificial Intelligence Markup Language) to model the interactions with the user. The example below shows how to code interactions with AIML.
When the user says “my dog’s name is Max”, the chatbot recognizes that pattern and extracts the dog’s name. It must be noted that this extraction by text match is very simple compared with the power of NLP information extraction. The chatbot is going to respond with “That is interesting that you have a dog named Max”. Later, if the user asks for the dog’s name, the chatbot will be able to respond “Your dog’s name is Max”.
The best-known example of this kind of platform is Pandorabots.
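The mechanism described here, matching a pattern, storing the captured value and reusing it later, can be emulated in a few lines of Python. This is only a sketch of the idea and not AIML itself; the fallback replies and the function name are our own assumptions.

```python
import re

# Values remembered during the conversation (similar in spirit to AIML predicates).
predicates = {}


def reply(user_input):
    """Match the input against two hand-written patterns and answer accordingly."""
    text = user_input.strip()
    learned = re.fullmatch(r"my dog[’']s name is (\w+)\.?", text, re.IGNORECASE)
    if learned:
        name = learned.group(1)
        predicates["dog_name"] = name
        return f"That is interesting that you have a dog named {name}"
    if re.fullmatch(r"what is my dog[’']s name\??", text, re.IGNORECASE):
        if "dog_name" in predicates:
            name = predicates["dog_name"]
            return f"Your dog's name is {name}"
        return "I do not know your dog's name yet"
    return "Tell me more"


print(reply("My dog's name is Max"))
print(reply("What is my dog's name?"))
```

Note that the extraction is a plain text match: an input such as "Max is the name of my dog" would not be recognized.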
Pros
- AIML is a standard.
- It is very flexible to create conversations.
Cons
- It could be difficult to scale if patterns are manually built.
- The information extraction capabilities are limited.
- They are not really appropriate for task oriented bots.
Conclusion
You would not use these platforms to build a chatbot for ordering food or buying tickets, but you could find that they are very interesting to quickly model an entertainment chatbot or, for example, a chatbot that replaces a FAQ and gives a better user experience.
Platforms backed by tech giants
These platforms are developed by tech giants and, to some extent, they already represent a standard or are at least on their way to becoming one.
They try to have a low learning curve and, at the same time, a strong expressive power.
For various reasons, at Tryolabs we have focused on Api.ai and Wit.ai. Our impression was that LUIS and Watson offer a somewhat more complex (and possibly more powerful) framework than what we needed. As for Amazon Lex, we did not get access to the Limited Preview at the time of writing this post.
We are not going to compare Api.ai and Wit.ai exhaustively, or go into each platform in depth, but rather share feedback from our experience. When you model a chatbot, you understand immediately that one of the hardest parts, if not the hardest of all, is modeling the conversation flow, since this is what basically defines the behavior of the chatbot. Let’s see how Api.ai and Wit.ai deal with this crucial aspect.
Api.ai
Chatbot behavior
Intents and Contexts are the key concepts for modeling the behavior of a chatbot with Api.ai. Intents create links between what a user says and what action should be taken by the bot. Contexts are string values, useful for differentiating requests which might have different meanings depending on previous requests.
Basically, when Api.ai receives a user request, it is first classified to determine if it matches a known intent. Api.ai provides a “Default Fallback intent” to deal with requests that do not match any user intent.
You can restrict the matching of an intent by specifying a list of contexts that must be active. At the same time, the matching of an intent can create and delete contexts.
Take the “order a pizza” example: when the user says “I would like to order a large pizza”, this request matches the intent named order, which could create a context named ordering. When the user has indicated the pizza type, size, etc., you could create a context named pizza_selected (and keep the ordering context alive). Then later, if the user says “What is the delivery time?” the bot could match an intent named get_order_info only if the context named pizza_selected exists.
This mechanism of intents and contexts allows you to create state machines that model large and complex flows. However, you cannot specify that an intent can be matched only if a certain context is not present. This is a current limitation of Api.ai and we think it is likely that they are going to work on this issue.
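The state machine formed by intents and contexts can be sketched in plain Python. This is not Api.ai code: keyword matching stands in for the platform's intent classification, and the select_pizza intent and its keywords are assumptions added so that the flow is complete.

```python
import re
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    keywords: frozenset                      # stand-in for real intent classification
    input_contexts: frozenset = frozenset()  # contexts that must be active to match
    output_contexts: frozenset = frozenset() # contexts created when the intent matches


INTENTS = [
    Intent("order", frozenset({"order", "pizza"}),
           output_contexts=frozenset({"ordering"})),
    Intent("select_pizza", frozenset({"large"}),
           input_contexts=frozenset({"ordering"}),
           output_contexts=frozenset({"pizza_selected"})),
    Intent("get_order_info", frozenset({"delivery", "time"}),
           input_contexts=frozenset({"pizza_selected"})),
]


def match_intent(text, active_contexts):
    """Return the first intent whose keywords and required contexts are satisfied."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for intent in INTENTS:
        if intent.keywords <= words and intent.input_contexts <= active_contexts:
            active_contexts.update(intent.output_contexts)
            return intent.name
    return "fallback"


contexts = set()
for request in ["What is the delivery time?",
                "I would like to order a pizza",
                "A large margherita",
                "What is the delivery time?"]:
    print(request, "->", match_intent(request, contexts))
```

The same question about the delivery time falls back at the start of the conversation and matches get_order_info once the pizza_selected context exists.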
Entities
You can define your own entities and use those proposed by the platform. In the “order a pizza” example above, the type and the size of a pizza are user defined entities, while the address and the quantities are system entities.
Slot-filling capabilities
This is a key feature of Api.ai that brings both flexibility and power. Slot-filling allows you to indicate, for a given intent, which fields play a role and whether they are mandatory or not.
This is great because you do not have to deal with missing information: it is handled on the Api.ai side. In the “order a pizza” example, Api.ai will ask for each mandatory field until they are all filled in by the user: pizza type and size, address and time of delivery. The field “number” may be part of the intent but it is not mandatory.
Server side coding
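The slot-filling loop can be pictured as follows. This sketch only imitates the behavior described above; the slot names and prompts are assumptions, and on Api.ai this logic runs on the platform side rather than in your code.

```python
# (slot name, mandatory?, prompt used when the slot is missing)
SLOTS = [
    ("pizza_type", True, "What type of pizza would you like?"),
    ("size", True, "What size?"),
    ("address", True, "Where should we deliver it?"),
    ("time", True, "When should we deliver it?"),
    ("number", False, None),  # optional: never asked for explicitly
]


def next_prompt(filled_slots):
    """Return the prompt for the first missing mandatory slot, or None if complete."""
    for name, mandatory, prompt in SLOTS:
        if mandatory and name not in filled_slots:
            return prompt
    return None


print(next_prompt({"pizza_type": "margherita"}))
print(next_prompt({"pizza_type": "margherita", "size": "large",
                   "address": "1 Main St", "time": "8pm"}))
```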
Of course, to define the full logic of your chatbot you will need to add some custom code on the server side. Api.ai offers a webhook integration that makes the process very simple. Basically, Api.ai passes information from a matched intent to a web service and gets a result from it. A very useful feature is that the result sent back to Api.ai can change the contexts and the chatbot response, at both the text and voice level, so you can not only implement server side logic but also modify, to some extent, the chatbot side logic. You can decide which intents are going to call the webhook and whether the webhook is going to be called during slot-filling. This combination is a powerful and flexible tool to customize your chatbot behavior.
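The core of such a webhook is a function that receives the matched intent and returns a response together with updated contexts. The sketch below shows that shape only: the JSON field names (intent, parameters, contexts, speech) are hypothetical and do not reproduce the actual Api.ai request and response schema, which you should take from the platform documentation.

```python
import json


def handle_webhook(request_body):
    """Process one webhook call and return the JSON reply as a string.

    The payload layout is an assumption made for this sketch.
    """
    request = json.loads(request_body)
    parameters = request.get("parameters", {})
    contexts = set(request.get("contexts", []))

    if request.get("intent") == "order" and parameters.get("size"):
        # Server side logic can change the contexts as well as the reply text.
        contexts.add("pizza_selected")
        speech = f"Got it, one {parameters['size']} pizza."
    else:
        speech = "Sorry, I did not get that."

    return json.dumps({"speech": speech, "contexts": sorted(contexts)})


body = json.dumps(
    {"intent": "order", "parameters": {"size": "large"}, "contexts": ["ordering"]}
)
print(handle_webhook(body))
```

In a real deployment this function would sit behind an HTTP endpoint that the platform calls for the intents you select.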
Pros
- Api.ai proposes a powerful way of modeling large and complex flows using Intents and Contexts.
- Slot-filling is an integrated feature. Consequently, a good part of the logic can be solved by the chatbot, which decreases the server side coding.
- Domains are available, that is, ready-made specifications that can deal with several common use cases and applications (e.g. small talk, wisdom, flight schedules, reminders…).
- A “Training” section (in beta) is provided to train the chatbot with examples.
- One-click integration with several platforms: Facebook Messenger, Slack, Twitter, Telegram…
Cons
- It is impossible to block the matching of an intent if a context is present.
- The training section is still in beta.
Wit.ai
Chatbot behavior
Stories are the key concept for modeling the behavior of a chatbot with Wit.ai. Each story represents an example of a possible conversation. It should be noted that “intent” is no longer a dedicated concept but an optional user entity. This was a major change in Wit.ai, motivated by the fact that a complex chatbot needs a lot of intents that can, in some way, be grouped in stories.
Bot developers basically teach Wit.ai by example. The underlying idea is that when a user writes “similar” requests, Wit.ai will process the request, extract the entities and apply the logic defined by the developer.
A story can be seen as a graph of user intents. You can add branches that are triggered by conditions, such as whether specific variable values extracted from the user input exist or not. This allows you to define a conversation flow. Moreover, you have a bookmark mechanism, used to jump between intents and also between stories.
To interact with the server side, you have “Bot sends” commands, which are basically calls to functions. A very interesting point is that you can set the role of the entities in a phrase. For example, in “I want to fly to Venice, Italy from Paris, France, on January 31”, you can state that the first city is the destination and the second one the departure.
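To show why roles matter on the server side, the sketch below assigns a role to each city based on the preposition that precedes it: in the example sentence, the city after “to” is the destination and the city after “from” is the departure. This is our own simplified stand-in and not how Wit.ai works internally, where roles are learned from annotated examples.

```python
import re

ROLE_BY_PREPOSITION = {"to": "destination", "from": "departure"}


def assign_roles(text, cities):
    """Map each role to the city that follows the corresponding preposition."""
    roles = {}
    for city in cities:
        match = re.search(rf"\b(to|from)\s+{re.escape(city)}\b", text, re.IGNORECASE)
        if match:
            roles[ROLE_BY_PREPOSITION[match.group(1).lower()]] = city
    return roles


sentence = "I want to fly to Venice, Italy from Paris, France, on January 31"
print(assign_roles(sentence, ["Venice", "Paris"]))
```

With roles attached, the server receives a destination and a departure directly instead of two undifferentiated locations.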
Entities
Wit.ai lets you define your own entities or use the predefined entities.
Server side coding
Wit.ai offers a webhook integration: it passes information for each “Bot sends” command to a web service and gets a result from it. On the server side you are typically going to create or expand the context of the conversation. The result sent back to Wit.ai can add, modify and delete context variables used on the chatbot side.
Pros
- The concept of story is powerful.
- Wit.ai allows controlling the conversation flow using branches and also conditions on actions (e.g. show this message only if some specific variables are defined).
- Assigning roles to entities helps server side processing.
- An “Understanding” section is provided to train the chatbot with examples.
- An “Inbox” exists, where the requests that could not be processed by the chatbot are listed, so the developers can teach the bot.
Cons
- Stories are in beta.
- Even if stories are a powerful concept, there are cases where it is difficult to control the flow of the conversation and the bot tends to misunderstand the user requests.
Current limitations: improving with NLP and ML
As we have seen, to model a chatbot we need to provide the logic and the linguistic resources, mainly the input and output phrases and the entities. This is particularly true for Api.ai and Wit.ai. For small chatbots this should not be a problem, but if you are planning to deal with a large vocabulary and a lot of phrase variants, you should consider using NLP and ML. Below are a few examples where they could be useful.
Singular and plural forms
If you want your chatbot to extract “pizzas” as an entity, it is not enough to define “pizza”: you need to provide “pizzas” as well.
Api.ai has a feature called “automatic expansion” and Wit.ai has “free-text” entities. They are mechanisms that will try to catch new items, based on word context. So if you have trained your chatbot with phrases such as “I’d like to order a pizza”, it is likely that it will understand that in “I’d like to order 3 pizzas” the word “pizzas” is an entity. But the accuracy of this feature will depend on the training and you cannot be sure about how much noise it is going to bring.
A more reliable alternative is to provide, for each concept, both singular and plural forms. You can generate them using NLP tools called inflectors.
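A toy inflector for English nouns could look like this. The rules and the list of irregular forms are deliberately incomplete and are assumptions for this sketch; dedicated inflection libraries cover many more cases.

```python
import re

# A few irregular nouns (illustrative, far from complete).
IRREGULAR_PLURALS = {"person": "people", "child": "children"}


def pluralize(noun):
    """Return the plural of an English noun using a handful of spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"


def expand_entity(values):
    """Return every value together with its plural, ready to be given to the platform."""
    return sorted({form for value in values for form in (value, pluralize(value))})


print(expand_entity(["pizza", "sandwich", "anchovy"]))
```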
Synonyms, hypernyms and hyponyms
Let’s suppose that the user asks for a soda, but your chatbot only knows specific terms such as Coca-Cola or Pepsi, which are hyponyms of soda. Hypernyms, synonyms and hyponyms can be handled in English because there are a lot of NLP resources, called thesauri and ontologies, but they usually cover general language. Therefore, Coca-Cola, a very domain-specific term, is unlikely to be part of this kind of resource.
You could try to find an existing thesaurus that fits your problem or build your own. Resources built by domain experts are expensive but highly accurate. With Machine Learning, particularly Deep Learning techniques, you can create linguistic resources that could be good enough for your use case.
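A small domain thesaurus can be as simple as a mapping from each term to its hypernym. The entries below are invented for the soda example; the point is only to show how a general request can be expanded into the specific terms the chatbot knows.

```python
# Hypothetical domain thesaurus: term -> hypernym (its more general term).
HYPERNYMS = {
    "coca-cola": "soda",
    "pepsi": "soda",
    "soda": "drink",
    "water": "drink",
}


def hyponyms_of(term):
    """Return all known terms that are, directly or indirectly, kinds of `term`."""
    result = set()
    for child, parent in HYPERNYMS.items():
        if parent == term:
            result.add(child)
            result |= hyponyms_of(child)
    return result


# A user asking for "a soda" can be offered the specific products the bot knows.
print(sorted(hyponyms_of("soda")))
```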
Sentiment analysis
Do you want to add some level of emotional reactivity to your chatbot? Well, you could try to perform sentiment analysis on the server side to adapt the responses accordingly.
However, it might not be an easy task if you are using Api.ai or Wit.ai. If you want a very flexible and rich chatbot, you should probably consider developing the chatbot from scratch.
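As a rough idea of what server side sentiment analysis could look like, here is a lexicon-based sketch. The word lists and the apology prefix are assumptions for illustration; a real system would use a trained sentiment model rather than counting words.

```python
import re

# Tiny hand-made lexicons (illustrative only).
POSITIVE = {"great", "thanks", "love", "perfect"}
NEGATIVE = {"terrible", "late", "angry", "awful", "wrong"}


def sentiment(text):
    """Return a score: positive words count +1, negative words count -1."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def adapt(response, user_input):
    """Prefix the response with an apology when the user sounds unhappy."""
    if sentiment(user_input) < 0:
        return "I am sorry about that. " + response
    return response


print(adapt("Your order arrives in 10 minutes.", "My pizza is late and the order is wrong"))
```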
Conclusions / Final thoughts
Clearly, chatbots are a rising trend and at Tryolabs we are witnessing a fast growing demand for them. If done right, this channel of communication with your users can increase engagement, give a better experience and also save costs. However, getting them right is not trivial.
Currently, there is a plethora of platforms that can assist you when creating a chatbot. Some of these platforms have been built with different use cases in mind, so depending on the business case addressed by your chatbot, some platforms may be more appropriate than others. With the aim of helping you pick the best tool, we have reviewed some strengths and weaknesses of the existing services for building chatbots.
If you are planning to build a complex chatbot, you should seriously consider stability, scalability and flexibility aspects. If you don’t pay enough attention to the intricacies of human language, a conversation can quickly go off the rails. You may be required either to build your own solution from scratch or to use a combination of a tool for solving general NLP problems (e.g. Api.ai) plus custom server side logic for more powerful features.
All in all, the chatbot ecosystem is moving very fast and new features are being released every day by the numerous existing platforms. As of today, it is clear that when trying to build an ambitious chatbot, one that is able to handle complex conversations and take actions (e.g. payments), one cannot rely 100% on the platforms and custom NLP development is needed. Recent advancements in Deep Learning techniques may come to be of great help in the near future, and we are very much looking forward to that.
Do you have any questions/comments related to building a chatbot? Please share them with us in the comments.
Originally posted at tryolabs.com/