Human and machine communication towards conversational analytics
In recent years, smart speakers like Amazon Alexa, Google Home, and Apple HomePod have increased their market share, and forecasts suggest their popularity will keep growing. Many people have one or more of these speakers at home and use them for all sorts of tasks, from simple ones such as setting an alarm to more demanding ones such as playing voice-only games. These speakers are also used in professional and commercial environments.
The great thing about smart speakers is that they are deeply customizable thanks to their SDKs, which open up new ways to interact with them.
And that is precisely the topic of this blog post: making an open-source time-series database (NSDb) smart using Amazon Alexa! But what does smart mean? Well, we can query the database using natural language and get reliable answers back, also in natural language. In short, we are enabling a new way to interact with a database, without writing queries in a query language and without having to read the results in tabular form.
We said that the smart speaker will tell us the results in natural language, but how can we do that? We use Natural Language Generation (NLG), in particular an open-source library called RosaeNLG, which generates natural language (in various languages such as English, Italian, French, German, …) starting from the data returned by the database.
Alexa, what is your role in all of this?
In this architecture, Alexa (and smart speakers in general) plays the role of both input and output: the conversation starts when the user talks to Alexa and ends with the speaker's spoken response.
To make this possible, we need to carefully craft a Skill, a sort of app that lets developers add custom functionality to Alexa. The Skill contains the code that teaches Alexa how to deal with our requests and how to produce answers.
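To give an idea of what a Skill backend receives and returns, here is a minimal sketch of a handler that works directly on the JSON envelope Alexa sends for a request. The intent name `QueryMetricIntent` and the slot names `metric` and `period` are hypothetical, and a real skill would more likely use the Alexa Skills Kit SDK than raw dictionaries.

```python
from __future__ import annotations

import json
from typing import Any


def build_speech_response(text: str, end_session: bool = True) -> dict[str, Any]:
    """Wrap a sentence in the JSON envelope that Alexa reads back to the user."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict[str, Any]) -> dict[str, Any]:
    """Route an incoming Alexa request to the right piece of logic."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # Keep the session open so the user can ask a question
        return build_speech_response("What would you like to know?", end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "QueryMetricIntent":  # hypothetical intent name
            slots = intent.get("slots", {})
            metric = slots.get("metric", {}).get("value")
            period = slots.get("period", {}).get("value")
            # Placeholder answer: the real skill would query the database here
            return build_speech_response(f"You asked about {metric} in the {period}.")
    return build_speech_response("Sorry, I did not understand the request.")


event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "QueryMetricIntent",
            "slots": {
                "metric": {"name": "metric", "value": "listened songs"},
                "period": {"name": "period", "value": "last day"},
            },
        },
    }
}
print(json.dumps(handle_request(event), indent=2))
```

The handler only echoes the slot values; the point is the shape of the conversation: slots come in, a spoken sentence goes out.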
Modularity is the keyword
We are talking about Alexa, NSDb, and RosaeNLG, but what if we want to change something? The proposed architecture was designed with modularity in mind: even though it was built with these three tools, individual components can be replaced with little effort, which allows us to:
- change the smart speaker: we can create a skill for Google Home or other speakers,
- change the database: we can use different databases like MySQL or PostgreSQL,
- change the NLG part: for example, to use neural networks or another NLG tool,
- add more languages.
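One way to express this kind of modularity in code is to define each role as an interface and inject the concrete implementation. The sketch below uses Python's `typing.Protocol`; all class names (`Database`, `Verbalizer`, `InMemoryDatabase`, `EnglishVerbalizer`, `ItalianVerbalizer`) are hypothetical stand-ins and not NSDb or RosaeNLG APIs.

```python
from __future__ import annotations

from typing import Protocol


class Database(Protocol):
    def execute(self, sql: str) -> list[dict]: ...


class Verbalizer(Protocol):
    def verbalize(self, rows: list[dict]) -> str: ...


class InMemoryDatabase:
    """Hypothetical stand-in for NSDb, MySQL, PostgreSQL, ...: returns canned rows."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def execute(self, sql: str) -> list[dict]:
        return self.rows


class EnglishVerbalizer:
    def verbalize(self, rows: list[dict]) -> str:
        return f"The number of listened songs is {rows[0]['value']}."


class ItalianVerbalizer:
    def verbalize(self, rows: list[dict]) -> str:
        return f"Il numero di canzoni ascoltate è {rows[0]['value']}."


def answer(sql: str, db: Database, nlg: Verbalizer) -> str:
    # Only the interfaces are used here, so each component can be swapped freely
    return nlg.verbalize(db.execute(sql))


db = InMemoryDatabase([{"value": 2450}])
print(answer("SELECT count(*) FROM songs", db, EnglishVerbalizer()))
print(answer("SELECT count(*) FROM songs", db, ItalianVerbalizer()))
```

Adding a language or replacing the database then means writing one new class that satisfies the same interface, without touching `answer`.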
Another interesting aspect of modularity is that the project is split into two completely separate parts:
- From natural language to query
- From query result to natural language
This makes it possible to ask questions in natural language through Alexa and obtain the responses in the old-fashioned way. Conversely, we can query the database with standard SQL and get the responses as natural language text, for example to display sentences in a BI tool: this architecture allows many possible combinations.
The data flow of this modular architecture is straightforward: we start by asking Alexa about some data, and the request is translated into an SQL query compatible with the underlying database.
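As an illustration of this first half of the flow, the sketch below turns the slot values of a recognized request into a query string. The metric, period, and aggregation mappings are hypothetical, and the `now - <interval>` time filter is an assumed dialect: the exact syntax accepted by NSDb (or any other target database) has to be checked against its documentation.

```python
# Hypothetical mappings from spoken slot values to database identifiers.
# Using a whitelist means user speech is never pasted directly into SQL.
METRICS = {"listened songs": "listened_songs", "active users": "active_users"}
PERIODS = {"last hour": "1h", "last day": "1d", "last week": "7d"}
AGGREGATIONS = {"number": "count(*)", "total": "sum(value)"}


def slots_to_sql(metric: str, period: str, aggregation: str = "number") -> str:
    """Translate intent slot values into an SQL-like query string."""
    try:
        table = METRICS[metric.lower()]
        interval = PERIODS[period.lower()]
        expression = AGGREGATIONS[aggregation.lower()]
    except KeyError as missing:
        raise ValueError(f"Unsupported slot value: {missing}") from None
    # Assumed time-filter syntax; adapt it to the target database dialect
    return f"SELECT {expression} FROM {table} WHERE timestamp > now - {interval}"


print(slots_to_sql("listened songs", "last day"))
# SELECT count(*) FROM listened_songs WHERE timestamp > now - 1d
```

Rejecting unknown slot values, rather than interpolating them, keeps the generated SQL predictable and safe from injection.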
At this point, the SQL query is forwarded to the database (in this case NSDb) through the Natural Language Lambda, which is the connection point between the two separate parts of our system. The database executes the query and sends the retrieved data back to the Natural Language Lambda. This middleware then forwards the result to RosaeNLG and waits for the natural language sentence describing the data. Finally, as we can see in the graph, the flow goes back through all the components to the client (Alexa), which tells the user the final sentence.
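The middleware role of the Natural Language Lambda can be sketched as a single function that runs the query and hands the rows to the NLG step. In this sketch an in-memory SQLite database stands in for NSDb and a plain Python function stands in for the RosaeNLG service; the event shape (`{"sql": ...}`), the `songs` table, and the function names are assumptions made for illustration.

```python
import sqlite3
from typing import Any, Callable, Dict


def natural_language_lambda(
    event: Dict[str, Any],
    conn: sqlite3.Connection,
    render: Callable[[Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Run the SQL from the event, then ask the NLG component for a sentence."""
    sql = event["sql"]
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # The NLG component receives structured data, never free text
    phrase = render({"query": sql, "rows": rows})
    return {"phrase": phrase, "rows": rows}


def demo_render(payload: Dict[str, Any]) -> str:
    """Hypothetical stand-in for the RosaeNLG service."""
    total = payload["rows"][0]["total"]
    return f"The total number of listened songs is {total}."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (ts INTEGER, listened INTEGER)")
conn.executemany("INSERT INTO songs VALUES (?, ?)", [(1, 1200), (2, 1250)])
result = natural_language_lambda(
    {"sql": "SELECT SUM(listened) AS total FROM songs"}, conn, demo_render
)
print(result["phrase"])  # The total number of listened songs is 2450.
```

Returning both the sentence and the raw rows reflects the two-part design: a client can use either the natural language output or the tabular data.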
NLG: the easy (and reliable) way
How can we generate the language representation of our data?
Natural Language Generation is a trending topic nowadays, and there are many experiments and alternative approaches to generating language. The problem is that many of them are not reliable.
In this project, reliability is a must-have: we are dealing with a database, so we need reliable answers. For example, if the database reports that the number of listened songs in a specific time window (for instance, the last day) is 2450, the response must be “In the last day the number of listened songs is 2450”. This is why we cannot rely on neural networks (even well-trained ones): their output can be wrong and might not match the information stored in the database.
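The reliability argument can be made concrete with the simplest possible template: the number in the sentence is copied from the query result, not predicted, so it always matches the database. This is a plain Python illustration of the template idea using `string.Template`, not RosaeNLG's syntax; the `window` and `count` field names are hypothetical.

```python
from string import Template

SENTENCE = Template("In the $window the number of listened songs is $count")


def describe(row: dict) -> str:
    # Values are substituted verbatim: nothing can alter the figure
    return SENTENCE.substitute(window=row["window"], count=row["count"])


row = {"window": "last day", "count": 2450}
sentence = describe(row)
print(sentence)  # In the last day the number of listened songs is 2450
```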
In addition, properly training a neural network requires a lot of training data (in this case, natural language sentences). We don’t have such data, because generating natural language sentences from database data is precisely the purpose of this project.
So we chose RosaeNLG, an open-source, template-based Natural Language Generation library that allows us to craft templates (as complex as we want) to be filled with data coming from the database. RosaeNLG is more than a simple template system: it includes language-specific features for English, Italian, French, and German, such as the syntax and grammar rules of the chosen language, synonyms, handling of referring expressions to avoid repetitions, and proper verb agreement.
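To show why such linguistic features matter even for short answers, the sketch below hand-codes two of them for English: number agreement between the count, the noun, and the verb, and deterministic rotation of synonyms so consecutive sentences do not repeat the same verb. RosaeNLG offers this kind of behavior inside its templates; this Python code only imitates the idea, and the helper names are hypothetical.

```python
from __future__ import annotations

# Synonyms picked in a fixed order, so the output stays deterministic
PLAY_VERBS = ["listened to", "played", "streamed"]


def agree(count: int, singular: str, plural: str) -> str:
    """Pick the singular or plural form according to the count."""
    return singular if count == 1 else plural


def song_sentence(count: int, window: str, variant: int = 0) -> str:
    noun = agree(count, "song", "songs")
    verb = agree(count, "was", "were")
    # Rotate synonyms so consecutive sentences do not repeat the same verb
    action = PLAY_VERBS[variant % len(PLAY_VERBS)]
    return f"In the {window}, {count} {noun} {verb} {action}."


def describe_windows(counts: dict[str, int]) -> list[str]:
    return [
        song_sentence(count, window, variant)
        for variant, (window, count) in enumerate(counts.items())
    ]


for sentence in describe_windows({"last hour": 1, "last day": 2450}):
    print(sentence)
# In the last hour, 1 song was listened to.
# In the last day, 2450 songs were played.
```

Hand-writing these rules quickly becomes unmanageable across languages with richer morphology, which is where a dedicated library pays off.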
Alexa, show me the results!
So… how does it work?
Here are some screenshots from the Alexa Console, which lets us communicate with Alexa during development by typing instead of talking. As you can see, the outcome is the same as if we were using the physical speaker.
The following screenshots show a real conversation between the user and the database (filled with some demo data) through Alexa.
As shown, the user can hold a dialog with Alexa, asking questions with filters and receiving the responses in natural language.