Published November 25, 2016 © CC BY-NC-ND

Making Alexa Your Personal Beer Connoisseur

The explosion of microbreweries over the past twenty years has given us a fantastic array of choices. Turn Alexa into a beer expert!

Advanced · Full instructions provided · Over 1 day

FIRST 50 SKILLS

The Amazon Alexa API Mashup Contest


Things used in this project

Hardware components

Amazon Alexa Amazon Echo

Software apps and online services

breweryDB API

API's for crowdsourced microbrewery data

Amazon Alexa Alexa Skills Kit

Amazon Web Services AWS Lambda

This is used for the skill as well as building the custom slots and caching layer.

NodeJS HTTPS

Amazon Web Services AWS S3

This is used for persisting the static data from some of the API's.

Story

The number of choices in microbrews is extensive. Almost every city is filled with choices, and new ones are being created every day. How about having a "Beer Bot" that can help you navigate all of the different options as well as sort through the thousands of microbreweries that make them? This project does just that through the power of crowdsourcing, API's, serverless, and Alexa.

The Alexa BeerBot

Here's a demo of the skill live in the Alexa Skill Store.

Demo showing the current Beer Bot skill live on an Amazon Tap.

Context - BreweryDB

There's an incredible crowdsourced database called BreweryDB that contains information about microbrews. This Alexa skill wouldn't be possible without this database and the thousands of fans who share what they find. BreweryDB also exposes this information via API's, and I've taken the data and created an Alexa Skill with it. For context, below is a system context view taken from the BreweryDB website. The published skill in Alexa is called "Beer Bot".

Context diagram from BreweryDB.com

Step 1 - Design Voice User Interface

Creating a good bot requires picking some potential flows that a conversation between a user and Alexa may follow. The current version of the skill splits into two different directions depending on the type of question being posed.

The first is around narrating the large index of microbreweries, and organizing them by geography. Given that there are more than five thousand to choose from, this seems like a sensible approach, and if someone already knows the specific name of the microbrewery, they can jump right to that question.

The second is around the more than two hundred different types of beers that have style information on them within the database. Right now it's a drill-down based on category.

The complete interaction model, including the utterances and schema, can be found in the voice folder on GitHub.

Voice User Flow for BeerBot

Step 2 - Architecting a Solution

The range of questions that we want the bot to support will highlight two different styles of how to leverage API’s within an architecture.

First, there is the pattern leveraging API’s where the data is relatively static, and it’s more important to organize it in such a manner that allows navigation through voice commands. There's an ongoing process that can refresh it over time, and some of it will require the skill to be republished into the app store. Going deeper, let's explore the following utterance.

“Alexa, ask Beer Bot to find me a beer”

The dialog with this question will be determining what categories of beer exist (English Ales, Irish Ales, etc.) then drilling into the different styles within the category. In this example, the data is static (there aren't new beer types being created every day), and the interaction will be around navigating a dictionary of information.

In this use case, we can invoke the API ahead of an individual user request, and organize and cache the data locally within the skill. This improves performance, and simplifies the runtime model. Here’s a view of how this looks using the BreweryDB API's and how the data is staged.

Using API's to build a caching layer for static content

An S3 bucket is used to store the data, which is persisted as a JSON object. Given the durability of S3, the data remains accessible at runtime for the skill, and we don't have to hammer the BreweryDB API again and again just to find out which types of beer are in the English Ales category. It also allows the static information to be pushed into the Alexa voice processing engine via Slots & Utterances, where pattern matching is critical (more detail on this in Step 5 below).
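As a rough illustration of this first pattern, here's a sketch of navigating a cached dictionary of categories and styles without touching the API. The shape of the cached object, the placeholder entries, and the function names are assumptions made for this example; they are not the structure BreweryDB returns or the one used in the skill.

```javascript
// Hypothetical shape for the cached dictionary: category name -> array of style names.
// The style entries are placeholders, not real BreweryDB data.
const beerCache = {
    'English Ales': ['Sample Style One', 'Sample Style Two'],
    'Irish Ales': ['Sample Style Three']
};

// List the categories so Alexa can read them back to the user.
function listCategories(cache) {
    return Object.keys(cache);
}

// Look up the styles for a spoken category, ignoring case and extra whitespace.
// Returns an empty array when the category is not in the cache.
function stylesForCategory(cache, spokenCategory) {
    const wanted = String(spokenCategory || '').trim().toLowerCase();
    const match = Object.keys(cache).find((name) => name.toLowerCase() === wanted);
    return match ? cache[match] : [];
}
```

Because everything is a plain object lookup, the drill-down from category to style never waits on a network call.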

Second, there is the case for using API’s where the data is more dynamic, and where there is value in getting the latest information. The main example for this is the following utterance finding out what beers are currently sourced by an individual microbrewery.

“Alexa, ask Beer Bot what beer does Hardywood Park Craft Brewery have?”

In this example, the results of the query will change over time given that the crowdsourced data is constantly changing in the master database. In this case, we want to hit the API’s directly when the skill is invoked so we get the latest information. This creates a different pattern, where the skill directly invokes the API. Here's what the stream of information looks like.

Using API's directly from an Alexa Skill for dynamic content

The crowdsourced beer data changes every day, so each time the user comes back, the latest information will be provided directly from the source. Now there is the risk that the API is unavailable, in which case the lambda function will gracefully handle the exception, and give back a message to the user to check back later.
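Here's a minimal sketch of what that graceful handling could look like when building the spoken response. The function name and the wording of the messages are hypothetical; the real skill may phrase things differently.

```javascript
// Build the spoken response for a brewery lookup.
// `error` is whatever the HTTPS request reported (or null on success),
// and `beers` is an array of beer names pulled from the API response.
function speechForBeerLookup(error, beers, breweryName) {
    if (error) {
        // The API was unavailable: apologize and ask the user to check back later.
        return 'Sorry, I am having trouble reaching the beer database right now. Please check back later.';
    }
    if (!beers || beers.length === 0) {
        return 'I could not find any beers listed for ' + breweryName + '.';
    }
    const noun = beers.length === 1 ? ' beer' : ' beers';
    return breweryName + ' has ' + beers.length + noun + ' listed: ' + beers.join(', ') + '.';
}
```

Keeping the error branch inside the response builder means the user always hears something, even when the upstream call fails.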

The application will be a combination of the two patterns using several API's, and the following steps will explore how.

Step 3 - Tapping into the API's

Using the API's requires first registering with BreweryDB to identify the application as a consumer of these services. Registration yields an access key that is needed to authenticate the application during invocation. Here's the link to get started.

https://www.brewerydb.com/auth/signup

Once this has been done, we can start to invoke the different API's. A full listing of them can be found here in their documentation.

http://www.brewerydb.com/developers/docs

Here's the code for invoking one of the API's, and it's a common pattern that we will see repeatedly. I'm writing all of the code for the skill in NodeJS using Lambda functions (source is in the lambda.js file in my github repo), so the syntax will change depending on programming language. For more detail in how to use the https package in NodeJS, here's the link to the docs.

https://nodejs.org/api/https.html

// load the https package that ships with NodeJS
var https = require('https');
//
// format the API address
var APIurl = 'https://api.brewerydb.com/v2/brewery/';
//
// set the API key provided during registration
var APIkey = 'xxxx';
//
// connect to BreweryDB and invoke the API with a GET request
https.get(APIurl + breweryId + '/beers?key=' + APIkey + '&format=json', (res) => {
    console.log('API Call to Brewery DB HTTP Code: ', res.statusCode);
    //
    var beerData = "";
    //
    // capture data returned in the response
    res.on('data', (d) => {
        beerData += d;
    });
    //
    // after the end of message is received, process the data
    res.on('end', () => {
        //
        // convert the data into a usable format; JSON.parse is used rather than
        // eval so that nothing in the response body can be executed as code
        var returnData;
        try {
            returnData = JSON.parse(beerData);
        } catch (e) {
            console.error('Unable to parse response from BreweryDB', e);
            return;
        }
        //
        // validate that the API reported a successful request
        if (returnData.message == "Request Successful") {
            // ... now do a bunch of processing ...
        }
    });
//
// this gets invoked if there is an error in the HTTPS request
}).on('error', (e) => {
    console.error(e);
});

There are several different components to the request that is sent to BreweryDB within the call. Breaking this into different parts shows how the call is made.

The APIurl comes from BreweryDB's registry of services; for the logic in this section of the code, it is the Brewery API. For more information, here is the documentation.

http://www.brewerydb.com/developers/docs-endpoint/brewery_index

The APIkey is given to me by BreweryDB as part of the registration process, and used to authenticate me as a user. I can use this throughout the application as it works for all API's, and I've removed it from the syntax above given that it should be kept private.

This API requires the breweryId to be passed in dynamically and formatted as a string. Prior business logic determined what this value was based on user input.

Within the URI you will notice the &format=json option that is requested. This asks the API to package the response as a JSON object (versus XML), which is easier to consume in JavaScript.
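To make the pieces of the request easier to see, here's a small sketch that assembles the same URI from its parts. The function name buildBeersUrl is my own for this example, and the brewery id used in the usage note is a placeholder rather than a real BreweryDB id.

```javascript
// Assemble the request URL for the beers of one brewery.
// encodeURIComponent guards against unexpected characters in the id, and
// the searchParams object takes care of escaping the query string values.
function buildBeersUrl(breweryId, apiKey) {
    const url = new URL('https://api.brewerydb.com/v2/brewery/' + encodeURIComponent(breweryId) + '/beers');
    url.searchParams.set('key', apiKey);
    url.searchParams.set('format', 'json');
    return url.toString();
}
```

Calling buildBeersUrl('abc123', 'xxxx') produces a string that can be handed straight to https.get.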

The response received is an array of beers from the database for this microbrewery, packaged as a JSON object based on our request. We can then parse this information and feed it into the business logic within our skill.
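Here's a hedged sketch of that parsing step. It assumes the payload carries the beers in a data array whose entries have a name attribute; the message check comes from the code above, but the data and name fields and the function name are assumptions for this example.

```javascript
// Turn the raw response body into a list of beer names.
// Assumed payload shape: { message: "Request Successful", data: [ { name: "..." }, ... ] }
function beerNamesFromResponse(body) {
    let parsed;
    try {
        parsed = JSON.parse(body);
    } catch (e) {
        // Not valid JSON, so there is nothing to read back to the user.
        return [];
    }
    if (!parsed || parsed.message !== 'Request Successful' || !Array.isArray(parsed.data)) {
        return [];
    }
    // Keep only entries that actually have a name.
    return parsed.data.map((beer) => beer.name).filter(Boolean);
}
```

Returning an empty array for every failure case lets the calling code treat "no beers" and "bad response" the same way when it builds the speech output.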

Step 4 - Building the Caching Layer

Going through the registry of API's identifies what data can be exposed. Some of this information is static - for example, the dictionary of beer styles and categories. Those are covered by the /categories & /styles API.

Rather than invoke these every time, I've written a Lambda function that executes them once, then either saves the JSON response as an object in an S3 bucket or persists it within the code of the skill (this is the first pattern described in Step 2).

When accessing the S3 bucket from the skill, we need to make sure that the execution role is granted access to the bucket. This is how the authentication model works within AWS, and can be frustrating to troubleshoot.

Another object that is cached is the array of cities that breweries are available for. This data is a rather large array and is more dynamic, so it's persisted as an object in S3, and retrieved by the skill as needed.
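The save and retrieve steps in this section can be sketched as two small helpers. They only prepare and decode the data; the parameters object is meant to be passed to the putObject call of an S3 client from the AWS SDK, and the body is what a getObject call hands back. The function names, bucket name, and key are placeholders I made up for this example.

```javascript
// Build the parameters for storing a JSON document in S3.
// Bucket and key are supplied by the caller; the values used with this are placeholders.
function buildPutParams(bucket, key, data) {
    return {
        Bucket: bucket,
        Key: key,
        Body: JSON.stringify(data),
        ContentType: 'application/json'
    };
}

// Decode the Body returned by a getObject call back into an object.
// Assumes the body is a Buffer (or string) holding UTF-8 encoded JSON.
function parseCachedBody(body) {
    return JSON.parse(body.toString('utf8'));
}
```

Keeping the S3 client out of these helpers makes them easy to test locally; only the thin call into the SDK needs the execution role to be set up correctly.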

Step 5 - Building the Custom Slot Data

To leverage the machine learning capabilities of Alexa, we need to highlight the patterns that we want to teach Alexa to recognize for this skill. This is done by building custom slots that are then loaded as part of the publishing process. A simple metaphor is that a slot is like a dropdown box in HTML, but that's actually incorrect. A more accurate (albeit complex) explanation is that you're providing a set of clues for what the user may have said, not a set of rules that is strictly enforced. Here's a simple example.

My favorite color is {Custom Slot}.

We could then create a list of colors (Red, Blue, Orange, Yellow, etc.) that makes up the slot. When the user says "My favorite color is Blue", Alexa passes back an object that parses out the attribute "Blue" which can then be processed. Given that we are dealing with the spoken word, it's up to Alexa to format the text and by providing the slot data as context, we will get back the string "Blue" versus "Blew" which from most people sounds exactly the same.

Back to this specific skill, the most significant custom slot will be the list of microbrewery names, and we want to leverage the machine learning within Alexa to help us with the matching. Looking across the entire database, there are more than 5000 microbreweries, so there are many that sound similar. For example, there is Beaver View Brew in Albion, Nebraska; Beaver Creek Brewery in Wibaux, Montana; Beaver Brewing in Beaver Falls, Pennsylvania; Beaver Island Brewery in Saint Cloud, Minnesota; and Beaver Street Brewery in Flagstaff, Arizona. By providing what the possible choices are, we can maximize the hit rate for matching user intent with the invocation of the API's.

I've written two Lambda functions to build this information, and both are in the GitHub repo. The first is the gatherBeerData.js script, which uses the /locations API to build a JSON array that is stored in an S3 bucket and used at runtime by the skill. The second is the createSlot.js script, which trims down the data in the array into a text file that contains the microbrewery names and is published as part of the skill.

Serverless data extract process using BreweryDB API

There is some logic in the scripts beyond the data extracts. After receiving the array back, the script removes all of the duplicate names, given that each line of the slot data must be unique. Because the data is typically used for visual media, there is also some data scrubbing logic to expand abbreviations that make matching spoken names more difficult. The most common example is converting "Co." to "Company" so that our language is consistent. For example, the BreweryDB entry "The Veil Brewing Co." becomes "The Veil Brewing Company".
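Here's a minimal sketch of that scrubbing logic, not the actual createSlot.js code. The function name is hypothetical, and treating names that differ only by case as duplicates is my own assumption.

```javascript
// Turn raw brewery names into slot values: expand "Co." and drop duplicates.
function buildSlotValues(names) {
    const seen = new Set();
    const values = [];
    names.forEach((raw) => {
        // Expand "Co." when it stands alone as a word, so the spoken form is consistent.
        const cleaned = raw.trim().replace(/\bCo\.(?=\s|$)/g, 'Company');
        // Compare in lower case so names differing only by case count as duplicates.
        const key = cleaned.toLowerCase();
        if (cleaned && !seen.has(key)) {
            seen.add(key);
            values.push(cleaned);
        }
    });
    return values;
}
```

Joining the returned array with newline characters gives the one-name-per-line text that gets loaded as the custom slot.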

Step 6 - Building the Skill

Once we have the beer dictionary data loaded as objects within S3 and the custom slots created, writing the skill is the final step. This is authored in NodeJS and is called lambda.js in the GitHub repo.

The code is organized into a series of functions that are invoked based on the intent. Here are the mappings from utterances to intent for this skill.

AMAZON.StartOverIntent start 
AMAZON.StartOverIntent begin 
AMAZON.StartOverIntent new 
AMAZON.StartOverIntent main menu 
AMAZON.StartOverIntent menu 
AMAZON.StopIntent goodbye 
AMAZON.StopIntent no thank you
ListAvailableCities List cities 
ListAvailableCities List city names 
ListAvailableCities List of cities 
ListAvailableCities Which cities do you have 
ListAvailableCities Give me a list of cities 
ListBeerCategories Find me a beer 
ListBeerCategories Find me a beer type 
ListBeerCategories List beer categories 
ListBeerCategories Which beer categories are there 
ListBeerCategories Which categories exist 
ListBreweriesForCity List breweries for {City} 
ListBreweriesForCity List microbreweries for {City} 
ListBreweriesForCity List breweries from {City} 
ListBreweriesForCity List microbreweries from {City} 
ListBreweriesForCity What breweries for {City} 
ListBreweriesForCity What microbreweries for {City} 
ListBreweriesForCity What breweries are located in {City} 
ListBreweriesForCity Which are located in {City} 
ListBreweriesForCity Which are in {City} 
ListBreweriesForCity List what are in {City} 
GetBeerStyles List beer styles for {Category} 
GetBeerStyles What beer styles are there for {Category} 
GetBeerStyles Give me the beer styles for {Category} 
GetBeerStyles Give me beer styles for {Category} 
GetBeerStyles Describe beer styles for {Category} 
GetBeerStyles Get beer styles for {Category} 
GetMoreDetail Beer detail on {Brewery} 
GetMoreDetail Beer detail at {Brewery} 
GetMoreDetail More detail on {Brewery} 
GetMoreDetail More detail about {Brewery} 
GetMoreDetail More detail for {Brewery} 
GetMoreDetail More information on {Brewery} 
GetMoreDetail More information about {Brewery} 
GetMoreDetail More information for {Brewery} 
GetMoreDetail Tell me more about {Brewery} 
GetMoreDetail Tell me more on {Brewery} 
GetMoreDetail Tell me more information on {Brewery} 
WhatsOnTap What is on tap at {Brewery} 
WhatsOnTap What is on tap for {Brewery} 
WhatsOnTap What is on tap with {Brewery} 
WhatsOnTap What is available at {Brewery} 
WhatsOnTap What does {Brewery} have 
WhatsOnTap What beers are on tap at {Brewery} 
WhatsOnTap What beers are at {Brewery} 
WhatsOnTap What beers does {Brewery} have 
WhatsOnTap What kind of beer does {Brewery} have
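Once Alexa resolves an utterance to an intent name, the skill has to route it to the right function. Here's a rough sketch of one way to do that with a lookup table. The handlers are stubs, and pairing ListBreweriesForCity with getBreweriesByCity() and WhatsOnTap with getBeersAtBrewery() is my assumption for this example; the repo has the real wiring.

```javascript
// Stub handlers; in the real skill each one gathers data and builds a response.
function getBreweriesByCity(intent) {
    return 'breweries for ' + intent.slots.City.value;
}
function getBeersAtBrewery(intent) {
    return 'beers at ' + intent.slots.Brewery.value;
}

// Assumed mapping of intent names to handler functions.
const handlers = {
    ListBreweriesForCity: getBreweriesByCity,
    WhatsOnTap: getBeersAtBrewery
};

// Route an incoming intent to its handler, with a fallback for unknown intents.
function routeIntent(intent) {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(handlers, intent.name)) {
        return 'Sorry, I did not understand that request.';
    }
    return handlers[intent.name](intent);
}
```

Adding a new intent then only means writing its function and adding one line to the table.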

Each of the intents has its own function, which gathers the data for the response by calling the API directly, by leveraging cached data in the local memory of the skill, or by calling out to an S3 bucket. Details are in the repo, and here's a mapping of where the data is retrieved from in each.

One of the most important aspects of building a good skill is being able to interpret user intent. Because users phrase requests in different ways, we want to be flexible in how we understand what is being asked for. In each of the functions, there is logic that helps with this interpretation.

In the getBreweriesByCity() function, for instance, the code must be written to interpret variations. There shouldn't be any difference between the following two requests.

List microbreweries for Seattle.

List microbreweries for Seattle Washington.

In normal language, we assume that some cities stand alone in their recognition, and providing additional detail is redundant. This requires some additional logic within the skill: it first checks whether the state was left off, and if so adds the state attribute so that the query still finds the match. There's also the possibility that the user requests something slightly different.
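One way to sketch the "add the state when it's missing" logic is shown here. The function name, the short list of states, and the table of default states are all assumptions for illustration; a real table would also need a policy for city names that exist in several states.

```javascript
// Append an assumed state when the spoken city does not already end with one.
// `stateNames` is the list of state names to recognize, and `defaultStates`
// maps a lower-case city name to the state to assume for it.
function addMissingState(spokenCity, stateNames, defaultStates) {
    const city = spokenCity.trim();
    const lower = city.toLowerCase();
    // Multi-word states such as "South Carolina" are covered by endsWith.
    const hasState = stateNames.some((state) => lower.endsWith(' ' + state.toLowerCase()));
    if (hasState) {
        return city;
    }
    const assumed = defaultStates[lower];
    return assumed ? city + ' ' + assumed : city;
}
```

With this in place, "Seattle" and "Seattle Washington" both end up as the same lookup value before the query runs.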

List microbreweries for South Carolina.

Given that there are a finite number of states, we can detect when this scenario is encountered, and then handle accordingly. This seems much more natural than responding back to add the state. There is a challenge with this approach as the distribution of microbreweries by state is not even. As of this writing, here's the distribution (only extremes shown).

Distribution of active Microbreweries (per BreweryDB)

So the logic needs to differentiate between states with a large number and a small number of microbreweries, and the response needs to be different for each. For California and other large states, the user gets reprompted to provide a city. For states with a smaller number (fewer than 100), all of them are repeated back.
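Here's a small sketch of that branching. The limit of 100 comes from the description above, while the function name, the constant name, and the response wording are made up for this example.

```javascript
// States with at least this many microbreweries get a reprompt instead of a full list.
const STATE_LIST_LIMIT = 100;

// Decide how to answer a request that named a state rather than a city.
function respondForState(stateName, breweryNames) {
    if (breweryNames.length >= STATE_LIST_LIMIT) {
        // Too many to read aloud, so ask the user to narrow it down to a city.
        return 'There are ' + breweryNames.length + ' microbreweries in ' + stateName + '. Which city would you like?';
    }
    // Small enough to repeat all of them back.
    return 'Here are the microbreweries in ' + stateName + ': ' + breweryNames.join(', ') + '.';
}
```

Keeping the limit in one constant makes it easy to tune once you hear how long a full list takes to read back.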

Now the skill processing in the getBeersAtBrewery() function also has logic in it to help translate user intent. First, here's how we tie the custom slot into the intent through the intent schema.

"intent": "
"slots": [ 
    { 
     "name": "Brewery", 
     "type": "LIST_OF_BREWERIES" 
    }
]

Within the function that is called, a variable will be passed in with what the voice processing interpreted - intent.slots.Brewery.value. Note that the Alexa Skills Kit does not enforce the value to match something in the slot, nor does it provide any additional attribute like a boolean or other indicator.
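Since the value isn't guaranteed to match the slot, or even to be present, it's worth reading it defensively. Here's a sketch of a helper for that; the function name is hypothetical, and the slot object shape follows the intent.slots.Brewery.value path mentioned above.

```javascript
// Safely read a slot value from the intent.
// Returns the trimmed string, or null when the slot is missing or empty.
function getSlotValue(intent, slotName) {
    const slot = intent && intent.slots && intent.slots[slotName];
    if (!slot || typeof slot.value !== 'string' || slot.value.trim() === '') {
        return null;
    }
    return slot.value.trim();
}
```

A null result is the cue to reprompt the user rather than call the API with an empty name.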

Now the skill needs to translate this value into a six character string that can then be used within the API's. To maximize the matching, one of the challenges to overcome is that common words may be left off. For example, here are three different utterances that are looking for the same thing.

What beer does Legend Brewing Company have?

What beer does Legend Brewing have?

What beer does Legend have?

Now in the slot, the "official" name of this microbrewery is "Legend Brewing Company", but the ASK won't enforce just that coming back, so the skill needs to handle any of these three potential values and return the same response. This is done by adding some data scrubbing in the function like this.

// save off last brewery to use in encouraging user to ask for brewery detail.
localDetailBrewery = breweryName;
// shorten the name to encourage not spelling out common words like brewery that are hard to pronounce
localDetailBrewery = localDetailBrewery.replace(" Company","");
localDetailBrewery = localDetailBrewery.replace(" Brewing","");
localDetailBrewery = localDetailBrewery.replace(" Brewery","");
localDetailBrewery = localDetailBrewery.replace(" Beer","");
localDetailBrewery = localDetailBrewery.replace("The ","");
localDetailBrewery = localDetailBrewery.replace(" Co.","");
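The same idea can be applied to matching: strip the common words from both the spoken value and the official names, then compare what's left. This is a sketch under my own assumptions (the function names and the word list are illustrative), and it drops whole words only, so a name like "Beerworks" is left alone.

```javascript
// Common words that users tend to leave off when saying a brewery name.
const COMMON_WORDS = new Set(['the', 'company', 'co.', 'co', 'brewing', 'brewery', 'beer']);

// Reduce a name to its distinctive words, in lower case.
function simplifyName(name) {
    return name
        .toLowerCase()
        .split(/\s+/)
        .filter((word) => word && !COMMON_WORDS.has(word))
        .join(' ');
}

// Find the official name that matches what the user said, or null if none does.
function findBrewery(spokenName, officialNames) {
    const wanted = simplifyName(spokenName);
    if (!wanted) {
        return null;
    }
    return officialNames.find((name) => simplifyName(name) === wanted) || null;
}
```

With this, "Legend", "Legend Brewing", and "Legend Brewing Company" all resolve to the same official entry, which can then be used to look up the identifier needed by the API's. Two official names that simplify to the same words would collide, so the first match wins in this sketch.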