The Inspiration
From Amazon’s first announcement of the Echo in late 2014, we were excited about the possibilities of a hands-free, “always listening” device that could interact with users on demand, a key aspect of the Internet of Things. We batted around the possibilities when the Alexa Skills Kit (ASK) first came out in June 2015, and periodically shot each other possible product ideas. Voice user interface (VUI) design is an area we were all eager to explore. We believe that it is the next major disruption in computing.
Finally, in January of this year, a group of us took a leap of faith and formed a company called Skillsai, based on building skills for the Amazon Echo™ platform. While developing games is not the focus of Skillsai, creating Hangman was a great way for us to explore what was currently possible with the platform and have fun at the same time. When Hackster and Amazon announced this contest, we thought we would share some of what we have learned with other developers here.
The Project
Designing a decent VUI application is not just about taking a visual software UI and making it verbal. The best practices page on Amazon should be required reading for all skill developers, though we don’t claim to have followed it perfectly. Many use cases just don’t fit the auditory nature of interacting with Alexa very well, so what to build and how to build it efficiently was a big question for us.
Our main goal was to build something more challenging than the “one-hour” template trivia skill that would allow us to exercise all of the functionality Amazon currently provides, because first and foremost, this was a learning activity for us. We also didn’t want to use the Lambda service because our “real” products use our own web service, so we created this game on the same infrastructure we are using for those products.
Ultimately, we decided to create a Hangman game in order to learn about the ASK platform: how to create an intent schema, sample utterances, and custom slot types. We also wanted to practice implementing SSML, adding sound clips, and making the most of the cards in the Alexa app, all while pushing the boundaries of what you could do in a game without any visual UI.
The Implementation:
In order to play Hangman, first you need words. We looked at several free word lists and chose the free WordNet database, which requires only attribution in order to use it. We include that attribution on our website at http://www.skillsai.com/hangman. We ran some scripts to clean the list up for our purpose, deleting acronyms, proper nouns, and words shorter than 3 characters. This resulted in a list of over 60,000 words and definitions that were 3 to 31 letters in length.
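As a rough illustration of that cleanup, here is a minimal sketch of a word filter and of drawing a random word of a requested length. The function names are hypothetical, and the rule that acronyms and proper nouns can be spotted by capital letters is our assumption for the example, not necessarily how the original scripts worked.

```php
<?php
// Hypothetical filter: keep only plain lowercase words of 3 or more letters.
// Assumption: acronyms and proper nouns contain capital letters, so they fail
// the pattern. Hyphenated and multi-word entries are dropped as well.
function isPlayableWord(string $word): bool
{
    return preg_match('/^[a-z]{3,}$/', $word) === 1;
}

// Return a random word with exactly $length letters, or null if none exists.
function pickWordOfLength(array $words, int $length): ?string
{
    $matches = array_values(array_filter(
        $words,
        fn(string $w): bool => strlen($w) === $length
    ));
    if (count($matches) === 0) {
        return null;
    }
    return $matches[random_int(0, count($matches) - 1)];
}

$raw = ['fox', 'NASA', 'Paris', 'ox', 'ice-cream', 'zebra'];
$playable = array_values(array_filter($raw, 'isPlayableWord'));
print_r($playable); // keeps 'fox' and 'zebra'
```

Returning null for an unavailable length lets the caller ask the player for a different length instead of failing.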
Our initial design included asking the player to choose the word difficulty from easy, medium, hard, or genius level. As the game took shape, we realized that determining if a word was easy or hard was too subjective. Short words were actually harder than long words, but was an 8-letter word easier or harder than a 10-letter word?
We decided to modify the game to ask the player to choose the length of a word, and then we randomly chose a word of that length from the database. Alexa keeps track of the letters guessed, and communicates which guesses were letters in the word. For instance, if the word Alexa chose was “fox”, and the player first guessed an "F", Alexa would report that so far you had “F, blank, blank" and ask you to "Pick another letter." If the player repeats an earlier guess, Alexa notes that, and she doesn’t count that letter as a miss.
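The spoken progress report can be sketched in a few lines. This is a minimal example, not the game's actual code: the function name and the shape of the guessed-letters array are assumptions.

```php
<?php
// Build the progress string Alexa would read back, e.g. "F, blank, blank".
// $guessed is assumed to be a list of lowercase letters tried so far.
function spokenProgress(string $word, array $guessed): string
{
    $parts = [];
    foreach (str_split(strtolower($word)) as $letter) {
        // Reveal guessed letters in upper case; say "blank" for the rest.
        $parts[] = in_array($letter, $guessed, true) ? strtoupper($letter) : 'blank';
    }
    return implode(', ', $parts);
}

echo spokenProgress('fox', ['f']), "\n"; // F, blank, blank
```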
Visualizing the word in your head was more challenging than some of our testers liked, though doing that is good exercise for your brain! We decided to add a new play method called “category” in order to make guessing the word a little easier. The categories (colors, cooking terms, musical instruments, etc.) were created as another custom slot type. However, rattling off a fairly long list of choices (13 currently) was too much to listen to without becoming impatient, so we added 3 master list choices and then guided users through the sub-category options.
However, it isn’t necessary to repeat all instructions in every game or for a user to go through all the navigation of selecting a sub-category. Once a person has played a few times, they will mostly remember the options. And they can also ask for help at any time during the game.
Keeping the interactions with the user short but helpful is a real and important challenge in VUI design. So, we also store whether this is the first time someone has played Hangman, and if not, we reduce and simplify the verbiage going forward.
Our Hangman game logic is written in PHP in an MVC design with about 10 specific functions, many of which are just long lists of switch statements. We also developed a few common functions that we can re-use for any skill we build. The Hangman functions keep track of the guesses, check whether the letter is in the word, track how many tries are left, and determine whether the player guesses the word or fails to do so after 6 misses, among other things.
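To make the bookkeeping concrete, here is a minimal sketch of a single guess being applied to the game state. The state array, the function name, and the status strings are all hypothetical. Only the rules come from the description above: a repeated guess is not counted as a miss, and 6 misses end the game.

```php
<?php
const MAX_MISSES = 6; // the game is lost after 6 wrong guesses

// $state is a hypothetical array:
//   ['word' => string, 'guessed' => string[], 'misses' => int]
// Returns [newState, status], with status one of: repeat, hit, miss, won, lost.
function applyGuess(array $state, string $letter): array
{
    $letter = strtolower($letter);
    if (in_array($letter, $state['guessed'], true)) {
        return [$state, 'repeat']; // repeated guesses do not count as misses
    }
    $state['guessed'][] = $letter;

    if (strpos($state['word'], $letter) === false) {
        $state['misses']++;
        return [$state, $state['misses'] >= MAX_MISSES ? 'lost' : 'miss'];
    }

    // The player wins once every letter of the word has been guessed.
    $remaining = array_diff(str_split($state['word']), $state['guessed']);
    return [$state, count($remaining) === 0 ? 'won' : 'hit'];
}

$state = ['word' => 'fox', 'guessed' => [], 'misses' => 0];
[$state, $status] = applyGuess($state, 'F');
echo $status, "\n"; // hit
```

Returning the new state instead of modifying a global makes it straightforward to save the state between requests and resume the game later.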
When one game is complete, Alexa asks if you would like to play again – or you can say “I quit” or “I give up” anytime she asks you for a response. If the player gives up in the middle of a game, Alexa reports the chosen word to the player – eliminating an avenue of frustration for the user.
If the player has to stop the game in the middle of playing, we save the current state of the game so that when the player returns, it picks back up from the point where they left off. In addition, we added a number of goodbye “greetings” that are randomized when the player quits the game, which is one of the functions in the common code we can now use for any skill.
To further make the game more entertaining, and to understand how sounds are implemented, we added a number of free audio clips that are randomized and based on whether the user won or lost. It took some time to get this working correctly, and we had to run some of the mp3 clips through a converter before they worked with Alexa – even though they were already encoded as mp3s… Knowing the command syntax for this was also important:
ffmpeg -y -i input.mp3 -ar 16000 -ab 48k -codec:a libmp3lame -ac 1 output.mp3
The complete list of requirements for implementing audio is:
- A valid (MPEG version 2) MP3 file
- No clip longer than 90 seconds
- Encoded with a bit rate of exactly 48 kbps
- Encoded with a sample rate of 16000 Hz
- Stereo format must be JOINT STEREO
- The file must be hosted on a HTTPS endpoint
- Hosted on a domain that presents a trusted SSL certificate (not self-signed)
Note: Thanks go to MisterDan who posted this information in the developer’s forum. The PHP code then is fairly simple:
// When the player wins:
$soundfile=commonGetSoundfile('win');
$response="<speak>You guessed correctly! You win! <audio src=\"{$soundfile}\" /> The word was {$word}, {$spelling}. ,Would you like to play again?</speak>";
// When the player loses:
$soundfile=commonGetSoundfile('lose');
$response="<speak>The letter, {$letter}, is not in the word. You lost. <audio src=\"{$soundfile}\" /> The word was {$word}, {$spelling}., Would you like to play again?</speak>";
Note the use of commas, which cause Alexa to pause a bit more between completing one sentence and starting a new one. When and where to place those commas takes a bit of trial and error, as does using metaphones in order to help Alexa understand what is being said.
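For readers wondering what a randomized clip picker might look like, here is a minimal sketch. It is not the commonGetSoundfile function used above: the function name, the clip lists, and the example.com URLs are placeholders we made up for illustration.

```php
<?php
// Hypothetical clip lists keyed by outcome. The URLs are placeholders only;
// real clips must be served over HTTPS in the format described above.
const SOUND_CLIPS = [
    'win'  => ['https://example.com/audio/win1.mp3', 'https://example.com/audio/win2.mp3'],
    'lose' => ['https://example.com/audio/lose1.mp3', 'https://example.com/audio/lose2.mp3'],
];

// Pick one clip at random for the given outcome ('win' or 'lose').
function pickSoundfile(string $outcome): string
{
    if (!array_key_exists($outcome, SOUND_CLIPS)) {
        throw new InvalidArgumentException("Unknown outcome: {$outcome}");
    }
    $clips = SOUND_CLIPS[$outcome];
    return $clips[array_rand($clips)];
}

$soundfile = pickSoundfile('win');
$response = "<speak>You win! <audio src=\"{$soundfile}\" /></speak>";
echo $response, "\n";
```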
At that point, we felt like our Hangman experiment was complete. We set the game aside, happy with what we had learned. Then, in mid-March, Amazon added the ability to put images on the cards that show up in the Alexa app. Thinking that using the cards in the app would help those players who struggled to visualize the word mentally, we came up with a dozen or so images to draw the Hangman scaffold with the number of tries left, and worked to format the card responses to put in writing what Alexa gave you verbally. String manipulation is never a lot of fun to do, but the end result turned out pretty well.
When Hackster and Amazon announced this contest in mid-April, we decided to go ahead and publish the game and see how it fared against the other skills submitted.
The Code:
We listed our intent schema in the code window at the end, and our sample utterances look like this:
- ChoosePlayMethod {PlayMethod}
- ChoosePlayMethod {PlayMethod} of {Length}
- ChoosePlayMethod for {Category}
- ChoosePlayMethod using {PlayMethod} of {Length}
- ChoosePlayMethod using {Category}
- ChooseMethodFilter {Length}
- ChooseMethodFilter {Category}
- ChooseALetter how about {Letter}
- ChooseALetter the letter {Letter}
- ChooseALetter {Letter}
This is all that's needed on the Amazon side of the architecture. The rest of the code resides on our own web server.
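On the web server side, each request arrives as JSON naming the intent and its slot values. The sketch below shows one way to route such a request with a switch statement and wrap the reply in a response envelope. The intent and slot names are taken from the sample utterances above. The function names and reply texts are placeholders, and the payload shape reflects our understanding of the Alexa request format.

```php
<?php
// Route a decoded Alexa request to a reply based on the intent name.
// The replies are placeholders; a real skill would call its game logic here.
function routeRequest(array $request): string
{
    if (($request['request']['type'] ?? '') !== 'IntentRequest') {
        return 'Welcome to Hangman.'; // e.g. a LaunchRequest
    }
    $intent = $request['request']['intent'];
    $slots  = $intent['slots'] ?? [];

    switch ($intent['name']) {
        case 'ChooseALetter':
            $letter = strtoupper($slots['Letter']['value'] ?? '');
            return "You picked {$letter}.";
        case 'ChoosePlayMethod':
        case 'ChooseMethodFilter':
            $choice = $slots['Length']['value']
                ?? $slots['Category']['value']
                ?? $slots['PlayMethod']['value']
                ?? 'unknown';
            return "Starting a game using {$choice}.";
        default:
            return 'Sorry, I did not understand that.';
    }
}

// Wrap plain text in a JSON response envelope for the Alexa service.
function buildResponse(string $text, bool $endSession = false): string
{
    return json_encode([
        'version'  => '1.0',
        'response' => [
            'outputSpeech'     => ['type' => 'PlainText', 'text' => $text],
            'shouldEndSession' => $endSession,
        ],
    ]);
}

$body = '{"request":{"type":"IntentRequest","intent":{"name":"ChooseALetter","slots":{"Letter":{"name":"Letter","value":"f"}}}}}';
echo buildResponse(routeRequest(json_decode($body, true))), "\n";
```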
Key Learnings:
Little did we know that we would struggle more with publishing the game than we did while writing it. We failed the security check three times until we figured out what was required, mostly through trial and error and re-reading a variety of threads on this topic in the developer forums.
Of course, if you use the AWS Lambda service, there is nothing to worry about regarding the security check. And if we had used Java, there is a library that Amazon points to that you can use to handle the various checks needed to get through the certification process.
However, if you are going to use PHP on an external server for your Amazon Echo skill like we did, the following tips may be helpful.
- Use the getallheaders function.
$headers=getallheaders();
- This will allow you first to confirm that there is a Signaturecertchainurl header.
if (!isset($headers['Signaturecertchainurl'])) {...fail...}
- You will want to save the pem file locally.
$pem = file_get_contents($headers['Signaturecertchainurl']);
- Use php://input to get the request.
$postdata = file_get_contents("php://input");
- Use openssl_verify to confirm that the request body matches the signature sent in the request header.
$ssl_check = openssl_verify($postdata, base64_decode($_SERVER['HTTP_SIGNATURE']), $pem);
if ($ssl_check != 1) {...fail...}
- Use openssl_x509_parse to check the pem.
$parsedCertificate = openssl_x509_parse($pem);
- Make sure to send a 400 error code on failure. This one is critical to passing the test. Sending “user friendly” messages is not expected and fails.
header('HTTP/1.1 400 Bad Request', true, 400);
exit;
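The tips above leave open what exactly to check on the certificate URL, the parsed certificate, and the request itself. The sketch below shows the kind of checks involved, based on our reading of Amazon's published requirements for self-hosted skills. The function names are hypothetical, and the specific values (the s3.amazonaws.com host, the /echo.api/ path prefix, the echo-api.amazon.com alternative name, and the 150-second tolerance) should be confirmed against the current documentation before relying on them.

```php
<?php
// Certificate URL check: https scheme, host s3.amazonaws.com, path beginning
// with /echo.api/, and port 443 if a port is given. Path normalization
// (e.g. "/../" segments) is not handled in this sketch.
function isValidCertUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false;
    }
    return strtolower($parts['scheme'] ?? '') === 'https'
        && strtolower($parts['host'] ?? '') === 's3.amazonaws.com'
        && strpos($parts['path'] ?? '', '/echo.api/') === 0
        && ($parts['port'] ?? 443) === 443;
}

// Checks on the array returned by openssl_x509_parse(): the certificate must
// be inside its validity period and list echo-api.amazon.com as a subject
// alternative name.
function isCertUsable(array $parsed, int $now): bool
{
    $san = $parsed['extensions']['subjectAltName'] ?? '';
    return ($parsed['validFrom_time_t'] ?? PHP_INT_MAX) <= $now
        && ($parsed['validTo_time_t'] ?? 0) >= $now
        && strpos($san, 'echo-api.amazon.com') !== false;
}

// The timestamp inside the request body must be recent to limit replays.
function isTimestampFresh(string $timestamp, int $now, int $tolerance = 150): bool
{
    $time = strtotime($timestamp);
    return $time !== false && abs($now - $time) <= $tolerance;
}

var_dump(isValidCertUrl('https://s3.amazonaws.com/echo.api/echo-api-cert.pem')); // bool(true)
```

If any of these checks fails, the request should be rejected with the same 400 response described in the last tip.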
Conclusion:
Creating skills for the Amazon platform is a lot of fun, and the platform gets a little more feature rich every month. While there is something of a learning curve, particularly once you start creating more complex applications, we still feel that VUI is the next big thing, and we are excited to be a part of it. We hope that more users will try out our Hangman game and let us know what they think.