Speed Tap is a game for Amazon Alexa that uses Echo Buttons to test a player's reaction time and concentration. The button cycles through random colors, and the player must tap the button when it's green. As the game goes on, the lights speed up and it gets more difficult.
The game incorporates a number of Alexa APIs and AWS services to create an engaging experience while keeping game play simple. This story explains why each one is needed and roughly how it works, so other developers can see what's possible and how.
This skill was built from scratch using Node, AWS services, and the alexa-app framework.
KEY CONCEPTS AND UNIQUE FEATURES
From a high level, these are the things that make this game unique both as an Alexa Skill and from a development/programming perspective:
- Players compete against all Alexa users to get the World Record, which makes it more fun and interactive rather than just playing against yourself.
- The integrated web site at AlexaSpeedTap.com features an automatically-updated leaderboard showing high scores for each player. Optionally, players can seamlessly connect their real name on the leaderboard through the game itself, which uses an Alexa API.
- In-Skill Purchases allow players to buy extra lives to continue playing after they miss.
- The Alexa Sound Library and random phrase Speechcons (interjections) are used to make the experience more engaging and lively, as well as give it some personality. Kids especially love it.
Lambda
All the code for the game is in AWS Lambda. It's written in Node.js using the alexa-app framework. The same Lambda function code handles both Alexa events (users playing the game) and CloudWatch Events, which trigger the background task that updates the leaderboard.
DynamoDB
Data for both individual players and for the global world record statistics are stored in DynamoDB. Access to DDB is done using Amazon's aws-sdk package, with a custom layer I wrote over the top of it to simplify the API and provide an async/await interface over the existing Promise interface.
User data and state are persisted to DDB with each interaction, allowing the game to interact with the user in ways that evolve over time and are appropriate (see the User Experience section below).
S3
A JSON file with data about all the high scores and current world record is generated every 5 minutes by the Lambda function and stored in a public S3 bucket. This JSON file is then loaded via AJAX from the web site to display the leaderboard and announce the current record-holder in large text.
Background Task - CloudWatch Events
A "background task" is needed to regularly update the JSON file containing data for the web site. Since a Lambda function cannot schedule itself, CloudWatch Events is used to emit an event on a schedule, and that event fires the Lambda function.
The Lambda function code then detects whether it's being triggered by Alexa or CloudWatch Events, and behaves accordingly.
ALEXA APIs AND FEATURES
A number of Alexa APIs and features were used to create this game. Below is a summary of what was used and why.
Gadgets API
The Gadgets API is required to interact with Echo Buttons. It comes in two flavors, both of which must be declared as interfaces the skill requires.
GAME_ENGINE is the interface used to listen for events from the Echo Buttons.
GADGET_CONTROLLER is the interface used to actually set the light patterns on the buttons themselves.
These are two different APIs, and getting them to work together seamlessly is tricky at first. It's important to read the documentation carefully, understand the limitations of each, and design your experience knowing what's possible and what isn't.
Customer Profile API
The Customer Profile API allows skills to request information about the user, such as name, email address, and phone number. When used, the skill presents a card to the user in their Alexa App to grant permission before it can access this information. Speed Tap uses this API to get the user's name to display on the web site's leaderboard, if they request it.
In-Skill Purchases
The In-Skill Purchases (ISP) API allows skills to ask users to make actual purchases, using real money, to unlock content or features within the skill. Speed Tap uses "consumable" ISPs (CISP), which are items that can be purchased and used up, as opposed to one-time purchases and subscriptions, which don't go away as they are used.
Speed Tap offers the user the ability to continue after tapping on the wrong color. Each new user is given 5 extra lives, and when they are gone they have the option of buying 10 more, using CISP.
Display
Speed Tap makes use of the Display Interface API to interact with the display capability on Echo Show and Echo Spot devices by showing a colorful "splash screen" when the game starts, and showing the current round each time the button is pressed.
As Echo devices with displays become more common, it is important for skills to supplement their audio output with visual cues and additional information.
Sound Library
The Alexa Skills Kit Sound Library is a collection of audio snippets that can be used in any skill. These audio files are stored by Amazon and can be inserted into any response, without the skill author creating, storing, or delivering them. For example:
<audio src='soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_countdown_loop_32s_full_01'/>
With a growing selection of sound samples for many types of use, this library gives skill authors an easy way to add a little extra personality and fun to their skills. Speed Tap uses these sounds as music while waiting for the user to press the button, which adds drama and tension, and as audio cues for positive and negative events.
Speechcons
The Alexa voice output can say certain words in a more expressive way than just the default Text-To-Speech. These "Speechcons" are limited to a certain set of words for each supported language, and are inserted into your SSML response using the <say-as> tag:
<say-as interpret-as="interjection">abracadabra!</say-as>
Speed Tap uses Speechcons in response to each button press, to make the game more fun for users and add a little variety. Kids who play the game especially like the fun output.
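A minimal sketch of how a random Speechcon could be picked for each response. The helper name randomSpeechcon and the word list are assumptions for illustration; the words should be checked against the Speechcon reference for the skill's locale.

```javascript
// Hypothetical helper: wrap a randomly chosen interjection in Speechcon SSML.
// The word list is an assumption; verify it against the supported
// Speechcons for your locale before relying on it.
const SPEECHCONS = ["bravo", "bingo", "well done", "woo hoo"];

function randomSpeechcon(words = SPEECHCONS, rng = Math.random) {
  // rng is injectable so the choice can be made deterministic in tests
  const word = words[Math.floor(rng() * words.length)];
  return `<say-as interpret-as="interjection">${word}</say-as>`;
}

// Example: prefix a response with an interjection
const ssml = `${randomSpeechcon()} Keep going.`;
```

Passing the random source in as a parameter keeps the helper easy to test without changing how it is called in the skill.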
Alexa Skill Events
The Skill Events API allows skills to respond to changes or events, without the user directly interacting with the skill. For example, a skill can be notified when a user enables or disables the skill.
Speed Tap uses the Skill Events API to immediately respond when a user enables or disables access to their Customer Profile information, using the Alexa App on their phone. When this event is received, Speed Tap calls the API to get the user's name immediately, so the leaderboard can be updated without requiring the user to play the game again.
USER EXPERIENCE
The user experience of voice apps is critical and nuanced. Unlike display mediums like the web and mobile, audio takes longer to present information, and the user can't process lists of options or scan a screen for what they want to do next.
Speed Tap implements a number of things to improve the user experience in ways that are not so obvious, but make a big difference.
A Shared Global Experience
Unlike mobile apps, most Alexa games are single-player, with no interaction with a community of other players. Users are isolated and typically play for a personal high score or to explore a game by themselves.
But... games are more fun with others!
Speed Tap takes a unique step in that direction by letting players see how their score compares to others. The leaderboard on the web site lists each player's individual high score and how many continues they used to achieve it. The ability to add your real name gives you bragging rights and lets you see whether you are beating your friends.
What Does The Player Know vs What Is Actually True?
An important part of giving the user relevant information is keeping track of what they know, versus what is actually true.
In Speed Tap, this comes into play with the World Record. When a user plays, they may be told that the current world record is 25 taps. But the next time they play, the world record may have increased to 30, and they would like to know that.
A lazy skill would just tell the user the world record every time they start the game. But for a better experience, the skill needs to know what the World Record was the last time the user was told, and only tell them if it's changed. This way, they get the information they want, but are not forced to endure the same sentence about the current world record every time if it's the same.
Speed Tap keeps track of things like this by keeping two separate areas of data. One holds the current true information (like the current World Record and who holds it), while the other holds the same information but in the state the user was last aware of. These are then compared to determine what the user should be told. The code looks like this:
sayif `Your high score is ${user.high_score} `;
// Check to see if there is a new world record to inform the user about
if (game.world_record) {
  if (user.world_record && user.world_record < game.world_record) {
    say `and there is a new world record. The highest score is now ${game.world_record}`;
  }
  else if (game.world_record_user && game.world_record_user===request.data.session.userId) {
    say `and you still hold the world record`;
  }
  else {
    say `and the world record is ${game.world_record}`;
  }
  user.world_record = game.world_record;
}
Incremental Experience / Session Tracking
As users play a game multiple times, they shouldn't be presented with a long audio intro each time. When they use features multiple times, they shouldn't be required to endure an explanation about something they already understand.
For this reason, it's important for skills to track a user's experience as they interact with the skill, so the skill knows when it can shorten the interactions. This is exactly what humans do, and by doing so, a skill seems more natural.
Speed Tap keeps an "experience" object for each user, which is updated with each type of interaction that depends on it. Then, output is changed based on the user's current experience level. Here's some example code:
// Open with a different intro depending on how often the user has played
if (request.experience("session_count",false) <= 1) {
  say `Welcome to ${speedtap}. This is a game of quick reactions and concentration. Would you like to hear a quick explanation of how to play?`;
}
else {
  if (request.experience("session_count",false) < 4) {
    say `Welcome back to ${speedtap}.`;
  }
  else {
    response.say(`Welcome back.`);
  }
}
The experience object is also used to determine if an explanation should be given about how the leaderboard works, the permissions needed, etc.
Randomized Output
Skills that repeat the same output over and over become very repetitive and feel unnatural. It's important to randomize at least parts of the output so the skill doesn't feel so "stiff".
For Speed Tap, simple randomization is done using a text post-processing technique I have implemented in other skills.
When text is output, {braces} can be put around any words or phrases that are candidates for randomized synonyms. A list of possible synonyms is then created for each key, and only the main key is used in the output strings. The words get randomized automatically, making the skill feel more natural and less robotic.
// The synonym list
const outputSynonyms = {
  "Okay, ": ["Okay, ","Alright, ",""]
};
// A method in the response object
"randomizeSynonyms": function(synonyms) {
  try {
    let ssml = this.response.response.outputSpeech.ssml;
    // Replace each {key} with a random synonym, or with the bare key if none is defined
    ssml = ssml.replace(/\{([^\}]+)\}/g, function (m, m1) {
      if (synonyms && synonyms[m1]) {
        let s = synonyms[m1];
        if (s.length) {
          // simple array of synonyms
          return s[Math.floor(Math.random() * s.length)];
        }
      }
      return m1;
    });
    this.response.response.outputSpeech.ssml = ssml;
  } catch(e) {
    // No SSML output to process; leave the response unchanged
  }
}
// Example
response.say("{Okay, }let's play."); // Could be: "Okay, let's play", "Alright, let's play" or "let's play"
// Randomize synonyms in the output during app.post()
response.randomizeSynonyms(outputSynonyms);
IMPLEMENTATION
This section will dig into the implementation details of the skill and show code snippets that demonstrate how each type of functionality is accomplished.
Trigger Switching
The Lambda function needs to behave differently based on whether it's being triggered from CloudWatch Events (to update the leaderboard) or from Alexa. The Lambda function handler switches with this code:
// connect to lambda
exports.handler = function(event, context, callback) {
  if (event && "aws.events"===event.source) {
    // Scheduled Event!
    app.scheduled(event).then((response)=>{
      callback(null,response);
    }).catch((e)=>{
      callback(e);
    });
  }
  else {
    // Alexa Request
    log("Alexa Request");
    app.handler(event, context, callback);
  }
};
Button Listener Events
Skills listen for button events using the Gadgets API. The skill returns a Directive defining the exact conditions which should trigger the skill to fire. This is an example button listener directive that will cause the skill to be called when the button is pressed or if it times out.
{
  "type" : "GameEngine.StartInputHandler",
  "timeout" : 25000,
  "proxies" : ["button"],
  "recognizers" : {
    "button_down_recognizer" : {
      "type" : "match",
      "anchor" : "end",
      "fuzzy" : false,
      "pattern" : [{
          "action" : "down"
        }
      ]
    }
  },
  "events" : {
    "button_down_listener" : {
      "meets" : ["button_down_recognizer"],
      "reports" : "matches",
      "shouldEndInputHandler" : true
    },
    "timeout" : {
      "meets" : ["timed out"],
      "reports" : "history",
      "shouldEndInputHandler" : true
    }
  }
}
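When one of these events fires, the skill receives a GameEngine.InputHandlerEvent request. The sketch below shows one way the incoming request could be classified. It assumes the request carries an events array whose entries have a name matching the keys in the directive above and an inputEvents list with gadgetId, action and color fields. The function names classifyButtonEvent and isGreenPress are hypothetical.

```javascript
// Sketch: classify a GameEngine.InputHandlerEvent request.
// Event names match the "events" keys in the StartInputHandler directive.
function classifyButtonEvent(request) {
  const events = (request && request.events) || [];
  for (const event of events) {
    if (event.name === "timeout") {
      return { type: "timeout" };
    }
    if (event.name === "button_down_listener") {
      // With "reports": "matches", inputEvents holds the raw events that matched
      const press = (event.inputEvents || []).find((e) => e.action === "down");
      if (press) {
        return { type: "press", gadgetId: press.gadgetId, color: press.color };
      }
    }
  }
  return { type: "unknown" };
}

// Hypothetical rule for this game: a press only counts while the light is green
function isGreenPress(result) {
  return result.type === "press" && String(result.color).toUpperCase() === "00FF00";
}
```

Keeping the classification separate from the game logic means the intent handler only has to deal with three cases: a press, a timeout, or something unexpected.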
Light Directives
There are several different formats of GadgetController.SetLight Directives that change the color of the buttons based on their state. The API documentation goes into detail about what these states are and how they work.
This is a sample directive that cycles through colors and waits for the user to press the button when it's green. When the code receives the button press event, it also receives the color of the button when it was pressed. This allows the code to check if the user pressed it when it was green.
{
  "type" : "GadgetController.SetLight",
  "version" : 1,
  "targetGadgets" : [],
  "parameters" : {
    "triggerEvent" : "none",
    "triggerEventTimeMs" : 0,
    "animations" : [
      {
        "repeat" : 255,
        "targetLights" : ["1"],
        "sequence" : [{
            "durationMs" : 1000,
            "blend" : false,
            "color" : "0000FF"
          }, {
            "durationMs" : 1000,
            "blend" : false,
            "color" : "FFA500"
          }, {
            "durationMs" : 1000,
            "blend" : false,
            "color" : "FF0000"
          }, {
            "durationMs" : 1000,
            "blend" : false,
            "color" : "FF00FF"
          }, {
            "durationMs" : 1000,
            "blend" : false,
            "color" : "00FF00"
          }
        ]
      }
    ]
  }
}
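Since the lights speed up as the game goes on, a directive like this is easier to generate than to hand-write. The following is a minimal sketch, not the game's actual tuning: the speed curve (1000 ms in round 1, 10% shorter each round, never below 200 ms) and the names stepDurationMs and buildLightDirective are assumptions for illustration.

```javascript
// Sketch: build a SetLight directive whose colors change faster each round.
// The speed curve below is an assumption, not Speed Tap's real tuning.
const COLORS = ["0000FF", "FFA500", "FF0000", "FF00FF", "00FF00"];

function stepDurationMs(round) {
  // 1000 ms in round 1, 10% shorter each round, with a 200 ms floor
  const duration = Math.round(1000 * Math.pow(0.9, round - 1));
  return Math.max(200, duration);
}

function buildLightDirective(round, colors = COLORS) {
  const durationMs = stepDurationMs(round);
  return {
    type: "GadgetController.SetLight",
    version: 1,
    targetGadgets: [], // empty array, as in the sample directive
    parameters: {
      triggerEvent: "none",
      triggerEventTimeMs: 0,
      animations: [{
        repeat: 255,
        targetLights: ["1"],
        // One step per color, all sharing the duration for this round
        sequence: colors.map((color) => ({ durationMs, blend: false, color }))
      }]
    }
  };
}
```

Calling buildLightDirective(1) reproduces the sample directive; later rounds only change durationMs.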
Persistence
All persistence is done using DynamoDB (DDB).
Each time the skill is called, it checks to see if the session has a user object. If it doesn't, it tries to load it from DDB for the given userId in the request. If the user record exists, it loads it and stores it in the session so it doesn't need to be retrieved with every call to the skill.
The user data is persisted back to DDB whenever it changes: for example, when a new high score is set, the session count increments, or extra lives are used.
In the event that the In-Skill Purchase flow is triggered, the skill is actually required to exit, and the session ends. Then when the player completes the purchase, a new session is created. For this reason, in some cases it's required to persist the state of the current game. But in most cases, the current game's round and score need to be cleared out of the user record and not persisted.
The general function to persist the user record does this scrubbing of data that shouldn't persist, and calls the DDB persistence layer to store the data.
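async function persist_user(persist_game_state) {
  // Persist user session back to db if it has changed
  if (user) {
    // Work on a deep copy so the in-memory user object is left untouched
    let u = JSON.parse(JSON.stringify(user));
    if (!persist_game_state) {
      // Scrub the state of the current game unless it must survive the session
      delete u.round;
      delete u.lives_used;
      delete u.state;
      delete u.buttonConnected;
    }
    // These attributes are never persisted
    delete u.game;
    delete u.listenerRequestId;
    await ddb.put(app.user_persistence_table, u);
  }
}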
DDB Layer
The aws-sdk module for JavaScript has a lot of great functionality, but its API was still too low-level for my tastes. I wrote a wrapper around the aws-sdk functions that abstracts the functionality a bit more.
One of the things the wrapper does is hide the Promises returned from the aws-sdk functions behind async functions, so my code can use await.
The wrapper code is included in the source, and I intend to clean it up and distribute it as an NPM module. Here is an example function which simplifies the retrieval of a single record from DDB:
'get': async function(table, keyAttribute, keyValue) {
  let params = {TableName:table, Key:{ [keyAttribute]:keyValue } };
  return docClient.get(params).promise()
    .then( (item)=> {
      if (!item || !item.Item) { return null; }
      return item.Item;
    });
}
Session/Experience Maintenance
The user object in the game source includes an "experience" object that stores the user's experience with the game as they play. As part of this experience object, a session_count attribute is incremented each time the game is launched.
The experience object also contains keys that track whether the user has heard a certain response yet. If they have, then the next time it's triggered they get a shorter version. For example:
if (round===1 && request.experience('intro_1')) {
  say `That was easy, but now the lights will get a little faster every round. How far can you go? Keep going.`;
}
else if (round===1) {
  say `Nice, Keep going.`;
}
Text Response Post-Processing
Building text output that gets translated to speech has some annoyances, such as pluralization and is/are, among other things. Speed Tap includes a text post-processing function that simplifies common use cases considerably. This post-processing is done automatically to the output SSML after every skill call.
Here are some examples of what the post-processing function can do.
let coins=1;
say `There {are} ${coins} coin{s} left.`;
The post-processor handles {are} and {s} and looks for nearby numbers to determine how they should be handled. In this case, it sees "1" and the output is:
"There is 1 coin left."
If coins==2, then the output from the same call to say would be:
"There are 2 coins left."
State and Contextual Intents
Intents lack state and context. For example, there are global AMAZON.YesIntent and AMAZON.NoIntent intents which get fired when the user says Yes or No. But those intents don't know which question was asked, so a common approach is to build logic inside each of those handlers that knows what was asked and acts appropriately.
Instead, I built a layer over the alexa-app framework that allows me to create "contextual intents". When the user is in a certain state, a Yes or No will fire a function within the context the user is in, rather than the global intent.
For example, if the user is asked whether they would like to continue, their session is updated to reflect that they are in the "continue" state, and the Yes or No are handled appropriately:
app.intentMap({
  "continue": {
    [YES]: async()=>{
      await continue_game();
    },
    [NO]: async()=>{
      await end_game();
    }
  }
});
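The intentMap layer is part of my wrapper and is not shown here. A minimal sketch of the underlying idea, assuming the current state is stored in a session.state field, could look like this. The names createIntentRouter and route are hypothetical and not part of the alexa-app API.

```javascript
// Sketch of state-based intent routing. createIntentRouter and the
// session.state field are hypothetical, not the alexa-app API.
const YES = "AMAZON.YesIntent";
const NO = "AMAZON.NoIntent";

function createIntentRouter(contextMap, globalHandlers) {
  return function route(intentName, session) {
    const contextual = contextMap[session.state];
    if (contextual && contextual[intentName]) {
      // A handler registered for the current state wins
      return contextual[intentName](session);
    }
    if (globalHandlers[intentName]) {
      // Otherwise fall back to the global intent handler
      return globalHandlers[intentName](session);
    }
    throw new Error(`No handler for ${intentName}`);
  };
}

// Example wiring: Yes/No mean something specific in the "continue" state
const route = createIntentRouter(
  { "continue": { [YES]: () => "continue_game", [NO]: () => "end_game" } },
  { [YES]: () => "global_yes", [NO]: () => "global_no" }
);
```

The handlers return strings here only to keep the example small; in a skill they would be the async functions that continue or end the game.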
In-Skill Purchases
In-Skill Purchases allow players to buy extra lives to continue playing when they mess up. This is actually a complex topic with a lot of implementation quirks and details. A few significant notes are worth mentioning:
- ISP can only be configured and deployed using the CLI tool.
- It limits your skill to the US region (which is why Speed Tap is not available in the UK).
- An API call is required from within the skill code, to make an https request to their API endpoint.
- When the ISP flow is initiated, the skill exits completely and the session ends. The Alexa service takes control and guides the user through the purchasing flow. When it's complete - whether success or fail - the skill is launched again with a "Connections.Response" event type that indicates the status of the purchase.
- The skill must handle error conditions in the purchase flow and if the user declines the purchase. It must also handle the case where the user asks for a refund after a purchase.
- In the case of Consumable ISPs, as Speed Tap uses, the skill must remember and store which items the user has purchased and how many they have left. The Alexa service does not maintain inventory.
To trigger an ISP purchase flow, a directive must be returned from the skill and the session must be closed. The directive looks like this:
{
  'type': 'Connections.SendRequest',
  'name': 'Buy',
  'payload': {
    'InSkillProduct': {
      'productId': "XYZ"
    }
  },
  'token': "arbitrary-token"
}
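When the skill is relaunched with the Connections.Response request, the inventory has to be updated by the skill itself. Here is a sketch of how that could be done, assuming the outcome arrives in payload.purchaseResult with "ACCEPTED" marking a completed purchase. The extra_lives field and the function name applyPurchaseResult are hypothetical.

```javascript
// Sketch: update the stored inventory after a Connections.Response request.
// The extra_lives field name is an assumption for illustration.
const LIVES_PER_PURCHASE = 10; // each purchase adds 10 extra lives

function applyPurchaseResult(user, request) {
  const req = request || {};
  const payload = req.payload || {};
  // Return a copy so the caller decides when to persist the change
  const updated = Object.assign({}, user);
  if (req.name === "Buy" && payload.purchaseResult === "ACCEPTED") {
    // The skill, not the Alexa service, tracks how many lives remain
    updated.extra_lives = (user.extra_lives || 0) + LIVES_PER_PURCHASE;
  }
  // Declined or failed purchases leave the inventory unchanged
  return updated;
}
```

After this, the updated record would be persisted along with the saved game state so the player can pick up where they left off.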
Alexa Skill Events
Alexa Skill Events are what allow skills to respond to events when no user is actively interacting with the skill. In the case of Speed Tap, it handles the case where the user grants or revokes permission to use their real name.
The code below is how Speed Tap responds to this event, retrieves the user's real name using the API, updates their user record, and persists it.
app.on('AlexaSkillEvent.SkillPermissionAccepted', async()=>{
  try {
    let user_id = request.data.context.System.user.userId;
    user = await ddb.get(app.user_persistence_table, "userid", user_id);
    let name = await app.api("/v2/accounts/~current/settings/Profile.name");
    user.name = name;
    user.linked = true;
    await persist_user();
  } catch(e) {
    console.log(e.message);
  }
});
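The app.api helper used above is not shown here. As a rough sketch of the kind of request such a helper has to make, assuming the endpoint and access token are read from context.System.apiEndpoint and context.System.apiAccessToken in the incoming request, the request options could be built like this. The function name buildProfileRequest is hypothetical.

```javascript
// Sketch: build the HTTPS request for a Customer Profile lookup from the
// incoming request JSON. buildProfileRequest is a hypothetical name.
function buildProfileRequest(requestJson, path) {
  const system = requestJson.context.System;
  return {
    url: system.apiEndpoint + path,
    method: "GET",
    headers: {
      // The access token delivered with the request authorizes the call
      "Authorization": `Bearer ${system.apiAccessToken}`,
      "Accept": "application/json"
    }
  };
}
```

The returned options can be handed to whatever HTTPS client the skill already uses.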
To respond to these events, the skill must register that it wants to be notified of them. When using the CLI, these are stored in skill.json under manifest.events.subscriptions:
"subscriptions": [
  {
    "eventName": "SKILL_PERMISSION_ACCEPTED"
  },
  {
    "eventName": "SKILL_PERMISSION_CHANGED"
  }
]
Sound Library
The Sound Library is a great way to include sounds in skills with very little effort. The sounds are all listed by category on the Sound Library page.
You don't need to do anything special to use these sounds. Simply find the sound you wish to use, copy the SSML content listed, and insert it into your response.
<audio src='soundbank://soundlibrary/animals/amzn_sfx_bear_groan_roar_01'/>
Using the soundbank: protocol minimizes latency, I assume because the sound files are stored on some Edge server close to the Alexa internal servers.
Splash Screen / Display
When the skill starts, a graphical splash screen is displayed. This is accomplished by returning a directive in the response using the Display interface. A reusable function wraps the functionality up in one place:
function display_splash_screen(request,response) {
  if (has_display(request)) {
    response.directive({
      "type" : "Display.RenderTemplate",
      "template" : {
        "type" : "BodyTemplate1",
        "backButton" : "HIDDEN",
        "backgroundImage" : {
          "contentDescription" : "",
          "sources" : [{
              "url" : "https://alexaspeedtap.com/splash.jpg",
              "size" : "MEDIUM"
            },{
              "url" : "https://alexaspeedtap.com/splash-square.jpg",
              "widthPixels":640,
              "heightPixels":640
            }
          ]
        }
      }
    });
  }
}
This is not necessarily representative of best practices for display, but it gets the job done.
When using Display directives, the skill must detect whether the user's device has a screen and skip the directive if it does not; otherwise the response will throw an error. The has_display() function encapsulates that check.
const has_display = function(request) {
  try {
    return !!request.data.context.System.device.supportedInterfaces.Display;
  } catch(e) {
    // Any missing property along the path means there is no display
    return false;
  }
};
The RENDER_TEMPLATE interface must also be registered for the skill. If this step is skipped, any attempt to return a Display directive will cause an exception.
Writing to S3
When the background task runs and a leaderboard JSON file is created, it must be written to an S3 bucket from within the skill code.
I wrote a simple reusable function to write an arbitrary javascript object to my bucket:
const putObjectToS3 = async function (o,filename) {
  let s3 = new AWS.S3({'region': 'us-east-1'});
  let params = {
    Bucket: "alexa-speed-tap",
    Key: filename,
    Body: JSON.stringify(o),
    ContentType: "application/json",
    CacheControl: "no-cache",
    ACL: "public-read"
  };
  return s3.putObject(params).promise();
};
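The object passed to putObjectToS3 has to be assembled from the user records first. The sketch below shows one way this could be done. The output shape (updated, world_record, world_record_holder, scores), the continues field, and the function name buildLeaderboard are assumptions for illustration, not the format the web site actually reads.

```javascript
// Sketch: turn user records into the object written to S3.
// The output shape and the "continues" field are assumptions.
function buildLeaderboard(users, limit = 100) {
  const scores = users
    .filter((u) => typeof u.high_score === "number" && u.high_score > 0)
    .map((u) => ({
      name: u.name || "Anonymous", // real name only if the player linked it
      high_score: u.high_score,
      continues: u.continues || 0
    }))
    // Highest score first; fewer continues breaks ties
    .sort((a, b) => b.high_score - a.high_score || a.continues - b.continues)
    .slice(0, limit);
  const top = scores[0] || null;
  return {
    updated: new Date().toISOString(),
    world_record: top ? top.high_score : 0,
    world_record_holder: top ? top.name : null,
    scores
  };
}
```

The scheduled task could then pass the result of buildLeaderboard to putObjectToS3 along with the file name the web site loads.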
CORS
Because the leaderboard JSON file is stored on AWS S3 and served from a generated Amazon URL, browsers will by default prevent pages on other domains, like AlexaSpeedTap.com, from loading it via AJAX.
In order to permit the browser to access the content on a different server, the hosting server (S3) must explicitly allow these kinds of requests.
This can be configured on an S3 bucket under Permissions --> CORS Configuration using a policy like the one below.
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
CONCLUSION
Building for Echo Buttons is a unique challenge, because the APIs and the actual gadget functionality are a bit tricky. But once expectations were adjusted to what is actually possible, and the basic functionality was built, polishing the experience and making the game fun was relatively straightforward.
Although Speed Tap is a simple game to play and understand, it uses many Alexa features and concepts that make it truly unique, and it provides an experience for players unlike other games.
I hope you enjoy it!