Step 1 - Idea
The idea came to me in August after Alexa Dev Days in Seattle. That's where I first heard about Echo Buttons. I was curious about the capabilities of these devices and the problems they could solve beyond their obvious fit for trivia games. I forced myself to think about other scenarios by experimenting with artificial constraints. What if you can act only on the release event? What if you cannot use your hands to press them? That's how I stumbled upon the push-up scenario. Previously, I had used a phone to count push-ups by pressing the screen with my nose, and that was awkward. An Echo Button seemed like a much better option.
Step 2 - Experiment
I wanted to experiment with this idea. The next time I was shopping in Bellevue Square, I stopped by the Amazon Bookstore. They had Echo Buttons on display. I picked one up, went behind a cabinet, lay on the floor and tried it out! It was great! That day, I came home with a new set of buttons and a head full of plans!
Step 3 - Interaction model (design)
This was my second skill, and this time I knew that it was best to start with the guidance from the "Designing for Conversation" course and create a detailed script. This helped me define the main flow of interaction and discover several use cases (e.g. how users define their current level).
Step 4 - Skill state model (design)
What was different from my previous skill was that this time there were clear states in the game and both intent and event handlers had to be aware of them. For example, each Echo Button game has to handle roll-call state, and users must not be able to progress to other functionality without first registering their buttons. I defined several states.
I tried to keep the number of states to an absolute minimum because I knew they could multiply the development effort of horizontal functionality (e.g. the help handler). I also decided to use the Dialog Management capabilities of Alexa to reduce the number of states. For example, setting the target level is a complex intent which requires confirmation. By delegating dialog management I could simplify the skill considerably.
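To illustrate how states gate functionality, here is a minimal sketch of a state check that both intent and event handlers could share. The state names, request names and the `isAllowed` helper are assumptions made for illustration; the actual states of the skill are not listed here, and the request type is a simplified stand-in, not the SDK's.

```typescript
// Hypothetical state names: several states were defined, but they are not listed in this write-up.
type SkillState = "ROLL_CALL" | "SET_TARGET" | "EXERCISE";

// Simplified stand-in for an incoming intent or button event (not the SDK's request type).
interface IncomingRequest {
  name: string;
}

// Horizontal functionality that must work in every state.
const ALWAYS_ALLOWED: string[] = ["AMAZON.HelpIntent", "AMAZON.StopIntent"];

// Requests that make sense only in a particular state (names are illustrative).
const ALLOWED_BY_STATE: Record<SkillState, string[]> = {
  ROLL_CALL: ["ButtonRegistered"],
  SET_TARGET: ["SetTargetIntent"],
  EXERCISE: ["ButtonReleased", "SetTargetIntent"],
};

// Returns true when the request may be processed in the given state.
function isAllowed(state: SkillState, request: IncomingRequest): boolean {
  if (ALWAYS_ALLOWED.indexOf(request.name) !== -1) {
    return true;
  }
  return ALLOWED_BY_STATE[state].indexOf(request.name) !== -1;
}

// During roll call, users cannot jump ahead, but help still works.
const blockedDuringRollCall = isAllowed("ROLL_CALL", { name: "SetTargetIntent" });
const helpDuringRollCall = isAllowed("ROLL_CALL", { name: "AMAZON.HelpIntent" });
```

Every extra state adds another row to a table like this one, and another variant of each horizontal response, which is why keeping the list short pays off.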
Step 5 - Interaction model (implementation)
Implementation of the interaction model was straightforward. What wasn't obvious was the number of different ways people refer to the "number of push-ups they want to be able to do". I defined it as a "target" slot with a type which covers all the ways my friends and I could refer to that "thing".
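As a rough sketch of what such a slot type might look like, the object below follows the shape of a custom slot type in the interaction model (a name plus values with synonyms), written here as a TypeScript literal. The type name `TargetType` and every phrase in it are invented for illustration; the real list of phrases is not given here.

```typescript
// Shape of a custom slot type entry in the interaction model's language model.
interface SlotValue {
  name: { value: string; synonyms: string[] };
}

interface SlotType {
  name: string;
  values: SlotValue[];
}

// Hypothetical type name and phrases; the actual list is not part of this write-up.
const targetType: SlotType = {
  name: "TargetType",
  values: [
    {
      name: {
        value: "target",
        synonyms: ["goal", "level", "target level", "maximum"],
      },
    },
  ],
};

// Flattens canonical values and synonyms into one list, handy for reviewing coverage.
function phrasesFor(type: SlotType): string[] {
  const phrases: string[] = [];
  for (const entry of type.values) {
    phrases.push(entry.name.value);
    for (const synonym of entry.name.synonyms) {
      phrases.push(synonym);
    }
  }
  return phrases;
}

const knownPhrases = phrasesFor(targetType);
```

A flat list like `knownPhrases` is easy to show to friends and ask, "How else would you say this?"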
Step 6 - Development Environment
I decided to implement my skill in TypeScript. It seemed like a natural choice since the Alexa Skills Kit SDK for Node.js is written in this language and there are many great resources which cover this platform. Additionally, there are many blog posts about developing Alexa Skills in JavaScript (e.g. "How to Localize Your Alexa Skills") and their snippets can be included directly in a TypeScript code-base.
I generated my project with the Alexa Skills Kit Command Line Interface (ASK CLI) and then developed it in Visual Studio Code with the Alexa Skills Kit (ASK) Toolkit extension. After spending some time on the configuration (including defining the "ask.profile" workspace setting!) everything worked like a charm. I was able to make changes to the skill, quickly deploy it to Lambda, interact with it in the simulator and see all the logs in CloudWatch.
Step 8 - Assets
When I was building the skill I started with the Alexa Skills Kit Sound Library. As I added more functionality it became obvious that I would need more audio assets. In particular, I needed a relaxing sound for the break between push-ups. I searched through several sites which offer royalty-free music. I landed on http://www.freesfx.co.uk because it had great music and was transparent about the licenses. Other sites I visited would deliberately mix free and paid assets in their search results. That was a red flag for me.
Step 7 - Skill implementation
This step took by far the longest. Even though the skill was relatively simple, there were many scenarios which had to be handled independently. I followed the "can-handle" (chain of responsibility) pattern promoted by the Alexa Skills Kit SDK and defined many handlers. Every time I noticed some common functionality appearing in several handlers, I would extract it into a shared component. For example, I gathered the definitions of animations and input handlers in a single module.
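The pattern itself can be sketched without the SDK. The example below is a simplified, self-contained imitation of the `canHandle`/`handle` pair: the `HandlerInput` shape, the handler names, the state string and the `dispatch` function are assumptions for illustration, not the SDK's own types or this skill's code.

```typescript
// Simplified stand-in for the SDK's handler input.
interface HandlerInput {
  requestType: string; // e.g. "IntentRequest" or "GameEngine.InputHandlerEvent"
  intentName?: string;
  state: string; // hypothetical skill state, e.g. "ROLL_CALL" or "EXERCISE"
}

interface Handler {
  canHandle(input: HandlerInput): boolean;
  handle(input: HandlerInput): string;
}

// Horizontal handler: answers in any state, but tailors the text to it.
const helpHandler: Handler = {
  canHandle: (input) =>
    input.requestType === "IntentRequest" && input.intentName === "AMAZON.HelpIntent",
  handle: (input) => `Help for state ${input.state}`,
};

// State-aware handler: button events count only while exercising.
const buttonEventHandler: Handler = {
  canHandle: (input) =>
    input.requestType === "GameEngine.InputHandlerEvent" && input.state === "EXERCISE",
  handle: () => "Counted one push-up",
};

// Catch-all placed last so every request gets some response.
const fallbackHandler: Handler = {
  canHandle: () => true,
  handle: () => "Sorry, I can't do that right now.",
};

// Chain of responsibility: the first handler that accepts the request wins.
function dispatch(handlers: Handler[], input: HandlerInput): string {
  for (const handler of handlers) {
    if (handler.canHandle(input)) {
      return handler.handle(input);
    }
  }
  throw new Error("No handler accepted the request");
}

const handlers: Handler[] = [helpHandler, buttonEventHandler, fallbackHandler];
const spoken = dispatch(handlers, {
  requestType: "GameEngine.InputHandlerEvent",
  state: "EXERCISE",
});
```

Order matters in such a chain: specific handlers go first and the catch-all goes last, otherwise the fallback would swallow every request.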
Step 8 - Beta testing
When the skill was ready I tested it with my friends. I did it by carrying my Echo Dot everywhere. I had it plugged into an Anker power bank and connected to the Wi-Fi shared by my phone. That was great because it completely eliminated the problem of connecting to their networks and any other concerns they had. I would ask them to try it out and give me feedback. Then we would unplug it and continue with the social activities.
One thing that came up again and again was that the default voice was different from what you would hear at the gym. It wasn't "powerful" enough. When we were joking about "Who would be your perfect personal trainer?", my friend Marina (thank you!) mentioned that only Arnold Schwarzenegger could force her to exercise. Initially, I didn't take it seriously, but I looked into it nonetheless!
Step 9 - Voice impression
When I searched for Arnold impressions I came across several videos on YouTube by Joe Gaudet (http://joegaudetvo.com/). The resemblance was staggering! I contacted Joe and asked if he could join this project. He liked the idea and said he would be open to taking on such an order! I prepared a script and one week later I had a perfect recording, which was the last missing piece.
I added Joe's voice the same way somebody would go about localizing the skill for a new market (e.g. progressing from en-US to en-GB). I followed the recommendations from the post "How to Localize Your Alexa Skills".
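Treating a voice like a locale might look roughly like the sketch below: a default set of prompts, a second set that overrides some of them with recorded audio, and a lookup that falls back to the default. The prompt keys, texts, the `getPrompt` helper and the example.com URL are placeholders invented for illustration; only the SSML `audio` tag is a real Alexa feature.

```typescript
type PromptKey = "START" | "BREAK";

// Default prompts, spoken by the standard Alexa voice (texts are illustrative).
const DEFAULT_PROMPTS: Record<PromptKey, string> = {
  START: "Get ready. Start your push-ups now.",
  BREAK: "Take a short break.",
};

// Recorded prompts, played through SSML audio tags (placeholder URL).
// Only some keys are overridden, just like a partially translated locale.
const TRAINER_PROMPTS: Partial<Record<PromptKey, string>> = {
  START: '<audio src="https://example.com/audio/trainer-start.mp3"/>',
};

// Picks the recorded prompt when available, otherwise falls back to the default text.
function getPrompt(useTrainerVoice: boolean, key: PromptKey): string {
  const recorded = useTrainerVoice ? TRAINER_PROMPTS[key] : undefined;
  return recorded !== undefined ? recorded : DEFAULT_PROMPTS[key];
}

const startPrompt = getPrompt(true, "START"); // recorded audio
const breakPrompt = getPrompt(true, "BREAK"); // falls back to the default text
```

The fallback means the skill keeps working even if a recording for a new prompt has not been made yet.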
Step 10 - Demo
Finally, I had to prepare the demo for the submission. I recorded it at the gym in the apartment complex I live in. I also tried to film it in the Amazon Bookstore, but that's a longer story.
Lessons Learned
The detailed script was the biggest asset I used throughout the development of the skill. It helped me frame my thinking and make sure the conversation my skill supports is natural and engaging. Of course, it didn't happen overnight. I updated it many times to include the feedback I was receiving from my wife and friends.