Last week, I wrote about my struggles with Alexa, Amazon’s virtual voice assistant. Sometimes you just have to stare a problem down to make it go away. It worked, and a few working days later, my new Alexa skill is ready to go.
Sometimes it helps to see the whole picture. Think about making a nice dinner. You may have recipe books full of foods you would like to eat. Making any one of those dishes is easy enough if you are adept at cooking.
Coming up with a complete dinner involves another level of orchestration. You need to select a few specific recipes and come up with a menu that has a nice mix of foods. Some protein, a little starch, a nice mix of vegetables, a tasty dessert. Then you have to prepare the food in the right order and with just the right timing to come out when it’s time to eat. Not too soon, and not in dribs and drabs that leave your guests wondering if they will go home hungry.
So, I am going to share how I came up with my particular meal, so to speak. This is a semi-technical rundown of everything I bumped into and had to figure out to get my skill written and published.
Want to know how I did it? Keep reading for lots of gory details.
Guess the Total
My skill teaches Alexa how to play Guess the Total, a game that I invented with my daughter during her daily bath time when she was still little but old enough to count. I have explained the game previously, but allow me to recap.
The game generally requires two or more players. In the skill I programmed, it’s just two players: you vs. Alexa. Each player thinks of a secret number within a range, say from 0 to 10. Then they take turns guessing the total of the secret numbers. Whoever is closest wins.
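The core rule fits in a few lines. Here is a minimal Python sketch of "whoever is closest wins." It is an illustration, not the skill's actual code. The function name and the player-keyed dictionaries are assumptions made for this example, and the rule works for any number of players.

```python
def closest_players(secrets: dict, guesses: dict) -> list:
    """Return the name(s) of whoever guessed closest to the total of all secret numbers.

    More than one name in the result means a tie.
    """
    total = sum(secrets.values())
    misses = {name: abs(guess - total) for name, guess in guesses.items()}
    best = min(misses.values())
    return [name for name, miss in misses.items() if miss == best]


# You pick 6 and guess 12; Alexa picks 3 and guesses 8. The total is 9.
print(closest_players({"you": 6, "Alexa": 3}, {"you": 12, "Alexa": 8}))  # ['Alexa']
```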
I used to call this game Guess the Sum. When I got around to trying it on my Echo Dot, I found that Alexa misinterpreted the invocation command half of the time. Every other round, she kicked off a lively game of Guess the Song. Um, Alexa, stop.
Through this turn of events, I learned a bonus marketing lesson. When you quit Guess the Song, the skill tells you about other skills by the same publisher. Good idea, despite my mild annoyance after the fifth or sixth time. Alexa, really, stop.
So I renamed my game Guess the Total. That sounds better anyway, and with Alexa it’s all about how things sound.
So now you invoke my skill by saying, “Alexa, open Guess the Total.” She responds with instructions for how to start the game or get directions. In skills parlance, these are called intents, and they are built into my skill model. When you say “play the game,” Alexa guides you through it.
As the skill logic stands, Alexa always lets you go first. She asks for a couple of numbers, and you have to give answers that make sense. When your turn is over, Alexa takes her turn and reveals the winner. She shares the number she picked and her guess. Then she reveals the sum of the secret numbers and explains who was closer. If you win or tie, she tells you how many points you get. She awards an additional point for an exact guess.
Then she promptly forgets everything, and asks if you want to play again. Pretty much the way I played with my daughter when I invented the game.
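The scoring step can be sketched the same way, from the human player's side. The post says an exact guess earns one additional point, but it doesn't say what a win or a tie is worth. So the 2 and 1 below are hypothetical placeholders, and the function name is made up for illustration.

```python
# Hypothetical point values: only the 1-point exact-guess bonus comes from the game description.
WIN_POINTS = 2
TIE_POINTS = 1
EXACT_BONUS = 1


def player_points(player_guess: int, alexa_guess: int, total: int) -> int:
    """Points the human player earns for one round under the assumed values above."""
    player_miss = abs(player_guess - total)
    alexa_miss = abs(alexa_guess - total)
    if player_miss > alexa_miss:
        return 0  # Alexa was closer, so the player scores nothing
    points = WIN_POINTS if player_miss < alexa_miss else TIE_POINTS
    if player_miss == 0:
        points += EXACT_BONUS  # the extra point for hitting the total exactly
    return points


print(player_points(9, 7, 9))  # 3: an exact win
```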
Now you might think that telling Alexa your “secret” number is a bad idea. After all, she’s your opponent, and it would be sooo easy for her to cheat. Except that I programmed the skill to play fairly, so you don’t have to worry about that.
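One way to make that fairness structural: the function that plays Alexa's turn never receives the player's secret number, so it cannot peek. In this sketch she guesses her own number plus a random stand-in for the opponent's. That is an assumed version of the random strategy mentioned later in the post, not necessarily the skill's exact logic, and the function name is hypothetical.

```python
import random


def alexa_turn(low: int, high: int, rng: random.Random) -> tuple:
    """Choose Alexa's secret number and her guess of the total.

    The player's secret is deliberately not a parameter, so it cannot influence either value.
    """
    secret = rng.randint(low, high)
    # She can't know the opponent's number, so she adds a random stand-in from the same range.
    guess = secret + rng.randint(low, high)
    return secret, guess


secret, guess = alexa_turn(0, 10, random.Random())
print(f"Alexa picked {secret} and guesses the total is {guess}.")
```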
That’s it. Simple enough. The basic mechanic of the game is working. You can get from start to winner in a matter of seconds. Players are nudged back on track when they say something weird. For instance, when asked for a secret number, the player might say “banana.” Alexa will recognize that banana is not a number and ask the player again.
While Guess the Total has lots of room for improvement, at this point I think it’s good enough to share. By pushing out the minimum amount of scope that is playable, I got to learn about the publishing process, and I can start getting feedback from real players.
Plus, I already have my next release planned. So I can push improvements a little at a time to the delight of players around the world. Oh yes, I will take the world incrementally by slow-moving storm.
I know. Dry humor is so hard to read on a web page.
Moving right along…
Learning New Technology
You may not realize that the phrase “new technology” has layers. The technology might be new to you, like when a caveman is found frozen in an ice cap and is revived to discover the magic of electricity and combustion engines. Or new technology might literally be brand new.
Alexa is new relative to gas-powered cars but mature enough to have whatever gaps and kinks there used to be worked out. You might still think of it as leading-edge technology, but it is not on the bleeding edge.
Lots of technology I use when writing software is somewhere along the newness spectrum. Here’s my general approach for tackling new software technology.
- Read an inspirational overview of the awesome reasons to use the technology. This often includes hints of riches, the good life, or saving the world. Usually I have something in mind that I am trying to accomplish, and I’ll imagine how this technology will help with my mission.
- Work through a “Hello World” tutorial. This is a simple way to get the basic environment set up and to prove that at least the simple stuff works. Also, if you cannot get “Hello World” to work, there’s not much point in continuing. Usually, it’s fine and introduces core concepts of the tech.
- Try to adapt “Hello World” to a non-trivial purpose. Basic tutorials tend to be too simple for vetting the technology. You need to throw a few non-trivial problems against it to get a sense of whether the tech will help and what the gaps might be. I like to use the solution I imagined in the first step as a proving ground. By going deep on a real problem, I tend to discover documentation, example code, and discussion boards.
- Dig into a few thorny problems. Beyond that, a variety of factors influence where things go. Can I get a happy path to work where users pick all of the right answers? Do I need to restructure the “Hello World” project to handle added complexity? Have others figured out “best practices,” or at least a few tricks that help?
- Decide to fish or cut bait. At some point, the tech is either good enough or not worth the struggle.
Luckily, Alexa works mostly as expected so far. In some cases, I had to adjust my mental model or approach. That’s called learning, which is part of the fun.
The Elements of a Skill
Skills are relatively straightforward. They start with an invocation name, the phrase you tell Alexa to kick things off. Each skill is tuned to fulfill a set of intents, things that people want Alexa to do. The default skill model includes some common pre-built intents for things like getting help and quitting.
A view from the Alexa Developer Console
For Guess the Total, I have two custom intents: one for getting directions and one for playing the game.
Intents are triggered via utterances, things people might say when they want something. In the skill model, you supply sample utterances for each intent. These are supposed to be conversational and should include various ways to say the same thing.
For instance, to get directions, you can say “Give me the directions,” or “Tell me how to play,” or “What are the rules?”
Utterances that imply the “Play Game” intent
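For readers who want to see the shape of a skill model, here is a trimmed sketch of the interaction model JSON. It is generated from a Python dictionary so it can be checked. The nesting (interactionModel, then languageModel, then a list of intents, each with samples and slots) follows the skill model JSON format. The invocation name, intent names, slot names, and sample phrases come from this post, but their exact spelling in the real skill is an assumption. Adding "quit" as an extra sample on the built-in stop intent is also an assumption, and the dialog model section is left out.

```python
import json


def number_slot(name: str) -> dict:
    """A slot that Alexa fills with a spoken number."""
    return {"name": name, "type": "AMAZON.NUMBER"}


def build_interaction_model() -> dict:
    """Return a trimmed, illustrative interaction model for Guess the Total."""
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": "guess the total",
                "intents": [
                    # Built-in intents from the default model; extra samples can be added to them.
                    {"name": "AMAZON.HelpIntent", "samples": []},
                    {"name": "AMAZON.CancelIntent", "samples": []},
                    {"name": "AMAZON.StopIntent", "samples": ["quit"]},
                    # Custom intents: one simple request-response, one that collects slots.
                    {
                        "name": "GetDirections",
                        "slots": [],
                        "samples": [
                            "give me the directions",
                            "tell me how to play",
                            "what are the rules",
                        ],
                    },
                    {
                        "name": "PlayGameIntent",
                        "slots": [number_slot("playerSecretNumber"), number_slot("playerGuess")],
                        "samples": ["play the game", "play again"],
                    },
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_interaction_model(), indent=2))
```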
An intent performs an action or provides some information. The GetDirections intent gets Alexa to explain how to play Guess the Total. Kind of a one-off intent, a simple request-response interaction.
PlayGameIntent is more involved. By providing Alexa with a dialog model, she is able to manage a conversation with the player and collect the information needed to fulfill the intent. In the case of Guess the Total, Alexa needs to gather a secret number and a guess from the player. This information is placed into slots to be used in the logic that handles the intent.
Slot definitions for the “Play Game” intent
Let’s say that Alexa asks the player for a secret number, and the player says “six.” The playerSecretNumber slot gets populated with a 6. Then Alexa asks for the player’s guess, and she says “twelve.” The playerGuess slot holds the value 12. Everything is peachy, and Alexa takes it from there.
Now let’s say that the player is being tricky and says “pineapple” for her guess. The playerGuess slot gets populated with ‘?’ because Alexa is expecting an integer. That’s where my code takes over and directs Alexa to reprompt the player for a better slot value. The same kind of re-prompt kicks in if the player says a number that is out of bounds, such as 42.
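The validation step can be sketched in Python against the raw request JSON, where slot values arrive as strings under request.intent.slots. Several things here are assumptions: the 0 to 20 range for a guess (two players each picking from 0 to 10), the helper names, and the trimmed test event. What the skill actually says when it re-prompts isn't shown.

```python
from typing import Optional


def read_number_slot(event: dict, slot_name: str, low: int, high: int) -> Optional[int]:
    """Return the slot's value as an int if it is a number in [low, high], else None.

    None tells the caller to re-prompt the player for that slot.
    """
    slots = event.get("request", {}).get("intent", {}).get("slots") or {}
    raw = (slots.get(slot_name) or {}).get("value")
    if raw is None:
        return None  # the player hasn't filled this slot yet
    try:
        number = int(raw)
    except ValueError:
        return None  # "?" arrives when the player says something like "pineapple"
    return number if low <= number <= high else None  # e.g. 42 is out of bounds


def guess_event(value: str) -> dict:
    """Build a trimmed IntentRequest carrying one playerGuess value, for trying the function."""
    return {"request": {"intent": {"name": "PlayGameIntent",
                                   "slots": {"playerGuess": {"name": "playerGuess", "value": value}}}}}


for spoken in ["12", "?", "42"]:
    print(spoken, "->", read_number_slot(guess_event(spoken), "playerGuess", 0, 20))
```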
As far as I know, that touches on the main elements of an Alexa skill. It’s not that complicated.
If you are technically inclined, you might want to look at my code on GitHub. I could do a walk-through of my code in a future post. Let me know if you’re interested. For now, let’s not scare away everyone who doesn’t write code for a living.
Tricks I Learned
If you are thinking of creating an Alexa skill of your own, here are my pointers on pulling together a three-course meal.
I learned two general approaches to creating a new skill.
One is to use the Alexa Console to create the skill. Then use the AWS console to create a Lambda. Then hook the two together with a few well-placed clicks and a little cut-and-paste action on the skill ID and Lambda ARN. You might call this the coder-less approach. More clicking on things and less typing.
The coder-less approach allows for some variation, particularly with how the Lambda function is created. You can start from scratch, use a blueprint, or draw from the serverless application repository.
I will admit to not quite getting the difference between blueprints and the serverless repository. I attempted to use pre-built examples of Alexa skills that seemed close to what I was trying to do with the game. None was close enough, and the naming seemed to linger. I ended up with new roles in IAM called “whatever-the-heck-that-sample-skill-was-called” backing my “guess-the-total” Lambda function. Not entirely satisfying.
I also tried using the microservice-http-endpoint app pattern. That hooked up a downstream DynamoDB instance, which will be useful at some point, but not just yet.
I also tried starting from scratch. That’s a journey you don’t want to start lightly, requiring the most knowledge of AWS technologies and how to glue them together. Probably a good idea once you have things under control on the skill and need to expand beyond your custom logic.
At some point, I tried creating a role that would be shared across all of my Lambda functions. Big plans for the days when I’ll have more than one. Well, that didn’t work. Somehow I messed up the access to CloudWatch, so the Lambda logs were not being captured.
I gotta have my logs. How else will I see what’s going on and, more importantly, what’s going wrong? So I threw away that attempt.
In all of these cases, the Lambda function started life as a single file, which is too simplistic for my use case. I prefer the project structure that the Ask CLI provides, with a zip-able set of files to upload to Lambda and an automated deployment script that I didn’t have to write.
Another pesky problem I kept bumping into has to do with regions. Presumably Alexa skills can work from a few different regions. I prefer using Oregon since it is closest to me. But the CLI tools kept insisting on sending my Lambda code to Virginia. The problem could be a lack of understanding on my part of how to nudge the CLI to use Oregon. I couldn’t find any region overrides in the documentation.
I prefer automated deployment scripts that do what I want. *hrmph*
The solution for now was to let go of my desire for control over the region. Laziness wins the day.
Based on my experience with the Alexa and AWS consoles, I think the better approach is to create skills from the command line. In my case, something like:
ask new --skill-name guess-the-total
worked just fine. I was able to open the new project in VS Code, make a few adjustments, and issue the command
ask deploy
to push everything to the cloud. The CLI takes care of creating the skill, building the skill model, setting up Lambda, and pushing out that code.
The Lambda function gets set up in Virginia? No problem. At least it works. If you think you want more control than that, I say let it go. After all, we are just getting started here. Better to go with the flow and deliver something now than to be stubborn and stay stuck. Right? Right.
Alexa, show me the way.
The Alexa Console is for managing the skill. It offers a helpful editor to modify, save, and build your skill model. The editor organizes the JSON-based model the way Alexa likes it, and the build process alerts you to any errors it finds where you can easily fix them and re-build. Once the build succeeds, it’s a simple step to open the JSON Editor and copy the code into my local IDE. That keeps my local environment in sync with what’s out in the cloud.
Working in Alexa Developer Console
There are command line tools for keeping things in sync. A clone command brings down the whole project, but that’s a bit heavy-handed when I just need the latest model.
The CLI has a rich set of sub-commands, too. No doubt these building blocks are composed into macro commands. With a bit of work, I can see how I might program something more targeted to automate this “copy-and-paste” process. However, at the moment I am more focused on getting a skill published ASAP.
Ironically, I have no time for “time-savers.”
(Did I just write that? Yes, I did.)
Once I have something worth testing, I use the Ask CLI deploy command to push Lambda changes into the cloud.
ask deploy --target lambda
A crowning virtue of keeping all of the latest code and configuration in my local environment is that it allows me to use source control to preserve my work. Git is a fabulous tool. As soon as I have created my skill project using the Ask CLI, I can create a local Git repository with git init. Voilà, instant source control.
Then it’s a simple matter of pushing that repo to GitHub so that I can pull it into any other machine I happen to use for development. I have two computers for development, a desktop machine in the home office and this laptop that I take with me, since I am on the go quite a bit.
Your situation may be different. If you spend a lot of time writing code and jumping between machines, try this out. Just remember to commit your changes frequently and keep them synced with your GitHub repo.
The Alexa Console has a few other tabs: Test, Distribution, Certification, Analytics.
My adventure with the Console Test window was hit-or-miss. More often than not, I would try to invoke my skill, and the request would never come back. The browser console shows lots of errors. I think Chrome sees the test requests as some kind of hacker attack and blocks them. Unfortunate and a tad frustrating.
Testing from the Lambda console works better. Assuming you can trust Alexa to trigger your function, focus your tests on the custom logic you have supplied. That’s certainly a more direct approach, although there is some presumption that it’s easy to know what will be in the payload that Alexa sends to Lambda.
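The payload doesn't have to be a mystery. It is just JSON, and the same kind of event you paste into a Lambda console test can be fed to the handler on your own machine. Here is a Python sketch that assumes a Python Lambda with the conventional lambda_handler(event, context) entry point. The event is trimmed to the fields the sketch reads, since real requests also carry session and context blocks. The speech text is made up for illustration.

```python
import json

# A hand-made IntentRequest, trimmed to the fields this sketch reads.
SAMPLE_EVENT = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "PlayGameIntent",
            "slots": {
                "playerSecretNumber": {"name": "playerSecretNumber", "value": "6"},
                "playerGuess": {"name": "playerGuess", "value": "12"},
            },
        },
    },
}


def lambda_handler(event, context):
    """Route by request type and echo what arrived; a stand-in for real game logic."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        speech = "Welcome to Guess the Total. Say play the game, or ask for directions."
    elif request["type"] == "IntentRequest" and request["intent"]["name"] == "PlayGameIntent":
        slots = request["intent"].get("slots", {})
        secret = slots.get("playerSecretNumber", {}).get("value")
        guess = slots.get("playerGuess", {}).get("value")
        speech = f"You picked {secret} and guessed {guess}."
    else:
        speech = "Sorry, I did not get that. Say play the game, or quit."
    return {"version": "1.0", "response": {"outputSpeech": {"type": "PlainText", "text": speech}}}


if __name__ == "__main__":
    # No real Lambda context is needed for a local run, so pass None.
    print(json.dumps(lambda_handler(SAMPLE_EVENT, None), indent=2))
```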
You can test using the Ask CLI, which supports a simulate command. That seems to work and does not have the problems I saw in the browser-based tests. I haven’t yet figured out how to test a series of utterances to truly simulate the player experience.
Instead, I jumped to what might be the best testing approach. I loaded my skill onto my Echo Dot, and tested with my voice. After all, that employs the same interaction mode that regular players will use. So I got to have a true experience, not a simulation.
If you recall, that’s how I learned that I needed to rename my skill. Good to know.
For my test plan, I listed all of the possible outcomes. Here they are:
- I ask for directions.
- I win.
- I win with an exact guess.
- Alexa wins.
- Alexa wins with an exact guess.
- I say “play again” when the round is over.
- I say “quit” when the round is over.
- I say “banana” for my secret number.
- I say “kitchen sink” for my guess.
- I ignore Alexa when she asks for input, which triggers the re-prompt speech.
Then I played the game until I got them all to happen. By exercising those cases, I discovered a handful of problems in my code. Once they all worked, my skill was done.
And then I had my daughter try it. She humored me by playing a couple of rounds before going back to Minecraft. Hey, I wasn’t expecting any miracles. I think she enjoyed it, and more importantly, I enjoyed watching her play.
Alexa, tell me it’s all about me.
Today my skill was approved. That means I made it through the release process. Not without a hiccup or two.
Releasing involves filling out a multi-page form where you provide marketing information: what the skill is for, any requirements for using it, the category of skill, and so on.
Amazon also wants to know your intentions where there might be legal or financial implications. Are you selling the skill? Are you marketing to children? Does your skill expose any governmental secrets?
This time I got through without raising any flags. That was my second try.
The first time around, my skill was rejected. Apparently, Alexa should never leave the user hanging. If she is going to wait for a response, the last thing she says has to prompt the user with options. Otherwise, she should end the session.
Some Alexa skills are one-shot skills. “Alexa, what is the weather?” is a one-shot—she tells you the weather and quits.
For Guess the Total, I want Alexa to keep going until the player says “quit.” The first time I went for approval, there was no explicit way out of the game, and it wasn’t clear that you could play again. So I fixed that and resubmitted.
It was nice to wake up and see this message.
Thank you for the recent submission of your skill, ‘Guess the Total’.
Congratulations! Your skill has passed our certification process and will be published to skill store shortly.
Quirks and Quacks
Most technology has its quirks: little oddities that may get sorted out at some point. Or not. I found a few in the Alexa machinery, most of which I have already discussed. How about a few more?
Alexa is not good at hearing negative numbers. While testing my out-of-bounds validation, I tried to say numbers like “negative two” and “minus six.” Alexa would helpfully ignore the “non-numeric” part and fill the slot with 2 or 6. Nice try, Alexa.
I never tried saying a decimal. I bet that would mess things up in my game logic. Not sure how well Alexa hears decimals.
It’s not obvious that in order to keep Alexa awake you have to include a “re-prompt” with the response. Otherwise, she says her thing and ends the session. As I explained, I want Alexa to continue playing Guess the Total until the player tells Alexa to quit. So I had to include re-prompts with all of my intents except the intent to exit.
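Here is that rule as a Python sketch against the raw response JSON. In that format, the shouldEndSession flag is what keeps the session open, and reprompt holds what Alexa says if the player stays quiet. Tying the two together, as below, is a design choice that matches the behavior described above, not a requirement of the format. The function name is hypothetical.

```python
from typing import Optional


def build_response(speech: str, reprompt: Optional[str] = None) -> dict:
    """Build a raw Alexa response envelope.

    With a reprompt, the session stays open and Alexa waits for the player;
    without one, the session ends after she speaks.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": reprompt is None,
    }
    if reprompt is not None:
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "response": response}


keep_playing = build_response("You win! Say play again, or say quit.", "Say play again, or say quit.")
goodbye = build_response("Thanks for playing Guess the Total.")
```

Every intent except the exit intent would pass a reprompt, so only "quit" ends the session.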
On a related note, I originally thought I could ask the player something like “do you want to play?” But the reply “yes” or “no” does not imply a specific intent. Maybe there is a way for Alexa to remember the context of the last question. The easiest thing for now was to direct players to be explicit about their intent. When someone says “play the game” or “play again” or “quit,” the intent is pretty obvious.
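One candidate for that memory is session attributes. A response can carry a top-level sessionAttributes object, and Alexa hands it back in the next request under session.attributes while the session stays open. Pairing that with the built-in AMAZON.YesIntent and AMAZON.NoIntent could make a bare "yes" meaningful, though both intents would need to be added to the skill model. Here is a Python sketch of just the routing decision. The lastQuestion attribute and the function name are hypothetical.

```python
def route_yes_no(intent_name: str, session_attributes: dict) -> tuple:
    """Decide what a bare "yes" or "no" means from the question asked on the previous turn.

    Returns (speech, end_session, attributes_for_next_turn).
    """
    last_question = session_attributes.get("lastQuestion")
    if last_question == "playAgain":
        if intent_name == "AMAZON.YesIntent":
            return ("Great. What is your secret number?", False, {})
        if intent_name == "AMAZON.NoIntent":
            return ("Thanks for playing Guess the Total.", True, {})
    # Nothing remembered, so fall back to asking for an explicit command.
    return ("Say play the game, or say quit.", False, session_attributes)


# Turn 1 ends with "Do you want to play again?" and stores {"lastQuestion": "playAgain"}.
# Turn 2 arrives as AMAZON.YesIntent with that dict under session.attributes.
print(route_yes_no("AMAZON.YesIntent", {"lastQuestion": "playAgain"}))
```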
With time and practice, I might get better at making things conversational.
I am sure there are other quirks. Then again, once you get used to them, they become normal.
Guess the Total, as it exists today, is what I consider to be a minimum viable product, or MVP. That means it does a portion of what I intend it to do, and does it well enough. Just enough to get attention, to attract players and gather feedback, and to provide a base to build on.
For my next trick, here is a prioritized list of features I am considering. I may release one at a time or batch a few together.
- Allow Alexa to go first. The player who goes first ought to alternate every round. The coding is different when Alexa gets to make the first guess.
- Up Alexa’s game. I programmed Alexa with a random strategy for choosing a number and guessing. It would be more fun if she could choose from a variety of strategies, even mixing them as more rounds are played to try and catch her opponent off guard.
- Keep score for a multi-round session. This would lend itself to tournament mode, where you play until you reach an overall score that definitely proves you are a champion of the world, at Guess the Total.
- Create a leader board. This is a big one that would mean registering users, keeping stats across sessions (using a database), and having a web page to see your score and rank among the top players.
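The first and third items mostly come down to a little state carried from round to round. This Python sketch assumes that state lives in a small object that could be serialized into session attributes between turns. The class, its fields, and the player labels are hypothetical. Persisting scores across sessions for a leader board would need real storage, such as the database mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class MatchState:
    """Hypothetical multi-round state: the round counter and each side's running score."""
    round_number: int = 0
    scores: dict = field(default_factory=lambda: {"player": 0, "alexa": 0})

    def first_to_guess(self) -> str:
        # Alternate each round: the player opens even rounds, Alexa opens odd ones.
        return "player" if self.round_number % 2 == 0 else "alexa"

    def finish_round(self, points: dict) -> None:
        """Add this round's points and advance to the next round."""
        for side, earned in points.items():
            self.scores[side] += earned
        self.round_number += 1

    def to_attributes(self) -> dict:
        # A plain, JSON-friendly dict, e.g. for a response's sessionAttributes.
        return {"roundNumber": self.round_number, "scores": dict(self.scores)}


state = MatchState()
state.finish_round({"player": 2, "alexa": 0})
print(state.first_to_guess(), state.to_attributes())
```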
And with that juicy list of features, I may have just blocked out all of my spare time for the next six months.
I think I’ll take it one step at a time.
Guess the Total is live in the Skill Store. If you have an Echo device or the Alexa app, you can find it by searching…wait for it…Guess the Total. Yup. There it is.
Go ahead and try it. I guarantee you’ll have a moment of engaging fun while you figure out how to play and go a few rounds against the lovely and talented Alexa. She really is a charming companion.
Also, you can find news about the latest updates on this page. I already announced the release to Happy Spirit Games newsletter subscribers. Would you like to be an insider? Click that link, and subscribe today.
If you enjoyed this post, let me know, and please tell others about it.
Be brave. Add a comment or ask a question. I promise not to delegate my response to Alexa.