Voice recognition and voice interfaces are here in a big way and will play an important role in the future of our computing experience. What has long been a staple of science fiction can now be a science reality for everyone.
My first experiment with a voice control interface, "Lend me your ears!": Web Bluetooth and Voice Recognition, was an exciting one. I was very pleased when it became a subject of Hackster.io Live and inspired numerous novel and derivative projects by myself, VoiceBot101, and others, such as:
The popularity and excitement around voice can also be seen by the numerous projects and skills published here for Alexa and Google Assistant, such as:
A few weeks ago I found out about an open source voice recognition and artificial intelligence platform, Mycroft.ai, that sounded very exciting, and I decided it was time to do some more experimenting with voice interfaces.
This project description will mostly focus on how I learned to write a skill for the Mycroft platform. I will also share some of my understandings, confusions and overall impressions.
The Mycroft.ai website and GitHub page have a wealth of information you should read. Additionally, there are some excellent written reviews of and commentaries on the platform, such as:
My ultimate goal in exploring Mycroft and writing this Skill and the others to come is to answer the questions:
- What is good voice interface design?
- Does the Mycroft.ai platform allow me to create good voice design?
Mycroft.ai is an artificial intelligence and voice recognition system. It is similar to platforms such as Alexa, Google Assistant, or Siri, but it is entirely open source and written in Python. It is available in 3 different versions:
- The Mycroft Mark 1 - an open hardware platform based on the Raspberry Pi 2
- A Desktop package to run on Linux Ubuntu
- The Picroft 0.8 which is meant to be run on the Raspberry Pi 3
I have been looking for a good reason to get started experimenting in a serious way with the Raspberry Pi 3, so the Picroft 0.8 option was a natural choice for me. It was also appealing that there are very thorough installation instructions and a guide to compatible hardware.
From here on out, I will use Mycroft and Picroft interchangeably, but I exclusively used the Picroft 0.8 version for this project. I may also call Mycroft/Picroft he or him, but in this I am following the convention of Manuel "Mannie" Garcia O'Kelly.
Picroft Installation Notes

For this project, I used a Raspberry Pi 3 housed in the Pi Top Ceed enclosure. I used this because it was recommended that a keyboard and screen would make installing and interacting with Picroft easier. Don't get too excited, though: I am not running Picroft on Pi Top OS, I just overwrote the SD card. It was the keyboard and screen I was interested in!
I also purchased one of the recommended microphone/speaker combinations in the compatible hardware list.
There are 2 sets of installation instructions & tips to review prior to installing Mycroft on a Raspberry Pi 3:
It took less than an hour to perform and complete these steps, and amazingly it all worked! That is, until I told Mycroft to go to sleep. When I tried to wake him with the wake word, he refused to do so. In frustration, I rebooted. He would then recognize the wake word, and I could see the text response displayed on the screen, but I could not hear any audio. So I went back to the installation instructions . . .
At the bottom of the installation instructions and tips in the first link above is a tip on how to configure the USB audio output:
Using USB Audio as Output
Typically the USB audio device will be at plughw:1,0, but to verify, run the following:
aplay -L
Find the plughw output for the device you want to use, take this, and update the /etc/mycroft/mycroft.conf file accordingly:
"play_wav_cmdline": "aplay -Dhw:0,0 %1" this line now becomes "play_wav_cmdline": "aplay -Dplughw:1,0 %1"
You can now run ./auto_run.sh to start the program back up, then test to ensure the output comes through the USB speakers.
Following these instructions, using nano as a text editor, and rebooting cured the audio problem. However, putting him to sleep still requires a full reboot to make him responsive to the wake word. Hopefully, with a little more work, I'll be able to resolve this.
This brings up an important point about using the Picroft version: you should be confident with basic Linux command line skills. So far, it has only been the simple stuff, such as moving around directories, safely deleting files, and using nano.
Idea for a Skill

I wanted to create a skill that would be a bit more complicated than a "Hello, World" example (they already have one) but not so complex as to deter me from completing it.
After some thinking, I came up with having Mycroft tell me the current orbital location of the International Space Station (ISS) relative to its Earth coordinates in latitude and longitude. Since these two numbers don't convey much to me, I also wanted Mycroft to tell me what these coordinates correspond to on a map.
OK, so it is not the most useful skill, but it's fun, and a chance to learn. The following two sections cover domain-specific information that, while not essential to writing a Skill, is interesting and relevant to the "output" of the skill.
The International Space Station (ISS)

The ISS is a massive science laboratory in space. It was built up to its current size over many missions, and many of the last missions of the Space Shuttle were used to assemble it. It is interesting to note that the original concept of the Space Shuttle was for it to work in tandem with orbiting space laboratories. The ISS completes an orbit of the Earth every 90 minutes at an average altitude of 240 miles, so by calculation it orbits the Earth 16 times in one day (24*60/90). Its velocity of 17,500 miles per hour means a trip from LA to New York (roughly 2,450 miles) would take a little less than 9 minutes! This is relevant to our Skill: because the station moves so fast, the skill gets a bit of variety and interest in its responses, which is an important feature of voice design.
Reverse Geocoding

I was able to find two free and open APIs that deliver real-time ISS coordinates: Open-Notify and wheretheiss.at. I chose to go with Open-Notify for this project.
Both of these APIs deliver the ISS orbit location relative to the Earth as a JSON payload containing the most current latitude and longitude of the ISS. We could simply access one of these, decode the JSON payload for latitude and longitude, and have Mycroft report these two numbers. However, what does a report of "latitude 5.218 and longitude 115.008" (which is over the South China Sea) or "latitude 44.590 and longitude -104.715" (which is over Devil's Tower, Wyoming, USA) mean to the average user? Having the Skill provide both the map coordinates and a meaningful geographic feature name (aka toponym) provides context and perhaps even a personal connection to the Skill. So a bit more effort on our part will pay off in a better user experience.
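To give a sense of the data, the JSON payload from Open-Notify's iss-now.json endpoint looks roughly like this (the values here are illustrative, not a real reading):

{
    "message": "success",
    "timestamp": 1499350000,
    "iss_position": {
        "latitude": "44.590",
        "longitude": "-104.715"
    }
}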
When we interact with mapping services, such as Google Maps, we give it an address or two known points on a map and ask for directions, or we may use a ride service app such as Uber to let our driver know where we are and where we would like to go. This is geocoding: we identify points on a map, and the application transforms these into computable coordinates. Our skill, however, requires us to do the opposite: we give it a set of coordinates and it provides us with a recognizable geographic feature. This is what is known as reverse geocoding. Many mapping APIs provide this function as part of their service.
So we're all set: just use the reverse geocoding function of our favorite (and free) mapping service and decode the resulting JSON payload! However, there is a catch. What you will find is that most of these services don't cover major bodies of water, such as oceans! Try them with the coordinates of features such as the Atlantic and Pacific Oceans and "result not found" is the report. I guess this has to do with monetizing the service; you really can't get someone to the nearest Starbucks in the middle of the Coral Sea!
Lucky for us, GeoNames.org provides a web service that does return a meaningful payload over major bodies of water! I could not find another free service that did so. If you do, please let me know about it!
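For example, querying the oceanJSON endpoint with coordinates over the South China Sea returns a payload along these lines (abbreviated here; the real response may include more fields):

{
    "ocean": {
        "name": "South China Sea"
    }
}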
Go to the GeoNames.org web site, apply for a username/account, and get up to 30,000 requests a day! Just remember to activate web services for your username/account! Problem solved.
Writing the Skill

The Mycroft core documentation contains an excellent tutorial on creating a skill, and there are many skills on the Mycroft GitHub page to learn from. I will not rehash that tutorial here, but I will point out the important techniques I used in creating this skill.
Creating a skill requires several different files in a specific structure, with specific file name conventions:
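Based on the folders described below, the layout for this skill looks roughly as follows (the test file name is illustrative):

skill-iss-location/
    __init__.py
    dialog/en-us/location.current.dialog
    dialog/en-us/location.unknown.dialog
    vocab/en-us/ISSKeyword.voc
    test/intent/iss.location.intent.json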
The /dialog folder contains language-based files that tell Mycroft what to say when he executes a skill. Here is skill-hello-world/dialog/en-us/hello.world.dialog from Mycroft.ai's GitHub repository:
Hello world
Hello
Hi to you too
In response to you saying, "Hey Mycroft, Hello world", Mycroft will select one of the 3 responses above and say it. The words, "Hey Mycroft" or just "Mycroft", are the wake words or trigger words that indicate that Mycroft should begin listening.
How does he know what to listen for in order to execute this skill? That's where the /vocab folder comes in. From the tutorial,
- "The vocab folder contains subfolders for each language supported, like en-us. Inside each language folder, we place .voc files which contain phrases or keywords that determine what Mycroft will listen for to trigger the skill. "
The dialog and voc files for my skill are a bit more complex. Here is the skill-iss-location/dialog/en-us/location.current.dialog:
{{latitude}} latitude {{longitude}} longitude which corresponds to {{toponym}}
The space station is at {{latitude}} latitude {{longitude}} longitude over {{toponym}}
It's at {{latitude}} latitude {{longitude}} longitude, {{toponym}}
The ISS is now over {{toponym}} at {{latitude}} latitude {{longitude}} longitude
{{toponym}} at {{latitude}} latitude and {{longitude}} longitude
The terms {{latitude}}, {{longitude}}, and {{toponym}} are keys whose values are populated during the execution of the __init__.py file associated with this skill. There is a one-to-one relation between the values you obtain in the Python code and these keys, making it very easy to deliver on the interactive experience. The text around these keys is free-form; you could place anything at all here, and Mycroft would pick among these options to respond with.
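The vocab side is even simpler: a .voc file is just a plain list of words or phrases, one per line. I won't reproduce my exact file, but a hypothetical skill-iss-location/vocab/en-us/ISSKeyword.voc could be as simple as:

international space station
space station
ISS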
The heart of the Skill lies within the __init__.py file. I look at it this way: anything you can write in Python can be incorporated into a skill! With the number and variety of Python libraries available, there is no limit here! There is a specific structure to this file, and I recommend reading the tutorial and looking at the skills on the Mycroft.ai GitHub site for ideas on getting started. I will go over the specifics of my code here (it appears that this is all done in Python 2.7).
What is the /test/intent folder all about? It took me a while to begin to understand the purpose of this folder, and I am still learning about it. These files, written in JSON, allow Mycroft to verify that the intent of your skill and the vocal expressions defined in the files you create all work correctly together. They don't define the vocal interaction, but test it, hence the name. This was not clear to me until I compared the test/intent files from the wikipedia and hello-world skills:
From one of the test/intent files from hello-world:
{
    "utterance": "Hello world",
    "intent_type": "HelloWorldIntent",
    "intent": {
        "HelloWorldKeyword": "hello world"
    }
}
And this is from the wikipedia skill:
{
    "utterance": "tell me about the first world war",
    "intent_type": "WikipediaIntent",
    "intent": {
        "WikipediaKeyword": "tell me about",
        "ArticleTitle": "the first world war"
    }
}
Comparing these files helped me to understand their role. The utterance is a sample of what Mycroft might be expected to hear, with keywords and specifics included. The intent_type corresponds to an intent handler defined in your __init__.py file, and, as you can see when you compare them, the "intent" key of the JSON file contains keywords from your vocab files and, in the case of the wiki skill, a key:value pair to be passed along.
So these files do not define your possible interactions, but are specific test cases to ensure your Skill files all work together.
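Following that pattern, a test/intent file for this skill might look something like this (a hypothetical example, using the intent and keyword names you will see defined in __init__.py below):

{
    "utterance": "where is the international space station",
    "intent_type": "ISSLocationIntent",
    "intent": {
        "ISSKeyword": "international space station"
    }
}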
Not shown above is the optional /regex folder, which would contain regular expressions helping Mycroft respond to and parse out information from what he hears! This is very exciting, but not a subject of this project.
Writing a skill using the Raspberry Pi 3 with Picroft 0.8 installed requires you to be comfortable working on the Linux command line. To facilitate writing this skill, I wrote all the required files using Atom on my Windows 10 laptop and then uploaded everything to GitHub.
To install this skill, I then cloned the repository from my GitHub page and worked from there. This is not the ideal way to work on software. However, now that I am comfortable with the process, I will likely pull the template skill, work straight from there with nano as my editor, and push/merge changes to my own repositories properly.
The skill must be placed in the /opt/mycroft/skills folder. After rebooting the system with sudo reboot, the skill was available to me. There is also a tool provided for this, MSM (the Mycroft Skill Manager), but I did not use it for this project. I still have a lot to learn!
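For reference, the manual install boils down to a few commands (the repository URL here is a placeholder for wherever your skill lives):

cd /opt/mycroft/skills
git clone https://github.com/YourUserName/skill-iss-location.git
sudo reboot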
Python Code Highlights

I found the process of debugging through the command line/nano/voice interface a bit of a challenge, so I wrote a test script, issTest006.py, to run on my desktop and debug the essential functionality of the __init__.py intent handler.

This eased the debugging process by keeping the non-Mycroft functionality separate from any Mycroft platform-related bugs in my code.
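I won't walk through issTest006.py line by line, but the idea is simple: pull the web service logic into a plain function that can run on any desktop Python 2.7 install, with no Mycroft imports at all. A minimal sketch of that approach:

import json
import urllib2

def get_iss_position():
    # the same request the intent handler makes, minus any Mycroft code
    res = urllib2.urlopen("http://api.open-notify.org/iss-now.json")
    issObj = json.loads(res.read())
    return issObj['iss_position']['latitude'], issObj['iss_position']['longitude']

if __name__ == "__main__":
    lat, lng = get_iss_position()
    print "latitude:", lat, "longitude:", lng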
We are really just writing a Python script here. The Skill is created as a Python class. To work in the Mycroft environment, the class must contain certain functions. These are outlined both in the tutorial and in the template skill:
class HelloWorldSkill(MycroftSkill):
    def __init__(self):
    def initialize(self):
    def handle_thank_you_intent(self, message):
    ...
    def stop(self):

def create_skill():
    return HelloWorldSkill()
In my case, I modified this template to fit the ISS Location skill:
The __init__(self) function is used to create an instance of the Skill:
class ISSLocationSkill(MycroftSkill):
    def __init__(self):
        super(ISSLocationSkill, self).__init__(name="ISSLocationSkill")
    ...
This is no different than an __init__(self) constructor for any other Python class. I have no instance variables to set up, so that's all that is needed here.
The def initialize(self) function is where we build our intents and register the intent handlers. In my case, there is only one.

You can also see here that the values passed into these functions match the file names of the keyword .voc files, or the JSON keys, from our previous coding. This is where the connection between the keywords, intent testing, and intent handlers is made, hence it is important to keep to the naming conventions of the Mycroft platform.
def initialize(self):
    iss_location_intent = IntentBuilder("ISSLocationIntent").require("ISSKeyword").build()
    self.register_intent(iss_location_intent, self.handle_intent)
    ...
My skill has only one intent, but you can see from the Hello World Skill what registering multiple intent handlers looks like:
def initialize(self):
    thank_you_intent = IntentBuilder("ThankYouIntent"). \
        require("ThankYouKeyword").build()
    self.register_intent(thank_you_intent, self.handle_thank_you_intent)

    how_are_you_intent = IntentBuilder("HowAreYouIntent"). \
        require("HowAreYouKeyword").build()
    self.register_intent(how_are_you_intent,
                         self.handle_how_are_you_intent)

    hello_world_intent = IntentBuilder("HelloWorldIntent"). \
        require("HelloWorldKeyword").build()
    self.register_intent(hello_world_intent,
                         self.handle_hello_world_intent)
The bulk of the code for my skill is contained in the intent handler, in contrast to the Hello World Skill's intent handlers, which are sparse; they simply point Mycroft to the dialog file to access:
def handle_how_are_you_intent(self, message):
    self.speak_dialog("how.are.you")
It is important to note that the value passed is the name of the dialog file to access when executing the skill, not the actual text you want Mycroft to speak! The value passed is the file name without the .dialog extension.
Let's dive into my intent handler and focus on how information from the intent handler is passed back to Mycroft.

The first thing we want to do is get the current ISS location from Open-Notify. We use the urllib2 library to make the request and then parse the JSON payload response using json. (Please note that urllib2 does not exist as a single library in Python 3.)
def handle_intent(self, message):
    # get the 'current' latitude and longitude of the ISS from open-notify.org in JSON
    reqISSLocation = urllib2.Request("http://api.open-notify.org/iss-now.json")
    resISSLocation = urllib2.urlopen(reqISSLocation)
    issObj = json.loads(resISSLocation.read())  # JSON payload of ISS location data
    latISS = issObj['iss_position']['latitude']
    lngISS = issObj['iss_position']['longitude']
Can you see a problem with the code above? I can, and I experienced it. What happens if the server for Open-Notify is down, or there is a problem with the connection? The program will crash. It won't crash Mycroft; it just will not provide an acceptable voice response to this "error" condition, leaving the user (like me) to wonder what the heck has happened. To correct this in a future version, I will use the most awesome exception handling mechanism in Python, as I did in the code further down.
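A minimal sketch of that future fix, inside the intent handler, assuming a fallback dialog file named location.error.dialog (hypothetical; it is not part of the current skill):

try:
    reqISSLocation = urllib2.Request("http://api.open-notify.org/iss-now.json")
    resISSLocation = urllib2.urlopen(reqISSLocation, timeout=10)
    issObj = json.loads(resISSLocation.read())
    latISS = issObj['iss_position']['latitude']
    lngISS = issObj['iss_position']['longitude']
except (urllib2.URLError, ValueError):
    # server down, connection trouble, or a malformed payload:
    # give the user a spoken response instead of failing silently
    self.speak_dialog("location.error")
    return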
Next, we will use the latitude and longitude we parsed out of the JSON payload from Open-Notify to create the URL request strings to access the GeoNames.org web service. We create two strings here, as there are different reverse geocoding API calls for land and ocean/water features:
...
oceanGeoNamesReq = "http://api.geonames.org/oceanJSON?lat="+ latISS +"&lng="+ lngISS +"&username=YourUserName"
landGeoNamesReq = "http://api.geonames.org/countryCodeJSON?formatted=true&lat=" + latISS + "&lng=" + lngISS +"&username=YourUserName&style=full"
...
Don't forget to insert your username into the strings above!
Even though the ISS orbit is designed to cover 90% of the world's population in its sweep, the Earth is 3/4 water, and whenever I look, much of the orbit seems to be over a body of water. So, as the comment in the code below indicates, we check first to see if the coordinates reverse geocode to an ocean location. Again, I should have put this in an exception handling block. Something for the future . . .
# Since the Earth is 3/4 water, we'll check to see if the ISS is over water first;
# if this is not so, we handle the exception by searching for a country it is
# over, and if this is not coded for on GeoNames, we just say we don't know
oceanGeoNamesRes = urllib2.urlopen(oceanGeoNamesReq)
toponymObj = json.loads(oceanGeoNamesRes.read())
In the next section of code, we absolutely must wrap things in an exception handling block. As we try to parse the JSON payload returned from GeoNames.org for an ocean location, being over a land mass will yield an error, since the ocean name key will not exist in the payload.

If we get this error, our first assumption is that we are over a land mass, so we then try to access the reverse geocoding service for land. We could make this more sophisticated by checking the status/error code returned, but this assumption, land vs. body of water, seems to work just fine:
try:
    toponym = "the " + toponymObj['ocean']['name']
except KeyError:
    landGeoNamesRes = urllib2.urlopen(landGeoNamesReq)
    toponymObj = json.loads(landGeoNamesRes.read())
    toponym = toponymObj['countryName']
except:
    toponym = "unknown"
So here, if there is a KeyError, which arises when we try to parse the ocean name key from a payload that lacks it, we handle the error by reverse geocoding for a land feature and getting the country name from that. In the case where accessing this web service generates some other error, we handle it by calling the location unknown. Maybe not the best, but a good way to start handling the case where neither the ocean nor the land mass coordinates reverse geocode properly. If you explore the many reverse geocoding web services from GeoNames.org, you may find other useful JSON payload items for Mycroft to say. Take a look and let me know, or modify it yourself!
So far, all the code described for the intent handler is just Python; we have yet to interact with Mycroft! You can see this by comparing this code to the Python test script I wrote. I wanted to show this because not only was it fun to write in Python again, it also demonstrates that, as far as I can tell, if you can write it in Python, you can have Mycroft run it under voice control! This is pretty powerful stuff!
Now let's see how we pass our new found knowledge of where the ISS is to Mycroft:
if toponym == "unknown":
    self.speak_dialog("location.unknown", {"latitude": latISS, "longitude": lngISS})
else:
    self.speak_dialog("location.current", {"toponym": toponym, "latitude": latISS, "longitude": lngISS})
And here it is! Remember we talked about the {{latitude}} key in the location.current.dialog file? Well here is where we pass the values to those keys!
As you can see, if we have found either a land or water toponym, we pass the latitude, longitude, and toponym back to the dialog file for Mycroft to speak. If both lookups failed and we generated an unknown toponym for the given latitude and longitude, we choose the location.unknown.dialog file and let Mycroft speak a phrase from there!
Pretty cool, right?
Just so you know, it was easier to write the code for Mycroft than to document all this!
Demonstrations

Let's see the skill in action under a few different cases:
Needed Improvements

The perfect is the enemy of the complete. So I went with the code as it stands, because I keep getting lots of ideas I would like to try, but I also want to get this out there so people can see how easy and fun it is to build voice applications on this platform.
That being said, as you will see in the demos below, I still have not figured out how to get Mycroft to properly handle acronyms or initialisms, like ISS, on the recognition side.
There is also a bit of lag from request to response on the voice side. It may be due to my settings, coding, or inherent to the platform at this time. I know all of these will likely improve with time.
Also, we rely on two web services, Open-Notify and GeoNames. As you can see, I don't really make any attempt to handle server errors. A simple fix, but one that still needs to be done to improve this Skill!
Conclusion/Some Observations

I've had Mycroft (Picroft 0.8) up and running on a Raspberry Pi 3 for a little over a week now and have written my first skill for it. One thing that motivates me about this platform is the feeling of engagement I have with it; I find myself wanting to carry on more than "single turn" events or skills with it. It was easy to write the skill once I got comfortable being on the command line and using nano again, and I really like that everything is in Python. I hadn't had the chance to code in Python recently, so this was a good opportunity to do so again, and also to start dreaming of all the things that Mycroft and Python can do in combination on a Raspberry Pi 3.
So far, the answer to the question of whether Mycroft will allow me to create good voice design is yes. The interconnections between the dialog, vocab, and __init__.py files are clear and easy to work with.
The concept of an intent and an intent_handler passing information along a message bus makes sense to me. I look forward to exploring regular expressions in my voice asks and then trying to create multi-turn skills.
What is good voice design? I hope to explore this question further and improve my abilities with this platform. If you have a Raspberry Pi 3 lying around, an Ubuntu box to use, or maybe even the means to buy a Mycroft Mark 1, I think it would be well worth it to get on this platform and experiment with it!