Over the years I have learnt two things: one, I like to make things work together; and two, because I am not a strong programmer, I like to keep things simple and follow the path of least resistance. I often find that this leads to more creative solutions. So I decided to voice-enable my home automation, but with a slightly more challenging set of aims. Please note that this article only covers a basic set of functionality - controlling lights and getting weather information - but the approach can be used for many other automations.
Project Aims:
1. The solution had to run on a single Raspberry Pi.
2. It had to be local only, with no dependence on the cloud and all data kept private.
3. Voice commands must be customisable.
4. All software had to be open source.
5. The integration had to be via MQTT for future proofing.
These aims required a choice of technology for both the voice assistant and the home automation solution, and led me to choose the following:
Nymea for my home automation, because it is quick to deploy, covers most protocols, has a pre-built interface, is lightweight, and the support from the developers is the best I have ever encountered (better than commercial offerings). That level of support makes it a safe bet compared to other open source solutions.
Rhasspy for my voice assistant - I think it is the most advanced non-cloud, open source solution that is fully integrated with MQTT.
Prerequisites:
I am not going to cover the installations because these are well documented on the websites for both Rhasspy (follow the instructions at https://rhasspy.readthedocs.io/en/latest/installation) and nymea (instructions available at https://nymea.io/documentation/users/installation/getting-started). However, for my project I made a few installation choices:
- I deployed Rhasspy using Docker because this is by far the easiest way to install and maintain it (this is optional).
- I decided to use Rhasspy as the main MQTT broker (although nymea also has its own internal MQTT broker), simply because most of the data movement happens internally within Rhasspy.
Note: this requires that you also expose port 12183 (Rhasspy's internal MQTT) from the Docker container, e.g. by adding it to the first line as shown below:
docker run -d -p 12101:12101 -p 12183:12183 \
--name rhasspy \
--restart unless-stopped \
-v "$HOME/.config/rhasspy/profiles:/profiles" \
-v "/etc/localtime:/etc/localtime:ro" \
--device /dev/snd:/dev/snd \
rhasspy/rhasspy \
--user-profiles /profiles \
--profile en
I used a Jabra 510 speakerphone as my microphone and speaker because it gives good quality input/output and because I had one available. I have read that the ReSpeaker and Matrix devices work well. In fact, in the early days of Rhasspy development the maintainer, Michael Hansen, used a PlayStation Eye!
Method/Process:
My automation follows this process:
1. Wake word spoken; Rhasspy listens for commands.
2. Voice command given; Rhasspy converts the speech to text and matches it to an intent in the sentences file.
3. The intent is converted to a JSON object and published to Rhasspy's internal MQTT on the topic hermes/intent/.
4. Nymea, which listens to Rhasspy's internal MQTT on port 12183 for anything published to hermes/intent/#, receives the JSON object.
5. A nymea script is activated, which inspects the JSON object for slot data.
6. The slot data is used to create a command for the relevant nymea thing, which is passed to that thing as an action.
Configuring Rhasspy for initial use:
Following install, use the settings (cogs icon) menu to apply all the recommended defaults as shown below:
Note: Rhasspy listens continuously for the wake word, so background noise will be passed to MQTT. If you prefer this not to happen, set the UDP port for the wake word and audio recording services to the default port, e.g. 12101. This keeps any recording isolated to the microphone until Rhasspy recognises the wake word.
Rhasspy Sentence File:
When Rhasspy recognises a voice command it is translated to text and passed to the sentences file, where Rhasspy tries to match an intent (i.e. what you are asking Rhasspy to do). This can be as simple as "turn the light off". However, for useful data to be passed to another program (nymea), the command has to be structured using entities and slots. The Rhasspy documentation covers this in detail.
For this project I decided to break the voice command into five entity slots - Location, Object, Action, Verb and Group. A combination of these means that most actions in nymea can be performed. For example, to "turn off the bedroom table lamp" we need a slot for location -> bedroom, object -> table lamp, action -> off and verb -> turn. Matching these slots from the JSON object provided by Rhasspy allows us to create an action statement in nymea.
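Concatenating the location and object slots is enough to name the target device. As a plain-JavaScript sketch (buildThingName is my own illustrative helper, not part of Rhasspy or nymea):

```javascript
// Sketch only: combine Rhasspy slot values into a nymea thing name.
// The slot values already use the camelCase aliases defined in the
// sentences file (e.g. "Front Room" -> "frontRoom").
function buildThingName(location, object) {
    return location + object;
}

// "turn off the bedroom table lamp" yields slots like these:
var slots = { location: "bedroom", object: "TableLamp", action: "false", verb: "TURN" };
console.log(buildThingName(slots.location, slots.object)); // "bedroomTableLamp"
```

The same convention is used later in the nymea script, where the combined name selects a ThingAction such as frontRoomTableLamp.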
Before we can provide a range of matches in the sentence file, we need to create a set of rules for creating the slots as follows:
[Switch_On_Off]
location = (house | outside | downstairs | bedroom | kitchen | (Living Room):(livingRoom) | (Front Room):(frontRoom)| bathroom){location}
object = (Lamp | Lamps | Light | Lights | (Floor Lamp):(FloorLamp) | (Floor Lamps):(FloorLamps) | (Table Lamp):(TableLamp) | (Table Lamps):(TableLamps)){object}
action = (on:true | off:false){action}
verb = (TURN | SWITCH | PUT | SET){verb}
group = (ALL | (THE):(SINGLE)){group}
\[<verb>] <action> <group> <location> <object> [in]
\[<verb>] <group> <location> <object> <action> [in]
The first line is the intent name - useful for creating logic in scripts.
The second line is a rule called location that provides multiple options - bedroom, kitchen, etc. Note that I have given some locations an alternative using a colon (:), which means that when Rhasspy hears Front Room, it is converted to frontRoom when passed in the JSON object.
Rules for object, action, verb and group are also added (group is for future use to issue commands to multiple devices). Notice that by using alternatives in combination I can easily identify a device by name - e.g. frontRoomTableLamp.
At the end of each rule is an {entity} that also matches to a slot in the JSON object. This provides a way to identify the location, action etc. by matching the slot data.
The last two lines are used by Rhasspy to match the actual voice command. Using <> we can insert any of the items in the rules, and [] marks optional items. The \ escapes the opening [ so that Rhasspy does not mistake the line for a new intent name. So "turn off the front room table lamp" is recognised, and so is "all house lights off".
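To make the substitution behaviour concrete, here is a toy re-implementation in plain JavaScript (this is only an illustration of the idea; the real substitutions happen inside Rhasspy's NLU):

```javascript
// Toy sketch of Rhasspy's word substitutions (spoken form -> emitted value).
// Each alternative like (Front Room):(frontRoom) in the sentences file
// behaves like an entry in this map.
var substitutions = {
    "front room": "frontRoom",
    "living room": "livingRoom",
    "table lamp": "TableLamp",
    "on": "true",
    "off": "false"
};

function substitute(phrase) {
    var key = phrase.toLowerCase();
    // Fall back to the raw phrase when no alias is defined.
    return substitutions.hasOwnProperty(key) ? substitutions[key] : phrase;
}

console.log(substitute("Front Room")); // "frontRoom"
console.log(substitute("off"));        // "false"
console.log(substitute("bedroom"));    // "bedroom" (no alias defined)
```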
The rules and sentence templates above are an extract of my sentence file.
Activating the wake word and speaking the voice command "turn all house lights off" produces the following JSON object passed to MQTT:
{
"entities": [
{
"end": 4,
"entity": "verb",
"raw_end": 4,
"raw_start": 0,
"raw_value": "turn",
"start": 0,
"value": "TURN",
"value_details": {
"kind": "Unknown",
"value": "TURN"
}
},
{
"end": 8,
"entity": "group",
"raw_end": 8,
"raw_start": 5,
"raw_value": "all",
"start": 5,
"value": "ALL",
"value_details": {
"kind": "Unknown",
"value": "ALL"
}
},
{
"end": 14,
"entity": "location",
"raw_end": 14,
"raw_start": 9,
"raw_value": "house",
"start": 9,
"value": "house",
"value_details": {
"kind": "Unknown",
"value": "house"
}
},
{
"end": 21,
"entity": "object",
"raw_end": 21,
"raw_start": 15,
"raw_value": "lights",
"start": 15,
"value": "Lights",
"value_details": {
"kind": "Unknown",
"value": "Lights"
}
},
{
"end": 27,
"entity": "action",
"raw_end": 25,
"raw_start": 22,
"raw_value": "off",
"start": 22,
"value": "false",
"value_details": {
"kind": "Unknown",
"value": "false"
}
}
],
"intent": {
"confidence": 1,
"name": "Switch_On_Off"
},
"raw_text": "turn all house lights off",
"raw_tokens": [
"turn",
"all",
"house",
"lights",
"off"
],
"recognize_seconds": 0.15912807499989867,
"slots": {
"action": "false",
"group": "ALL",
"location": "house",
"object": "Lights",
"verb": "TURN"
},
"speech_confidence": 1,
"text": "TURN ALL house Lights false",
"tokens": [
"TURN",
"ALL",
"house",
"Lights",
"false"
],
"wakeword_id": null
}
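Pulling values out of this object is then straightforward. A minimal plain-JavaScript sketch, using an abbreviated copy of the object above (note: this is the shape shown by the Rhasspy web UI; the message actually published on hermes/intent/Switch_On_Off follows the Hermes format, where the name lives in intent.intentName and slots is an array of objects, which is why the nymea script later indexes slots[i].slotName):

```javascript
// Sketch: extract the useful fields from an (abbreviated) intent JSON string,
// as received over MQTT.
var payload = JSON.stringify({
    intent: { confidence: 1, name: "Switch_On_Off" },
    raw_text: "turn all house lights off",
    slots: { action: "false", group: "ALL", location: "house", object: "Lights", verb: "TURN" }
});

var msg = JSON.parse(payload);
console.log(msg.intent.name);    // "Switch_On_Off"
console.log(msg.slots.location); // "house"
console.log(msg.slots.action);   // "false"
```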
Nymea Configuration:
Although I think it is self-explanatory, here is an example of how to add a thing (device):
Select the + button in the top right corner to add a new thing:
Select the model/vendor of the thing you want to create (note: if it is not listed, add the plugin from the system update menu). In this example we select a Sonoff single switch to control a lamp:
Configure the name and IP address of the switch, and choose the light option if it is going to control lights (as in this project):
That's it - if nymea finds the device it will be added to your list of things and available on the things tab under lights:
Nymea provides a simple interface for creating rules, but also has the facility to create a QML script that allows nymea things (devices) to be controlled using JavaScript. This is clearly explained in the documentation.
Create a new script by clicking on the hamburger menu and selecting Magic:
Then click on the {} symbol in the top left corner:
and finally click the + button to create a new script:
Below is the first part of my Rhasspy script:
The first part of the script triggers action based on the nymea MQTT client receiving data on the hermes/intent/# topic.
Part 1
import nymea 1.0
Item {
// === Section 1 - create Event trigger ===
ThingEvent {
thingId: "{3b3b9e72-5227-4d8a-8432-1d797d2c816b}" // Rhasspy GW
eventName: "triggered"
onTriggered: {
//console.log("Rhasspy MQTT event received:", JSON.stringify(params));
// === Section 2 - Get Data Parameters from Rhasspy ===
var intentData = params["data"];
console.log("Intent = ", intentData);
// === Section 3 - Parse the data to a JSON object ready for searching ===
var extractedIntent = JSON.parse(intentData);
// === Section 4 - Get the intent Name from Rhasspy ===
var intentName = extractedIntent.intent.intentName ;
At section 1 the Rhasspy GW thing (MQTT client) is triggered.
Section 2 - the MQTT thing's parameter data is stored in a variable.
Section 3 - the parameter data (which is the Rhasspy JSON object) is parsed ready for searching.
Section 4 - the JSON object is interrogated to find the intent name, which is stored in the intentName variable so we know what to do with it.
Part 2 - an iteration is performed on the JSON object to extract the verb, action, group, location and object. Not all of these have to be used. Logging to the console is optional.
// === Get the number of slots that Rhasspy sent ===
var numSlots = (Object.keys(extractedIntent.slots).length)
// === Loop data for each slot to determine the value and store as variable ===
var i
for (i = 0; i < numSlots; i++) {
console.log("slotname", extractedIntent.slots[i].slotName) ;
if (extractedIntent.slots[i].slotName == "verb") {
var intentVerb = extractedIntent.slots[i].value.value ;
console.log("intentVerb = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "action") {
var intentAction = extractedIntent.slots[i].value.value ;
console.log("intentAction = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "group") {
var intentGroup = extractedIntent.slots[i].value.value ;
console.log("intentGroup = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "location") {
var intentLocation = extractedIntent.slots[i].value.value ;
console.log("intentLocation = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "object") {
var intentObject = extractedIntent.slots[i].value.value ;
console.log("intentObject = ", extractedIntent.slots[i].value.value)
}
}
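The if/else chain above works, but because every Hermes slot carries its own slotName, the same extraction can be done generically. A plain-JavaScript sketch (variable names are my own, runnable outside nymea):

```javascript
// Sketch: collect Hermes-style slots into a single lookup object,
// instead of one if/else branch per slot name.
function collectSlots(slots) {
    var result = {};
    for (var i = 0; i < slots.length; i++) {
        result[slots[i].slotName] = slots[i].value.value;
    }
    return result;
}

// Example Hermes-style slot array (abbreviated).
var slots = [
    { slotName: "verb",     value: { value: "TURN" } },
    { slotName: "action",   value: { value: "false" } },
    { slotName: "location", value: { value: "house" } },
    { slotName: "object",   value: { value: "Lights" } }
];

var intent = collectSlots(slots);
console.log(intent.location + intent.object); // "houseLights"
```

This keeps the script the same length no matter how many slot types you add later.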
Part 3 - Using the slot data above we can now create a command to send to a nymea thing and then issue a text to speech back to Rhasspy to state what has been done.
// === Section 1 -Determine the action Rhasspy TTS will say ===
if (intentAction == "true"){
var intentActionTTS = "On"
}
else {
var intentActionTTS = "Off"
}
// === Section 2 - check intent for relevant action ===
if (intentName == "Switch_On_Off") {
// === Section 3 - create an action string to send to the target object ===
var rhasspyThing = intentLocation + intentObject + ".execute" ;
// === Section 4 - send action to ThingAction object ===
eval(rhasspyThing)({"power":intentAction}) ;
// === Section 5 - Define what will be said by Rhasspy TTS ===
var toSay = "{\"text\": \"The " + intentLocation + " " + intentObject + " is now set to " + intentActionTTS + "\"}"
}
// === Section 6 - another intent to deal with ===
else if (intentName == "Get_Weather") {
// === non action object ===
var toSay = "{\"text\": \"The " + intentObject + " " + intentLocation + " is " + outsideweather.value + " with a temperature of " + outsidetemperature.value + "\"}"
}
// console.log("RhasspyThing = ", rhasspyThing) ;
// === Section 7 - Send the Text to Speech to Rhasspy via the MQTT client ===
publishAction.execute({"topic": "hermes/tts/say", "data": toSay, "qos": 0 })
}
Section 1 - store the action in a variable to personalise the text-to-speech response.
Section 2 - conditional check based on the intent name.
Section 3 - store the 'thing' in a variable by combining the location and object, and append an execute statement so it can be passed to a nymea action object.
Section 4 - send the command to the nymea action object that corresponds to the thing to be controlled, e.g. frontRoomTableLamp.
Section 5 - store the text-to-speech response in a variable.
Section 6 - this intent does not require any action, just a customised text-to-speech response (it reads weather data from a nymea OpenWeather thing).
Section 7 - send the speech response to the MQTT thing.
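Two possible refinements to Part 3, sketched in plain JavaScript with stand-in objects (in the real QML script the lookup values would be the ThingAction ids such as frontRoomTableLamp, so treat this as an untested illustration): a lookup table avoids eval, and JSON.stringify builds the TTS payload without manual quote escaping.

```javascript
// Sketch 1: a lookup table instead of eval. The values here are
// stand-in objects so the idea can run outside nymea; in the QML
// script they would be the ThingAction ids declared in Part 4.
var things = {
    frontRoomTableLamp: { execute: function (params) { return params; } },
    livingRoomFloorLamp: { execute: function (params) { return params; } }
};

function switchThing(location, object, action) {
    var thing = things[location + object];
    if (!thing) { return null; } // unknown device: do nothing
    return thing.execute({ power: action });
}

console.log(switchThing("frontRoom", "TableLamp", "false")); // returns { power: "false" }

// Sketch 2: build the hermes/tts/say payload with JSON.stringify
// instead of hand-escaped quotes.
var toSay = JSON.stringify({ text: "The frontRoom TableLamp is now set to Off" });
console.log(toSay);
```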
Part 4 - a list of things to be activated
ThingAction {
id: publishAction
thingId: "{3b3b9e72-5227-4d8a-8432-1d797d2c816b}" // Rhasspy GW
actionName: "trigger"
}
ThingAction {
id: frontRoomTableLamp
thingId: "{75e8b06f-3d95-43a1-b8ae-9e96e7da4fd8}" // Front Room Table Lamp CH1
actionName: "power"
}
ThingAction {
id: livingRoomFloorLamp
thingId: "{ca09eb69-7b57-497f-ac86-2eaa2d2c0f14}" // Living Room Floor Lamp CH1
actionName: "power"
}
ThingAction {
id: livingRoomTableLamp
thingId: "{ea8b30d4-6a72-4444-985e-3c4f851c6359}" // Living Room Table Lamp CH1
actionName: "power"
}
InterfaceAction {
id: houseLamps
interfaceName: "light"
actionName: "power"
}
ThingState {
id: outsidetemperature
thingId: "{8e269d15-ade9-411b-ac7d-a0c47eec6086}" // Derby
stateName: "temperature"
}
ThingState {
id: outsideweather
thingId: "{8e269d15-ade9-411b-ac7d-a0c47eec6086}" // Derby
stateName: "weatherDescription"
}
Here is the complete script for the first run through of this project:
import QtQuick 2.0
import nymea 1.0
Item {
ThingEvent {
thingId: "{3b3b9e72-5227-4d8a-8432-1d797d2c816b}" // Rhasspy GW
eventName: "triggered"
onTriggered: {
//console.log("Rhasspy MQTT event received:", JSON.stringify(params));
// === Get Data Parameters from Rhasspy GW and store in a local variable ===
var intentData = params["data"];
console.log("Intent = ", intentData);
// === Parse the data to a JSON object ready for searching ===
var extractedIntent = JSON.parse(intentData);
// === Get intent Name from Rhasspy so we know what we are searching for ===
var intentName = extractedIntent.intent.intentName ;
console.log("Intent Name is " + intentName);
console.log(Object.keys(extractedIntent.slots).length)
// === Get the number of slots that Rhasspy sent ===
var numSlots = (Object.keys(extractedIntent.slots).length)
// === Loop data per slot to determine value and store as a local variable ===
var i
for (i = 0; i < numSlots; i++) {
console.log("slotname", extractedIntent.slots[i].slotName) ;
if (extractedIntent.slots[i].slotName == "verb") {
var intentVerb = extractedIntent.slots[i].value.value ;
console.log("intentVerb = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "action") {
var intentAction = extractedIntent.slots[i].value.value ;
console.log("intentAction = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "group") {
var intentGroup = extractedIntent.slots[i].value.value ;
console.log("intentGroup = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "location") {
var intentLocation = extractedIntent.slots[i].value.value ;
console.log("intentLocation = ", extractedIntent.slots[i].value.value)
}
else if (extractedIntent.slots[i].slotName == "object") {
var intentObject = extractedIntent.slots[i].value.value ;
console.log("intentObject = ", extractedIntent.slots[i].value.value)
}
}
// === Determine if Rhasspy TTS will say the thing was switched on or off ===
if (intentAction == "true"){
var intentActionTTS = "On"
}
else {
var intentActionTTS = "Off"
}
if (intentName == "Switch_On_Off") {
// === create an action string to send to the target object ===
var rhasspyThing = intentLocation + intentObject + ".execute" ;
// === send action to ThingAction object ===
eval(rhasspyThing)({"power":intentAction}) ;
// === Define what will be said by Rhasspy TTS ===
var toSay = "{\"text\": \"The " + intentLocation + " " + intentObject + " is now set to " + intentActionTTS + "\"}"
}
else if (intentName == "Get_Weather") {
// === non action object ===
var toSay = "{\"text\": \"The " + intentObject + " " + intentLocation + " is " + outsideweather.value + " with a temperature of " + outsidetemperature.value + "\"}"
}
// console.log("RhasspyThing = ", rhasspyThing) ;
// ===Send the Text to Speech command to Rhasspy via the MQTT client ===
publishAction.execute({"topic": "hermes/tts/say", "data": toSay, "qos": 0 })
}
}
ThingAction {
id: publishAction
thingId: "{3b3b9e72-5227-4d8a-8432-1d797d2c816b}" // Rhasspy GW
actionName: "trigger"
}
ThingAction {
id: frontRoomTableLamp
thingId: "{75e8b06f-3d95-43a1-b8ae-9e96e7da4fd8}" // Front Room Table Lamp CH1
actionName: "power"
}
ThingAction {
id: livingRoomFloorLamp
thingId: "{ca09eb69-7b57-497f-ac86-2eaa2d2c0f14}" // Living Room Floor Lamp CH
actionName: "power"
}
ThingAction {
id: livingRoomTableLamp
thingId: "{ea8b30d4-6a72-4444-985e-3c4f851c6359}" // Living Room Table Lamp CH
actionName: "power"
}
InterfaceAction {
id: houseLamps
interfaceName: "light"
actionName: "power"
}
ThingState {
id: outsidetemperature
thingId: "{8e269d15-ade9-411b-ac7d-a0c47eec6086}" // GB
stateName: "temperature"
}
ThingState {
id: outsideweather
thingId: "{8e269d15-ade9-411b-ac7d-a0c47eec6086}" // GB
stateName: "weatherDescription"
}
}
Finally, test the script by issuing a voice command (or typing one) in Rhasspy and check that Rhasspy displays the correct intent:
Although this worked well, having to add logic and thing objects to an ever-growing script will become a problem over time. However, the next release of nymea (version 0.28) will contain a generic (dummy) device, which provides a more dynamic way forward. Dummy devices can be set up (let's call them Rhasspy cards) for a thing (device) or groups of things. These dummy things can hold the data of the things you want to control. The logic of the script can then read these Rhasspy cards, extract the data, match it with the intent, and dynamically create the ThingAction/ThingState. Adding a thing to control in nymea then becomes simply a matter of editing the Rhasspy cards (dummy device). I will give it a go once the new version is released (possibly in July).