An LLM AI assistant for Home Assistant (home automation) that asks follow-up questions and writes API requests to retrieve additional information before taking the appropriate action.
Demo Video

Why is it useful?

A big problem with existing AI assistants is that you have to formulate a sufficiently specific query from the get-go, which makes them frustrating to use.
We naturally have a context in mind that is often different from the AI's context. For example, I may want to turn off all the lights in the room I'm in and ask "Turn all lights off", but the AI turns off all the lights in the house.
Or I may ask it to set the humidity to 40% and it turns on the sprinklers because the soil humidity outside is below 40%.
Another big issue is with speech: it is very hard to naturally formulate a request with many specifics and to say it without pauses that get detected as the end of the query. In addition, most APIs require specific names/IDs that are difficult to remember precisely and say correctly.
How does it work?

This project uses a local LLM, Llama 3 70B (but you can try it out with any model). I hosted it as an exposed local API with LM Studio, running on an AMD Radeon Pro W7900. I chose this setup since LM Studio works very nicely out of the box with this AMD GPU (leveraging ROCm, the AMD equivalent to CUDA) and lets me easily hotswap models and deploy the server.
You should be able to use any LLM as long as you have an API with endpoints equivalent to the OpenAI API (LM Studio mimics it).
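For illustration, here is roughly what talking to such a server looks like from Python. This is a minimal sketch using the openai client library; the base URL and port are LM Studio's usual local defaults and the model name is a placeholder, so adjust both to your setup.

```python
# Minimal sketch: chat with a local OpenAI-compatible server (e.g. LM Studio).
# The base_url assumes LM Studio's default local server; the model name is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio local server (assumed default)
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama-3-70b",  # whatever identifier your server exposes
    messages=[
        {"role": "system", "content": "You are a home automation assistant."},
        {"role": "user", "content": "Turn all lights off"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```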
Since this project is specifically a home automation assistant built on Home Assistant, it needs a Home Assistant instance to interface with, and it will read information about your devices and areas. It works much better if you tag your Home Assistant entities with helpful labels like "temperature" or "humidity", since this makes it much easier for the LLM to find the relevant devices. Of course, naming entities descriptively also helps (something like "indoorTempSensor" or "bedroomHumidity").
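As a rough sketch of what "reading information about your devices" means, the Home Assistant REST API exposes every entity's state at /api/states, authenticated with a long-lived access token. The URL and token below are placeholders for your own instance.

```python
# Sketch: list entity IDs and friendly names via the Home Assistant REST API.
# HA_URL and the token are placeholders for your instance and a long-lived access token.
import requests

HA_URL = "http://homeassistant.local:8123"
HEADERS = {
    "Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN",
    "Content-Type": "application/json",
}

states = requests.get(f"{HA_URL}/api/states", headers=HEADERS, timeout=10).json()
for entity in states:
    print(entity["entity_id"], "-", entity["attributes"].get("friendly_name", ""))
```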
I made a simple Python script with text input to interact with the assistant. Ideally it would be a voice assistant, but I decided that was beyond the scope of this project for now; I mainly wanted to see the possibilities and limitations.
Here is how the code/prompting works. There are four types of actions that the LLM can choose from (example API requests for the first two are sketched after the list):
- Intent: Make an intent API request to Home Assistant to perform an action (e.g. turn on light X)
- Query: Make a template API request to Home Assistant to get information (e.g. the entity names for all the entities in X room)
- Followup: Ask the user a follow-up question to understand the request better
- Answer: Give the user an answer to their question (if they asked one)
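For concreteness, here is roughly what the first two actions translate to on the Home Assistant side: an Intent becomes a POST to /api/intent/handle (which requires intent: to be enabled in configuration.yaml) and a Query becomes a POST to /api/template. The intent name, entity name, and template below are only illustrative, and the URL/token are placeholders as in the earlier sketch.

```python
# Sketch: what Intent and Query actions look like as Home Assistant API calls.
# URL/token are placeholders; the payloads are illustrative, not the project's exact requests.
import requests

HA_URL = "http://homeassistant.local:8123"
HEADERS = {
    "Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN",
    "Content-Type": "application/json",
}

# Intent: perform an action, e.g. turn on a specific light.
intent_payload = {"name": "HassTurnOn", "data": {"name": "bedroom lamp"}}
requests.post(f"{HA_URL}/api/intent/handle", headers=HEADERS, json=intent_payload, timeout=10)

# Query: evaluate a template, e.g. list the entities in a given area.
query_payload = {"template": "{{ area_entities('bedroom') }}"}
result = requests.post(f"{HA_URL}/api/template", headers=HEADERS, json=query_payload, timeout=10)
print(result.text)
```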
Initially, I tried using LangChain with Ollama for this purpose, specifically the MultiPromptChain tool which chooses among different prompts to use. The idea is that the LLM chooses between the prompts to build an Intent, Query, Followup, or Answer.
I came across some issues with using LangChain for this purpose, namely a lack of documentation and flexibility. For example, in this project the results of the Intent or Query prompts should be sent to API endpoints, whereas the Followup question should be posed to the user. There was no easy way to do this with LangChain (as far as I could tell).
So, I adapted the prompting style behind the MultiPromptChain and prompted the LLM directly like so (a minimal sketch of this loop follows the list):
- It gets a short description of each of the four actions and when it should use them, and is instructed to output which action it wants to take, like "INTENT"
- Based on its output, I then give it a much more detailed prompt on how to take that action. For example, if it output "INTENT", it is then prompted with a few examples of Intent API requests and how to construct one
- The output is then used appropriately (if Intent or Query, the corresponding API request is made; if Followup, the question is posed to the user)
- Back to the start!
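To make the loop concrete, here is a stripped-down sketch of that control flow. The prompt text and the helper functions (ask_llm, call_intent_api, call_query_api) are hypothetical placeholders, not the project's actual code.

```python
# Hypothetical sketch of the action-selection loop described above.
# The prompts and helper functions are placeholders, not the project's actual code.

SELECTION_PROMPT = "Short descriptions of INTENT, QUERY, FOLLOWUP, ANSWER and when to use each..."
ACTION_PROMPTS = {
    "INTENT": "Detailed instructions plus example intent API requests...",
    "QUERY": "Detailed instructions plus example template API requests...",
    "FOLLOWUP": "Detailed instructions on asking one clarifying question...",
    "ANSWER": "Detailed instructions on answering the user's question...",
}

def ask_llm(prompt: str, request: str, history: list[str]) -> str:
    """Placeholder: send prompt + request + conversation history to the local LLM."""
    raise NotImplementedError

def call_intent_api(llm_output: str) -> None:
    """Placeholder: POST the LLM-written intent request to Home Assistant."""
    raise NotImplementedError

def call_query_api(llm_output: str) -> str:
    """Placeholder: POST the LLM-written template request and return the result."""
    raise NotImplementedError

def handle_request(user_request: str) -> None:
    history: list[str] = []
    while True:
        # 1. Ask the LLM which of the four actions it wants to take.
        action = ask_llm(SELECTION_PROMPT, user_request, history).strip().upper()

        # 2. Re-prompt with the detailed instructions for that action.
        output = ask_llm(ACTION_PROMPTS[action], user_request, history)

        # 3. Use the output appropriately.
        if action == "INTENT":
            call_intent_api(output)      # perform the action; we're done
            return
        if action == "QUERY":
            history.append(f"Query result: {call_query_api(output)}")  # feed the result back in
        elif action == "FOLLOWUP":
            history.append(f"User: {input(output + ' ')}")             # ask the user and record the reply
        else:  # ANSWER
            print(output)
            return
```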
The overall idea is that any follow-up questions or queries are made before the final intent or answer is produced.
One more thing to mention is that I preloaded the LLM with some basic information, like the available labels (temperature, humidity) and areas (bedroom, kitchen, etc.), taken straight from the API. There aren't many of them, and it avoids queries that would almost always be made anyway, while still letting the LLM adapt to your own setup.
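As an example of that kind of preloading, the area and label names can be pulled once at startup through the template API. This is a hedged sketch: it assumes a reasonably recent Home Assistant release (the labels() template function is fairly new), and the URL/token are placeholders as before.

```python
# Sketch: preload area and label names once at startup via the template API.
# labels() needs a relatively recent Home Assistant release; URL/token are placeholders.
import requests

HA_URL = "http://homeassistant.local:8123"
HEADERS = {
    "Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN",
    "Content-Type": "application/json",
}

def render_template(template: str) -> str:
    resp = requests.post(f"{HA_URL}/api/template", headers=HEADERS,
                         json={"template": template}, timeout=10)
    return resp.text

areas = render_template("{{ areas() | map('area_name') | list }}")   # human-readable area names
labels = render_template("{{ labels() }}")                           # label IDs
system_prompt = f"Available areas: {areas}\nAvailable labels: {labels}\n..."
```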
Successes, Issues, and Improvements

If you set up your Home Assistant devices nicely (descriptive names and labels), the LLM does a pretty good job of finding the right devices.
Where it falls short is that it often malforms its API requests and gets confused along the way. Sometimes it can correct itself. The better the model, the less likely this is to happen (I used a Q2 XS quantization of Llama 3 70B, and it still made a lot of mistakes).
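One simple pattern that supports this kind of self-correction (and that "better error recovery" below would build on) is to feed Home Assistant's error straight back to the LLM and ask for a fixed request. This is a hedged sketch with a hypothetical ask_llm helper, not the project's implementation.

```python
# Hypothetical retry loop: feed API errors back to the LLM so it can fix its own request.
import json

import requests

def post_with_retries(url: str, headers: dict, llm_output: str, ask_llm, max_retries: int = 2):
    for _ in range(max_retries + 1):
        try:
            payload = json.loads(llm_output)                      # catch malformed JSON
            resp = requests.post(url, headers=headers, json=payload, timeout=10)
            resp.raise_for_status()                               # catch HTTP errors (bad entity, etc.)
            return resp
        except (json.JSONDecodeError, requests.HTTPError) as err:
            # Show the LLM its own output and the error, and ask for a corrected request.
            llm_output = ask_llm(
                f"Your request failed with: {err}. "
                f"Here is what you sent: {llm_output}. "
                "Return a corrected JSON request only."
            )
    return None
```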
In my experience, it was also difficult to get the LLM to judge when to ask follow-up questions. Depending on my prompting, it strayed towards either too few follow-up questions or too many; it was difficult to strike a balance. I would have liked to provide more examples, but I was limited by context length and also didn't want to be so literal that I simply told it exactly what to do in every situation.
Some improvements I'd like to make when I have more time are:
- Better error recovery
- Somehow figure out how to prompt it for better "prompt selection"
- Make my Home Assistant devices more organized to make it easier for the LLM
- Less finicky prompting (I had to repeat myself a lot, and it still sometimes chose the wrong action to take)
Let me know if you have any suggestions, hope you like the project.
Cover image mostly generated with DALL-E 3.
Code (mostly prompting) available below!