With the introduction of ChatGPT last year, large language models have become popular among technology enthusiasts. As a first step, I decided to run a large language model locally on my PC, and as a next step, to add voice conversation to it. I thought it would be interesting if, in addition to having a voice conversation with the AI, we could also control electronic hardware with this system. In this tutorial, we'll learn how to have a voice conversation with a language model and turn it into a voice assistant that can control our electronic devices, play music, and tell us the temperature and humidity.
You can watch the full video tutorial below.
First of all, I wrote a program in Python on Windows using the LangChain library and Ollama. With these tools, we can run an LLM locally on a PC. To do this, go to the Ollama web page and download Ollama for your OS. After installing Ollama, download your desired LLM; for example, I downloaded the phi model on my computer. For more details, please see the Ollama GitHub page. After installing your LLM model, install these Python packages:
pip install langchain
pip install SpeechRecognition
pip install PyAudio
pip install pyttsx3
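Assuming Ollama is installed and its service is running, downloading and smoke-testing the phi model from a terminal looks roughly like this (these are standard Ollama CLI commands):

```shell
# Download the phi model (one-time; requires the Ollama service to be running)
ollama pull phi

# Optional smoke test in the terminal before writing any Python
ollama run phi "Say hello in one sentence."
```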
With the help of these packages, you can write a program to have a voice conversation with a large language model. The program recognizes your speech, sends the text to the language model, and then reads the model's response aloud. You can see the program below:
from langchain_community.llms import Ollama
import pyttsx3
import speech_recognition as sr

llm = Ollama(model="phi")
engine = pyttsx3.init()

def audio_recognizer():
    # Capture one utterance from the microphone and return it as text
    speech = sr.Recognizer()
    with sr.Microphone() as source:
        audio = speech.listen(source)
    text = ''
    try:
        text = speech.recognize_google(audio, language="en-US")
    except (sr.UnknownValueError, sr.RequestError):
        print("I can't hear you.....")
    return text

while True:
    print("speak with me: ")
    question = audio_recognizer()
    print("your question: ", question)
    response = llm.invoke(f"{question} Please give me a brief answer.")
    print(response)
    engine.say(response)
    engine.runAndWait()
After writing this program, it is time to add wake-word detection to our voice assistant. I used the Picovoice platform for this purpose, but I won't go into detail about it here and will only cover the main program. Before that, let me introduce my simple hardware. I used a Raspberry Pi Pico W microcontroller to control devices in response to voice commands: with the Pico and an SSR relay I switch a lamp, and a DHT11 sensor measures the ambient temperature and humidity. You can easily work out the circuit diagram from the code.
Now we come to our main program. You need to install these Python modules:
pip install playsound
pip install pvporcupine
(The struct module is part of the Python standard library, so it does not need to be installed.)
In the main program we use socket programming to send commands to the Raspberry Pi Pico W. You can see the source code below. Run this program on your PC or on an SBC such as a Raspberry Pi 5 or an NVIDIA Jetson Nano.
from langchain_community.llms import Ollama
import pyttsx3
import speech_recognition as sr
import pvporcupine
import pyaudio
import struct
import socket
from playsound import playsound

# Create a TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Specify the IP address and port number of your Raspberry Pi Pico W
# You can use the ipconfig or ifconfig command to find the IP address of your device
ip = '192.168.1.231'  # Change this to your Raspberry Pi Pico W IP address
port = 80  # Change this to your Raspberry Pi Pico W port number

llm = Ollama(model="phi")
engine = pyttsx3.init()
s.connect((ip, port))

def audio_recognizer():
    # Capture one utterance from the microphone and return it as text
    speech = sr.Recognizer()
    with sr.Microphone() as source:
        audio = speech.listen(source)
    text = ''
    try:
        text = speech.recognize_google(audio, language="en-US")
    except (sr.UnknownValueError, sr.RequestError):
        print("I can't hear you.....")
    return text

# Create a Porcupine instance with your custom wake word
handle = pvporcupine.create(
    access_key='your access key',  # Get this from Picovoice Console
    keyword_paths=['your wake word detection model from picovoice platform'])  # Download this from Picovoice Console

# Get the required audio input parameters
pa = pyaudio.PyAudio()
sample_rate = handle.sample_rate
frame_length = handle.frame_length
audio_stream = pa.open(
    rate=sample_rate,
    channels=1,
    format=pyaudio.paInt16,
    input=True,
    frames_per_buffer=frame_length)

while True:
    # Read one frame of audio
    pcm = audio_stream.read(frame_length)
    pcm = struct.unpack_from("h" * frame_length, pcm)
    # Pass the audio frame to Porcupine
    keyword_index = handle.process(pcm)
    # Check if a wake word is detected
    if keyword_index >= 0:
        engine.say("i am at your service........")
        engine.runAndWait()
        print("ask me a question: ")
        # Wake word detected, listen for the question
        question = audio_recognizer().lower()  # normalize case for command matching
        print("your question: ", question)
        if question == "turn on the light":
            s.send(b'o')
            engine.say("light is on")
            engine.runAndWait()
        elif question == "turn off the light":
            s.send(b'f')
            engine.say("light is off")
            engine.runAndWait()
        elif question == "play a song":
            song = "D:\\voice assistant based on llms and rpi pico w\\['1'].wav"
            playsound(song)
        elif question == "what is the temp":
            s.send(b"temp")
            response = s.recv(1024)
            temp = float(response.decode())
            print(temp)
            engine.say(f"temperature is {temp} degrees Celsius")
            engine.runAndWait()
        elif question == "what is the humidity":
            s.send(b"humidity")
            response = s.recv(1024)
            hum = float(response.decode())
            print(hum)
            engine.say(f"humidity is {hum} percent")
            engine.runAndWait()
        else:
            # Anything else is passed to the language model
            response = llm.invoke(f"{question}, please give me a brief answer.")
            print(response)
            engine.say(response)
            engine.runAndWait()
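In the loop above, each PyAudio buffer arrives as raw bytes, but Porcupine's `handle.process()` expects a sequence of 16-bit signed samples, which is why the buffer is converted with `struct.unpack_from` first. Here is that exact conversion on a tiny synthetic frame instead of microphone data:

```python
import struct

# Porcupine's real frame_length is typically 512 samples; a tiny frame suffices here
frame_length = 4

# Synthetic raw bytes as PyAudio would return them: 16-bit little-endian PCM
raw = struct.pack("<" + "h" * frame_length, 0, 1000, -1000, 32767)

# The same conversion the main program applies before handle.process(pcm)
pcm = struct.unpack_from("h" * frame_length, raw)
print(pcm)  # (0, 1000, -1000, 32767) on little-endian machines
```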
We also have a script for the Raspberry Pi Pico W; upload this MicroPython script to your microcontroller.
import socket
import network
import machine
from time import sleep
from machine import Pin
import dht

# Replace these with your own Wi-Fi credentials
ssid = 'your_wifi_ssid'
password = 'your_wifi_password'

def connect():
    # Connect to WLAN
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(ssid, password)
    while not wlan.isconnected():
        print('Waiting for connection...')
        sleep(1)
    ip = wlan.ifconfig()[0]
    print(f'Connected on {ip}')
    return ip

def open_socket(ip):
    address = (ip, 80)
    connection = socket.socket()
    connection.bind(address)
    connection.listen(1)
    return connection

led_pin = Pin(0, Pin.OUT)
led_pin.value(0)
dht_sensor = dht.DHT11(Pin(15))

def air_params():
    sleep(4)  # the DHT11 needs a pause between readings
    dht_sensor.measure()
    temp = dht_sensor.temperature()
    humidity = dht_sensor.humidity()
    return temp, humidity

try:
    air_params()  # warm-up reading
    ip = connect()
    connection = open_socket(ip)
    while True:
        # Accept a connection from a client
        client, addr = connection.accept()
        print(f'Connected to {addr}')
        while True:
            # Receive data from the client
            data = client.recv(1024)
            if not data:
                # Client disconnected; go back to waiting for a new connection
                client.close()
                break
            print(data)
            if data == b'o':
                led_pin.value(1)
            elif data == b'f':
                led_pin.value(0)
            # For sensor queries, send a response back to the client
            elif data == b"temp":
                raw_temp, _ = air_params()
                print(raw_temp)
                client.send(str(raw_temp).encode())
            elif data == b"humidity":
                _, raw_hum = air_params()
                print(raw_hum)
                client.send(str(raw_hum).encode())
except KeyboardInterrupt:
    # Close the server socket
    connection.close()
    machine.reset()
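The protocol between the PC and the Pico W is just short text commands over TCP: b'o'/b'f' switch the lamp, while b"temp" and b"humidity" request a reading back. Before flashing the board, you can exercise this request/response pattern against a stub server on localhost; the readings below are placeholders, not real DHT11 values:

```python
import socket
import threading

def stub_pico(server_sock):
    """Minimal stand-in for the Pico W server: answers temp/humidity requests."""
    client, _ = server_sock.accept()
    while True:
        data = client.recv(1024)
        if not data:              # client disconnected
            break
        if data == b"temp":
            client.send(b"24")    # placeholder reading, not a real sensor value
        elif data == b"humidity":
            client.send(b"40")    # placeholder reading
    client.close()

# Stub server on localhost; port 0 lets the OS pick a free port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=stub_pico, args=(server,), daemon=True).start()

# Client side: the same request/response pattern the main program uses
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", port))
s.send(b"temp")
temp = float(s.recv(1024).decode())
s.send(b"humidity")
hum = float(s.recv(1024).decode())
s.close()
print(temp, hum)  # 24.0 40.0
```

Because the client waits for each reply before sending the next command, the two requests never coalesce into a single TCP segment on the stub's side.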
Now you can speak with an LLM and control your devices using voice commands.
I hope you enjoy this project.