Published November 30, 2023

Generative AI - Virtual Agent with M5Stack

A virtual agent built with M5Stack that can also run on many other ESP32 boards. The backend API is built on the ChatGPT or Google PaLM2 LLM.

Advanced | Work in progress | 10 days | 954

Things used in this project

Hardware components

M5Stack ESP32 Basic Core IoT Development Kit

You can also use the M5Stack Core 2 V1.1 or M5Stack Core3, which have an integrated speaker and microphone.

Pimoroni Speaker pHAT

Any other I2S speaker module can be used for the sound output. Just change the pin configuration.

SparkFun MEMS Microphone Breakout - INMP401 (ADMP401)

I2S INMP441 MEMS microphone

Software apps and online services

Arduino IDE

Story

Virtual Agent Powered by M5Stack Core

Generative AI - Virtual Agent with M5Stack: Your Conversational Companion

In an era of rapid technological advancements, the quest for seamless communication has never been more pressing. With the advent of ChatGPT, Bing, and Google Bard, the world has gained access to powerful language models capable of answering complex questions and generating creative text formats. However, harnessing these capabilities often requires access to computers or laptops, limiting their reach and accessibility.

Enter the Virtual Agent with M5Stack, an innovative device that empowers you to interact with the world around you in a seamless and accessible way. Powered by the latest advancements in generative AI, this smart companion is designed to serve as your personal assistant, answering your questions and providing information in real time.

Empowering Communication and Accessibility

The Virtual Agent with M5Stack is a game-changer for those with mobility impairments, visual impairments, or low vision. With its intuitive voice-based interface, users can simply ask questions and receive clear, concise answers. This hands-free approach eliminates the need for typing or screen navigation, making it an ideal tool for everyday communication and information access.

This project tackles this challenge by developing a smart device powered by generative AI and the M5Stack platform, enabling users to engage in natural conversations with a virtual agent anytime, anywhere. Using the PaLM2 API or the ChatGPT API together with Google infrastructure, the device delivers up-to-date information, with integrated speech-to-text (STT) for input and text-to-speech (TTS) for output.

The device's keyword detection system, built using the TinyML framework from Edge Impulse, activates the virtual agent so that it is ready to receive and respond to user queries. For STT, the device records audio and sends it to Wit.ai, which converts it into text suitable for the chatbot. In addition, I programmed Button A to trigger a question manually in case keyword detection does not work properly.
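The two ways of waking the agent described above (keyword detection or Button A) can be sketched roughly as follows. This is only a minimal illustration in portable C++, not the project's actual code: the names checkWake, WakeSource, and kWakeThreshold are hypothetical, and the 0.80 threshold is an assumed value, since the confidence needed by the keyword model is not stated here.

```cpp
#include <cassert>

// Assumed value: the article does not say what confidence the keyword
// model must reach before the agent wakes up.
constexpr float kWakeThreshold = 0.80f;

// Which input (if any) asked the agent to start listening.
enum class WakeSource { None, Keyword, ButtonA };

// Decide whether to start recording a question.
//   keywordScore   - classifier confidence for the wake word, in [0, 1]
//   buttonAPressed - true if Button A was pressed since the last check
WakeSource checkWake(float keywordScore, bool buttonAPressed) {
    assert(keywordScore >= 0.0f && keywordScore <= 1.0f);

    // Check the button first so it still works when the keyword
    // model fails to detect the wake word.
    if (buttonAPressed) {
        return WakeSource::ButtonA;
    }
    if (keywordScore >= kWakeThreshold) {
        return WakeSource::Keyword;
    }
    return WakeSource::None;
}
```

On the device, a function like this would be called once per loop iteration with the latest classifier score and button state, and recording would start whenever the result is not WakeSource::None.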

The backend chatbot was initially built using ChatGPT 3.5 and later moved to the Google PaLM2 API, which performed better and cost less in my use. The device sends the user's question to the chatbot, receives the response, and uses the Google TTS library to read it aloud.
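One question-and-answer turn (speech-to-text, chatbot request, then text-to-speech) could be organized along these lines. This is a hedged sketch under my own assumptions, not the project's implementation: AgentStages and runTurn are hypothetical names, and each stage is passed in as a callback so the sketch makes no claims about the real Wit.ai, PaLM2/ChatGPT, or Google TTS calls.

```cpp
#include <cassert>
#include <functional>
#include <string>

// The three stages of one turn, injected as callbacks (all hypothetical).
struct AgentStages {
    // Record audio and return the transcribed question (STT).
    std::function<std::string()> recordAndTranscribe;
    // Send the question to the backend chatbot and return its reply.
    std::function<std::string(const std::string&)> askChatbot;
    // Read the reply aloud (TTS).
    std::function<void(const std::string&)> speak;
};

// Run one turn. Returns the reply that was spoken, or an empty string
// if nothing was transcribed or the backend returned nothing.
std::string runTurn(const AgentStages& stages) {
    assert(stages.recordAndTranscribe && stages.askChatbot && stages.speak);

    const std::string question = stages.recordAndTranscribe();
    if (question.empty()) {
        return "";  // nothing understood: skip the backend request
    }

    const std::string reply = stages.askChatbot(question);
    if (!reply.empty()) {
        stages.speak(reply);
    }
    return reply;
}
```

Keeping the stages as callbacks also makes it easy to swap the backend, for example from ChatGPT to PaLM2, without touching the turn logic.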

This project not only demonstrates the transformative potential of generative AI but also highlights the versatility of the M5Stack platform in facilitating accessible and interactive communication. With its potential to empower individuals with low vision, the elderly, or those seeking an alternative to screen-based interactions, this device represents a significant step towards a more inclusive and connected future.

You can try a demo of the agent chatbot here (please note that the language is Vietnamese):

https://multilchatbotpalm2-johnnietien.streamlit.app/

So far, this is what I have achieved: