Build a customizable conversational AI bot! This project uses the OpenAI GPT-4o Realtime API. The Realtime API is super useful because it allows for natural, human-like conversation: the AI bot is interruptible, it can detect tone, emotion, and emphasis in your voice, and it is much better at handling different accents.
The Realtime API also uses speech-in, speech-out, whereas prior implementations of AI voice services chain speech-to-text and text-to-speech steps. This means the Realtime API is way faster! Also handy for conversation :)
I've used this API for tons of projects, including a Spooky Fortune Teller for a Halloween party activity, a Spanish tutor, and voices for puppets on my YouTube channel.
This tutorial shows you how to get started using the Realtime API, including how to set up the various AI resources, how to modify the system prompt and other basic settings, and how to deploy via a simple local app.
Here's a demo video of the Spooky Fortune Teller:
Prerequisites
For this project, you'll need:
- An active Azure Subscription. If you don't have one, create a free Azure account before you begin.
- VS Code as a code editor.
- An Azure OpenAI account. You will need to create a resource and obtain your OpenAI Endpoint and API Key.
1. In the Azure Model Catalog, deploy text-embedding-ada-002 and gpt-35-turbo-16k models.
2. Grab your OAI Endpoint and API Key. To find these, click on the resource name in the top menu (far right of the window). This opens a sidebar on the right-hand side called 'Azure AI Services Resource', which has information about that particular resource.
In VS Code:
3. Clone the AOAI Samples GitHub repo: AOAI_Samples/realtime-assistant-support at main · monuminu/AOAI_Samples
4. Create a .env file in the root directory and update the following environment variables:
AZURE_OPENAI_API_KEY=XXXX
# replace with your Azure OpenAI API Key
AZURE_OPENAI_ENDPOINT=wss://xxxx.openai.azure.com/
# replace with your Azure OpenAI Endpoint
AZURE_OPENAI_DEPLOYMENT=gpt-4o-realtime-preview
# Create a deployment for the gpt-4o-realtime-preview model and put the deployment name here. You can name the deployment whatever you like.
AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION=2024-10-01-preview
# You don't need to change this unless you want to try other versions.
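Before launching the app, it can help to confirm that all four variables are set and to see what connection URL they imply. The snippet below is a minimal sketch, not code from the sample repo: the helper names `check_settings` and `build_realtime_url` are hypothetical, and the `/openai/realtime` path with `api-version` and `deployment` query parameters is my understanding of the Azure OpenAI Realtime endpoint shape, so the repo may assemble its URL differently.

```python
import os
from urllib.parse import urlencode

# The four variables from the .env file above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION",
)


def check_settings(env):
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def build_realtime_url(settings):
    """Assemble a WebSocket URL (assumed shape, see the note above)."""
    endpoint = settings["AZURE_OPENAI_ENDPOINT"].rstrip("/")
    if not endpoint.startswith("wss://"):
        raise ValueError("Endpoint should start with wss://")
    query = urlencode({
        "api-version": settings["AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION"],
        "deployment": settings["AZURE_OPENAI_DEPLOYMENT"],
    })
    return f"{endpoint}/openai/realtime?{query}"


if __name__ == "__main__":
    # Reads the process environment, so load your .env first
    # (for example with the python-dotenv package).
    try:
        print(build_realtime_url(check_settings(os.environ)))
    except ValueError as err:
        print(f"Configuration problem: {err}")
```

The API key is checked for presence but deliberately left out of the printed URL, so the output is safe to paste into a bug report.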
Update the Sample Code
The only thing you need to update is the system prompt (line 61 in app.py). The system prompt modifies the behavior of the conversational bot to adapt to your needs, like a spooky fortune teller or a Spanish tutor :)
Tips for writing a good system prompt:
- Be instructive and tell the bot what to focus on: write as if you're talking directly to the AI.
Example: "You are a... You do... You talk like... You like to ask about..."
- Be super specific and give your bot personality adjectives.
Example: "you are a spooky fortune teller that gives eerie but positive predictions about the future for those brave enough to ask. You use metaphors and reference the stars. You ask questions to provide more clarity and find what matters to the human you are talking to."
- Use capital letters, double asterisks (** **) to indicate bold, or hashes (#) to make things more prominent. Explore all three to see how each changes the behavior of the bot.
Example: "You MUST NEVER talk about..." or "You **must never** talk about..." or "#You must never talk about..."
- Use hierarchical approaches.
Example: "Use A by default, but if not A then do B, and if not A or B then do C."
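If you end up trying lots of personas, the tips above can be folded into a small helper that assembles the prompt from parts. This is only a sketch: `build_system_prompt` and its parameters are hypothetical names, not something in app.py, and the rules and fallbacks in the usage example are made up for illustration. You would paste the resulting string wherever app.py defines its system prompt.

```python
def build_system_prompt(persona, style, hard_rules, fallbacks):
    """Join the prompt pieces into one instruction string."""
    # Talk directly to the AI: "You are a..."
    lines = [f"You are {persona}.", style]
    # Capital letters make the hard rules stand out.
    lines += [f"You MUST NEVER {rule}." for rule in hard_rules]
    # Ordered fallbacks express the hierarchical "A, else B, else C" tip.
    if fallbacks:
        first, *rest = fallbacks
        chain = f"By default, {first}"
        for option in rest:
            chain += f"; if that is not possible, {option}"
        lines.append(chain + ".")
    return "\n".join(lines)


prompt = build_system_prompt(
    persona="a spooky fortune teller who gives eerie but positive predictions",
    style="You use metaphors and reference the stars.",
    hard_rules=["predict anything frightening"],  # illustrative rule
    fallbacks=[
        "ask a question about what matters to the visitor",
        "offer a general prediction about the week ahead",
    ],
)
print(prompt)
```

Keeping the persona, rules, and fallbacks as separate arguments makes it easy to swap one piece at a time and hear how the bot's behavior changes.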
Other Simple (and fun!) changes
The RealtimeClient class is defined in the __init__.py file at line 372.
This is where you can change the voice type, supported modalities, input and output audio formats, and temperature (how random or varied the bot's responses are), give the bot tool capabilities like searching the Web, and more.
I had lots of fun playing around with different voices for the different projects, definitely recommend exploring that!
Take a look at the documentation to learn more: OpenAI Platform
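To give a feel for what those settings look like on the wire, here is a rough sketch of a `session.update` event, the message the Realtime API uses to change session settings. The helper `make_session_update` is a hypothetical name and is not how the sample's RealtimeClient is written; the field names follow the Realtime API's session object as I understand it, and the 0.6 to 1.2 temperature range is taken from the API documentation at the time of writing, so double-check both against the docs linked above.

```python
import json

# Audio formats the session object accepts, per the API documentation.
ALLOWED_AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}


def make_session_update(instructions, voice="alloy", temperature=0.8,
                        audio_format="pcm16"):
    """Build a session.update event as a JSON-serializable dict."""
    if audio_format not in ALLOWED_AUDIO_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format}")
    if not 0.6 <= temperature <= 1.2:
        raise ValueError("temperature should be between 0.6 and 1.2")
    return {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,  # the system prompt
            "voice": voice,
            "input_audio_format": audio_format,
            "output_audio_format": audio_format,
            "temperature": temperature,
        },
    }


event = make_session_update(
    "You are a spooky fortune teller.", voice="shimmer", temperature=0.9
)
# This JSON string is what would be sent over the WebSocket.
print(json.dumps(event, indent=2))
```

Validating the values before sending means a typo in a voice experiment fails fast in Python instead of coming back as an error event from the service.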
Run the Application Locally
1. Install the project dependencies. Open the terminal and navigate to the src folder of the repository. Then run the following command to install the needed Python packages:
pip install -r requirements.txt
2. Run the following command to start the application:
chainlit run app.py -w
3. When you run the chainlit app, it should automatically open in your default browser. If not, open a browser and navigate to:
http://localhost:8000/
More to Explore
That's it! You can go and build all the fun projects for yourself and your friends with the Realtime API! Some ideas for taking this further:
- Deploy on a Raspberry Pi and embed in a portable (or yard) prop for the holidays!
- Another Pi idea: put it in a doggy backpack and pretend that you can talk to your dog :D
- Use the framework to deploy a real app! If you work in Telecom, maybe you could FINALLY make a better customer service voice assistant.
Happy making!