In this project I'm going to walk you through making a magical mirror using a Raspberry Pi and Gemini. I'm specifically using the Gemini Live API, which is currently in preview, so things may need to be updated later if you are viewing this tutorial a while after it's published, but that's OK! Have fun with it, and I hope you learn something useful.
This project is also using the new JavaScript/TypeScript SDK, so you can find the documentation here for expanding on the project.
Initial Hardware Setup
This project is based on this magic mirror project. While I stuck very closely to how they built the mirror, I did do a few things differently, which I'll cover here. That said, how you build your mirror has a very good chance of being different due to monitor sizes, materials and tools available, and a few other factors, so this is more an explanation around how I built *my* mirror, and hopefully it's a helpful addition to the original (and great) tutorial.
Here's a list of materials that I used for this project (I'll link to them on Amazon, though depending on when you read this, the links may no longer work, so I'll try my best to describe the items as well). I do want to mention that I'm not associated with *any* of the things I bought or list here; they just happened to be what I picked up or already had on hand to make this project work - if you have something you think would work better, absolutely use it and let others know in the comments.
- A Raspberry Pi. I used a 3B because I have an obscene number of them, and I figured if I can make this project work with a device that only has 1GB of RAM, then others should be golden if they have a more powerful board. Make sure you have a power cord that is appropriate for the Pi - I use one with a power toggle that I really like so I can easily turn things on and off.
- An SD card. I'm using a 32GB card, but you could easily use a smaller one and still be fine.
- A mini HDMI to HDMI cable. I'm using one that uses a left 90 degree angle to better fit/hide it in the setup. If you're using a Raspberry Pi that is a different version than the 3B or a different monitor that does not have a mini HDMI port, you may need a different connector.
- This monitor (KYY Portable Monitor 15.6inch) because it's thin, small, and most importantly, affordable.
- 18x24x0.04 inch two-way mirrored acrylic. You'll want to make sure anything you buy is two-way mirrored because the way this is set up, the monitor is placed behind it, so you want a reflective surface that still allows light to pass through.
- 11.69x16.53x0.08 inch clear acrylic sheets. Thickness doesn't matter as much for this, just the height and width. This will be used as a piece to hold the Raspberry Pi and monitor internals together.
- Black cardstock. This is for hiding the edges of the mirror around the monitor, so you'll just need to make sure it's large enough to be cut down to 16" x 10.75".
- A microphone. I'm using an AT2020 because I already owned it, but you should be able to use whatever you have available. When we get to the code for this tutorial, I'll point out the one value that needs to be changed to match your microphone's input sample rate (for example, I use 44,100 for the AT2020, but other microphones may be 16,000).
- Black duct tape for attaching things and blocking light from coming in through the edges of the cardstock to the monitor.
- Double sided mounting tape. This is used for attaching the monitor internals to the clear acrylic backing.
- M3 screws and spacers. You'll need a few different lengths, mentioned in the original tutorial. I personally went with a bigger variety box because these things are used in a lot of different projects.
- Tools to disassemble the monitor frame. I picked up a cheap set of electronics opening tools that worked out well, but I only used a couple of the pieces. You'll also want a heat gun or hair dryer to heat up and soften the glue attaching the monitor electronics to the plastic frame.
- Optional materials for a standing frame. I used some scrap MDF I had from another project and laser cut a stand that I found for free online (though I did scale it up), but I think you could also use a painting easel or anything else that works for you. I highly recommend using some kind of stand for this project.
Alright, I think that's all the supplies, so let's get into actually making this awesome project. I'm going to skip over how to disassemble the monitor because I think that's covered really well in the original tutorial, and I think credit should be given where credit is due for a great project start, so go check that out.
To start, I don't have the steadiest hand for scoring and snapping the mirrored acrylic, and cutting in various ways caused a lot of chipping. After a few failed attempts (and some ruined acrylic - sorry Google, but thanks for letting me expense this!), I went all in and cut my sheet into two 12"x18" pieces on a table saw, which worked out well. If you're more comfortable with other methods, that's great, but I just want to say up front that it isn't the easiest material to work with.
After getting my mirrored acrylic into a workable size, I figured I'd save myself the next headache and move everything to the laser cutter. Heads up: if you're not *very* familiar with laser cutters, I really recommend babysitting the machine during cuts, especially with the cardstock. Did I set a piece on fire during my first attempt? Absolutely. Will I set something on fire again in the future when I get careless? Probably, but hopefully not! Definitely be mindful of your tools.
If you happen to be using the same monitor that I linked above, the dimensions that I've found work for the overall mirror are 16" by 10.75", with a cutout in the black cardstock that's 13.25" by 7.25". I also added the holes along the edges for the M3 screws so you don't need to drill them or 3D print the jigs (though I did print those jigs on a separate attempt at this, and they do work out really well!). I've attached a 96 DPI PDF of the file that I designed in Affinity Designer. You'll want to cut the green and red shapes on every piece (mirrored acrylic, clear acrylic, and cardstock), but the blue center rectangle is only cut out of the cardstock.
At the end of this you should have the three separate materials all cut to matching sizes with aligned holes. Don't forget to remove the plastic covers on the two acrylic sheets before screwing everything together!
For the remainder of the mirror assembly, follow the original tutorial. Once you have your Raspberry Pi set up and attached to the mirror, you'll need to go through the steps of setting up the Magic Mirror software, making sure your display is rotated 90 degrees, the scale is correct, and everything starts up on boot. Once you have a base project, it's time to dive into the new Gemini-connected magic mirror code.
I highly recommend being able to SSH into your Raspberry Pi as everything else for this tutorial will be done through a terminal and git, though you can also hook up a keyboard to the mirror and interact with the device's terminal directly.
Initial Code Setup
All of the code for this project is attached to this Hackster.io project, though if you want to run the latest code directly on your mirror without building it from the ground up, you can clone this GitHub project into your modules folder under the Magic Mirror project on the Raspberry Pi, run npm install, and then update your configuration file to display the module. This will constantly stream audio, so you may want to modify the project to allow for budget considerations (for example, add push-to-talk on the microphone), but since this is a hack project, you should definitely modify it in any way that makes sense for you. If you are in a noisy environment, be aware that the API is currently very sensitive and prone to interruptions when it detects new sounds.
You will also need to update the configuration file to include your Gemini API key, which can be created or found under Google's AI Studio here.
While the Live API, which is the core of this project, is free up to a limited number of requests per day, image generation is not (at the time of this writing), so you may need to change your app depending on what you're attempting to do. You can find a full description of pricing here, as new models are constantly coming out with different capabilities and pricing.
The base for this project is the Magic Mirror Module template, which you can find and clone here if you would like to follow along from the ground up. After you have a base project, it's time to update the Magic Mirror's config/config.js file. For reference, my addition for this module looks like this:
{
    module: 'MMM-Gemini',
    position: 'lower_third',
    config: {
        apiKey: 'MY_API_KEY_HERE',
    }
}
The most important part is your API key in the config file, as this is what the module passes to the helper to authenticate with the Gemini API, and it stays local to your machine.
There are three main files that we'll work with for this project: MMM-Gemini.js (though if you cloned the template, it will be called MMM-Template.js), which handles all of the UI, MMM-Gemini.css, which styles that UI, and node_helper.js, which is where all of the heavy lifting happens. To keep things simple, I'm going to skip over the CSS file, but you can find the code for it on this project page, or in the completed GitHub project.
As for the UI file, we're basically creating a UI state machine that goes through INITIALIZING, READY, RECORDING, and ERROR. The socketNotificationReceived function can receive a payload and notification from the helper that tells it which state should be displayed in the UI, and then it will update the DOM.
This function will also accept notifications for GEMINI_IMAGE_GENERATING and GEMINI_IMAGE_GENERATED to display a progress spinner or a generated image (received in base64 format) when that operation is invoked, or it will display any text generated by the Gemini Live API until a turnComplete response is received, then it will clear that text when the next response is sent. You can find all of the code for the UI attached to this project and use that as a starting point.
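If you'd like a feel for the shape of that file before grabbing the full version, here's a minimal sketch of the frontend module built on the standard MagicMirror² module API. It only covers the basic states and text display (the image notifications are left out), and the finished MMM-Gemini.js handles more cases, so treat this as a starting point rather than the real thing.
/* Minimal sketch of the UI state machine, assuming the notification names
   used by the node_helper later in this tutorial (HELPER_READY,
   RECORDING_STARTED, GEMINI_TEXT_RESPONSE, HELPER_ERROR). */
Module.register("MMM-Gemini", {
    defaults: { apiKey: null },

    start() {
        // Track a UI state and any text received so far
        this.currentState = "INITIALIZING"
        this.currentText = ""
        // Ask the node_helper to open the Gemini connection with our API key
        this.sendSocketNotification("START_CONNECTION", { apiKey: this.config.apiKey })
    },

    socketNotificationReceived(notification, payload) {
        switch (notification) {
            case "HELPER_READY":
                this.currentState = "READY"
                this.sendSocketNotification("START_CONTINUOUS_RECORDING")
                break
            case "RECORDING_STARTED":
                this.currentState = "RECORDING"
                break
            case "GEMINI_TEXT_RESPONSE":
                this.currentText += payload.text // Accumulate text until the turn completes
                break
            case "HELPER_ERROR":
                this.currentState = "ERROR"
                this.currentText = payload?.error || ""
                break
        }
        this.updateDom()
    },

    getDom() {
        const wrapper = document.createElement("div")
        wrapper.className = "mmm-gemini"
        wrapper.innerHTML = `<div class="status">${this.currentState}</div><div class="text">${this.currentText}</div>`
        return wrapper
    },
})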
If everything works as expected, you should have a UI similar to this:
With the core app put together, it's time to dig into the node_helper.js file and really get into the features of this magic mirror with Gemini!
Setting up the Gemini Live API and Speech Inputs
The first major step to making this magic mirror work is that we will need to be able to talk to the mirror, send that audio data to the Gemini Live API in real time, and wait for a response. To keep things simple, we'll start by asking for responses to be sent in text, and our UI will display it. Going into the node_helper.js file, let's add the various constants that will be required throughout this project.
const NodeHelper = require("node_helper")
const { GoogleGenAI, Modality, DynamicRetrievalConfigMode, Type, PersonGeneration } = require("@google/genai")
const recorder = require('node-record-lpcm16')
const { Buffer } = require('buffer')
const Speaker = require('speaker')
const INPUT_SAMPLE_RATE = 44100 // Recorder captures at 44.1KHz for AT2020, otherwise 16000 for other microphones. Hardware dependent
const OUTPUT_SAMPLE_RATE = 24000 // Gemini outputs at 24kHz
const CHANNELS = 1
const AUDIO_TYPE = 'raw' // Gemini Live API uses raw data streams
const ENCODING = 'signed-integer'
const BITS = 16
const GEMINI_INPUT_MIME_TYPE = `audio/pcm;rate=${INPUT_SAMPLE_RATE}`
const GEMINI_SESSION_HANDLE = "magic_mirror"
const GEMINI_MODEL = 'gemini-2.0-flash-live-001'
The NodeHelper constant should have already existed in your base project, as this is what the Magic Mirror project uses to know that this is the helper for the module.
Since this project is using the newest JavaScript/TypeScript SDK, we'll need to import the google/genai library, and include multiple type objects that will be used throughout the project. You can find more information on these types from the official documentation or source.
The buffer and speaker are both related to outputting audio, which we'll do later in this project. The recorder is what we'll use to record a live audio stream from the Pi's microphone to send to the Live API.
Moving into the next block, the INPUT_SAMPLE_RATE is related to the microphone that you have attached to your Raspberry Pi. Since I'm using an AT2020, my sample rate is 44100, but your microphone may have a different value.
The OUTPUT_SAMPLE_RATE is the expected sample rate from the Gemini API when we start playing back received audio; the API currently only outputs at 24kHz. The API uses one channel and outputs raw PCM audio data with signed-integer encoding at 16 bits. The GEMINI_INPUT_MIME_TYPE is the type of data that you'll send from the Pi to the API. The GEMINI_SESSION_HANDLE is a value that we'll use later to maintain session continuity between closes and reopens, as currently the Gemini Live API will automatically close a session after about ten minutes.
Finally, we have the GEMINI_MODEL. At the time of this writing the Gemini Live API is in preview, so this model will absolutely change over time based on when you are reading this tutorial. You'll want to check the Live API documentation to make sure you're using the best option for your project.
With that set, it's time to add all of the variables that will be used throughout this project. I initialize them in NodeHelper.create(), then I have an applyDefaultState() function that can be used to reset everything when the session closes or if an error occurs. I also added a set of logging functions for debugging. You don't *need* to include those, but I found them useful while working through this project, so I'll leave them as they are for this writeup. I also created a helper function called sendToFrontend to wrap sending the socket notification over to the UI frontend (MMM-Gemini.js).
module.exports = NodeHelper.create({
genAI: null,
liveSession: null,
apiKey: null,
recordingProcess: null,
isRecording: false,
audioQueue: [],
persistentSpeaker: null,
processingQueue: false,
apiInitialized: false,
connectionOpen: false,
apiInitializing: false,
imaGenAI: null,
// Logger functions
log: function(...args) { console.log(`[${new Date().toISOString()}] LOG (${this.name}):`, ...args) },
error: function(...args) { console.error(`[${new Date().toISOString()}] ERROR (${this.name}):`, ...args) },
warn: function(...args) { console.warn(`[${new Date().toISOString()}] WARN (${this.name}):`, ...args) },
sendToFrontend: function(notification, payload) { this.sendSocketNotification(notification, payload) },
applyDefaultState() {
this.genAI = null
this.liveSession = null
this.recordingProcess = null
this.isRecording = false
this.audioQueue = []
this.closePersistentSpeaker() // Close any open speaker before clearing the reference
this.persistentSpeaker = null
this.processingQueue = false
this.apiInitialized = false
this.connectionOpen = false
this.apiInitializing = false
this.imaGenAI = null
},
Most of these are for maintaining state, plus a closePersistentSpeaker() function that we'll add when it's time for audio out. If you're following along, feel free to comment that out until later. You'll also notice that there are values for genAI and imaGenAI. genAI is what will manage our live session, whereas imaGenAI is the Gemini object that will be used for image generation. If you're not using image generation in your project, you can remove that.
Now let's get into some of the good stuff. I have a function called initialize that will be used to kick off most of what we need for this project. For now I'll post an edited down version that we can add to as we go along.
async initialize(apiKey) {
this.log(">>> initialize called")
if (this.apiInitialized || this.apiInitializing) {
this.warn(`API initialization already complete or in progress. Initialized: ${this.apiInitialized}, Initializing: ${this.apiInitializing}`)
if (this.connectionOpen) {
this.log("Connection already open, sending HELPER_READY")
this.sendToFrontend("HELPER_READY")
}
return
}
if (!apiKey) {
this.error(`API Key is missing! Cannot initialize`)
this.sendToFrontend("HELPER_ERROR", { error: "API Key missing on server" })
return
}
this.apiKey = apiKey
this.apiInitializing = true
this.log(`Initializing GoogleGenAI...`)
try {
this.sendToFrontend("INITIALIZING")
this.log("Step 1: Creating GoogleGenAI instances...")
this.genAI = new GoogleGenAI({
apiKey: this.apiKey,
// httpOptions: { 'apiVersion': API_VERSION }
})
this.log(`Step 2: GoogleGenAI instance created.`)
this.log(`Step 3: Attempting to establish Live Connection with ${GEMINI_MODEL}...`)
this.liveSession = await this.genAI.live.connect({
model: GEMINI_MODEL,
callbacks: {
onopen: () => {
this.log(">>> Live Connection Callback: onopen triggered!")
this.connectionOpen = true
this.apiInitializing = false
this.apiInitialized = true
this.log("Connection OPENED. Sending HELPER_READY")
this.sendToFrontend("HELPER_READY")
},
onmessage: (message) => { this.handleGeminiResponse(message) },
onerror: (e) => {
this.error(`Live Connection ERROR: ${e?.message || e}`)
this.connectionOpen = false
this.apiInitializing = false
this.apiInitialized = false
this.liveSession = null
this.stopRecording(true)
this.closePersistentSpeaker() // Close speaker on error
this.processingQueue = false
this.audioQueue = []
this.sendToFrontend("HELPER_ERROR", { error: `Live Connection Error: ${e?.message || e}` })
},
onclose: async (e) => {
this.warn(`Live Connection CLOSED:`)
this.warn(JSON.stringify(e, null, 2))
const wasOpen = this.connectionOpen
if (wasOpen) {
this.sendToFrontend("HELPER_ERROR", { error: `Live Connection Closed Unexpectedly. Retrying...` })
} else { this.log("Live Connection closed normally") }
this.audioQueue = []
this.stopRecording(true)
this.closePersistentSpeaker() // Close speaker on close
this.applyDefaultState()
await this.initialize(this.apiKey)
},
},
config: {
responseModalities: [Modality.TEXT],
},
})
this.log(`Step 4: live.connect call initiated...`)
} catch (error) {
this.error(`API Initialization failed:`, error)
this.liveSession = null
this.apiInitialized = false
this.connectionOpen = false
this.apiInitializing = false
this.closePersistentSpeaker() // Ensure speaker is closed on init failure
this.processingQueue = false
this.audioQueue = []
this.sendToFrontend("HELPER_ERROR", { error: `API Initialization failed: ${error.message || error}` })
}
},
This code verifies that an API key is available, creates the GoogleGenAI object that is used to work with the Gemini API, and then creates a new live session. This live session uses the SDK's built-in WebSocket framework to handle sending and receiving data between the Gemini model and the Raspberry Pi, and it has a set of callbacks that will drive the state of the device. The most important is onmessage, which will send the response from Gemini to a new function that will determine how the mirror should react. There's also code in onclose that resets playback state and a few other things that haven't been written yet, so feel free to comment out the onclose callback until the end.
You'll also notice a config object. This will be a core part of this project, as it will contain every setting related to what we're doing with the mirror and the Gemini API. For now it'll just have a responseModalities value of [Modality.TEXT], meaning we want the Gemini model to only respond with text in the onmessage callback.
Moving down to the helper's socketNotificationReceived function, we'll want to go into the notification switch statement and add new cases for START_CONNECTION and START_CONTINUOUS_RECORDING. The START_CONNECTION case is what will call initialize, which will tell the frontend to update its UI state after initialization has completed. Once that UI state has been updated, another notification will be received to start recording. The full function looks like this:
socketNotificationReceived: async function(notification, payload) {
switch (notification) {
case "START_CONNECTION":
this.log(`>>> socketNotificationReceived: Handling START_CONNECTION`)
if (!payload || !payload.apiKey) {
this.error(`START_CONNECTION received without API key`)
this.sendToFrontend("HELPER_ERROR", { error: "API key not provided by frontend" })
return
}
try { await this.initialize(payload.apiKey) } catch (error) {
this.error(">>> socketNotificationReceived: Error occurred synchronously when CALLING initialize:", error)
this.sendToFrontend("HELPER_ERROR", { error: `Error initiating connection: ${error.message}` })
}
break
case "START_CONTINUOUS_RECORDING":
this.log(`>>> socketNotificationReceived: Handling START_CONTINUOUS_RECORDING`)
if (!this.connectionOpen || !this.liveSession) {
this.warn(`Cannot start recording, API connection not ready/open. ConnOpen=${this.connectionOpen}, SessionExists=${!!this.liveSession}`)
this.sendToFrontend("HELPER_ERROR", { error: "Cannot record: API connection not ready" })
if (!this.apiInitialized && !this.apiInitializing && this.apiKey) {
this.warn("Attempting to re-initialize API connection...")
await this.initialize(this.apiKey) // Await re-initialization
}
return
}
if (this.isRecording) {
this.warn(`Already recording. Ignoring START_CONTINUOUS_RECORDING request`)
return
}
this.startRecording()
break
}
},
Now actually doing the recording is a big step, so I'll break it into smaller parts. We'll start by creating a new function called startRecording(). This will check whether the device is already recording and whether the connection and live session are actually open. If we're already recording, or the connection isn't ready, the function will exit early.
startRecording() {
this.log(">>> startRecording called")
if (this.isRecording) {
this.warn("startRecording called but already recording")
return
}
if (!this.connectionOpen || !this.liveSession) {
this.error("Cannot start recording: Live session not open")
this.sendToFrontend("HELPER_ERROR", { error: "Cannot start recording: API connection not open" })
return
}
If everything is in the clear, then it's time to start recording. We can update our isRecording state value, then create a recorderOptions object.
this.isRecording = true
this.log(">>> startRecording: Sending RECORDING_STARTED to frontend")
this.sendToFrontend("RECORDING_STARTED")
const recorderOptions = {
sampleRate: INPUT_SAMPLE_RATE,
channels: CHANNELS,
audioType: AUDIO_TYPE,
encoding: ENCODING,
bits: BITS,
threshold: 0,
}
this.log(">>> startRecording: Recorder options:", recorderOptions)
this.log(`>>> startRecording: Using input MIME Type: ${GEMINI_INPUT_MIME_TYPE}`)
From there, it's time to call record on the recorder that we defined at the top of the file, then store a reference to the audio stream. I'm also creating a chunkCounter here for debugging, but you can ignore that if you want to have a bit cleaner code.
try {
this.log(">>> startRecording: Attempting recorder.record()...")
this.recordingProcess = recorder.record(recorderOptions)
this.log(">>> startRecording: recorder.record() call successful. Setting up streams...")
const audioStream = this.recordingProcess.stream()
let chunkCounter = 0 // Reset counter for new recording session
The audioStream will have a listener for any data that comes through. If a chunk isn't empty (empty chunks can happen if you unplug your mic, or if you have push-to-talk set up for when you're not using the mic), it'll convert that audio data into a base64 encoded string, which is then sent to the Gemini Live API as a new JSON payload using the liveSession.sendRealtimeInput function.
audioStream.on('data', async (chunk) => {
if (!this.isRecording || !this.connectionOpen || !this.liveSession) {
if (this.isRecording) {
this.warn(`Recording stopping mid-stream: Session/Connection invalid...`)
this.stopRecording(true) // Force stop if state is inconsistent
}
return
}
if (chunk.length === 0) {
return // Skip empty chunks
}
const base64Chunk = chunk.toString('base64')
chunkCounter++ // Increment counter for valid chunks
try {
const payloadToSend = {
media: {
mimeType: GEMINI_INPUT_MIME_TYPE,
data: base64Chunk
}
}
// Check liveSession again just before sending
if (this.liveSession && this.connectionOpen) {
await this.liveSession.sendRealtimeInput(payloadToSend)
} else {
this.warn(`Cannot send chunk #${chunkCounter}, connection/session lost just before send`)
this.stopRecording(true) // Stop recording if connection lost
}
} catch (apiError) {
const errorTime = new Date().toISOString()
this.error(`[${errorTime}] Error sending audio chunk #${chunkCounter}:`, apiError)
if (apiError.stack) {
this.error(`Gemini send error stack:`, apiError.stack)
}
// Check specific error types if possible, otherwise assume connection issue
if (apiError.message?.includes('closed') || apiError.message?.includes('CLOSING') || apiError.code === 1000 || apiError.message?.includes('INVALID_STATE')) {
this.warn("API error suggests connection closed/closing or invalid state")
this.connectionOpen = false // Update state
}
this.sendToFrontend("HELPER_ERROR", { error: `API send error: ${apiError.message}` })
this.stopRecording(true) // Force stop on API error
}
})
For the rest of this function, I have listeners for error, end, and exit that I'll include here for completeness.
audioStream.on('error', (err) => {
this.error(`Recording stream error:`, err)
if (err.stack) {
this.error(`Recording stream error stack:`, err.stack)
}
this.sendToFrontend("HELPER_ERROR", { error: `Audio recording stream error: ${err.message}` })
this.stopRecording(true) // Force stop on stream error
})
audioStream.on('end', () => {
this.warn(`Recording stream ended`) // Normal if stopRecording was called, unexpected otherwise
if (this.isRecording) {
// This might happen if the underlying recording process exits for some reason
this.error("Recording stream ended while isRecording was still true (unexpected)")
this.sendToFrontend("HELPER_ERROR", { error: "Recording stream ended unexpectedly" })
this.stopRecording(true) // Ensure state is consistent
}
})
this.recordingProcess.process.on('exit', (code, signal) => {
const wasRecording = this.isRecording // Capture state before potential modification
this.log(`Recording process exited with code ${code}, signal ${signal}`) // Changed from warn to log
const currentProcessRef = this.recordingProcess // Store ref before nullifying
this.recordingProcess = null // Clear the reference immediately
if (wasRecording) {
// If we *thought* we were recording when the process exited, it's an error/unexpected stop
this.error(`Recording process exited unexpectedly while isRecording was true`)
this.sendToFrontend("HELPER_ERROR", { error: `Recording process stopped unexpectedly (code: ${code}, signal: ${signal})` })
this.isRecording = false // Update state
this.sendToFrontend("RECORDING_STOPPED") // Notify frontend it stopped
}
else {
// If isRecording was already false, this exit is expected (due to stopRecording being called)
this.log(`Recording process exited normally after stop request`)
}
})
} catch (recordError) {
this.error(">>> startRecording: Failed to start recording process:", recordError)
if (recordError.stack) {
this.error(">>> startRecording: Recording start error stack:", recordError.stack)
}
this.sendToFrontend("HELPER_ERROR", { error: `Failed to start recording: ${recordError.message}` })
this.isRecording = false // Ensure state is correct
this.recordingProcess = null // Ensure reference is cleared
}
},
I also have a stopRecording() function that is used for error cases and when the live stream closes. It's pretty straightforward: it resets everything and updates the UI, so I won't go into it in detail for this tutorial.
stopRecording(force = false) {
if (this.isRecording || force) {
if (!this.recordingProcess) {
this.log(`stopRecording called (Forced: ${force}) but no recording process instance exists`)
if (this.isRecording) {
this.warn("State discrepancy: isRecording was true but no process found. Resetting state")
this.isRecording = false
this.sendToFrontend("RECORDING_STOPPED") // Notify frontend about the state correction
}
return
}
this.log(`Stopping recording process (Forced: ${force})...`)
const wasRecording = this.isRecording // Capture state before changing
this.isRecording = false // Set flag immediately
// Store process reference before potentially nullifying it in callbacks
const processToStop = this.recordingProcess
try {
const stream = processToStop.stream()
if (stream) {
this.log("Removing stream listeners")
stream.removeAllListeners('data')
stream.removeAllListeners('error')
stream.removeAllListeners('end')
}
if (processToStop.process) {
this.log("Removing process 'exit' listener")
processToStop.process.removeAllListeners('exit')
this.log("Sending SIGTERM to recording process")
processToStop.process.kill('SIGTERM')
} else {
this.warn("No underlying process found in recordingProcess object to kill")
}
// Call the library's stop method, which might also attempt cleanup
this.log(`Calling recorder.stop()...`)
processToStop.stop()
} catch (stopError) {
this.error(`Error during recorder cleanup/stop():`, stopError)
if (stopError.stack) {
this.error(`Recorder stop() error stack:`, stopError.stack)
}
} finally {
// Don't nullify this.recordingProcess here; let the 'exit' handler do it.
if (wasRecording) {
this.log("Recording stop initiated. Sending RECORDING_STOPPED if process exits")
// Actual RECORDING_STOPPED is sent by the 'exit' handler or state correction logic
} else {
this.log("Recording was already stopped or stopping, no state change needed")
}
}
} else {
this.log(`stopRecording called, but isRecording flag was already false`)
// Defensive cleanup if process still exists somehow
if (this.recordingProcess) {
this.warn("stopRecording called while isRecording=false, but process existed. Forcing cleanup")
this.stopRecording(true) // Force stop to clean up the zombie process
}
}
},
Finally, let's add the handleGeminiResponse() function. This block will be updated for the various types of responses we get from the Gemini Live API, but for a base version, we'll simply want to retrieve the content of the message and check if text exists. If it does, we'll send that text chunk to the UI to display. I also have an if statement to check if setup is complete, but I'm not currently doing anything with that for my version of this project.
async handleGeminiResponse(message) {
if (message?.setupComplete) { return } // Ignore setup message
let content = message?.serverContent?.modelTurn?.parts?.[0]
// Handle Text
if (content?.text) {
this.log(`Extracted text: ` + content.text)
this.sendToFrontend("GEMINI_TEXT_RESPONSE", { text: content.text })
}
},
In addition, we can check to see if the Live API is telling us that the response turn has completed. This is a really valuable response because it tells us when the model is done generating text, and later it will tell us when it thinks playback for an audio response should be complete. You can add a block like the one below to your handleGeminiResponse now, as it will be used by the UI to clear text between responses.
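Here's a minimal sketch of that check. The GEMINI_TURN_COMPLETE notification name is just an assumption for illustration - match it to whatever your MMM-Gemini.js actually listens for.
// Notify the UI when the model finishes its turn so it can clear the
// displayed text before the next response starts streaming in.
if (message?.serverContent?.turnComplete) {
    this.log("Turn complete received")
    this.sendToFrontend("GEMINI_TURN_COMPLETE")
}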
Alright, so that was a lot, and I brushed over a bit of it because there is a lot of boilerplate for state management, but at this point you should be able to talk to the mirror and display the text response from the Gemini Live API.
Audio Responses and Interruptions
Now that we have text working, let's dig into how we can get audio back, since honestly, a magic mirror should really be talking to you. The way this works is that we'll set the responseModalities to Modality.AUDIO at initialization, and then when the Gemini Live API responds, it will send base64 encoded strings for multiple audio chunks that can be played back on the device. Since those responses come in quickly, rather than at the time they would be played out loud, we'll also need to create a queuing system that moves to the next audio chunk when the current one has finished playing. On top of all of this, the Gemini Live API supports interruptions, so if the user says something while audio is playing back, we can clear the queue, leave the speaker open, and wait for the next audio response to come back from the API.
Let's start by updating the live session's responseModalities.
responseModalities: [Modality.AUDIO],
We'll also create a new function called processQueue to handle the playback logic. Let's go over it in steps. First we'll want to see if the queue has anything in it. If it doesn't, we can close the speaker - unless the queue was just cleared by an interruption, in which case we leave the speaker open because more audio chunks should arrive soon.
processQueue(interrupted) {
// 1. Check Stop Condition (Queue Empty)
if (this.audioQueue.length === 0) {
this.log("_processQueue: Queue is empty. Playback loop ending")
// Speaker should be closed by the last write callback's .end()
// Safeguard: ensure flag is false and close speaker if it exists.
this.processingQueue = false
if (!interrupted && this.persistentSpeaker) {
this.warn("_processQueue found empty queue but speaker exists! Forcing close")
this.closePersistentSpeaker()
}
return
}
Next we can set the processingQueue flag to true for state management.
// 2. Ensure Playback Flag is Set
if (!this.processingQueue) {
this.processingQueue = true
this.log("processQueue: Starting playback loop")
}
Then we will want to check to see if our speaker is already created, otherwise we'll create a new one. One thing I want to call out here is that I'm specifically using a persistent speaker that's stored at the class level because we want to minimize the amount of times the speaker is created and destroyed. It's a fine balance with memory, which may be less of an issue for you if you're using a newer Raspberry Pi with more than 1GB of RAM.
// 3. Ensure Speaker Exists (Create ONLY if needed)
if (!this.persistentSpeaker || this.persistentSpeaker.destroyed) {
this.log("Creating new persistent speaker instance")
try {
this.persistentSpeaker = new Speaker({
channels: CHANNELS,
bitDepth: BITS,
sampleRate: OUTPUT_SAMPLE_RATE,
})
this.persistentSpeaker.once('error', (err) => {
this.error('Persistent Speaker Error:', err)
this.closePersistentSpeaker()
})
this.persistentSpeaker.once('close', () => {
this.log('Persistent Speaker Closed Event')
// Ensure state is clean if closed unexpectedly or after end()
this.persistentSpeaker = null
if (this.processingQueue) {
this.log('Speaker closed. Resetting processing flag')
this.processingQueue = false
}
})
this.persistentSpeaker.once('open', () => this.log('Persistent Speaker opened'))
} catch (e) {
this.error('Failed to create persistent speaker:', e)
this.persistentSpeaker = null
this.processingQueue = false
this.audioQueue = []
return
}
}
// Check again after attempting creation
if (!this.persistentSpeaker) {
this.error("Cannot process queue, speaker instance is not available")
this.processingQueue = false // Stop processing
return
}
Once we know we have a speaker available, we can retrieve the base64 encoded audio clip string from the queue and write it to a buffer before sending that buffer to the speaker to play back.
// 4. Get and Write ONE Chunk
const chunkBase64 = this.audioQueue.shift() // Take the next chunk
const buffer = Buffer.from(chunkBase64, 'base64')
this.persistentSpeaker.write(buffer, (err) => {
if (err) {
this.error("Error writing buffer to persistent speaker:", err)
// Speaker error listener should handle cleanup via closePersistentSpeaker()
// Avoid calling closePersistentSpeaker directly here to prevent race conditions
return
}
If everything has gone well up to this point, we'll check whether anything remains in the queue and have the processQueue function call itself to move on to the next chunk; otherwise, we'll end the speaker stream gracefully and exit the playback loop.
// 5. Decide Next Step (Continue Loop or End Stream)
if (this.audioQueue.length > 0) {
// More chunks waiting? Immediately schedule the next write
this.processQueue(false)
} else {
// Queue is empty *after* taking the last chunk
this.log("Audio queue empty after playing chunk. Ending speaker stream gracefully")
if (this.persistentSpeaker && !this.persistentSpeaker.destroyed) {
// Call end() - allows last chunk to play, then 'close' event fires
this.persistentSpeaker.end(() => {
this.log("Speaker .end() callback fired after last chunk write")
// The 'close' listener handles the actual state cleanup
})
} else {
// Speaker already gone? Ensure flag is false
this.processingQueue = false
}
}
})
},
And while we're here, let's define the closePersistentSpeaker function that is used for error cases. This doesn't do too terribly much besides close the speaker, remove listeners, and try to clean up our state.
closePersistentSpeaker() {
if (this.persistentSpeaker && !this.persistentSpeaker.destroyed) {
this.log("Closing persistent speaker...")
try {
// Remove listeners to prevent acting on events after initiating close
this.persistentSpeaker.removeAllListeners() // Remove all listeners associated with this speaker
// Call end to flush and close gracefully
// The 'close' event should ideally handle state reset, but do it defensively here too
this.persistentSpeaker.end(() => {
this.log("Speaker .end() callback fired during closePersistentSpeaker")
})
this.persistentSpeaker = null
this.processingQueue = false // Reset state immediately after initiating close
this.log("Speaker close initiated, state reset")
} catch (e) {
this.error("Error trying to close persistent speaker:", e)
this.persistentSpeaker = null // Ensure null even if close fails
this.processingQueue = false
}
} else {
// If speaker doesn't exist or already destroyed, ensure state is correct
this.persistentSpeaker = null
this.processingQueue = false
}
}
Finally, before testing this, let's make sure we're handling the audio and interrupt message types that come back from the Gemini Live API. We can do this by adding the following blocks to the handleGeminiResponse function.
// Handle the interrupt flag
if(message?.serverContent?.interrupted) {
this.log("message: " + JSON.stringify(message))
this.log("*** Interrupting ***")
this.audioQueue = []
this.processQueue(true)
return
}
// Extract and Queue Audio Data
let extractedAudioData = content?.inlineData?.data
if (extractedAudioData) {
this.audioQueue.push(extractedAudioData)
// --- Trigger Playback if Threshold Reached and Not Already Playing ---
if (!this.processingQueue) {
this.log(`Starting playback`)
this.processQueue(false) // Start the playback loop
}
}
Now you should be able to restart your mirror module to have a conversation with it, as well as interrupt it in the middle of audio playback to change the course of the dialog. Pretty cool, right?
Function Calling, Search Grounding, and Image Generation
Now that we have the core of the project in, it's time to take it a step further. Function calling is one of my favorite features of the Gemini API, as it opens up any device or app using Gemini to doing really interesting things based on interactions with the model. To enable function calling with our mirror, we'll need to go back to the config object in initialize and add a tools array. This will include a functionDeclarations array with one function to generate images, which I've named generate_image, as well as a description that the Gemini model uses to know when it should call that function, and any other instructions related to the function. For this case, I've told the mirror that it should be whimsical and fun while using a fantasy painting style. We'll get more into adding personality to the mirror in a little bit. Within that individual function, we'll also need to include a parameter for the prompt that will be used for generating an image.
config: {
responseModalities: [Modality.AUDIO],
tools: [{
functionDeclarations: [
{
name: "generate_image",
description: "This function is responsible for generating images that will be displayed to the user when something is requested, such as the user asking you to do something like generate, show, display, or saying they want to see *something*, where that something will be what you create an image generation prompt for. Style should be like a detailed realistic fantasy painting. Keep it whimsical and fun. Remember, you are the all powerful and light-hearted magical mirror.",
parameters: {
type: Type.OBJECT,
description: "This object will contain a generated prompt for generating a new image through the Gemini API",
properties: {
image_prompt: {
type: Type.STRING,
description: "A prompt that should be used with image generation to create an image requested by the user using Gemini. Be as detailed as necessary."
},
},
},
required: ['image_prompt'],
},
]
}]
},
In addition to function calling, there are two more tools that I've added to the mirror in this section. The first is googleSearch. This lets the Gemini model use Google Search for things like finding the weather or the current time. I also enabled googleSearchRetrieval, allowing the mirror to do a Google search to find the latest and most relevant information about requests where it's applicable. You can add these two tools within the tools array just above functionDeclarations.
googleSearch: {},
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
}
},
At this point we should be able to expect function calls to be triggered by the Gemini Live API, so let's make sure we accept those messages in handleGeminiResponse. Returning to that function, we can check to see if a function call block exists in the message, and then we can send the function call payload to a separate function that will handle that code.
let functioncall = message?.toolCall?.functionCalls?.[0]
// Handle Function Calls
if (functioncall) {
await this.handleFunctionCall(functioncall)
}
Within handleFunctionCall, we'll make sure we have all of the information we need for a function, then use a switch statement to determine which function was called. Since we're only supporting one function right now, we'll either generate an image, or we'll exit this function.
// Handle function calls requested by Gemini
async handleFunctionCall(functioncall) {
let functionName = functioncall.name
let args = functioncall.args
if(!functionName || !args) {
this.warn("Received function call without name or arguments:", functioncall)
return
}
this.log(`Handling function call: ${functionName}`)
switch(functionName) {
case "generate_image":
let generateImagePrompt = args.image_prompt
if (generateImagePrompt) {
this.log(`Generating image with prompt: "${generateImagePrompt}"`)
this.sendToFrontend("GEMINI_IMAGE_GENERATING")
try {
const response = await this.imaGenAI.models.generateImages({
model: 'imagen-3.0-generate-002', // Consider making model configurable
prompt: generateImagePrompt,
config: {
numberOfImages: 1,
includeRaiReason: true,
personGeneration: PersonGeneration.ALLOW_ADULT,
},
})
// Handle potential safety flags/RAI reasons
if (response?.generatedImages?.[0]?.raiReason) {
this.warn(`Image generation flagged for RAI reason: ${response.generatedImages[0].raiReason}`)
this.sendToFrontend("GEMINI_IMAGE_BLOCKED", { reason: response.generatedImages[0].raiReason })
} else {
let imageBytes = response?.generatedImages?.[0]?.image?.imageBytes
if (imageBytes) {
this.log("Image generated successfully")
this.sendToFrontend("GEMINI_IMAGE_GENERATED", { image: imageBytes })
} else {
this.error("Image generation response received, but no image bytes found")
this.sendToFrontend("HELPER_ERROR", { error: "Image generation failed: No image data" })
}
}
} catch (imageError) {
this.error("Error during image generation API call:", imageError)
this.sendToFrontend("HELPER_ERROR", { error: `Image generation failed: ${imageError.message}` })
}
} else {
this.warn("generate_image call missing 'image_prompt' argument")
}
break
// Add other function cases here if needed
default:
this.warn(`Received unhandled function call: ${functionName}`)
}
},
Since this uses a different model from the one being used for the Live API, we'll also need to make sure we initialize the appropriate GoogleGenAI object in initialize.
this.imaGenAI = new GoogleGenAI({
apiKey: this.apiKey,
})
Now let's give this a shot by asking the mirror to create an image of something while telling us a story.
And since we've enabled search grounding, we can ask about current things, such as the time and weather in my home town of Boulder, Colorado.
Adding Personality
With everything we've done so far, we finally have a working magical mirror, but it just doesn't feel *magical*, does it? Let's fix that with a couple of the tools available in the Gemini SDK for giving the model a bit of a personality. Returning to our config object, let's add a systemInstruction object. We'll tell the AI that it is an all-knowing and powerful magical mirror that is fun, whimsical, and light-hearted, and that it takes joy from interacting with people and amazing them with its knowledge and abilities.
systemInstruction: {
parts: [ { text: 'You are an all-knowing and powerful magical mirror, an ancient artifact from a civilization and time long lost to memory. In your ancient age, you have embraced a personality of being fun, whimsical, and light-hearted, taking joy from your time interacting with people and amazing them with your knowledge and abilities.' }],
},
We can also add a new speechConfig object that lets us configure the voice to be something a little different. There are a few voices available, so you should play with different ones to see what works best for you. Here's a short list of what is available right now, but this could expand in the future: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.
And if we want to customize the voice a little more, we can also give it a language code. Personally, I feel like a magical mirror should speak French, so here's what my speechConfig looks like, though I also really like the voice "Puck" in English, which you can see in the video at the top of this tutorial.
speechConfig: {
languageCode: "fr-FR",
voiceConfig: {
prebuiltVoiceConfig: {
voiceName: "Aoede",
},
},
},
Wrapping up
At this point things are looking great, so let's do a few more touch-up items to really make this project stand out. One issue I've had is that I need to keep telling the AI to finish its story, so let's add a new sentence to the system instruction: "When you break from a story to show an image from the story, please continue telling the story after calling the function without needing to be prompted. You should also try to continue with stories without user input where possible - you are the all knowing mirror, amaze the viewer with your knowledge of tales."
The mirror also tries to revert to English during conversations, so let's directly tell it to respond to users in whichever language they use to speak with the mirror by adding another sentence to the system instructions: "Respond in the input audio language from the speaker if you detect a non-English language. You must respond unmistakably in the language that the speaker inputs via audio, please." Since this multi-language support comes essentially out of the box, I'm a pretty big fan.
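For reference, here's what the systemInstruction ends up looking like with both of those sentences appended to the original personality text. This is just the three pieces from above stitched together, so adjust the wording however you like.
systemInstruction: {
    parts: [{
        // Original personality, plus the storytelling and language additions
        text: 'You are an all-knowing and powerful magical mirror, an ancient artifact from a civilization and time long lost to memory. In your ancient age, you have embraced a personality of being fun, whimsical, and light-hearted, taking joy from your time interacting with people and amazing them with your knowledge and abilities. ' +
            'When you break from a story to show an image from the story, please continue telling the story after calling the function without needing to be prompted. You should also try to continue with stories without user input where possible - you are the all knowing mirror, amaze the viewer with your knowledge of tales. ' +
            'Respond in the input audio language from the speaker if you detect a non-English language. You must respond unmistakably in the language that the speaker inputs via audio, please.'
    }],
},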
And that's it for this project! There's still so much more you could do to modify it, so if you build your own magic mirror, definitely have fun with it. Add a camera, try generating videos to display for the user, play around with languages and personalities, or try creating agentic systems to really turn the mirror into your own personal magical assistant. Be sure to share any projects you make with me in the comments section below, and have fun.