David Packman
Published © CC BY

ClippyGPT

Everyone is asking if ChatGPT is actually Clippy in disguise. Well, what better way to find out than building a ChatGPT-powered Clippy?

Intermediate · Full instructions provided · 24 hours · 12,049 views

Things used in this project

Hardware components

Small through-hole protoboard or small breadboard
You can always cut down a protoboard to suit your needs. You'll use this to build a power splitter that provides 5v power to all the parts via a 1-to-3 split.
×1
USB Adapter, Right-Angle
You may want to get a 4-pack from Amazon so you have both left- and right-angle adapters and can be sure you have the right part.
×1
5v 4a power adapter
If you use a higher-voltage adapter, you'll need to add a 5v step-down regulator. There are mounting holes for a https://www.pololu.com/product/4091 regulator.
×1
Adafruit CRICKIT for Raspberry Pi
The CRICKIT should work on any SBC with a Pi-compatible GPIO. As an alternative, you can use a serial or I2C PWM/servo controller instead; there are mounting holes for an Adafruit 16-channel PWM/Servo controller (see the servo sketch after this parts list).
×1
Adafruit Mono 2.5W audio amp
While the CRICKIT has a built-in 3W amp that you could leverage instead, it can be difficult to get working, so this amp is an alternative; mounting points for it are included in the build.
×1
SG90 Micro-servo motor
Note: you can also use the Feetech FS90 if there are none at Adafruit; there are a lot of TowerPro SG90 knock-offs of variable quality. You can find the FS90 at https://www.pololu.com/product/2818.
×2
Flat ribbon HDMI cable with HDMI or Micro HDMI headers
These are basically HDMI ribbon cables made for drone cameras. If you're using an SBC with micro-HDMI output, look for HDMI ribbon cable sets that include micro-HDMI as well as regular HDMI terminals.
×1
Raspberry Pi 3 Model B
Hey, these are hard to come by, so you can also use another brand of SBC with a RasPi pinout/mounting pattern like the Libre Le Potato, Libre Renegade, or Rock Pi. Just make sure the USB ports are positioned like they are on the RasPi 3B, or the right-angle USB adapter won't fit and you'll need to figure out how to make a USB extension cable work instead. :(
×1
Speaker with a 1.75 inch mounting hole pattern
This build used a salvaged speaker from an Alexa Echo Dot, so you'll want to find something with a similar hole pattern.
×1
Elecrow 5 inch HDMI display
Basically any 5 inch HDMI display with the HDMI and power connectors at the top of the display will work.
×1
Fishing line
Any strong thread with minimal stretch will do. I use 100-pound-test braided line.
×1
Female barrel jack
×1
M2, M2.5 and M3 Screws, nuts, and heatset nuts
I used a variety of M2, M2.5, and M3 screws, nuts, heatset nuts, square nuts, and standoffs in this build.
×1
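
If you go the PWM/servo controller route mentioned for the CRICKIT above, driving the servos is nearly identical. Here's a minimal sketch, assuming an Adafruit 16-channel PWM/Servo controller (PCA9685) and the adafruit_servokit library; the channel number is a placeholder for wherever you plug the servos in:

import time
from adafruit_servokit import ServoKit

# 16-channel PCA9685 servo controller on the default I2C address
kit = ServoKit(channels=16)

# Wiggle an eyebrow servo on channel 0, mirroring the crickit.servo_2
# moves in the sample code below
kit.servo[0].angle = 100
time.sleep(0.5)
kit.servo[0].angle = 150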

Software apps and online services

Raspberry Pi Raspbian
You can use Debian or a Debian-based distro like Raspbian, whatever your SBC of choice uses. However, the speech services require OpenSSL 1.x, which newer Ubuntu releases no longer ship, so Ubuntu won't work.
OpenAI API Python library
Microsoft Azure Speech Services
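
Before assembling everything, it's worth sanity-checking both services from your SBC. Here's a minimal smoke test, assuming you've installed the openai and azure-cognitiveservices-speech packages; the key, region, and prompt strings are placeholders to replace with your own:

import openai
import azure.cognitiveservices.speech as speechsdk

# Azure Speech: synthesize a short phrase through the default audio output
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("It looks like your speech key works.").get()

# OpenAI: request a one-line completion to confirm the API key
openai.api_key = "sk-YOUR_KEY"
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello like Clippy."}],
    max_tokens=25,
)
print(response['choices'][0]['message']['content'])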

Hand tools and fabrication machines

Soldering iron (generic)

Story


Schematics

Offline Wake Word training table

This is the wake word table for the "Hey Clippy" wake word. Download it to the same location as the sample Python code if you want to use the "Hey Clippy" wake word.

Code

Clippy sample python code

Python
Copy this to your SBC, then edit it as follows:
1. line 59 - replace the x's with your Azure Speech key
2. line 60 - replace the x's with your Azure Speech region
3. line 61 - replace the x's with your OpenAI API key
NOTE: You will probably want to adjust the servo angles for your servos wherever you see "crickit.servo" (see the calibration sketch after the code listing).
import os
import time
import openai
import tiktoken
import azure.cognitiveservices.speech as speechsdk
from adafruit_crickit import crickit


def keyword_from_microphone():

    """runs keyword spotting locally, with direct access to the result audio"""
    # Creates an instance of a keyword recognition model. Update this to
    # point to the location of your keyword recognition model.
    model = speechsdk.KeywordRecognitionModel("dd238e75-10d4-4c44-a691-9098aeac7e28.table")
    # The phrase your keyword recognition model triggers on, matching the keyword used to train the above table.
    keyword = "Hey Clippy"

    # Create a local keyword recognizer with the default microphone device for input.
    keyword_recognizer = speechsdk.KeywordRecognizer()
    done = False

    def recognized_cb(evt):
        # Only a keyword phrase is recognized. The result cannot be 'NoMatch'
        # and there is no timeout. The recognizer runs until a keyword phrase
        # is detected or recognition is canceled (by stop_recognition_async()
        # or due to the end of an input file or stream).
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedKeyword:
            print("RECOGNIZED KEYWORD: {}".format(result.text))
        nonlocal done
        done = True

    def canceled_cb(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.Canceled:
            print('CANCELED: {}'.format(result.cancellation_details.reason))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the keyword recognizer.
    keyword_recognizer.recognized.connect(recognized_cb)
    keyword_recognizer.canceled.connect(canceled_cb)

    # Start keyword recognition.
    result_future = keyword_recognizer.recognize_once_async(model)
    print('Clippy is ready to help, say "{}" to wake him...'.format(keyword))
    result = result_future.get()

    # If the keyword was recognized, wiggle the eyebrows and start responding.
    if result.reason == speechsdk.ResultReason.RecognizedKeyword:
        crickit.servo_2.angle = 100
        time.sleep(.5)
        crickit.servo_2.angle = 150
        Responding_To_KW()


def Responding_To_KW():
    # Let's add our api keys and other api settings here
    speech_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"   #REPLACE X's WITH YOUR AZURE SPEECH SERVICES API KEY!!!
    service_region = "xxxxxxx"                       #REPLACE X's WITH YOUR AZURE SERVICE REGION!!!
    openai.api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxx" #REPLACE X's WITH YOUR OPENAI API KEY!!!

    # Let's configure our speech services settings here
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # Set voice, there are many to choose from in Azure Speech Studio
    speech_config.speech_synthesis_voice_name = "en-US-GuyNeural"
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Set response string
    resp_text = "How can I help?"
    # say response
    result = speech_synthesizer.speak_text_async(resp_text).get()

    # Wait until finished talking
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        # Then listen for a response
        speech_recognition_result = speech_recognizer.recognize_once_async().get()
        # After a response is heard
        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
            #print the response to the console
            print("Recognized: {}".format(speech_recognition_result.text))
            #Move eyebrows to signal question received 
            crickit.servo_2.angle = 100
            crickit.servo_1.angle = 90
            time.sleep(.5)
            crickit.servo_1.angle = 140
            time.sleep(.5)
            crickit.servo_2.angle = 150

            # Check to see if chat mode is initiated
            if speech_recognition_result.text == "Let's chat.":
                # Set response
                resp_text = "Ok, what would you like to chat about?"
                # Print, then say response
                print(resp_text)
                result = speech_synthesizer.speak_text_async(resp_text).get()
                #wait until done speaking
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    #Configure chat settings
                    system_message = {"role": "system", "content": "You are Clippy, the digital assistant, and you provide succinct and helpful advice"}
                    max_response_tokens = 250
                    token_limit= 4096 #this is the token limit for GPT-3.5, adjust if using another model
                    conversation=[]
                    conversation.append(system_message) #this keeps the system role in the conversation

                    # Here is where we count the tokens
                    def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
                        encoding = tiktoken.encoding_for_model(model)
                        num_tokens = 0
                        for message in messages:
                            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
                            for key, value in message.items():
                                num_tokens += len(encoding.encode(value))
                                if key == "name":  # if there's a name, the role is omitted
                                    num_tokens += -1  # role is always required and always 1 token
                        num_tokens += 2  # every reply is primed with <im_start>assistant
                        return num_tokens
                    
                    while(True):
                        #Now we start listening
                        speech_recognition_result = speech_recognizer.recognize_once_async().get()
                        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                            #Do the eyebrow thing
                            crickit.servo_2.angle = 100
                            time.sleep(.5)
                            crickit.servo_2.angle = 150
                            #print what it heard
                            print("Recognized: {}".format(speech_recognition_result.text))
                            user_input = speech_recognition_result.text     
                            # Append the latest prompt to the conversation
                            conversation.append({"role": "user", "content": user_input})
                            # and do the token count
                            conv_history_tokens = num_tokens_from_messages(conversation)

                            # check token count
                            while (conv_history_tokens+max_response_tokens >= token_limit):
                                # And delete the top section if count is too high
                                del conversation[1] 
                                conv_history_tokens = num_tokens_from_messages(conversation)
                
                            response = openai.ChatCompletion.create(
                                model="gpt-3.5-turbo", # Set the model to be used
                                messages = conversation, #send the conversation history
                                temperature=.6, #set the temperature, lower is more specific, higher more random
                                max_tokens=max_response_tokens, #set max tokens based on count
                            )
                            
                            #format conversation and print response
                            conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
                            response_text = response['choices'][0]['message']['content'] + "\n"
                            #print and say response
                            print(response_text)
                            result = speech_synthesizer.speak_text_async(response_text).get()
                            #check for exit phrase
                            if "I'm done" in speech_recognition_result.text:
                                keyword_from_microphone()

            # If not chat mode, then...
            else: 
                #Send question as prompt to ChatGPT
                completion_request = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",                  #Here's where you pick which model to use
                    messages = [
                        #The system role is set here, where you can somewhat guide the response personality
                        {"role": "system", "content": "You are Clippy, the digital assistant, and you provide succinct and helpful advice"},
                        #This is where the prompt is set
                        {"role": "user", "content": (speech_recognition_result.text)},
                        ],
                    max_tokens=250,                             #Max number of tokens used
                    temperature=0.6,                            #Lower is more specific, higher is more creative responses
                )

                #Get and print response
                response_text = completion_request.choices[0].message.content
                print(response_text)
                #Say response
                result = speech_synthesizer.speak_text_async(response_text).get()
                #Go back once done talking
                if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                    keyword_from_microphone()

keyword_from_microphone()
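
Since every SG90 clone centers a bit differently, a quick calibration sweep helps you find the angle limits to use wherever the script sets crickit.servo angles. Here's a minimal sketch, assuming the same CRICKIT wiring as the code above; the 60-160 sweep range is just a starting point:

import time
from adafruit_crickit import crickit

# Sweep both eyebrow servos slowly and note where the linkages start to bind
for angle in range(60, 161, 10):
    crickit.servo_1.angle = angle
    crickit.servo_2.angle = angle
    print("servo angle:", angle)
    time.sleep(0.5)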

Credits

David Packman

I make robot friends
