Published April 26, 2017 © MIT

Offline Speech Processing

Create your very own hotwords like "Alexa," "Ok Google" and "Hey Cortana," and trigger events just by speaking - without the Internet.

Intermediate • Protip • 1 hour • 16,651

Things used in this project

Hardware components

UDOO QUAD

Logitech C270 webcam (it has a built-in mic; a separate mic also works)

Software apps and online services

Udoobuntu

Snowboy

Story

Offline Speech Processing Demo

I was working on a project with my Udoo Quad board and needed to add speech processing to it. The first thing that came to mind was Google's speech API, but it only works online and there is a limit on how much I can use it. So I started looking for an offline speech processing API. I traveled the galaxy in search of the so-called "offline speech processing", and one day I stumbled upon a website called kitt.ai, where I finally found it. But wait: they call it Snowboy, a "hotword detector", and they don't build it for Udoo boards, although they do build it for the Raspberry Pi. I researched further on their website and found out that I can also use sentences. Since the Raspberry Pi and the Udoo board are somewhat similar, I decided to install the Raspberry Pi version on my Udoo board.

Here are the steps to follow to make Snowboy work on the Udoo.

Step 1

Download the Raspberry Pi version from this page: http://docs.kitt.ai/snowboy/#downloads

Step 2: Install Sox

$ sudo apt-get install python-pyaudio python3-pyaudio sox

Step 3: Install PortAudio’s Python bindings

$ pip install pyaudio
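
Before moving on, it can help to confirm that the interpreter can actually find the bindings. This is a minimal sketch, assuming Python 3.4 or newer; `missing_modules` is a hypothetical helper name and not part of PyAudio or Snowboy.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "pyaudio" is the module installed in Step 3.
    missing = missing_modules(["pyaudio"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("pyaudio is importable")
```

If the module is reported as missing, the install probably went to a different Python version than the one running the check.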

Step 4

To check whether you can record via your microphone, open a terminal and run:

$ rec temp.wav
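
After stopping the recording with Ctrl+C, the resulting file can be inspected to confirm that audio was captured. The following is a rough sketch that uses only Python's standard `wave` module and assumes `rec` wrote a plain PCM WAV file; `describe_wav` is a hypothetical helper, and temp.wav is the file name from the command above.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    wav = wave.open(path, "rb")
    try:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    finally:
        wav.close()
    # Duration is the number of frames divided by frames per second.
    return channels, rate, frames / float(rate)


# Example use after recording:
#   channels, rate, seconds = describe_wav("temp.wav")
#   print("%d channel(s), %d Hz, %.1f s" % (channels, rate, seconds))
```

A duration close to zero suggests that the microphone was not picked up as the default input device.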

Step 5: Running a demo

Extract the archive that you downloaded in Step 1.

Go to the "rpi-arm-raspbian-8.0-1.2.0/resources" folder, copy snowboy.umdl, and paste it one level up, outside the "resources" folder.

Run this in the terminal from the "rpi-arm-raspbian-8.0-1.2.0" folder:

$ python demo.py snowboy.umdl

When you say "snowboy", the output should look like this:

INFO:snowboy:Keyword 1 detected at time: 2017-01-01 01:01:14

However, you are more likely to get an error like the following, because this package is not built for Udoo boards:

ImportError: /usr/lib/arm-linux-gnueabihf/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/udooer/rpi-arm-raspbian-8.0-1.2.0/_snowboydetect.so)

So let's solve this error.

Step 6: Solving the error

Try this:

$ sudo apt-get update 
$ sudo apt-get install gcc-4.7 g++-4.7

If it doesn't work, create the source file manually:

$ sudo nano /etc/apt/sources.list.d/toolchain.list

Paste this content:

deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu precise main
deb-src http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu precise main

Exit with Ctrl-X, then press Y and Enter to confirm saving the file.

After adding those lines, run this command to import the repository's signing key, which fixes the key error:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 1E9377A2BA9EF27F

Then run these commands in the terminal:

$ sudo apt-get update
$ sudo apt-get install gcc-4.7 g++-4.7

It should fix the error. Now run the demo again.
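
Before re-running the demo, it can be useful to check whether the installed libstdc++ now contains the version tag named in the ImportError. The sketch below is a heuristic that scans the library bytes for `GLIBCXX_x.y.z` strings, much like running `strings` on the file; it assumes Python 3, and `glibcxx_versions` and `provides` are hypothetical helper names.

```python
import re


def glibcxx_versions(data):
    """Return the GLIBCXX version tags found in raw library bytes, sorted numerically."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted((t.decode("ascii") for t in tags),
                  key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(path, wanted="3.4.20"):
    """Return True if the library file at `path` contains the tag `wanted`."""
    with open(path, "rb") as f:
        return wanted in glibcxx_versions(f.read())


# Example use, with the library path taken from the ImportError above:
#   print(provides("/usr/lib/arm-linux-gnueabihf/libstdc++.so.6"))
```

If the tag is still absent, the library was not upgraded and the demo will keep failing with the same error.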

Step 7: Create Your Own Hotword

You can improve the hotwords that I have prepared, such as YES and NO.

Download the YES and NO hotwords, because the code below uses them.

Code

New Demo Code

import snowboydecoder


# Demo code for listening for two hotwords at the same time
def hotWord(models):
    # One sensitivity value per model
    sensitivity = [0.5] * len(models)
    detector = snowboydecoder.HotwordDetector(models, sensitivity=sensitivity)
    print('Listening... Press Ctrl+C to exit')

    # Main loop: the modified start() returns the number of the detected
    # hotword as a string ('1' for the first model, '2' for the second).
    # The single callback given here is reused for every model.
    word = detector.start(detected_callback=snowboydecoder.play_audio_file,
                          sleep_time=0.03)
    # Release the microphone once a hotword has been detected
    detector.terminate()
    return word


words = ['Yes.pmdl', 'No.pmdl']
word = hotWord(words)
if word == '1':
    print("Yes")
elif word == '2':
    print("No")
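
To trigger something more useful than a print statement, the string returned by the modified `start()` can be mapped to handler functions. This is a minimal sketch, assuming the return values '1' and '2' follow the order of the model list as in the demo; `dispatch`, `ACTIONS`, `on_yes` and `on_no` are hypothetical names.

```python
def on_yes():
    """Placeholder action for the first hotword."""
    return "Yes"


def on_no():
    """Placeholder action for the second hotword."""
    return "No"


# Keys follow the order of the models passed to HotwordDetector:
# 'Yes.pmdl' is hotword 1, 'No.pmdl' is hotword 2.
ACTIONS = {"1": on_yes, "2": on_no}


def dispatch(word, actions=ACTIONS):
    """Call the handler registered for `word`; return None if there is none."""
    handler = actions.get(word)
    if handler is None:
        return None
    return handler()


# In the demo this would replace the if/elif chain:
#   result = dispatch(hotWord(['Yes.pmdl', 'No.pmdl']))
```

Adding a third hotword then only needs one more model file and one more entry in the dictionary.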

Modified snowboydecoder.py code

#!/usr/bin/env python

import collections
import pyaudio
import snowboydetect
import time
import wave
import os
import logging

logging.basicConfig()
logger = logging.getLogger("snowboy")
logger.setLevel(logging.INFO)
TOP_DIR = os.path.dirname(os.path.abspath(__file__))

RESOURCE_FILE = os.path.join(TOP_DIR, "resources/common.res")
DETECT_DING = os.path.join(TOP_DIR, "resources/ding.wav")
DETECT_DONG = os.path.join(TOP_DIR, "resources/dong.wav")


class RingBuffer(object):
    """Ring buffer to hold audio from PortAudio"""
    def __init__(self, size = 4096):
        self._buf = collections.deque(maxlen=size)

    def extend(self, data):
        """Adds data to the end of buffer"""
        self._buf.extend(data)

    def get(self):
        """Retrieves data from the beginning of buffer and clears it"""
        tmp = bytes(bytearray(self._buf))
        self._buf.clear()
        return tmp


def play_audio_file(fname=DETECT_DING):
    """Simple callback function to play a wave file. By default it plays
    a Ding sound.

    :param str fname: wave file name
    :return: None
    """
    ding_wav = wave.open(fname, 'rb')
    ding_data = ding_wav.readframes(ding_wav.getnframes())
    audio = pyaudio.PyAudio()
    stream_out = audio.open(
        format=audio.get_format_from_width(ding_wav.getsampwidth()),
        channels=ding_wav.getnchannels(),
        rate=ding_wav.getframerate(), input=False, output=True)
    stream_out.start_stream()
    stream_out.write(ding_data)
    time.sleep(0.2)
    stream_out.stop_stream()
    stream_out.close()
    audio.terminate()


class HotwordDetector(object):
    """
    Snowboy decoder to detect whether a keyword specified by `decoder_model`
    exists in a microphone input stream.

    :param decoder_model: decoder model file path, a string or a list of strings
    :param resource: resource file path.
    :param sensitivity: decoder sensitivity, a float or a list of floats.
                              The bigger the value, the more sensitive the
                              decoder. If an empty list is provided, then the
                              default sensitivity in the model will be used.
    :param audio_gain: multiply input volume by this factor.
    """
    def __init__(self, decoder_model,
                 resource=RESOURCE_FILE,
                 sensitivity=[],
                 audio_gain=1):

        def audio_callback(in_data, frame_count, time_info, status):
            self.ring_buffer.extend(in_data)
            play_data = chr(0) * len(in_data)
            return play_data, pyaudio.paContinue

        tm = type(decoder_model)
        ts = type(sensitivity)
        if tm is not list:
            decoder_model = [decoder_model]
        if ts is not list:
            sensitivity = [sensitivity]
        model_str = ",".join(decoder_model)

        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(), model_str=model_str.encode())
        self.detector.SetAudioGain(audio_gain)
        self.num_hotwords = self.detector.NumHotwords()

        if len(decoder_model) > 1 and len(sensitivity) == 1:
            sensitivity = sensitivity*self.num_hotwords
        if len(sensitivity) != 0:
            assert self.num_hotwords == len(sensitivity), \
                "number of hotwords in decoder_model (%d) and sensitivity " \
                "(%d) does not match" % (self.num_hotwords, len(sensitivity))
        sensitivity_str = ",".join([str(t) for t in sensitivity])
        if len(sensitivity) != 0:
            self.detector.SetSensitivity(sensitivity_str.encode())

        self.ring_buffer = RingBuffer(
            self.detector.NumChannels() * self.detector.SampleRate() * 5)
        self.audio = pyaudio.PyAudio()
        self.stream_in = self.audio.open(
            input=True, output=False,
            format=self.audio.get_format_from_width(
                self.detector.BitsPerSample() / 8),
            channels=self.detector.NumChannels(),
            rate=self.detector.SampleRate(),
            frames_per_buffer=2048,
            stream_callback=audio_callback)


    def start(self, detected_callback=play_audio_file,
              interrupt_check=lambda: False,
              sleep_time=0.03):
        """
        Start the voice detector. For every `sleep_time` second it checks the
        audio buffer for triggering keywords. If detected, then call
        corresponding function in `detected_callback`, which can be a single
        function (single model) or a list of callback functions (multiple
        models). Every loop it also calls `interrupt_check` -- if it returns
        True, then breaks from the loop and return.

        :param detected_callback: a function or list of functions. The number of
                                  items must match the number of models in
                                  `decoder_model`.
        :param interrupt_check: a function that returns True if the main loop
                                needs to stop.
        :param float sleep_time: how much time in second every loop waits.
        :return: the number of the detected hotword as a string (for
                 example '1'), or None if the loop was interrupted
        """
        if interrupt_check():
            logger.debug("detect voice return")
            return

        tc = type(detected_callback)
        if tc is not list:
            detected_callback = [detected_callback]
        if len(detected_callback) == 1 and self.num_hotwords > 1:
            detected_callback *= self.num_hotwords

        assert self.num_hotwords == len(detected_callback), \
            "Error: hotwords in your models (%d) do not match the number of " \
            "callbacks (%d)" % (self.num_hotwords, len(detected_callback))

        logger.debug("detecting...")

        while True:
            if interrupt_check():
                logger.debug("detect voice break")
                break
            data = self.ring_buffer.get()
            if len(data) == 0:
                time.sleep(sleep_time)
                continue

            ans = self.detector.RunDetection(data)
            if ans == -1:
                logger.warning("Error initializing streams or reading audio data")
            elif ans > 0:
                message = "Keyword " + str(ans) + " detected at time: "
                message += time.strftime("%Y-%m-%d %H:%M:%S",
                                         time.localtime(time.time()))
                #logger.info(message)
                callback = detected_callback[ans-1]
                if callback is not None:
                    callback()
                return str(ans)

        logger.debug("finished.")

    def terminate(self):
        """
        Terminate audio stream. Users cannot call start() again to detect.
        :return: None
        """
        self.stream_in.stop_stream()
        self.stream_in.close()
        self.audio.terminate()
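
The `interrupt_check` parameter of `start()` gives the detection loop a clean way to stop on Ctrl+C, after which `terminate()` can release the audio stream. Below is a minimal sketch of such a check; `InterruptFlag` is a hypothetical helper that is not part of Snowboy, and the wiring to the detector is shown only as comments because it needs the snowboydecoder module and a microphone.

```python
import signal


class InterruptFlag(object):
    """Callable that returns True once its signal handler has been invoked."""

    def __init__(self):
        self.interrupted = False

    def handler(self, signum, frame):
        # Signal handler: only record that the signal arrived.
        self.interrupted = True

    def __call__(self):
        return self.interrupted


# Intended wiring with the modified snowboydecoder.py:
#   flag = InterruptFlag()
#   signal.signal(signal.SIGINT, flag.handler)
#   detector = snowboydecoder.HotwordDetector(['Yes.pmdl', 'No.pmdl'],
#                                             sensitivity=0.5)
#   word = detector.start(interrupt_check=flag, sleep_time=0.03)
#   detector.terminate()
```

With this wiring, `word` would be None when the loop stops because of Ctrl+C rather than a detection.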

Credits

Swapnil Verma


I like to make things which can walk, talk and see.
