Published June 24, 2022 © MIT

Live Captioning with Azure Cognitive Services!

Generate text captions in real time for videos, podcasts, and in-person talks!

Beginner • Full instructions provided • 1 hour

Things used in this project

Hardware components

Raspberry Pi 4 Model B

4GB version recommended. You'll also need a 5V power supply, a microSD card, and a USB keyboard and mouse if you prefer to work on the Pi directly rather than over remote access.

USB Microphone

Optional: LCD Screen

Software apps and online services

Microsoft Azure Cognitive Services

Story

This project uses Microsoft Azure Cognitive Services speech recognition* to generate real-time captions on a Raspberry Pi. You can also run this project with .NET on a laptop or desktop computer!

Demo of Live Captions from a YouTube video via Azure Cognitive Services

Speech is captured via a USB microphone and passed to a .NET program, which calls the Azure Cognitive Services speech-to-text service. The converted text is displayed as real-time captions on an LCD screen. You can also generate captions on a remote screen via SSH.

Privacy Note: This project does NOT store captions. If you use this to generate in-person captions, please be sure to inform all speakers that they are being transcribed but not recorded.

* You can sign up for a free 30-day trial of Azure w/ $200 in credits to test out this project.

Read Time: 10 min

Build Time: 20 min (excluding installation times)

Cost:

Free Tier (1 concurrent request): 5 free audio hours per month
Standard Tier (100 concurrent requests): $1 per audio hour

More info on cost here.
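To get a feel for the numbers above, here is a minimal sketch that estimates a monthly bill from the two prices quoted in this article (5 free audio hours per month, $1 per audio hour on the Standard Tier). The function name and the whole-hour inputs are assumptions for illustration, and actual Azure pricing may differ.

```shell
# Rough monthly cost estimate using only the prices quoted in this article.
# FREE_HOURS and RATE_PER_HOUR are taken from the text above; the function
# name and whole-hour inputs are assumptions made for this example.
FREE_HOURS=5       # free audio hours per month (Free Tier)
RATE_PER_HOUR=1    # dollars per audio hour (Standard Tier)

estimate_monthly_cost() {
  local hours="$1"
  if [ "$hours" -le "$FREE_HOURS" ]; then
    echo "Free Tier covers $hours hours: \$0"
  else
    echo "Standard Tier for $hours hours: \$$((hours * RATE_PER_HOUR))"
  fi
}

estimate_monthly_cost 3     # prints: Free Tier covers 3 hours: $0
estimate_monthly_cost 20    # prints: Standard Tier for 20 hours: $20
```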

Many thanks to the original developer of this open source project: Mohsin Ali! You can see Mohsin's other GitHub projects here: m-mohsin-ali (M Mohsin Ali) (github.com)

Raspberry Pi Setup

This section shows you how to configure your Raspberry Pi SD card and how to set it up for first time use.

Note: We recommend the 64-bit Ubuntu 22.04 OS because it has better support for the architecture we're using. However, Raspberry Pi OS will also work for this project.

Raspberry Pi Imager for flashing an OS on an SD card.

1. On your desktop computer, download and install Raspberry Pi Imager

2. Run Raspberry Pi Imager. The home screen will appear.

3. Select 'CHOOSE STORAGE'

4. Insert the microSD card into your computer (or via a card reader).

5. Select the connected microSD card as your storage device.

6. On the home screen, select 'CHOOSE OS'.

7. Select in this order: 'Other general-purpose OS' > 'Ubuntu' > 'Ubuntu Desktop 22.04 LTS (RPi 4/400)'

Choose 'General-purpose OS'

Select Ubuntu OS

Finally, choose Ubuntu Desktop 22.04 LTS (RPi 4/400)

Note: Although Raspberry Pi OS (formerly Raspbian) does come in a 64-bit version, Ubuntu has better support for the architecture and available software.

8. On the home screen, select 'WRITE'.

9. A loading bar will appear.

Writing the Ubuntu Desktop OS to an SD Card.

Note: Flashing the SD Card may take a few minutes to an hour to complete.

10. Safely eject the SD card and insert it into the Raspberry Pi.

11. If you're connecting directly to the Pi, connect the display, keyboard, and mouse.

12. Finally, connect the power supply!

13. Once the Pi boots up, configure your WiFi settings, keyboard layout and timezone.

14. CHANGE YOUR PASSWORD. This is important because otherwise someone could get access to your Pi and make your closed captions come out all silly.

Change the default password on the Pi to protect your device, files, and projects!

Software Updates and Installs

This section shows you how to install dependencies for the project onto your Raspberry Pi. Follow these steps on your Raspberry Pi computer.

1. Open the terminal.

2. Make a directory to store the project by running the following commands:

mkdir live-captioning
cd live-captioning

3. Set up the .NET SDK by running the following command:

curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel Current

4. Once the files are installed, set the environment variables by running the following commands:

echo 'export DOTNET_ROOT=$HOME/.dotnet' >> ~/.bashrc
echo 'export PATH=$PATH:$HOME/.dotnet' >> ~/.bashrc
source ~/.bashrc

5. Check and verify the installation:

dotnet --version

6. Finally, install the Azure Cognitive Services speech-to-text dependencies with the following commands:

sudo apt-get update
sudo apt-get install build-essential libssl-dev libasound2 wget

We need to manually install libssl1.1, as it's not available in the Ubuntu 22.04 repositories. Since it is a core required dependency, download the package with the following command:

wget http://ftp.us.debian.org/debian/pool/main/o/openssl/libssl1.1_1.1.1n-0+deb11u1_arm64.deb

Next, install from file:

sudo apt install -f ./libssl1.1_1.1.1n-0+deb11u1_arm64.deb
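The libssl1.1 package downloaded above is built for arm64, so it only matches a 64-bit OS. As a minimal sketch, the check below compares the system's Debian architecture against that package. The function name `matches_libssl_deb` is made up for this example.

```shell
# Succeeds only when the given Debian architecture name matches the
# arm64 libssl1.1 package used in this section.
# (matches_libssl_deb is a hypothetical helper name.)
matches_libssl_deb() {
  [ "$1" = "arm64" ]
}

if command -v dpkg >/dev/null 2>&1; then
  arch=$(dpkg --print-architecture)
  if matches_libssl_deb "$arch"; then
    echo "Architecture is $arch: the arm64 libssl1.1 package should match."
  else
    echo "Architecture is $arch: the arm64 libssl1.1 package is not built for this system."
  fi
else
  echo "dpkg not found: this check only applies to Debian-based systems."
fi
```

If the check reports a different architecture, look for the libssl1.1 build that matches it rather than installing the arm64 file.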

Set up Azure Cognitive Services

Now it's time to sign up for Azure Cognitive Services and get our API keys! Follow these steps on your desktop or laptop computer.

1. Sign up for a free Azure account here. Your free trial lasts 30 days and includes $200 Azure credits.

Azure Home Dashboard

2. Once you're logged in to your Azure dashboard, select 'Create a Resource'.

Creating a new Cognitive Services resource on Azure

3. Select (or search for) Cognitive Services.

4. Create a new speech service.

Create a new speech service in Cognitive Services.

5. From here, you will need the keys and the region to set up speech-to-text on the Raspberry Pi.

Grab a speech service key and location to run Cognitive Services on your Pi.

6. Copy one of the keys (either one will work) and the location/region.
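Before moving to the Pi, you may want to confirm that the key and region you copied work together. The sketch below posts to the Speech service token endpoint and prints the HTTP status code. The function names are made up for this example, and it assumes `curl` is installed; a 200 response suggests the key is accepted, while a 401 usually points to a wrong key or region.

```shell
# Build the token endpoint URL for a region such as "westus".
# (token_url and check_speech_key are hypothetical helper names.)
token_url() {
  echo "https://$1.api.cognitive.microsoft.com/sts/v1.0/issueToken"
}

# Print the HTTP status code returned for a key/region pair.
check_speech_key() {
  local key="$1"
  local region="$2"
  # -d '' sends an empty POST body; -o /dev/null discards the returned token.
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Ocp-Apim-Subscription-Key: $key" \
    -d '' "$(token_url "$region")"
}

# Usage (replace the placeholders with your own values):
#   check_speech_key "YOUR_KEY" "YOUR_REGION"
```

Treat your key like a password: avoid pasting it into shared terminals or committing it to a public repository.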

Run the Project!

This section shows you how to run the project on your Raspberry Pi. Follow these steps on your Raspberry Pi computer.

1. If you don't already have it, install git with the following command:

sudo apt install git

2. Navigate to the project folder that we created earlier:

cd live-captioning

3. Clone this repository:

git clone https://github.com/m-mohsin-ali/closed-captioning-azure-speech-ai

4. Navigate to the folder with the project code:

cd closed-captioning-azure-speech-ai/code/AzureSpeechCC

5. Add your Cognitive Services keys to the code:

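nano Program.cs

In the Program class, replace the two placeholder strings with your key and region:

class Program
{
    static string YourSubscriptionKey = "Enter your Key Here";
    static string YourServiceRegion = "Enter your Region here";
    // ... the rest of the class is unchanged
}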

6. Press CTRL+X, then Y and Enter to save and overwrite the file.

7. Add the Azure Speech SDK package to the code directory by running the following:

dotnet add package Microsoft.CognitiveServices.Speech

8. We did it!! Let's run the code and see our wizardry in action:

dotnet build
dotnet run

Test out different audio sources, try different sounds and voices, and explore the capabilities and limits of live speech-to-text transcription!
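If you would rather not open an editor for step 5, the placeholders in Program.cs can also be filled in from the command line. This is a minimal sketch, assuming the two placeholder strings appear exactly as shown in step 5; the function name `set_credentials` is made up for this example.

```shell
# Replace the two placeholder strings in Program.cs with real values.
# Assumes the key and region contain no '|', '&', or backslash characters,
# since sed treats those specially in this command.
# (set_credentials is a hypothetical helper name.)
set_credentials() {
  local file="$1"
  local key="$2"
  local region="$3"
  # Write to a temporary file first, then replace the original on success.
  sed -e "s|Enter your Key Here|$key|" \
      -e "s|Enter your Region here|$region|" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage, from the AzureSpeechCC folder (replace the placeholders):
#   set_credentials Program.cs "YOUR_KEY" "YOUR_REGION"
```

Run it before `dotnet build` so the compiled program picks up the new values.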

Going Further

1. Make the project portable by getting an enclosure for the Pi, a small touch screen, and a USB-C battery.

2. Travel plans? Convert the project into a translator by selecting different language inputs and outputs from Cognitive Services!

Show us your creations by tagging us on Twitter, @MakersAtMicrosoft, or using the hashtag #AzureLiveCaptions!

Credits

Jen Fox

Dabbled in dark matter, settled into engineering w/ a blend of inventing and education! Sr. PM @ Microsoft. Founder/CEO of FoxBot Industries.
