This project uses Microsoft Azure Cognitive Services speech recognition* to generate real-time captions on a Raspberry Pi. You can also run this project with .NET on a laptop or desktop computer!
Speech is captured via a USB microphone and passed to a .NET application, which calls the Azure Cognitive Services speech-to-text service. The converted text is displayed as real-time captions on an LCD screen. You can also generate captions on a remote screen via SSH.
Privacy Note: This project does NOT store captions. If you use this to generate in-person captions, please be sure to inform all speakers that they are being transcribed but not recorded.
* You can sign up for a free 30-day trial of Azure with $200 in credits to test out this project.
Read Time: 10 min
Build Time: 20 min (excluding installation times)
Cost:
- Free Tier (1 concurrent request): 5 free audio hours per month
- Standard Tier (100 concurrent requests): $1 per audio hour
Many thanks to the original developer of this open source project: Mohsin Ali! You can see Mohsin's other GitHub projects here: m-mohsin-ali (M Mohsin Ali) (github.com)
Raspberry Pi Setup
This section shows you how to configure your Raspberry Pi SD card and how to set it up for first-time use.
Note: We recommend using the 64-bit Ubuntu 22.04 OS because it has better support for the architecture we're using. However, Raspberry Pi OS will work for this project.
1. On your desktop computer, download and install Raspberry Pi Imager
2. Run Raspberry Pi Imager. The home screen will appear.
3. Select 'CHOOSE STORAGE'
4. Insert the microSD card into your computer (or via a card reader).
5. Select the connected microSD card as your storage device.
6. On the home screen, select 'CHOOSE OS'.
7. Select in this order: 'Other general-purpose OS' > 'Ubuntu' > 'Ubuntu Desktop 22.04 LTS (RPi 4/400)'
Note: Although Raspberry Pi OS (formerly Raspbian) does come in a 64-bit version, Ubuntu has better support for the architecture and available software.
8. On the home screen, select 'WRITE'.
9. A loading bar will appear.
Note: Flashing the SD Card may take a few minutes to an hour to complete.
10. Safely eject the SD card and insert it into the Raspberry Pi.
11. If you're connecting directly to the Pi, connect the display, keyboard, and mouse.
12. Finally, connect the power supply!
13. Once the Pi boots up, configure your WiFi settings, keyboard layout and timezone.
14. CHANGE YOUR PASSWORD. This is important because otherwise someone could get access to your Pi and make your closed captions come out all silly.
This section shows you how to install dependencies for the project onto your Raspberry Pi. Follow these steps on your Raspberry Pi computer.
1. Open the terminal.
2. Make a directory to store the project by running the following commands:
mkdir live-captioning
cd live-captioning
3. Set up the .NET SDK by running the following command:
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel Current
4. Once the files are installed, set the environment variables by running the following commands:
echo 'export DOTNET_ROOT=$HOME/.dotnet' >> ~/.bashrc
echo 'export PATH=$PATH:$HOME/.dotnet' >> ~/.bashrc
source ~/.bashrc
5. Check and verify the installation:
dotnet --version
6. Finally, install the Azure Cognitive Services speech-to-text dependencies with the following commands:
sudo apt-get update
sudo apt-get install build-essential libssl-dev libasound2 wget
Ubuntu 22.04 does not provide libssl1.1 in its repositories. Since it is a core required dependency, download the package manually with the following command:
wget http://ftp.us.debian.org/debian/pool/main/o/openssl/libssl1.1_1.1.1n-0+deb11u1_arm64.deb
Next, install from file:
sudo apt install -f ./libssl1.1_1.1.1n-0+deb11u1_arm64.deb
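One thing to watch in step 4: running the two echo commands more than once appends duplicate lines to ~/.bashrc. A minimal sketch of an idempotent alternative is shown here. The helper names append_once and setup_dotnet_env are made up for illustration and are not part of .NET or this project.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper, not part of .NET or this project.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the same two exports as step 4. The target file defaults to ~/.bashrc
# and can be overridden, for example to try the function on a scratch file.
setup_dotnet_env() {
    local rc="${1:-$HOME/.bashrc}"
    append_once 'export DOTNET_ROOT=$HOME/.dotnet' "$rc"
    append_once 'export PATH=$PATH:$HOME/.dotnet' "$rc"
}
```

To use it, paste the functions into your terminal, run setup_dotnet_env, and then run source ~/.bashrc as in step 4. Running it again should leave the file unchanged.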
Set up Azure Cognitive Services
Now it's time to sign up for Azure Cognitive Services and get our API keys! Follow these steps on your desktop or laptop computer.
1. Sign up for a free Azure account here. Your free trial lasts 30 days and includes $200 Azure credits.
2. Once you're logged in to your Azure dashboard, select 'Create a Resource'.
3. Select (or search for) Cognitive Services.
4. Create a new speech service.
5. From here, you will need the keys and the region to set up speech-to-text on the Raspberry Pi.
6. Copy one of the keys (any of them will work) and the location region.
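Before moving to the Pi, it can help to confirm that the key and region you copied actually work together. The sketch below assumes the Speech service's token endpoint (issueToken) is reachable from your machine and that curl is installed. The function names and the SPEECH_KEY and SPEECH_REGION variable names are illustrative choices, not something this project defines.

```shell
# Build the token endpoint URL for a region such as "westus".
token_url() {
    printf 'https://%s.api.cognitive.microsoft.com/sts/v1.0/issueToken' "$1"
}

# Print the HTTP status code returned for a key/region pair.
# A 200 suggests the key is accepted; a 401 suggests a wrong key or region.
# check_speech_key is a hypothetical helper for illustration only.
check_speech_key() {
    local key="$1" region="$2"
    curl -s -o /dev/null -w '%{http_code}' -X POST \
        -H "Ocp-Apim-Subscription-Key: $key" \
        -H 'Content-Length: 0' \
        "$(token_url "$region")"
}
```

For example, after setting SPEECH_KEY and SPEECH_REGION in your shell, run check_speech_key "$SPEECH_KEY" "$SPEECH_REGION" and look at the status code it prints.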
Run the Project!
This section shows you how to run the project on your Raspberry Pi. Follow these steps on your Raspberry Pi computer.
1. If you don't already have it, install git with the following command:
sudo apt install git
2. Navigate to the project folder that we created earlier:
cd live-captioning
3. Clone this repository:
git clone https://github.com/m-mohsin-ali/closed-captioning-azure-speech-ai
4. Navigate to the folder with the project code:
cd closed-captioning-azure-speech-ai/code/AzureSpeechCC
5. Open Program.cs and replace the two placeholder strings with your Cognitive Services key and region:
nano Program.cs
class Program
{
static string YourSubscriptionKey = "Enter your Key Here";
static string YourServiceRegion = "Enter your Region here";
6. Press CTRL+X, then Y and Enter to save and overwrite the file.
7. Add the Azure Speech SDK package to the code directory by running the following:
dotnet add package Microsoft.CognitiveServices.Speech
8. We did it!! Let's run the code and see our wizardry in action:
dotnet build
dotnet run
Test out different audio sources, try different sounds and voices, and explore the capabilities and limits of live speech-to-text transcription!
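If you set this project up on more than one device, the manual edit of Program.cs in step 5 could be scripted instead. The following is a rough sketch, assuming the placeholder strings are exactly "Enter your Key Here" and "Enter your Region here" as shown above. The helper name set_speech_credentials is hypothetical and not part of the repository.

```shell
# Replace the two placeholder strings in Program.cs with a key and region.
# set_speech_credentials is a hypothetical helper, not part of the repository.
# Assumes the key and region contain no '|', '&' or '\' characters.
set_speech_credentials() {
    local file="$1" key="$2" region="$3"
    # -i.bak edits the file in place and keeps the original as <file>.bak
    sed -i.bak \
        -e "s|Enter your Key Here|$key|" \
        -e "s|Enter your Region here|$region|" \
        "$file"
}
```

From the AzureSpeechCC folder, a call might look like set_speech_credentials Program.cs "your-key" "your-region". Keep in mind that the key ends up in plain text in the file either way, so avoid committing the edited file to a public repository.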
Going Further
1. Make the project portable by getting an enclosure for the Pi, a small touch screen, and a USB-C battery.
2. Travel plans? Convert the project into a translator by selecting different language inputs and outputs from Cognitive Services!
Show us your creations by tagging us on Twitter, @MakersAtMicrosoft, or using the hashtag #AzureLiveCaptions!