Reading is one of the best ways to learn about the world. Luckily there is a vast collection of books available to most people! However, not everyone has an easy time reading.
People with visual impairments should have access to the knowledge contained in books. Braille books are expensive and not easily available to most people. Many books are not digitized in any format that is useful for text-to-speech.
Background
As an avid reader, the problem statement above really resonated with me. After some initial research, I didn't find any obtainable options for a robotic system that can turn and read the pages of books. There seem to be a few university lab projects like the one below, but you can't buy them, nor are there any instructions for how to build them.
Given the capabilities of the Lego Mindstorms EV3 kit, it seemed possible to build a page-turning device. I looked around and found a few examples of page-turning robots documented on YouTube, but again there were no instructions for building them nor any code for their operation. My design of the page-turning mechanism is based on these videos:
I was also aware of several easy-to-use optical character recognition (OCR) libraries like Tesseract that would make the image-to-text translation fairly simple, even though I hadn't used any of them.
I was excited to work on a robot to solve this problem and I felt like I had a clear path to doing so.
Project Goals
The goal of this project was to produce a proof of concept for a relatively low-cost device, built from accessible items (an Amazon Echo Dot, Lego Mindstorms, and a webcam), that can read and turn the pages of books.
My hope is that once the concept is proven, I can work with the incredible members of the open source community, Amazon, and Lego to turn this proof of concept into a robust, easily replicable tool for people that need it.
Meet Paige Turner
Technology
Paige Turner is built using the following technologies:
- Lego Mindstorms EV3
- Amazon Alexa Voice Skill Kit
- Google Cloud Vision API
Here are all the steps to build Paige. I'll go through each of them one by one below.
- Get the parts
- Hook everything up (20 minutes)
- Set up a Google Vision Account (5 minutes)
- Set up the camera computer (20 minutes)
- Set up Alexa (10 minutes)
- Set up the Lego EV3 intelligent brick (1 hour)
- Tune the page-turning mechanism (20 minutes)
Here's a complete list of parts you'll need to build Paige Turner:
- Lego Mindstorms EV3 31313 Kit
- Amazon Echo Dot (I had a second-generation Dot, but any of the newer ones work as well)
- Document Camera (I tried cheaper USB webcams first, but they weren't sharp enough; see Lessons Learned below)
- Wi-Fi Adapter
- SD Card (This has to be 32GB or less to work with the software we'll use)
- 6 AA Batteries (I find the Amazon Basics rechargeable AAs to be the best value)
- A book of your choice :)
Block Diagram
The block diagram below shows all the connections necessary to build Paige.
I got one of the awesome Lego Mindstorms EV3 31313 kits, and Paige's body is constructed only of parts from that set.
I wanted to create good instructions for how to build Paige Turner. I used Studio 2.0 (free) from BrickLink to build a 3D model and generate the instructions.
I attached the Studio 2.0 file to this page so you can view the instructions yourself.
Otherwise, follow the PDF instructions here.
After following those instructions, you should have all the physical parts constructed :)
Now you'll need to connect the motors to the Intelligent Brick.
Connect the motor with the wheel (the page starter motor) to Port A and the motor with the arm (turner motor) to Port D.
The ports are set on line 44 of the Python code.
# Motors for the page turning.
self.page_starter = LargeMotor(OUTPUT_A)
self.page_turner = LargeMotor(OUTPUT_D)
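Once the brick is running ev3dev (set up later in this guide), you can double-check the wiring with a quick test like the one below. This is a hypothetical snippet, not part of Paige's code; each motor should nudge a quarter turn.

# Quick wiring check: each motor should turn a quarter rotation.
from ev3dev2.motor import LargeMotor, OUTPUT_A, OUTPUT_D, SpeedPercent

LargeMotor(OUTPUT_A).on_for_rotations(SpeedPercent(10), 0.25)  # page starter
LargeMotor(OUTPUT_D).on_for_rotations(SpeedPercent(10), 0.25)  # turner arm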
Finally, connect the document camera's USB connection to the Linux computer you'll set up later.
Now you can place Paige onto a book and position the document camera for a completed physical setup :)
In this project, the translation from image to text is done via Google's Cloud Vision API. You'll need to create a free account, which allows for 1,000 free API requests a month :)
I followed the excellent instructions here to set up my API project.
1. Log in to the Google Cloud Console: https://console.cloud.google.com/
2. Create a project by clicking on Select a project and then "New Project"
Give it a name:
Enable the Vision API by following this magical link.
Success!
Once I had the API enabled, I created a service account and downloaded my API keys as a JSON file. You'll need this later! Here's how:
1. Log in to your Google Cloud Console: https://console.cloud.google.com/
2. Select the project you created above
3. In the sidebar on the left, go to the IAM & Admin section -> Service accounts
4. Click on "Create Service Account"
5. Enter a name
6. Select Owner under access rights
7. Create a key; we'll need this later
8. Download the key file as JSON and save it somewhere you won't lose it. We'll use it in the next steps.
Hooray! You are done configuring the Google Cloud API!
Camera Computer
Get the Linux Computer Running
The document camera is controlled by a Linux laptop (or a computer running a Linux virtual machine).
I have a MacBook, so I used Parallels to create a virtual machine; VirtualBox should also work. If you have a Linux laptop, you can skip this step.
I followed the instructions here to download Parallels and install Ubuntu.
Set up Google Credentials
You'll need to download the Google API credential JSON file onto the Linux computer. Then you'll need to set the environment variable that Paige's script uses to authenticate with the Google API.
Once the JSON file is on the Linux machine, run the command below. You'll have to update the path if you place the JSON file somewhere other than the desktop of your Parallels VM. (To make this persist across terminal sessions, add the same line to your ~/.bashrc.)
export GOOGLE_APPLICATION_CREDENTIALS="/home/parallels/Desktop/creds.json"
Install Linux Computer Dependencies
The Linux computer needs Python and the Google Cloud client libraries installed in order to run our optical character recognition script. The following commands will install what you need.
sudo apt update
sudo apt install python3 python3-dev python3-venv
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
pip install --upgrade google-cloud-vision google-cloud-storage
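To confirm the library and your credentials load correctly, you can run a quick check like the one below (a hypothetical snippet; it only verifies the key file loads, not the API itself).

# Sanity check: creating a Vision client raises DefaultCredentialsError
# if GOOGLE_APPLICATION_CREDENTIALS doesn't point at a readable key file.
from google.cloud import vision

vision.ImageAnnotatorClient()
print('Credentials loaded OK')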
You'll also need to install the program used to control the document camera. Run the command below.
sudo apt install fswebcam
Download the Code to the Linux Computer
You'll need to download the gocr.py file from the Paige Turner repo onto the Linux computer. This is the script that does the following (a rough sketch of such a script appears after this list):
- Takes a picture with the document camera
- Sends the image to the Google Cloud Vision Service
- Reads the text returned from the Google Vision Service
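The real script is in the repo, but a minimal sketch of the same flow might look like this. It assumes fswebcam is installed, GOOGLE_APPLICATION_CREDENTIALS is set, and a recent google-cloud-vision (2.x) library; the file names are illustrative.

# Minimal sketch of the capture-and-OCR flow (illustrative, not the repo code).
import subprocess
from google.cloud import vision

def capture_image(path='page.jpg'):
    # Take a picture with the document camera via fswebcam.
    subprocess.run(['fswebcam', '-r', '1920x1080', '--no-banner', path], check=True)
    return path

def detect_text(path):
    # Send the image to the Google Cloud Vision service and return its text.
    client = vision.ImageAnnotatorClient()
    with open(path, 'rb') as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    annotations = response.text_annotations
    return annotations[0].description if annotations else ''

if __name__ == '__main__':
    print(detect_text(capture_image()))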
Test the OCR Script
You can test that the Linux computer is configured correctly by running the gocr.py script.
You may need to update the file paths on line 1 and line 11 depending on how your Linux computer is set up and where you placed the creds.json file.
First position the document camera above an open book.
Then run the script using the command below:
python3 gocr.py
The script should output the text in the book! If this works, it means that the Google Cloud service and document camera are both working properly.
Alexa
Create an Alexa Developer Account
In order for Alexa to understand what you are saying, you'll need to create an Amazon Developer account.
Follow the excellent instructions from Hackster about how to do this starting here.
Stop once you have your Amazon ID and Alexa Gadget Secret. Save these; you'll need them later in this tutorial.
Set up your Alexa Skill
You'll now need to create an Alexa skill to teach Paige how to speak.
Go to the Alexa developer section and click "Create a skill"
https://developer.amazon.com/alexa/console/ask
Enter "Paige Turner" as the skill name; this is also how you'll tell Alexa to open Paige.
Select "Custom" for the skill model:
and Alexa-Hosted (Node.js) at the bottom of the page:
Now click the create button in the top right.
Now you'll see the Dashboard for your Alexa skill. Click on the JSON Editor option on the left side of the screen:
Drag and drop the JSON file from the GitHub repo here into the box shown.
This JSON file is what teaches Alexa how to listen to you :)
After uploading the JSON file, hit the Save and Build Buttons in that order at the top of the screen:
Alexa Code
Now select the Code button at the top of the screen:
This is the Node.js code that teaches Alexa how to interface with our Lego Intelligent brick.
We'll need to add all of the files from the "lambda" folder in GitHub into the Skill Code online.
Create a file called "common.js" by clicking the create file button.
Now copy the contents of all four files from GitHub into the files in the Alexa Developer Console.
When you are done, click Save, and then Deploy in the top right.
Now your skill is live on Echo devices connected to your Amazon account. This means you can use Paige with a real Echo device!!
Set up Paige
Lego EV3 Intelligent Brick
We need to install ev3dev to run our custom Python scripts to control Paige.
Follow the instructions here to get your Lego Intelligent Brick set up with ev3dev.
You can stop once you see the EV3 connected in Visual Studio Code, and skip downloading the Alexa missions. We won't need those :)
Install Python Dependencies
Now that we have a connection to our Lego brick, we need to install some Python dependencies. Open an SSH terminal to the Lego brick by right-clicking on the EV3 brick in Visual Studio Code.
The password is "maker".
Run the following commands:
sudo apt update
sudo apt install python3-pip
sudo pip3 install pexpect
This can take a while because Paige's processor is pretty slow.
Configure Paige's Code
Download the code from my GitHub repo here.
You'll need to configure a few things.
The script opens an SSH connection to the Linux computer we set up earlier in order to run the OCR script. Update the credentials below to match your setup.
def _get_text_from_image(self):
    """Enter the correct credentials here."""
    host = '192.168.50.31'  # IP address of the Linux computer
    cd = 'python3 gocr.py'  # command that runs the OCR script
    user = 'parallels'      # Linux username
    psw = 'password'        # Linux password
    text = self._ssh(host, cd, user, psw, timeout=30, bg_run=False)
    return text
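The _ssh helper (also in the repo) uses the pexpect library we installed above. Below is a simplified sketch of how such a helper can work; the real implementation may differ.

import pexpect

def _ssh(self, host, cmd, user, psw, timeout=30, bg_run=False):
    # Simplified sketch: run one command over SSH and return its output.
    # bg_run is accepted for signature compatibility but unused here.
    child = pexpect.spawn('ssh {}@{} "{}"'.format(user, host, cmd),
                          timeout=timeout, encoding='utf-8')
    i = child.expect(['password:', r'\(yes/no'])
    if i == 1:
        child.sendline('yes')    # accept the host key on first connect
        child.expect('password:')
    child.sendline(psw)          # send the Linux password
    child.expect(pexpect.EOF)    # wait for the command to finish
    return child.before          # everything printed before EOF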
Bluetooth Connection
You'll need to add your Amazon ID and Alexa Gadget Secret to the "paige_turner.ini" file.
You should have noted those in the earlier Amazon step of this tutorial.
[GadgetSettings]
amazonId = #FIXME
alexaGadgetSecret = #FIXME
[GadgetCapabilities]
Custom.Mindstorms.Gadget = 1.0
This tells Paige how to connect to your Alexa device.
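For context, this .ini file is read automatically by the Alexa Gadgets Toolkit when Paige's main class starts up. Roughly, the class extends AlexaGadget as sketched below (a simplified outline; the full version is in the repo).

from agt import AlexaGadget

class PaigeTurner(AlexaGadget):
    # AlexaGadget loads paige_turner.ini automatically because the
    # .ini file name matches the script name.
    def on_custom_mindstorms_gadget_control(self, directive):
        # Called when the Alexa skill sends a custom directive
        # (e.g. a "read" or "turn" command). Paige's real handler
        # decodes directive.payload and drives the motors.
        pass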
Enable Bluetooth on the Intelligent Brick
Paige connects to Alexa via a Bluetooth connection. You'll need to enable Bluetooth on the Lego brick.
Follow the excellent Hackster instructions here to enable Bluetooth.
Stop once you see the Bluetooth icon appear in the top right of the Lego screen.
Download Code to Paige
Click on the Send workspace to device button in Visual Studio Code to download the code to Paige.
Activate Paige
Whew! That was a long setup, huh? It's finally time to activate Paige!!!
Reminder that you should have all of the connections from the diagram earlier:
Make sure the Linux computer is on, a book is placed under Paige's wheel, the document camera is pointed at the book, and you're ready to go.
Open an SSH connection to Paige and start Paige's script.
sudo python3 paige_turner.py
Bluetooth Pairing
The first time that you run the Paige Turner application, you'll need to connect it to your Alexa device.
Once the script starts up, you'll see the following message in your SSH console:
Follow these directions from Amazon to pair your Alexa device with the Lego brick.
Talking to Paige
Once you hear Paige beep and see the Bluetooth connection message in the console, she's ready to read to you.
Say the following:
"Alexa, open Paige Turner"
She'll respond with:
"Hi! Ask me to read or turn the page."
You can then say "Read the page" or "Turn the page" and watch the magic happen ✨.
Tuning the Page Turning
The page-turning sequence and Paige's placement might have to be adjusted depending on the size of your book.
Here's a video of a good page turn:
Here's a video of Paige failing to turn the page:
In this scenario, the first motor didn't run for long enough, so the page wasn't scrunched far enough for the turning arm to grab it.
This section of the code does the page turning.
Adjusting the rotation value, shown as -0.47 below, seems to be the best way to control how far the page gets moved forward before the turning arm flips it.
# Scrunch the page and turn the page.
self.page_starter.on_for_rotations(SpeedPercent(20), -0.47)
self.page_turner.on_for_rotations(SpeedPercent(30), 1)
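If you end up tweaking these numbers a lot, it can help to pull them out as named constants. This is an optional refactor with illustrative names, not how the repo code is organized; it assumes it lives inside Paige's script, where SpeedPercent is already imported.

# Tuning knobs for the page turn (illustrative names).
SCRUNCH_SPEED = SpeedPercent(20)
SCRUNCH_ROTATIONS = -0.47   # how far the wheel pushes the page over
TURN_SPEED = SpeedPercent(30)
TURN_ROTATIONS = 1          # full sweep of the turning arm

self.page_starter.on_for_rotations(SCRUNCH_SPEED, SCRUNCH_ROTATIONS)
self.page_turner.on_for_rotations(TURN_SPEED, TURN_ROTATIONS)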
Lessons Learned
This project was a lot of fun, and I learned a ton along the way.
The Cheapest Webcams Are Just Not Good Enough
I initially attempted to use a much cheaper, lower-resolution webcam instead of the more expensive document camera I ended up using.
The camera I started with was the very popular Logitech C270.
Here's a sample image I took. You can see things are really blurry:
Compare that to the images from the document camera I used instead.
Installing Python Dependencies on the Lego EV3 Brick Can Be Challenging
When I started the project, I had planned to connect the camera directly to the Lego EV3 Intelligent Brick as shown in the diagram below.
However, I wasn't able to install the Google Cloud API client on the Lego Intelligent Brick. Since the architecture isn't x86, it looks like not all of the dependencies are officially supported. This seems solvable if someone wanted to recompile some of the Python dependencies, but I haven't tackled that yet.
The Google Cloud Vision API is Awesome
I initially used the open-source Tesseract library. However, I found that the results from the Google Cloud API were much better, so I stuck with that.
Future Work
For a proof of concept, I think this turned out to be successful, which of course means there is still a ton of work to improve this project.
Here is some of the work I'd like to do or see done by the community:
- Paige can't hold a book down, so she can only read a fairly limited set of books at the moment. It would be awesome if she had a mechanism to hold down the pages of a book that wants to snap shut.
- Paige can't move the camera on her own. It would be awesome if she had a robot arm to position the camera herself.
- It would be nice to do all the compute onboard the Lego Intelligent Brick, so that the laptop isn't required. That seems possible; someone just has to sort out all the Google Cloud Python library dependencies, or invoke the API in another way.
- It would also be nice to skip the external service and do all the OCR processing onboard using Tesseract. As I mentioned above, I had better results with the Cloud API, but I'm sure a computer vision expert could achieve something similar with Tesseract or OpenCV (a rough starting point follows this list).
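For anyone who wants to try the offline route, a starting point might look like this. It assumes tesseract-ocr is installed via apt and pytesseract and Pillow via pip, and is entirely untested on the EV3.

# Rough sketch of an offline OCR path using pytesseract (illustrative).
from PIL import Image
import pytesseract

def detect_text_offline(path):
    # Run Tesseract locally on the captured page image.
    return pytesseract.image_to_string(Image.open(path))

print(detect_text_offline('page.jpg'))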
Thanks for reading about my project! I hope you enjoyed it and it inspired you to make something to help someone near you.
Get in touch! I'm happy to help with any issues or ideas for future improvements.