This project is about making use of the massive amount of video and image data being generated all across the globe. With this in mind, why not use that data to help educate blind children? To that end, I am building an automatic text-to-speech converter that acts as a narrator, reading study material aloud for blind children. The project uses the Zynq UltraScale+ MPSoC ZCU104 kit. The device allows visually impaired people to listen to books: it detects printed and handwritten text and converts it into voice.
Setup - Install the Xilinx Tools
This project requires the following tools:
Vitis 2019.2 Unified Software Platform, Xilinx PYNQ Framework, Docker and Vitis-AI v1.1
Refer to Xilinx Vitis Unified Software Platform and Install Docker for instructions on installing these tools on your machine.
1. Clone the github repository:
$ git clone https://github.com/Xilinx/Vitis-AI
$ cd Vitis-AI
$ export VITIS_AI_HOME="$PWD"
Setup - Install the Vitis Platform
1. Download the Vitis platform for the Zynq UltraScale+ MPSoC ZCU104 board, and extract it.
2. Specify the location of the Vitis platform by creating the SDX_PLATFORM environment variable, pointing it to the location of the .xpfm file.
Build the Hardware Project
Follow the instructions in the README file for the DPU TRD Vitis flow:
https://github.com/Xilinx/Vitis-AI/blob/v1.1/DPU-TRD/prj/Vitis/README.md
1. Make a copy of the DPU-TRD directory for your platform:
$ cd $VITIS_AI_HOME
$ cp -r DPU-TRD DPU-TRD-{platform}
$ export TRD_HOME=$VITIS_AI_HOME/DPU-TRD-{platform}
$ cd $TRD_HOME/prj/Vitis
Build the Hardware Project for the Zynq UltraScale+ MPSoC
1. Edit the dpu_conf.vh file to specify the architecture and configuration of the DPU:
$ cd $TRD_HOME/prj/Vitis
$ vi dpu_conf.vh
Target a DPU with the B2304 architecture, by making the following changes to the dpu_conf.vh file.
//`define B4096
`define B2304
Leave the other parameters unchanged.
2. Edit the config_file/prj_config file
$ vi config_file/prj_config
The Avnet Vitis platforms (Zynq UltraScale+ MPSoC and UZ3EG_PCIEC) use the 300 MHz and 600 MHz clocks to connect the DPU:
[clock]
id=0:dpu_xrt_top_1.aclk
id=1:dpu_xrt_top_1.ap_clk_2
To connect up the DPU core, specify which AXI interfaces to use:
[connectivity]
sp=dpu_xrt_top_1.M_AXI_GP0:HPC0
sp=dpu_xrt_top_1.M_AXI_HP0:HP0
sp=dpu_xrt_top_1.M_AXI_HP2:HP1
Leave the other settings as they are.
3. Build the DPU enabled hardware design
$ make KERNEL=DPU DEVICE={platform}
This will build the individual DPU core, then build the complete hardware project.
The Vivado project will be located in the following directory:
DPU-TRD-{platform}/prj/Vitis/binary_container_1/vivado/vpl/prj/prj.xpr
4. The output binaries will be located in the following directory:
$ tree binary_container_1/sd_card
├── BOOT.BIN
├── dpu.xclbin
├── image.ub
├── README.txt
└── {platform}.hwh
Step 2 - Compile the Models
This project concentrates on the models for which example applications have been provided. It is important to know the correlation between model and application, i.e. which model from the Model Zoo each application was verified with.
1. Download the pre-trained models from the Xilinx Model Zoo:
$ cd $VITIS_AI_HOME/AI-Model-Zoo
$ source ./get_model.sh
This will download the model zoo
2. Launch the tools docker from the Vitis-AI directory
$ cd $VITIS_AI_HOME
$ sh -x docker_run.sh xilinx/vitis-ai:latest-cpu
3. Accept the license terms.
4. Launch the "vitis-ai-caffe" Conda environment
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $
5. Copy the hardware handoff (.hwh) file
(vitis-ai-caffe) $ cd DPU-TRD-{platform}
(vitis-ai-caffe) $ mkdir modelzoo
(vitis-ai-caffe) $ cd modelzoo
(vitis-ai-caffe) $ cp ../prj/Vitis/binary_container_1/sd_card/{platform}.hwh .
6. Generate your .dcf file
(vitis-ai-caffe) $ dlet -f {platform}.hwh
7. The previous step will generate a .dcf file with a name similar to dpu-18-11-2020-17-25.dcf. Rename this file:
(vitis-ai-caffe) $ mv dpu*.dcf {platform}.dcf
8. Create a file named “custom.json” with the following content
{"target": "dpuv2", "dcf": "./{platform}.dcf", "cpu_arch": "arm64"}
9. Create a directory for the compiled models
(vitis-ai-caffe) $ mkdir compiled_output
10. Create a generic recipe for compiling a caffe model, by creating a script named “compile_cf_model.sh” with the following content
model_name=$1
modelzoo_name=$2
vai_c_caffe \
--prototxt ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy.prototxt \
--caffemodel ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy.caffemodel \
--arch ./custom.json \
--output_dir ./compiled_output/${modelzoo_name} \
--net_name ${model_name} \
--options "{'mode': 'normal'}"
11. Compile the caffe model for the resnet50 application, using the generic script we just created:
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $ source ./compile_cf_model.sh resnet50 cf_resnet50_imagenet_224_224_7.7G
12. Compile the caffe model for the face_detection application, using the generic script we just created:
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $ source ./compile_cf_model.sh densebox cf_densebox_wider_360_640_1.11G
13. Create a generic recipe for compiling a tensorflow model, by creating a script called “compile_tf_model.sh” with the following content
model_name=$1
modelzoo_name=$2
vai_c_tensorflow \
--frozen_pb ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy_model.pb \
--arch ./custom.json \
--output_dir ./compiled_output/${modelzoo_name} \
--net_name ${model_name}
14. Compile the tensorflow models, using the generic script we just created:
$ conda activate vitis-ai-tensorflow
(vitis-ai-tensorflow) $ source ./compile_tf_model.sh tf_resnet50 tf_resnetv1_50_imagenet_224_224_6.97G
15. Verify the contents of the directory with the tree utility:
(vitis-ai-caffe) $ tree
├── compiled_output
│ ├── cf_densebox_wider_360_640_1.11G
│ │ ├── densebox_kernel_graph.gv
│ │ └── dpu_densebox.elf
│ ├── cf_resnet50_imagenet_224_224_7.7G
│ │ ├── dpu_resnet50_0.elf
│ │ └── resnet50_kernel_graph.gv
│ └── tf_resnetv1_50_imagenet_224_224_6.97G
│   ├── dpu_tf_resnet50_0.elf
│   └── tf_resnet50_kernel_graph.gv
├── compile_cf_model.sh
├── compile_tf_model.sh
├── custom.json
├── {platform}.dcf
└── {platform}.hwh
6 directories, 15 files
16. Exit the tools docker
(vitis-ai-caffe) $ exit
Step 3 - Compile the AI Applications
Vitis-AI 1.1 provides several different APIs, including the DNNDK API and the VART API.
The DNNDK API is the low-level API used to communicate with the AI engine (DPU). It is the recommended API for users who will be creating their own custom neural networks targeted at Xilinx devices.
The Vitis-AI RunTime (VART) API and the Vitis-AI-Library provide a higher level of abstraction that simplifies the development of AI applications. They are recommended for users wishing to leverage the existing pre-trained models from the Xilinx Model Zoo in their custom applications.
Step 3.1 - Compile the DNNDK-based AI Applications
This version of the tutorial only covers the DNNDK-based examples, and is based on the documentation available in the Vitis-AI GitHub repository, specifically the "mpsoc" section:
https://github.com/Xilinx/Vitis-AI/tree/v1.1/mpsoc
1. Change to the DPU-TRD-{platform} work directory.
$ cd DPU-TRD-{platform}
2. Download and install the SDK for cross-compilation, specifying a unique and meaningful installation destination (knowing that this SDK will be specific to the Vitis-AI 1.1 DNNDK samples)
$ wget -O sdk.sh https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh
$ chmod +x sdk.sh
$ ./sdk.sh -d ~/petalinux_sdk_vai_1_1_dnndk
3. Setup the environment for cross-compilation
$ unset LD_LIBRARY_PATH
$ source ~/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
4. Download and extract the additional DNNDK runtime content to the previously installed SDK
$ wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
$ tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
5. Install the additional DNNDK runtime content to the previously installed SDK
$ cd vitis-ai_v1.1_dnndk
$ ./install.sh $SDKTARGETSYSROOT
6. Make a working copy of the “vitis_ai_dnndk_samples” directory.
$ cp -r ../mpsoc/vitis_ai_dnndk_samples .
7. Download and extract the additional content (images and video files) for the DNNDK samples.
$ wget -O vitis-ai_v1.1_dnndk_sample_img.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk_sample_img.tar.gz
$ tar -xvzf vitis-ai_v1.1_dnndk_sample_img.tar.gz
Step 4 - Create the SD card image
1. Create a "sdcard" directory:
$ cd DPU-TRD-{platform}
$ mkdir sdcard
2. Copy the design files (hardware + petalinux) for the DPU design to the “sdcard” directory.
$ cp prj/Vitis/binary_container_1/sd_card/* sdcard/.
3. Copy the applications to the “sdcard” directory
$ cp -r vitis_ai_dnndk_samples sdcard/.
4. Copy the Vitis-AI runtime for DNNDK to the “sdcard/runtime” directory
$ mkdir sdcard/runtime
$ cp -r vitis-ai_v1.1_dnndk sdcard/runtime/.
5. At this point, your “sdcard” directory should have the following contents
$ tree sdcard
6. Copy the contents of the "sdcard" directory to the boot partition of the SD card.
Step 5 - Run AI Applications on the Board
1. Boot the board with the SD card.
2. Log in with "root" as both the login and password.
3. Navigate to the sdcard folder
a. For the Zynq UltraScale+ MPSoC, this can be done as follows:
$ cd /run/media/mmcblk0p1
4. Copy the dpu.xclbin file
$ cp dpu.xclbin /usr/lib/.
5. Install the Vitis-AI embedded package
$ cd runtime/vitis-ai_v1.1_dnndk
$ source ./install.sh
The install script attempts to copy dpu.xclbin from /mnt and to install Python support, so it will print the following messages; since we already copied dpu.xclbin manually, they can be safely ignored.
cp: cannot stat ‘/mnt/dpu.xclbin’: No such file or directory
Warning: pip3 command not found, skip install python support
6. If prompted for a login, again, specify “root” as login and password
7. Re-navigate to the sdcard directory
8. Validate the Vitis-AI board package with the dexplorer utility
$ dexplorer --whoami
[DPU IP Spec]
IP Timestamp : 2020-11-09 19:43:41
DPU Core Count : 1
[DPU Core Configuration List]
DPU Core : #0
DPU Enabled : Yes
DPU Arch : B2304
DPU Target Version : v1.4.1
DPU Frequency : 300 MHz
Ram Usage : Low
DepthwiseConv : Enabled
DepthwiseConv+Relu6 : Enabled
Conv+Leakyrelu : Enabled
Conv+Relu6 : Enabled
Channel Augmentation : Enabled
Average Pool : Enabled
9. Define the DISPLAY environment variable
$ export DISPLAY=:0.0
10. Change the resolution of the DisplayPort monitor to 640x480:
$ xrandr --output DP-1 --mode 640x480
11. Launch the sample applications from the vitis_ai_dnndk_samples directory (each sample is in its own sub-directory):
$ cd vitis_ai_dnndk_samples
12. Press <CTRL-C> to exit the application
<CTRL-C>
$ cd ..
Next, on your computer, run the following commands in a terminal to make sure its packages are up to date:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install python3.6-dev python3-pip git
Training the Handwritten Text Recognition Model
We need to train a model that can recognize handwritten text. For training we use TensorFlow 2.0 in Google Colab. For data, we will use the IAM database. This data set contains more than 7,900 pre-labeled text lines from approximately 450 different writers.
Example image from database:
To download the database, you have to register on the FKI website (fki.inf.unibe.ch) to access it.
After that, to train the model, clone the GitHub repo into the home folder of your computer:
$ cd ~
$ git clone https://github.com/bandofpv/Handwritten_Text.git
Set up a virtual environment before installing the required Python modules:
$ sudo pip3 install virtualenv virtualenvwrapper
$ echo -e "n# virtualenv and virtualenvwrapper"
$ echo "export WORKON_HOME=$HOME/.virtualenvs"
$ echo "export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3"
$ echo "source /usr/local/bin/virtualenvwrapper.sh"
$ source ~/.bashrc
$ mkvirtualenv hand -p python3
$ cd ~/Handwritten_Text
$ pip3 install -r requirements.txt
Download the database
$ cd ~/Handwritten_Text/raw
$ USER_NAME=your-username
$ PASSWORD=your-password
$ wget --user $USER_NAME --password $PASSWORD -r -np -nH --cut-dirs=3 -A txt,png -P iam http://www.fki.inf.unibe.ch/DBs/iamDB/data/
$ cd ~/Handwritten_Text/raw/iam/
$ wget http://www.fki.inf.unibe.ch/DBs/iamDB/tasks/largeWriterIndependentTextLineRecognitionTask.zip
$ unzip -d largeWriterIndependentTextLineRecognitionTask largeWriterIndependentTextLineRecognitionTask.zip
$ rm largeWriterIndependentTextLineRecognitionTask.zip robots.txt
After downloading the database, transform it into an HDF5 file:
$ cd ~/Handwritten_Text/src
$ python3 main.py --source=iam --transform
This will create a file named iam.hdf5 in the data directory.
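If you want to double-check the transform before moving on, one quick way is to list the contents of the generated file with h5py. This is only an optional sanity check: the relative path below assumes you are still in the src directory with the data directory next to it, and the group names are simply printed rather than assumed.
# Optional sanity check: list everything inside the generated HDF5 file.
# Assumes it was written to ../data/iam.hdf5 relative to the src directory.
import h5py

with h5py.File("../data/iam.hdf5", "r") as f:
    f.visit(print)   # prints every group/dataset name in the file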
Now, open the training Python notebook in Google Colab:
Select the Copy to Drive tab in the top left corner of the page.
Then, go to your Google Drive and find the folder named Colab Notebooks. Press the + New button on the left and create a new folder named handwritten-text. Go into the new folder you created, press the + New button, and select the Folder upload option. You will need to upload both the src and data folders from our Handwritten_Text directory. Your screen should look like this:
Click the Runtime tab near the top left corner of the page and select Change runtime type. Customize the settings to look like this:
To prevent Google Colab from disconnecting from the server, press Ctrl + Shift + I to open the inspector view. Select the Console tab and enter this:
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
Now let's start training! Simply select Run All and follow through each code snippet. If prompted for authorization, just click the link it provides and copy & paste the authorization code into the input field.
Let the notebook run until it finishes training.
Take a look at the Predict and Evaluate section to see the results.
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install python3-pip gcc-8 g++-8 libopencv-dev
$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev gfortran libopenblas-dev liblapack-dev
$ sudo pip3 install -U pip testresources setuptools cython
$ sudo pip3 install -U numpy==1.16.1 future==0.17.1 mock==3.0.5 h5py==2.9.0 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 enum34 futures protobuf
$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v43 tensorflow-gpu==2.0.0+nv20.1
This will install OpenCV 4.0 for computer vision, gcc-8 & g++-8 for C++ compiling, and TensorFlow 2.0 to run our handwritten text recognition model.
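As an optional sanity check that the key packages installed correctly, you can print their versions from a python3 shell (nothing project-specific, just a quick verification):
# Quick check that OpenCV and TensorFlow import correctly.
import cv2
import tensorflow as tf

print("OpenCV:", cv2.__version__)
print("TensorFlow:", tf.__version__)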
$ sudo pip3 uninstall enum34
$ sudo apt-get install python3-matplotlib python3-numpy python3-pil python3-scipy nano
$ sudo apt-get install build-essential cython
$ sudo apt install --reinstall python*-decorator
$ sudo pip3 install -U scikit-image
$ sudo pip3 install -U google-cloud-vision google-cloud-texttospeech imutils pytesseract pyttsx3 natsort playsound
$ sudo pip3 install -U autopep8==1.4.4 editdistance==0.5.3 flake8==3.7.9 kaldiio==2.15.1
We have to install llvmlite in order to install numba:
$ wget http://releases.llvm.org/7.0.1/llvm-7.0.1.src.tar.xz
$ tar -xvf llvm-7.0.1.src.tar.xz
$ cd llvm-7.0.1.src
$ mkdir llvm_build_dir
$ cd llvm_build_dir/
$ cmake ../ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64"
$ make -j4
$ sudo make install
$ cd bin/
$ echo "export LLVM_CONFIG=\""`pwd`"/llvm-config\"" >> ~/.bashrc
$ echo "alias llvm='"`pwd`"/llvm-lit'" >> ~/.bashrc
$ source ~/.bashrc
$ sudo pip3 install -U llvmlite numba
Make sure to prevent the WiFi network from dropping out:
$ sudo iw dev wlan0 set power_save off
In order to take advantage of Google Cloud's Vision and Text-to-Speech APIs, we need to create a Google Cloud account. Next, go to your GCP Console and create a new project with any name, then follow these steps:
1. Select Cloud Vision API and click Enable.
2. Go back to the Cloud Console API Library and enter "Text to Speech" in the search bar.
3. Select "Cloud Text-to-Speech API" and click Enable.
4. Scroll to Cloud Translation and select Cloud Translation API Editor. Select Continue.
5. Click Create Key, select JSON, and click Create.
This will create a .json key file that will allow your Zynq UltraScale+ MPSoC ZCU104 to connect to your cloud project.
We also need to check our Google Cloud project ID. Go to your GCP Console; in the Project info section you will see the Project ID. We will use it soon.
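A common way to let the client libraries find the key is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the downloaded .json file before any client is created. The key path and project ID below are placeholders, not values from the repo:
# Point the Google Cloud client libraries at the service-account key.
# Both values are placeholders - substitute your own key path and Project ID.
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/USERNAME/your-key-file.json"
PROJECT_ID = "your-project-id"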
Next, clone the project repository:
$ git clone https://github.com/bandofpv/Reading_Eye_For_The_Blind.git
Remember the handwritten text recognition model we trained earlier? Now we need to save it into our Reading_Eye_For_The_Blind directory.
Go back to the handwritten-text Google Drive folder:
Right click on the output folder and click "Download". This is the model that we trained. Move it to our directory and rename it (Note: change name-of-downloaded-zip-file to the name of the downloaded zip file):
$ export YOUR_DOWNLOAD=name-of-downloaded-zip-file
$ unzip ~/Downloads/$YOUR_DOWNLOAD -d ~/Reading_Eye_For_The_Blind
$ mv ~/Reading_Eye_For_The_Blind/output ~/model
To make our program run on the Zynq UltraScale+ every time we turn it on, we need to create an autostart directory:
$ mkdir ~/.config/autostart
We need to move the ReadingEye desktop entry to the new directory:
$ mv ~/Reading_Eye_For_The_Blind/ReadingEye.desktop ~/.config/autostart/
Update the environment variables in the file:
$ nano ~/.config/autostart/ReadingEye.desktop
Find USERNAME and replace it with your username.
import urllib.request

def check_internet(host='http://google.com'):
    try:
        urllib.request.urlopen(host)
        return True
    except:
        return False
Using the urllib module, the function tries to open a connection to Google; if it succeeds, it returns True, meaning the board has an internet connection.
After checking whether the Zynq UltraScale+ has an internet connection, it will take a picture using OpenCV.
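The capture itself can be as simple as grabbing a single frame from the camera; here is a minimal sketch, where the camera index and output filename are illustrative rather than the repo's actual values:
# Minimal sketch: grab one frame from the camera and save it for recognition.
import cv2

cap = cv2.VideoCapture(0)              # 0 = default camera (illustrative)
ret, frame = cap.read()                # capture a single frame
cap.release()
if ret:
    cv2.imwrite("capture.jpg", frame)  # hand the image to the recognition step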
If it has an internet connection, it will connect to Google Cloud and use the Vision and Text-to-Speech APIs to recognize both printed and handwritten text and synthesize it into speech.
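In outline, the cloud path sends the captured image to the Vision API for text detection and passes the returned string to the Text-to-Speech API. The sketch below shows the general shape of those calls for recent versions of the client libraries; the file names are illustrative and the repo's actual code may differ:
# Sketch of the cloud path: Vision API for OCR, Text-to-Speech for audio.
# (API surface shown is for recent client-library versions.)
from google.cloud import vision, texttospeech

# Recognize text in the captured image.
vision_client = vision.ImageAnnotatorClient()
with open("capture.jpg", "rb") as f:
    image = vision.Image(content=f.read())
response = vision_client.text_detection(image=image)
text = response.text_annotations[0].description if response.text_annotations else ""

# Synthesize the recognized text into an MP3 file.
tts_client = texttospeech.TextToSpeechClient()
tts_response = tts_client.synthesize_speech(
    input=texttospeech.SynthesisInput(text=text),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3),
)
with open("speech.mp3", "wb") as out:
    out.write(tts_response.audio_content)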
It will first attempt to detect a page in the photo using OpenCV's edge detection, and then transform the page into a top-down view.
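Conceptually, the page detection boils down to finding the largest roughly four-cornered contour among the detected edges and warping it to a rectangle. A condensed sketch of that idea with OpenCV follows; the repo's own implementation may differ in its details:
# Condensed sketch: find the page outline and warp it to a top-down view.
# (Uses the OpenCV 4.x return signature for findContours.)
import cv2
import numpy as np

img = cv2.imread("capture.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 75, 200)

# Keep the largest contour that approximates to four corners (the page).
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
page = None
for c in sorted(contours, key=cv2.contourArea, reverse=True):
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4:
        page = approx.reshape(4, 2).astype("float32")
        break

if page is not None:
    # Warp the four corners to a fixed-size top-down view.
    # (In practice the corners should be ordered consistently first.)
    w, h = 640, 480
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype="float32")
    M = cv2.getPerspectiveTransform(page, dst)
    top_down = cv2.warpPerspective(img, M, (w, h))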
However, if it does not detect a page, it will simply convert the image to grayscale and perform adaptive thresholding.
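The fallback path is just a grayscale conversion followed by OpenCV's adaptive threshold; the block size and constant below are typical values, not necessarily the ones used in the repo:
# Fallback when no page outline is found: grayscale + adaptive threshold.
import cv2

img = cv2.imread("capture.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)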
After preprocessing the image so it is ready for printed text recognition, we will use PyTesseract:
text = pytesseract.image_to_string(img)
Next, we will use pyttsx3 to synthesize the recognized text into speech:
engine = pyttsx3.init()
engine.setProperty('voice', "en-us+f5")
engine.say(text)
engine.runAndWait()
Now for the handwritten text. Handwritten text is much harder to detect than printed text, which makes it harder to recognize.
Using the IAM database, with more than 9,000 pre-labeled text lines from 500 different writers, we trained a handwritten text recognition model:
Through deep learning, we feed our data set into a Convolutional Recurrent Neural Network (CRNN), which can overcome some of the limitations of conventional approaches to recognizing handwriting.
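To make the idea concrete, here is a minimal tf.keras sketch of the CRNN structure: convolutional layers extract visual features from the text-line image, the height axis is collapsed so each image column becomes a time step, recurrent layers model the character sequence, and the per-step character probabilities are trained with a CTC loss so no character-level alignment of the labels is needed. The input size, layer widths, and charset size are illustrative; the model in the repo is deeper and configured differently.
# Minimal tf.keras sketch of a CRNN for handwritten text-line recognition.
# All sizes are illustrative, not the repo's actual configuration.
import tensorflow as tf
from tensorflow.keras import layers

num_chars = 80                                     # size of the character set
inputs = layers.Input(shape=(1024, 128, 1))        # text-line image (W, H, 1)

# Convolutional feature extractor.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Collapse the height axis: each of the 256 remaining columns is one time step.
x = layers.Reshape((256, 32 * 64))(x)

# Recurrent layers model the character sequence along the line.
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

# Per-time-step character probabilities (+1 for the CTC blank symbol),
# trained with a CTC loss so the labels need no character-level alignment.
outputs = layers.Dense(num_chars + 1, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()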