(Our 3 Minute Submission Video is all the way at the bottom. Here's a quick link)
Problem Statement

In today's world, designers often create environments that fundamentally rely on their users being able to see. Accessibility for those without this crucial ability is an afterthought, if it is considered at all. For the 285 million visually impaired people around the world, tasks like finding one's keys or walking along busy sidewalks become arduous or even impossible. Vision difficulties are often associated with aging, and with a growing and aging population, the World Health Organization predicts that the number of visually impaired people will triple by the year 2050.
Despite the pace of medical progress in recent years, a permanent cure for blindness remains elusive, and even the most promising current treatments are highly experimental and extremely expensive. Blind people today still rely on sighted guides, seeing-eye dogs, and canes a century after these solutions were introduced, with the obvious functional limitations they entail. Even so-called "cutting-edge" vision technologies can only describe a blind person's environment. Crucially, these devices fail entirely when it comes to actually interacting with that environment. Without a functional, cost-effective way to actively engage with the world around them, people with visual disabilities are relegated to spectator status in our society.
To address this pressing need, our team has designed blindsight, a wearable device that greatly increases the autonomy of the blind.
Before embarking on this ambitious project, our team devised a careful plan for solution development. In order to ensure that blindsight would be a significant improvement in the field of vision technologies, we conducted an extensive review of existing solutions, a selection of which is presented here:
- The KNFB Reader is a text reader application for the blind. The user utilizes a mobile device to take a picture of a document or other text source, and the app subsequently converts that text into speech or Braille. Limitations: Since this technology is only available in app form, a blind user must continuously hold their device out in front of them, an awkward pose to maintain for long periods of time.
- TapTapSee is an image recognition app that can tell blind users what objects are in a picture. By moving a mobile phone around, the blind user can scan through individual items in their vicinity. Limitations: The app is constrained by the range and FOV of the phone's camera, which means it functions well only in small, neatly-ordered environments that are unrepresentative of the real world.
- LookTel Money Reader is a service that calculates the amount of currency present in an image of paper money. Limitations: This app functions well in its specific use case, but its inability to handle anything other than currency means it is a contributor to the "app overload" blind users often experience.
From our analysis of these and other technologies, we were able to generalize the shortcomings of current solutions:
1. Slow and expensive: Many of these solutions rely heavily on offloading the actual processing to a server. Without an elegant way to handle the latency and provide rapid feedback to the user, these technologies will remain imperfect.
2. Uncomfortable and awkward: Since the majority of current solutions are apps, they require a user to continuously direct their mobile phone towards what they wish to interact with. This position is straining on the hands and also particularly taxing for the elderly, a generally weaker population that is much more susceptible to vision loss.
3. Passive and incomplete: Most significantly, no current solution provides a simple and effective way to directly engage the blind user with their environment. Using existing technologies essentially tethers a user to their phone, significantly impacting their overall autonomy.
Project Proposal

By developing blindsight, our team aspires to substantially improve the quality of life of the visually impaired. Our armband product will integrate a camera to view the user's environment and machine learning to interpret this data as a collection of objects and features of interest. Through companion iOS and Android apps used with headphones, users will issue verbal commands to the device. blindsight will also utilize a novel directional haptic feedback system, leveraging the increased sensitivity to touch that often accompanies a loss of vision. By applying vibrational stimuli to specific regions of a user's arm, our device will precisely guide the hand towards a target item, enabling interactions with objects that are otherwise invisible.
With our project proposal defined, we began construction of our MK1 blindsight device. For this first prototype, the device has four main components:
- Control Module: This module contains the Raspberry Pi Zero W, which serves as the computer controlling the entire device, along with the attached Raspberry Pi Camera and a capacitive touch sensor, all enclosed in a 3D printed housing.
- Battery Modules (x2): These modules each contain 2xAA batteries to power the entire device, and are encased in 3D printed housings.
- Stretchable Armband: The armband itself is the fundamental structure of the device. For MK1, our team utilized the stretching fabric of a black sock. The armband is embedded with 8 vibration motors, which are the essential components of our unique directional haptic feedback system. The other modules are all sewn directly onto the armband to complete the device.
MK1 Functionality Summary
blindsight is an assistive vision technology that helps the blind interact with their environment more effectively. Our prototype includes a Raspberry Pi ("RPi") Zero Wireless, an RPi camera module, haptic feedback motors, a Node.js Express server, and an Android application. We created an armband by attaching the RPi, motors, and AA batteries into 3D-printed cases. These modules were then sewn onto a wrist sleeve.
When turned on, the Raspberry Pi waits for a wake command from the user. This can come either from saying the wake word "Hey George" into the microphone of an Android phone running our application, or from placing a finger on the easy-to-find divot in the casing of the device. The user then speaks their command into the phone's microphone, or long-presses the capacitive touch sensor to trigger a default scan of the environment. The command is sent to the server to be relayed to the Raspberry Pi.
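As an illustration of the touch-based wake path, the sketch below shows one way the long-press detection could work. It assumes a capacitive touch breakout with a simple digital output wired to a GPIO pin; the pin number and hold threshold are illustrative, not necessarily the values used in the actual device.

```python
# Sketch of the touch-wake logic, assuming a touch breakout with a digital output
# (e.g. a TTP223-style board). Pin number and threshold are illustrative.
import time
import RPi.GPIO as GPIO

TOUCH_PIN = 17               # assumed BCM pin for the touch sensor output
LONG_PRESS_SECONDS = 1.5     # hold this long to trigger the default environment scan

GPIO.setmode(GPIO.BCM)
GPIO.setup(TOUCH_PIN, GPIO.IN)

def wait_for_touch_command():
    """Block until the user touches the divot; return 'scan' for a long press, 'wake' otherwise."""
    GPIO.wait_for_edge(TOUCH_PIN, GPIO.RISING)
    start = time.time()
    while GPIO.input(TOUCH_PIN):
        if time.time() - start >= LONG_PRESS_SECONDS:
            return "scan"
        time.sleep(0.05)
    return "wake"
```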
Once the Raspberry Pi receives the command from the server, it begins to execute the required action. First, the Raspberry Pi captures an image from the camera and converts it into a base64 string to be sent back to the server. The server loads a TensorFlow instance to describe the scene or pinpoint a specific object.
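The capture-and-upload step can be sketched as follows. The server URL, endpoint, and JSON layout are placeholders rather than the project's actual interface, and the sketch assumes the picamera library is used to drive the RPi camera.

```python
# Minimal sketch: capture one frame, base64-encode it, and post it to the server.
import base64
import io

import requests
from picamera import PiCamera

SERVER_URL = "http://example-server:3000/image"   # placeholder address and endpoint

def capture_and_send(command):
    camera = PiCamera()
    try:
        stream = io.BytesIO()
        camera.capture(stream, format="jpeg")            # grab one frame from the Pi camera
        encoded = base64.b64encode(stream.getvalue()).decode("ascii")
    finally:
        camera.close()
    # The server runs the TensorFlow models and returns either a description or a bounding box.
    response = requests.post(SERVER_URL, json={"command": command, "image": encoded})
    return response.json()
```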
If the user simply wants a general description of the surrounding environment, the server sends its result back to the Android app, which speaks the answer aloud to the user. If the user instead wants to locate a specific object, the process is more complex. The server returns a bounding box of the target object to the Raspberry Pi. The RPi determines the offset between the object's location and the user's hand (whose position is known because the camera is fixed to the armband). Using OpenCV tracking and the vibration motors, the Raspberry Pi vibrates the band in the direction the user's hand must travel to reach the target object, and pulses once the hand has successfully reached it.
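A minimal sketch of how the bounding-box offset could be mapped onto the ring of eight motors is shown below. The GPIO pin assignments and the assumption that the hand sits at the centre of the frame are illustrative.

```python
# Illustrative mapping from a bounding-box offset to one of eight vibration motors.
import math
import RPi.GPIO as GPIO

MOTOR_PINS = [5, 6, 13, 19, 26, 16, 20, 21]   # assumed BCM pins, one per motor around the band

GPIO.setmode(GPIO.BCM)
for pin in MOTOR_PINS:
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

def guide_hand(box, frame_w, frame_h):
    """Vibrate the motor pointing from the hand (frame centre) towards the target box."""
    target_x = box[0] + box[2] / 2.0
    target_y = box[1] + box[3] / 2.0
    dx, dy = target_x - frame_w / 2.0, target_y - frame_h / 2.0
    if math.hypot(dx, dy) < 30:                     # close enough: signal success
        return "reached"
    angle = math.atan2(dy, dx) % (2 * math.pi)
    index = int(round(angle / (math.pi / 4))) % 8   # quantise direction to one of 8 motors
    for i, pin in enumerate(MOTOR_PINS):
        GPIO.output(pin, GPIO.HIGH if i == index else GPIO.LOW)
    return "guiding"
```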
With this process, blindsight acts as a "new pair of eyes" for blind users, enabling them to directly engage with their world. Our MK1 prototype is fully functional, but it has some limitations and areas for improvement. The reliance on a server stems primarily from the difficulty we encountered in running the TensorFlow instances on the mobile device itself, as well as from the need to transfer images between the RPi and the processing device. Even though the server slows down the overall process, the RPi remains capable of real-time object tracking through OpenCV, which shows that an eventual server-free version is well within the realm of possibility.
The following are assorted videos and pictures documenting our MK1 build:
MK2 represented the next step forward in our design journey. Recognizing the success of the first prototype, our team decided to focus on reducing the functional barriers that would inhibit successful use of blindsight. Our targeted areas for improvement were:
- Long-term powering and charging solution
- Camera improvements
- Additional user features implementation
- Physical housing improvements
- Mobile app improvements
Long-term Powering Solution
In MK1, power for the entire system was provided solely by 4 AA batteries enclosed in dedicated cases. We had chosen these batteries for their convenience during early prototyping, but we understood that this was one of the most inefficient areas of the prototype. AA batteries have a finite lifespan and a low charge-to-weight ratio. Lithium polymer ("LiPo") batteries are rechargeable and much more compact for the same charge. Because the band needed to wrap around the user's arm, we designed the device to use two smaller batteries wired in parallel instead of a single, larger LiPo. We purchased two of the 2000mAh LiPos listed earlier in our documentation and used them to power the device.
A major advantage of LiPo batteries is that they can be safely recharged without being physically replaced. Typically, LiPo batteries are charged through a specialized JST connector attached to a LiPo-specific charger. Since we recognized that people with visual impairments often struggle with tangles of cables, we instead opted for a wireless charging solution. Through the integration of a Qi charging receiver and a PowerBoost 500C, blindsight can be charged simply by placing it on top of a generic Qi wireless charging pad. Qi is the industry-standard wireless charging technology for smartphones and similar devices, so it was an obvious choice for our product.
Camera Improvements
Another problem we encountered with our MK1 prototype was that the camera's Field of View (FOV) was very narrow. During testing, we realized that the official Raspberry Pi camera we had originally purchased could only see objects that were already within reach of the blind user's hand, which made the object tracking functionality effectively useless. With our upgraded SainSmart camera, however, the Raspberry Pi is able to track objects across a 160-degree FOV, more than double the initial view. This allows the user to point blindsight towards a cluttered table and scan the entire surface at once, making the device much easier to use.
Additional User Features Implementation
During our initial research of existing solutions, our team had recognized the issue of "app clutter" - the blind are often reliant on a multitude of apps that are each tailored to a singular and limited use-case, drastically detracting from the user experience. To address this better in our MK2, we focused on incorporating other standard features that a blind person would find useful.
Since not all text encountered in the real world is accompanied by a Braille equivalent, visually impaired individuals are often unable to read signs or posters bearing important information. By incorporating Optical Character Recognition ("OCR"), blindsight can translate text scanned by the camera into spoken words through the user's headphones.
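A minimal sketch of this OCR step, using the Google Cloud Vision client library mentioned in the Machine Learning section below, might look like the following; credential setup is omitted and the function name is ours.

```python
# Sketch of the OCR step with the Google Cloud Vision client library (recent versions);
# the armband image is assumed to arrive as JPEG bytes.
from google.cloud import vision

def read_text_aloud(jpeg_bytes):
    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=jpeg_bytes))
    annotations = response.text_annotations
    if not annotations:
        return ""
    # The first annotation contains the full detected text block; the companion app's
    # text-to-speech engine then speaks it through the user's headphones.
    return annotations[0].description
```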
An additional challenge for blind people is recognizing the person in front of them. Typically, blind people rely on gradually memorizing the details of a person's voice in order to identify them, but this process takes time and is imperfect. By using keywords to train a facial recognition model on a new acquaintance's face and later identify that person in front of the user, blindsight addresses this specific problem.
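As a rough illustration, the enrol-then-identify flow could look like the sketch below, assuming the open-source face_recognition package (the sections below only specify "a facial recognition library"); the keyword phrases and tolerance value are illustrative.

```python
# Hypothetical enrol/identify flow with the face_recognition package.
import face_recognition

known_encodings = []    # one 128-d encoding per enrolled acquaintance
known_names = []

def enroll(name, image_path):
    """Triggered by a keyword such as 'remember this person'."""
    image = face_recognition.load_image_file(image_path)
    encodings = face_recognition.face_encodings(image)
    if encodings:
        known_encodings.append(encodings[0])
        known_names.append(name)

def identify(image_path):
    """Triggered by a keyword such as 'who is in front of me'."""
    image = face_recognition.load_image_file(image_path)
    for encoding in face_recognition.face_encodings(image):
        matches = face_recognition.compare_faces(known_encodings, encoding, tolerance=0.6)
        if True in matches:
            return known_names[matches.index(True)]
    return "unknown"
```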
Physical Housing Improvements
After assembling and wearing the MK1, we immediately recognized that the design was not particularly comfortable for long-term use. Its uneven weight distribution made the band prone to slippage, and the central module jutted out considerably from the otherwise sleek band. As we moved forward with blindsight, we also understood that we needed a better method of constructing the stretchable armband itself.
In part because of the numerous electronics hardware changes mentioned earlier, the original style of externally-visible 3D printed modules was replaced with a cleaner system of fully embedding electronics into the armband. Better-designed CAD models of these parts are included as attachments. Unfortunately, due to difficulties in accessing a 3D printer, our team was unable to print the parts we required, resulting in a less-than-elegant implementation.
The band itself was now produced with fabric purchased from a local store. Since we could fully manipulate the fabric, we were able to better position the motors and other components throughout the armband, leading to a significantly improved weight distribution. Additionally, Velcro dots added into the design allowed for easy maintenance as we ironed out bugs in our code.
Mobile App Improvements
Our primary improvement was to develop a blindsight companion app for iOS as well, since the majority of visually impaired people actually use iPhones and other Apple devices. More superficial UI changes were also made to the original Android app. Functionality-related improvements included renaming our virtual assistant from George to Christy and beginning the process of developing this assistant into a complete AI persona along the lines of Siri or Google Assistant.
MK2 Functionality Summary:
In almost all respects, the functionality of the MK1 prototype has been significantly improved in the MK2 iteration. The additional videos and images below demonstrate the features we have worked to add into our device:
Next Steps

While MK2 represents a definite improvement over MK1, there remain areas that must be worked on to bring blindsight closer to implementation in the real world:
- 3D print the designed parts: Since we had been unable to print the parts we designed in time for the MK2 prototype, this remains an important area where we must improve the overall aesthetic appeal of the device.
- Localize the processing: In order to avoid relying on a server in the long-term, we need to move the processing onto the user's phone instead, saving time and money.
- Create a Printed Circuit Board: By designing a custom PCB with the minimal required elements for our device, we can reduce the current draw from the RPi and also achieve a better form factor for our use case.
- Improve the Christy virtual assistant: There are a number of additional specific scenarios where our technology can provide a significant benefit to the blind, and so we will continue adding capabilities to our smart assistant.
- Implement night vision: By using a camera module without an InfraRed (IR) filter and shining IR LEDs on the camera's target, we can grant the blind night vision that, intriguingly, is superior to that of people without a visual disability.
- Incorporate depth perception: If our device is able to determine exactly how far away the desired target object is through stereoscopy, we will be able to more carefully direct the blind user's hand towards the target, and can also implement features for collision avoidance.
- Add gesture control: Since voice input is not always possible in extremely loud or quiet surroundings, gesture control through electromyography (EMG) sensors will add an alternative input method for users.
The subsequent sections provide deeper insights into our design and implementation of specific subsystems of blindsight.
Machine Learning
Several deep learning models were used as part of blindsight. Using AWS and Nvidia GPUs, we ran multiple convolutional neural networks with TensorFlow. We ran the Show, Attend and Tell model, which can accurately describe a complex scene to the user. We also ran the TensorFlow Object Detection API with a frozen inference model and a custom-trained model, allowing over a thousand object classes to be recognized. To reduce the time required to load a model onto the GPU, we wrote scripts that preload the model and run as a MongoDB listener. Whenever a new image needs to be recognized, the script runs it against the preloaded model instead of spending roughly 20 seconds loading the model into VRAM.
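A simplified version of this listener pattern is sketched below. It assumes a TF1-style frozen graph exported by the Object Detection API and a MongoDB collection of pending requests; the database, collection, field names, and file path are placeholders rather than our production schema.

```python
# Sketch: keep the detection model resident in GPU memory and serve requests that
# arrive through MongoDB, paying the ~20 s model-loading cost only once at startup.
import base64
import io
import time

import numpy as np
import tensorflow as tf
from PIL import Image
from pymongo import MongoClient

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:   # placeholder path
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name="")
session = tf.compat.v1.Session(graph=graph)

jobs = MongoClient()["blindsight"]["detection_requests"]          # placeholder names

while True:
    job = jobs.find_one({"result": None})      # poll for an unprocessed request
    if job is None:
        time.sleep(0.1)
        continue
    image = Image.open(io.BytesIO(base64.b64decode(job["image_b64"])))
    tensor = np.expand_dims(np.array(image), axis=0)
    boxes, scores, classes = session.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": tensor},   # standard Object Detection API tensor names
    )
    jobs.update_one(
        {"_id": job["_id"]},
        {"$set": {"result": {"boxes": boxes[0].tolist(),
                             "scores": scores[0].tolist(),
                             "classes": classes[0].tolist()}}},
    )
```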
Using OpenCV, the Raspberry Pi can take a picture and request a bounding box from the server. The Raspberry Pi then uses a Median Flow tracker to follow that bounding box locally, instead of repeatedly calling the server for new bounding boxes. This approach avoids wasting server processing power and time, while also being more fault tolerant. In the future, we plan to run the TensorFlow models on the user's phone instead of relying on expensive cloud computing services. We also plan to create a distributed network of phones that share compute power with users whose devices have less GPU power. We used the Google Cloud Vision API for OCR, along with a facial recognition library to detect faces locally on the Raspberry Pi. However, due to the Raspberry Pi's limited compute power, face detection runs slowly. We plan to evaluate open source OCR alternatives to Google Cloud Vision and to consider offloading facial recognition onto the user's phone instead.
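The local tracking loop can be sketched roughly as follows, assuming OpenCV with the contrib modules (on OpenCV 4.5+ the constructor moves under cv2.legacy); the initial bounding box and camera index are illustrative.

```python
# Minimal sketch of the on-device Median Flow tracking loop.
import cv2

camera = cv2.VideoCapture(0)          # Pi camera exposed as a V4L2 device (assumption)
ok, frame = camera.read()

initial_box = (120, 80, 60, 60)       # (x, y, w, h) from the server's detection, illustrative
tracker = cv2.TrackerMedianFlow_create()
tracker.init(frame, initial_box)

while True:
    ok, frame = camera.read()
    if not ok:
        break
    ok, box = tracker.update(frame)   # follow the object locally, no server round-trip
    if not ok:
        # Tracking lost: fall back to requesting a fresh detection from the server.
        break
    x, y, w, h = [int(v) for v in box]
    # (x + w/2, y + h/2) is the target's centre, used to drive the haptic guidance.
```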
CAD and 3D Printing
The physical plastic parts we designed for blindsight served important structural purposes. The first component was a two-part central module to enclose the Raspberry Pi and camera module. This case was difficult to design: it had to keep the individual electrical pieces from sliding around, while maintaining tolerances loose enough to be 3D printed reliably. The flexing of the jumper wires connecting to the Raspberry Pi also had to be considered, to ensure that repeated motion of the band would not wear down the wires and potentially cause a catastrophic short.
The second part was an LED case, designed to hold eight LEDs that mirror the eight motors within the armband. Since outside observers cannot feel the vibrations from the device, the LED indicators provide visible verification to spectators that blindsight is, in fact, functioning correctly.
The final part was a backing for the Qi receiver. The Qi receiver is a specially shaped coil of wire that charges wirelessly by induction. Its thin structure makes it easy to incorporate into the armband, but also leaves it prone to damage. A 3D printed backing piece was designed to reinforce the coil, while also allowing the PowerBoost 500C board to be attached directly to the receiver for easier cable management.
Electronics
While designing the electrical circuit to power and run the blindsight wristband, we took everyday usability and reliability into account. We expect blindsight to be in sleep mode for approximately 8 hours per day, only entering high-power mode on wake, for perhaps 90 minutes of total usage across the day. Since the device will not be needed while the user sleeps, it can recharge overnight.
On MK1, 4 AA batteries were used to power blindsight, but they could not be recharged and had a bulky form factor. For MK2, we needed a battery that was lightweight, relatively thin, and able to be recharged overnight. We chose two 3.7V 1S 2000mAh lithium polymer batteries connected in parallel, for an overall capacity of 4000mAh. We chose lithium polymer chemistry for its higher energy density and minimal capacity loss over time compared to lead or nickel batteries. Additionally, we chose a 1S 3.7V battery over a 2S 7.4V battery because it is lighter and thinner and does not require a voltage regulator.
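As a rough sanity check on this capacity, the back-of-envelope calculation below uses the usage model described above; the current draws are assumed typical figures for a Pi Zero W system, not measured values from our device.

```python
# Back-of-envelope runtime estimate for the 4000 mAh pack (assumed currents).
CAPACITY_MAH = 4000          # two 2000 mAh 1S LiPos in parallel
SLEEP_CURRENT_MA = 120       # assumed idle draw of the Pi Zero W through the boost converter
ACTIVE_CURRENT_MA = 450      # assumed draw with camera, Wi-Fi, and motors active

sleep_hours, active_hours = 8.0, 1.5   # daily usage model from the text
daily_mah = sleep_hours * SLEEP_CURRENT_MA + active_hours * ACTIVE_CURRENT_MA
print(f"~{daily_mah:.0f} mAh per day -> {CAPACITY_MAH / daily_mah:.1f} days per charge")
```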
The Raspberry Pi, vibration motors, and LEDs all require a 5V supply, so we used a PowerBoost 500C. This board boosts the 3.7V battery voltage to 5V, charges the LiPo battery, and provides short-circuit and over-discharge protection. One important factor for blindsight is ease of use, and one potential issue we anticipated was finding a user-friendly method of charging the band. Instead of a traditional barrel plug or micro-USB charger, we opted for a wireless charging solution, so that the user can charge the band on any Qi universal wireless charging pad. When charging on a Qi pad at 2A, the band can go from 10% charge to 100% in as little as seven hours.
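The quoted charge time can be sanity-checked with a quick calculation, assuming the PowerBoost 500C's nominal charge rate of roughly 500 mA (so the Qi pad's 2A rating is not the limiting factor):

```python
# Quick check on the ~7 hour charge figure, assuming a ~500 mA charge rate.
CAPACITY_MAH = 4000
CHARGE_CURRENT_MA = 500                     # assumed PowerBoost 500C charge rate
missing_mah = 0.9 * CAPACITY_MAH            # topping up from 10% to 100%
print(f"~{missing_mah / CHARGE_CURRENT_MA:.1f} hours")   # about 7.2 hours
```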
Mobile Apps
In both our Android and iOS apps, the user has access to speech recognition and text-to-speech technologies. The Android app uses CMU's PocketSphinx for wake-word detection and Google speech recognition for commands, while the iOS app uses Apple's built-in speech recognition. When the user says the wake word or presses the button on the armband, speech recognition is activated. After the user's speech is processed in the app, the command is sent to our server, which activates the camera on the armband. After the server processes the image, information is sent back to the app, where it is read out through text-to-speech.
Demo Videos

Below are demonstration videos of the MK1 and MK2 iterations. Our 3 minute video submission is the second of the two videos, labelled "MK2 Demo."
MK1 Demo
MK2 Demo - Actual Contest Submission Video
NOTE: Since viewers cannot feel the haptic vibrations shown in the video, we have included LED indicator lights that display the same pattern. The red LED represents the motor on the armband that rests against the forearm's upper surface, with the ring of LEDs paralleling the ring of motors throughout the band. We hope this will serve as a proof of our technology's functionality.
Works Cited

Mariotti, Silvio P. "Global Data on Visual Impairments 2010." World Health Organization, www.who.int/blindness/GLOBALDATAFINALforweb.pdf.
- Most recent global report on blindness from World Health Organization
Riccobono, Mark. “Blindness Statistics.” National Federation of the Blind, 12 Mar. 2018, nfb.org/blindness-statistics.
- Shows demographics and basic information on the blind population in America
Kurzweil, Ray. "What Can KNFB Reader Do for You?" KNFB Reader, knfbreader.com/the-app.
Soon-Shiong, Patrick. “LookTel Products.” LookTel Money Reader, LookTel, www.looktel.com/moneyreader.
Jensen, Alexander. “Press Resources.” Be My Eyes - Bringing Sight to Blind and People with Low Vision, Be My Eyes, www.bemyeyes.com/press.
“TapTapSee.” TapTapSee - Blind and Visually Impaired Assistive Technology - Powered by CloudSight.ai Image Recognition API, MIT, taptapseeapp.com/.