A large part of our interaction with the world nowadays happens on the web. Browsing requires constant use of a keyboard and mouse, which is a serious barrier for people who cannot use them because of physical impairments or disabilities. BrowserHelp tackles this problem by offering an alternative: natural voice interaction through your Amazon Echo device that gives you hands-free control over your Chrome browser.
What it does
After installing the Alexa skill and companion Chrome extension, you can navigate the web and perform all of your basic browser interactions without having to lay a finger on your keyboard or mouse! Searching with Google, scrolling, tab management, navigating to arbitrary links on a page and moving through your history are some examples, but you can also set your preferred news site and up to 3 favourites for easy access.
How I built it
BrowserHelp consists of three components:
- The BrowserHelp Alexa Skill, backed by a Node.js Lambda function
- The BrowserHelp Chrome extension
- A Node.js server, needed to:
  - Form a bridge between the HTTP-based requests from Lambda functions and the websocket connections needed by the Chrome extension
  - Facilitate Login with Amazon when setting up the Chrome extension
User interaction and conversation flow are handled within the Lambda function, which uses Account Linking to identify different users. Actions to be performed are sent from the Lambda function to the server over a secure connection, together with a hashed user identifier. The Chrome extension, once installed, uses Login with Amazon via the server to acquire and store the same hashed identifier. After this, the server and the extension establish a dedicated, secure socket.io channel for that hash, through which all communication for that user runs. The extension then performs the requested actions using a mix of Chrome APIs and injected content scripts.
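The flow described above can be illustrated with a minimal sketch. It is not the project's actual server code: the HMAC-SHA256 hashing, the `hashUserId` and `ActionRelay` names, the salt, and the action shape are all assumptions made for illustration, and Node's built-in `EventEmitter` stands in for the socket.io channels so the example runs without dependencies.

```javascript
const crypto = require('crypto');
const { EventEmitter } = require('events');

// Hypothetical helper: derive a stable, non-reversible identifier from the
// Amazon user id. The real project may hash differently (assumption).
function hashUserId(userId, salt) {
  return crypto.createHmac('sha256', salt).update(userId).digest('hex');
}

// In-memory stand-in for the per-user socket.io channels: one event name
// per user hash.
class ActionRelay {
  constructor() {
    this.channels = new EventEmitter();
  }

  // Extension side: subscribe to the actions for one user hash.
  // Returns a function that removes the subscription again.
  subscribe(userHash, onAction) {
    this.channels.on(userHash, onAction);
    return () => this.channels.off(userHash, onAction);
  }

  // Lambda side: forward an action to whoever listens on that hash.
  // Returns false when no extension is subscribed.
  publish(userHash, action) {
    return this.channels.emit(userHash, action);
  }
}

// Usage: both sides derive the same hash, so the action reaches only
// the extension that belongs to this user.
const relay = new ActionRelay();
const hash = hashUserId('amzn1.account.EXAMPLE', 'demo-salt');
relay.subscribe(hash, (action) => console.log('extension received', action));
relay.publish(hash, { type: 'scroll', direction: 'down' });
```

Because the channel name is derived only from the hash, neither the server nor the extension needs to handle the raw Amazon account identifier after login.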
Challenges I ran into
- The only scalable way of keeping a Chrome extension in sync with the actions it needs to perform is to use websockets and the Publisher-Subscriber (PubSub) pattern. This does not, however, work well with the stateless architecture of Lambda functions, which cannot keep a websocket connection alive. The most scalable approach I could find was relaying all of the Lambda function requests to a server, which creates dedicated websocket channels for users, to which their Chrome extension can subscribe.
- Free-form text input, as needed for searching any website or entering text on a page, is still quite a challenge with Alexa. As a (hopefully) temporary solution, I decided to use the Web Speech API to interpret search queries.
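As a rough illustration of the Web Speech API workaround, the sketch below resolves with the first recognised transcript and turns it into a Google search URL. The function names `listenForQuery` and `googleSearchUrl` are hypothetical, and the recogniser constructor is passed in as a parameter so the code can be exercised outside a browser. In Chrome it would be `window.webkitSpeechRecognition`.

```javascript
// Resolve with the first transcript produced by a Web Speech API recogniser.
// `Recognition` is injected (assumption made for testability); in Chrome,
// pass window.webkitSpeechRecognition.
function listenForQuery(Recognition, lang = 'en-US') {
  return new Promise((resolve, reject) => {
    const recognition = new Recognition();
    recognition.lang = lang;
    recognition.interimResults = false; // only deliver final results
    recognition.maxAlternatives = 1;    // one best guess is enough here
    recognition.onresult = (event) => resolve(event.results[0][0].transcript);
    recognition.onerror = (event) => reject(new Error(event.error));
    recognition.start();
  });
}

// Hypothetical helper: build a Google search URL from the spoken query.
function googleSearchUrl(query) {
  return 'https://www.google.com/search?q=' + encodeURIComponent(query.trim());
}
```

In a content script this could be combined as `listenForQuery(window.webkitSpeechRecognition).then((q) => { location.href = googleSearchUrl(q); })`, although the real extension may wire it up differently.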
Accomplishments that I'm proud of
- BrowserHelp is my first project to go into production, and it's incredibly gratifying to finish a project and see it used by people from all over the world
- Receiving incredibly positive feedback from multiple users, both during beta testing and since going live, and hearing about uses of BrowserHelp that I had not thought of before
- Passing Amazon's strict certification process with a unique use case that goes beyond the voice-only interaction with your Echo that most Alexa skills stick to
To start using BrowserHelp, follow these steps:
- Visit browserhelp.me to install the Alexa Skill and Chrome Extension.
- After installing the skill, enable Account Linking for that skill via either the Alexa app on your phone or the Alexa web app.
- When you've installed the companion Chrome Extension, Alexa BrowserHelp, you will be prompted to log in via Login with Amazon. Log in using the same account details you used when installing the Alexa Skill.
- Once this is complete, a message will appear telling you that you can now close the tab and start using the skill, or set up your favourite websites and news site via the options page.
- You are now ready to start using the skill by saying "Alexa, start BrowserHelp". Alternatively, try "Alexa, ask BrowserHelp to scroll down" or one of the other supported phrases listed below.
To build and install the skill from source, follow these steps:
- Clone the code from https://github.com/BerioFlow/BrowserHelp
- Deploy the server on any platform, and enable HTTPS
- Update all occurrences of the "serene-harbor-37271.herokuapp.com" URL in the project to the baseUrl of your own server
- Install the Chrome extension found in the extension folder as described in https://developer.chrome.com/extensions/getstarted#unpacked
- Create a new Skill for Alexa, and when configuring the Interaction Model use the settings stored in intentScheme.json, LIST_OF_ITEMS.txt, and sampleUtterances.txt from the skill directory to define the allowed voice interactions
- Configure and upload your skill via the AWS CLI as described in https://developer.amazon.com/blogs/post/Tx1UE9W1NQ0GYII/publishing-your-skill-code-to-lambda-via-the-command-line-interface and use the included publish.sh to re-upload
- The setup should now be complete. If the skill was uploaded correctly, it is automatically available on Alexa devices that are logged in with your Amazon account. Test the skill by saying "Alexa, start BrowserHelp"
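The URL replacement step above can be done by hand or with a search-and-replace in your editor. As an alternative, here is a small Node.js sketch. The `replaceHost` function is hypothetical and not part of the repository, and the list of file extensions it scans is an assumption about where the URL appears.

```javascript
const fs = require('fs');
const path = require('path');

// The host name that the steps above say must be replaced.
const OLD_HOST = 'serene-harbor-37271.herokuapp.com';

// Recursively replace OLD_HOST in source files under `dir`, skipping
// node_modules and .git. Returns the paths of the files that were changed.
function replaceHost(dir, newHost, changed = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== 'node_modules' && entry.name !== '.git') {
        replaceHost(full, newHost, changed);
      }
    } else if (/\.(js|json|html|sh)$/.test(entry.name)) {
      // Assumption: the URL only occurs in these file types.
      const text = fs.readFileSync(full, 'utf8');
      if (text.includes(OLD_HOST)) {
        fs.writeFileSync(full, text.split(OLD_HOST).join(newHost));
        changed.push(full);
      }
    }
  }
  return changed;
}
```

Calling `replaceHost('.', 'my-server.example.com')` from the project root would rewrite the matching files in place, so it is worth committing or backing up first and reviewing the returned list.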
Try some of the following sample utterances:
- Search with Google
- Show News
- Highlight links
- Open Link {x}
- Remove highlighting
- Open favourite {1/2/3}
- Help
- Navigate {back/forward}
- Scroll {up/down}
- {Open/close} tab
- Press Enter
- Reload page
- Open {YouTube/Google/Facebook/Twitter/Hacker News}
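To give an idea of how such utterances might end up as browser actions, here is a hedged sketch of a dispatcher on the extension side. The action message shape (`{ type, ... }`) and the `performAction` name are assumptions, not the project's actual protocol. The `chromeApi` parameter is injected so the sketch can run outside an extension; in a background script it would be the global `chrome` object.

```javascript
// Map a (hypothetical) action message onto Chrome extension APIs.
function performAction(chromeApi, tabId, action) {
  switch (action.type) {
    case 'openTab':
      // "Open tab" or "Open YouTube": create a tab, optionally with a URL.
      return chromeApi.tabs.create({ url: action.url });
    case 'closeTab':
      return chromeApi.tabs.remove(tabId);
    case 'reload':
      return chromeApi.tabs.reload(tabId);
    case 'navigate':
      // "Navigate back/forward" moves through the tab's history.
      return action.direction === 'back'
        ? chromeApi.tabs.goBack(tabId)
        : chromeApi.tabs.goForward(tabId);
    case 'scroll':
    case 'highlightLinks':
      // These act on the page itself, so hand them to a content script.
      return chromeApi.tabs.sendMessage(tabId, action);
    default:
      throw new Error('Unsupported action: ' + action.type);
  }
}
```

Keeping page-level actions such as scrolling and link highlighting in a content script matches the split between Chrome APIs and injected scripts described earlier, though the exact division in the real extension may differ.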
What's next for BrowserHelp
- Offer custom integration and specific voice commands for large platforms such as YouTube or Facebook
- Inject Web Speech API for filling in any form and search box on the page, in the same way as currently done for highlighting and selecting links
- Extend commands. Commands present in the next version include:
  - Dictate Page
  - List Bookmarks
  - Open Bookmark {x}
  - Log In
  - Press {Tab/Backspace/Spacebar/Up/Down/Left/Right}
  - Use Input {1/2/3}
- Setting a single or repeated timer for any of the existing commands