This project was conceived with the aim of delivering an immersive experience, allowing users to draw images in the air and witness their creations undergo impressive transformations through the application of AI.
TL;DR
If you want to skip all the documentation and set up the project for yourself or see the final application in action, you can skip to the Final Product section. You can grab all the open-source code in the Code section at the bottom of the page.
Features
- Hand tracking
- Gesture recognition
- Sketch autocompletion
- AI image generation
- Gallery view
- Sharing / Deleting the photo
We utilized several technologies in our tech stack to bring our application to life. To start, we used Android Studio as the development environment to build the application. To implement our user interface, we used Jetpack Compose, which streamlined the styling of components and gave our users a comfortable interface. Next, we used the Android device’s camera together with MediaPipe to capture the user’s finger positions and gestures. We used OpenCV to perform all of the image preprocessing and drawing onto the screen. If the user wants to autocomplete a sketch, the drawn sketch is sent to a backend SketchRNN microservice that returns an autocompleted sketch. When the user saves the final sketch, it is sent to a third-party service to generate an improved, AI-generated image of the sketch. In summary, our application combines a number of technologies to collect the user’s gestures and synthesize them into AI-generated artwork.
Platforms and Technologies:
- Android Application (Internet connection & Camera)
- MediaPipe (by Google) for hand tracking & gesture detection
- OpenCV for image manipulation, preprocessing, and drawing
- APIs for AI models (a custom microservice and a third-party service)
- SketchRNN to convert gesture strokes into sketches
- ClipDrop to convert sketches into finished drawings
- Jetpack Compose for our front-end design and implementation
Our application seamlessly transforms user gestures into art through intuitive controls and modern AI technologies. When the user accesses the drawing page, they simply place their hand within view of the front-facing camera. MediaPipe, by Google, takes charge of gesture detection, providing support for various gesture categories and pinpointing landmarks on the user's hand. The primary gesture that activates drawing mode is pointing the index finger upward. The app tracks the tip of the index finger as the user draws strokes on the canvas.
When the user is ready to convert their strokes into polished artwork, they can make a closed-fist gesture. At this stage, the app sends the collected strokes to an endpoint on our SketchRNN HTTP server. Within the server, the SketchRNN AI model works its magic, auto-completing the sketch based on a chosen category from the dropdown menu. The AI-generated strokes are then returned to the user's Android device via HTTP and displayed on their screen.
For added convenience, a thumbs-down gesture allows the user to clear the screen, offering a fresh canvas if they are not satisfied with their drawing, the autocompleted sketch, or the final piece. Otherwise, a thumbs-up gesture signals the app to send the sketch to ClipDrop, where it undergoes a transformation into a detailed work of art. This involves background removal, grayscale conversion, and the fine-tuning of image characteristics to ensure optimal results from the AI art generator. Ultimately, the completed artwork is saved in the user's gallery, ready to be admired or shared. Through intuitive gestures and modern AI technologies, our app bridges the gap between creative expression and AI-powered artistry, offering users a delightful and interactive artistic experience.
Implementation
View and View Model
The architecture of our application follows a clear separation of concerns through a view and view model pattern. Each view file is responsible for rendering UI components and handling user interactions, keeping the codebase modular and organized. Each view model file encapsulates the functions and data needed by its corresponding activity. Additionally, model files manage our custom data format, further improving the coherence and maintainability of the codebase.
User Interface and Experience (UI/UX)
To design our product's user interface and experience (UI/UX), we leveraged Figma and Jetpack Compose. In Figma, we created each screen of our product and connected them to understand the navigation flow and decide which features to include. We also employed Jetpack Compose, a modern Android UI toolkit, in Android Studio to create components that are easy to build and reuse. Additionally, because Figma can export Jetpack Compose code, we were able to develop UI components rapidly and modularly.
In total, we have four screens:
- Home screen: The user can start their drawing or navigate to their gallery page.
- Draw screen: The user can draw in the air. The app responds to the user's gestures:
Point up: draw strokes.
Closed fist: automatically complete the drawing (this can be done repeatedly).
Thumbs down: clear the screen.
Thumbs up: send the sketch to the external API to get an AI-generated image.
- Gallery screen: The user is able to look through their pictures.
- Individual screen: The user can view a picture at full size and share or delete it.
This was our initial design and logic.
It later evolved into this with auto layout constraints and prototyping implemented.
We used Jetpack Compose to create the frontend design. Here is a snippet of our home page written in Jetpack Compose. Figma's developer mode also helped generate some of the design and layout code.
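Since the original snippet was shared as a screenshot, here is a minimal, hypothetical reconstruction of what a home screen like ours looks like in Jetpack Compose. The labels and the navigation callbacks (onStartDrawing, onOpenGallery) are illustrative assumptions rather than our exact code.

```kotlin
import androidx.compose.foundation.layout.*
import androidx.compose.material3.*
import androidx.compose.runtime.Composable
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

// Hypothetical reconstruction of the home screen; names and styling are illustrative.
@Composable
fun HomeScreen(
    onStartDrawing: () -> Unit,   // assumed navigation callback to the Draw screen
    onOpenGallery: () -> Unit     // assumed navigation callback to the Gallery screen
) {
    Column(
        modifier = Modifier
            .fillMaxSize()
            .padding(24.dp),
        verticalArrangement = Arrangement.Center,
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        Text(text = "Draw in the air", style = MaterialTheme.typography.headlineMedium)
        Spacer(modifier = Modifier.height(32.dp))
        Button(onClick = onStartDrawing, modifier = Modifier.fillMaxWidth()) {
            Text("Start Drawing")
        }
        Spacer(modifier = Modifier.height(16.dp))
        OutlinedButton(onClick = onOpenGallery, modifier = Modifier.fillMaxWidth()) {
            Text("Gallery")
        }
    }
}
```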
In the development of our application, we dedicated significant effort to the implementation of a robust Gesture Tracking system. Initially, we explored the possibility of creating our own gesture-tracking solution using OpenCV, coupled with a skin tone calibration system to attempt to isolate the hand and track its center position. Here is the result we got when we implemented this solution along with a snippet of the relevant code.
This approach worked by isolating HSV values that matched the user’s skin tone. The white sections in the middle camera frame indicate a potential hand target. The problem with this solution is that if other light sources, such as ambient light, matched a similar HSV value, they would also be considered part of the hand. To mitigate this, we tried to consider only the largest white blobs on the screen as the target hand, but this meant the solution would not work if the user’s hand was far from the camera, causing it to track the ambient light instead. To improve the HSV calibration, we added a calibration page to fine-tune this light sensitivity. After further testing, we decided that we needed a better approach to hand tracking in order to better track the index finger and account for hand gestures.
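The relevant code was shown as a screenshot in the original post; the snippet below is a simplified sketch of the same idea using the OpenCV Android bindings. The HSV bounds and the largest-blob heuristic are illustrative, not our calibrated values.

```kotlin
import org.opencv.core.*
import org.opencv.imgproc.Imgproc

// Simplified sketch of the HSV-based skin isolation we first tried.
// lowerHsv / upperHsv are illustrative; the real app calibrated them per user.
fun findHandCenter(frameBgr: Mat, lowerHsv: Scalar, upperHsv: Scalar): Point? {
    // Convert the camera frame to HSV so skin tone can be thresholded by hue/saturation/value.
    val hsv = Mat()
    Imgproc.cvtColor(frameBgr, hsv, Imgproc.COLOR_BGR2HSV)

    // Keep only pixels inside the calibrated skin-tone range (the white regions in the mask).
    val mask = Mat()
    Core.inRange(hsv, lowerHsv, upperHsv, mask)

    // Take the largest white blob as the candidate hand.
    val contours = mutableListOf<MatOfPoint>()
    Imgproc.findContours(mask, contours, Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE)
    val largest = contours.maxByOrNull { Imgproc.contourArea(it) } ?: return null

    // Track the centroid of that blob as the hand position.
    val moments = Imgproc.moments(largest)
    if (moments.m00 == 0.0) return null
    return Point(moments.m10 / moments.m00, moments.m01 / moments.m00)
}
```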
Then we discovered the powerful capabilities of MediaPipe by Google, which not only simplified our task but also provided reliable gesture recognition across a diverse range of skin tones without the need for calibration procedures. In this section, we discuss the step-by-step process of how we integrated MediaPipe into our Android application, enabling real-time tracking of the user's hand gestures and hand position on the screen. We also outline the transition into 'draw mode,' where the app captures and stores strokes created by the user’s index finger movement. These strokes are then sent to our SketchRNN microservice, which transforms them into compelling sketches.
This is a screenshot of how we set up the MediaPipe dependency in Android and automatically downloaded the hand tracking & gesture recognition model into the app.
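The screenshot itself isn't reproduced here, but the setup amounts to two pieces: adding the MediaPipe Tasks vision dependency to the module's Gradle file, and a build-time task that downloads the gesture recognizer model into the app's assets (the official MediaPipe samples use the de.undercouch download plugin for this). Treat the version numbers and model URL below as illustrative.

```kotlin
// Module-level build.gradle.kts additions (version numbers and model URL are illustrative)
plugins {
    id("de.undercouch.download") version "5.4.0"   // used only to fetch the model file at build time
}

dependencies {
    // MediaPipe Tasks vision API: hand landmark tracking + gesture recognition
    implementation("com.google.mediapipe:tasks-vision:0.10.14")
}

// Download the gesture recognizer model into assets, following the pattern in the official samples.
tasks.register<de.undercouch.gradle.tasks.download.Download>("downloadGestureRecognizerModel") {
    src("https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/latest/gesture_recognizer.task")
    dest("$projectDir/src/main/assets/gesture_recognizer.task")
    overwrite(false)
}
```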
Integrate MediaPipe for Gesture Recognition and Positional Tracking
In general, these were the steps we needed to complete in order to get hand tracking and gesture recognition; a minimal sketch of the recognizer setup follows the list.
- Integrate MediaPipe into the project by following the official documentation. (https://developers.google.com/mediapipe)
- Configure access to the device's camera for real-time input.
- Set up MediaPipe's gesture recognition module to track the user's hand gestures and position on the screen.
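As referenced above, here is a minimal sketch of configuring MediaPipe's GestureRecognizer for live camera frames. It follows the public Tasks API; the model filename and option values are assumptions consistent with the setup described here.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.vision.core.RunningMode
import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer
import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizerResult

// Minimal GestureRecognizer setup for live-stream camera input (option values are illustrative).
fun createGestureRecognizer(
    context: Context,
    onResult: (GestureRecognizerResult) -> Unit
): GestureRecognizer {
    val baseOptions = BaseOptions.builder()
        .setModelAssetPath("gesture_recognizer.task")   // model downloaded into assets at build time
        .build()

    val options = GestureRecognizer.GestureRecognizerOptions.builder()
        .setBaseOptions(baseOptions)
        .setRunningMode(RunningMode.LIVE_STREAM)        // results are delivered asynchronously per frame
        .setResultListener { result, _ -> onResult(result) }
        .build()

    return GestureRecognizer.createFromOptions(context, options)
}
```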
Essentially, for every camera frame, we trigger a function called handleFrame(). This function runs all the hand tracking via MediaPipe and the drawing of sketches via OpenCV. Regarding MediaPipe, on the right is a snippet of how we recorded the points of the index finger when the gesture was Pointing_Up. The snippet in the bottom left shows how we parsed out the gestures and kept track of them.
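Those snippets appeared as screenshots in the original write-up. Below is a simplified, hedged reconstruction of the same logic: handleFrame() is the real entry point mentioned above, but the stroke list, field names, and the way the frame size is passed in are assumptions.

```kotlin
import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizerResult
import org.opencv.core.Point

// Simplified sketch of the per-frame handling; field and parameter names are illustrative.
private const val INDEX_FINGER_TIP = 8        // MediaPipe hand-landmark index for the index fingertip
private val currentStroke = mutableListOf<Point>()
private var currentGesture: String = "None"

fun handleFrame(result: GestureRecognizerResult, frameWidth: Int, frameHeight: Int) {
    // Parse out the top gesture category for the first detected hand, if any, and keep track of it.
    currentGesture = result.gestures()
        .firstOrNull()
        ?.firstOrNull()
        ?.categoryName() ?: "None"

    if (currentGesture == "Pointing_Up") {
        // Record the index fingertip, scaled from normalized [0, 1] coordinates to pixels.
        val tip = result.landmarks().firstOrNull()?.get(INDEX_FINGER_TIP) ?: return
        currentStroke.add(Point((tip.x() * frameWidth).toDouble(), (tip.y() * frameHeight).toDouble()))
    }
}
```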
Implement Gesture Modes
Pointing Up
- Detect when the user is pointing their finger to enter draw mode by accessing the current gesture mode from MediaPipe.
- Store the strokes generated by the user with the tip of their index finger during draw mode.
Closed Fist
- Send the recorded strokes to SketchRNN and get an autocompleted sketch back
Thumbs Down
- Clear the sketch
Thumbs Up
- Pre-process the user's sketch
- Send the sketch to the AI image generator and get AI art back (the gesture-to-action dispatch is sketched below)
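Putting the four modes together, here is a hedged sketch of how the recognized gesture category can be dispatched to the corresponding action. The category strings are MediaPipe's canned gesture names; the handler parameters are placeholders, not our actual function names.

```kotlin
// Dispatch a recognized gesture to an action; the handler lambdas are placeholders.
fun onGestureRecognized(
    gesture: String,
    requestAutocompletedSketch: () -> Unit,   // closed fist -> SketchRNN microservice
    clearCanvas: () -> Unit,                  // thumbs down -> wipe the strokes
    generateAiArt: () -> Unit                 // thumbs up -> preprocess the sketch and call ClipDrop
) {
    when (gesture) {
        "Pointing_Up" -> Unit                 // drawing itself is handled per frame in handleFrame()
        "Closed_Fist" -> requestAutocompletedSketch()
        "Thumb_Down" -> clearCanvas()
        "Thumb_Up" -> generateAiArt()
        else -> Unit                          // ignore unsupported gestures
    }
}
```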
To transform user gestures into stunning works of art, the integration of a sketch completion model, such as SketchRNN, is an essential intermediate step. In this section, we delve into the process involved in integrating SketchRNN into a dedicated backend microservice accessible from the Android app. Drawing inspiration from open-source examples, we establish the groundwork for the model's operation. To facilitate communication between the Android app and the SketchRNN microservice, we write HTTP endpoints, creating a bridge for the exchange of gesture strokes and AI-generated sketch strokes. As the strokes are transmitted from the user's gestures to the SketchRNN model, we address the data preprocessing and format conversion requirements to ensure that the model seamlessly interprets and responds to input and that the sketches are properly displayed on the Android device’s screen. Finally, we outline the process of converting these strokes between relative and absolute coordinates, ultimately culminating in the display of the generated sketches within our Android application.
SketchRNN Model Integration
- Integrate the SketchRNN model into a backend microservice. We built our microservice with Express based on the following open-source example (https://github.com/MindExMachina/smartgeometry). We forked this repo and updated it to work, since it had broken dependencies, and added logic for better parsing of the input data and handling of duplicate data points. You can find our version of the SketchRNN server in the code section at the bottom of this article.
- Convert the strokes into the appropriate format required by the SketchRNN model.
- Ensure the model can receive and process the stroke data sent by the app.
- Implement data preprocessing or format conversion to prepare the strokes for input to the model.
- Trigger the SketchRNN model to complete the sketch based on the received strokes.
HTTP Endpoints for the SketchRNN Microservice
- Create a web server for a microservice running the SketchRNN model
- Create an HTTP endpoint to send gesture stroke data to the SketchRNN microservice
- Create an HTTP endpoint to receive sketch stroke data from the SketchRNN microservice
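As a client-side illustration of these endpoints, below is a hedged sketch of posting stroke data to the microservice and reading the autocompleted strokes back. The host, path, JSON field names, and the use of OkHttp are assumptions for illustration only.

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONObject

private val httpClient = OkHttpClient()
private val jsonMediaType = "application/json; charset=utf-8".toMediaType()

// Illustrative client call; must be run off the main thread (e.g. from a coroutine on Dispatchers.IO).
fun requestAutocompletedSketch(strokesJson: JSONObject, model: String): JSONObject {
    val body = strokesJson
        .put("model", model)                    // sketch category chosen from the dropdown menu
        .toString()
        .toRequestBody(jsonMediaType)

    val request = Request.Builder()
        .url("http://10.0.2.2:3000/sketch")     // placeholder address for the Express microservice
        .post(body)
        .build()

    httpClient.newCall(request).execute().use { response ->
        // The microservice replies with the AI-generated strokes as JSON.
        return JSONObject(response.body!!.string())
    }
}
```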
Conversion of Strokes between Relative and Absolute Coordinates
- After receiving the generated strokes from the SketchRNN model, convert them between relative and absolute coordinates as needed.
- Ensure that the strokes are properly scaled to be rendered on the screen.
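To make the conversion concrete, here is a simplified sketch of turning relative offsets (the dx/dy per point that SketchRNN works with) into absolute screen coordinates with a uniform scale factor, plus the inverse for sending user strokes to the model. Whether this runs in the microservice or on the device, the arithmetic is the same; the types and scaling approach are illustrative.

```kotlin
import org.opencv.core.Point

// Convert SketchRNN-style relative offsets into absolute screen points (scale is illustrative).
fun relativeToAbsolute(
    offsets: List<Pair<Float, Float>>,   // (dx, dy) pairs, each relative to the previous point
    start: Point,                        // where the generated strokes should continue from
    scale: Double                        // factor so the sketch matches the on-screen stroke size
): List<Point> {
    val absolute = mutableListOf<Point>()
    var x = start.x
    var y = start.y
    for ((dx, dy) in offsets) {
        x += dx * scale
        y += dy * scale
        absolute.add(Point(x, y))
    }
    return absolute
}

// The inverse: convert the user's absolute points into offsets before sending them to the model.
fun absoluteToRelative(points: List<Point>): List<Pair<Float, Float>> =
    points.zipWithNext { prev, next ->
        (next.x - prev.x).toFloat() to (next.y - prev.y).toFloat()
    }
```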
Displaying Generated Sketches within the Android App
- Send the generated sketch data, as absolute coordinates, back to the app through an HTTP endpoint in your backend microservices.
- Implement the necessary logic in the app to display the completed sketch to the user.
OpenCV is an essential technology in our app's functionality, serving as a critical component for various image-processing tasks. In particular, we harness the power of OpenCV v4.8.0 to draw sketches on the screen, including user-drawn and AI-generated strokes, and to preprocess images so that we get high-quality results from SketchRNN and ClipDrop. OpenCV is also instrumental in merging the autocompleted sketch with the original strokes, ensuring a seamless integration of user input and AI-generated art. Its versatility and robustness in image processing contribute significantly to the fluid and interactive user experience that our app provides.
Here is a screenshot of how we set up the OpenCV SDK for use with our Android Studio project.
This is a screenshot of how OpenCV was used to draw the user-drawn points as well as the SketchRNN points onto the screen. We basically take the list of all points and, for each point, draw a line from the previous point to it, which forms the sketch.
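The screenshot is not reproduced here; as a simplified sketch, connecting consecutive points with OpenCV line segments looks roughly like this (the canvas Mat, color, and thickness are illustrative).

```kotlin
import org.opencv.core.Mat
import org.opencv.core.Point
import org.opencv.core.Scalar
import org.opencv.imgproc.Imgproc

// Draw a sketch by connecting each point to the previous one (color and thickness are illustrative).
fun drawStroke(canvas: Mat, points: List<Point>) {
    for (i in 1 until points.size) {
        Imgproc.line(
            canvas,
            points[i - 1],                        // previous point
            points[i],                            // current point
            Scalar(255.0, 255.0, 255.0, 255.0),   // white stroke over the camera frame
            5                                     // stroke thickness in pixels
        )
    }
}
```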
This is a screenshot of how OpenCV was used to implement the following:
- Take a drawn sketch & remove the background
- Merge SketchRNN sketch into drawn sketch
- Convert to grayscale
- Apply an absolute threshold to convert to black and white
- Invert colors (i.e. white -> black, black -> white)
This technique is used to clarify the user-drawn sketch to get better results when we pass it to the AI image generator.
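Here is a hedged sketch of the grayscale, threshold, and inversion steps with the OpenCV Android bindings; the threshold value and the RGBA input assumption are illustrative, and the background removal and merging of the SketchRNN strokes are not shown.

```kotlin
import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.imgproc.Imgproc

// Simplified version of the preprocessing described above (threshold value is illustrative).
fun prepareSketchForGeneration(sketchRgba: Mat): Mat {
    // Convert to grayscale so only stroke intensity remains.
    val gray = Mat()
    Imgproc.cvtColor(sketchRgba, gray, Imgproc.COLOR_RGBA2GRAY)

    // Absolute threshold: anything brighter than the cutoff becomes white, everything else black.
    val blackAndWhite = Mat()
    Imgproc.threshold(gray, blackAndWhite, 127.0, 255.0, Imgproc.THRESH_BINARY)

    // Invert the colors (white -> black, black -> white) for the AI image generator.
    val inverted = Mat()
    Core.bitwise_not(blackAndWhite, inverted)
    return inverted
}
```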
We used the ClipDrop API to generate AI images. This API requires an API key, a prompt, and an image in JPG format as parameters. To facilitate API requests, we created a separate utility function; an internet connection is required for it to succeed. Initially, the prompt let users manually describe what they had drawn. In the final version, however, we tied it to the sketch autocompletion feature and configured it to use the model selected from the dropdown menu as additional context. This results in an AI-generated image that better resembles the original sketch.
This is a snippet of code that shows how the sketch JPEG is sent to the ClipDrop API. The image is passed as the file parameter. The prompt includes the selected model plus some additional description to get better results from the AI image generator.
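For reference, here is a hedged sketch of such a multipart request using OkHttp. The endpoint URL and form-field names follow ClipDrop's public sketch-to-image documentation as we understand it; treat them, and the helper itself, as an illustration rather than a copy of our utility function.

```kotlin
import java.io.File
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.asRequestBody

// Illustrative ClipDrop call; must be run off the main thread, and callers should check isSuccessful.
fun generateAiImage(apiKey: String, sketchJpeg: File, prompt: String): ByteArray {
    val body = MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart(
            "sketch_file",                                   // the sketch JPEG
            sketchJpeg.name,
            sketchJpeg.asRequestBody("image/jpeg".toMediaType())
        )
        .addFormDataPart("prompt", prompt)                   // selected model plus extra description
        .build()

    val request = Request.Builder()
        .url("https://clipdrop-api.co/sketch-to-image/v1/sketch-to-image")
        .header("x-api-key", apiKey)                         // ClipDrop API key
        .post(body)
        .build()

    OkHttpClient().newCall(request).execute().use { response ->
        // The response body is the generated image.
        return response.body!!.bytes()
    }
}
```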
Once the API request completes successfully, the image file is stored in the phone’s internal storage so that the user can view their image in the gallery.
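A minimal sketch of that last step, assuming the generated image arrives as raw bytes; the filename scheme and use of app-internal storage are illustrative.

```kotlin
import android.content.Context
import java.io.File

// Write the generated image into app-internal storage so the gallery screen can load it later.
fun saveGeneratedImage(context: Context, imageBytes: ByteArray): File {
    val file = File(context.filesDir, "art_${System.currentTimeMillis()}.jpg")  // illustrative name
    file.writeBytes(imageBytes)
    return file
}
```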
Final Product
Setup Instructions
This video provides setup instructions on how to try the application out for yourself!
Final Demo
This video showcases how the final project works.
Ideas and technologies that weren’t used in the final project
In the initial design phase, our plan involved integrating a Hugging Face ML model with image-to-text functionality. The idea was to take the user-drawn sketch and generate a text prediction from it (e.g., an image of a hamburger produces the text "hamburger"). This text would then be passed to a third-party service to generate the AI art. However, we decided to omit these components from the final version because the results would not closely resemble the user's original sketch. To streamline the process, we opted to use an external API, the ClipDrop API, which generates images directly from the submitted sketch. Given the seamless functionality of our current implementation, there is no longer a need to leverage the Hugging Face ML model and image-to-text mechanisms for AI image generation.
We also excluded iSketchNFill. We had assumed that iSketchNFill would support autocompletion of the user's sketch. Unfortunately, it does not perform sketch autocompletion; instead, it morphs a preset image to the boundaries of a drawn sketch, which was not the solution we were looking for. iSketchNFill also does not generate multiple autocompleted sketches, and the program feels more like editing preset sketches than autocompleting them, which is why it was not used.
Conclusion
In summary, we aimed to merge the worlds of technology and art through this project. By utilizing gesture recognition, we enable our users to create artwork through computer vision. In addition, we believe that our sketch autocompletion functionality streamlines the process of art creation and improves the user experience. Consequently, we conclude that our application makes AI art generation more accessible to everyone.