The keyboard and mouse are good input devices for certain interactions with the GUI of our computer, but not for all of them.
Hands are the de facto human tool for interacting with the environment.
The idea is to let people use their hands as an input device to interact with the GUI of their computer in a natural way.
This provides a more effective input device for certain tasks or actions in a program (for instance, changing the orientation of an object in a 3D computer graphics program).
It may also be a more accessible input device for certain people, like the elderly: a computer is less intimidating when its usage is more intuitive.
See-through effect
The idea of this project is to put the PC monitor between the user and their hands, and to display on the monitor a virtual 3D environment that reproduces the hands of the user.
This is similar to what can be seen with a VR headset, but without the discomfort of having to wear one, and with the arms resting on the desk in our natural day-to-day working-at-desk pose.
The monitor is simply placed in a more horizontal orientation, with our hands behind it. The monitor displays the hands, as well as the real objects (dice, sheet of paper, ...) placed behind it.
By manipulating these objects, the user triggers actions in the GUI of the program running on the computer.
The idea is to create a see-through effect on the screen, as if the monitor were transparent, but augmented with information from the program, so we can build a mixed reality GUI for the running program.
How is it supposed to work?
We have a traditional PC or laptop setup with a monitor.
A webcam points at the user to track their face and irises.
This is used to precisely detect the user's point of view, so the scene displayed on the monitor can be altered as the user moves their head, preserving the see-through illusion.
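As an illustration of this head-coupled perspective, here is a minimal sketch (not the project's actual code) that maps a normalized eye position to a virtual camera offset; the scale factors are hypothetical and would need calibration against the real monitor geometry:

# Minimal sketch: map a normalized eye position (from the face/iris
# tracker) to a virtual camera offset, so that the rendered scene
# shifts with the user's head. The scale factors are placeholders
# that would need calibration for a real monitor.
def eye_to_camera_offset(eye_x, eye_y, scale_x=0.30, scale_y=0.20):
    """eye_x, eye_y are in [0, 1] image coordinates (0.5 = centered)."""
    dx = (eye_x - 0.5) * scale_x
    dy = (0.5 - eye_y) * scale_y  # image y grows downward
    return dx, dy

# Example: eyes slightly left of and above the image center.
print(eye_to_camera_offset(0.4, 0.45))  # ~(-0.03, 0.01)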
Another webcam records what happens behind the monitor.
It is used to track the user's hands and the objects that have been placed there (dice, sheet of paper).
These elements are displayed on the monitor at their real relative positions, so the see-through effect is not broken and the user can easily grab and manipulate the objects.
What is displayed, however, are not the real objects, but 3D models that match the interactions these objects stand for in the running program.
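As a sketch of how the rear webcam feed could be processed, here is a minimal example using the legacy Mediapipe Hands solution with OpenCV; device index 1 is an assumption (index 0 being the user-facing webcam):

# Minimal sketch: track hands on the rear webcam with the legacy
# Mediapipe Hands solution. Device index 1 is an assumption.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(1)
hands = mp.solutions.hands.Hands(max_num_hands=2)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Mediapipe expects RGB; OpenCV delivers BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            wrist = hand.landmark[0]  # normalized [0, 1] coordinates
            print(f"wrist at x={wrist.x:.2f}, y={wrist.y:.2f}")
    cv2.imshow("behind the monitor", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
        break

cap.release()
hands.close()
cv2.destroyAllWindows()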
All of this requires a good amount of processing power to accelerate the multiple real-time tracking tasks involved in the proposed solution:
- face and iris tracking,
- hands tracking,
- object tracking.
The acceleration provided by the AMD Ryzen XDNA seems to be a good candidate to:
- obtain the best accuracy,
- achieve a sufficiently high framerate to post-process the key points and reduce pose estimation jitter (see the smoothing sketch after this list),
- and keep latency low enough not to break the see-through effect.
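Here is a sketch of the kind of key-point post-processing meant above: a simple per-landmark exponential moving average (the actual filter choice, e.g. a One Euro filter, is left open, and the alpha value is a placeholder):

# Minimal sketch of key-point post-processing: a per-landmark
# exponential moving average that reduces pose estimation jitter.
# A higher framerate allows lighter smoothing (less added latency)
# for the same stability.
class LandmarkSmoother:
    def __init__(self, alpha=0.4):
        self.alpha = alpha  # 1.0 = no smoothing (raw input)
        self.state = None   # last smoothed landmarks

    def update(self, landmarks):
        """landmarks: list of (x, y, z) tuples for the current frame."""
        if self.state is None:
            self.state = list(landmarks)
        else:
            self.state = [
                tuple(self.alpha * n + (1 - self.alpha) * s
                      for n, s in zip(new, old))
                for new, old in zip(landmarks, self.state)
            ]
        return self.state

# Example: the second, noisy reading is pulled toward the first.
smoother = LandmarkSmoother(alpha=0.4)
print(smoother.update([(0.50, 0.50, 0.0)]))  # [(0.5, 0.5, 0.0)]
print(smoother.update([(0.60, 0.50, 0.0)]))  # ~[(0.54, 0.5, 0.0)]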
1) Install Linux on the UM790 mini PC
The first step is to install Linux on the UM790 mini PC. The most up-to-date step-by-step tutorial I have found is here:
https://github.com/Xilinx/mlir-aie/blob/main/docs/buildHostLin.md
This tutorial will also introduce you to MLIR-AIE, a low-level AMD tool that allows you to take advantage of the XDNA.
2) Install UPBGE: Blender Game Engine
UPBGE is an open-source 3D game engine forked from the old Blender Game Engine and deployed with Blender itself. This unified workflow is its main strength, as you can make your game from start to finish without leaving Blender.
This is a specific version of Blender that includes a game engine. This game engine allows a scene to be rendered with interactive control, driven by Python scripts.
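For a feel of how such scripts look, here is a minimal sketch of a UPBGE Python controller in module mode; it is hypothetical example code, not taken from the ThrooD project:

# Minimal sketch of a UPBGE Python controller (module mode): at each
# logic tick it nudges the active camera of the current scene along X.
import bge

def update(cont):
    scene = bge.logic.getCurrentScene()
    cam = scene.active_camera
    cam.worldPosition.x += 0.001  # small shift at every tick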
To install UPBGE, download the archive, then uncompress it.
$ wget https://github.com/UPBGE/upbge/releases/download/v0.36.1/upbge-0.36.1-linux-x86_64.tar.xz
$ tar Jxvf upbge-0.36.1-linux-x86_64.tar.xz
You can now launch UPBGE this way:
$ upbge-0.36.1-linux-x86_64/blender
3) Install Mediapipe
This project uses Mediapipe for face and iris tracking:
- https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker
- https://research.google/blog/mediapipe-iris-real-time-iris-tracking-depth-estimation/
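For reference, here is a minimal sketch of getting iris landmarks with the legacy FaceMesh solution; with refine_landmarks=True, 10 iris landmarks (indices 468-477) are appended to the 468 face landmarks, and device index 0 is an assumed value for the user-facing webcam:

# Minimal sketch: face and iris tracking on the user-facing webcam
# with the legacy Mediapipe FaceMesh solution.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # assumed index of the user-facing webcam
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

ok, frame = cap.read()
if ok:
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        iris_a, iris_b = lm[468], lm[473]  # the two iris centers
        print(f"iris centers at x={iris_a.x:.2f} and x={iris_b.x:.2f}")

cap.release()
face_mesh.close()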
The project also uses Mediapipe for hand tracking.
The hand tracking itself works, but the rendering of the corresponding 3D hand meshes in the scene is not yet reliable enough, so I have not included it in the project.
You need to install Mediapipe so that it is available to the Python interpreter bundled with UPBGE.
This can be done in the following way:
$ cd upbge-0.36.1-linux-x86_64
$ 3.6/python/bin/python3.10 -m ensurepip
$ 3.6/python/bin/python3.10 -m pip install --upgrade pip
$ 3.6/python/bin/python3.10 -m pip install mediapipe
$ 3.6/python/bin/python3.10 -m pip install opencv-python
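You can then check that the install succeeded (a quick sanity check, not part of the original instructions):
$ 3.6/python/bin/python3.10 -c "import mediapipe, cv2; print(mediapipe.__version__, cv2.__version__)"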
4) Get the ThrooD project files
You can get the latest version of the files for this project from the following GitHub repository:
$ git clone https://github.com/BrunoJJE/throod.git
5) Run the ThrooD Blender project
Launch UPBGE.
Open the "eyes.blend" project file.
Ensure you have a webcam attached to your monitor and facing you.
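If the scene does not react, you can first check that OpenCV sees the webcam from UPBGE's bundled Python; the device index 0 below is an assumption:

# Quick webcam sanity check (hypothetical helper, not part of the
# project): try to grab one frame from the device at index 0.
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    print("webcam OK, frame size:", frame.shape)
else:
    print("no frame grabbed: check the device index or the connection")
cap.release()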
In the Render part of the Blender interface, launch the standalone rendering (in full screen).
This will display a scene made of cubes, with a viewpoint that changes according to the position of your eyes. The project also tracks your hands, but this is not yet used in the rendering (hand tracking is meant to use a second webcam, different from the one facing the user's head).
Use the ESC key to quit.
That is all that is available for now.
As the XDNA of the Ryzen AI CPU is not used yet, performance is poor.
Conclusion
The project was a bit too ambitious for the time frame of the AMD Pervasive AI Developer Contest.
My decision to use Linux as the operating system on the AMD Ryzen 9 PC for this development has not helped.
The Ryzen AI SW is indeed not available on Linux yet.
The other AMD tool available to use the XDNA, Riallto, is not yet compatible on Linux with the ONNX inference acceleration required by this project:
https://riallto.ai/notebooks/5_1_pytorch_onnx_inference.html
This is not currently supported on the Linux release of Riallto.
But the ThrooD idea seems to have great potential, and I plan to continue the development as soon as the AMD tools required to use the XDNA of the Ryzen AI CPU become available on Linux.