The project addresses the problem of tracking people on escalators in retail spaces. Computer vision is seeing a boom of algorithms, models, and solutions for people detection. However, while most options cover front/side-facing angles, and some cover top/overhead angles, there is very little data for tracking people on escalators in a retail environment. While the environment offers favorable conditions (such as indoor areas with controlled lighting in most situations), there are also challenges: featureless white walls in the vicinity, very restrictive control over the placement and size of camera equipment, a perspective distorted by steep escalator slopes that amplifies occlusions, no control over what people wear (e.g. highly reflective or highly light-absorbent clothing), LED displays showing advertisements or interactive content, etc.
The proposed solution builds a system around the Jetson AGX Orin Developer Kit for its impressive 275 TOPS and 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores, which allow not only fast deployment at the edge but also the potential to gather datasets and process/train on the device, on site, in a small form factor with very little infrastructure: a standalone, all-in-one AI solution. Additionally, the ZED camera contributes features such as active stereo (addressing featureless areas) and depth sensing at large distances to provide data for the Orin.

It is also an opportunity to survey SOTA object detection models (e.g. YOLOv8 and ViTs such as EfficientViT and NanoOWL) as well as generative AI models (such as Stable Diffusion) to drive beautiful AI-generated graphics. As opposed to a prototype, the solution has the chance to be part of a live project, as the input of a large (10x10m) generative art installation in a highly visible retail space on the famous Regent St (Oxford Circus area) in Central London. The object detection model can also be used in other industries, for example safety in areas where top-down views are helpful (e.g. cranes on building sites, rescue helicopters, etc.).
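To illustrate the capture side, here is a minimal sketch of grabbing an XYZRGBA point cloud from the ZED SDK's Python bindings (pyzed); the depth mode and coordinate units here are assumptions for illustration, not necessarily the project's settings:

```python
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.depth_mode = sl.DEPTH_MODE.ULTRA  # assumed; pick per SDK version
init_params.coordinate_units = sl.UNIT.METER

if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Failed to open ZED camera")

point_cloud = sl.Mat()
runtime = sl.RuntimeParameters()
if zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
    # Per-pixel 3D coordinates plus packed color, as an HxWx4 float array
    zed.retrieve_measure(point_cloud, sl.MEASURE.XYZRGBA)
    xyz = point_cloud.get_data()[:, :, :3]  # 3D points feeding the Orin pipeline
zed.close()
```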
In summary, the steps to solve this would include:
- Prototype and test the performance of a point cloud solution using traditional computer vision techniques (e.g. DBSCAN clustering; see the first sketch after this list)
- Train and tune models for overhead person detection (YOLOv8, ViTs, etc.), taking into account changing scales, occlusions, etc. (see the fine-tuning sketch after this list)
- Document prototyping results
- Fine-tune the winning strategy
- Document the final solution
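To make the first step concrete, here is a minimal sketch of point cloud clustering using Open3D's built-in DBSCAN. The file name and the `eps` / `min_points` values are illustrative assumptions, not the project's actual parameters, and would need tuning to the escalator scene:

```python
import numpy as np
import open3d as o3d

# Hypothetical saved frame: an Nx3 array of 3D points (e.g. exported from
# the ZED depth measure shown earlier).
points = np.load("escalator_frame.npy")

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd = pcd.voxel_down_sample(voxel_size=0.05)  # thin out the cloud first

# DBSCAN groups nearby points into clusters; label -1 marks noise.
labels = np.array(pcd.cluster_dbscan(eps=0.3, min_points=30))

for label in range(labels.max() + 1):
    cluster = np.asarray(pcd.points)[labels == label]
    centroid = cluster.mean(axis=0)
    print(f"candidate person {label}: {len(cluster)} points at {centroid}")
```

Each sufficiently dense cluster becomes a candidate person; tracking then reduces to associating cluster centroids across frames.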
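For the second step, a minimal fine-tuning sketch using the ultralytics package; `overhead_people.yaml` is a hypothetical dataset config, and the hyperparameters are placeholders rather than the values used for the installation:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # start from a pretrained checkpoint

# Augmentations chosen for overhead views: scale jitter for the changing
# apparent size of people along the escalator slope, and full rotation
# because a top-down view has no canonical "up" direction.
model.train(
    data="overhead_people.yaml",  # hypothetical: image paths + class names
    epochs=100,
    imgsz=640,
    scale=0.5,
    degrees=180.0,
)

metrics = model.val()  # evaluate on the validation split
```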
Follow the rest of the story on GitHub, where precompiled, Jetson CUDA-accelerated Open3D / Torch / TorchVision wheels, as well as a pretrained YOLOv8 model and dataset, are available. For more info on the in-situ installed project, check out the Hirsch & Mann x variable.io making-of video:
It includes short screencasts of older NVIDIA RTX PC software, while the GitHub repo includes Jetson-specific code.
Full commercial project credits:
Design:
Hirsch & Mann, London, United Kingdom
Variable.io, London, United Kingdom
Project Team:
Hirsch & Mann: Daniel Hirschmann (Experiential & Technical Director), Joanne Harik (Creative Director), Adam Ray Braun (System Architect), Martin White (Project Manager), George Profenza (Computer Vision Specialist)
Variable.io: Marcin Ignac (Generative System Director), Damien Seguin (Technical Lead)