Artificial vision system for tracking, locating and analyzing customers within an establishment applied to the retail industry
Problem to Solve
In the retail industry, considerable challenges persist around the effective distribution of products and a full understanding of how customers interact with store spaces. These challenges not only hurt companies' economic results, but also degrade the quality of the customer's shopping experience.
Proposed Solution
To address these challenges, we propose the development of an artificial vision system. The system leverages security cameras already installed in retail establishments to track customers accurately, locate them, and analyze their behavior patterns in depth. From this data, detailed heat maps are generated that highlight the most trafficked areas of the store, yielding a full picture of customer preferences and movements. This information will be used to optimize product placement and marketing strategies, including improving planograms. Ultimately, the solution will improve the shopping experience for customers and increase the profitability of retail companies.
Below we propose an architecture for the artificial vision system that addresses the problem posed. The project will use libraries such as OpenCV, TensorRT, PyTorch and TensorFlow. The code will be written in Python and C++, depending on the performance required.
The different submodules of single camera tracking will be containerized in Docker and orchestrated with Helm:
1. Single camera tracking:
This module focuses on the acquisition of visual data from a single camera located in the establishment. It covers video capture, detection of people in the video, tracking their movement over time, extraction of appearance-based features (omitting people's biometric information), and camera calibration to translate pixel coordinates into physical real-world coordinates. In summary, this module collects data from a single camera and prepares it for analysis.
1.1. Video capture:
In this step, video sequences are acquired from a single camera installed in the establishment, typically an existing security camera. The captured frames feed the subsequent processing stages that detect and track people in the store environment.
1.2. Pedestrian detection:
Using computer vision algorithms, people are identified and located in each video frame, producing a bounding box and confidence score for every detected person.
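A standard post-processing step for any such detector is non-maximum suppression (NMS), which discards duplicate boxes covering the same person. The sketch below is illustrative (function name and threshold are our own choices, not fixed by this design):

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, suppressing overlaps above the IoU threshold.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) detector confidences.
    Returns the indices of the boxes to keep.
    """
    order = np.argsort(scores)[::-1]      # process boxes from most to least confident
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]   # drop boxes that overlap too much
    return keep
```

The same filtering applies whether the underlying detector is a classical HOG model or a modern neural network.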
1.3. Pedestrian tracking:
At this stage, each detected person is assigned an identity and their position is followed through successive video frames while they remain within the camera's field of view.
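A minimal way to realise this is a greedy IoU tracker: each new detection is matched to the existing track whose last box overlaps it most. The class below is a simplified sketch (names and thresholds are assumptions, and production trackers typically add motion models and track expiry):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

class IoUTracker:
    """Greedy frame-to-frame tracker: match each detection to the
    existing track with the highest box overlap, or open a new track."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}          # track id -> last seen box
        self.next_id = 0

    def update(self, detections):
        assigned = {}
        unmatched = dict(self.tracks)
        for det in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid, box in unmatched.items():
                overlap = iou(det, box)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:               # no good match: start a new track
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]        # each track matches one detection
            self.tracks[best_id] = det
            assigned[best_id] = det
        return assigned
```

Calling `update` once per frame keeps identities stable while a person moves through the camera's field of view.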
1.4. Feature extraction:
At this stage, relevant attributes of detected people are extracted, such as body landmarks and appearance features, for further analysis and for global tracking across cameras.
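As a deliberately non-biometric example of an appearance feature, a coarse colour histogram of the person crop captures clothing colour without encoding identity. This is a simplified stand-in for the learned embeddings a real system would use:

```python
import numpy as np

def appearance_descriptor(crop, bins=8):
    """Per-channel colour histogram of a person crop, L2-normalised.

    Only coarse colour statistics are kept, so no biometric detail is stored.
    crop: (H, W, 3) uint8 image region containing the detected person.
    Returns a (3 * bins,) feature vector.
    """
    hists = [np.histogram(crop[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]                      # one histogram per colour channel
    feat = np.concatenate(hists).astype(float)
    return feat / (np.linalg.norm(feat) + 1e-12)     # unit length for cosine comparison
```

Normalising to unit length makes the descriptors directly comparable with cosine similarity in the multi-camera module.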
1.5. Camera calibration:
Camera calibration determines the intrinsic and extrinsic parameters of the camera, such as focal length and position in 3D space, which are needed to convert pixel coordinates into real-world coordinates.
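The role of these parameters can be seen in the pinhole camera model, where the intrinsic matrix K and the extrinsics (R, t) together map a 3-D world point to a pixel. A minimal sketch (the example values are invented for illustration; in practice K, R and t would come from a calibration routine such as OpenCV's):

```python
import numpy as np

def project_point(K, R, t, point_3d):
    """Project a 3-D world point to pixel coordinates with a pinhole model.

    K: 3x3 intrinsic matrix (focal lengths, principal point).
    R, t: rotation (3x3) and translation (3,) from world to camera frame.
    """
    p_cam = R @ np.asarray(point_3d) + t       # world -> camera coordinates
    uvw = K @ p_cam                            # camera coordinates -> image plane
    return uvw[:2] / uvw[2]                    # perspective division

# Example: camera at the world origin looking down the z-axis
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)
pixel = project_point(K, R, t, [0.1, 0.0, 2.0])   # point 2 m in front of the camera
```

Calibration is the inverse problem: estimating K, R and t from images of a known pattern so that this projection matches reality.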
1.6. Pixel to physical coordinates:
Combining the camera calibration with each person's pixel location yields physical coordinates in the real world, allowing distances and locations to be measured accurately.
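Since people stand on the floor, this conversion reduces to a planar homography between the image and the floor plane. A sketch under that assumption (the example matrix is invented; a real H would be estimated from reference points on the floor, e.g. with OpenCV's homography estimation):

```python
import numpy as np

def pixel_to_floor(H, pixel):
    """Map a pixel (u, v) to floor-plane coordinates via a 3x3 homography H."""
    u, v = pixel
    x, y, w = H @ np.array([u, v, 1.0])   # homogeneous coordinates
    return np.array([x / w, y / w])       # divide out the projective scale

# Example: a toy homography that simply scales pixels to metres (1 px = 1 cm)
H = np.array([[0.01, 0.0, 0.0],
              [0.0, 0.01, 0.0],
              [0.0, 0.0, 1.0]])
floor_xy = pixel_to_floor(H, (320, 240))   # -> [3.2, 2.4] metres
```

Once every camera reports positions in a shared floor-plane frame, distances between people and shelves can be measured directly in metres.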
2. Multi camera tracking:
The second module handles the coordination, re-identification and tracking of people as they move between multiple cameras within the establishment. It compares the appearance features extracted for the same person by different cameras, together with their spatio-temporal relationship, to assign a global ID to each person.
2.1. Feature similarity:
The similarity between features of people detected in different cameras is computed using feature-comparison techniques such as visual descriptors.
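A common choice for comparing such descriptors is cosine similarity, which scores the angle between two feature vectors regardless of their magnitude. A minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors: 1 means identical
    direction, 0 means orthogonal (no appearance overlap)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Scores close to 1 indicate that two detections from different cameras likely show the same person.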
2.2. Spatio-temporal association:
Detections of people in different cameras are associated by considering their location in space and their evolution over time, so that a person can be reliably identified across multiple camera views.
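One simple form of this association is a spatio-temporal gate: two detections can only belong to the same person if the implied walking speed between them is physically plausible. The function below is a sketch; the 2 m/s speed cap is an assumed parameter, not part of this design:

```python
import math

def plausible_transition(exit_pos, entry_pos, dt, max_speed=2.0):
    """Spatio-temporal gate: could one person have walked from the exit
    point of one camera view to the entry point of another in dt seconds?

    Positions are floor-plane coordinates in metres (from module 1.6);
    max_speed is an assumed walking-speed cap in m/s.
    """
    if dt <= 0:
        return False                      # cannot appear before leaving
    distance = math.dist(exit_pos, entry_pos)
    return distance / dt <= max_speed
```

Appearance similarity is then only computed for pairs that pass this gate, which cuts both false matches and computation.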
2.3. Pedestrian re-identification:
When a person moves from one camera to another, re-identification algorithms ensure that the same person is correctly recognized in both views. This relies on the extracted appearance features and dedicated re-identification models.
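In its simplest form, re-identification is a nearest-neighbour search over a gallery of known global IDs, with a threshold below which the person is treated as new. A sketch with assumed names and an assumed threshold:

```python
import numpy as np

def reidentify(query_feat, gallery, threshold=0.8):
    """Match a query appearance feature against a gallery of known people.

    gallery: dict mapping global id -> stored feature vector.
    Returns the best-matching global id, or None if no cosine similarity
    exceeds the threshold (i.e. the person is treated as new).
    """
    best_id, best_sim = None, threshold
    q = np.asarray(query_feat, dtype=float)
    q = q / (np.linalg.norm(q) + 1e-12)
    for gid, feat in gallery.items():
        f = np.asarray(feat, dtype=float)
        f = f / (np.linalg.norm(f) + 1e-12)
        sim = float(q @ f)                 # cosine similarity of unit vectors
        if sim > best_sim:
            best_id, best_sim = gid, sim
    return best_id
```

A production system would use features from a trained re-identification model rather than raw descriptors, but the matching logic stays the same.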
3. Behaviour analytics:
The third module analyzes the data collected from all cameras and the preceding tracking stages. Its main objective is to generate a "heat map" that visualizes the areas of highest activity within the store. This is achieved by analyzing customer behavior patterns, such as their movements and location preferences within the establishment. The result is a graphical representation that highlights the most trafficked areas, providing valuable information to improve product placement and the customer experience.
3.1. Store heatmap:
Heat maps are generated from the information collected by all cameras and the people-tracking stages. They visually represent the most trafficked areas of the store, indicating the density of people at different locations and times, and allow a detailed understanding of customer behavior patterns within the establishment.
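At its core this is an occupancy grid: tracked floor positions are binned into cells and the counts are normalised for colour rendering. A minimal sketch (cell size and function names are our own choices):

```python
import numpy as np

def build_heatmap(positions, store_width, store_depth, cell_size=0.5):
    """Accumulate tracked floor positions into an occupancy grid.

    positions: iterable of (x, y) floor coordinates in metres;
    store_width/store_depth: store dimensions in metres.
    Each cell counts how many samples fell inside it; normalising by the
    maximum yields a 0-1 heat map ready for colour rendering.
    """
    nx = int(np.ceil(store_width / cell_size))
    ny = int(np.ceil(store_depth / cell_size))
    grid = np.zeros((ny, nx))
    for x, y in positions:
        i = min(int(y / cell_size), ny - 1)   # clamp points on the far edge
        j = min(int(x / cell_size), nx - 1)
        grid[i, j] += 1
    return grid / grid.max() if grid.max() > 0 else grid
```

Rendering the grid with a colour map (e.g. via matplotlib) then gives the visual heat map described above, with hot cells marking the store's most trafficked areas.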