This project proposes a video analytics pipeline for real-time fare evasion detection.
Introduction
Fare evasion is the act of deliberately not paying for the use of public transportation. It is a common issue in major urban centers such as New York and Toronto. In recent years, fare evasion has become a growing problem for the New York City Transit Authority, resulting in a massive loss of revenue. According to a 2018 report:
- Estimated revenue lost to fare evasion in 2018 is $215 M
- $96 M in subways and $119 M in buses
- The estimated uncollected revenue is an increase of $110 M over 2015
A similar report from the Toronto Transit Commission found that it lost about $61 million to fare evasion in 2018 alone [here].
Major challenges:
- Data Collection
- Enforcement
- Racial socio-economic bias
The current process of data collection for estimating fare evasion is far from perfect and relies on human-generated data and sampling techniques.
MTA fare evasion data collection (source). There have been reports that survey teams are not able to complete the data collection in a satisfactory fashion. [Source]
An automated system that collects spatiotemporal fare evasion data (per station, over time) in real time would have several benefits:
- Fare evasion data will have a precise time granularity as opposed to sampling.
- Fare evasion data will have fine spatial granularity (station entrance level) unlike system-wide estimates by current method.
- Accurate estimates will enable MTA to do better cost benefit analysis and decide upon their next steps in a data driven fashion.
- Spatio-temporal nature of data would enable efficient staff scheduling to prevent fare evasion.
- It will cut the cost of manual data collection.
Intervention policies so far have mostly focused on physically stopping fare evasion by deploying police officers to stations, where they can arrest or issue court summonses to offenders they catch red-handed. Given the scale of public transit in New York, deploying officers to even a small fraction of stations can be costly. Some analysts think that the expenditure on these interventions outweighs the projected monetary gains [here].
If an automated real time fare evasion detection system is deployed, an improved enforcement strategy can build on top of it. Examples include:
- Realtime messages/pictures to cops in vicinity
- Displaying violators on screens in real-time for deterrence (as done in Toronto)
Some accounts indicate that fare evaders are more likely to be criminals, which is used to support strict enforcement against fare evasion. According to William Bratton (former New York City Police Commissioner), when fare evaders were stopped in 2014, one out of every seven people was wanted on a warrant and one out of every 21 was carrying a weapon [here]. On the other hand, the police crackdown on fare evasion has been criticized as criminalizing poverty [Here]. Data shows that most fare evasion occurs in poor neighborhoods. Furthermore, people of color make up the overwhelming majority of those arrested for the offense [here]. Socio-economic, racial and demographic correlations in fare evasion make it a very sensitive problem, and any automated solution should be audited for algorithmic bias, because systems that learn from biased data can exhibit these biases. I will discuss some tips to prevent your system from capturing these biases.
Computer Vision Problem Framing
About half of subway fare evasion consists of jumping over or ducking under the turnstile. We will build a video analytics system to identify fare evasion of this form. This system should:
- Detect people in each frame
- Track people across frames
- Classify the action of each person in each frame
- Perform action recognition across frames
If a person walks up to a turnstile and jumps over it, we classify this as fare evasion. If the person walks through it, we classify them as a normal passenger.
Algorithm
In an ideal setting, with access to a large volume of labelled video sequences, I would design an end-to-end deep learning pipeline with a CNN backbone and an LSTM head for temporal modelling and action recognition. But I had no access to such data, so I broke the pipeline down into separate tasks and solved each independently.
Person Detection: This is usually done with object detection, and OpenVINO conveniently provides a wide array of pre-trained, optimized object detection models that can be used off the shelf with good accuracy and low inference time. I used a pre-trained MobileNet SSD detector from the OpenVINO model zoo.
Person Pose Estimation: Pose estimation returns a set of key points that define the posture of a person. We use these key points as features to classify the posture of the person as standing or walking. Once again, I found an off-the-shelf pre-trained model for pose estimation in the OpenVINO model zoo.
Pose Classification: Each person is classified as jumping or walking based on the pose features from the previous stage. In pose keypoint space these two states are easily separable. After training a random forest model on a very small dataset built from Google Images, I was able to get near-perfect accuracy on the validation set. scikit-learn was used for this. Example training images are shown below.
Using pose key points rather than the raw image as classifier features has an additional benefit: the classifier has no access to raw pixels and therefore cannot learn spurious correlations with race, gender and similar attributes. This keypoint bottleneck ensures that only pose information is used for fare evasion detection, so there is less danger of the model exhibiting such biases.
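The pose classification step can be sketched as follows. This is a minimal illustration, not the project's actual training code: the feature vectors are synthetic placeholders standing in for flattened keypoint coordinates, the cluster positions are arbitrary, and the helper name `classify_pose` is hypothetical. Only the use of a random forest from scikit-learn comes from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

N_PER_CLASS = 100
N_FEATURES = 4  # placeholder size; real keypoint vectors would be longer

# Synthetic, well-separated clusters standing in for keypoint features.
# These are NOT real pose data; they only make the sketch runnable.
walk_features = rng.uniform(-1.0, 1.0, size=(N_PER_CLASS, N_FEATURES))
jump_features = rng.uniform(4.0, 6.0, size=(N_PER_CLASS, N_FEATURES))

X = np.vstack([walk_features, jump_features])
y = np.array([0] * N_PER_CLASS + [1] * N_PER_CLASS)  # 0 = walk, 1 = jump

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)


def classify_pose(features):
    """Return 'jump' or 'walk' for one keypoint feature vector."""
    return "jump" if clf.predict([features])[0] == 1 else "walk"
```

Because the classifier only ever sees the numeric feature vector, nothing about the person's appearance reaches it, which is the keypoint bottleneck property discussed above.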
Person Tracker: The components above give predictions at the frame level. We need to associate people across frames so that we can track their actions through the video and detect fare evasion. I used a person tracker that matches detections between frames using greedy assignment with an IOU cost, similar to iou_tracker. It assigns an ID to each person that is retained across frames.
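A greedy IOU tracker of the kind described here might look like the sketch below. The class name `GreedyIouTracker`, the `(x1, y1, x2, y2)` box format and the 0.3 threshold are assumptions for illustration, and unmatched tracks are simply dropped after one frame, which a production tracker would likely relax.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    def __init__(self, iou_threshold=0.3):  # threshold value is an assumption
        self.iou_threshold = iou_threshold
        self.tracks = {}       # track ID -> last known box
        self._ids = count(1)   # generator of fresh track IDs

    def update(self, detections):
        """Match this frame's boxes to tracks; return [(track_id, box), ...]."""
        # Score every (track, detection) pair and visit best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned = {}          # detection index -> track ID
        used_tracks = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break          # remaining pairs overlap even less
            if tid in used_tracks or j in assigned:
                continue       # greedy: each track and detection used once
            assigned[j] = tid
            used_tracks.add(tid)

        results, new_tracks = [], {}
        for j, det in enumerate(detections):
            tid = assigned.get(j)
            if tid is None:
                tid = next(self._ids)  # unmatched detection starts a new track
            new_tracks[tid] = det
            results.append((tid, det))
        self.tracks = new_tracks       # unmatched tracks are dropped
        return results
```

Calling `update` once per frame with the detector's person boxes yields IDs that stay stable as long as consecutive boxes of the same person overlap sufficiently.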
State machine: This is where the fare evasion detection logic is coded. Fare evasion is defined by the sequence walk -> jump -> walk. The state machine is also used for temporal smoothing of states by imposing a transition threshold: the classifier must predict the same state for n consecutive frames before a transition is accepted. This decreases false positives by filtering out random errors.
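The walk -> jump -> walk rule with an n-frame transition threshold can be sketched as a small per-person state machine. The class name `FareEvasionStateMachine`, the label strings and the default `n_frames=3` are assumptions made for this example; one instance would be kept per track ID.

```python
class FareEvasionStateMachine:
    """Flags the confirmed state sequence walk -> jump -> walk."""

    SEQUENCE = ("walk", "jump", "walk")

    def __init__(self, n_frames=3):  # threshold value is an assumption
        self.n_frames = n_frames
        self.state = None      # last confirmed state
        self.candidate = None  # label currently being counted
        self.streak = 0        # consecutive frames with the candidate label
        self.history = []      # confirmed states, in order
        self.evasion = False

    def update(self, label):
        """Feed one per-frame pose label; return True once evasion is seen."""
        if label == self.candidate:
            self.streak += 1
        else:
            self.candidate, self.streak = label, 1

        # Accept a transition only after n_frames identical predictions.
        if self.streak >= self.n_frames and label != self.state:
            self.state = label
            self.history.append(label)
            if tuple(self.history[-3:]) == self.SEQUENCE:
                self.evasion = True
        return self.evasion
```

With `n_frames=3`, a single spurious "jump" prediction in the middle of a walk never becomes a confirmed state, so it cannot trigger a detection.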
Coarse Cost Benefit Estimation
The system currently runs at 2 fps (frames per second) on the NCS2 (Neural Compute Stick 2). Given how little a person can move in a short time interval, I believe that running the system at about 4 fps should be enough for accurate object tracking and for capturing fare evasions. Running the system at up to 5 fps on the NCS2 seems like an easy engineering ask, which could be achieved through experiments such as:
- Designing smaller networks (smaller backbones such as SqueezeNet)
- Using a single model for detection and pose estimation (bottom-up approach)
- Further quantizing the model
Rough estimates for installation of this solution are shown below. Note that these estimates ignore the software cost associated with the product (edge software, server side and front end) because it depends on many design factors. These rough estimates are only meant to emphasize the scale of the investment versus the possible benefits.
Assume that 5 systems are deployed at each of the 472 subway stations in New York.
Assume also that an intervention policy designed on top of this system is able to reduce subway fare evasion by 10%. With $96 M lost to subway fare evasion annually, this solution would save 0.1 x 96 / 12 = $0.8 M per month. This would give a full return on investment within the first month and save $9.6 M annually.
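The arithmetic above can be reproduced with a few lines of Python. The constants are the figures quoted in this section; the function names are hypothetical, and hardware cost per unit is deliberately left out because it is not stated here.

```python
STATIONS = 472                  # subway stations in New York
UNITS_PER_STATION = 5           # systems deployed per station
ANNUAL_SUBWAY_LOSS_MUSD = 96.0  # 2018 subway loss, in millions of dollars
ASSUMED_REDUCTION = 0.10        # assumed 10% drop in fare evasion


def total_units(stations=STATIONS, per_station=UNITS_PER_STATION):
    """Number of systems needed for a city-wide deployment."""
    return stations * per_station


def savings_musd(annual_loss=ANNUAL_SUBWAY_LOSS_MUSD,
                 reduction=ASSUMED_REDUCTION):
    """Return (annual, monthly) savings in millions of dollars."""
    annual = annual_loss * reduction
    return annual, annual / 12


annual, monthly = savings_musd()
print(f"units: {total_units()}, "
      f"annual: ${annual:.1f} M, monthly: ${monthly:.1f} M")
```

Changing `ASSUMED_REDUCTION` shows how sensitive the estimate is to the effectiveness of the intervention policy, which is the least certain input.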
Possible improvements
- Using the bottom-up approach for joint person localization and pose detection to reduce compute.
- Performance can be improved by retraining or fine-tuning the entire model pipeline on labelled MTA surveillance data. This is because every system generates a data distribution with its own characteristic properties. For example, the patterns of occlusion, angle of capture and lighting conditions will be unique to MTA surveillance data. Using similar data in training should improve performance.
I believe that using an automated fare evasion detection pipeline like the one proposed here can improve the safety of subway stations, encourage law abiding behavior in citizens and save millions for the civic authorities.
Openvino toolkit timestamp:
I received email that my product signup was complete at Aug 17, 2020, 3:49 PM at um367@nyu.edu.