Recommended for

  • This article is recommended for readers interested in the different approaches to abandoned object detection, with an emphasis on deep learning-based methods.

Expected reading duration

10 minutes

Introduction

Welcome to the second part of our Object Detection blog series!
I'm Al from the AI Research and Innovation Hub (ARIH). In this blog, I will briefly introduce "Deep Learning Approaches for Abandoned Object Detection".

Parts
1.  Introduction to Object Detection and ARIH’s Use Cases
2.  Deep Learning Approaches for Abandoned Object Detection
3.  AI in Soccer: Smart Analytics from Gameplay Videos
4.  Common challenges and possible solutions in Object Detection

Video surveillance automation for monitoring public spaces and increasing security has gained popularity over the years. One common application is abandoned object detection (AOD), which focuses on the early identification of unattended, static objects. With AOD, suspicious or lost items are automatically flagged so that security personnel can be alerted immediately and take appropriate action.

In the computer vision literature, abandoned objects are further defined in spatio-temporal terms. Typically, an object counts as attended only while its owner stands or sits nearby, usually within three meters; if the owner stays farther away for 30 consecutive seconds, the object is declared abandoned.

Source: Abandoned Object Detection in Video-Surveillance: Survey and Comparison
Caption: Figure 1. Example of abandoned luggage for the AVSS_AB_2007 sequence (http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6308643/pdf/sensors-18-04290.pdf

Figure 1. Simple demonstration of abandoned object detection on a train platform
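To make this spatio-temporal rule concrete, here is a minimal Python sketch of the abandonment check. The three-meter and 30-second values follow the definition above; the per-frame owner distances and the frame rate are hypothetical inputs that a real system would obtain from its tracking stage.

```python
def is_abandoned(owner_distances_m, fps, max_dist_m=3.0, max_away_s=30.0):
    """Return True if the owner stays farther than max_dist_m from the object
    for at least max_away_s consecutive seconds.

    owner_distances_m: per-frame distance (meters) between the object and its
    labeled owner; fps: frame rate of the video.
    """
    away_frames_needed = int(max_away_s * fps)
    consecutive_away = 0
    for d in owner_distances_m:
        if d > max_dist_m:
            consecutive_away += 1
            if consecutive_away >= away_frames_needed:
                return True
        else:
            consecutive_away = 0  # owner came back near the object; reset the timer
    return False


# Example: the owner walks away at frame 100 of a 25 fps clip and never returns.
fps = 25
distances = [1.0] * 100 + [5.0] * (35 * fps)
print(is_abandoned(distances, fps))  # True
```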

Approaches to AOD

Conventional approaches to AOD mainly rely on foreground segmentation and stationary foreground detection to generate candidate objects (such as the ones shown in Figure 2). These background subtraction approaches, however, are not robust to complex scenarios and other visual factors such as low image quality and illumination changes, leading to false or missed alarms.

Source: Abandoned Object Detection in Video-Surveillance: Survey and Comparison
Caption: Figure 4. Static foreground detection computation using a simple persistence approach for Frame 3200 of the sequence in AVSS_AB_easy_2007 (http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html) and Frame 2000 of the sequence ABODA_video3 (http://imp.iis.sinica.edu.tw/ABODA).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6308643/pdf/sensors-18-04290.pdf

Figure 2. (left) Sample frame, (right) static foreground used to generate regions of interest (ROI)
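As a rough illustration of this conventional pipeline, the sketch below uses OpenCV's MOG2 background subtractor together with a simple per-pixel persistence counter to extract static foreground regions like those in Figure 2. The video path, persistence threshold, and minimum area are placeholders; published systems use considerably more elaborate static foreground detection.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")  # placeholder video path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

persistence = None           # counts how long each pixel has stayed foreground
STATIC_FRAMES = 250          # ~10 s at 25 fps; tune per scene

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)
    fg = (fg == 255).astype(np.uint8)        # drop shadow pixels (value 127)

    if persistence is None:
        persistence = np.zeros(fg.shape, dtype=np.int32)
    persistence = (persistence + fg) * fg    # reset the counter where background returns

    # Pixels that stayed foreground long enough form the static foreground mask.
    static_mask = (persistence >= STATIC_FRAMES).astype(np.uint8) * 255

    # Candidate abandoned-object regions = connected components of that mask.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(static_mask)
    candidates = [stats[i, :4] for i in range(1, num)
                  if stats[i, cv2.CC_STAT_AREA] > 200]
    # 'candidates' ((x, y, w, h) boxes) would feed the verification / tracking stages.

cap.release()
```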

To overcome these drawbacks, some researchers have explored the additional use of deep learning to improve the effectiveness and accuracy of previous approaches.

As shown in Figure 3, one strategy is to add a Convolutional Neural Network (CNN) classifier stage (e.g., Inception-v3, GoogLeNet) to validate the hypotheses generated in the static object detection phase. In this arrangement, the CNN is deployed to verify that a suspected abandoned object really is an item of interest (fewer false positives), or to further confirm real abandonment when no owner is present (again, fewer false positives).

Source: Real-Time Deep Learning Method for Abandoned Luggage Detection in Video (Referential material)
https://arxiv.org/pdf/1803.01160.pdf

Figure 3. Previous approach in AOD augmented with deep learning (CNN) for hypothesis validation
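Below is a minimal sketch of this verification stage, assuming a binary classifier (here an Inception-v3 rebuilt with two output classes) has already been fine-tuned to distinguish luggage from non-luggage. The checkpoint name, box format, and 0.5 threshold are illustrative assumptions rather than any paper's exact implementation.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Hypothetical binary verifier scoring [not_luggage, luggage]:
# rebuild the architecture and load fine-tuned weights from a checkpoint.
verifier = models.inception_v3(weights=None, num_classes=2, init_weights=True)
verifier.load_state_dict(torch.load("luggage_verifier.pt", map_location="cpu"))
verifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),   # Inception-v3 input size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def verify_candidate(frame: Image.Image, box) -> bool:
    """Crop a static-object candidate (x, y, w, h) and keep it only if the
    CNN agrees that it looks like a luggage item."""
    x, y, w, h = box
    crop = frame.crop((x, y, x + w, y + h))
    with torch.no_grad():
        probs = torch.softmax(verifier(preprocess(crop).unsqueeze(0)), dim=1)
    return probs[0, 1].item() > 0.5   # class 1 = luggage (assumed ordering)
```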

Another, more recent approach is to use more powerful CNN frameworks for people and object detection. Well-known models such as the R-CNN family and YOLO generate class labels and bounding boxes, which are then used in the logic of further frame processing, e.g., calculating distances over time and applying the spatio-temporal rule of abandonment.

Source: Abandoned Object Detection in Video-Surveillance: Survey and Comparison (Referential material)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6308643/pdf/sensors-18-04290.pdf

Figure 4. Object detection with a computer vision approach
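To illustrate the detector-driven approach, the sketch below uses torchvision's COCO-pretrained Faster R-CNN to obtain person and suitcase boxes for a single frame; these boxes would then feed the distance and abandonment logic described above. The class indices (1 for person, 33 for suitcase, following torchvision's COCO label list) and the score threshold are choices made for this example.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector producing class labels and bounding boxes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

PERSON, SUITCASE = 1, 33   # torchvision COCO label indices

def detect(frame: Image.Image, score_thr: float = 0.7):
    """Return (person_boxes, luggage_boxes) for a single video frame."""
    with torch.no_grad():
        out = model([to_tensor(frame)])[0]
    keep = out["scores"] > score_thr
    boxes, labels = out["boxes"][keep], out["labels"][keep]
    persons = boxes[labels == PERSON].tolist()
    luggage = boxes[labels == SUITCASE].tolist()
    return persons, luggage
```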

Sample Use Cases

Let's go through three use cases covering the deep learning approaches to abandoned object detection mentioned above.

Contractor et al. (2018) used Inception-v3 pretrained on ImageNet and retrained its last layer on annotated frames to differentiate between scenes with and without abandoned luggage. In this simple approach, the model learns the background and can therefore detect abandoned luggage as an anomalous event, which can be useful for creating models specific to a location and a type of event.

Source: CNNs for Surveillance Footage Scene Classification CS 231n Project (Referential material)
Caption: Figure 3. Example image 1 from the i-LIDS dataset, Figure 4. Example image 2 from the i-LIDS dataset, Figure 8. Sample 1 of undetected abandoned luggage, Figure 10. Sample 3 of undetected abandoned luggage
https://arxiv.org/pdf/1809.02766.pdf

Figure 5. Sample frames from the i-LIDS dataset, including examples of undetected abandoned luggage
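The sketch below shows one way to reproduce the idea of Contractor et al. (2018) in PyTorch: freeze an ImageNet-pretrained Inception-v3 and retrain only its final layer as a two-class scene classifier (with vs. without abandoned luggage). The folder layout, number of epochs, and learning rate are placeholders, not the authors' settings.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# ImageNet-pretrained Inception-v3 with everything frozen except the last layer.
model = models.inception_v3(weights="DEFAULT")
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # scene with / without abandoned luggage

tf = transforms.Compose([
    transforms.Resize((299, 299)),              # Inception-v3 input size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Placeholder folder layout: frames/normal/*.jpg, frames/abandoned/*.jpg
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("frames", transform=tf), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, targets in loader:
        out = model(images)
        logits = out.logits if hasattr(out, "logits") else out  # handle aux outputs
        loss = criterion(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```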

On the other hand, Smeureanu & Ionescu (2018) used the two-step approach described in Figure 3: regions of interest (ROI) were first identified through static object detection and then passed as input to two successive CNNs, a so-called 'cascade of classifiers'.

As shown in Figure 6, the first CNN verifies that the detected static object is a luggage item, while the second verifies true abandonment. In particular, GoogLeNet pre-trained on ImageNet was used, with the last layers of each CNN retrained. In this specific approach, generated data, including images of luggage and of people holding or standing near the luggage, were used to train the CNNs.

Source: Real-Time Deep Learning Method for Abandoned Luggage Detection in Video (Referential material)
Caption: Figure 1. Static object detection (SOD) pipeline used in the first stage of our approach for abandoned luggage detection., Figure 3. Examples of abandoned luggage items detected by our approach based on SOD and CCNN trained with generated samples.
https://arxiv.org/pdf/1803.01160.pdf

Figure 6. Static object detection (SOD) pipeline and examples of abandoned luggage items detected by the cascade approach
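Below is a minimal sketch of how such a two-stage cascade could be wired together, assuming two binary GoogLeNet classifiers have already been fine-tuned (stage 1: is the static region a luggage item? stage 2: is it abandoned?). The checkpoint names, the wider context crop, and the 0.5 thresholds are illustrative assumptions, not the authors' exact implementation.

```python
import torch
from PIL import Image
from torchvision import models, transforms

def load_binary_googlenet(path):
    """Rebuild a 2-class GoogLeNet and load hypothetical fine-tuned weights."""
    net = models.googlenet(weights=None, num_classes=2, aux_logits=False,
                           init_weights=True)
    net.load_state_dict(torch.load(path, map_location="cpu"))
    return net.eval()

luggage_cnn = load_binary_googlenet("cascade_stage1_luggage.pt")   # is it luggage?
abandon_cnn = load_binary_googlenet("cascade_stage2_abandon.pt")   # is it abandoned?

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(),
                         transforms.Normalize([0.485, 0.456, 0.406],
                                              [0.229, 0.224, 0.225])])

def score(model, image):
    """Probability of the positive class (index 1) for a single crop."""
    with torch.no_grad():
        return torch.softmax(model(tf(image).unsqueeze(0)), dim=1)[0, 1].item()

def cascade(frame: Image.Image, box, context: float = 1.5):
    """Stage 1 inspects the tight ROI; stage 2 inspects a wider crop so it
    can see whether an owner is standing next to the item."""
    x0, y0, x1, y1 = box
    if score(luggage_cnn, frame.crop(box)) < 0.5:
        return "not luggage"
    w, h = x1 - x0, y1 - y0
    wide = frame.crop((int(x0 - context * w), int(y0 - context * h),
                       int(x1 + context * w), int(y1 + context * h)))
    return "abandoned" if score(abandon_cnn, wide) > 0.5 else "attended"
```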

Yang et al. (2018) combined foreground estimation, which detects static regions as abandonment candidates, with Faster R-CNN for person and object detection and re-identification. Further frame processing was performed to calculate the average distance between the detected static object and detected persons nearby; spatio-temporal rules were then applied to label the owner and detect the abandonment event. This was further extended to event analysis, classifying whether a tracked object was (1) taken by its owner, (2) moved by a non-owner, or (3) stolen by someone; sample results are shown in Figure 7 below.

Source: Security Event Recognition for Visual Surveillance
Caption: Fig. 3: An example of experimental results on SERD.
https://arxiv.org/pdf/1810.11348.pdf

Figure 7. Example of experimental results on the SERD dataset
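The following sketch illustrates the kind of frame processing described above: the person closest to the object when it first becomes static is labeled as the owner, pixel distances are converted to meters with an assumed calibration, and a simplified rule maps later observations onto the three event types. The calibration constant, thresholds, and box format are assumptions, and the re-identification step used in the paper is omitted.

```python
import math

PIXELS_PER_METER = 100.0   # assumed calibration for this camera view
NEAR_M = 3.0               # distance used by the spatio-temporal rule

def centroid(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def distance_m(box_a, box_b):
    """Approximate ground distance (meters) between two detection boxes."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return math.hypot(ax - bx, ay - by) / PIXELS_PER_METER

def label_owner(object_box, person_boxes):
    """Owner = the person closest to the object when it first becomes static."""
    return min(range(len(person_boxes)),
               key=lambda i: distance_m(object_box, person_boxes[i]))

def classify_event(mover_id, owner_id, item_left_scene):
    """Simplified event analysis: who moved the tracked object, and did it
    leave the scene (used here as a rough proxy for theft vs. relocation)."""
    if mover_id == owner_id:
        return "taken by owner"
    return "stolen by someone" if item_left_scene else "moved by non-owner"
```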

Summary

In this blog, we went through different deep learning approaches and use cases for abandoned object detection. Specifically, we learned that CNNs can be used either to augment the effectiveness of traditional static object detection approaches, or to directly produce class labels and bounding boxes that feed further frame processing implementing the abandonment logic.

For more information about the use cases, please contact our team.

 

■ Sources of the content and papers introduced on this page / References

Elena Luna, Juan Carlos San Miguel, Diego Ortego, José María Martínez, "Abandoned Object Detection in Video-Surveillance: Survey and Comparison", Figure 1. Example of abandoned luggage for the AVSS_AB_2007 sequence (http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html)., Figure 4. Static foreground detection computation using a simple persistence approach for Frame 3200 of the sequence in AVSS_AB_easy_2007 (http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html) and Frame 2000 of the sequence ABODA_video3 (http://imp.iis.sinica.edu.tw/ABODA)., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6308643/pdf/sensors-18-04290.pdf

Sorina Smeureanu, Radu Tudor Ionescu, "Real-Time Deep Learning Method for Abandoned Luggage Detection in Video", Figure 1. Static object detection (SOD) pipeline used in the first stage of our approach for abandoned luggage detection., Figure 3. Examples of abandoned luggage items detected by our approach based on SOD and CCNN trained with generated samples., arXiv:1803.01160v3, https://arxiv.org/pdf/1803.01160.pdf

Utkarsh Contractor, Chinmayi Dixit, Deepti Mahajan, "CNNs for Surveillance Footage Scene Classification CS 231n Project", Figure 3. Example image 1 from the i-LIDS dataset, Figure 4. Example image 2 from the i-LIDS dataset, Figure 8. Sample 1 of undetected abandoned luggage, Figure 10. Sample 3 of undetected abandoned luggage, arXiv:1809.02766v1, https://arxiv.org/pdf/1809.02766.pdf

Michael Ying Yang, Wentong Liao, Chun Yang, Yanpeng Cao, "Security Event Recognition for Visual Surveillance", Fig. 3: An example of experimental results on SERD., arXiv:1810.11348v1, https://arxiv.org/pdf/1810.11348.pdf

 

At Macnica's ARIH (AI Research & Innovation Hub), we evaluate cutting-edge AI through research, investigation, and implementation, and provide insights that combine the most suitable AI technologies to guide companies toward the best solutions to their business challenges.
For details, please see below.

Related articles

*テックブログAI女子部*
Introduction to Object Detection and ARIH's Use Cases | Object Detection Blog Series (Part 1)

*テックブログAI女子部*
AI in Sports: Soccer Analytics from Gameplay Videos | Object Detection Blog Series (Part 3)

*テックブログAI女子部*
Common challenges and possible solutions in Object Detection | Object Detection Blog Series (Part 4)