I want to put cutting-edge AI technology to practical use in business
I want to know the latest trends in object detection
I want to know how object detection can be applied in industries such as manufacturing
Time needed to read this article
Less than 10 minutes
Dramatic progress in "object detection"
Hello! I'm Tsucchi from the AI Research & Innovation Hub. The theme of this blog is "object detection". Over the past few years, object detection has advanced tremendously thanks to deep learning, and its accuracy has improved dramatically. In this post, we cover object detection from an overview through application examples to implementation, with an eye to how it can be applied in fields such as manufacturing, construction, civil engineering, and medicine!
Table of contents
What is object detection?
Where can you use object detection?
Categories of object detection methods
Implementing the latest object detection model
Summary
1. What is object detection?
In a nutshell, object detection is a technique for detecting objects in images.
arXiv:1512.02325v5 Source: “SSD: Single Shot MultiBox Detector” Caption: “2 The Single Shot Detector (SSD) Fig. 1: SSD framework.” https://arxiv.org/pdf/1512.02325.pdf
Given an input image, object detection draws a bounding box around each object to show where it is in the image and what it is. (The bounding boxes are the red and blue lines in the figure above.)
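To make the idea of "where and what" concrete, here is a minimal sketch of how a single detection result is often represented: box coordinates, a class label, and a confidence score. The `Detection` class, the `keep_confident` helper, and all the numbers are hypothetical and only for illustration; real detectors each have their own output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: where it is (box) and what it is (label)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str
    score: float  # confidence in [0, 1]

    def area(self) -> float:
        """Area of the bounding box in square pixels."""
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d.score >= threshold]


# Hypothetical detector output for a single image (made-up numbers).
detections = [
    Detection(30, 40, 210, 300, "dog", 0.92),
    Detection(220, 60, 380, 280, "cat", 0.81),
    Detection(5, 5, 50, 50, "cat", 0.12),
]
confident = keep_confident(detections)
```

In practice, a confidence threshold like the one above is usually the first knob you tune when too many or too few boxes appear.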
2. Where can you use object detection?
I don't know how object detection struck you, but when I first heard the general explanation above, I wondered, "Where would you even use this? Couldn't you just look with your own eyes?" (laughs)
However, object detection is actually an amazing technology. I think it is no exaggeration to say that the terms "AI" and "artificial intelligence" became so widely known because of the progress in object detection. That is because its range of applications is so wide: there are real examples in manufacturing, construction, civil engineering, medicine, and other industries.
For example, in manufacturing, object detection can take over the visual inspection of parts for defects. In construction, it can replace the site supervisor's task of checking whether workers are wearing helmets. In both cases, at least part of the work can be handed over to object detection. All of these are efforts to automate work that used to be checked by eye.
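As a rough illustration of the helmet example, the sketch below takes detections that a model might return and flags people who have no helmet box near the top of their own box. Everything here is an assumption for illustration: the labels "person" and "helmet", the `head_fraction` rule, and the coordinates are made up, and a real system would need a trained detector and a more careful matching rule.

```python
def center(box):
    """Centre point of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def wears_helmet(person_box, helmet_boxes, head_fraction=0.3):
    """True if any helmet centre lies in the top part of the person box.

    Image coordinates are assumed, so y grows downward and the head is
    near y_min.
    """
    px_min, py_min, px_max, py_max = person_box
    head_bottom = py_min + head_fraction * (py_max - py_min)
    for helmet in helmet_boxes:
        cx, cy = center(helmet)
        if px_min <= cx <= px_max and py_min <= cy <= head_bottom:
            return True
    return False


def people_without_helmet(detections):
    """detections: list of (label, (x_min, y_min, x_max, y_max)) tuples."""
    helmets = [box for label, box in detections if label == "helmet"]
    people = [box for label, box in detections if label == "person"]
    return [p for p in people if not wears_helmet(p, helmets)]


# Made-up detections: two people, only the first one has a helmet.
detections = [
    ("person", (100, 50, 200, 350)),
    ("helmet", (125, 45, 175, 90)),
    ("person", (300, 60, 400, 360)),
]
violations = people_without_helmet(detections)
```

The point is that the detector only supplies boxes and labels; the business rule (here, "a helmet must sit on the head") is a thin layer of ordinary code on top.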
3. Categories of object detection methods
If you work in machine learning you may have heard of them: object detection methods include R-CNN, YOLO, and SSD. Relatively new ones include Pelee and M2Det. If you know those last two, you are quite the enthusiast (lol)
Example of object detection by Pelee (see this blog for details)
*We work with many of the latest object detection models and have written several posts about Pelee, so please take a look! (Pelee-related articles are listed at the end of this blog.)
That said, categorizing object detection methods is rather difficult. At present, three families, R-CNN, YOLO, and SSD, are the ones most often used, and many of the papers in circulation derive from them.
The R-CNN family includes R-CNN, Fast R-CNN, and Faster R-CNN. R-CNN itself came out of UC Berkeley, while Fast R-CNN and Faster R-CNN were developed at Microsoft Research. Faster R-CNN moves Fast R-CNN closer to an end-to-end model and is reported to be roughly 10 times faster as a result. By the way, Faster R-CNN was announced only a few months after Fast R-CNN. That is how fast technology in the AI field is developing.
In object detection, "end-to-end" means that a single model learns the relationship between input and output, rather than a combination of several models. In R-CNN and Fast R-CNN, learning was performed separately for each purpose, such as object position estimation (placing the bounding box) and class classification (deciding what the object is).
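To make "learning with one model" a little more concrete: an end-to-end detector is typically trained by minimizing one combined loss that adds a classification term and a box regression term. As an illustration, the Faster R-CNN paper defines the loss of its RPN in this multi-task form. The formula is taken from that paper, not from this article. Here $p_i$ is the predicted probability that anchor $i$ contains an object, $p_i^*$ is the ground-truth label (1 or 0), $t_i$ and $t_i^*$ are the predicted and ground-truth box coordinates, and $\lambda$ balances the two terms.

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

Because $p_i^*$ multiplies the regression term, only anchors that actually contain an object contribute to the box error.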
RPN (Region Proposal Network), which replaces the selective-search step for detecting candidate regions
arXiv:1506.01497v3 Source: “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” Caption: “Figure 2: Faster R-CNN is a single, unified network for object detection. The RPN module serves as the 'attention' of this unified network.” https://arxiv.org/pdf/1506.01497.pdf
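The RPN works by placing a fixed set of reference boxes called anchors at every position of the feature map, then scoring each anchor as object or background and refining its coordinates. The sketch below only generates the anchors. It uses the 3 scales and 3 aspect ratios described in the Faster R-CNN paper (9 anchors per position); the function names and the stride of 16 are assumptions for illustration, and no scoring or training is shown.

```python
import math


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x_min, y_min, x_max, y_max) centred on (cx, cy).

    Each anchor has area scale**2 and height/width equal to ratio.
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_h, feat_w, stride=16):
    """Anchors for every cell of a feat_h x feat_w feature map.

    stride is the assumed number of input pixels per feature-map cell.
    """
    result = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            result.extend(anchors_at(cx, cy))
    return result


# A tiny 2 x 3 feature map gives 2 * 3 * 9 anchors.
anchors = all_anchors(feat_h=2, feat_w=3)
```

Since the anchors are produced by the network's own feature map rather than by a separate algorithm such as selective search, the candidate-region step can be trained together with the rest of the detector.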
The R-CNN family uses a serial configuration in which "identification" is performed after "detection", and this made processing slow. YOLO (You Only Look Once) therefore recasts object detection as a regression problem and performs detection and identification at the same time. As a result, processing is faster, and because the model sees the entire image, it makes fewer false detections on the background. YOLO is still being improved, and the source code has been released up to version 3. However, it is not good at detecting small objects, so its accuracy is lower than that of the other methods.
YOLO, which treats object detection as a regression problem
arXiv:1506.02640v5 Source: “You Only Look Once: Unified, Real-Time Object Detection” Caption: “Figure 2: The Model.” https://arxiv.org/pdf/1506.02640.pdf
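"Treating detection as regression" means the network outputs one fixed-size block of numbers per image, and boxes are read directly out of it. In the original YOLO paper the image is divided into an S x S grid (S = 7), and each cell predicts B = 2 boxes plus C = 20 class probabilities, giving a 7 x 7 x 30 output. The sketch below decodes one cell. The memory layout of the vector, the function name, and the sample values are assumptions for illustration and do not reproduce the official implementation.

```python
def decode_cell(cell, row, col, S=7, B=2, C=20, img_w=448, img_h=448):
    """Decode one grid cell's output vector of length B*5 + C.

    Layout assumed here: B blocks of (x, y, w, h, confidence), then C class
    probabilities. x, y are offsets inside the cell; w, h are fractions of
    the image size.
    """
    assert len(cell) == B * 5 + C
    class_probs = cell[B * 5:]
    best_class = max(range(C), key=lambda k: class_probs[k])
    results = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        cx = (col + x) / S * img_w
        cy = (row + y) / S * img_h
        bw, bh = w * img_w, h * img_h
        box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
        # Class-specific score = box confidence * class probability.
        results.append((box, best_class, conf * class_probs[best_class]))
    return results


# Made-up values for the cell at row 3, column 3 of a 7 x 7 grid.
cell = [0.5, 0.5, 0.25, 0.5, 0.9,    # box 1
        0.5, 0.5, 0.10, 0.10, 0.1]   # box 2
cell += [0.0] * 20
cell[10 + 11] = 0.8                  # class index 11 is the most likely
boxes = decode_cell(cell, row=3, col=3)
output_size = 7 * 7 * (2 * 5 + 20)   # numbers predicted per image
```

Every box and class score comes out of a single forward pass, which is why no separate "detect, then identify" stage is needed.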
Since YOLO keeps improving, a single definitive comparison is not possible, but in terms of accuracy and processing speed, SSD sits between Faster R-CNN and YOLO.
Much of the source code is publicly available, so please try it out when you have time.
4. Implementing the latest object detection model
We also implement models from the latest papers ourselves. At one point we spent several months building the latest architectures. Drawing on that experience, this time I will introduce M2Det, which was accepted at AAAI 2019.
M2Det is an object detection model developed by a team from Peking University, Alibaba, and Temple University. The code released by the authors of the M2Det paper is available here (https://github.com/qijiezhao/M2Det), so please give it a try. It is implemented in PyTorch. The figure below plots the accuracy (AP) against the processing speed of M2Det and other methods, and you can see that M2Det performs very well on both.
Accuracy (AP) and processing speed of M2Det compared to other methods
arXiv:1811.04533v3 Source: “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network” Caption: “Figure 5: Speed (ms) vs. accuracy (mAP) on COCO test-dev.” https://arxiv.org/pdf/1811.04533.pdf
In terms of network structure, features are first extracted by a standard feature extractor (VGG, ResNet, etc.) called the backbone network, then passed to the Multi-Level Feature Pyramid Network (MLFPN), and finally the prediction layer performs object detection and object recognition.
What makes M2Det interesting is the set of pyramid structures inside the Multi-Level Feature Pyramid Network (MLFPN). MLFPN consists of three modules:
・Feature Fusion Module (FFM)
・Thinned U-Shape Module (TUM)
・Scale-wise Feature Aggregation Module (SFAM)
Overview of M2Det architecture
Source: “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network” Caption: “Figure 2: An overview of the proposed M2Det(320 × 320).” https://arxiv.org/pdf/1811.04533.pdf
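To give a feel for what "multi-level" and "multi-scale" mean in MLFPN, the sketch below traces only tensor shapes: several TUMs each emit a small pyramid of feature maps, and SFAM concatenates the maps of the same scale across all TUMs. The default values (8 levels, 6 scales from 40 x 40 down to 1 x 1, 128 channels per TUM output) are my reading of the 320 x 320 configuration in the paper and should be treated as assumptions. The function name is hypothetical, no real convolution is performed, and the FFM fusion and SFAM's channel attention are not modeled.

```python
def mlfpn_shapes(num_levels=8, scales=(40, 20, 10, 5, 3, 1), tum_channels=128):
    """Trace (channels, height, width) through the TUM outputs and SFAM.

    Only shapes are tracked; this is not an implementation of M2Det.
    """
    # Each TUM (one per level) emits one feature map per scale.
    tum_outputs = [
        [(tum_channels, s, s) for s in scales] for _ in range(num_levels)
    ]
    # SFAM, first step: concatenate same-scale maps from all levels
    # along the channel axis.
    pyramid = []
    for i, s in enumerate(scales):
        channels = sum(level[i][0] for level in tum_outputs)
        pyramid.append((channels, s, s))
    return pyramid


pyramid = mlfpn_shapes()
for shape in pyramid:
    print(shape)
```

The result is one feature map per scale, each carrying information from every level, and the prediction layer then works on this pyramid.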
I won't go into the details of the network here, but it is a really deep architecture. Please refer to the original paper.
5. Summary
This time, we covered an overview of object detection, application examples, and finally an implementation that you can easily try.
Properly learning a new network at the mathematical level takes a great deal of study. How far you need to understand it at the level of formulas depends on your goal, though. It is also important to “experience” it rather than only “understand” it.
A variety of excellent source code has been released, so why not experience it by running an implementation yourself?
Sources of AI papers featured in this article / Reference list
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. “SSD: Single Shot MultiBox Detector”. arXiv:1512.02325v5, https://arxiv.org/pdf/1512.02325.pdf
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. arXiv:1506.01497v3, https://arxiv.org/pdf/1506.01497.pdf
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. “You Only Look Once: Unified, Real-Time Object Detection”. arXiv:1506.02640v5, https://arxiv.org/pdf/1506.02640.pdf
Qijie Zhao. “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”. Web, retrieved from https://github.com/qijiezhao/M2Det
Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling. “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”. arXiv:1811.04533v3, https://arxiv.org/pdf/1811.04533.pdf
Macnica's AI Research & Innovation Hub (ARIH) provides knowledge on combining the most appropriate AI technologies, based on its evaluation, investigation, and implementation of cutting-edge AI research, and works to guide companies to optimal solutions to their problems. Please see below for details.