Object Detection and Deep Learning: From Introduction to Application - Smart Manufacturing

This article is recommended for those who

Want to put cutting-edge AI technology to practical use in business
Want to know the latest trends in object detection
Want to know how object detection can be applied in industries such as manufacturing

Time needed to finish reading this article

Less than 10 minutes

Dramatic progress in "object detection"

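Hello! This is Tsucchi from the AI Research & Innovation Hub.
The theme of this blog post is "object detection".
Over the past few years, object detection has advanced at a tremendous pace thanks to Deep Learning, and its accuracy has improved dramatically.
In this post, I will cover object detection, which can be applied in fields such as manufacturing, construction, civil engineering, and medicine, from an overview through application examples to the key points of implementation!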

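Table of contents
1. What is object detection?
2. Where can you use object detection?
3. Categories of object detection methods
4. Latest object detection model implementation
5. Summary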

1. What is object detection?

In a nutshell, object detection is a technique for detecting objects in images.

arXiv:1512.02325v5
Source: “SSD: Single Shot MultiBox Detector”
Caption: “Fig. 1: SSD framework.”
https://arxiv.org/pdf/1512.02325.pdf

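Given an input image, it outputs bounding boxes that show what appears where in the image, so it can be described as a technology that takes over checks people used to do by eye.
(The bounding boxes here are the red and blue lines.)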
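To make "what appears where" concrete, here is a minimal sketch of how a detector's output for one image might be represented. The `Detection` class, the `keep_confident` helper, and all the numbers are hypothetical and made up for illustration; they are not taken from SSD or any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: what it is, how confident, and where it is."""
    label: str
    score: float   # confidence in [0, 1]
    x_min: float   # left edge in pixels
    y_min: float   # top edge in pixels
    x_max: float   # right edge in pixels
    y_max: float   # bottom edge in pixels

    def area(self) -> float:
        """Area of the bounding box in square pixels."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def keep_confident(detections, threshold=0.5):
    """Drop low-confidence boxes and return the rest, highest score first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Hypothetical detector output for one image (values are invented).
example = [
    Detection("cat", 0.81, 300, 90, 470, 330),
    Detection("dog", 0.92, 40, 120, 260, 380),
    Detection("cat", 0.12, 10, 10, 30, 30),
]
confident = keep_confident(example)
for d in confident:
    print(d.label, d.score, (d.x_min, d.y_min, d.x_max, d.y_max))
```

Each remaining entry corresponds to one box you would draw on the image, together with its class label.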

2. Where can you use object detection?

I don't know how you felt when you first learned about object detection, but if I were hearing the general description above for the first time, I would wonder where it could actually be used.
I would think, why not just look at such things with my own eyes? (laughs)

However, object detection is actually an amazing technology.
I think it is no exaggeration to say that the terms "AI" and "artificial intelligence" have become so widely known thanks to the progress of object detection technology.
That is because its range of applications is so wide.
In fact, it is already being applied in industries such as manufacturing, construction, civil engineering, and medicine.

For example, in the manufacturing industry, object detection can take over the visual inspection for defective parts.
In the construction industry, the site supervisor's task of checking whether helmets are being worn can be replaced, at least in part, with object detection.
Both are efforts to automate work that used to be checked by eye.
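As a rough illustration of the helmet example, the sketch below takes the boxes a detector might return and flags people with no helmet near their head. Everything here is an assumption made for illustration: the labels "person" and "helmet", the `(x_min, y_min, x_max, y_max)` box format, and the simple rule that a helmet counts if its center falls in the top 30% of a person box. A real site would need a rule tuned to its own cameras.

```python
def center(box):
    """Center point of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def helmet_on_head(person_box, helmet_box, head_fraction=0.3):
    """True if the helmet's center lies in the top part of the person box."""
    px_min, py_min, px_max, py_max = person_box
    head_bottom = py_min + head_fraction * (py_max - py_min)
    cx, cy = center(helmet_box)
    return px_min <= cx <= px_max and py_min <= cy <= head_bottom


def people_without_helmet(detections):
    """detections: list of (label, box). Returns the boxes of unprotected people."""
    persons = [box for label, box in detections if label == "person"]
    helmets = [box for label, box in detections if label == "helmet"]
    return [p for p in persons
            if not any(helmet_on_head(p, h) for h in helmets)]


# Hypothetical detector output: two workers, only the first wears a helmet.
site_detections = [
    ("person", (100, 50, 200, 350)),
    ("helmet", (120, 40, 180, 90)),
    ("person", (300, 60, 400, 360)),
]
alerts = people_without_helmet(site_detections)
print(alerts)
```

The detector only supplies boxes and labels; the business rule on top of them is ordinary code, which is part of why the technique transfers so easily between industries.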

3. Categories of object detection methods

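If you work with machine learning you have probably heard of them: object detection methods include R-CNN, YOLO, and SSD.
Relatively newer ones include Pelee and M2Det.
If you know these, I think you are quite an enthusiast (laughs).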

Example of object detection with Pelee (see this blog post for details)

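* We work with many of the latest object detection models and have written several earlier posts about Pelee, so please take a look at those as well! (Pelee-related articles are listed at the end of this post.)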

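That said, I think categorizing object detection methods is surprisingly difficult.
Today, three families are widely used, namely R-CNN, YOLO, and SSD, and most of the papers in circulation are derivatives of these.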

The R-CNN family, much of which was developed at Microsoft, includes the well-known R-CNN, Fast R-CNN, and Faster R-CNN.
By making Fast R-CNN end-to-end, Faster R-CNN achieves roughly a tenfold gain in efficiency.
Incidentally, Faster R-CNN was announced only a few months after Fast R-CNN.
That is how fast technology in the AI field is developing.

In object detection, end-to-end means that a single model learns the relationship between input and output, without combining multiple models.
In R-CNN and Fast R-CNN, separate training was performed for each purpose.
These purposes include object position estimation (drawing bounding boxes), class classification (deciding what the object is), and so on.

RPN (Region Proposal Network), which replaces the candidate region detection previously done with selective search

arXiv:1506.01497v3
Source: “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”
Caption: “Figure 2: Faster R-CNN is a single, unified network for object detection. The RPN module serves as the ‘attention’ of this unified network.”
https://arxiv.org/pdf/1506.01497.pdf
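For readers curious what the RPN actually proposes: the Faster R-CNN paper places reference boxes called anchors at every sliding-window position, using 3 scales and 3 aspect ratios for 9 anchors per position. The sketch below generates those anchors for a single position. It covers only this one step, the function name `make_anchors` is hypothetical, and the defaults follow the paper's settings as I understand them rather than anything stated in this article.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 anchor boxes (x_min, y_min, x_max, y_max) centered on (cx, cy).

    scales: side length of the square whose area each anchor preserves
    ratios: height / width of the anchor
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # wider when r < 1
            h = s * math.sqrt(r)   # taller when r > 1
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(cx=300, cy=300)
for a in anchors[:3]:
    print([round(v, 1) for v in a])
```

The network then scores each anchor as object or not and regresses small corrections to it, which is what lets it replace the separate selective search step.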

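The R-CNN family uses a serial pipeline in which "classification" is performed after "detection", and this made processing slow.
YOLO (You Only Look Once) therefore recasts the task as a regression problem, which makes it possible to perform detection and classification at the same time.
As a result, processing is faster, and because the model sees the whole image at once, it makes far fewer false detections on the background.
YOLO is still being improved, and as of this writing source code has been published up to version 3.
However, it is not good at detecting small objects, so its accuracy is not as high as that of the other methods.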

YOLO, which treats object detection as a regression problem

arXiv:1506.02640v5
Source: “You Only Look Once: Unified, Real-Time Object Detection”
Caption: “Figure 2: The Model.”
https://arxiv.org/pdf/1506.02640.pdf
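To give a feel for what "regression" means here: in the original YOLO paper the image is divided into an S x S grid, and for each cell the network directly outputs B boxes (x, y, w, h, confidence) plus C class probabilities. The sketch below computes the size of that output and converts one cell's prediction into pixel coordinates. The values S=7, B=2, C=20 are the paper's PASCAL VOC settings; the function names and the sample prediction are hypothetical.

```python
S, B, C = 7, 2, 20   # grid size, boxes per cell, number of classes


def output_size(s=S, b=B, c=C):
    """Number of values regressed for one image: S x S x (B*5 + C)."""
    return s * s * (b * 5 + c)


def decode_box(row, col, pred, img_w, img_h, s=S):
    """Convert one cell's (x, y, w, h, confidence) into pixel corner coordinates.

    x, y: box center as an offset inside the grid cell, in [0, 1]
    w, h: box size relative to the whole image, in [0, 1]
    """
    x, y, w, h, conf = pred
    cx = (col + x) / s * img_w
    cy = (row + y) / s * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, conf)


print(output_size())
# A made-up prediction from the center cell of a 448 x 448 image.
box = decode_box(row=3, col=3, pred=(0.5, 0.5, 0.5, 0.5, 0.9),
                 img_w=448, img_h=448)
print(box)
```

Because every cell's boxes and class scores come out of a single forward pass, there is no separate "detect, then classify" stage to wait for.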

Because YOLO keeps being improved, a simple comparison is not possible, but in terms of both accuracy and processing speed, SSD is positioned between Faster R-CNN and YOLO.

Much of the source code is publicly available, so please try it out when you have time.

4. Latest Object Detection Model Implementation

We also implement the latest papers ourselves.
At one point we spent several months building the latest architectures.
Based on that experience, this time I will introduce M2Det, which was accepted at AAAI 2019.

M2Det is an object detection model developed by a team from Peking University, Alibaba and Temple University.
The code released by the authors of the M2Det paper is available here (https://github.com/qijiezhao/M2Det), so please give it a try.
The implementation uses PyTorch.
The figure below shows the accuracy (AP) and processing speed of M2Det, and you can see that M2Det stands out in both processing speed and accuracy.

Accuracy (AP) and processing speed of M2Det compared to other methods

arXiv:1811.04533v3
Source: “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”
Caption: “Figure 5: Speed (ms) vs. accuracy (mAP) on COCO test-dev.”
https://arxiv.org/pdf/1811.04533.pdf

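As for the network, features are first extracted by a simple feature extractor called the Backbone Network (VGG, ResNet, etc.), passed through the Multi-Level Feature Pyramid Network (MLFPN), and finally the Prediction Layer performs object detection and object recognition.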

What makes M2Det interesting is the multiple pyramid structures inside the part of the architecture called the Multi-Level Feature Pyramid Network (MLFPN).
MLFPN consists of three modules.
・Feature Fusion Module (FFM)
・Thinned U-Shape Module (TUM)
・Scale-wise Feature Aggregation Module (SFAM)
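To show roughly how these pieces fit together, the sketch below tracks only the shapes of the feature maps, not the actual convolutions. As I read the paper, each TUM produces a small pyramid of feature maps at several scales, and SFAM concatenates the maps of the same scale from every TUM along the channel axis. The numbers (8 levels, 128 channels, six scales for a 320 x 320 input) are assumptions for illustration, the function names are hypothetical, and the FFM fusion and SFAM's channel attention are left out.

```python
def tum_output_shapes(channels, scales):
    """One TUM yields one (channels, height, width) feature map per scale."""
    return [(channels, s, s) for s in scales]


def sfam_aggregate(pyramids):
    """For each scale, concatenate same-scale maps from all levels by channel."""
    aggregated = []
    for same_scale in zip(*pyramids):          # one tuple per scale
        total_channels = sum(c for c, _, _ in same_scale)
        _, h, w = same_scale[0]
        aggregated.append((total_channels, h, w))
    return aggregated


NUM_LEVELS = 8                    # assumed number of stacked TUMs
CHANNELS = 128                    # assumed channels per TUM output map
SCALES = (40, 20, 10, 5, 3, 1)    # assumed map sizes for a 320 x 320 input

pyramids = [tum_output_shapes(CHANNELS, SCALES) for _ in range(NUM_LEVELS)]
features = sfam_aggregate(pyramids)
for shape in features:
    print(shape)
```

The Prediction Layer then works on these aggregated maps, so every scale carries information from both shallow and deep levels.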

Overview of M2Det architecture

Source: “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”
Caption: “Figure 2: An overview of the proposed M2Det(320 × 320).”
https://arxiv.org/pdf/1811.04533.pdf

I won't go into the details of the network here, but it is a truly deep architecture.
Please do refer to the original paper.

5. Summary

In this post, I have covered an overview of object detection, application examples, and finally an implementation that you can easily try.

Properly studying a new network down to the mathematics teaches you a great deal.
How far you need to understand it at the formula level, however, depends on your situation.
"Experiencing" it, and not only "understanding" it, is also important.

A lot of excellent source code has been released, so why not experience it by trying an implementation yourself?

Sources of AI papers featured in this article / Reference Lists

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. “SSD: Single Shot MultiBox Detector”. arXiv:1512.02325v5, https://arxiv.org/pdf/1512.02325.pdf

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. arXiv:1506.01497v3, https://arxiv.org/pdf/1506.01497.pdf

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. “You Only Look Once: Unified, Real-Time Object Detection”. arXiv:1506.02640v5, https://arxiv.org/pdf/1506.02640.pdf

Qijie Zhao. “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”. GitHub, https://github.com/qijiezhao/M2Det

Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling. “M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network”. arXiv:1811.04533v3, https://arxiv.org/pdf/1811.04533.pdf

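At Macnica's ARIH (AI Research & Innovation Hub), we evaluate cutting-edge AI through research, investigation, and implementation, then provide insight that combines the most suitable AI technologies to guide companies to the best solutions for their challenges.
See below for details.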

Click here for details on ARIH