
This article is recommended for

Technical staff who want to try out the latest object detection system!

Time needed to finish reading this article

3 minutes (about 20 minutes if you want to run it yourself)

Introduction

Hello! This is Tsuchiya from Macnica AI Research Center!

In the first installment (Tech Blog: "Implementing Pelee on Jetson TX2, Part 1"), we got the latest object detection system, Pelee, up and running. This time we take it further and run it in real time.
Please note that you will not be able to follow the real-time implementation unless you have completed the first installment.

Porting Caffe's webcam input processing to Pelee

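The repository at https://github.com/Robert-JunWang/Pelee only contains a script that writes detection results out as text, so you cannot watch the detections as they happen.
So, using ssd_pascal_webcam.py from the Caffe examples as a base, we wrote a script that shows Pelee detecting objects in webcam video in real time, and ran it.
When you run it, you should see output like the following.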

 

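The first photo shows detection of a person, and the second shows detection of a plastic bottle.
Plastic bottles are known to be difficult to detect, and you can see that detection is not working well here.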
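To make the drawing step concrete, here is a minimal sketch of turning SSD-style detections into pixel boxes that can be drawn on a webcam frame. It is not the script used in this article. It assumes each detection row has the form [image_id, label, confidence, xmin, ymin, xmax, ymax] with coordinates normalized to the range 0 to 1, which is the layout produced by the DetectionOutput layer in the SSD branch of Caffe. The function name to_pixel_boxes and the default threshold of 0.5 are made up for illustration.

```python
# Sketch only: assumes SSD-style detection rows of the form
# [image_id, label, confidence, xmin, ymin, xmax, ymax] with normalized coordinates.
# The function name and the default threshold are illustrative, not from the Pelee repository.

def to_pixel_boxes(detections, frame_width, frame_height, conf_threshold=0.5):
    """Keep confident detections and convert normalized corners to pixel coordinates."""
    boxes = []
    for _image_id, label, conf, xmin, ymin, xmax, ymax in detections:
        if conf < conf_threshold:
            continue  # drop low-confidence detections
        # Clamp to [0, 1] because predicted corners can fall slightly outside the frame.
        xmin, ymin, xmax, ymax = (
            min(max(v, 0.0), 1.0) for v in (xmin, ymin, xmax, ymax)
        )
        boxes.append({
            "label": int(label),
            "confidence": conf,
            "box": (
                int(round(xmin * frame_width)),
                int(round(ymin * frame_height)),
                int(round(xmax * frame_width)),
                int(round(ymax * frame_height)),
            ),
        })
    return boxes
```

Each returned box holds the corner coordinates (left, top, right, bottom) in pixels, which is the form most drawing routines expect for rectangles.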

Performance verification

It seems fast enough, but I'd like to do a performance comparison here.

We ran the VGGNet-based SSD, another real-time object detection system, with the same webcam setup for comparison. Contrary to expectations, the frame rate of PeleeNet was only about half that of the VGGNet SSD.
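For readers who want to reproduce a frame-rate comparison like this, the following is a minimal sketch of a moving-average FPS counter. It is not the measurement method used in this article. The class name FpsMeter, the window of 30 frames, and the injectable clock are assumptions chosen for illustration.

```python
import time
from collections import deque


class FpsMeter:
    """Moving-average frame rate over the last `window` frame timestamps (illustrative helper)."""

    def __init__(self, window=30, clock=time.perf_counter):
        self._clock = clock                   # injectable so the meter can be tested
        self._stamps = deque(maxlen=window)   # timestamps of the most recent frames

    def tick(self):
        """Call once per processed frame; returns the current FPS estimate.

        Returns 0.0 until at least two frames have been seen.
        """
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed
```

Calling tick() once per frame at the end of the detection loop, for both networks under the same camera settings, gives numbers that can be compared directly.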
However, it is premature to judge that PeleeNet is inferior in performance based on this alone.
Let's take a look at the CPU/GPU usage of Jetson TX2 when performing object detection with VGGNet/PeleeNet.

First, VGGNet.

nvidia@tegra-ubuntu:~$ sudo ./tegrastats
RAM 2312/7851MB (lfb 1125x4MB) cpu [0%@2013,0%@2035,0%@2034,0%@2013,0%@2012,0%@2015] EMC 14%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [34%@2000,6%@2035,0%@2034,7%@2015,10%@2051,9%@2043] EMC 30%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [19%@1996,0%@2035,0%@2034,14%@1996,25%@1995,19%@1997] EMC 38%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [19%@2036,0%@2034,0%@2036,5%@2035,29%@2034,20%@2035] EMC 40%@1866 APE 150 GR3D 9%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [23%@2034,0%@2035,0%@2035,11%@2036,9%@2035,33%@2035] EMC 42%@1866 APE 150 GR3D 98%@1300
... (omitted) ...
RAM 2323/7851MB (lfb 1125x4MB) cpu [18%@1996,0%@2035,0%@2036,31%@1996,13%@1997,10%@1997] EMC 43%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [22%@2035,0%@2035,0%@2035,27%@2035,16%@2035,8%@2034] EMC 44%@1866 APE 150 GR3D 0%@13000
RAM 2323/7851MB (lfb 1125x4MB) cpu [20%@2019,0%@2035,0%@2036,21%@2010,20%@2020,18%@2016] EMC 43%@1866 APE 150 GR3D 78%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [13%@2035,0%@2035,0%@2035,22%@2034,15%@2035,20%@2035] EMC 43%@1866 APE 150 GR3D 99%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [10%@2034,0%@2034,0%@2036,20%@2036,21%@2033,20%@2034] EMC 44%@1866 APE 150 GR3D 99%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [31%@1996,0%@2034,0%@2035,28%@1996,3%@1996,13%@1996] EMC 44%@1866 APE 150 GR3D 99%@1300

Next is PeleeNet.

nvidia@tegra-ubuntu:~$ sudo ./tegrastats
[sudo] password for nvidia:
RAM 2218/7851MB (lfb 1137x4MB) cpu [0%@2021,0%@2036,0%@2035,0%@2015,0%@2014,0%@2009] EMC 15%@1866 APE 150 GR3D 98%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [2%@2035,0%@2034,97%@2035,10%@2034,13%@2035,11%@2035] EMC 16%@1866 APE 150 GR3D 0%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [6%@2035,0%@2035,96%@2036,1%@2035,26%@2035,8%@2035] EMC 16%@1866 APE 150 GR3D 92%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [3%@2034,0%@2035,98%@2036,1%@2035,21%@2035,10%@2035] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [5%@1997,0%@2035,98%@2035,8%@1996,10%@1996,22%@1997] EMC 16%@1866 APE 150 GR3D 61%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [2%@1998,0%@2035,98%@2033,10%@2007,14%@2012,11%@2012] EMC 16%@1866 APE 150 GR3D 0%@1300
... (omitted) ...
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@2001,97%@2035,0%@2036,8%@2007,5%@2009,19%@2010] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [5%@1997,97%@2035,0%@2034,14%@2010,6%@2008,13%@2010] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2220/7851MB (lfb 1137x4MB) cpu [7%@2012,97%@2035,0%@2035,12%@2012,10%@2016,10%@2027] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [3%@2018,97%@2035,0%@2035,17%@2010,14%@2008,12%@2015] EMC 15%@1866 APE 150 GR3D 97%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [7%@2001,97%@2034,0%@2034,5%@2012,13%@2012,12%@2014] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@2000,98%@2035,0%@2035,8%@2014,2%@2012,21%@2017] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@1998,98%@2034,0%@2034,9%@2011,1%@2012,21%@2011] EMC 15%@1866 APE 150 GR3D 0%@1300

Comparing utilization during real-time inference with a webcam, we can see that GPU utilization differs significantly between VGGNet and PeleeNet.
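Reading these logs by eye is tedious, so here is a minimal sketch of a parser that pulls the per-core CPU load and the GPU (GR3D) load out of each tegrastats line and averages them. The regular expressions are fitted only to the line format shown in this article; other tegrastats versions print different field names, so treat the patterns as assumptions. The function names parse_tegrastats_line and summarize are made up for illustration.

```python
import re

# Patterns fitted to the tegrastats format shown in this article (an assumption;
# other L4T releases format these fields differently).
CPU_RE = re.compile(r"cpu \[([^\]]*)\]")
GPU_RE = re.compile(r"GR3D (\d+)%")


def parse_tegrastats_line(line):
    """Return {'cpu': [per-core load %], 'gpu': load %}, or None for non-sample lines."""
    cpu_match = CPU_RE.search(line)
    gpu_match = GPU_RE.search(line)
    if not cpu_match or not gpu_match:
        return None
    # Each core entry looks like "97%@2035"; keep only the load percentage.
    cpu_loads = [
        int(entry.split("%")[0])
        for entry in cpu_match.group(1).split(",")
        if "%" in entry
    ]
    return {"cpu": cpu_loads, "gpu": int(gpu_match.group(1))}


def summarize(lines):
    """Average GPU load and average load of the busiest CPU core over all samples."""
    samples = [s for s in map(parse_tegrastats_line, lines) if s]
    if not samples:
        return None
    return {
        "samples": len(samples),
        "mean_gpu": sum(s["gpu"] for s in samples) / len(samples),
        "mean_busiest_core": sum(max(s["cpu"]) for s in samples) / len(samples),
    }
```

Running summarize over each log separately gives one GPU figure and one busiest-core figure per network, which makes the contrast easier to state in numbers.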

Discussion

With VGGNet, the GPU is used at almost 100%. With PeleeNet, by contrast, the utilization of one specific CPU core rises to nearly 100%, and GPU utilization often fails to rise at all.

From this, we conclude that PeleeNet has some kind of bottleneck in its CPU-side processing, which keeps it from drawing out the full performance of the Jetson TX2.
So, unfortunately, simply running what is in the repository is not enough: fast real-time inference with Pelee also requires eliminating this bottleneck.
I would really like to track the bottleneck down, but I think it is best to stop digging here and wait for further updates from the authors of the paper.

Summary

Following on from the first installment, we implemented Pelee again. In the first installment we confirmed that very fast inference is possible, but when we implemented real-time inference we discovered a bottleneck.

"Even when the code is available, implementing a system from a paper can suddenly become extremely difficult once the environment differs, so be careful."
I would like to conclude the Pelee implementation articles with this final piece of advice. Thank you for reading.

On a slightly more serious note, please be aware that Macnica takes no responsibility for any use of the above content.

Related articles

*Tech Blog*
Implementing Pelee on Jetson TX2, Part 1

*Tech Blog*
A state-of-the-art real-time object detection system: Pelee