
This article is recommended for

Engineers who want to try the latest object detection system!

Time needed to finish reading this article

3 minutes (about 20 minutes if you work through it hands-on)

Introduction

Hello! This is Tsuchiya from Macnica AI Research Center!

In the first installment, "First Tech Blog: Implementing Pelee on Jetson TX2", we implemented the latest object detection system, Pelee. This time we will take it to the point where it runs in real time.
Please note that you need to have completed the first implementation before you can follow the real-time implementation described here.

Porting Caffe's webcam input processing to Pelee

The https://github.com/Robert-JunWang/Pelee repository only includes a script that outputs object detection results as text, so you cannot watch the detections in real time.
Therefore, based on ssd_pascal_webcam.py from the Caffe examples, I wrote a script that does the same thing with Pelee: it displays, in real time, the objects detected in video from a webcam. I then ran it.
Running it produces output like the following.

The first photo shows detection of a person, and the second photo shows detection of a PET bottle.
PET bottles are notoriously difficult to detect, but you can see that detection succeeds here.
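
To give a rough idea of what such a webcam script has to do with each frame's results before drawing them, here is a minimal sketch. It is not the script used in this article. It assumes the detections arrive as rows of [image_id, label, score, xmin, ymin, xmax, ymax] with coordinates normalized to the 0..1 range, as in the Caffe SSD detection output; the function name detections_to_boxes and the default threshold of 0.5 are hypothetical.

```python
def detections_to_boxes(detections, frame_width, frame_height, threshold=0.5):
    """Convert SSD-style detection rows into pixel-space boxes.

    Assumed row layout (hypothetical helper, not from the Pelee repository):
    [image_id, label, score, xmin, ymin, xmax, ymax], coordinates in 0..1.
    Returns a list of (label, score, x1, y1, x2, y2) tuples.
    """
    def to_pixels(value, size):
        # Clamp to the frame, then scale the normalized value to pixels.
        return int(round(min(max(value, 0.0), 1.0) * size))

    boxes = []
    for _image_id, label, score, xmin, ymin, xmax, ymax in detections:
        if score < threshold:
            continue  # drop low-confidence detections
        boxes.append((
            int(label),
            float(score),
            to_pixels(xmin, frame_width),
            to_pixels(ymin, frame_height),
            to_pixels(xmax, frame_width),
            to_pixels(ymax, frame_height),
        ))
    return boxes
```

In a webcam loop, the returned tuples would then be drawn onto the frame (for example as rectangles with a label) before the frame is displayed.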

Verifying performance

It seems fast enough, but I'd like to do a performance comparison here.
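
One simple way to get a frame rate for this kind of comparison is a rolling average over the most recent frames. The sketch below is an assumption about how this could be measured, not the method used in this article; the class name FpsMeter and the window size of 30 frames are hypothetical.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frame-rate estimate over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        # Keep only the most recent `window` timestamps.
        self._stamps = deque(maxlen=window)
        self._clock = clock

    def tick(self):
        """Record one processed frame and return the current FPS estimate."""
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0  # not enough data yet
        elapsed = self._stamps[-1] - self._stamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed
```

Calling tick() once per frame, after inference and drawing have finished, gives a number that includes the whole per-frame cost rather than the network's forward pass alone.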

When we ran the VGGNet-based SSD, another real-time object detection system, with a webcam in the same way, we found that, contrary to expectations, the frame rate of PeleeNet was only about half that of the VGGNet-based SSD.
However, it would be premature to conclude from this alone that PeleeNet performs worse.
Let's look at the CPU and GPU utilization of the Jetson TX2 while running object detection with VGGNet and with PeleeNet.

First, from VGGNet.

nvidia@tegra-ubuntu:~$ sudo ./tegrastats
RAM 2312/7851MB (lfb 1125x4MB) cpu [0%@2013,0%@2035,0%@2034,0%@2013,0%@2012,0%@2015] EMC 14%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [34%@2000,6%@2035,0%@2034,7%@2015,10%@2051,9%@2043] EMC 30%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [19%@1996,0%@2035,0%@2034,14%@1996,25%@1995,19%@1997] EMC 38%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [19%@2036,0%@2034,0%@2036,5%@2035,29%@2034,20%@2035] EMC 40%@1866 APE 150 GR3D 9%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [23%@2034,0%@2035,0%@2035,11%@2036,9%@2035,33%@2035] EMC 42%@1866 APE 150 GR3D 98%@1300
... (omitted) ...
RAM 2323/7851MB (lfb 1125x4MB) cpu [18%@1996,0%@2035,0%@2036,31%@1996,13%@1997,10%@1997] EMC 43%@1866 APE 150 GR3D 0%@1300
RAM 2323/7851MB (lfb 1125x4MB) cpu [22%@2035,0%@2035,0%@2035,27%@2035,16%@2035,8%@2034] EMC 44%@1866 APE 150 GR3D 0%@13000
RAM 2323/7851MB (lfb 1125x4MB) cpu [20%@2019,0%@2035,0%@2036,21%@2010,20%@2020,18%@2016] EMC 43%@1866 APE 150 GR3D 78%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [13%@2035,0%@2035,0%@2035,22%@2034,15%@2035,20%@2035] EMC 43%@1866 APE 150 GR3D 99%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [10%@2034,0%@2034,0%@2036,20%@2036,21%@2033,20%@2034] EMC 44%@1866 APE 150 GR3D 99%@1300
RAM 2324/7851MB (lfb 1125x4MB) cpu [31%@1996,0%@2034,0%@2035,28%@1996,3%@1996,13%@1996] EMC 44%@1866 APE 150 GR3D 99%@1300

Next is PeleeNet.

nvidia@tegra-ubuntu:~$ sudo ./tegrastats
[sudo] password for nvidia:
RAM 2218/7851MB (lfb 1137x4MB) cpu [0%@2021,0%@2036,0%@2035,0%@2015,0%@2014,0%@2009] EMC 15%@1866 APE 150 GR3D 98%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [2%@2035,0%@2034,97%@2035,10%@2034,13%@2035,11%@2035] EMC 16%@1866 APE 150 GR3D 0%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [6%@2035,0%@2035,96%@2036,1%@2035,26%@2035,8%@2035] EMC 16%@1866 APE 150 GR3D 92%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [3%@2034,0%@2035,98%@2036,1%@2035,21%@2035,10%@2035] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [5%@1997,0%@2035,98%@2035,8%@1996,10%@1996,22%@1997] EMC 16%@1866 APE 150 GR3D 61%@1300
RAM 2218/7851MB (lfb 1137x4MB) cpu [2%@1998,0%@2035,98%@2033,10%@2007,14%@2012,11%@2012] EMC 16%@1866 APE 150 GR3D 0%@1300
... (omitted) ...
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@2001,97%@2035,0%@2036,8%@2007,5%@2009,19%@2010] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [5%@1997,97%@2035,0%@2034,14%@2010,6%@2008,13%@2010] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2220/7851MB (lfb 1137x4MB) cpu [7%@2012,97%@2035,0%@2035,12%@2012,10%@2016,10%@2027] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [3%@2018,97%@2035,0%@2035,17%@2010,14%@2008,12%@2015] EMC 15%@1866 APE 150 GR3D 97%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [7%@2001,97%@2034,0%@2034,5%@2012,13%@2012,12%@2014] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@2000,98%@2035,0%@2035,8%@2014,2%@2012,21%@2017] EMC 15%@1866 APE 150 GR3D 0%@1300
RAM 2219/7851MB (lfb 1137x4MB) cpu [6%@1998,98%@2034,0%@2034,9%@2011,1%@2012,21%@2011] EMC 15%@1866 APE 150 GR3D 0%@1300

Comparing utilization during real-time inference with a webcam, we can see that GPU utilization differs significantly between VGGNet and PeleeNet.
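
Reading these numbers by eye gets tedious, so a small parser can summarize them. The sketch below is only an illustration and assumes each sample line has the same shape as the tegrastats output shown in this article (a "cpu [..%@..,...]" list and a "GR3D ..%" field); other tegrastats versions may print different fields. The function names are hypothetical.

```python
import re

# Patterns matched against the tegrastats line format shown in this article.
CPU_RE = re.compile(r"cpu \[([^\]]*)\]")
GPU_RE = re.compile(r"GR3D (\d+)%")


def parse_tegrastats_line(line):
    """Return (per_core_cpu_percent, gpu_percent), or None for non-sample lines."""
    cpu_match = CPU_RE.search(line)
    gpu_match = GPU_RE.search(line)
    if not cpu_match or not gpu_match:
        return None
    # Each field looks like "97%@2035": utilization percent, then clock in MHz.
    cores = [int(field.split("%")[0]) for field in cpu_match.group(1).split(",")]
    return cores, int(gpu_match.group(1))


def summarize(lines):
    """Average per-core CPU and GPU utilization over all sample lines."""
    samples = [s for s in map(parse_tegrastats_line, lines) if s is not None]
    if not samples:
        return None
    count = len(samples)
    num_cores = len(samples[0][0])
    mean_cpu = [sum(s[0][i] for s in samples) / count for i in range(num_cores)]
    mean_gpu = sum(s[1] for s in samples) / count
    return {"samples": count, "mean_cpu_per_core": mean_cpu, "mean_gpu": mean_gpu}
```

For example, after saving the tegrastats output to a file (the name tegrastats.log is just a placeholder), summarize(open("tegrastats.log")) returns the averages for that run, which makes a single saturated CPU core or an underused GPU easy to spot.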

Discussion

With VGGNet, the GPU is used at close to 100%, whereas with PeleeNet the utilization of one specific CPU core rises to nearly 100% and GPU utilization often fails to rise.

From this, we conclude that PeleeNet has some kind of bottleneck in its CPU-side processing and therefore cannot draw out the full performance of the Jetson TX2.
Unfortunately, this means that running real-time inference with Pelee at high speed requires eliminating this bottleneck, not just running what is in the repository.
I would really like to track down the bottleneck, but I think it is best to stop digging here and wait for further updates from the authors of the paper.

Summary

Continuing from the first installment, we implemented Pelee. In the first installment we confirmed that very high-speed inference is possible, but this time we discovered a bottleneck when running inference in real time.

"Even when the code is available, implementing a system from a paper can suddenly become extremely difficult if the environment is different, so be careful."
I would like to conclude the Pelee implementation articles with that final piece of advice. Thank you for reading.

As a slightly more serious disclaimer, please note that Macnica does not take any responsibility for the use of the above content.