
Who this article is for

Engineers who want to try out a state-of-the-art object detection system

How long it takes to read this article

About 10 minutes (about 1 hour if you follow along by hand)

Introduction

Hello! This is Tsuchiya from Macnica AI Research Center!

To those of you struggling through the pollen season: today I'm posting about how to implement a state-of-the-art object detection algorithm.

I've been blogging about cutting-edge theory lately, and I kept thinking, "Theory alone isn't very interesting, so I really should show an implementation...
but I can't write about customer cases here, so what should I do?
I know: let's run Pelee, a paper accepted at NeurIPS 2018♪" So that's what I tried!

"The code has just been published on GitHub, so let's go with this! It's a paper accepted at NeurIPS 2018, a top AI conference, and it would be cool to get it running!" Approaching it that way hides a very, very dangerous pitfall.

The pitfall is that the effort required for the implementation cannot be estimated up front, and in the worst case it can balloon without limit.

With a paper from, say, three years ago, if you hit something that doesn't work, the solution is usually already somewhere on the internet. Pelee, however, was released only about six months ago,
so when an error occurs it can be quite hard to debug.
So this time I tearfully asked Mr. Yamaoka of the Macnica AI Research Center, who always sits right behind me, to do the implementation.

Mr. Yamaoka is a hardcore engineer whose career has run from software to embedded to machine learning. I was sure he would get it working in no time, but contrary to my expectations it did not go smoothly at all. (laughs)

In this post, Mr. Yamaoka, our hardcore engineer, reveals the implementation procedure he arrived at through repeated trial and error!

Mr. Yamaoka said, "The implementation was very hard, so for people who, like me, need to run Pelee on an edge device, I think it's worth writing this effort up as a blog post," and that is how this post came about. Thank you, Mr. Yamaoka.

We are probably the only company explaining Pelee, a real-time object detection system that embodies cutting-edge AI technology (← maybe a slight exaggeration?).
If you have a Jetson TX2, please give it a try!
If you don't have one, the video posted in the second installment should still be fun to watch.

Because of the amount of material, I have split this project into two blog posts.

The first part covers everything up to running Pelee inference on a large set of images.
The second part connects Pelee to a web camera and performs real-time object detection.

Now let's talk about the implementation.

Table of contents

  • How to run object detection on webcam images with Pelee
  • Build the Caffe environment on the Jetson TX2
  • Prepare images for inference
  • SSD inference with VGGNet
  • Running Pelee on Jetson TX2
  • Summary

How to run object detection on webcam images with Pelee

Following the original paper, I actually performed real-time object detection with a web camera.
Since the original paper uses the Caffe environment, I decided to run it in Caffe on the Jetson TX2 as well.
Also, because Pelee's object detection is an extension of SSD, we first need to get SSD itself working.

Once object detection works with Pelee, the goal of this first part is achieved.
To get there, proceed with the setup by following README.md in the GitHub repository below.

https://github.com/Robert-JunWang/Pelee

The specific setup flow is as follows.

  1. Set up Caffe from https://github.com/weiliu89/caffe/tree/ssd and build an environment in which VGGNet's SSD operates.
  2. Download the PASCAL VOC 2007 and 2012 datasets.
  3. Generate an LMDB file using the downloaded dataset.
  4. Download the trained model and run inference.

Build the Caffe environment on the Jetson TX2

To build an environment where Pelee actually runs, we first need to install Caffe.
Before that, run the usual incantation to get full performance out of the Jetson TX2's CPU and GPU cores: jetson_clocks.sh pins the clocks at their maximum, and nvpmodel -m 0 selects the maximum-performance power mode.

nvidia@tegra-ubuntu:~$ sudo ./jetson_clocks.sh
[sudo] password for nvidia:
nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
nvidia@tegra-ubuntu:~$

In the Jetson TX2 home directory, clone the repository
https://github.com/weiliu89/caffe.git
and switch to the ssd branch.

git clone https://github.com/weiliu89/caffe.git
cd caffe/
git checkout ssd

Make the necessary settings based on the sample configuration.

cp Makefile.config.example Makefile.config
vi Makefile.config

Here, it is necessary to make a lot of corrections to Makefile.config.
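
As a rough guide, the kinds of edits typically needed on a Jetson TX2 look like the excerpt below. The exact lines depend on your JetPack/CUDA version, so treat this as a sketch rather than the exact file used here.

# Makefile.config (excerpt) -- typical edits for a Jetson TX2 build; adjust to your setup
USE_CUDNN := 1                                   # enable cuDNN acceleration
CUDA_DIR := /usr/local/cuda
# The TX2 GPU is compute capability 6.2; drop the old compute_20/21 entries (unsupported on CUDA 9)
CUDA_ARCH := -gencode arch=compute_53,code=sm_53 \
             -gencode arch=compute_62,code=sm_62
# Ubuntu 16.04 installs HDF5 under the "serial" subdirectories, so add them to the search paths
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib \
                /usr/lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu/hdf5/serial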

To speed up the build on the Jetson TX2's multi-core CPU, pass -j8 to make.

nvidia@tegra-ubuntu:~/caffe$ make -j8
PROTOC src/caffe/proto/caffe.proto
CXX src/caffe/layers/dummy_data_layer.cpp
CXX src/caffe/layers/elu_layer.cpp
CXX src/caffe/layer.cpp
CXX src/caffe/layers/data_layer.cpp
CXX src/caffe/layers/window_data_layer.cpp
CXX src/caffe/layers/threshold_layer.cpp
CXX src/caffe/layers/inner_product_layer.cpp
CXX src/caffe/layers/memory_data_layer.cpp
CXX src/caffe/layers/detection_evaluate_layer.cpp
CXX src/caffe/layers/argmax_layer.cpp
CXX src/caffe/layers/cudnn_tanh_layer.cpp
CXX src/caffe/layers/sigmoid_cross_entropy_loss_layer.cpp
CXX src/caffe/layers/lstm_layer.cpp
CXX src/caffe/layers/deconv_layer.cpp
CXX src/caffe/layers/split_layer.cpp
...(omitted)...
CXX/LD -o .build_release/tools/train_net.bin
CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
CXX/LD -o .build_release/tools/get_image_size.bin
CXX/LD -o .build_release/examples/cifar10/convert_cifar_data.bin
CXX/LD -o .build_release/examples/mnist/convert_mnist_data.bin
CXX/LD -o .build_release/examples/siamese/convert_mnist_siamese_data.bin
CXX/LD -o .build_release/examples/cpp_classification/classification.bin
CXX/LD -o .build_release/examples/ssd/ssd_detect.bin
nvidia@tegra-ubuntu:~/caffe$ make py
CXX/LD -o python/caffe/_caffe.so python/caffe/_caffe.cpp
touch python/caffe/proto/__init__.py
PROTOC (python) src/caffe/proto/caffe.proto
nvidia@tegra-ubuntu:~/caffe$

Next, build the test programs to verify the Caffe we just built.

nvidia@tegra-ubuntu:~/caffe$ make test -j8
CXX src/caffe/test/test_rnn_layer.cpp
CXX src/caffe/test/test_gradient_based_solver.cpp
CXX src/caffe/test/test_random_number_generator.cpp
CXX src/caffe/test/test_upgrade_proto.cpp
CXX src/caffe/test/test_infogain_loss_layer.cpp
CXX src/caffe/test/test_concat_layer.cpp
CXX src/caffe/test/test_spp_layer.cpp
CXX src/caffe/test/test_image_data_layer.cpp
CXX src/caffe/test/test_data_transformer.cpp
...(omitted)...
LD .build_release/src/caffe/test/test_split_layer.o
LD .build_release/src/caffe/test/test_tile_layer.o
LD .build_release/src/caffe/test/test_deconvolution_layer.o
LD .build_release/src/caffe/test/test_tanh_layer.o
LD .build_release/src/caffe/test/test_reduction_layer.o
LD .build_release/src/caffe/test/test_im2col_layer.o
LD .build_release/src/caffe/test/test_scale_layer.o
LD .build_release/src/caffe/test/test_batch_norm_layer.o
LD .build_release/src/caffe/test/test_bias_layer.o
LD .build_release/cuda/src/caffe/test/test_im2col_kernel.o
CXX/LD -o .build_release/test/test_all.testbin src/caffe/test/test_caffe_main.cpp
nvidia@tegra-ubuntu:~/caffe$ make runtest -j8
.build_release/tools/caffe
caffe: command line brew
usage: caffe <command> <args>
commands:
  train           train or finetune a model
  test            score a model
  device_query    show GPU diagnostic information
  time            benchmark model execution time
Flags from tools/caffe.cpp:
  -gpu (Optional; run in GPU mode on given device IDs separated by ','. Use '-gpu all' to run on all available GPUs. The effective training batch size is multiplied by the number of devices.) type: string default: ""
  -iterations (The number of iterations to run.) type: int32 default: 50
  -level (Optional; network level.) type: int32 default: 0
  -model (The model definition protocol buffer text file.) type: string default: ""
  -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.) type: string default: ""
  -sighup_effect (Optional; action to take when a SIGHUP signal is received: snapshot, stop or none.) type: string default: "snapshot"
  -sigint_effect (Optional; action to take when a SIGINT signal is received: snapshot, stop or none.) type: string default: "stop"
  -snapshot (Optional; the snapshot solver state to resume training.) type: string default: ""
  -solver (The solver definition protocol buffer text file.) type: string default: ""
  -stage (Optional; network stages (not to be confused with phase), separated by ','.) type: string default: ""
  -weights (Optional; the pretrained weights to initialize finetuning, separated by ','. Cannot be set simultaneously with snapshot.) type: string default: ""
*** Error in `.build_release/tools/caffe': free(): invalid pointer: 0x000000000043b110 ***
*** Aborted at 1549368479 (unix time) try "date -d @1549368479" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGABRT (@0x3e900006ec5) received by PID 28357 (TID 0x7f600c3000) from PID 28357; stack trace: ***
    @ 0x7f7a4384e0 ([vdso]+0x4df)
    @ 0x7f78f0e528 gsignal
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
nvidia@tegra-ubuntu:~/caffe$

That's an error.
If you get an error like this, install libtcmalloc-minimal4.

nvidia@tegra-ubuntu:~/caffe$ sudo apt-get install libtcmalloc-minimal4
[sudo] password for nvidia:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
apt-clone archdetect-deb dmeventd dmraid dpkg-repack gir1.2-timezonemap-1.0 gir1.2-xkl-1.0 gstreamer1.0-plugins-bad-videoparsers kpartx kpartx-boot libappstream3 libass5
libavresample-ffmpeg2 libbs2b0 libdebian-installer4 libdevmapper-event1.02.1 libdmraid1.0.0.rc16 libflite1 libgstreamer-plugins-bad1.0-0 liblockfile-bin liblockfile1 liblvm2app2.2
liblvm2cmd2.02 libmircommon5 libparted-fs-resize0 libpostproc-ffmpeg53 libreadline5 libsodium18 libtbb-dev libtbb2 libzmq5 lockfile-progs lvm2 os-prober pmount python3-icu python3-pam
rdate ubiquity-casper ubiquity-ubuntu-artwork
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
libtcmalloc-minimal4
0 upgraded, 1 newly installed, 0 to remove and 716 not upgraded.
Need to get 96.0 kB of archives.
After this operation, 383 kB of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports xenial-updates/main arm64 libtcmalloc-minimal4 arm64 2.4-0ubuntu5.16.04.1 [96.0 kB]
Fetched 96.0 kB in 1s (57.2 kB/s)
Selecting previously unselected package libtcmalloc-minimal4.
(Reading database ... 188854 files and directories currently installed.)
Preparing to unpack .../libtcmalloc-minimal4_2.4-0ubuntu5.16.04.1_arm64.deb ...
Unpacking libtcmalloc-minimal4 (2.4-0ubuntu5.16.04.1) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Setting up libtcmalloc-minimal4 (2.4-0ubuntu5.16.04.1) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
nvidia@tegra-ubuntu:~/caffe$

I ran runtest again, but one test still failed with the following error in my environment. It looks like an issue in the test code itself, so I simply moved on to the next step.

nvidia@tegra-ubuntu:~/caffe$ make runtest -j8
.build_release/tools/caffe
caffe: command line brew
usage: caffe <command> <args>
...(same caffe usage and flags output as above, omitted)...
.build_release/test/test_all.testbin 0 --gtest_shuffle
Cuda number of devices: 1
Setting to use device 0
Current device id: 0
Current device name: NVIDIA Tegra X2
Note: Randomizing tests' orders with a seed of 10715 .
[==========] Running 2361 tests from 309 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from RNNLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] RNNLayerTest/0.TestForward
[       OK ] RNNLayerTest/0.TestForward (2972 ms)
[ RUN      ] RNNLayerTest/0.TestGradientNonZeroContBufferSize2
[       OK ] RNNLayerTest/0.TestGradientNonZeroContBufferSize2 (319 ms)
...(omitted)...
[ RUN      ] CPUBBoxUtilTest.TestBBoxSize
[       OK ] CPUBBoxUtilTest.TestBBoxSize (0 ms)
[ RUN      ] CPUBBoxUtilTest.TestGetPriorBBoxes
[       OK ] CPUBBoxUtilTest.TestGetPriorBBoxes (0 ms)
[ RUN      ] CPUBBoxUtilTest.TestMatchBBoxLableOneBipartite
[       OK ] CPUBBoxUtilTest.TestMatchBBoxLableOneBipartite (0 ms)
[ RUN      ] CPUBBoxUtilTest.TestDecodeBBoxesCenterSize
[       OK ] CPUBBoxUtilTest.TestDecodeBBoxesCenterSize (0 ms)
[ RUN      ] CPUBBoxUtilTest.TestOutputBBox
F0205 21:28:29.267797 28997 test_bbox_util.cpp:279] Check failed: out_bbox.xmax() == 50. (50 vs. 50)
*** Check failure stack trace: ***
    @ 0x7fa24c9718  google::LogMessage::Fail()
    @ 0x7fa24cb614  google::LogMessage::SendToLog()
    @ 0x7fa24c9290  google::LogMessage::Flush()
    @ 0x7fa24cbeb4  google::LogMessageFatal::~LogMessageFatal()
    @ 0x965af8  caffe::CPUBBoxUtilTest_TestOutputBBox_Test::TestBody()
    @ 0xa55244  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0xa4d4ec  testing::Test::Run()
    @ 0xa4d628  testing::TestInfo::Run()
    @ 0xa4d6e8  testing::TestCase::Run()
    @ 0xa4e848  testing::internal::UnitTestImpl::RunAllTests()
    @ 0xa4eb5c  testing::UnitTest::Run()
    @ 0x53b4e8  main
    @ 0x7fa04f58a0  __libc_start_main
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
nvidia@tegra-ubuntu:~/caffe$

There is still an error here, but SSD itself runs fine in the steps that follow.

Prepare images for inference

Download and unzip the VOC 2007 and VOC 2012 datasets.

The VOC 2012 dataset, for example, is available at
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
Download all three archives with wget:

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

Let's take a look inside.

nvidia@tegra-ubuntu:~/data$ ls -l
total 2842496
-rw-rw-r-- 1 nvidia nvidia 451020800 Jan 1 22:27 VOCtest_06-Nov-2007.tar
-rw-rw-r-- 1 nvidia nvidia 460032000 Jan 1 22:27 VOCtrainval_06-Nov-2007.tar
-rw-rw-r-- 1 nvidia nvidia 1999639040 Jan 30 12:24 VOCtrainval_11-May-2012.tar
nvidia@tegra-ubuntu:~/data$
nvidia@tegra-ubuntu:~/data$
nvidia@tegra-ubuntu:~/data$ tar -xvf VOCtest_06-Nov-2007.tar
VOCdevkit/
VOCdevkit/VOC2007/
VOCdevkit/VOC2007/Annotations/
VOCdevkit/VOC2007/Annotations/000001.xml
VOCdevkit/VOC2007/Annotations/000002.xml
VOCdevkit/VOC2007/Annotations/000003.xml
...(omitted)...
VOCdevkit/VOC2007/SegmentationObject/009788.png
VOCdevkit/VOC2007/SegmentationObject/009817.png
VOCdevkit/VOC2007/SegmentationObject/009889.png
VOCdevkit/VOC2007/SegmentationObject/009899.png
VOCdevkit/VOC2007/SegmentationObject/009901.png
nvidia@tegra-ubuntu:~/data$
nvidia@tegra-ubuntu:~/data$

Extract VOCtrainval_06-Nov-2007.tar.

nvidia@tegra-ubuntu:~/data$ tar -xvf VOCtrainval_06-Nov-2007.tar
VOCdevkit/
VOCdevkit/VOC2007/
VOCdevkit/VOC2007/Annotations/
VOCdevkit/VOC2007/Annotations/000005.xml
VOCdevkit/VOC2007/Annotations/000007.xml
VOCdevkit/VOC2007/Annotations/000009.xml
VOCdevkit/VOC2007/Annotations/000012.xml
VOCdevkit/VOC2007/Annotations/000016.xml
VOCdevkit/VOC2007/Annotations/000017.xml
...(omitted)...
VOCdevkit/VOC2007/SegmentationObject/009817.png
VOCdevkit/VOC2007/SegmentationObject/009889.png
VOCdevkit/VOC2007/SegmentationObject/009899.png
VOCdevkit/VOC2007/SegmentationObject/009901.png
nvidia@tegra-ubuntu:~/data$

Next, extract VOCtrainval_11-May-2012.tar.

nvidia@tegra-ubuntu:~/data$ tar -xvf VOCtrainval_11-May-2012.tar
VOCdevkit/
VOCdevkit/VOC2012/
VOCdevkit/VOC2012/Annotations/
VOCdevkit/VOC2012/Annotations/2007_000027.xml
VOCdevkit/VOC2012/Annotations/2007_000032.xml
...(omitted)...
VOCdevkit/VOC2012/SegmentationObject/2011_003240.png
VOCdevkit/VOC2012/SegmentationObject/2011_003246.png
VOCdevkit/VOC2012/SegmentationObject/2011_003255.png
VOCdevkit/VOC2012/SegmentationObject/2011_003256.png
VOCdevkit/VOC2012/SegmentationObject/2011_003271.png
nvidia@tegra-ubuntu:~/data$

Now execute create_list.sh and create_data.sh to generate the LMDB files used for training and inference.

nvidia@tegra-ubuntu:~/caffe$ ./data/VOC0712/create_list.sh
Create list for VOC2007 trainval...
Create list for VOC2012 trainval...
Create list for VOC2007 test...
I0130 19:55:25.554071 9533 get_image_size.cpp:61] A total of 4952 images.
I0130 19:55:28.213536 9533 get_image_size.cpp:100] Processed 1000 files.
I0130 19:55:30.687409 9533 get_image_size.cpp:100] Processed 2000 files.
I0130 19:55:33.161638 9533 get_image_size.cpp:100] Processed 3000 files.
I0130 19:55:35.535889 9533 get_image_size.cpp:100] Processed 4000 files.
I0130 19:55:37.793275 9533 get_image_size.cpp:105] Processed 4952 files.
nvidia@tegra-ubuntu:~/caffe$
nvidia@tegra-ubuntu:~/caffe$ ./data/VOC0712/create_data.sh
/home/nvidia/caffe/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/home/nvidia/caffe/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/nvidia/data/VOCdevkit/ /home/nvidia/caffe/data/VOC0712/../../data/VOC0712/test.txt /home/nvidia/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
I0130 19:55:49.333920 9554 convert_annoset.cpp:122] A total of 4952 images.
I0130 19:55:49.335022 9554 db_lmdb.cpp:35] Opened lmdb /home/nvidia/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
I0130 19:55:55.834028 9554 convert_annoset.cpp:195] Processed 1000 files.
I0130 19:56:01.903102 9554 convert_annoset.cpp:195] Processed 2000 files.
I0130 19:56:08.014868 9554 convert_annoset.cpp:195] Processed 3000 files.
I0130 19:56:14.119681 9554 convert_annoset.cpp:195] Processed 4000 files.
I0130 19:56:19.831797 9554 convert_annoset.cpp:201] Processed 4952 files.
/home/nvidia/caffe/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/home/nvidia/caffe/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/nvidia/data/VOCdevkit/ /home/nvidia/caffe/data/VOC0712/../../data/VOC0712/trainval.txt /home/nvidia/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb
I0130 19:56:22.483315 9572 convert_annoset.cpp:122] A total of 16551 images.
I0130 19:56:22.484650 9572 db_lmdb.cpp:35] Opened lmdb /home/nvidia/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb
I0130 19:56:29.785678 9572 convert_annoset.cpp:195] Processed 1000 files.
I0130 19:56:36.772826 9572 convert_annoset.cpp:195] Processed 2000 files.
I0130 19:56:43.898202 9572 convert_annoset.cpp:195] Processed 3000 files.
I0130 19:56:50.940439 9572 convert_annoset.cpp:195] Processed 4000 files.
I0130 19:56:57.937795 9572 convert_annoset.cpp:195] Processed 5000 files.
I0130 19:57:04.921504 9572 convert_annoset.cpp:195] Processed 6000 files.
I0130 19:57:11.781653 9572 convert_annoset.cpp:195] Processed 7000 files.
I0130 19:57:18.983180 9572 convert_annoset.cpp:195] Processed 8000 files.
I0130 19:57:25.933579 9572 convert_annoset.cpp:195] Processed 9000 files.
I0130 19:57:32.910079 9572 convert_annoset.cpp:195] Processed 10000 files.
I0130 19:57:39.977454 9572 convert_annoset.cpp:195] Processed 11000 files.
I0130 19:57:46.941737 9572 convert_annoset.cpp:195] Processed 12000 files.
I0130 19:57:53.976258 9572 convert_annoset.cpp:195] Processed 13000 files.
I0130 19:58:00.762612 9572 convert_annoset.cpp:195] Processed 14000 files.
I0130 19:58:07.732848 9572 convert_annoset.cpp:195] Processed 15000 files.
I0130 19:58:14.741941 9572 convert_annoset.cpp:195] Processed 16000 files.
I0130 19:58:18.569044 9572 convert_annoset.cpp:201] Processed 16551 files.
nvidia@tegra-ubuntu:~/caffe$

For now, the dataset preparation is complete.

SSD inference with VGGNet

We run inference on the Jetson TX2 using the distributed pretrained model models_VGGNet_VOC0712_SSD_300x300.tar.gz.

Extracting models_VGGNet_VOC0712_SSD_300x300.tar.gz produces the model files under models/VGGNet/VOC0712. Copy this directory structure, keeping the hierarchy, into the models directory directly under the Caffe tree.
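
For example, a sketch assuming the archive was downloaded to the home directory (adjust the paths to wherever you actually saved it):

cd ~
# extract the distributed archive; it contains models/VGGNet/VOC0712/...
tar -xzf models_VGGNet_VOC0712_SSD_300x300.tar.gz
# copy the extracted hierarchy into the Caffe tree so the scripts can find it
cp -r models/VGGNet ~/caffe/models/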

To run the inference, execute the script score_ssd_pascal.py under examples/ssd. It prints progress to the console while it runs, so it is convenient to redirect the output to a file and keep it as a log.

Run python examples/ssd/score_ssd_pascal.py.

nvidia@tegra-ubuntu:~/caffe$ python examples/ssd/score_ssd_pascal.py > run_score_ssd_pascal2.log

Then display the log.

nvidia@tegra-ubuntu:~/caffe$ cat run_score_ssd_pascal2.log
I0211 16:47:11.902698 10532 caffe.cpp:217] Using GPUs 0
I0211 16:47:11.910640 10532 caffe.cpp:222] GPU 0: NVIDIA Tegra X2
I0211 16:47:12.520488 10532 solver.cpp:63] Initializing solver from parameters:
train_net: "models/VGGNet/VOC0712/SSD_300x300_score/train.prototxt"
test_net: "models/VGGNet/VOC0712/SSD_300x300_score/test.prototxt"
test_iter: 619
test_interval: 10000
base_lr: 0.001
display: 10
max_iter: 0
lr_policy: "multistep"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 0
snapshot_prefix: "models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300"
solver_mode: GPU
device_id: 0
debug_info: false
train_state {
  level: 0
  stage: ""
...(omitted)...
120000.caffemodel
I0211 16:47:14.007650 10532 net.cpp:761] Ignoring source layer mbox_loss
I0211 16:47:14.016592 10532 caffe.cpp:251] Starting Optimization
I0211 16:47:14.016697 10532 solver.cpp:294] Solving VGG_VOC0712_SSD_300x300_train
I0211 16:47:14.016718 10532 solver.cpp:295] Learning Rate Policy: multistep
I0211 16:47:14.471560 10532 solver.cpp:332] Iteration 0, loss = 1.36019
I0211 16:47:14.471647 10532 solver.cpp:433] Iteration 0, Testing net (#0)
I0211 16:47:14.484737 10532 net.cpp:693] Ignoring source layer mbox_loss
I0211 17:02:26.484411 10532 solver.cpp:546] Test net output #0: detection_eval = 0.776861
I0211 17:02:26.486299 10532 solver.cpp:337] Optimization Done.
I0211 17:02:26.486354 10532 caffe.cpp:254] Optimization Done.

The run took about 15 minutes and 15 seconds, which works out to roughly 23 fps.

In detail:
Number of images evaluated: 4952 + 16551
Processing time: 915 seconds
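
In other words, (4952 + 16551) images ÷ 915 seconds ≈ 23.5 images per second, which is where the rough figure of 23 fps above comes from.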

That's pretty fast, isn't it?

Running Pelee on Jetson TX2

With all of the above in place, we can finally get Pelee running.
Clone the Pelee repository into your home directory and create a symbolic link to it under Caffe's examples directory.

git clone https://github.com/Robert-JunWang/Pelee.git
ln -sf `pwd`/Pelee ~/caffe/examples/pelee

Let's take a look at the files under examples.

nvidia@tegra-ubuntu:~$ cd caffe/
nvidia@tegra-ubuntu:~/caffe$ cd examples/
nvidia@tegra-ubuntu:~/caffe/examples$ ls -l
total 9144
-rw-rw-r-- 1 nvidia nvidia 813348 Feb 5 19:48 00-classification.ipynb
-rw-rw-r-- 1 nvidia nvidia 376291 Feb 5 19:48 01-learning-lenet.ipynb
-rw-rw-r-- 1 nvidia nvidia 480501 Feb 5 19:49 02-fine-tuning.ipynb
-rw-rw-r-- 1 nvidia nvidia 452886 Feb 5 19:49 brewing-logreg.ipynb
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 cifar10
-rw-rw-r-- 1 nvidia nvidia 1063 Feb 5 19:49 CMakeLists.txt
-rw-rw-r-- 1 nvidia nvidia 1730512 Feb 5 19:49 convert_model.ipynb
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 cpp_classification
-rw-rw-r-- 1 nvidia nvidia 702461 Feb 5 19:48 detection.ipynb
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 feature_extraction
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 finetune_flickr_style
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 finetune_pascal_detection
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 hdf5_classification
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 imagenet
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 images
-rw-rw-r-- 1 nvidia nvidia 898446 Feb 5 19:49 inceptionv3.ipynb
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 mnist
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 net_surgery
-rw-rw-r-- 1 nvidia nvidia 583251 Feb 5 19:48 net_surgery.ipynb
-rw-rw-r-- 1 nvidia nvidia 1539559 Feb 5 19:48 pascal-multilabel-with-datalayer.ipynb
lrwxrwxrwx 1 nvidia nvidia 18 Feb 11 15:02 pelee -> /home/nvidia/Pelee ★
drwxrwxr-x 3 nvidia nvidia 4096 Feb 5 19:49 pycaffe
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:48 siamese
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 ssd
-rw-rw-r-- 1 nvidia nvidia 805121 Feb 5 19:49 ssd_detect.ipynb
-rw-rw-r-- 1 nvidia nvidia 892397 Feb 5 19:49 ssd.ipynb
drwxrwxr-x 2 nvidia nvidia 4096 Feb 5 19:49 videos
drwxrwxr-x 2 nvidia nvidia 4096 Feb 11 10:57 VOC0712
drwxrwxr-x 3 nvidia nvidia 4096 Feb 5 19:48 web_demo
nvidia@tegra-ubuntu:~/caffe/examples$

Next, check the contents of Pelee's directory.

nvidia@tegra-ubuntu:~/caffe/examples$ cd pelee
nvidia@tegra-ubuntu:~/caffe/examples/pelee$
nvidia@tegra-ubuntu:~/caffe/examples/pelee$ ls -l
total 696
-rw-rw-r-- 1 nvidia nvidia 583997 Feb 11 15:01 detect_eval.ipynb
-rw-rw-r-- 1 nvidia nvidia 18391 Feb 11 15:01 eval_voc.py
-rw-rw-r-- 1 nvidia nvidia 17490 Feb 11 15:01 feature_extractor.py
-rw-rw-r-- 1 nvidia nvidia 5221 Feb 11 15:01 layer_utils.py
-rw-rw-r-- 1 nvidia nvidia 11357 Feb 11 15:01 LICENSE
drwxrwxr-x 3 nvidia nvidia 4096 Feb 11 15:01 model
-rw-rw-r-- 1 nvidia nvidia 5831 Feb 11 15:01 peleenet.py
-rw-rw-r-- 1 nvidia nvidia 3545 Feb 11 15:01 README.md
drwxrwxr-x 2 nvidia nvidia 4096 Feb 11 15:01 samples
drwxrwxr-x 2 nvidia nvidia 4096 Feb 11 15:01 tools
-rw-rw-r-- 1 nvidia nvidia 19988 Feb 11 15:01 train_coco.py
-rw-rw-r-- 1 nvidia nvidia 19779 Feb 11 15:01 train_voc.py
nvidia@tegra-ubuntu:~/caffe/examples/pelee$

Download the trained models used for the Pelee evaluation and place them under caffe/models.
The items marked with ★ below are the ones used this time.

nvidia@tegra-ubuntu:~/caffe/models$ ls -l
total 11088
drwxrwxr-x 2 nvidia nvidia 4096 Feb  5 19:49 bvlc_alexnet
drwxrwxr-x 2 nvidia nvidia 4096 Feb  5 19:49 bvlc_googlenet
drwxrwxr-x 2 nvidia nvidia 4096 Feb  5 19:49 bvlc_reference_caffenet
drwxrwxr-x 2 nvidia nvidia 4096 Feb  5 19:49 bvlc_reference_rcnn_ilsvrc13
drwxrwxr-x 2 nvidia nvidia 4096 Feb  5 19:48 finetune_flickr_style
drwxrwxr-x 3 nvidia nvidia 4096 Feb 11 15:22 pelee ★
drwxrwxr-x 2 nvidia nvidia 4096 Feb 12 20:57 pelee_coco ★
drwxrwxr-x 2 nvidia nvidia 4096 Feb 12 20:57 pelee_coco_voc ★
drwxrwxr-x 2 nvidia nvidia 4096 Feb 12 20:58 pelee_voc ★
drwxrwxr-x 3 nvidia nvidia 4096 Feb 11 13:31 VGGNet
drwxr-xr-x 2 nvidia nvidia 4096 Feb 12 20:58 voc
nvidia@tegra-ubuntu:~/caffe/models$
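
For the PASCAL VOC model used below, placing the downloaded weights could look like the following sketch (the actual download links are in the Pelee repository's README; the target path simply matches the --weights option used in the next step):

mkdir -p ~/caffe/models/pelee_voc
# assuming pelee_304x304_acc7094.caffemodel was downloaded to the home directory
mv ~/pelee_304x304_acc7094.caffemodel ~/caffe/models/pelee_voc/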

I specified PASCAL VOC 07+12 from the distributed models and ran it.

nvidia@tegra-ubuntu:~/caffe$ python examples/pelee/eval_voc.py --weights=models/pelee_voc/pelee_304x304_acc7094.caffemodel
args: Namespace(arch='pelee', batch_size=8, image_size=304, kernel_size=1, lr=0.005, posfix='', run_soon=True, step_value=[80000, 100000, 120000], weight_decay=0.0005, weights='models/pelee_voc/pelee_304x304_acc7094.caffemodel')
I0212 21:24:18.716425 1733 caffe.cpp:217] Using GPUs 0
I0212 21:24:18.726912 1733 caffe.cpp:222] GPU 0: NVIDIA Tegra X2
I0212 21:24:19.560145 1733 solver.cpp:63] Initializing solver from parameters:
train_net: "models/pelee/VOC0712/SSD_304x304_score/train.prototxt"
test_net: "models/pelee/VOC0712/SSD_304x304_score/test.prototxt"
test_iter: 619
test_interval: 2000
base_lr: 0.005
display: 10
max_iter: 0
lr_policy: "multistep"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 0
snapshot_prefix: "models/pelee/VOC0712/SSD_304x304/pelee_SSD_304x304"
solver_mode: GPU
device_id: 0
debug_info: false
train_state {
  level: 0
  stage: ""
}
snapshot_after_train: false
test_initialization: true
average_loss: 10
...(omitted)...
I0212 21:24:34.341764 1733 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
I0212 21:24:34.349417 1733 net.cpp:761] Ignoring source layer mbox_loss
I0212 21:24:34.353690 1733 caffe.cpp:251] Starting Optimization
I0212 21:24:34.353741 1733 solver.cpp:294] Solving pelee_SSD_304x304_train
I0212 21:24:34.353754 1733 solver.cpp:295] Learning Rate Policy: multistep
I0212 21:24:40.287559 1733 solver.cpp:332] Iteration 0, loss = 1.68553
I0212 21:24:40.287647 1733 solver.cpp:433] Iteration 0, Testing net (#0)
I0212 21:24:40.326001 1733 net.cpp:693] Ignoring source layer mbox_loss
I0212 21:28:22.629385 1733 solver.cpp:546] Test net output #0: detection_eval = 0.70909
I0212 21:28:22.633003 1733 solver.cpp:337] Optimization Done.
I0212 21:28:22.633077 1733 caffe.cpp:254] Optimization Done.
nvidia@tegra-ubuntu:~/caffe$

This run completed in 4 minutes and 4 seconds (244 seconds), which works out to about 88 fps.
That is not as fast as the VOC 2007 result documented at https://github.com/Robert-JunWang/Pelee, but it is still quite fast.
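By the same arithmetic as before: (4952 + 16551) images ÷ 244 seconds ≈ 88 images per second.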

In any case, compared with the SSD's 915 seconds, 244 seconds is roughly a quarter of the time.
That is a valuable data point.

Summary

So far, we have managed to get object detection with Pelee up and running.
You can see that the processing is quite fast.

In the next post, we will continue from here and run real-time object detection by connecting a web camera.

On a slightly more serious note, please understand that Macnica assumes no responsibility for any use of the content above.