I tried running YOLOv5 on a Qualcomm Snapdragon

Try running the object detection algorithm YOLOv5 on an edge device

Let's run YOLOv5, an object detection algorithm, on the TurboX™ C610 Open Kit, which is equipped with the Qualcomm® QCS610 SoC. YOLO (You Only Look Once) is one of the best-known object detection algorithms thanks to its speed and accuracy, and YOLOv5 is a PyTorch-based object detection model in that family.

In the procedure introduced here, the qtimlesnpe GStreamer plug-in provided by Qualcomm is used to run AI inference on the hardware accelerator in the SoC (SNPE: Snapdragon Neural Processing Engine).

Related link: Qualcomm® QCS610 SoC

TurboX™ C610 Open Kit


Equipment used:

・Linux PC (Ubuntu 18.04)

・TurboX™ C610 Open Kit (LCD, Camera)

・USB cable (Type-A to Type-C)

Advance preparation

For the Linux PC environment, it is assumed that the following setup has been completed in advance; this article covers the steps that follow.

・Install Ubuntu 18.04 (Reference link: Install Ubuntu using WSL2)

・Install Miniconda3 (Reference link: Install Miniconda3 on Ubuntu)

・Download and build LE SDK for TurboX™ C610 Open Kit, write image to TurboX™ C610 Open Kit

For the LE SDK version, LE1.0.CS.r002007.1 is used in this procedure.

・Download Qualcomm Neural Processing SDK for AI (aka SNPE SDK)

For the SNPE SDK version, snpe-1.58.0_3160 is used in this procedure.

Environment setup

Export trained model to ONNX

Follow the steps below on your Linux PC to create a virtual environment with conda, install the necessary packages, and export the trained YOLOv5 PyTorch model to ONNX.

hostPC$ conda create --name snpe_yolov5 python=3.6.8
hostPC$ conda activate snpe_yolov5
hostPC$ pip install onnx==1.6.0
hostPC$ pip install onnx-simplifier==0.2.6
hostPC$ pip install onnxoptimizer==0.2.6
hostPC$ pip install onnxruntime==1.1.0
hostPC$ pip install numpy==1.16.5
hostPC$ pip install protobuf==3.17.3
hostPC$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cpuonly -c pytorch
hostPC$ pip install torch==1.10.0
hostPC$ pip install torchvision==0.11.1
hostPC$ mkdir snpe_yolov5
hostPC$ cd snpe_yolov5
hostPC$ git clone https://github.com/ultralytics/yolov5.git
hostPC$ cd yolov5
hostPC$ git checkout v6.0
hostPC$ pip install -r requirements.txt
hostPC$ pip install "coremltools>=4.1" onnx==1.6.0 scikit-learn==0.19.2
hostPC$ python ../yolov5/export.py --weights ../yolov5/yolov5n.pt --optimize --opset 11 --simplify
hostPC$ cd ../

Conversion from ONNX to DLC format

Follow the steps below to set up SNPE SDK on your Linux PC and convert from ONNX to DLC format.

hostPC$ unzip snpe-1.58.0_3160.zip
hostPC$ source snpe-1.58.0.3160/bin/dependencies.sh
hostPC$ source snpe-1.58.0.3160/bin/check_python_depends.sh
hostPC$ export ANDROID_NDK_ROOT=/<PATH where the Android NDK was extracted>/android-ndk-r17c/
hostPC$ export ONNX_DIR=/<Miniconda3 install PATH>/miniconda3/envs/snpe_yolov5/lib/python3.6/site-packages/onnx/
hostPC$ cd snpe-1.58.0.3160/
hostPC$ source bin/envsetup.sh -o $ONNX_DIR
hostPC$ cd ../yolov5
hostPC$ snpe-onnx-to-dlc -i ../yolov5/yolov5n.onnx --out_node 326 --out_node 379 --out_node 432

<Supplement>

・DLC (Deep Learning Container): A file format for use with Snapdragon NPE (Neural Processing Engine) Runtime.

・Android NDK download link: android-ndk-r17c. The PATH is only needed for the SNPE SDK setup;
the Android NDK itself is not used in this procedure.

・Since SNPE does not currently support 5D operators, the output nodes just before the 5D Reshape
are specified when converting from ONNX to DLC format. As shown in the figure below, for the yolov5n.onnx used in this procedure,
these were 326 (Conv_198), 379 (Conv_232), and 432 (Conv_266). The output nodes can be confirmed with Netron, a neural network model visualization tool.

Fig. Confirmation result with Netron


Once the conversion to DLC format is complete, exit the running conda virtual environment.

hostPC$ conda deactivate

Enabling qtimlesnpe and adding Post-Processing for YOLO

Using LE SDK for TurboX™ C610 Open Kit, enable qtimlesnpe (GStreamer plugin) and add Post-Processing for YOLO.

hostPC$ cd /<LE SDK download PATH>/apps_proc/poky/meta-qti-bsp/conf/distro/
hostPC$ vi qti-distro-fullstack-virtualization-debug.conf
Add the following line to the end of qti-distro-fullstack-virtualization-debug.conf and save the file: DISTRO_FEATURES_append = " qti-snpe"
hostPC$ cd /<LE SDK download PATH>/apps_proc/poky/meta-qti-ml-prop/recipes/snpe-sdk/
hostPC$ mkdir files
hostPC$ cp snpe-1.58.0_3160.zip /<LE SDK download PATH>/apps_proc/poky/meta-qti-ml-prop/recipes/snpe-sdk/files/
hostPC$ cd /<LE SDK download PATH>/apps_proc/poky/meta-qti-ml-prop/recipes/snpe-sdk/files/
hostPC$ unzip snpe-1.58.0_3160.zip
hostPC$ mv snpe-1.58.0_3160 snpe
hostPC$ cd /<LE SDK download PATH>/apps_proc/poky/meta-qti-ml-prop/recipes/snpe-sdk/files/snpe/lib/aarch64-oe-linux-gcc8.2/
hostPC$ rm libatomic.so.1

The Post-Processing to be added is provided in the form of yolov5n.patch in this procedure.

Please contact us if you would like to obtain yolov5n.patch.

hostPC$ cp yolov5n.patch /<LE SDK download PATH>/apps_proc/src/vendor/qcom/opensource/gst-plugin-qti-oss/
hostPC$ cd /<LE SDK download PATH>/apps_proc/src/vendor/qcom/opensource/gst-plugin-qti-oss/
hostPC$ patch -p1 < yolov5n.patch
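The contents of yolov5n.patch are not reproduced here, but the decode that YOLO post-processing must apply to each raw detection-head output follows the standard YOLOv5 formulation. A minimal pure-Python sketch of the per-cell, per-anchor decode (the grid position, stride, and anchor sizes in the example call are illustrative, not values from the patch):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, cx, cy, stride, anchor_w, anchor_h):
    """Decode one anchor's raw (tx, ty, tw, th, t_obj) values at grid cell (cx, cy).

    YOLOv5 decode: box center = (2*sigmoid(t) - 0.5 + grid offset) * stride,
    box size = (2*sigmoid(t))**2 * anchor dimension, objectness = sigmoid(t_obj).
    """
    tx, ty, tw, th, t_obj = raw
    bx = (2.0 * sigmoid(tx) - 0.5 + cx) * stride
    by = (2.0 * sigmoid(ty) - 0.5 + cy) * stride
    bw = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    bh = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return bx, by, bw, bh, sigmoid(t_obj)


# Example: all-zero raw values at cell (3, 4) of a stride-8 head with a 10x13 anchor.
print(decode_cell((0.0, 0.0, 0.0, 0.0, 0.0), cx=3, cy=4, stride=8, anchor_w=10, anchor_h=13))
# -> (28.0, 36.0, 10.0, 13.0, 0.5)
```

After this decode, class scores are multiplied by objectness and the boxes are filtered with non-maximum suppression before being handed to the overlay.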

qtimlesnpe rebuild

Follow the steps below to rebuild qtimlesnpe (GStreamer plugin).

hostPC$ export SHELL=/bin/bash
hostPC$ cd /<LE SDK download PATH>/apps_proc/
hostPC$ MACHINE=qcs610-odk-64 DISTRO=qti-distro-fullstack-virtualization-debug
hostPC$ source poky/qti-conf/set_bb_env.sh
hostPC$ bitbake -c install gstreamer1.0-plugins-qti-oss-mle

Setting up the TurboX™ C610 Open Kit

Connect the PC and TurboX™ C610 Open Kit with a USB cable, and use adb to push the necessary files to the TurboX™ C610 Open Kit.


- Transfer of qtimlesnpe related library.

hostPC$ adb root
hostPC$ adb disable-verity
hostPC$ adb reboot
hostPC$ adb wait-for-device
hostPC$ adb root
hostPC$ adb remount
hostPC$ adb shell mount -o remount,rw /
hostPC$ cd /<LE SDK download PATH>/apps_proc/build-qti-distro-fullstack-virtualization-debug/tmp-glibc/work/aarch64-oe-linux/gstreamer1.0-plugins-qti-oss-mle/1.0-r0/build/mle_engine/
hostPC$ adb push libEngine_MLE.so /usr/lib/
hostPC$ cd /<LE SDK download PATH>/apps_proc/build-qti-distro-fullstack-virtualization-debug/tmp-glibc/work/aarch64-oe-linux/gstreamer1.0-plugins-qti-oss-mle/1.0-r0/build/mle_gst_snpe/
hostPC$ adb push libgstqtimlesnpe.so /usr/lib/gstreamer-1.0/

- Transfer of models, labels, test video files, and config files.

hostPC$ adb push yolov5n.dlc /data/misc/camera/
hostPC$ adb push coco_labels.txt /data/misc/camera/
hostPC$ adb push test.mp4 /data/misc/camera/
hostPC$ cd /<LE SDK download PATH>/apps_proc/src/vendor/qcom/opensource/gst-plugins-qti-oss/gst-plugin-mle/mle_gst_snpe/
hostPC$ adb push mle_snpeyolov5n.config /data/misc/camera/

<Supplement>

・Download link for label data: coco_labels.txt

・In mle_snpeyolov5n.config, modify the value of output_layers according to the model to be used.
In this procedure it was set to Conv_198 (326), Conv_232 (379), and Conv_266 (432).
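As an illustration only (the exact key names and layout of the shipped config file are not reproduced here, so treat this as an assumed format rather than a verbatim excerpt), the output_layers entry would name the same three Conv layers identified during the DLC conversion:

```
output_layers = "Conv_198, Conv_232, Conv_266"
```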

AI inference processing by object detection algorithm YOLOv5

Let's run YOLOv5 object detection on TurboX™ C610 Open Kit.
Here are some command examples for running GStreamer.

(1). Save the result of AI inference processing in mp4 file format.

(Open Kit)# gst-launch-1.0 filesrc location=/data/misc/camera/test.mp4 ! qtdemux name=demux demux. ! queue ! h264parse ! qtivdec ! video/x-raw\(memory:GBM\) ! qtimlesnpe config=/data/misc/camera/mle_snpeyolov5n.config postprocessing=yolov5detection ! qtioverlay bbox-color=0x00FFFFFF ! queue ! omxh264enc control-rate=max-bitrate target-bitrate=6000000 interval-intraframes=29 periodicity-idr=1 ! queue ! h264parse ! mp4mux ! queue ! filesink location=/data/misc/camera/output.mp4

When the process is completed, a video file called output.mp4 is created in the directory specified by filesink location. The label and bounding box of each detection result are drawn as shown in the video file below.


(2). AI inference processing is performed on the input video from the camera in real time, and the detection results are overlaid and output to the LCD.

(Open Kit)# export XDG_RUNTIME_DIR=/dev/socket/weston
(Open Kit)# mkdir -p $XDG_RUNTIME_DIR
(Open Kit)# chmod 0700 $XDG_RUNTIME_DIR
(Open Kit)# weston --tty=1 --idle-time=0 &
(Open Kit)# gst-launch-1.0 qtiqmmfsrc ! video/x-raw\(memory:GBM\), format=NV12, width=1280, height=720, framerate=30/1, camera=0 ! qtimlesnpe config=/data/misc/camera/mle_snpeyolov5n.config postprocessing=yolov5detection ! qtioverlay bbox-color=0x00FFFFFF ! qtivtransform rotate=1 ! waylandsink async=true fullscreen=true


This time, I tried running YOLOv5, an object detection algorithm, on TurboX™ C610 Open Kit equipped with Qualcomm® QCS610 SoC.

We would like to introduce various examples in the future.

When customers develop edge devices that combine a camera with AI inference processing, Qualcomm's Snapdragon platform is increasingly considered a leading candidate. Qualcomm offers a broad lineup of high-performance SoCs for IoT and embedded devices, so please contact us if you would like more information.

Inquiry / Quotation

For product inquiries and development kit estimates, please use the link below.

To Qualcomm manufacturer information Top