This article introduces our answers to a careful selection of the most frequently asked of the 71 questions we received during the online seminar "Let's create a lost property detection system that applies object detection - Algorithm selection, Jetson implementation, and deployment -" held on April 26, 27, and 28, 2021.




We received 45 questions in the Jetson implementation session. Since the application in this seminar was built on the NVIDIA Jetson Xavier NX, we received many questions about basic Jetson operation, as well as about the DeepStream SDK and TensorRT provided by NVIDIA.

In the deployment session, we received 26 questions, and our impression was that many concerned problems attendees had encountered while developing with Docker and CUDA.


We could not include the answers to every question on this page, but we hope they will be helpful for your future development.

Jetson implementation

What is the performance difference between AGX Xavier and Xavier NX?

Depending on the model, AGX Xavier has roughly twice the performance of Xavier NX.

Please refer to the Jetson benchmarks at the link below.

https://developer.nvidia.com/embedded/jetson-benchmarks

If I have already installed a past version of the SDK, is it possible to update it to the SDK version you introduced this time?

If your currently installed version supports Docker, we think it is possible by using Docker.

However, to avoid trouble, we recommend reinstalling with the new JetPack version if you are currently using an older JetPack version.

Is there a specific way to measure distance in the depth direction? What kind of accuracy can be obtained?

We believe that it may be possible to estimate depth from stereo images using a stereo camera, or to combine camera images with information from sensors that can measure distance, such as millimeter-wave radar.

However, the accuracy and other factors will change depending on the camera and sensor used, the distance to the object being measured, and so on, so verification through a PoC is necessary.

You said that when running the algorithm adopted this time on Jetson Xavier NX, about four camera streams can be handled. Does that mean images from four cameras can be processed simultaneously?

Yes, that is correct.

What is the difference in processing speed between converting to TensorRT and not converting?

In this lost property detection system, we use only the model converted to TensorRT format, and we have not verified operation without the conversion.

For reference, please see the table at the link below for a simple comparison between PyTorch and TensorRT (via torch2trt).

https://github.com/NVIDIA-AI-IOT/torch2trt
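
If you would like to measure the TensorRT side yourself, the trtexec tool bundled with TensorRT can report inference latency for a built engine. Below is a minimal sketch, where model.trt is a placeholder name for your converted engine file:

```
# Benchmark an existing TensorRT engine; trtexec prints latency and
# throughput statistics after running the requested iterations.
# model.trt is a placeholder for your converted engine file.
trtexec --loadEngine=model.trt --iterations=100
```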

I understand that models running on Jetson work only with TensorRT. Is that correct? Also, if a model can be converted to TensorRT format, will basically any model, not just object detection models, work?

Even models that have not been converted to TensorRT format can run on Jetson.

The reason for converting a model to TensorRT format is to optimize it for Jetson and run it at high speed.

Also, we believe that any model that can be converted to TensorRT format will work on Jetson.

I assume that models trained on GPUs and similar hardware are used on Jetson. Are there any difficulties, such as models not working, due to the difference between x86 and ARM?

We do not think there is any particular problem, because there is no difference in the TensorRT API between x86 and Jetson.

However, there may be cases where a model does not run due to factors such as insufficient performance, so we think thorough operation checks are necessary.

Is YOLOv4 accelerated by using the DeepStream SDK as in the source below? https://github.com/NVIDIA-AI-IOT/yolov4_deepstream/tree/master/deepstream_yolov4

This time, the DeepStream SDK is not used; TensorRT is used to optimize the model and speed it up.

Is it always necessary to convert from ONNX to TensorRT on the actual machine?

Yes. Conversion from ONNX format to TensorRT must be done on the actual target machine.

The reason is that the TensorRT engine generated from an ONNX model is optimized for the specific platform it is built on, so an engine built on one platform will generally not work on another.

This time, the YOLO → ONNX → TensorRT conversion was performed on the Jetson itself.
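
As a minimal sketch of the ONNX → TensorRT step, the trtexec tool bundled with TensorRT can build the engine directly on the Jetson. The file names yolov4.onnx and yolov4.trt are placeholders:

```
# Build a TensorRT engine from an ONNX model on the target device.
# --fp16 enables half precision, which Jetson Xavier NX supports.
trtexec --onnx=yolov4.onnx --saveEngine=yolov4.trt --fp16
```

Because the engine is optimized for the GPU it is built on, this command is run on the Jetson itself rather than on a development PC.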

Deployment

Previously, there were Docker and nvidia-docker, with the latter providing GPU support. I have heard that Docker itself can now support GPUs. If the hardware (the GPU itself) supports it, is it possible to install different versions of CUDA and cuDNN for each container?

In this seminar, the explanation was limited to the case of Jetson.
The situation differs between Jetson and NVIDIA GPU cards (NVIDIA dGPU).


[For Jetson]
Please use the Docker that is automatically installed by JetPack.
At present (JetPack 4.5.1), because Docker containers refer to the CUDA and cuDNN on the host side, only the CUDA and cuDNN versions included in the host-side JetPack can be used from within a container.
Please note that this mechanism is current as of this writing and may change in the future.
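
As a hedged illustration of this mechanism, on JetPack 4.x you can start an L4T base container with the NVIDIA runtime, which mounts the host's CUDA and cuDNN libraries into the container. The image tag below corresponds to JetPack 4.5 and is an assumption; choose the tag that matches your JetPack version:

```
# --runtime nvidia makes the host's CUDA/cuDNN libraries visible
# inside the container on JetPack 4.x.
sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.5.0
```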


[When using an NVIDIA GPU card (NVIDIA dGPU) in a Linux/AMD64 (x86-64) environment]
Docker has natively supported GPUs since version 19.03. By installing the NVIDIA Container Toolkit on top of the official Docker, you can use NVIDIA GPU cards from within Docker containers.

For detailed installation instructions, see NVIDIA Container Toolkit: Installation Guide.

To use multiple versions of CUDA and cuDNN, please use the CUDA images published on NVIDIA NGC. Multiple container images are available, categorized by tags that indicate the version combinations.
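
As a minimal sketch, once the NVIDIA Container Toolkit is installed you can confirm GPU access from a container as follows. The CUDA image tag is an assumption; substitute the version combination you need:

```
# Run nvidia-smi inside a CUDA container to confirm the GPU is visible.
# A different CUDA image tag can be chosen for each container.
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```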

Can I use different versions of CUDA and cuDNN on one Jetson?

As mentioned above, at present (JetPack 4.5.1), only the CUDA and cuDNN versions installed on the host side can be used.

Can it be used in an environment with less memory, such as the Nano? Could Docker itself consume memory and prevent the model from running?

It works on Jetson Nano as well. The NVIDIA Deep Learning Institute's "Getting Started with AI on Jetson Nano" course also uses Docker to set up its course environment. Please give it a try.

When I execute docker run, is it the image that runs, or is it a container?

docker run generates a Docker container from the specified Docker image and then starts that container.
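
A minimal sketch of this relationship, using nginx purely as an example image:

```
# Create and start a container named "web" from the nginx image.
docker run -d --name web nginx
# The image appears under "images"; the running container under "ps".
docker images nginx
docker ps --filter name=web
# Stop and remove the container; the image itself remains available.
docker stop web && docker rm web
```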

Is it possible to use it without connecting to the network?

A network connection is required when fetching a Docker image from a registry, and when building a Docker image that installs libraries from the Internet into a base image.
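
Once an image is cached locally, docker run itself does not need a network. A hedged sketch of preparing for offline use, with an example image tag:

```
# While online: pull the image so it is stored locally.
docker pull nvcr.io/nvidia/l4t-base:r32.5.0
# Later, even with no network connection, containers can still be
# started from the locally cached image.
docker run -it --rm nvcr.io/nvidia/l4t-base:r32.5.0
```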

Do you have any tips/methods for finding the image that is suitable for what you want to do from among the many available images?

Start by looking in the NVIDIA NGC catalog. Docker images for Jetson can be found by searching for the keyword "L4T".

Many databases, web servers, and other kinds of middleware are published on Docker Hub. Those with the most downloads and stars are the most popular, and we think popular images are easy to work with because there is plenty of information about using them on the Internet.

Is it safe to assume that installing CUDA inside a Docker image will not break CUDA on the host system? I understand that it is better not to do this on Jetson, but I am not sure whether it is okay on other machines.

We will answer assuming an NVIDIA GPU card (NVIDIA dGPU) in a Linux/AMD64 (x86-64) environment.
Installing CUDA inside a Docker container does not break the host's CUDA environment, as long as the host's CUDA-related directories are not volume-mounted into the container. Not only with CUDA: volume mounting always carries the risk of accidentally overwriting or deleting files on the host side, so use it with caution.
When you need multiple CUDA environments, we recommend using the CUDA images published on NVIDIA NGC rather than installing CUDA inside a container yourself.
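
As a hedged sketch, containers based on different CUDA image tags can use different CUDA versions side by side without touching the host installation. The tags below are assumptions; check the registry for currently available ones:

```
# Each container sees only the CUDA toolkit baked into its own image,
# independent of what is installed on the host.
docker run --rm nvidia/cuda:10.2-devel-ubuntu18.04 nvcc --version
docker run --rm nvidia/cuda:11.2.2-devel-ubuntu20.04 nvcc --version
```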

Is it possible to take one of the many publicly available images as a base, add your own settings to it, save the result as a new image, and then generate containers from it?

Yes, you can use it that way.
You can convert a container into an image with the docker commit command and save that image to a file with the docker save command.
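
A minimal sketch of that workflow; the container and image names are placeholders:

```
# Start a container from a public base image and customize it interactively.
docker run -it --name mycontainer ubuntu:20.04 bash
# ...install packages and change settings inside the container, then exit...

# Persist the modified container as a new image.
docker commit mycontainer myimage:custom
# Export the image to a tar archive, e.g. to copy it to another machine.
docker save -o myimage.tar myimage:custom
# On the destination machine, restore the image with:
docker load -i myimage.tar
```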

Since it is an all-in-one configuration, I assume various services are running under Docker. When considering a configuration a little closer to actual production, is it better to have the Jetson perform only inference?

For applications whose inference processing is also CPU-intensive, you are right: we think it is better to use the Jetson only for inference. Also, if you have Jetsons at multiple sites and want to manage them centrally from a single dashboard, it would be easier to run the dashboard service on a separate computer.

In the application introduced in this seminar, all services were executed on a single Jetson, so they could be managed with Docker Compose. (To manage services spread across multiple machines, a platform that supports that would be required.)
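
As a hedged illustration of the single-machine setup, a Docker Compose file can declare several services so they start and stop together. The service and image names below are hypothetical, not those of the seminar application:

```
# docker-compose.yml: a hypothetical two-service layout.
# Compose file format 2.3 supports the "runtime" key used on Jetson.
version: "2.3"
services:
  inference:
    image: myregistry/inference:latest   # placeholder image name
    runtime: nvidia                      # use the NVIDIA runtime on Jetson
    restart: always
  dashboard:
    image: myregistry/dashboard:latest   # placeholder image name
    ports:
      - "8080:8080"
    restart: always
```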

There is also an on-demand video for those who would like to review the seminar.

We have introduced a carefully selected set of the questions and answers. An on-demand video of the main part of the seminar is also being prepared. We hope you will find it useful in the future.

Sample code is posted on GitHub

Details of the application introduced in this seminar are posted in the GitHub repository below.
https://github.com/MACNICA-CLAVIS-NV/abandoned_object_detection

You can download the sample code from the link above.