In Part 1 of the [Necessary Knowledge for Developing Object Detection Applications] series, I introduced the procedure for running NVIDIA's published human detection sample application on Jetson.

 

In practice, you may need to add datasets, modify the network model, or change the image display application. Doing so requires a mechanism that covers the entire AI development process, from training to inference, as shown below, and it calls on know-how from several specialized fields. Specifically, the flow consists of the following steps.

 

Dataset preparation: preparing the input data and the label data

Training: preparing the network model and finding the optimal weights (parameters)

Inference: running high-speed inference on low-power devices and building applications that achieve the expected performance

 

NVIDIA provides the NVIDIA TAO TOOLKIT as a system that supports application development end to end, from training to inference. In Part 2, we will deepen our understanding of the NVIDIA TAO TOOLKIT, which is the key to this development process, introduce the recommended hardware, and verify the operation of an NVIDIA TAO TOOLKIT sample application.

NVIDIA TAO TOOLKIT

NVIDIA TAO TOOLKIT is a toolkit that supports the creation of advanced AI applications through transfer learning from pretrained models.

NVIDIA TAO TOOLKIT provides functions for a variety of use cases such as object detection, image classification, segmentation, and natural language processing. In this article, we use object detection as the example. We will also introduce "AUGMENTATION" and "PRETRAINED MODEL", which are characteristic functions in the transfer learning flow below. The steps from "TRAINING" onward will be explained in later articles.

 

1: AUGMENTATION (data augmentation)

2: PRETRAINED MODEL (pretrained model)

3: TRAINING (training): explained in Part 4

4: PRUNE (model size reduction by pruning): explained in Part 5

5: QUANTIZE (model optimization by quantization): explained in Part 5

6: MODEL EXPORT (generation of the inference model): explained in Part 5

Source: NVIDIA

What is transfer learning in the first place?

As mentioned in the development flow introduced at the beginning, specialized knowledge is required to build a highly accurate network model. Many network models have been developed and published in the field of deep learning, but in addition to choosing a model structure, optimizing its parameters is important.

This optimization is done in the training process, and training a network model is often compared to human learning.

 

For humans to learn to recognize what they see, we need a great deal of experience and a good teacher; only then can we make good guesses about things we have not seen before. Neural networks are the same: to tune the parameters from their initial state to values that deliver good performance, the network must go through an enormous amount of experience (training) with large amounts of data and correct teachers (correct labels), and preparing and running an environment for this training is a real challenge.

 

Transfer learning is a method for quickly and efficiently producing a model that performs well in a target use case by starting from a pretrained model, one that has already been produced through a large amount of training, rather than training a network model from its initial state.

 

For example, to build a model that identifies dog breeds, we can consider starting from a trained model that identifies cat breeds. Features common to animals, such as the shape of the face and the four legs, can be carried over from the existing model, and only the differences needed to distinguish dog breeds have to be relearned, so the new model can be built efficiently.
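As a rough, generic illustration of this idea (a minimal sketch in PyTorch, not the NVIDIA TAO TOOLKIT API), the following code reuses a ResNet-18 backbone pretrained on a large dataset, freezes its parameters, and retrains only a new classification head. The number of target classes and the learning rate are assumptions for illustration.

import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on a large generic dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers so their learned features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the target task
# (num_classes = 10 is a hypothetical number of dog breeds).
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head's parameters are updated during retraining.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

Because only the small new head is trained, far less data and computation are needed than when training the whole network from its initial state.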

AUGMENTATION (data augmentation)

In deep learning, training proceeds by comparing the output obtained for the input data with the correct label. A trained model can be expected to infer correctly on data similar to the training data, but it often fails to produce correct results for inputs that differ significantly from it.

For example, the same image may be treated as a completely different input just because it has been rotated or its color tone has changed. Humans naturally see and learn the same object under various conditions (angles, colors, and so on), but for a computer, input data covering these diverse variations has to be prepared for training.

 

The AUGMENTATION function of NVIDIA TAO TOOLKIT can automatically generate data for various conditions, such as rotation and color changes, from the images in an existing dataset, making it possible to train under a wide range of environmental conditions.

Examples of AUGMENTATION
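To make the idea concrete, here is a minimal, generic Python sketch using the Pillow library (not the TAO TOOLKIT AUGMENTATION function itself) that generates rotated and color-shifted variants of a single image; the file names are placeholders.

from PIL import Image, ImageEnhance

# Placeholder input image path (assumption for illustration).
image = Image.open("sample.jpg")

augmented = []

# Rotation variants: the same object seen from different angles.
for angle in (90, 180, 270):
    augmented.append(image.rotate(angle))

# Color-tone variants: the same object under different lighting.
for factor in (0.5, 1.5):
    augmented.append(ImageEnhance.Color(image).enhance(factor))

# Save each variant so it can be added to the training dataset.
for i, img in enumerate(augmented):
    img.save(f"sample_aug_{i}.jpg")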

PRETRAINED MODEL

In deep learning, the network model is the part that corresponds to the human brain, and it is extremely important. As explained in the transfer learning section, not only the model structure but also the tuning of the parameters at each node greatly affects inference performance. In addition to general-purpose models, NVIDIA provides models for use with NVIDIA TAO TOOLKIT: a variety of network models whose parameters have been trained and optimized on huge datasets to support use cases such as those shown below.

For example, models that are good at detecting people include PeopleNet, which is trained using a large amount of human data.

 

Model | Network architecture | Use cases
DashCamNet | DetectNet_v2-ResNet18 | Detects cars, people, street signs, and bicycles
FaceDetect-IR | DetectNet_v2-ResNet18 | Face detection
PeopleNet | DetectNet_v2-ResNet34 | Detects people, bags, and faces
TrafficCamNet | DetectNet_v2-ResNet18 | Detects cars, people, road signs, and motorcycles
VehicleMakeNet | ResNet18 | Car make classification
VehicleTypeNet | ResNet18 | Vehicle type classification

Source: NVIDIA

Verifying the NVIDIA TAO TOOLKIT sample application

Using the recommended hardware described below, we run the NVIDIA TAO TOOLKIT 3.0 sample application published at this link.

(Because this application was created with the earlier NVIDIA TRANSFER LEARNING TOOLKIT, some commands must be modified when running it.)

Recommended hardware for NVIDIA TAO TOOLKIT

For training with NVIDIA TAO TOOLKIT, it is recommended to prepare a hardware environment with a large amount of memory and a high-performance GPU, such as those listed below, in order to train networks with a huge number of nodes. See this link for details.

NVIDIA A100
NVIDIA Tesla V100

The sample is an implementation example of an application that applies the following two-stage inference processing to a video stream captured by a camera connected to the Jetson.

・Inference processing 1: Detect hands (the hand regions are cropped out in subsequent processing)

・Inference processing 2: Identify the shape of each cropped hand and classify it as one of the following gestures:

(Thumbs Up, Fist, Stop, Ok, Two, Random)

Source: NVIDIA

In this execution result, we can confirm that both hands are detected with green bounding boxes in inference processing 1, and that the right hand is identified as Ok and the left hand as Stop in inference processing 2. Part 1 showed an execution example of a person detection application; this time, after the hands are detected, more advanced inference such as determining the meaning of a gesture becomes possible. NVIDIA TAO TOOLKIT makes it easy to build inference pipelines that combine multiple network models in this way.
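The overall structure of such a two-stage pipeline can be pictured roughly as follows. This is not the actual sample code: detect_hands and classify_gesture are hypothetical placeholders standing in for the TAO-trained hand detector and the GestureNet gesture classifier, and the camera device index is an assumption.

import cv2  # OpenCV is used here only to sketch capture, cropping, and drawing

def detect_hands(frame):
    # Hypothetical stand-in for inference processing 1 (the TAO-trained hand detector).
    # It would return a list of (x, y, w, h) bounding boxes for detected hands.
    return []

def classify_gesture(hand_image):
    # Hypothetical stand-in for inference processing 2 (the GestureNet classifier).
    # It would return one of: Thumbs Up, Fist, Stop, Ok, Two, Random.
    return "Random"

capture = cv2.VideoCapture(0)  # camera connected to the Jetson (device 0 assumed)
while True:
    ok, frame = capture.read()
    if not ok:
        break
    # Stage 1: detect hands in the full frame.
    for (x, y, w, h) in detect_hands(frame):
        # Crop the hand region and run stage 2 on it.
        gesture = classify_gesture(frame[y:y + h, x:x + w])
        # Draw the bounding box and the gesture label.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, gesture, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("gestures", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
capture.release()
cv2.destroyAllWindows()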

 

For inference processing 1, the network model was generated by transfer learning with NVIDIA TAO TOOLKIT, while inference processing 2 uses an existing model file (GestureNet) as is. From the next article, I will focus on inference processing 1, break it down into the dataset preparation, training, and inference phases, and introduce the development flow up to the point where inference processing 1 runs.

From the next article, we will thoroughly dissect this sample application!

In this article, I explained what the NVIDIA TAO TOOLKIT is and demonstrated the operation of a hand detection and gesture recognition application built with it.

Starting with the next article, I will explain the flow of building the environment needed for training and preparing the dataset, while working through the sample application introduced here.

If you are interested, please click the button below to read the rest of the series.

If you are considering introducing AI, please contact us.

To support AI adoption, we offer selection and support for NVIDIA GPU cards and GPU workstations, algorithms for face recognition, flow-line (movement) analysis, and skeleton detection, as well as services for building training environments. If you have any problems, please feel free to contact us.

 

*The execution examples introduced in the article are subject to change due to future software and hardware updates.

Please note that we cannot accept detailed inquiries about the sample programs or source code.