[GPU programming using NVIDIA CUDA] Episode 1: Mechanism of GPU Computing and Parallel Processing

First of all, do you know what "CUDA" is? CUDA (Compute Unified Device Architecture) is a GPU program development environment developed by NVIDIA. By using CUDA, you can write programs in a C-like language that perform high-speed parallel arithmetic processing using the many arithmetic units of a GPU.
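To give a first impression of what this looks like, here is a minimal sketch of a CUDA program (the kernel name, array size, and values are made up purely for illustration): the addVectors kernel adds two arrays element by element, and the GPU runs one thread per element so that all the additions happen in parallel. Later episodes explain this structure in detail.

#include <cstdio>

// Each thread adds one pair of elements; the GPU runs many such threads in parallel.
__global__ void addVectors(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main()
{
    const int n = 1 << 20;                  // 1M elements (arbitrary size for illustration)
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);           // unified memory accessible from CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, c, n);   // launch enough threads to cover all n elements
    cudaDeviceSynchronize();                       // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);            // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

On a machine with the CUDA toolkit installed, a file like this can typically be compiled with nvcc (for example, nvcc add.cu -o add).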

This parallel computing technology is used for a wide range of applications, such as deep learning and object detection, which have been attracting attention in the AI field in recent years. In this series, I would like to introduce the mechanism of parallel arithmetic processing on the GPU through programming examples using CUDA.

Episode 1: Mechanism of GPU Computing and Parallel Processing
Episode 2: Structure of a CUDA Program
Episode 3: Visualization of CUDA Program Execution Resources
Episode 4: CUDA Execution Model
Episode 5: CUDA Programming with Python


What are GPUs?

GPU is an abbreviation for Graphics Processing Unit, a processor that performs the arithmetic processing needed to render images such as 3D graphics. In 3D graphics, each object is represented by a combination of many flat surfaces (polygons). For example, in the figure below, an apple is made up of a collection of triangular polygons. (The finer the granularity of the polygons, the smoother the 3D model appears.)

An apple made up of triangular polygons

In 3D graphics, an enormous number of matrix calculations must be performed at high speed to translate and rotate the polygon coordinates that make up each object, producing image after image. The GPU is therefore designed with a large number of arithmetic units so that it can execute these matrix calculations simultaneously in parallel.
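As a rough sketch of this idea (not code from any actual graphics pipeline; the kernel, vertex layout, and rotation angle are hypothetical), the kernel below applies the same 2x2 rotation matrix to many vertices, one GPU thread per vertex. This is exactly the kind of independent, repeated matrix arithmetic described above.

#include <cmath>
#include <cstdio>

// Hypothetical sketch: rotate many 2D vertices about the origin in parallel.
// Each thread applies the same 2x2 rotation matrix to one vertex.
__global__ void rotateVertices(float *x, float *y, int n, float cosA, float sinA)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i], yi = y[i];
        x[i] = cosA * xi - sinA * yi;   // x' = x*cos(a) - y*sin(a)
        y[i] = sinA * xi + cosA * yi;   // y' = x*sin(a) + y*cos(a)
    }
}

int main()
{
    const int n = 3;                        // one triangle, for illustration
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    x[0] = 1.0f;  y[0] = 0.0f;
    x[1] = 0.0f;  y[1] = 1.0f;
    x[2] = -1.0f; y[2] = 0.0f;

    float angle = 3.14159265f / 2.0f;       // rotate 90 degrees
    rotateVertices<<<1, 32>>>(x, y, n, cosf(angle), sinf(angle));
    cudaDeviceSynchronize();

    for (int i = 0; i < n; ++i) printf("(%f, %f)\n", x[i], y[i]);
    cudaFree(x); cudaFree(y);
    return 0;
}

A real scene contains millions of vertices, so the same kernel would simply be launched with that many threads; the per-vertex work does not change.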


Because GPUs are equipped with a mechanism that can execute a huge number of calculations simultaneously, they are attracting attention not only for graphics processing but also for GPU Computing, that is, as an HPC (high-performance computing) technology in fields such as AI and scientific computing.

Applications of GPU Computing

GPU Computing is used in various fields, and one of its applications is AI processing (deep learning).

Deep learning is an AI technology that uses deep neural networks to make computers perform human-like intellectual tasks, and applications such as the following are being researched:

・Image recognition

・Object detection

・Natural language processing, etc.

 

For example, in image recognition, input image data is fed into a neural network, and the input image is classified based on the network's output. (reference link)

Specifically, the pixel data of the input image is fed into the network, and the result of the matrix operation at each node is output as a probability.

For example, you can input the data of a cat image as shown below, and the image is determined to be a "cat" because out[0], the output node assigned to "cat", has the highest probability.

AI processing
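To make the connection to parallel hardware concrete, here is a purely illustrative sketch (made-up layer sizes, dummy weights, and no activation or softmax) of one fully connected layer computed as a matrix-vector product, with one GPU thread per output node. Real deep learning frameworks use optimized libraries such as cuBLAS or cuDNN rather than hand-written kernels like this.

#include <cstdio>

// Hypothetical sketch of one fully connected layer: out = W * in (bias and softmax omitted).
// One thread computes one output node, so all output nodes are evaluated in parallel.
__global__ void denseLayer(const float *W, const float *in, float *out,
                           int numOut, int numIn)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o < numOut) {
        float sum = 0.0f;
        for (int i = 0; i < numIn; ++i) {
            sum += W[o * numIn + i] * in[i];   // dot product of one weight row with the input
        }
        out[o] = sum;   // a softmax would normally turn these values into probabilities
    }
}

int main()
{
    const int numIn = 4, numOut = 3;            // tiny made-up layer sizes
    float *W, *in, *out;
    cudaMallocManaged(&W,   numOut * numIn * sizeof(float));
    cudaMallocManaged(&in,  numIn * sizeof(float));
    cudaMallocManaged(&out, numOut * sizeof(float));

    for (int i = 0; i < numOut * numIn; ++i) W[i] = 0.1f * i;  // dummy weights
    for (int i = 0; i < numIn; ++i) in[i] = 1.0f;              // dummy "pixel" input

    denseLayer<<<1, 32>>>(W, in, out, numOut, numIn);
    cudaDeviceSynchronize();

    for (int o = 0; o < numOut; ++o) printf("out[%d] = %f\n", o, out[o]);
    cudaFree(W); cudaFree(in); cudaFree(out);
    return 0;
}

Since each output node only reads the input vector and its own weight row, the nodes can all be computed independently, which is why this kind of workload maps so well onto a GPU.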

In the field of deep learning, it is known that performance can be improved by increasing the number of network layers. Since neural networks involve a large number of independent matrix operations at each node, expectations are rising for GPU technology, whose architecture can execute such operations in parallel across many arithmetic units.

Frameworks and development tools in the deep learning area are becoming more abundant, as shown in the example linked here, so there is less and less need to write CUDA code directly. In this series, however, I would like to introduce the mechanism of GPU programming using simple sample code published by NVIDIA, which lets you examine the CUDA code directly.

Run the sample code

Now let's actually run the sample code.

CUDA sample code is available for both Windows and Linux, as described in this link.

This time, I will introduce how to run the Linux version demos using the Jetson Xavier NX Developer Kit.

 

 

Jetson Xavier NX Developer Kit

Follow the steps in this link to create a bootable SD card and start the Jetson.

(This time, we are introducing an execution example using JetPack 4.5.1.)

CUDA is already installed on the boot SD card, so you can simply build and run the sample code folder you want to try.

Some execution examples are introduced below.

Execution example 1 (smokeParticles)

$ cp -r /usr/local/cuda-10.2/samples/ ./cuda_samples
$ cd ~/cuda_samples/5_Simulations/smokeParticles
$ make
$ ./smokeParticles
Execution example 1

Execution example 2 (simpleGL)

$ cd ~/cuda_samples/2_Graphics/simpleGL
$ make
$ ./simpleGL
Execution example 2

Execution example 3 (oceanFFT)

$ cd ~/cuda_samples/5_Simulations/oceanFFT
$ make
$ ./oceanFFT
Execution example 3

Execution example 4 (postProcessGL)

$ cd ~/cuda_samples/3_Imaging/postProcessGL
$ make
$ ./postProcessGL
Execution example 4

Next time, I will explain the contents of the sample code

In this article, I introduced the mechanism of GPU Computing and parallel arithmetic processing, as well as how to run the CUDA sample code.

Next time, I will introduce the mechanism of programming with CUDA by walking through the code of the sample programs.

 
