Embedded AI Introductory Blog (5) "Looking at the configuration of a neural network model" ~ Implementing human detection AI on a small, low-power FPGA ~

In the previous article, I examined the contents of the training data used in Lattice's human detection reference design (Object Counting).

 

This time, I would like to investigate what is inside the neural network model.
But what exactly is a neural network model in the first place? That is where I have to start...

What is a neural network model?

AI has become a hot topic recently, and with it we often hear the term “neural network model”. If you search the internet, you will find many sites with more accurate and detailed explanations than mine, so please refer to those for the details. Roughly speaking, though, it is something like the following:

 

  • A mathematical model that imitates the network of connected nerve cells (neurons) in the human brain
  • Many units called layers are connected to form a single neural network model (this is why it is called a Deep Neural Network, or Deep Learning)
  • Each layer has parameters called weights
  • When data is given to the neural network model as input, each layer performs calculations that multiply by its weight values, and the feature values of the input data are computed
  • Each piece of input data is paired with an answer (label). When the model is trained, the weights are adjusted by working backwards through the network so that the difference between the value the model computes and the actual answer becomes small (error backpropagation)
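
To make the points above a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible "network": a single weight and a single bias, trained with plain gradient descent on the squared error. The names (forward, train, lr, epochs) and the toy data are my own assumptions for illustration, and a real model has many layers and many weights.

```python
# A single "layer" with one weight and one bias: prediction = w * x + b.
# All names and numbers here are illustrative assumptions.

def forward(x, w, b):
    """Forward pass: multiply the input by the weight and add the bias."""
    return w * x + b

def train(samples, lr=0.05, epochs=500):
    """Adjust w and b step by step so the prediction approaches the label."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in samples:
            pred = forward(x, w, b)
            error = pred - label   # difference between the output and the answer
            # Gradients of 0.5 * error**2, passed back to the weight and bias.
            grad_w = error * x
            grad_b = error
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy training data generated from label = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print("learned weight and bias:", round(w, 3), round(b, 3))
```

The learned weight and bias should end up close to 2 and 1, the values used to generate the labels. In a deep network the same idea is applied layer by layer, from the output back toward the input.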

 

Also, there are several types of neural networks. Some popular ones include:

  • CNN (Convolutional Neural Network)
  • RNN (Recurrent Neural Network: used for forecasting with time-series data)
  • LSTM (Long short-term memory)
  • GAN (Generative Adversarial Network: consists of two networks, the Generator and the Discriminator, where the Discriminator judges whether what the Generator produced passes as genuine)

What is CNN (Convolutional Neural Network)?

Of the types above, Lattice supports CNNs. A CNN uses a filter called a kernel that holds the weight information; a kernel is typically 3 x 3 or 5 x 5 in size. The kernel is slid over the input data, multiplications and additions are performed at each position, and feature values are extracted. This processing is called convolution, so a neural network that uses convolution is called a CNN.

Convolution is very similar to the noise-removal filtering that is often done in image processing.
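
As a rough sketch of the multiply-and-add described above, the following Python code slides a 3 x 3 kernel over a small image. The function name convolve2d, the 5 x 5 image, and the kernel values are assumptions I made up for illustration; in a trained CNN the kernel values are the learned weights. There is no padding and the kernel moves one position at a time.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the sum of element-wise products at each position.
    The kernel is not flipped, as is usual in CNN frameworks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for ki in range(kh):
                for kj in range(kw):
                    acc += image[i + ki][j + kj] * kernel[ki][kj]
            row.append(acc)
        output.append(row)
    return output

# Hypothetical 5 x 5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical 3 x 3 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = convolve2d(image, kernel)
for row in features:
    print(row)
```

The output is large where the window covers the edge and 0 where the window is uniformly bright. If every kernel value were replaced with 1/9, the same function would compute a simple averaging filter of the kind used for noise removal.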

Because of this mechanism, CNNs are widely used in the field of image recognition.

In addition to convolution, a CNN also has a process called pooling, which thins out the data so that features are easier to extract regardless of where the object is in the image. The figure below is an example of Max Pooling (a pooling method that takes the maximum value within the kernel window) with a kernel size of 2x2 and a stride (the amount by which the kernel is shifted) of 2. Compared with the original input data, the output has only one quarter as many elements, yet it still carries over the features of each neighborhood.
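
The same 2x2, stride-2 Max Pooling can be sketched in a few lines of Python. The function name max_pool2d and the 4 x 4 input values are assumptions for illustration only.

```python
def max_pool2d(data, size=2, stride=2):
    """Take the maximum value inside each size x size window,
    moving the window by `stride` positions each time."""
    output = []
    for i in range(0, len(data) - size + 1, stride):
        row = []
        for j in range(0, len(data[0]) - size + 1, stride):
            window = [data[i + di][j + dj]
                      for di in range(size)
                      for dj in range(size)]
            row.append(max(window))
        output.append(row)
    return output

# Hypothetical 4 x 4 input (16 values).
data = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool2d(data)
for row in pooled:
    print(row)
# Prints [6, 8] and then [3, 4]: 16 values reduced to 4.
```

Each output value is the largest value in its 2x2 neighborhood, so the strongest feature in each region survives even though three quarters of the data is discarded.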

Operations such as convolution and pooling are called layers, and a neural network model is built by connecting them. In the human detection reference design examined this time, the neural network model appears to be configured in the following form.

Quoted from: RD-02207 CrossLink-NX Human Counting Using MobileNet v1/v2 Reference Design User Guide,
Fig. 4.2 Human Counting MobileNet v2 Training Topology
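
As a toy illustration of what "connecting layers" means, the sketch below chains three layers so that the output of one becomes the input of the next. It is not the MobileNet v2 topology from the user guide and not the code from the training environment. It works on a 1-D list to stay short, and the layer names (conv1d, relu, max_pool1d), the ReLU activation, and all values are my own assumptions.

```python
def conv1d(kernel):
    """Return a layer that slides `kernel` over a 1-D signal."""
    def layer(signal):
        k = len(kernel)
        return [sum(signal[i + j] * kernel[j] for j in range(k))
                for i in range(len(signal) - k + 1)]
    return layer

def relu(signal):
    """Activation layer: negative values become 0."""
    return [max(0, v) for v in signal]

def max_pool1d(size=2, stride=2):
    """Return a layer that keeps the maximum of each window."""
    def layer(signal):
        return [max(signal[i:i + size])
                for i in range(0, len(signal) - size + 1, stride)]
    return layer

def run_model(layers, data):
    """Feed the data through each layer in order."""
    for layer in layers:
        data = layer(data)
    return data

# A "model" is just an ordered list of layers.
model = [conv1d([-1, 1]), relu, max_pool1d(2, 2)]
output = run_model(model, [0, 0, 1, 1, 0, 0, 1])
print(output)  # [1, 0, 1]
```

Frameworks used for real models follow the same pattern at a much larger scale: each layer is defined with its own parameters, and the model is the chain of those layers.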

In practice, the configuration of this neural network model appears to be written in Python code as follows.
(It is described in _add_forward_graph() in \\src\nets\squeezeDet.py, inside "mobilenetV2_crosslink_nx_vnv_training.zip (neural network model training environment)", which was downloaded in the second article of this series.)

This has turned quite academic, and my head is starting to spin.

After staring at the Python code and the documentation, I now have a vague understanding of how a neural network model is constructed. It is still only a vague one, though...
That said, how does this reference design indicate whether a person was detected? And if a person can be detected, I expect the model can also infer that person's coordinates, but in what form does the neural network model output its results?

Next time I'd like to explore that area a little more.

Inquiry

Please feel free to contact us if you have any questions about the evaluation board or sample design, or if there is anything you would like us to cover in this blog!
