Embedded AI Introductory Blog (4) "Using OIDv4 to See How Training Data Works" ~Implementing Human Detection AI on a Small, Low-Power FPGA~

In the previous article, we built a Linux environment for training the neural network model used in Lattice's human detection reference design (Object Counting).


Doing something I'm not used to has already drained a good chunk of my HP...

In an earlier article, I learned that Google's Open Images Dataset V4 (OIDv4) is used in Lattice's reference design. I knew it would be very useful, but without a Linux environment or a Python execution environment I could not run the download commands. Now that the environment is ready, I can finally try it, so let's get started right away.

Download Open Images Dataset V4

First, download the OIDv4 ToolKit, which is used to fetch data from Open Images Dataset V4 (hereafter referred to as OIDv4). You can clone it with the command below or download it directly from the GitHub site.

$ git clone https://github.com/EscVM/OIDv4_ToolKit.git

Looking at the downloaded contents, there is a Python script called main.py, and it appears that training data can be downloaded by running it.

The download also includes a text file called requirements.txt, which appears to list the libraries required by the toolkit, so install these libraries as well.

Of these, only opencv-python could not be installed with conda install. As long as opencv is installed there seems to be no problem, though, so I will proceed for the time being.
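One quick way to confirm that OpenCV is usable, whichever package provided it, is to check whether Python can locate the cv2 module. The following is only a minimal sketch: the helper name is_importable is hypothetical, and it assumes that both conda's opencv package and pip's opencv-python expose the import name cv2.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # cv2 is the import name used by OpenCV's Python bindings (assumption
    # stated above: the conda package exposes the same name).
    status = "found" if is_importable("cv2") else "MISSING"
    print(f"cv2: {status}")
```

If cv2 is reported as found, the code that depends on opencv-python should be able to import it regardless of how it was installed.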

Since the goal this time is person detection, training data for people is required, and in OIDv4 this appears to be defined as the class "Person". It seems that the training data for the Person class can be obtained by running main.py with that class specified.
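As a rough illustration, the downloader invocation could be assembled in Python like this. The downloader subcommand and the --classes and --type_csv options come from the OIDv4 ToolKit's README; the helper name build_download_command is hypothetical, and the subset ("train" here) is an assumption, since the article does not say which subset was downloaded.

```python
import sys
from typing import List


def build_download_command(class_name: str, subset: str) -> List[str]:
    """Build the argument list for the OIDv4 ToolKit downloader.

    subset is passed to --type_csv (for example "train"). Which subset the
    article used is not stated, so it is left as a parameter.
    """
    return [
        sys.executable, "main.py", "downloader",
        "--classes", class_name,
        "--type_csv", subset,
    ]


if __name__ == "__main__":
    cmd = build_download_command("Person", "train")
    print(" ".join(cmd))
    # To actually start the download, one could run this from inside the
    # cloned OIDv4_ToolKit directory: subprocess.run(cmd, check=True)
```

The sketch only prints the command rather than running it, so it is safe to try before committing to a long download.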

While the command runs, you will be asked to answer Yes or No several times; answer Yes each time.

The download started and completed after a while. The total download size is about 2.1GB, and it took quite a while. I used a computer with an Intel Core i7 and 16GB of memory, and in my environment the download took about 30 minutes to complete.

The downloaded training data was saved in the following folder structure.

Relationship between jpg files (training images) and txt files (labels)

Many jpg and txt files were downloaded into the Person and Label folders: 6,436 items in this case. For each jpg file, a txt file with the same base name is stored.
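The name-based pairing described above can be checked with a short script. This is a minimal sketch under two assumptions: the jpg files sit directly in the Person folder and the txt files sit in a Label subfolder beneath it. The function name and the example path are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images_with_labels(class_dir: Path) -> List[Tuple[Path, Path]]:
    """Return (jpg, txt) pairs whose names match apart from the extension.

    Assumes images are directly in class_dir and labels in class_dir/Label.
    Images without a matching label file are skipped.
    """
    label_dir = class_dir / "Label"
    pairs = []
    for image_path in sorted(class_dir.glob("*.jpg")):
        label_path = label_dir / (image_path.stem + ".txt")
        if label_path.is_file():
            pairs.append((image_path, label_path))
    return pairs


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the toolkit saved the data.
    pairs = pair_images_with_labels(Path("OID/Dataset/train/Person"))
    print(f"{len(pairs)} image/label pairs found")
```

Comparing the number of pairs with the number of downloaded images is a simple sanity check that no label file is missing.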

Let me pick one and look at its contents. Opening the jpg file first, it is a picture of two men standing side by side.

Opening the txt file with the same name as this jpg file, on the other hand, shows that it contains coordinate information for the objects in the image.

When this coordinate information is superimposed on the jpg image as rectangles, you can see that it marks the areas where the people are.
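To make the relationship between a label file and its rectangles concrete, here is a minimal parsing sketch. It assumes each line has the form "class-name left top right bottom" with coordinates in pixels, which is the layout the OIDv4 ToolKit is understood to write. The Box type, the function names, and the sample values are all hypothetical and are not taken from the actual dataset.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_label_text(text: str) -> List[Box]:
    """Parse lines of the assumed form '<class> <left> <top> <right> <bottom>'."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        # The last four fields are coordinates; everything before them is the
        # class name, which may itself contain spaces.
        left, top, right, bottom = (float(v) for v in parts[-4:])
        boxes.append(Box(" ".join(parts[:-4]), left, top, right, bottom))
    return boxes


def to_pixel_corners(box: Box) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Round a box to the integer corner points a drawing routine needs."""
    return (round(box.left), round(box.top)), (round(box.right), round(box.bottom))


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample = "Person 10.0 20.0 110.0 220.0\nPerson 150.0 30.0 260.0 240.0\n"
    for box in parse_label_text(sample):
        print(box.label, to_pixel_corners(box))
```

The two corner points returned by to_pixel_corners are in the form that a drawing function such as OpenCV's cv2.rectangle takes, so overlaying the boxes on the image is a small further step.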

In this way, training of the neural network model uses a large number of pairs: an image (jpg file) containing the object to be detected, and a txt file with the same name that describes the object's coordinates. It seems that "annotating" means attaching information such as the type of object and its coordinates to the data, such as images, that learning is based on.

Manually annotating such a large amount of data is very hard work, so having pre-annotated data like this improves efficiency considerably. Collecting the data itself is quite difficult in the first place, so this is very convenient.

This time we use the data for people (Person), but Open Images Dataset V4 apparently provides 600 object classes in total, including common ones such as Cat, Dog, and Car as well as Ice cream, Beer, and Dinosaur. By swapping in a different data set, it should be possible to detect a variety of objects.

Now that we understand how training data works, what kind of neural network needs to be prepared to make something like human detection possible? Next time, I would like to find out what a neural network model is while looking at the Python training code prepared by Lattice.

Inquiry

Please feel free to contact us if you have any questions about the evaluation board or sample design, or if there is anything you would like us to cover in this blog!
