Pose Estimation Models and Application Examples in 5 Minutes - Smart Manufacturing

This article is recommended for those who:

want to put cutting-edge AI technology into practical use in business
want to know the outline of pose estimation models
want to know application examples of pose estimation models

Time needed to finish reading this article

5 minutes

Introduction

Hello, I'm Makky from Macnica AI Women's Club!
AI continues to grow exponentially.
In China, QR code payments are no longer the newest option, and facial recognition payments are starting to spread.
Image recognition such as face recognition is familiar and easy to picture in daily life, but AI is also widely used in the manufacturing industry for anomaly detection and other purposes.

So this time, I would like to briefly explain the "pose estimation" model, which is one type of image recognition, and then introduce an algorithm.

What kind of model is a pose estimation model?

A pose estimation model is also known as "human body detection". It learns human joint points from still images and can detect, in real time, human poses formed by connecting those joint points in both still images and videos.
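To make "joint points connected into a pose" a little more concrete, here is a minimal sketch of how a detected pose might be represented. The joint names, the `SKELETON` list, and the confidence threshold are all assumptions chosen for illustration and do not come from any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    name: str
    x: float
    y: float
    score: float  # detector confidence in [0, 1] (assumed convention)


# Hypothetical, simplified skeleton: pairs of joint names to connect.
SKELETON = [
    ("head", "neck"),
    ("neck", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
]


def limbs(keypoints, min_score=0.5):
    """Return line segments for skeleton pairs whose two joints are both confident."""
    by_name = {k.name: k for k in keypoints if k.score >= min_score}
    segments = []
    for a, b in SKELETON:
        if a in by_name and b in by_name:
            segments.append(((by_name[a].x, by_name[a].y),
                             (by_name[b].x, by_name[b].y)))
    return segments


pose = [
    Keypoint("head", 100, 40, 0.9),
    Keypoint("neck", 100, 70, 0.95),
    Keypoint("right_shoulder", 80, 75, 0.8),
    Keypoint("right_elbow", 70, 110, 0.3),  # low confidence, e.g. hidden
    Keypoint("right_wrist", 65, 140, 0.7),
]
segments = limbs(pose)
print(segments)
```

In this toy data the elbow falls below the threshold, so only the head-neck and neck-shoulder segments are drawn. Hidden or overlapping body parts show up in much the same way in practice.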

Autonomous driving is an easy use case to imagine for pose estimation models, but there are many other cases that take advantage of pose estimation.
With conventional pose estimation models, for example, people may overlap or parts of the body may be hidden depending on the camera angle, which made it difficult to correctly detect human joint points from images.
Recent pose estimation models, however, can estimate depth without a high-performance camera and can accurately detect the joint points of overlapping people.
These technological advances have enabled pose estimation models to be used in a variety of fields, from sports and security to flow line analysis at event venues and factories.

That said, AI in the field of image recognition, including pose estimation models, has a high learning cost (training time and the data required for training), and improving accuracy is not easy.
Still, just as any AI model is expected to be accurate, image recognition is expected to deliver "detection accuracy", and this applies not only to object detection, the best-known form of image recognition, but to pose estimation as well.

Learn more about pose estimation models

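Now, I would love to jump straight to a pose estimation model with low learning cost and high accuracy, but first let me explain what types generally exist.
Pose estimation models are broadly classified into two types: bottom-up and top-down.
The two types differ in the order of the computations used to detect human joint points.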

Bottom-up type

The bottom-up type is a model that estimates poses with an algorithm that follows the steps below.

1: Identify all key points in the image
2: Match and connect for each person

Because all the key points (human joint points) in the image are identified at the beginning, this type tends to keep the computational cost during training lower than the top-down type described later.
However, after the key points are extracted, a huge amount of pattern matching is needed to assign the right key points to each person. This makes it difficult to improve matching accuracy; for example, false detections occur where people overlap.
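The two bottom-up steps can be sketched roughly as follows. This is only an illustration: the nearest-distance greedy matching used here is a stand-in I chose for simplicity, not the matching method of any specific model, and the coordinates and `max_dist` value are made up.

```python
import math
from itertools import product


def greedy_match(parents, children, max_dist):
    """Pair each parent joint with at most one child joint, closest pairs first."""
    # Every parent/child combination is a candidate, which is why the amount
    # of matching work grows quickly as more people appear in the image.
    candidates = sorted(
        (math.dist(p, c), i, j)
        for (i, p), (j, c) in product(enumerate(parents), enumerate(children))
    )
    used_parents, used_children, pairs = set(), set(), {}
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if i in used_parents or j in used_children:
            continue
        pairs[i] = j
        used_parents.add(i)
        used_children.add(j)
    return pairs


# Step 1: all key points in the image, grouped by joint type (toy data)
necks = [(100, 70), (300, 80)]
heads = [(305, 50), (98, 40)]

# Step 2: match and connect for each person
neck_to_head = greedy_match(necks, heads, max_dist=60)
people = [{"neck": necks[i], "head": heads[j]}
          for i, j in sorted(neck_to_head.items())]
print(people)
```

When two people stand close together, the distances between their joints become similar, and a simple rule like this one can easily connect a head to the wrong neck. That is the kind of false detection mentioned above.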

Top-down type

The top-down type is a model that detects the joint points of a person and estimates the pose using the following procedure.

1: Detect people with object detection algorithm
2: Estimate pose for each person

Since the pose is estimated separately for each person detected in step 1, poses can be estimated with higher accuracy even in images where people overlap.
However, as you can imagine from this procedure, two processes are performed, person detection and pose estimation, so computation time becomes a problem.
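As a rough sketch of the top-down procedure, the code below wires the two steps together. `fake_detector` and `fake_estimator` are hypothetical stand-ins for real models, and the image is just a nested list, so this shows only the control flow, not an actual implementation.

```python
def top_down_pose(image, detect_people, estimate_pose):
    """Step 1: detect people. Step 2: estimate a pose inside each detected box."""
    results = []
    for (x, y, w, h) in detect_people(image):
        crop = [row[x:x + w] for row in image[y:y + h]]
        local = estimate_pose(crop)  # joint name -> (x, y) in crop coordinates
        # Shift the joints back into full-image coordinates.
        results.append({name: (px + x, py + y)
                        for name, (px, py) in local.items()})
    return results


# Hypothetical stand-ins for real models, used only to exercise the flow.
def fake_detector(image):
    return [(0, 0, 4, 4), (4, 0, 4, 4)]  # two person boxes as (x, y, w, h)


estimator_calls = []


def fake_estimator(crop):
    estimator_calls.append(1)
    height, width = len(crop), len(crop[0])
    return {"center": (width // 2, height // 2)}


image = [[0] * 8 for _ in range(4)]  # dummy 8x4 "image"
poses = top_down_pose(image, fake_detector, fake_estimator)
print(poses, len(estimator_calls))
```

The pose estimator runs once per detected person, so the work grows with the number of people in the image. This is where the computation time issue comes from.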

Fast! Accurate! A pose estimation model

I have explained two types, the bottom-up type and the top-down type, but of course there are models that have the advantages of both types.

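So this time, from among the best-of-both-worlds pose estimation models that offer "relatively low learning cost" and "the potential for higher accuracy", I will introduce a Deep Learning model called Pose Proposal Network.

Paper URL: http://taikisekii.com/PDF/Sekii_ECCV18.pdf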

If we classify it using the types introduced in the previous section, Pose Proposal Network falls under the top-down type: it detects people with an object detection algorithm, then detects the joint points of each person, and estimates the pose by connecting those joint points.
Being top-down, it is a computationally heavy approach, but it incorporates the following two ideas.

✓	The deep learning network that detects people is based on the object detection algorithm YOLO v3, which offers "fast learning and high accuracy"
✓	Joint point detection is based on OpenPose, which learns the connection information between joint points in addition to their coordinates, and which offers "a simplified network structure" and "highly accurate joint point detection"

With these two innovations, it is possible to reduce the computational cost of training to some extent.

Source: “Pose Proposal Networks”,
Caption: “Fig. 2. Pipeline of our proposed approach. Pose proposals are generated by parsing RPs of person instances and parts into individual people with limb detections (cf. § 3).”
http://taikisekii.com/PDF/Sekii_ECCV18.pdf

Impressions from running Pose Proposal Network

I actually ran training using the Pose Proposal Network algorithm.

The dataset used is “MPII Human Pose”, which is described in the paper and includes 24,000 images and over 40,000 annotations (coordinate data for people's joint points).
This time, we used YOLOv3 based on MobileNet v2 and ResNet as the network structure for object detection.
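This article does not describe how accuracy was measured, but one metric commonly reported on MPII Human Pose is PCKh, which counts a joint as correct when the prediction lies within a fraction of the head size from the annotated coordinate. Below is a minimal sketch with made-up coordinates; the function name and the choice to count missing predictions as misses are my own assumptions.

```python
import math


def pckh(predicted, ground_truth, head_size, alpha=0.5):
    """Fraction of annotated joints predicted within alpha * head_size."""
    if not ground_truth:
        return 0.0
    threshold = alpha * head_size
    correct = sum(
        1
        for name, target in ground_truth.items()
        # A joint with no prediction is counted as a miss (assumption).
        if name in predicted and math.dist(predicted[name], target) <= threshold
    )
    return correct / len(ground_truth)


# Toy annotation and prediction for one person (illustrative values only)
ground_truth = {"head": (100, 40), "neck": (100, 70),
                "right_wrist": (65, 140), "left_wrist": (135, 140)}
predicted = {"head": (102, 41), "neck": (100, 75),
             "right_wrist": (90, 150), "left_wrist": (136, 139)}

score = pckh(predicted, ground_truth, head_size=30)
print(score)
```

Here three of the four joints fall within 15 pixels of the annotation, so the score is 0.75.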

When I actually ran the training for model generation, I felt that the network was light, that is, the training time was relatively short.
Furthermore, the combination of YOLOv3 and OpenPose does not make the network more complex, so the size of the model itself is smaller than models trained with other algorithms.

Possibilities of Pose Estimation Models Seen from Application Examples

Now that we have a general understanding of what pose estimation models are, let's consider the possibilities of pose estimation models from specific application examples.
As I mentioned earlier, this model has been applied to various fields, but it is specifically used for the following purposes.

・Detection of pedestrians in the field of autonomous driving
・Use for movement analysis and scoring methods in the world of sports and dance
・Monitoring for suspicious movements of people for security purposes
・Characteristic analysis of group behavior from posture information of multiple people (flow line analysis, etc.)

These applications are expected to develop further, and in the future the technology may be used for behavior analysis and applied to robots.
It is also possible that it will become easier to apply AI to more familiar issues. For example, it is conceivable that AI will replace humans in many of the tasks of monitoring video for long periods of time to ensure safety.
Furthermore, the pose estimation model is characterized by its ability to detect joint points such as the head, arms, hips, and knees in detail, and can be used for product development and the succession of craftsmanship.
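As one small example of what detailed joint points make possible, here is a sketch that computes the angle at a joint (for instance, the elbow) from three detected points. A value like this could feed into movement analysis or scoring. The function name and coordinates are assumptions for illustration.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joint points must not coincide")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# Toy coordinates: shoulder, elbow, wrist
bent_arm = joint_angle((0, 0), (0, 10), (10, 10))
straight_arm = joint_angle((0, 0), (0, 10), (0, 20))
print(bent_arm, straight_arm)
```

Tracking how such angles change from frame to frame is one simple way to compare a beginner's movement with that of an experienced craftsperson or athlete.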

Summary

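This time, I introduced pose estimation models, whose range of uses expands depending on your ideas.
"Depending on your ideas"... for me, that is a point that sounds easy but is actually hard.
To be able to think flexibly when it counts, I want to read papers and case studies from a wide range of genres, listen to the opinions of many different people, or even watch fantasy movies and lose myself in daydreams...
I decided to use various parts of my brain on a daily basis!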

Sources of AI papers featured in this article / Reference Lists
Taiki Sekii. “Pose Proposal Networks” http://taikisekii.com/PDF/Sekii_ECCV18.pdf

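At Macnica's ARIH (AI Research & Innovation Hub), we evaluate cutting-edge AI through research, investigation, and implementation, then provide insights that combine the most suitable AI technologies to guide companies toward the best solutions for their challenges.
Please see below for details.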

Click here for details on ARIH