Introduction

The spread of COVID-19 has changed our lifestyles significantly, and when lifestyles change, attitudes toward life change as well.
Personally, I feel that society's awareness of "healthcare" has shifted over the past year. I myself now manage more biometric data on my devices than before and have become more health conscious.

So this time, I, Makky of the AI Girls Club, will cover the theme of "gaze estimation," which deals with gaze, one of the kinds of data that can be obtained from humans.

This article is recommended for those who:

  • want to see examples of how gaze data is used
  • want to see application examples of gaze estimation models

Time needed to read this article

5 minutes

What kind of task is gaze estimation?

Gaze estimation is the task of predicting where a person is looking, given an image or video of the person's face.
Gaze is an important nonverbal communication cue and carries a wealth of information about human intentions. For this reason, many studies on acquiring and utilizing gaze were conducted even before deep learning came into use.
Deep learning is now being applied to gaze estimation tasks, yielding more robust and accurate results than traditional methods.

Gaze estimation methods using deep learning include those that estimate the gaze vector in 3D and those that estimate the gaze position in 2D.

3D gaze vector estimation predicts the direction of gaze as a three-dimensional vector; it is used, for example, to detect distracted driving.
2D gaze position estimation predicts the gaze position on a two-dimensional plane (horizontal and vertical coordinates); it has been used for engagement analysis based on gaze points and, more recently, for gaze-based controllers.
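To make the relationship between the two settings concrete, here is a minimal sketch in Python. It assumes one common pitch/yaw angle convention (conventions differ between datasets and models, so treat this as illustrative): a 3D gaze direction is built from pitch and yaw, and a 2D gaze point is obtained by intersecting that gaze ray with a screen plane.

```python
import math

def gaze_angles_to_vector(pitch: float, yaw: float) -> tuple:
    """Convert gaze pitch/yaw (radians) into a unit 3D gaze vector.

    Assumed convention: the camera looks along -z, yaw rotates around
    the vertical axis, pitch around the horizontal axis.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def gaze_vector_to_screen_point(origin, direction, screen_z=0.0):
    """Intersect the gaze ray origin + t * direction with the plane
    z = screen_z, giving a 2D on-screen gaze point (x, y)."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    t = (screen_z - oz) / dz  # assumes the ray is not parallel to the screen
    return (ox + dx * t, oy + dy * t)
```

For example, an eye at 0.5 m from the screen looking straight ahead (pitch = yaw = 0) maps to the point directly in front of it, which is why a 3D gaze vector can be reduced to a 2D gaze position once the screen geometry is known.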

Gaze Estimation Technology Utilized to Improve Life

Gaze measurement was tackled with eye-tracking technology even before deep learning came into use.
However, with the adoption of deep learning and recent social changes, research and commercial efforts that make use of gaze data have increased. Gaze data can now even be collected with a smartphone, so gaze estimation has become a familiar task that is ever more closely tied to our lives.

In the following, we will introduce research and case studies related to gaze estimation.

Livelihood support

Understanding how residents interact with the objects around them is necessary for effective assisted living environments.

For example, in the world of nursing care there is a term, IADL (Instrumental Activities of Daily Living). It is an index of whether a person can independently perform everyday tasks such as shopping, cooking, and managing finances.

Internationally, there is a project that analyzes these IADL patterns and the mobility of patients who require assistance, in order to help clinicians assess the health of people in assisted living settings such as nursing homes. The project also makes use of gaze data: the team is developing a model that can estimate gaze direction even in wide-area images and is working to apply that gaze data to IADL analysis.

Source: Gaze Estimation for Assisted Living Environments
Caption: Figure 4. Examples of gaze.
Figure 7. Examples of results for our gaze.
https://arxiv.org/pdf/1909.09225.pdf

NET (ours) represents the gaze direction predicted by the model developed in this project; it can be seen to point in the correct direction more often than the other models.

Education

In recent years, many studies and case studies that utilize gaze data to understand human behavior and situations have been published.

For example, when people work on a multiple-choice* reading comprehension task, fixation time was found to increase on the parts of the text most relevant to answering the question.
An automatic reading comprehension framework that makes use of this finding has reportedly improved reading comprehension performance over previous approaches.
(*A format that presents several options and asks the respondent to choose the appropriate one.)

In this way, human gaze data is thought to be closely tied to cognitive activities such as "understanding things." With the recent spread of online classes and remote work, I think the possibilities for applying it in education are also expanding.

Therefore, referring to several published papers, we created a demo that uses a web camera to measure the degree of concentration during a task.

Although we still need to account for each individual's learning characteristics and the effects of the task at hand, I feel this demo hints at how useful such a system could become.
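As a rough illustration of what such a demo might compute (this is my own minimal sketch, not the actual demo code; the dispersion and duration thresholds are arbitrary assumptions), gaze samples from a webcam-based estimator can be grouped into fixations with a simple dispersion threshold, and the share of time spent fixating can serve as a crude concentration proxy:

```python
def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """Dispersion-threshold (I-DT style) fixation detection.

    samples: list of (t, x, y) gaze samples; t in seconds, x/y in pixels.
    Returns a list of (start_t, end_t) fixation intervals.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        j = i
        xs, ys = [samples[i][1]], [samples[i][2]]
        # Grow the window while the gaze points stay tightly clustered.
        while j + 1 < n:
            nx, ny = samples[j + 1][1], samples[j + 1][2]
            disp = (max(xs + [nx]) - min(xs + [nx])) + (max(ys + [ny]) - min(ys + [ny]))
            if disp > max_dispersion:
                break
            xs.append(nx)
            ys.append(ny)
            j += 1
        if samples[j][0] - samples[i][0] >= min_duration:
            fixations.append((samples[i][0], samples[j][0]))
            i = j + 1
        else:
            i += 1
    return fixations

def concentration_score(samples, **kw):
    """Fraction of total time spent in fixations (a very crude proxy)."""
    fix = detect_fixations(samples, **kw)
    total = samples[-1][0] - samples[0][0]
    return sum(e - s for s, e in fix) / total if total > 0 else 0.0
```

A real system would of course need per-person calibration and task-aware interpretation, as noted above, but this shows how little machinery is needed to turn raw gaze samples into a first behavioral signal.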

Entertainment

I think more people are watching movies, dramas, and anime now that they spend more time at home. I like movies myself and watch one every day, so I will introduce some examples related to them.

Recently, a new dataset was released aimed at studying visual gaze and fixation patterns during movie watching.

In addition to point-of-regard and saliency data (saliency: the property of a visual stimulus that draws attention in a bottom-up manner), the dataset provides film-specific features such as camera movements and angles, framing sizes, and the temporal positions of cuts and edits.
This suggests that a model could capture high-level cinematic features and the relationship between those features and gaze.

Source: Where to look at the movies: Analyzing visual attention to understand movie editing
Caption: Figure 1: Examples of different camera angles. Figure 2: The nine framing sizes.
Figure 3: Examples of saliency heatmaps created from the collected fixation points.
https://arxiv.org/pdf/2102.13378.pdf
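As an aside, a saliency heatmap of the kind shown in Figure 3 can be approximated from collected fixation points in just a few lines. This is a generic sketch (not the authors' code): it places one isotropic Gaussian at each fixation point and normalizes the result, with the bandwidth `sigma` as an assumed free parameter.

```python
import numpy as np

def fixation_heatmap(points, height, width, sigma=20.0):
    """Sum one Gaussian per fixation point, then normalize to [0, 1].

    points: iterable of (x, y) fixation coordinates in pixels.
    Returns an array of shape (height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float64)
    for px, py in points:
        heat += np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
    if heat.max() > 0:
        heat /= heat.max()  # scale so the strongest hotspot is 1.0
    return heat
```

Averaging such heatmaps over many viewers is what produces the smooth attention maps seen in the figure.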

Until now, even state-of-the-art gaze point prediction models have struggled to capture and use high-level movie features.

Movies contain non-static information (such as the director's creative choices of camera angles and shots) that is sometimes more important than static visual content in capturing attention.

In fact, in the example from "The Shawshank Redemption" (a masterpiece; I love it) below, the GroundTruth gaze (the actual, correct gaze data) shifts to the poster as the camera moves, while the results of ACLNet, Zhang, DeepGazeII, and MSINet differ from it.
These models were trained on datasets unsuited to learning cinematic features, and they appear to fail to extract the important temporal features of movies.

Source: Where to look at the movies: Analyzing visual attention to understand movie editing
Caption: Figure 9: An example of failure case in ShawshankRedemption.
https://arxiv.org/pdf/2102.13378.pdf

We believe this work could help extract contextual, high-level cinematic information from video, which should bring significant benefits to multiple areas of image processing, such as video compression for streaming and automatic video summarization.

Lastly

This time, I introduced three gaze-related examples and studies, drawn from a demo and from published papers.

Gaze, which is closely related to the field of Brain Tech, will likely see more use as one source of multimodal information in the future.
Recently, gaze data has been used across academic fields such as sociology, biology, and medicine, for example in studies of the relationship between eye contact and communication, and of the relationship between gaze and brain activity under natural stimuli.

I cannot help but look forward to such research being put to use to improve our daily lives in the near future.

 

■ Sources of content and papers introduced on this page / References

Philipe A. Dias, Damiano Malafronte, Henry Medeiros, Francesca Odone, "Gaze Estimation for Assisted Living Environments", Figure 4: Examples of gaze; Figure 7: Examples of results for our gaze.
https://arxiv.org/pdf/1909.09225.pdf

Alexandre Bruckert, Marc Christie, Olivier Le Meur, "Where to look at the movies: Analyzing visual attention to understand movie editing", Figure 1: Examples of different camera angles; Figure 2: The nine framing sizes; Figure 3: Examples of saliency heatmaps created from the collected fixation points; Figure 9: An example of failure case in Shawshank Redemption.
https://arxiv.org/pdf/2102.13378.pdf

 
