Glossary

AI Glossary

A-E

Adversarial Attack

An adversarial attack (also known as an adversarial example) causes an image that artificial intelligence recognizes as A to be misrecognized as B by adding a small, carefully chosen perturbation. A famous example adds noise to an image of a panda, causing an image that still looks like a panda to humans to be recognized as a gibbon by the AI. Beyond altering the data itself, physical alterations to road signs can also make a "stop" sign difficult to recognize as a stop sign. Even when new defense methods are developed to prevent adversarial attacks, a cat-and-mouse game continues in which new attack methods that evade them are devised. AI is advancing day by day, but an adversarial attack on an autonomous driving car could lead to misrecognition and an accident, so models that are robust against attacks are needed.
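As a rough illustration, here is a minimal sketch of the fast gradient sign method (FGSM), one well-known way to craft an adversarial example; the model, image tensor, and epsilon value are hypothetical placeholders, and the image is assumed to be scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, epsilon=0.01):
    """Craft an adversarial image by stepping in the direction of the loss gradient's sign."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # A small perturbation in the sign of the gradient can be enough to flip the prediction.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```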

AGI

Artificial general intelligence (AGI) refers to artificial intelligence that "has human-like sensibilities and thinking circuits" as opposed to specialized AI. It is “strong AI” as described by the American philosopher John Searle, who proposed the definition of weak AI and strong AI.
An AGI would have the ability to feel and think like humans. For example, it could understand emotions such as laughing when happy or crying when sad, and not merely imitate them but also decide on different actions after thinking in its own way. It would be an artificial intelligence close to the human mind while possessing abilities that exceed those of ordinary humans.
To realize AGI, an AI engine is needed that can handle a wide range of tasks, not just simple or specific ones, and that can adapt to complex situations. It embodies the grand vision of "making AI engines think as quickly as humans, react and solve problems instantly, and handle almost any task a human can do."

AI

AI is an abbreviation of Artificial Intelligence, and refers to the field of study that aims to artificially reproduce human intelligence with computers.
However, there is no clear definition, and the meaning of the word is not unified among researchers and experts.
The term "Artificial Intelligence" was coined at the Dartmouth Conference in 1956.
Recently, it has become possible to apply it to various businesses, and it is attracting attention.
There are two types of AI: strong AI and weak AI. Strong AI can think like humans, and weak AI is made for a specific purpose.
Strong AI does not yet exist in the world and is a field that is being researched.

AI speaker

An AI speaker is a type of smart home appliance equipped with AI (artificial intelligence) technology. Smart home appliances are home appliances that are connected to the Internet to improve convenience. The AI speaker is a convenient device that, when the user verbally asks the AI speaker about tomorrow's weather forecast or how to prepare food, will look it up on the Internet and answer by voice.
AI speakers have three important features: "the AI assistant lives in the cloud," "there is no AI in the AI speaker itself," and "it can hold conversations." The main unit does not perform information processing or calculation; all of that is handled by the AI assistant running on server computers in the cloud.
In addition to a microphone that picks up the user's voice and a speaker that outputs the spoken answer, an AI speaker has built-in "front-end processing" and "network processing" functions. Front-end processing converts the user's voice into data that can be sent over the Internet, and converts the content returned to the AI speaker into speech that humans can understand. Network processing is responsible for sending and receiving the data.

Auto Encoder

An autoencoder is one of the methods of unsupervised machine learning using a neural network. It originally appeared for the purposes of dimensionality reduction and feature extraction, but in recent years it has also been used as a generative model. Autoencoders are used to obtain feature representations with a reduced amount of information, and are one of the effective techniques for dimensionality reduction.
Since the autoencoder network has a structure in which the dimensionality of the input data is lowered once and then returned to the output again, simple copying from the input to the output is impossible. In the learning process of the autoencoder, the weight of each edge is adjusted so that the input and output match. Through this learning, a network is formed that extracts only the important information necessary for restoration from the data and efficiently generates the original data from them. In this way, the first half of the autoencoder acquires the functions of dimensionality reduction and feature extraction, and the second half acquires the function of generating data using low-dimensional information as a source.
There are four types of autoencoders:

  • (1) Multilayer autoencoder
  • (2) Convolutional autoencoder
  • (3) Variational autoencoder
  • (4) Conditional variational autoencoder
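As an illustration, here is a minimal sketch of a basic (fully connected) autoencoder in Keras; the 784-dimensional input (flattened 28x28 images) and the 32-dimensional code are arbitrary example choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# The encoder compresses the 784-dimensional input to a 32-dimensional code,
# and the decoder tries to reconstruct the original input from that code.
autoencoder = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),      # encoder: dimensionality reduction
    layers.Dense(784, activation="sigmoid"),  # decoder: reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)  # note: input and target are the same data
```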

Auto-ML

Auto-ML (Automated Machine Learning) is a technology that automates time-consuming, repetitive tasks such as data preprocessing, method selection, and model validation. Machine learning requires preprocessing to organize raw data, selection of algorithms, optimization of hyperparameters, and other work, and these tasks are a major burden. Automating them makes data analysis more efficient, helps eliminate biases from human intuition, and supports engineers who have little experience with machine learning. Auto-ML applications include sales, financial services, and healthcare, and Auto-ML offerings are provided by Google, Microsoft, IBM, and others.

CNN

Convolutional Neural Networks (CNNs for short) are one of the most commonly used neural networks for recognizing patterns and objects in images in computer vision. A major feature is that data is converted using filters in the convolution layer.
Currently, most of the neural networks used for image recognition and motion detection are called convolutional neural networks, and unlike general neural networks, CNNs are made up of convolutional and pooling layers.
A feature map is created from the features extracted in the convolution layer, and the feature map is then summarized in the pooling layer: the map is divided into small windows and the maximum value is taken within each window (max pooling).
This pooling makes it possible to detect features in an image without worrying about changes in the exact position of the features.
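A minimal sketch of a small CNN in Keras; the input shape and layer sizes are arbitrary examples for something like 28x28 grayscale images.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution layers extract local features with filters; pooling layers downsample the feature maps.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # e.g. a 10-class image classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```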

CPU

A CPU (Central Processing Unit) is the part responsible for all the processing of a computer. It receives data from the mouse, keyboard, hard disk, memory, peripherals, etc. and controls their movements. Every computer has one. Since the CPU can execute many types of instructions, it can easily handle a wide variety of applications, and is good at responding flexibly (conditional branching) according to the situation.
To improve performance, CPU vendors and developers are devising multi-core CPUs that carry multiple processor cores, and adding accelerators, hardware that supports faster processing. By prioritizing specific applications and preferentially allocating resources to their processing, it is possible to increase computation speed and the speed at which the CPU reads data from memory. Still, in general, the CPU cannot match the raw processing power of the GPU.

cross validation

Cross validation is a technique in statistics that divides sample data, first analyzes part of it, tests the remaining part, and verifies and confirms the validity of the analysis itself. Cross validation is a method of evaluating the "goodness" of data analysis methods, and is often used to evaluate the goodness of machine learning and deep learning methods.
Cross validation is necessary when performing machine learning on relatively small datasets for the following reasons.
(1) If the entire dataset is used for learning, its generalization performance cannot be measured.
(2) Unless the data used for training and validation is rotated (crossed), it cannot be confirmed that the trained learner is not biased toward a particular split.
Cross-validation procedure:
1. Divide the data.
2. Apply the analysis method to part of the data (training).
3. Evaluate the "goodness" of the method on the remaining part (test).
4. Shift which part is held out and repeat steps 2 and 3.
5. Evaluate the "goodness" of the analysis method from the results of the multiple tests.
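A minimal sketch using scikit-learn's cross_val_score; the iris dataset and logistic regression model are arbitrary example choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross validation: the data is split into 5 parts, and each part is used once as the test set.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```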

Data Augmentation

Data augmentation is a technique that enlarges a dataset by transforming existing data when the data required for deep learning is not sufficient for adequate learning. Using this technique reduces the manual work required for development and can significantly improve the quality of the data. On the other hand, overfitting can also occur, in which the original data can no longer be processed properly, so when amplifying data it is important that the augmented data does not deviate too far from the raw data. There are seven main types of data augmentation in the field of image processing: (1) flipping on the horizontal or vertical axis, (2) horizontal or vertical shifting, (3) rotation, (4) enlargement or reduction, (5) cropping, (6) adding Gaussian noise, and (7) random color manipulation. There are mainly four types of data augmentation in the field of natural language processing: (1) synonym replacement, (2) random synonym insertion, (3) random word movement, and (4) random word deletion.
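As an illustration, here is a minimal sketch of image augmentation using torchvision transforms; the specific transforms and their parameters are arbitrary examples.

```python
from torchvision import transforms

# A pipeline that randomly flips, rotates, crops, and color-jitters each training image.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied on the fly each time a sample is loaded
```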

Django

Django (pronounced "jang-goh," with a silent D) is an open source framework for web development released in 2005. A framework is software that bundles functions that are often needed when developing an application. By introducing a framework, it is possible to develop a wide range of web applications and to proceed with development efficiently.
Django is the most popular Python web framework and is well suited for large-scale web application development. Django provides web functions such as sitemaps, user authentication, and RSS feeds, and is designed to simplify web system development. It is also used in well-known web applications such as Instagram and Pinterest.
The advantages of using Django during development are as follows.

  • (1) High-speed operation
  • (2) Full-stack framework (equipped with many useful functions)
  • (3) Secure design
  • (4) Ease of maintenance
  • (5) Freely selectable platform
  • (6) Low learning cost
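As a rough sketch of how compact a minimal Django view is (the file layout follows Django's usual conventions; the project setup itself is omitted):

```python
# views.py
from django.http import HttpResponse

def index(request):
    # Django routes an incoming HTTP request to this function and sends back the response.
    return HttpResponse("Hello from Django")

# urls.py (hypothetical wiring for the view above)
# from django.urls import path
# from . import views
# urlpatterns = [path("", views.index)]
```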

Dropout

Dropout is a technique for preventing overfitting and improving accuracy by deactivating a certain percentage of nodes while training a neural network. Dropout randomly sets the outputs of a given layer to 0 during training, enabling correct recognition even if some of the data is missing. This prevents overestimation of particular local features of the image and improves the robustness of the model. When performing inference after training, all the units in the network are used, and their outputs are scaled according to the proportion of units that were dropped during training.
It is said that the reason for Dropout's high performance is that it is an approximation of a method called "ensemble learning."
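A minimal sketch of dropout in a Keras model; the layer sizes and the 0.5 dropout rate are arbitrary examples.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),   # randomly zeroes half of the activations, during training only
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```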

End-to-end learning

End-to-end learning replaces a machine learning system that used to require multiple stages of processing between the input data and the output of the results with a single model trained as a whole. In other words, it is a learning method in which only the input and the output are given, and everything that happens in between is learned.
Taking OCR as an example, the conventional structure divides the intermediate processing of the input image into detailed subtasks that finally lead to character recognition, whereas end-to-end learning learns all of that intermediate processing as well. However, it has the disadvantage of requiring very large datasets, so it is important to use it where appropriate.

F-J

GANs

Generative Adversarial Networks (GANs) are a type of generative model that can learn features from data in order to generate data that does not exist, or to transform data along the features of existing data. They are generative models, as opposed to classification/prediction models.
GANs are attracting attention as a method of unsupervised learning that learns features without giving correct data. Due to the flexibility of its architecture, it can be applied to a wide range of areas depending on the idea. Applied research and theoretical research are also progressing rapidly, and there are great expectations for future development.
Image generation is a well-known application of GAN, but it is also attracting attention as a technology that supplements deep learning in terms of data generation. It can be applied to deep learning, where lack of data tends to be a problem, by creating new data that includes features, instead of the conventional method of increasing data by tilting sample images or changing colors.
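A highly simplified PyTorch sketch of the GAN training idea; the network sizes, the stand-in "real" data, and the hyperparameters are placeholder assumptions, and a real implementation would need a proper dataset and many more details.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))       # generator
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) + 3.0   # stand-in for samples of real data
    fake = G(torch.randn(64, latent_dim))    # data generated from random noise

    # The discriminator learns to label real data 1 and generated data 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # The generator learns to fool the discriminator into outputting 1 for its samples.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```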

GPUs

GPU is an abbreviation for graphics processing unit.
In a personal computer, the brain that performs various calculations is the CPU, but when it comes to drawing graphics, the GPU assists the CPU.
3D graphics is a technology for rendering images that appear three-dimensional by adding depth to a two-dimensional screen, and requires calculation of coordinate positions and pixel data. The GPU performs this calculation. As you can see from recent movies and games, improvements in GPU and 3D graphics technology have made it possible to achieve very realistic images.
Many deep learning libraries and frameworks can run on CPUs, but running them on GPUs can greatly improve processing speed. The grade of the GPU chip therefore matters most, and the memory capacity and the number of GPUs should then be increased according to the volume of training data.

Grad CAM

Grad-CAM (Gradient-weighted Class Activation Mapping) is, in a nutshell, a technique that visualizes the important pixels by weighting feature maps with the gradients of the predicted value. Used in the field of image recognition, it is a local explanation method that presents the basis for a classification.
Grad-CAM displays the image regions that the CNN attends to for classification as a color map.
The idea behind Grad-CAM is that the parts that contribute strongly to the output value of the predicted class are important for the classification, so the gradient of the class score with respect to the final convolutional feature maps is used as the weighting.
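A condensed PyTorch sketch of the Grad-CAM computation on a ResNet-18 backbone; the untrained model and random input are placeholders (in practice you would load pretrained weights and a real image).

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18().eval()     # placeholder: pretrained weights would be loaded in practice
x = torch.randn(1, 3, 224, 224)      # placeholder input image tensor

# Run the convolutional part to get the last feature maps, keeping their gradient.
features = torch.nn.Sequential(*list(model.children())[:-2])
feats = features(x)                                   # shape (1, 512, 7, 7)
feats.retain_grad()

logits = model.fc(model.avgpool(feats).flatten(1))    # finish the forward pass with the original head
class_idx = int(logits.argmax())
logits[0, class_idx].backward()                       # backpropagate the predicted class score

weights = feats.grad.mean(dim=(2, 3), keepdim=True)   # global-average the gradients per channel
cam = F.relu((weights * feats).sum(dim=1))            # weighted sum of feature maps + ReLU
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear", align_corners=False)
print(cam.shape)                                      # a 224x224 heat map over the input image
```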

K-O

Keras

Keras is a high-level neural network library written in Python. It can be used to implement models with short code even without deep expertise in machine learning or programming, and programs are easier to write than with lower-level libraries such as TensorFlow and Theano. Keras supports convolutional and recurrent neural networks in addition to standard neural networks.
Using Keras, it is possible to generate sentences automatically, recognize images, and create bots, and companies such as Netflix and Uber use Keras.
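A minimal Keras sketch of defining, compiling, and inspecting a small classifier; the layer sizes are arbitrary examples.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected classifier defined in a few lines of Keras code.
model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=10)  # training would use data prepared elsewhere
```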

k-nearest neighbor method

The k-nearest neighbor method is one of the classification algorithms of machine learning and is considered the simplest of all. It is basically used for "supervised learning," in which data is given to the computer in advance for learning.
It is a method used for classification: the given training data is plotted in a vector space, and when unknown data arrives, the k nearest items are taken in order of increasing distance, and the class is inferred by a majority vote among them. If there are only two explanatory variables, the data can be plotted in two dimensions, which makes the method more intuitive.
The advantage of the k-nearest neighbor method is that it can be used for any kind of data, and the model can be constructed quickly. Applications include making a computer recognize handwritten characters and predicting customer purchasing intentions from data.
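A minimal scikit-learn sketch of k-nearest neighbor classification; the iris dataset and k=5 are arbitrary example choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classify each test point by a majority vote among its 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```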

Matplotlib

Matplotlib is an external library that allows visualization of data. It is open source and available for anyone to use, whether for personal or commercial use.
Matplotlib allows you to draw many different types of graphs. It is mainly a two-dimensional graph, but it is also possible to draw a three-dimensional graph. It is often used in combination with NumPy, a numerical calculation library, and Jupyter Notebook can be used to graphically express data analysis results together with source code to create highly explanatory reports. Pandas can also visualize data, but Matplotlib can be used to achieve more complex displays.
In machine learning it is used for many purposes, such as visualizing statistics, graphing learning progress, and outputting images. It is also possible to draw histograms and scatterplots, and to generate interactive graphs using JavaScript.
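A minimal Matplotlib sketch that draws a line plot and a scatter plot on the same axes:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

plt.plot(x, np.sin(x), label="sin(x)")                       # line plot
plt.scatter(x[::10], np.cos(x[::10]), label="cos samples")   # scatter plot
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```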

Meta-learning

Meta-learning is a subfield of machine learning (ML) that applies automated learning algorithms to the metadata of machine learning experiments. In 1979, D. B. Maudsley defined meta-learning as "the process by which a learner becomes aware of and, over time, gains control over habits of perception, inquiry, learning, and growth that have been acquired." Humans can learn new things from little information, because humans understand information flexibly and identify what is necessary. Conventional machine learning methods used large datasets for learning and re-learned on new, smaller datasets to obtain better results. Meta-learning, on the other hand, improves the efficiency of subsequent learning by feeding back multiple learning results and learning processes, just as humans learn flexibly. Because meta-learning learns and guides the learning algorithm itself, it is often paraphrased as "learning to learn." Meta-learning is a very important learning method for the further development of AI.

Metric Learning

Metric learning (distance learning) is a method of learning feature representations in which distances between data are meaningful. By arranging data belonging to the same class close together and data belonging to different classes far apart, it becomes easy to identify the data. The advantages of this method are that feature values that take semantic distance into account can be learned, and that you can choose which relationships between features to emphasize. A feature value is data that quantifies an attribute, for example sweetness, sourness, saltiness, bitterness, and umami when targeting the sense of taste. For example, when focusing on the sweetness and sourness of foods A, B, and C of the same kind, if the distance between A and B is smaller than the distance between A and C (AB < AC), B is considered to resemble A more than C does. In addition, since similarity varies greatly depending on how the distance is measured, such as the Euclidean distance or the Manhattan distance, the measurement method must be chosen with the purpose and problem setting in mind. A face recognition system is one application of metric learning: it becomes possible to recognize an individual by learning and identifying the shape of that individual's face.
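A small NumPy sketch of how the choice of metric changes the distances; the three 2-dimensional feature vectors (e.g. sweetness and sourness) are made-up examples.

```python
import numpy as np

# Hypothetical feature vectors for foods A, B, and C.
A, B, C = np.array([1.0, 2.0]), np.array([1.5, 2.5]), np.array([4.0, 0.0])

def euclidean(p, q):
    return np.linalg.norm(p - q)   # straight-line distance

def manhattan(p, q):
    return np.abs(p - q).sum()     # sum of per-axis differences

print(euclidean(A, B), euclidean(A, C))  # AB < AC, so B resembles A more than C does
print(manhattan(A, B), manhattan(A, C))  # rankings can change depending on the metric chosen
```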

NAS(Neural Architecture Search)

NAS is one of the frameworks of Auto-ML (Automated Machine Learning) and is a technique for optimizing the architecture of artificial neural networks. By optimizing the architecture, it is possible to obtain performance equal to or better than that of hand-designed networks. NAS is mainly composed of the following three parts.
(1) Search space: defines the candidate neural network architectures to consider.
(2) Optimization method: determines how to explore the search space to find better architectures.
(3) Evaluation method: measures the quality of each architecture considered by the optimization method.

Numpy

NumPy is a package that plays a major role in data analysis, machine learning, and scientific computing with Python. NumPy is open source and available for free for personal and commercial use.
Using NumPy makes it possible to easily process multidimensional arrays such as vectors and matrices, and to perform such processing far faster than numerical calculation in pure Python.
Popular packages such as scikit-learn, SciPy, pandas, and TensorFlow are built on top of NumPy, so being comfortable with NumPy also helps when working with them.
NumPy is therefore used in programs that need to process multidimensional arrays such as vectors and matrices faster and more efficiently than plain Python.
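A tiny NumPy sketch of the kind of array operations described above:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a + b)           # element-wise addition
print(a @ b)           # matrix multiplication
print(a.mean(axis=0))  # column means
print(a.reshape(4))    # reshape the 2x2 matrix into a length-4 vector
```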

OpenAI Gym

OpenAI Gym is a platform provided by OpenAI, a non-profit organization researching artificial intelligence (AI) that was co-founded by Elon Musk. Several environments (games) such as the CartPole problem and block breaking are provided, and you can use them to learn reinforcement learning.
In addition to providing a common interface for reinforcement learning "agents" and "environments", various "environments" that can be used for learning reinforcement learning tasks are provided. Since the interface between the simulation environment and the reinforcement learning algorithm has been established, even beginners can easily learn reinforcement learning.
The features of OpenAI Gym are as follows.
(1) Simple environment interface
(2) Comparability
(3) Reproducibility
(4) Progress monitoring
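A minimal random-agent sketch on the CartPole environment; note that the exact return values of reset/step differ slightly between older gym releases and the newer gymnasium package, so this follows the classic API.

```python
import gym

env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(200):
    action = env.action_space.sample()                   # a random action; no learning yet
    observation, reward, done, info = env.step(action)   # classic 4-tuple step API
    if done:                                             # the pole fell or the episode timed out
        observation = env.reset()

env.close()
```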

OpenCV

OpenCV is an open source library for C/C++, Java, Python, and MATLAB with functions such as image processing/image analysis and machine learning. Basically it can be used for free, and since it is distributed under the BSD license, it can be used not only for academic purposes but also for commercial purposes.
There are many functions that support computer image recognition, and the following are examples of functions that can be used specifically.

  • Tracking of recognized objects/objects
  • Camera calibration to detect camera position and user posture
  • Draw lines and text on images
  • Machine learning for computers to understand patterns
  • Read, save, output, etc. of used data

Currently, OpenCV is used in applications such as reading signboards and automatically recognizing images of traffic signs.
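A minimal OpenCV sketch of loading an image, converting it to grayscale, drawing text on it, and saving the result; the file names are placeholders.

```python
import cv2

image = cv2.imread("sample.jpg")                  # load an image from disk (placeholder path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # convert to grayscale
cv2.putText(gray, "label", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, 255, 2)  # draw text on the image
cv2.imwrite("sample_gray.jpg", gray)              # save the result
```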

P-T

pandas

Pandas is a Python library that provides functions to support data analysis, using a "data frame" format that handles data manipulation quickly and efficiently. Pandas is open source and available for free for personal and commercial use.
Pandas makes it easy to perform data analysis tasks such as loading data, displaying statistics, and graphing. In addition, the main code is written in Cython or C language, so it can be processed very quickly compared to data analysis with Python alone.
Pandas has become essential for data analysis thanks to its rich functionality for table data, such as general table calculations, statistical calculations, data formatting, and input/output in various formats such as CSV. Pandas is also used in financial data analysis because it has time-series analysis functions well suited to handling financial data.
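A small pandas sketch of the kind of table manipulation described above; the table contents and file name are made-up examples.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "B", "A", "C"],
    "price": [100, 150, 120, 90],
})

print(df.describe())                           # summary statistics
print(df.groupby("product")["price"].mean())   # average price per product
df.to_csv("prices.csv", index=False)           # write the table out as CSV
```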

PCA

Principal component analysis (PCA) is a method of dimensionally compressing multidimensional data.
A large number of dimensions in a dataset increases the computational cost of both data analysis and machine learning, and makes it difficult to understand the data. As a solution, PCA is often used in the fields of statistics and machine learning.
Principal component analysis reduces the number of features in a dataset by extracting new features (principal components).
If we can reduce the dimensions to 3 or less, we can also visualize the data. Principal component analysis does not "select" features, but "extracts" new features. However, since "extraction" here is irreversible, you need to be aware that some information will be lost.
Principal component analysis PCA is used in various fields such as statistics, biology, and bioinformatics for the purpose of dimensionality reduction.
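A minimal scikit-learn sketch of PCA; the iris dataset (4 features) and the choice of 2 components are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 150 samples with 4 features each

pca = PCA(n_components=2)                # extract 2 new features (principal components)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (150, 2): now low-dimensional enough to visualize
print(pca.explained_variance_ratio_)     # how much variance each component preserves
```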

Python

Python is a programming language used in a wide range of areas such as embedded development, web development, AI, and education. Developed in 1991 by the Dutch programmer Guido van Rossum, it has in recent years become one of the most popular programming languages in the world, and has established itself as an essential language especially in AI development such as machine learning and deep learning. It is versatile, can be used to create a wide range of programs, and can handle both small and large development projects.
Python has three major features:
(1) Simple grammar
(2) Abundant, practical libraries
(3) Well-developed web frameworks

Pytorch

PyTorch is an open source machine learning library for Python. It was initially developed by Facebook's artificial intelligence research group, based on Torch, and is used in computer vision and natural language processing. Its popularity has skyrocketed in recent years due to its fast calculation speed and easy-to-read, easy-to-handle source code. Calculations are performed on Tensors, which are similar to NumPy's ndarrays (n-dimensional arrays), making full use of the high-speed matrix calculations that GPUs excel at. Another advantage is that the computational graphs needed for computation are constructed dynamically as the neural network runs.
Many researchers implement and present the contents of recent papers using PyTorch, and PyTorch makes it easy to obtain implementation examples of typical deep learning methods.
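A tiny PyTorch sketch of tensors and automatic differentiation:

```python
import torch

# Tensors behave much like NumPy arrays but can live on a GPU and track gradients.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x ** 2).sum()

y.backward()     # the computation graph is built dynamically and then backpropagated
print(x.grad)    # d(sum(x^2))/dx = 2x
```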

Random Forest

Random forest is a machine learning technique that can be applied to problems such as classification and regression.
It is a type of ensemble learning, and it is called a "random forest" because random sampling is performed on the input data and multiple decision trees are created as weak learners.
It is considered one of the white-box models because the importance of each feature can be visualized.
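A minimal scikit-learn sketch of a random forest classifier; the dataset and the number of trees are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 decision trees, each trained on a random sample of the data and random subsets of features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.feature_importances_)   # the importance of each feature can be inspected
```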

ResNet

ResNet is a neural network model devised in 2015 by Kaiming He of Microsoft Research.
As of 2015, it was generally known that increasing the number of CNN layers in image recognition made it possible to acquire higher-level features, but there was a problem that performance deteriorated when layers were simply stacked. ResNet was proposed to solve this. ResNet addresses the vanishing gradient problem by introducing a mechanism called the shortcut connection, which adds the input of an earlier layer directly to a later layer. With this skip structure, in which the input bypasses some layers and is fed to a later layer, the gradient is kept from vanishing or diverging, making an extremely deep network possible. It achieved a depth of 152 layers (compared with 22 layers in the previous year's winner, GoogLeNet) and became the winning model of ILSVRC 2015.
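A minimal PyTorch sketch of a residual (shortcut) block; the channel count is an arbitrary example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)   # shortcut connection: the input is added directly to the output

block = ResidualBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```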

RNN

A recurrent neural network (RNN) is a network designed to interpret time-series or sequential information. Activations computed at earlier points in the sequence are reused when producing the subsequent outputs.
RNNs are mainly used in the field of natural language processing. Conventional neural networks do not assume that input values are independent of each other. This assumption is fine for image processing and the like, but it is not appropriate for continuous input values such as language. In RNN, we made it possible to remember the previous input by incorporating a loop in the middle layer. Thanks to this, it is now possible to handle time-series data such as natural language processing, and it is mainly used for machine translation, sentence generation, speech recognition, etc.
The architecture of RNNs is similar to traditional artificial neural networks and CNNs, except that they have a memory that acts as a feedback loop. Much like the human brain, newer information is given more weight in predicting sentences, especially in speech.
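A minimal PyTorch sketch of an RNN layer processing a batch of sequences; the input size, hidden size, and sequence length are arbitrary examples.

```python
import torch
import torch.nn as nn

# An RNN that reads sequences of 10-dimensional vectors and keeps a 20-dimensional hidden state.
rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)          # a batch of 4 sequences, each 7 time steps long
outputs, hidden = rnn(x)

print(outputs.shape)  # (4, 7, 20): the hidden state at every time step
print(hidden.shape)   # (1, 4, 20): the final hidden state, carrying the sequence "memory"
```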

Scikit-Learn

scikit-learn is a Python machine learning library. scikit-learn is open source (BSD license), so anyone can use and redistribute it for free, whether for personal or commercial use. scikit-learn is still under active development, and it is easy to find information about it on the Internet.
With a full range of supervised and unsupervised learning algorithms available, and a wealth of sample datasets available, scikit-learn makes it possible to immediately try out machine learning programming.
Scikit-learn has been developed and improved in a fairly active user community, and the documentation is well maintained, so even beginners can start using Scikit-learn smoothly.

self-supervised learning

Self-supervised learning, a subset of unsupervised learning, is a means of training a computer to perform a task without humans providing labeled data. The machine labels the data itself, draws conclusions based on relevance and correlation through classification and analysis, and derives the outputs and goals. For example, after hiding a part of an image, the unhidden part is used as input and the hidden part is predicted, in order to learn a representation. The advantage of this approach is that it allows the system to decompose complex tasks into simpler tasks and reach the desired output despite the lack of a labeled dataset. In supervised learning, humans label a large number of images so that input images can be identified, but self-supervised learning does not require human labeling, making the work more efficient.

Semantic segmentation

Semantic segmentation is a deep learning algorithm that associates a label or category with every pixel in an image, and is used to recognize groups of pixels that form characteristic categories.
One advantage is that the image of an object can be divided into multiple regions at the pixel level, so even irregularly shaped objects can be clearly detected.
It is a problem of assigning some sort of class to each pixel of the input image, but even humans cannot guess what it is by looking at only one pixel. Therefore, it is important to classify each pixel while taking into account the information of surrounding pixels.
Examples of applications for Semantic Segmentation include autonomous driving, medical image processing, and industrial inspection.

Semi-supervised learning

Semi-supervised learning sits between supervised and unsupervised learning: it is a machine learning method that combines a small labeled dataset with a large unlabeled dataset. Its characteristic is that it keeps the merits of supervised learning, namely good learning efficiency and accuracy, while overcoming its demerit, the high cost of labeling data by hand. Semi-supervised learning has a wide range of applications, including speech analysis, Internet content classification, and protein sequence classification.

SSD

SSD is one of the models often used for object detection, and stands for Single Shot MultiBox Detector. It appeared after Faster R-CNN, is faster and performs better, and is used for object detection in images and videos.
The SSD has a structure in which the image is first passed through a Base Network such as VGG16, and then a feature map is created by changing the scale with an added convolution layer.
As for the detection mechanism, boxes called priors (prior boxes) are set in advance at each position on each feature map, and detection is performed based on these boxes. Since each feature map has a different scale, feature maps closer to the output layer are coarser and capture larger objects.

Super Resolution

Super resolution is a technology that converts low-resolution images and videos into high-resolution ones. Using deep learning, high resolution is achieved by reconstructing detailed information that is missing from the target, such as a low-resolution image, and repeating the generation. It also removes noise, making it possible to generate highly accurate images and videos. The difference from up-conversion, a similar technology, is that up-conversion tends to leave the details of images and videos coarse, whereas super-resolution technology also generates details such as shadows, color contrast, and focus. This technology is familiar in daily life, being used to generate 4K video for TVs, images in MRI, and images in optical microscopes.

SVMs

SVM (Support Vector Machine) is a supervised learning method that can be used for both classification and regression. It is known to have high generalization performance based on the idea of margin maximization.
The basic idea of SVM is to find a decision boundary that, given a dataset of two-dimensional data with only two features, best divides the dataset according to class.
SVM has the advantage of low computational cost, but the disadvantage is that it is difficult to preprocess data, adjust parameters, and interpret results. However, SVM is a popular algorithm because of its "high discrimination ability and easy implementation for nonlinear discrimination".
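A minimal scikit-learn sketch of an SVM classifier; the dataset and kernel choice are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An RBF-kernel SVM finds a decision boundary that maximizes the margin between classes.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```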

Tensorflow

TensorFlow is a Python library for fast numerical computation developed by Google, well suited to building and training neural networks. TensorFlow excels at working with Tensors (multi-dimensional arrays) and makes full use of CPUs and GPUs to optimize machine learning. Unlike ordinary imperative programming, the flow is to create a graph called a "data flow graph" and obtain results by feeding data into that graph.
It enables efficient and easy implementation of machine learning algorithms and methods.
It is used for image recognition, image search (such as photos in Google services), and speech recognition technology, and it can also perform machine translation such as English to Japanese.
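A tiny TensorFlow 2 sketch of tensor operations and automatic differentiation:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
print(tf.matmul(a, b))           # matrix multiplication on tensors

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                    # the operations are recorded as a graph of tensor ops
print(tape.gradient(y, x))       # dy/dx = 2x = 6.0
```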

U-Z

VGG

VGG is a network from the VGG team at the University of Oxford, which came second in the 2014 ILSVRC. The commonly used VGG16 variant consists of a total of 16 layers: 13 convolution layers and 3 fully connected layers. Keras and PyTorch provide VGG as a module, so it is easy to use. Recent papers often use VGG as a base model, so it is essential to understand it.
It is characterized by a structure in which 2 to 4 convolution layers with small filters are stacked in succession, the size is then halved with a pooling layer, and this pattern is repeated. Stacking convolutions with small filters makes it possible to extract features better.

XAI

Explainable AI (XAI) refers to machine learning models, that is, AI, whose process of arriving at prediction and inference results can be explained to humans. The concept originated in research by DARPA in the United States, and refers to research on technology that makes a model's predictions understandable to humans and sufficiently trustworthy.
It was devised because, with ordinary AI, it is unknown through what kind of calculation process a prediction was obtained, so even a highly accurate prediction lacks explainable grounds.
For example, if an AI prescribed a diagnosis and treatment based on the data of test results received at a hospital, few people would be willing to trust the conclusions 100%. Many people feel resistance when, in a medical setting where their own health and lives are at stake, the judgment process is a black box and only conclusions are suddenly presented. XAI is therefore becoming popular as a class of models that can explain the basis of a prediction in an easy-to-understand manner.

YOLO

YOLO is an image recognition algorithm announced in 2016. By performing "detection" and "identification" at the same time, processing can be done in real time and with high accuracy. YOLO was originally slang for "You only live once," and the author of YOLO, Joseph Redmon, named the model "You Only Look Once" as a play on that phrase. The greatest feature of YOLO is that it directly calculates object-likeness and position from the entire image using a convolutional neural network, without using region-scanning approaches such as sliding windows and region proposals.

A-Ko

Ensemble learning

Ensemble learning is a method that can improve learning performance by making judgments using the results of multiple machine learning models. Put simply, it is a method of taking a majority vote, improving the ability to predict unseen data by fusing what the individual learners have learned.
As a learner that applies this, there is a random forest that averages a large number of decision trees learned from randomly sampled data.
Ensemble learning can be broadly divided into three types: bagging, boosting, and stacking. Bagging and boosting are the most famous.
However, it should be noted that performing ensemble learning on all data does not necessarily improve accuracy.
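A small scikit-learn sketch of majority-vote ensembling; the choice of base learners is an arbitrary example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different learners vote on each prediction; the majority class wins.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
ensemble.fit(X, y)
print(ensemble.score(X, y))
```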

semantic analysis

Semantic analysis is the selection of the correct syntax tree using "meaning". Semantic analysis selects the correct syntax tree while examining the relationships between words based on a dictionary.
However, semantic analysis is a very difficult task for computers. Computers do not understand the concept of "meaning," so the rules have to be created and conveyed to them (programmed). For example, an expression such as "B of A" can indicate more than one semantic relationship, and some of these ambiguities cannot be resolved without contextual information, or cannot be resolved even with context.
For these reasons, a system that can perform semantic analysis sufficiently well has not yet been completed, but it is known that a certain degree of semantic analysis is possible by giving each word a "semantic marker," basic information about its meaning.

Anomaly detection

Anomaly detection refers to the use of data mining to identify observations, expected patterns, etc. that are inconsistent with other data in a dataset. In a nutshell, it is a technique for detecting data whose behavior differs from that of the majority of other data.
It can be applied to various fields such as intrusion detection systems, fraud detection, error detection, system health monitoring, sensor network event detection, ecosystem disturbance detection, etc. There are the following methods.
(1) Outlier detection: a method of detecting data points that normally do not occur
(2) Abnormal-section detection: a method for detecting the sections of a time series where abnormalities occur
(3) Change point detection: a method for detecting places where the pattern of time-series data changes abruptly
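A minimal outlier-detection sketch using a simple z-score rule on made-up one-dimensional data:

```python
import numpy as np

data = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2])   # one value behaves very differently

z_scores = (data - data.mean()) / data.std()
outliers = data[np.abs(z_scores) > 2.0]   # flag points more than 2 standard deviations from the mean

print(outliers)   # [25.]
```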

Ward's method

Ward's method is one of the methods of hierarchical cluster analysis. First, "clustering" simply means collecting aggregates of data by dividing them by function or category. There are two types of algorithms for clustering, one of which is hierarchical cluster analysis.
In Ward's method, suppose two clusters P and Q are to be merged. Let L(P∪Q) be the sum of squared distances between the centroid of the merged cluster and each sample in it, and let L(P) and L(Q) be the corresponding sums for the two original clusters. The pair of clusters merged is the one that minimizes the difference
Δ = L(P∪Q) - L(P) - L(Q)
Although the amount of computation is large, the classification sensitivity is quite good, so it is often used.
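A minimal SciPy sketch of Ward hierarchical clustering on made-up 2-D points:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

Z = linkage(X, method="ward")                      # merge the pair of clusters with the smallest increase in variance
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into at most 3 clusters
print(labels)
```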

online learning

Online learning is a method of updating the parameters of a machine learning model one piece of training data at a time; batch learning is the contrasting learning method.
The advantage is that it can be implemented with a small amount of memory, but the disadvantages include that the learning becomes unstable and that it is easy to react to outliers.
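A small scikit-learn sketch of online (incremental) learning with partial_fit; the streamed batches here are randomly generated placeholders.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
rng = np.random.default_rng(0)

for step in range(10):                      # pretend each iteration is newly arriving data
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    if step == 0:
        model.partial_fit(X_batch, y_batch, classes=[0, 1])  # classes must be declared on the first call
    else:
        model.partial_fit(X_batch, y_batch)                  # each call updates the model incrementally

print(model.coef_)
```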

probability distribution

A probability distribution is a list of probabilities that data will appear.
For example, when a coin is flipped, there are only two pieces of data: {the coin is heads, the coin is tails}.
At this time, a non-cheating coin should be {heads: 50%, tails: 50%}.
A set of these probabilities is called a probability distribution.
There are many types of probability distributions.
They can be discrete or continuous, univariate or bivariate, and for each of these there is more than one distribution corresponding to the type of trial.
Those that measure discrete values and states such as the number, presence or absence, and correctness are often treated as discrete types, while those that measure quantities such as weight, length, and strength are often treated as continuous types.
Distributions that produce discrete data include:

  • binomial distribution
  • Poisson distribution
  • geometric distribution
  • discrete uniform distribution

and so on.
Continuous probability distributions include:

  • normal distribution
  • exponential distribution
  • continuous uniform distribution
  • chi-square distribution

and so on.
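A small NumPy sketch of drawing samples from several of these distributions; the parameter values are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)

coin_flips = rng.binomial(n=1, p=0.5, size=10)         # discrete: a fair coin, heads = 1
arrivals = rng.poisson(lam=3.0, size=10)               # discrete: event counts per interval
heights = rng.normal(loc=170.0, scale=7.0, size=10)    # continuous: normal distribution
waits = rng.exponential(scale=2.0, size=10)            # continuous: exponential distribution

print(coin_flips, arrivals, heights.round(1), waits.round(2), sep="\n")
```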

overfitting

Overfitting means that accuracy is high on the training data but low on data that differs from the training data, a state in which the model is of no use for other data. No matter how high the accuracy on the training data prepared in advance, it is meaningless if the model is not useful in actual operation.
There are three ways to suppress overfitting.
① Increase the number of learning data
② Change the model to a simple one
③ Regularize

regression

Regression is the problem of predicting continuous values. Classification predicts categorical values, whereas regression deals with continuous values.
As types of regression analysis for solving regression problems, there are simple regression analysis and multiple regression analysis. When there is one explanatory variable, simple regression analysis is performed, and when there are multiple explanatory variables, multiple regression analysis is performed.
There are also two types of regression: linear regression and polynomial regression. Polynomial regression allows for more complex feature extraction.
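A minimal scikit-learn sketch of simple linear regression on made-up data with one explanatory variable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one explanatory variable
y = np.array([2.1, 3.9, 6.2, 8.1])           # a roughly linear continuous target

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)         # the fitted slope and intercept
print(model.predict([[5.0]]))                # predicted continuous value for a new input
```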

learning curve

A learning curve is a graph showing the relationship between the number of training data samples and prediction performance. The learning curve can be used to determine whether the predictive model is overfitting or is under-learning and not being able to fit the training data.
The horizontal axis of the learning curve is the number of training data samples, and the vertical axis is the evaluation index. As evaluation indices, both indices evaluated using training data and indices evaluated using verification data are used.
You can also judge whether a better model could be obtained by increasing the number of samples. If both the validation score and the training score converge to low values as the sample size increases, increasing the sample size will have little effect; in that case, measures such as reviewing the parameters and features are taken instead.

reinforcement learning

Reinforcement learning is a learning method that maximizes rewards by having agents act in a given environment and obtain rewards. Although it is a method that existed before the concept of deep learning was born, social implementation has progressed with the birth of deep reinforcement learning, which applies deep learning to reinforcement learning.
Reinforcement learning algorithms that are relatively easy to understand include Q-learning and the Monte Carlo method.

descriptive statistics

Descriptive statistics uses numerical values, tables, graphs, etc. to organize and describe the characteristics of data. Variation in the data exists without exception, so various statistical measures are required to express the characteristics of multiple populations.
Examples of uses include census, class test scores, and so on.
Here is an example of descriptive statistics. First, the data obtained by observation is entered into a tool such as Excel; at this point it is just a list of numbers. In order to read what the data shows correctly and efficiently, you can then grasp the characteristics of the observed data by making tables and graphs and calculating quantities such as the mean and standard deviation.

machine learning

Machine learning refers to a series of procedures for learning big data and grasping the characteristics of its data structure.
Originally, it started from research in the field of pattern recognition, and has become more practical for the following two reasons.
  • Many companies and research institutes are now able to collect huge amounts of data
  • Computing performance has improved, and calculations with complex algorithms have become faster

Learning methods include supervised learning, unsupervised learning, and reinforcement learning, and there are various types of data that can be handled, such as series data and images.

supervised learning

Supervised learning is one of the machine learning methods, and can be said to be a method in which input data and output data are prepared, and the output data is used as teacher data for learning.
As a specific example, supervised learning involves preparing an image of a dog and an image of a cat, labeling them as "dog or cat", and allowing the machine to learn. It is so named because the process by which an algorithm learns the relationship between input and output data is similar to the process by which a teacher teaches a student.
Supervised learning can handle problems such as classification and regression.

unsupervised learning

Unsupervised learning is one of the learning methods of machine learning, and learning is performed in a state where there is no correct label. Main algorithms include principal component analysis, clustering, and generative models.

clustering

Clustering (or cluster analysis) is one of statistical data analysis methods, and is a method of extracting similar groups among data based on data having only explanatory variables.
Types of clustering include split-optimal clustering and hierarchical clustering.
K-means is a typical example of split-optimal clustering. A cluster evaluation function is prepared in advance, and clusters are obtained so that the evaluation function is minimized (or maximized).
Hierarchical clustering is a method of performing clustering by regarding close data groups as similar clusters and dividing the clusters.
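A minimal scikit-learn sketch of k-means, a typical partition-optimizing clustering method; the 2-D points are made-up examples.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Partition the data into 2 clusters by minimizing the within-cluster variance.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assignment for each point, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # the centroid of each cluster
```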

graph theory

Graph theory is a discipline that deals mathematically with interconnected networks. One of its origins is said to be Euler's 1736 solution to the puzzle known as the Seven Bridges of Königsberg.
In recent years, it has been attracting attention due to its wide range of applications in actual business areas, ease of conception, and the development of new analysis methods accompanying the development of machine learning techniques.
Graph theory has many applications and one of the most common applications is finding the shortest distance between cities. For example, when considering a railway route map, the problem is how stations are connected by routes. However, in many cases, the distances between stations, the subtle layout of the stations, the shape of the route, etc. are depicted differently from the actual geography. For users of route maps, information on how stations are connected is important. In this way, a graph is the concept of points abstracted by focusing on how they are connected and lines connecting them, and graph theory explores the various properties of graphs.

Morphological analysis

Morphological analysis is a technique that divides "natural language" into morphemes (the smallest units of language that carry meaning). Put simply, it is a technique for splitting the sentences we commonly use in daily life into individual words.
There are two typical methods of morphological analysis, one based on grammatical regularity and the other based on probabilistic language.
There are few "absolute rules" in language, and there are flexible parts such as "grammatically, the meaning can be understood even if this part is replaced" or "this word and this word can have the same meaning". many. However, for the computer, the high degree of freedom appears ambiguous. In particular, Japanese has a higher degree of freedom than other languages, and it is said that morphological analysis is difficult.

missing value

A missing value is an input to an algorithm that lacks values for all or some features of an object.
There are three types of occurrence of missing values.
(1) MCAR (Missing Completely At Random): values are missing completely at random.
(2) MAR (Missing At Random): values are missing in a way that depends on other observed features.
(3) MNAR (Missing Not At Random): values are missing in a way that depends on the missing values themselves.
Methods for dealing with missing values include ignoring missing values and imputing missing values. If the amount of training data is sufficient, it is possible to remove missing values from the dataset, but if the amount of data is small, simply deleting them will reduce the amount of data and waste the data. Therefore, it is necessary to complement the data in some way instead of deleting it.
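A small pandas sketch of the two ways of dealing with missing values mentioned above, removal and imputation; the table contents are made-up examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40], "income": [400, 520, np.nan, 610]})

print(df.isna().sum())            # count missing values per column
dropped = df.dropna()             # option 1: remove rows that contain missing values
imputed = df.fillna(df.mean())    # option 2: impute missing values with the column mean
print(imputed)
```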

decision tree learning

Decision tree learning is a machine learning method that creates a decision tree from data. Although it is one of the simplest algorithms for learning structure, it is also one of the most widely used and proven methods in practice. Classification trees and regression trees are collectively called decision trees.
To build a decision tree, the data is split into multiple groups so as to maximize the information gain. Then, for each group, the data is again split so that the information gain is maximized. A decision tree is created by repeating this process an appropriate number of times.
Decision tree analysis, which is analysis using a decision tree, divides the data step by step and outputs the analysis results as a tree, so the results are easy to interpret.
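A minimal scikit-learn sketch of a decision tree classifier split on an entropy (information gain) criterion; the dataset and depth limit are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# At each node, split on the feature and threshold that maximize information gain.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree))   # the learned splits can be printed and interpreted directly
```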

test

A test is an attempt to determine whether an assumption about the data holds.
Testing is a process of "first making a hypothesis, probabilistically verifying what actually happened, and drawing a conclusion." To draw conclusions, the logic of proof by contradiction is used: a hypothesis is set first, reasoning proceeds under the condition that the hypothesis is correct, and the hypothesis is judged to be wrong when a contradiction arises.
To emphasize the difference from estimation: estimation "assumes a distribution for the data and computes the parameters of that distribution," whereas testing "first makes assumptions about the parameters of a distribution and then judges from the data whether those assumptions hold."
Furthermore, estimation calculates the parameters of a given distribution when the data are assumed to follow that distribution (a well-known test of whether data follow a given distribution is the Kolmogorov-Smirnov test).

cross-validation

Cross-validation is a method of obtaining a model with generalization performance that is not affected by specific data by dividing training data and validation data for evaluation and measuring performance. It can be used for classification and regression.
In the cross-validation method, data is divided into multiple data sets, one data set is used as test data, and the other data is used as learning data to build a model, so it is used when there is a certain amount of data.
Among cross-validations, K-fold cross-validation, which is often used, divides the data into K pieces and evaluates the accuracy rate by using one of them as test data and the remaining K-1 pieces as learning data. This is a method in which learning is performed K times so that all K pieces of data become test data once each, and the accuracy is averaged.
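A sketch of K-fold cross-validation written out explicitly with scikit-learn's KFold (K = 5 here); the dataset and model are arbitrary examples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
scores = []

# Each of the 5 folds is used exactly once as the test data.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(np.mean(scores))
```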

Parsing

Syntactic analysis is to find a structure that is convenient for semantic analysis such as a tree structure in sentences at the granularity of each word. We find structures such as subjects, predicates, noun phrases, and verb phrases that exist above parts of speech, and analyze the relationships between words obtained by morphological analysis. The most representative and general syntactic analysis is syntactic analysis based on sentence components (noun phrases, verb phrases, etc.) and parts of speech. It is constructed using a context-free grammar (a grammar that creates a tree by applying production rules such as "sentence → noun phrase verb phrase" "noun phrase → article noun | pronoun | .." backwards).
Also called dependency parsing, the result of syntactic parsing can be expressed as a "syntax tree".

Sa-To

optimisation

Optimization is minimizing (or maximizing) an objective function within constraints. Optimization problems include mathematical optimization and combinatorial optimization, distinguished by whether the solution is continuous or discrete: mathematical optimization deals with continuous solutions, and combinatorial optimization deals with discrete solutions.
Mathematical optimization includes linear programming problems, convex quadratic programming problems, and semidefinite programming problems, while combinatorial optimization includes integer linear programming problems and quadratic 0-1 integer programming problems.

sampling

Sampling refers to the operation of "taking a sample from a population or probability distribution" in statistical research. It is a very important technique in statistics and machine learning.
In other words, from a distribution p(z), we obtain a sample Z = (z_1, ..., z_l) that follows that distribution. If a large number of samples can be obtained, then even when direct calculation from the distribution is difficult, the pseudo-data obtained from the learned probability distribution corresponds to what is called a sample of that distribution, so the answer can be obtained from the samples.

Curse of Dimension

The curse of dimensionality refers to the state in which the machine learning algorithms being used can no longer perform adequately, making it difficult to make good predictions on unknown data. The cause is that the number of dimensions of the dataset becomes too large, so the number of combinations that must be considered explodes.
Even if you have a computer system for processing big data, if you do not pay attention to the curse of dimensionality, not only will the computational cost be enormous, but you will not be able to obtain sufficient training results and will not be able to properly handle unknown data. Problems such as being unable to respond will occur.
To solve the curse of dimensionality, feature creation or feature selection is required.

autonomous driving

Autonomous driving means "a system in which a machine autonomously controls a vehicle or mobile object without human intervention." In autonomous driving, AI is used in various areas, such as the analysis of image data acquired by sensors such as cameras, and speech recognition in the HMI (Human Machine Interface) through which the driver and the system communicate.
Research and development is particularly active in image recognition and analysis. This is a core technology that analyzes in real time what is shown in the huge amount of image data that an autonomous driving car continuously acquires and how the vehicle should behave, and it is the area that plays the most important role.
Regarding safety standards, Japan defines the level of autonomous driving with reference to the draft of the US Department of Transportation NHTSA.
Level 0: No driving automation
Level 1: Driver assistance
Level 2: Partial driving automation
Level 3: Conditional driving automation
Level 4: Advanced (high) driving automation
Level 5: Full driving automation
Expectations are rising for future social implementation in the AI utilization field of autonomous driving.

singularity

The singularity (technological singularity) refers to the point at which technology such as AI creates intelligence smarter than humans; the hypothesis is that, by artificial intelligence recursively creating ever better artificial intelligence, an overwhelmingly advanced intelligence that completely surpasses humans will emerge. Technological innovation is progressing at an exponential speed, and it is said that artificial intelligence will surpass human intelligence in 2029 and reach the singularity in 2045.
Information technology and artificial intelligence will likely surpass the abilities of the human brain, with their coverage extending to all areas such as perception, cognition, problem-solving ability, emotion, morality, and intelligence. An intellect slightly smarter than humans will be born and will continue to grow exponentially until it finally transcends humans, reaching the singularity.

natural language processing

Natural language processing is a series of technologies that allow computers to process natural language that humans use on a daily basis, and is a field of artificial intelligence and linguistics. Specifically, it refers to processing technology that analyzes the meaning of natural language, from "spoken language" used in communication such as words and sentences to "written language" such as papers.
As a preliminary step to language processing, we construct a machine-readable dictionary (a dictionary necessary for computers to understand vocabulary) and a corpus (a collection of documents that record and store language usage).
After that, it is processed based on the four steps of morphological analysis, syntactic analysis, semantic analysis, and contextual analysis.

deep learning

Deep learning is one method of machine learning, and it is achieving results in various industries because it is more flexible than other methods.
The reason it is "flexible" is that deep learning allows you to change the architecture of your model at will. There are various ways to design architectures, and DNN (Deep Neural Network), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), etc. are well known.
Thanks to deep learning, the accuracy of image analysis, speech recognition, natural language processing, etc. is much higher than conventional methods, and it is being adopted in the real world.

dimensionality reduction

Dimensionality reduction is the process of mapping multidimensional information into information with fewer dimensions while preserving its meaning.
The main purposes of dimensionality reduction are data compression and data visualization.
The reduced data retain their Euclidean structure but are not subject to the curse of dimensionality.
Typically, by choosing a basis or mathematical representation that can describe most, though not all, of the variance in the data, the amount of information needed for the representation is reduced while the relevant information is retained.
There are four representative dimensionality reduction methods; a short PCA sketch follows the list.
① PCA (principal component analysis)
② LSA (latent semantic analysis, via singular value decomposition (SVD))
③ LDA (linear discriminant analysis)
④ ICA (independent component analysis)
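
As a hedged sketch of method ① (PCA), assuming scikit-learn is installed and using its bundled iris dataset purely for illustration, the example reduces 4-dimensional data to 2 dimensions while keeping most of the variance:
```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                 # 150 samples x 4 features

pca = PCA(n_components=2)            # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)     # shape becomes (150, 2)

print(X_reduced.shape)
print("variance explained:", pca.explained_variance_ratio_.sum())
```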

inferential statistics

In inferential statistics, a sample is randomly selected from a population, and statistics obtained from that sample, such as the sample mean and the unbiased variance, are used to estimate and test the population parameters (population mean, population variance). Inferential statistics assumes that the target of analysis follows a probability distribution, and that by increasing the number of samples randomly drawn from the population and repeating trials, a large and unknown population can be inferred from the sample, which is only a part of the whole.
Inferential statistics can be further divided into estimation and testing. Estimation is to predict a specific value such as an average, and testing is to statistically determine whether a hypothesis about a population is correct.
Examples of its use include the average age of Japanese people, TV program ratings, and election exit polls.

Estimate

Estimation is the work of using a machine learning model (= a specific formula or calculation method) to produce output from input data.
Given the data at hand, estimation is to find out which (already known) distribution the data resemble and what shape that distribution has.
Specifically, the normal distribution has a bell-shaped curve whose shape is determined by two parameters: the position of the peak (the mean) and the spread (the variance). The role of estimation is to calculate the parameters that define the distribution, under the assumption that the data follow that distribution.
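As a minimal sketch (assuming NumPy; the data are generated only for illustration), estimating the two parameters of a normal distribution from observed data looks like this:
```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=170.0, scale=6.0, size=500)   # pretend these are observations

# Assume the data follow a normal distribution and estimate its parameters.
mean_hat = data.mean()        # position of the peak
var_hat = data.var(ddof=1)    # spread (unbiased variance)

print(f"estimated mean = {mean_hat:.2f}, estimated variance = {var_hat:.2f}")
```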

generative model

A generative model is a method of assuming a probability distribution that generates observed data and estimating the probability distribution from the observed data. Simply put, it is a framework that focuses on the question "How was the current data created?" and attempts to model it (data generation process).
Its greatest merit is that the data themselves can be examined in depth, for example by sampling new data or by detecting novelty and outliers.
Generative models treat inputs as random variables. In other words, we consider the question, "How likely is it that an input generated from a certain probability distribution applies to class A?" At this time, if the probability distribution that the input follows can be obtained well, the input data can be generated in a pseudo manner using that probability distribution.

performance index

A performance evaluation index is a standard for comparing whether a classification model is superior to others, and includes the following.
① Accuracy rate: the proportion of all data for which the predicted result matches the true label
② Precision rate: the proportion of cases predicted to be positive that are actually positive
③ Recall rate: the proportion of truly positive cases that are predicted to be positive
There are various other indicators that quantify and evaluate the performance of AI (machine learning) models. Among them, the table that summarizes the indexes often used in classification is called the "confusion matrix." It is a matrix used to clarify the performance of AI (machine learning) models, and model performance can be evaluated based on the confusion matrix, which is divided into four areas.
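As a small sketch (assuming scikit-learn; the labels are made up for illustration), the accuracy, precision, recall, and confusion matrix can be computed from true and predicted labels as follows:
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # labels predicted by the model

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```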

Normalization

Normalization refers to shaping data according to a certain rule so that it can be easily handled in calculations and analyses. Normalization is performed because a more accurate learning model can be built by feeding values unified on a common scale as input data, rather than training deep learning directly on data with a mix of different units.
For example, in image recognition AI, a method is used in which the pixel value used for learning data is divided by the maximum value (255) to keep the data value within the range [0, 1]. Another method is to convert to a value with a mean of 0 and a standard deviation of 1. This is done by subtracting the mean from the data and dividing by the standard deviation.
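A minimal sketch of both normalization methods mentioned above, assuming NumPy (the pixel values are illustrative):
```python
import numpy as np

pixels = np.array([[0, 128, 255],
                   [64, 192, 32]], dtype=np.float32)

# Method 1: divide by the maximum value so every value falls in [0, 1].
scaled = pixels / 255.0

# Method 2: subtract the mean and divide by the standard deviation
# so the result has mean 0 and standard deviation 1.
standardized = (pixels - pixels.mean()) / pixels.std()

print(scaled)
print(standardized.mean(), standardized.std())   # ~0.0 and ~1.0
```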

regularization

Regularization, unlike normalization, is a technique for preventing a model from overfitting the training data by adding a penalty term to the loss function that discourages overly complex models (for example, models with excessively large weights).
Typical methods are L1 regularization (lasso), which adds the sum of the absolute values of the weights to the loss and drives some weights to exactly zero, and L2 regularization (ridge, also called weight decay), which adds the sum of the squared weights and keeps all weights small. In deep learning, techniques such as weight decay and dropout are used for the same purpose.
By keeping the model from fitting noise in the training data, regularization improves generalization performance on unknown data.
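As a minimal illustration of the L2 case (assuming NumPy; the loss function, weights, and coefficient lam are illustrative choices, not a prescribed setup), the penalty is simply added to the ordinary loss:
```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on the model weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)   # discourages large weights
    return mse + penalty

w = np.array([0.5, -1.2, 3.0])
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(l2_regularized_loss(y_true, y_pred, w))
```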

Explanatory variable

An explanatory variable is a variable that explains the objective variable, and is also called an “independent variable”. It can also be taken as the cause.
For example, if you run a restaurant and want to predict future sales on a monthly basis based on past sales information, you can imagine that daily sales are related to weather, temperature, and humidity. Variables such as weather, temperature, and humidity that have some influence on the sales you want to predict are "explanatory variables." A variable means "a number that can change from case to case," and refers to things that change from day to day, such as weather, temperature, and humidity.
To explain it with a formula, for example, if there is a function y = ax, x will be the explanatory variable.
To give another specific example, the objective variable is the result of "I will get promoted in the future / I will not get promoted in the future". On the other hand, explanatory variables are feature factors for predicting the objective variable. Characteristic factors for prediction, such as "good at greetings, bright, studious, good at sales", are called explanatory variables to predict whether a person will succeed or not succeed in the future.

correlation

Correlation expresses how strongly two or more variables resemble each other in their movement. It can be divided into positive correlation, where the other variable increases when one variable increases, and negative correlation, where the other variable decreases when one variable increases. Furthermore, the index called the correlation coefficient expresses the strength of this "similarity" as a number ranging from -1 to 1.
It is important to understand that machine learning reveals "correlation", not "causation". Correlation can show that multiple variables are related, but it is unclear whether they are truly related, so finding meaning and proving causation is the job of humans.
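A small sketch (assuming NumPy; the two variables are made up for illustration) computing a correlation coefficient:
```python
import numpy as np

temperature = np.array([22, 25, 28, 30, 33, 35])       # one variable
ice_cream_sales = np.array([40, 52, 61, 70, 84, 95])   # another variable

r = np.corrcoef(temperature, ice_cream_sales)[0, 1]
print(f"correlation coefficient: {r:.3f}")   # close to +1 -> strong positive correlation
```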

Typical value

A numerical value that summarizes the characteristics of data is called a representative value. A single representative value is not always enough: there are representative values that describe the position of the data and representative values that describe their variability.
The three representative values describing position are the mean, median, and mode, and the two describing variability are the variance and standard deviation; a short example computing them follows the list below.

  • (1) Average: The sum of all data values divided by the number of data values. There are three kinds of mean.
    1. Arithmetic mean
      Generally, "the average" refers to this: the value obtained by adding all the data and dividing by the number of data values.
    2. Geometric mean
      The value obtained by multiplying all the data values and taking the n-th root, where n is the number of values. It is used when averaging rates of change, such as price increase rates.
    3. Harmonic mean
      Used when averaging rates such as speeds, for example the average speed over two laps of a course run at different speeds.
  • (2) Median: The value in the middle when the data are arranged in ascending order.
  • (3) Mode: The value that appears most frequently in the data.
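
A minimal example using Python's standard statistics module (the data are illustrative):
```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print("arithmetic mean:", statistics.mean(data))            # 5.0
print("geometric mean :", statistics.geometric_mean(data))
print("harmonic mean  :", statistics.harmonic_mean(data))
print("median         :", statistics.median(data))          # 4.0
print("mode           :", statistics.mode(data))            # 3
print("variance       :", statistics.variance(data))
print("std deviation  :", statistics.stdev(data))
```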

turing machine

Turing machines are the underlying theory that influenced modern computers. A Turing machine is a model of a computing machine devised by mathematician Alan Turing in 1936; according to the Church–Turing thesis, every computable problem can be computed by a Turing machine.
A Turing machine consists of a tape divided into squares that extends infinitely to the left and right, a finite control unit, and a head for reading and writing symbols on the tape. A Turing machine performs the following series of operations at discrete time steps 0, 1, 2, and so on. First, it reads the symbol in the square where the head is located. Depending on this symbol and the current state of the finite control unit (the internal state), it rewrites the square's symbol, moves the head one square to the left or right, and transitions the internal state. By appropriately defining the rules for such operations, a Turing machine can be made to perform various computations. By appropriately defining how numbers are represented, the notion of computing a function with a Turing machine can be formalized.

data visualization

Data visualization is to express phenomena and trends that are difficult to confirm from numerical data alone in visible forms such as graphs, diagrams, and tables.
It can also be said that it is one step in the whole data analysis.
Data analysis can be broadly divided into four categories: collection, processing, tabulation, and visualization.
Visualizing and analyzing data has the following advantages.

  • (1) Reduction of work time
  • (2) Awareness through visualization
  • (3) Ease of sharing with others

Furthermore, a new awareness obtained from visualization creates a cycle in which new awareness is obtained by visualizing from a different angle and combining it with other data. Data analysis is precisely this “back and forth between data and awareness,” and the role of visualization is the key to turning that cycle.
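A minimal sketch assuming matplotlib is available (the monthly sales figures are made up for illustration):
```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 160, 150, 180, 210]     # hypothetical monthly sales

plt.plot(months, sales, marker="o")        # turn the numbers into a line chart
plt.title("Monthly sales")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()
```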

data analysis

Data analysis is the extraction of meaningful information from data. Currently, the amount of data accumulated in companies and research institutes is enormous, so it is attracting more attention.
Data analysis is carried out in various industries, but currently, data on convenience store cash registers is also collected and data analysis is being performed on a daily basis. For example, data analysis may show that umbrellas sell better on rainy days, or beer sells more on hot summer days.

toy problem

Toy problem is a word that was born in the era of the first AI boom, and it is also a keyword that led to the end of the first AI boom.
It gradually became clear that the artificial intelligence of this era could only solve "toy problems" with fixed rules, such as mazes and Othello. Finding the optimal answer within a given set of rules, such as beating a human at chess or shogi, was relatively easy for a computer, and it was said that the problems artificial intelligence could solve in that boom era were only toy problems.
Toy problems are currently not very useful in the real world, but because they are simple and easy to understand, they are sometimes used for research purposes as standard examples for experimentally evaluating the performance of various algorithms.

feature extraction

Feature extraction refers to the operation of extracting more useful information from an object or data to be identified.
When multiple pieces of information are contained in one piece of data, not all of that information is necessary, so feature extraction is performed to make the content handled by the machine-learning prediction model more effective.
Feature extraction can be divided into the following three types of processing.

  • ① Turn meaningless information into valid information
    If the data contain corrupted information, it is replaced with correct information.
  • ② Apply non-linear transformations
    Numbers are represented in groups so that their characteristics become easier to understand.
  • ③ Perform processing according to the content of the information
    Here, processing is performed according to the characteristics of each individual piece of information.

It is necessary to fully understand the contents of the data and choose an appropriate feature extraction process.

Feature value

A feature quantity is a variable that quantitatively expresses the features of an object, and artificial intelligence can learn patterns by giving it a measure called the feature quantity. Furthermore, the accuracy of the result greatly changes depending on what is selected as the feature quantity.
In datasets, features are represented as columns.
For example, when predicting rent from property conditions, the features include "floor area," "building age," and "distance to the nearest station." Rents for rental properties are determined by various conditions: rent tends to rise as the floor area increases and, conversely, tends to fall as the building gets older. In machine learning, the data that characterize the quantity we want to predict (here, the rent) are called features.
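For instance, in a hypothetical rent-prediction dataset (assuming pandas; all column names and values are made up for illustration), each feature appears as a column:
```python
import pandas as pd

# Hypothetical property data: each column is a feature, each row a property.
df = pd.DataFrame({
    "floor_area_m2":    [25, 40, 55, 30],
    "building_age_yrs": [10, 3, 20, 1],
    "walk_to_station":  [5, 12, 8, 3],
    "rent":             [80_000, 110_000, 95_000, 105_000],  # target to predict
})

X = df.drop(columns="rent")   # feature columns
y = df["rent"]                # target variable
print(X.columns.tolist())
```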

Na~ho

neural network

A neural network (NN) is a mathematical model of artificial neurons that express nerve cells (neurons) in the human brain and their connections, that is, a neural network.
The cranial nerves of animals are designed so that substances and energy present in the environment are captured by sensory organs and converted into signals, which are transmitted through synapses and neurons in the order of "input node → intermediate node → output node".
The human brain has a network structure of nerve cells called neurons. A connection site that transmits a signal from a neuron to another neuron is called a synapse, and neurons transmit electrical and chemical reaction signals from the synapse to exchange information. A neural network has a mechanism in which a signal that enters from the input layer propagates through various nodes and is transmitted to the output layer. This is the same mechanism by which signals propagate through neurons in nerve cells.
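A minimal sketch of this signal flow (input layer → hidden layer → output layer), assuming NumPy; the weights here are arbitrary illustrative values, not a trained model:
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, 0.8])                   # signal entering the input layer

W1 = np.array([[0.2, -0.4, 0.7],
               [0.6,  0.1, -0.3]])         # input -> hidden connections
W2 = np.array([[0.5], [-0.2], [0.9]])      # hidden -> output connections

hidden = sigmoid(x @ W1)                   # signal propagating through hidden nodes
output = sigmoid(hidden @ W2)              # signal arriving at the output layer
print(output)
```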

no free lunch theorem

The no free lunch theorem states that "any two algorithms that search for the extremum of a cost function have the same performance when averaged over all possible cost functions." Simply put, in theory there is no "one-size-fits-all" supervised machine learning model or algorithm that performs well on all problems.
The no free lunch theorem is often used as a counter-argument to trying to solve various problems with metaheuristic algorithms. In other words, if the problem changes, the algorithm should change, and as much as possible we should use foresight to come up with a solution that is specific to that problem.

perceptron

A perceptron is a type of artificial neuron or neural network invented in 1957 by psychologist and computer scientist Frank Rosenblatt. It is a model of vision and brain function and performs pattern recognition. However, since its learning process is not self-explanatory, it is considered a black-box algorithm.
As a concept, first, data is input to the input layer, and a feature amount, which is an index for recognizing the data, is input. The input is multiplied by the weight W corresponding to the connection strength between neurons, and the resulting value is input to the neurons in the output layer. The neurons in the output layer pass the sum of these inputs through the activation function and output the final result.
Since only a single perceptron can express a simple model, it is necessary to stack multiple layers of perceptrons in order to express a complex model, and this is called a multi-layer perceptron.
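A minimal sketch of a single perceptron (assuming NumPy), with a step function as the activation; the weights and bias are illustrative values chosen so that the unit behaves like an AND gate:
```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: weighted sum of inputs, then a step activation."""
    total = np.dot(x, w) + b
    return 1 if total > 0 else 0

# Illustrative weights that make the perceptron act as an AND gate.
w = np.array([0.5, 0.5])
b = -0.7
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron(np.array(x), w, b))
```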

Uncle Bernie's Rule

Uncle Bernie's rule states that "the number of training examples required for machine learning is about 10 times the number of explanatory variables (parameters)."
"Uncle Bernie" is said to be Bernard Widrow, a professor at Stanford University, who proposed the rule in his talk "ADALINE and MADALINE" at the 1987 IEEE conference.

Hyperparameter

Hyperparameters are parameters that set the behavior of machine learning algorithms. Especially in deep learning, it corresponds to parameters that cannot be optimized by the gradient method.
Roughly speaking, it is the “setting” of machine learning algorithms. There is also a movement toward automating hyperparameter tuning, but in general, "humans" perform adjustments "manually."
For example, the learning rate, batch size, number of training iterations, and even the number of layers and channels in a neural network are hyperparameters. Furthermore, not only such numerical values, but also choices such as whether to use Momentum SGD or Adam for learning can be said to be hyperparameters.

pattern recognition

When people see a PET bottle, they can recognize it as a PET bottle because the PET bottle is defined as a transparent plastic object of a specific size.
In this way, humans have five senses, such as sight, hearing, touch, taste, and smell, and humans can pattern events by stimuli given by these senses.
Pattern recognition is a group of technologies that match such information against multiple known patterns and map it onto one of them, and it is one of the research and development themes in artificial intelligence (AI).

batch learning

In machine learning, it is necessary to adjust parameters to optimal values during learning. Batch learning involves updating these parameters using all training data.
The advantage is that the result of an update does not depend on how the training data are sampled, but the disadvantage is that a large amount of memory is required when the dataset is large.
For example, batch learning is impossible when the image data amount to about 1 TB, so in that case mini-batch learning is performed by dividing the data into multiple batches.
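A minimal sketch (assuming NumPy; the array shape and batch size are illustrative) of splitting a dataset into mini-batches instead of updating on all data at once:
```python
import numpy as np

X = np.arange(1000).reshape(250, 4)   # 250 samples, 4 features
batch_size = 32

rng = np.random.default_rng(0)
indices = rng.permutation(len(X))     # shuffle before splitting

for start in range(0, len(X), batch_size):
    batch = X[indices[start:start + batch_size]]
    # ...update the parameters using only this mini-batch...
    print("batch shape:", batch.shape)
```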

parameter

A parameter is a term used in mathematics and statistics to indicate a variable value, coefficient, or argument, but as a business term it is mainly used in the IT field, for example in computer programs.
In the IT field, external input data that affect the operation and processing results of software and systems are often called parameters, and in programming they are sometimes called arguments. They can indicate, for example, which setting values and limit values to use when creating a machine learning prediction model.
The URL parameters we often see are data sent to the web server by the web browser, written in a specific format at the end of the URL that specifies the destination.
They are appended after a "?" (question mark) at the end of the URL in the form "name=value", and their contents can be reflected in the page that is returned.

Parameter tuning

In training machine learning models, hyperparameter tuning is essential to create a good model. To achieve the desired accuracy and generalization performance, the model is trained repeatedly while searching for the optimal hyperparameters, so it is important to explore the search space well.
In the past, the mainstream approach to hyperparameter tuning was to search for optimal parameters manually, but recently automation has become popular. There are three main techniques for tuning hyperparameters, listed below; a small grid search sketch follows the list.

  • ① Grid Search
  • ② Random Search
  • ③ Bayesian Optimization
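
A minimal sketch of technique ① (grid search), assuming scikit-learn; the model (an SVM) and the candidate values are illustrative choices:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try in every combination.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best CV score       :", search.best_score_)
```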

outlier

An outlier is an observed value that has an abnormally large residual from an estimate of its true value. Among the outliers, those whose causes such as measurement errors and entry errors are known are sometimes called "abnormal values."
Since outliers can greatly affect analysis, outlier detection is desirable.
Outlier detection operates on individual data points and is used to find values that would normally be implausible.
Methods for judging outliers include (1) the Smirnov-Grubbs test and (2) the interquartile range (IQR).
As a countermeasure, it is necessary to consider whether outliers are truly abnormal values and to exclude those that can be identified as measurement errors or input errors; however, some outliers carry useful information, so it is important to remember that simply removing everything is not enough.
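A minimal sketch of method (2), the interquartile range, assuming NumPy (the data and the 1.5 × IQR rule shown are a common illustrative choice):
```python
import numpy as np

data = np.array([11, 12, 12, 13, 12, 11, 14, 13, 120])   # 120 looks suspicious

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # common IQR rule

outliers = data[(data < lower) | (data > upper)]
print("outliers:", outliers)        # -> [120]
```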

pattern matching

Pattern matching is an AI method that determines whether data matches a specific pattern that has been specified in advance. Pattern matching can handle a wide variety of data, from symbolic data to complex data such as images and sounds. For example, in symbol pattern matching, it is possible to determine whether the character string "This is a pen" contains the substring "pen".
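For the string example above, a minimal Python sketch:
```python
import re

text = "This is a pen"

# Simple substring match
print("pen" in text)                                  # True

# Pattern match with a regular expression (case-insensitive)
print(bool(re.search(r"pen", text, re.IGNORECASE)))   # True
```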

frame problem

The frame problem is one of the most important puzzles in artificial intelligence, and it shows that robots with limited information processing capacity cannot deal with all possible problems in reality.
It is a concept proposed by artificial intelligence researchers John McCarthy and Patrick J. Hayes in 1969, and is generally viewed as an unsolved problem even today.
When artificial intelligence makes a decision, it must select the information necessary for the current decision from all the information in the world. In other words, it has to consider whether or not each of an enormous number of things in the world is relevant, including the huge number of things that are in fact unrelated and should simply be ignored. As a result, the amount of computation explodes, and an artificial intelligence with only finite information processing capacity stops functioning and becomes unable to make a decision.

contextual analysis

Contextual analysis is to perform morphological and semantic analysis through multiple sentences. Simply put, it checks the connection of multiple sentences, and the problem of pronoun reference is also included in this contextual analysis. Contextual analysis is absolutely necessary even in things like understanding a story.
Contextual analysis is a very complicated task, not only because the target of analysis has become longer, but also because it is necessary to analyze the relationships between sentences.
For example, consider a pair of sentences in which the second sentence refers back to the first with a pronoun such as "it" (as in "The blue sky and the sunflower field are beautiful. ..."). Simply changing the order of the sentences can suddenly make it impossible to understand what "it" refers to. Contextual analysis is necessary to prevent this.

object detection

Object detection refers to capturing an image and detecting the position and category of a defined object in the image.
Put simply, it is a general term for functions that reproduce what the human eye does: looking at a scene and recognizing what is in it.
Tasks in object detection are roughly divided into two steps.

  • STEP1: Object detection Determine whether the target area is the background or an object
  • STEP2: Image recognition Determine what the object is if it is an object

For example, if an object detection model detects a dog in an image containing a dog, the dog is surrounded by a border (a bounding box) and output with the label "dog".
Representative algorithms for object detection include

  • R-CNN (Region-based CNN)
  • YOLO (You only look once)
  • SSD (Single Shot Detector)

etc. exist.
Since object detection is a field of active research, it is expected that various algorithms will increase in the future.

classification

Classification is the problem of predicting labels for discrete values. Regression predicts continuous values, whereas classification deals with discrete values.
In general, classification is divided into binary classification and multi-class classification. In binary classification the output is one of two labels, and in multi-class classification the output is one of three or more labels.
For example, an AI model that prepares a label and an image of a dog or a cat and outputs "dog" when a dog image is entered is an AI model that can perform binary classification.

Bayesian statistics

Bayesian statistics does not necessarily require a sample and can derive probabilities even with insufficient data. Probability includes objective probability and subjective probability, and Bayesian statistics is the branch of statistics that deals with subjective probability.
As a method, one first sets the "probability that a certain event occurs" (= the prior probability), and then updates it to a new "probability that the event occurs" (= the posterior probability) each time further information is obtained, thereby deriving the (subjective) probability of the event of interest.
Prior probability is "the probability that you assumed before you got the data", and it does not mean that you do not have the data at all. The prior and posterior probabilities are the probabilities before and after that "additional" data is obtained. This concept of updating probabilities from the obtained data is called Bayesian update.
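A minimal sketch of a Bayesian update for a coin-flip example, assuming a Beta prior (which is conjugate to the binomial likelihood); all numbers are illustrative:
```python
# Prior belief about the probability of heads: Beta(alpha, beta).
alpha, beta = 2, 2            # weak prior centered at 0.5

# New data arrive: 7 heads out of 10 flips.
heads, tails = 7, 3

# Bayesian update: the posterior is again a Beta distribution.
alpha_post = alpha + heads
beta_post = beta + tails

posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"posterior mean of P(heads) = {posterior_mean:.3f}")   # 9/14 ~ 0.643
```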

Ma~wa

multimodal learning

Multimodal learning is a method of learning by combining multiple modal information.
For example, suppose we want to create an AI that guesses a boy's age. Rather than

    • predicting from the "image" of the boy's face, or
    • predicting from the "voice" of the boy's speech,

  • predicting from both the "image" of the boy's face and the "sound" of his voice

would be more accurate. In this example, "image" and "sound" are multiple kinds of modal information, and multimodal learning is a method of learning to predict something from multiple modalities such as "image" and "sound".

Ugly duckling theorem

The ugly duckling theorem states that an ugly duckling and an ordinary duckling are as similar as two ordinary ducklings.
An example will be given. Let A be an ugly duckling, and B and C be two ordinary ducklings. A and B have something in common, and so do A and C and B and C. There are some things that are common only to A and B that don't apply to C. Similarly, we can find common points between only A and C, and only between B and C. Thus, all combinations have a corresponding commonality and are all equally similar.
An important part of this theorem is that we consider all features to be of equal importance. In other words, it is to classify objectively and formally without subjectivity. This creates the problem that objective, general-purpose classification is not possible, so it is necessary to create an algorithm according to the classification problem to be solved.

target variable

The target variable is the variable that we want to predict, and is also called the "dependent variable" or "external criterion." It can also be taken as the result of things.
For example, if you run a restaurant and want to predict future sales on a monthly basis based on past sales information, you can imagine that daily sales are related to weather, temperature, and humidity. At this time, sales, which is the value we want to obtain, is the "objective variable." A variable means "a number that can change from case to case," and refers to values that change from day to day, such as weather, temperature, and humidity.
To explain it with a formula, for example, if there is a function y = ax, y is the objective variable.
To give another specific example, the objective variable is the result of "I will get promoted in the future / I will not get promoted in the future". Characteristic factors for prediction, such as "good at greetings, bright, studious, good at sales", are called explanatory variables to predict whether a person will succeed or not succeed in the future.

quantization

Quantization is to approximately represent a continuous quantity such as an analog signal with a discrete value such as an integer. This is often done when converting signals from the natural world into digital data so that they can be processed and stored by a computer.
However, the actual quantization used in machine learning has a slightly different meaning, and refers to expressing quantities that were previously expressed with sufficient (= 32bit or 16bit) precision with a much smaller number of bits. Quantization has advantages such as speeding up calculation and saving memory.
One way to quantize is simply to take the weights of a trained model and apply a quantization function to them. The same function can also be applied to the activations during inference, but with that approach the model tends to be less accurate. By running inference on the training dataset with the trained model, the mean and variance of the activation distribution of each layer can be observed, and this information can be used to set appropriate quantization parameters.
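A minimal sketch of linear (affine) quantization of float weights to 8-bit integers, assuming NumPy; the scale/zero-point scheme shown is one common choice, not the only one, and the weight values are illustrative:
```python
import numpy as np

weights = np.array([-0.8, -0.1, 0.0, 0.3, 0.9], dtype=np.float32)

# Map the float range onto the int8 range [-128, 127].
scale = (weights.max() - weights.min()) / 255.0
zero_point = np.round(-128 - weights.min() / scale)

q = np.clip(np.round(weights / scale + zero_point), -128, 127).astype(np.int8)
dequantized = (q.astype(np.float32) - zero_point) * scale

print("quantized    :", q)
print("reconstructed:", dequantized)   # close to the original weights
```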

recommend

Similar-product recommendation systems such as "people who viewed this product also bought these products," which have recently become widely recognized on e-commerce sites, are a type of rule-based expert system called a recommendation engine.
A recommendation engine is a system that helps customers find the information they want by proposing products that are related to the products site visitors have purchased or viewed and that stimulate their desire to purchase. There are two broad approaches: recommendations based on the content of the items themselves, and recommendations based on the visitor's browsing history, purchase history, and so on.
A recommendation engine has several patterns of technology as its base, and in many cases they are used in combination.
The basic techniques are as follows.

  • ① Collaborative filtering (sketched below)
  • ② Content-based filtering
  • ③ Hybrid type
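
A minimal sketch of idea ① (collaborative filtering) using user-to-user similarity, assuming NumPy; the ratings matrix is entirely made up for illustration:
```python
import numpy as np

# Rows = users, columns = items; 0 means "not yet rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 0],
    [1, 0, 5, 4],
])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0                                             # recommend for user 0
sims = [cosine(ratings[target], ratings[u]) for u in range(len(ratings))]
most_similar = int(np.argsort(sims)[-2])               # skip the user itself

# Recommend items the similar user liked that the target has not rated yet.
candidates = np.where((ratings[target] == 0) & (ratings[most_similar] >= 4))[0]
print("similar user:", most_similar, "-> recommend items:", candidates)
```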