In recent years, AI, especially generative AI, has been attracting a lot of attention overseas. In this article, we will discuss the overseas trends in generative AI, the latest technological developments, use cases, and the impact that AI will have on business, technology, and society.
*This article is based on a lecture given at the Macnica Data・AI Forum 2024 Winter held in February 2024.
Five models that make up generative AI
Before OpenAI released ChatGPT in 2022, machine learning was primarily used for classification and prediction. However, recently, attention has been focused on generative AI, which can create new things. There are various methods for this, and here are five of them.
The first is Latent Diffusion, which is often used for image generation. This method adds a lot of noise to the data and then creates something similar but slightly different based on removing the noise. Stable Diffusion is a model that aims to achieve this with as high stability as possible.
The second is the Variational Autoencoder. The key to this method is Latent Space, which means "latent space" in Japanese. Latent Space refers to a lower-dimensional compressed space in which AI captures the essential characteristics and structure of data when looking at data from data. In other words, the AI understands the essence of the given data and creates something close to it.
The third is Generative Adversarial Networks, which is also often used for image generation. Adversarial is a term often used in the field of cybersecurity, and has a similar meaning to "attacker." This method is characterized by generating high-quality data by having a generative AI network and a generator and discriminator fight like spears and shields. The mission of the former is to generate more convincing data, while the latter is to classify whether something is real or fake with greater accuracy.
The fourth is Autoregressive Generative Models. These are used in the most familiar ChatGPT, Gemini, LLaMa, etc. This model predicts the next word in a sentence from the previous word, and creates a sentence by selecting the one with the highest probability. This allows you to create realistic sentences that look like they were written by a human.
The fifth is Multiagent Generative Systems. This is a technology in which multiple agents cooperate and learn from each other to generate new data and behavioral predictions. It is used in games, social behavior, economics, weather, and traffic predictions.
The new startups and applications emerging today are a combination of these five or specialize in one of them.
Types of AI used by companies
Next, we categorized the ways in which you can use this generative AI technology. In this table, the left side is closer to the consumer side, and the right side is closer to the developer side.
Consume, shown on the far left, allows you to purchase and use applications that have pre-implemented fine-tuned foundation models, and is used by many companies.
The Embed app on the right uses its own application. However, for generative AI, it borrows APIs from companies such as OpenAI and installs them. This is a method commonly used by SaaS vendors, and in Notion, which I use frequently, it allows Q&A while effectively summarizing user notes and information on the internet.
The next two, Extend 1 and 2, are used when the accuracy of commonly available models is somewhat low or they are not specialized for the intended use. By performing data retrieval and fine-tuning, higher quality models and applications specialized for your company can be created.
Build, on the far right, is the idea of building something ourselves from scratch using a foundation model. We will use our own model trained on a lot of data that we have to create our own application.
In other words, the further to the right you go in the table, the greater the cost, time, and data required, and the longer it takes to deliver value, but the more differentiated you can achieve. The diagram below summarizes the technology stack that is needed to advance these initiatives.
Of these, generative AI models in the center of the diagram are attracting the most attention. Among them, foundation models are said to have a market of 146 billion yen, and recently domain models specialized for finance, healthcare, and legal have also appeared. There is also a trend for companies to use products such as Databricks and AWS to create models that are more specialized for their industry. Currently, there is a need for a hub that can effectively consolidate the various models that exist in the world, and vendors such as Hugging Face are entering the market.
On top of that, generative AI engineering is also necessary to create value using generative models. This field includes Prompt Engineering, Vector DB, Fine-Tuning, API Orchestration, AI TRiSM, etc. The size of this market is generally said to be 16 billion, and every time the act of creating generative AI rather than buying it is democratized, the market grows, and more and more startups are emerging with the tools to facilitate this.
Cloud vendors such as Microsoft, AWS, and GCP are providing large-scale infrastructure at the bottom of the diagram, and this AI boom will likely lead to significant growth for GAFA. Also, because chip vendors such as NVIDIA are included here, I think the infrastructure for running AI will become even larger in the future.
For startups, the lead time to obtain GPUs is long and expensive, so some vendors are now offering models where they rent them from the cloud for computing, or even manufacturing chips specialized for generative AI.
The generative AI applications at the top of the diagram can be broadly divided into two types: horizontal and vertical, especially for enterprise applications. Vertical applications tend to be industry-specific. On the other hand, horizontal applications tend to be aimed at a variety of departments, such as marketing and customer service, regardless of industry.
Type of Application
This diagram further classifies the applications divided by Horizontal into four types, and shows how they have been used in the past and in the future. The lighter shaded areas indicate a quality level that works well in demos and X (formerly Twitter) posts, but does not work well in real life, while the darker shaded areas indicate a quality that is usable by general users.
Text and code have been tried and tested in various ways even before 2020, and have been used for translation and auto-complete. Since 2022, when ChatGPT was released, it has become possible to do drafts, simple copywriting, and automatic conversion of multiple lines. And here comes DALL-E, which has been used to create simple art, logos, and photos. On the other hand, it has not been used much for videos, games, or 3D.
Two years later, in 2024, it became possible to create two or three drafts of text that could be advanced to the final stage with a little work, rather than just a draft. It can also handle long texts and industry-specific content. It can also handle multiple languages and convert code well, allowing you to write draft-level code for various products from text. As for images and videos, the AI can be used to create materials for the architectural industry to get initial inspiration, and then models such as 3D renderings can be applied to them.
Fast forward another six years to 2030, and I think generative AI will be able to create text and code at a level that would be handled by professional developers and copywriters, as well as images, and we will be able to create more personalized experiences for each user when it comes to videos and games.
This time, we will focus on text and code and delve into use cases in startups. The diagram below classifies applications of text-based generative AI into three categories.
First, regarding general conversation, there is a startup called Otter that converts audio of presentations at meetings and exhibitions into text. Grammarly was originally a startup that did spell checking and grammar, but by incorporating generative AI, it has become possible to edit users' text to make it sound more like it was written by a professional. Writer and Cohere rewrite marketing content, product descriptions, contracts, and other text to reflect the tone of your company.
Each vendor has successfully trained AI based on the data they have accumulated in their business to make their services easier to use and more valuable. When it comes to industry-specific services, data becomes even more important. For example, Bloomreach offers a service that specializes in automation and personalization in marketing campaigns. Specifically, they provide a service where generative AI creates the right messages to send automatically at what timing by looking at e-commerce purchase history, where users clicked, when purchases were made, etc.
In healthcare, by automatically taking medical notes when a patient visits a hospital, not only does it reduce the burden on doctors, it also effectively automates the updating of records afterwards. This is what Nabla is doing.
Hippocratic AI is a startup that specializes in creating foundation models that are better than GPT4 in the healthcare field by encouraging collaboration between various hospitals and doctors. The company achieved a benchmark that exceeded GPT4 in 105 of 114 items in the US healthcare certification test. This is a case of creating a model dedicated to healthcare by learning data that is not available on the Internet.
And in the legal field, there are Case Text and Even Up. For example, lawyers look at many documents and think about how to explain things based on similar cases in the past, but AI is used to review documents, research necessary for preparation, and check for omissions. In addition, mock trials allow you to simulate "if I say this, the response will be like this." By setting the styles of various lawyers, it helps to avoid surprises when you appear in court.
In all of these cases, the specialized data in each field allows them to provide high-quality, realistic information. In particular, when it comes to healthcare and legal matters, the fact that security and privacy are not compromised and everything is completed in-house is a differentiating factor.
There are some points to be careful about in the general conversation section. Until now, companies have tended to use this for their own internal content, but not for content to be distributed externally. In addition, tuning is required to make it usable. In industry-specific cases, the creator may be in charge of that, but it may not be usable in situations that are even slightly unexpected, so it is very important to try it out yourself.
I'm sure you all use Github's Copilot a lot when it comes to code, but there are many other startups out there. They don't just autocomplete, they also reuse past code, convert code between languages, and some even provide test cases. Many of them also offer rich communities and marketplaces, suggest actions for other users, and differentiate themselves by the ease of front-end development and tuning.
American companies' efforts
I'd like to introduce three examples of what American companies are doing by utilizing startups and generative AI technology.
The first is the retail store Walmart. I have been in the United States since I was a student, and 10 years ago Walmart was a company that sold low prices and a wide selection of products. However, it has now transformed into one of the companies most advancing digital transformation.
The company provides AI assistant tools in various places to encourage employees to use AI and make training easier. For example, it makes it easier to look up company policies and benefits, and allows employees to ask questions to the AI instead of a busy boss. This will make work more efficient and increase employees' independence, allowing them to do more.
What's important here is that the company collects data on who checked what, what they asked, and whether the problem was properly solved. As a result, the company is able to make suggestions such as "This application is suitable for you, so try using it," which has enabled a cycle of efficiency and independence. In addition, providing an environment where employees can safely try out new tools also encourages digital transformation among employees.
From the customer's perspective, one good thing about AI is that it has a very powerful search function. Home parties are often held in the United States, and with other vendors, if you search for "3rd birthday," for example, all you get is a birthday T-shirt. However, you actually need balloons for decorations, tablecloths, invitations, gifts, sweets, and so on. If you do the same search on Walmart, all of these different categories are displayed in one row, making it very easy to use. The reason this system is possible is probably because they have purchasing data on "who is buying what with what."
In addition, if you upload your own photo, something like an avatar will appear, and you can try on the clothes through it. It's an interesting initiative to invite friends and partners onto the platform and exchange opinions. I think this was also realized in response to user feedback.
Regarding education, I like the approach of a company called Khan Academy. The company provides an AI-based service called Khanmigo, which actively adopts the Socratic method. In other words, they believe that it is important to teach how to learn, rather than teaching the answers to problems.
Founder Khan used to teach mathematics to his cousin. A one-on-one tutoring system is good for teaching someone something, but it doesn't scale and you can't deliver that content to the masses. So he wanted to provide it through e-learning. However, e-learning is difficult to personalize and tends to be an experience of learning answers.
Therefore, Khan Academy uses AI to teach students how to break down a math problem, for example, depending on their level, so that they can gradually improve and experience different ways of learning. In addition, not only students but also teachers use AI when creating curriculum.
The third example is from NASA, with whom I gave a speech at our company's event in October 2023. NASA is working on using AI to transform its manufacturing process, and as part of that effort, they used AI to design the "alien-like parts" shown in the photo, which were then actually manufactured and sent into space.
As shown in the bottom left of the figure, parts designed by AI are stronger than those made by humans, and are also characterized by the fact that they can be completed quickly and cheaply. Specifically, what previously took six months to a year from design to sending into space has been shortened to just two weeks. During the lecture, a NASA representative said that it is extremely important to prepare and define the AI environment, such as "what can be changed and what cannot be changed." Within the given range, it is possible to create something optimal, but if the range specified is incorrect, something that is useless will be completed.
NASA has experience in sending something into space, so they have a lot of knowledge and data on the scope. Therefore, they have many ways to test the finished product, and as a result, by actually testing what the AI has produced, the reliability has increased. Because AI can drastically shorten the manufacturing process in this way, I believe it will not just improve efficiency, but will also bring about change.
At CES 2024, many commentators said that it will not just be about software, but that this will also be connected to the physical world, and I think the era of Cyber Physical Systems (CPS) is quite close at hand.
In the area of mobility, this not only improves safety but also personal experiences. For example, Mercedes-Benz offers a function that checks traffic conditions and suggests videos and music appropriate for the time it will take to reach a destination, as well as route search based on the driver's stress level.
Sony has incorporated game console-like functions, such as AR monsters attacking you or making you feel like you're visiting an aquarium, to create a more personal, fun, and safe space. LG and Samsung are working to automate various things with smart home appliances while reducing power consumption.
In addition, robot arms and AI go well together, so I think we will see robots not only in logistics and agriculture, but also in cooking robots, photography robots, beauty robots, etc. Conventional robots could not function well unless they were in a specific given environment, but by utilizing AI, recognition and subsequent judgments will become more diverse, and I think we will soon see an era in which various things can be automated with high precision.
The future of generative AI
Finally, I would like to consider the future of generative AI from several perspectives. First, I think models will become increasingly lightweight. And by connecting these models, I think we will be able to do what we want to do more cost-effectively.
This is because the current text summarization using ChatGPT is very cost-inefficient and difficult to use. Therefore, it is important to have lightweight LLMs suitable for each task and to connect them well. To achieve this, I think that open source and more visible models will emerge, a HUB that brings them together will emerge, and eventually it will be offered as a service.
When it comes to data, AI will not only be able to understand text, but will also be able to understand voice, images, videos, and even 3D models as it gains the sensibility of eyes and ears. Vector databases, which allow you to create synthetic data for missing data and corner case data and organize the relationships to make it usable, will become more mainstream. Finally, engineering tools that allow AI to efficiently learn from your company's own data and code it well will become more common, and we expect to see an era in which AI will be democratized.
Applications that combine these models and data will change from various forms of virtual assistance to current software with embedded AI, and then to AI native applications, ushering in the era of digital twins.
Finally, there is security to support these. This includes firewalls to block prompt injections and information leaks by attackers, and hallucination management to prevent AI from outputting impossible information. Once these are in place, there will be detection that can classify what is created by AI and what is created by humans, and ultimately, I think it will be important to have a system where humans are included in the decision-making loop.
At Macnica, we follow various trends and startups and aim to continue to provide the right products and services to the people of Japan. If you have any questions or concerns about AI data, please feel free to contact us.