Creating a Real-Time Document RAG with VAST InsightEngine - Part 1

We held a webinar explaining the content of this article. Please register using the form below to receive a URL to access the on-demand video.

If you missed it or if you were a participant and would like to watch it again, please register now!

Click here to watch on-demand video

What is VAST InsightEngine?

VAST InsightEngine is a solution released by VAST Data that makes it faster to put stored data to work in AI applications. It builds Kafka and vector database functionality into VAST Data storage, so data pipelines can be created simply.
Because these functions are built in, there is no need to connect external systems such as Kafka or Milvus, which traditional data pipelines required, and management becomes easier.
For more details, please see the article below.

About the VAST InsightEngine features

Before we begin building a specific application, let's explain the main features of VAST InsightEngine: Kafka functionality and vector database functionality.

About the VAST InsightEngine Kafka functionality

Kafka is a distributed messaging system for processing streaming data. It temporarily stores and manages streaming data generated in real time, together with its metadata. External systems such as data analytics platforms and web services then consume the data stored in Kafka.
It works much like a conveyor belt in a factory: it carries the data flowing through it and hands that data off to external systems.
Kafka is becoming an essential component of data pipelines that require real-time processing.
Kafka consists of three elements: a Producer that sends a message when an event occurs, a Broker that stores the messages it receives, and a Consumer that reads the messages from the Broker. Within the Broker, messages sent from the Producer are stored and managed in containers called topics.
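The relationship between Producer, Broker, topic, and Consumer can be sketched with a toy in-memory model. This is only an illustration of the roles described above, not the Kafka client API and not VAST InsightEngine's interface; the class names, the topic name "document-events", and the message fields are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        self.topics[topic].append(message)
        # Return the offset (position in the topic) of the stored message.
        return len(self.topics[topic]) - 1

    def read(self, topic, offset):
        # Return every message stored at or after the given offset.
        return self.topics[topic][offset:]


class Producer:
    """Sends a message to a topic when an event occurs."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class Consumer:
    """Reads messages from one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # position of the next unread message

    def poll(self):
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = Broker()
producer = Producer(broker)
consumer = Consumer(broker, "document-events")  # hypothetical topic name

# Hypothetical events: a document is created and then updated.
producer.send("document-events", {"event": "created", "path": "/docs/a.pdf"})
producer.send("document-events", {"event": "updated", "path": "/docs/a.pdf"})

first_batch = consumer.poll()   # both messages, in the order they were sent
second_batch = consumer.poll()  # empty: nothing new since the last poll
```

The point of the sketch is that the Broker keeps the messages and the Consumer keeps only its read position, which is why a consumer can fall behind or catch up without the producer being aware of it.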

Normally, running Kafka requires a dedicated server, and the server configuration has to be changed depending on the volume of stream data and the required throughput.
In contrast, VAST InsightEngine includes Kafka functionality as a standard feature, so when you build a RAG pipeline on VAST Data, you can do so without preparing a dedicated server.

About the VAST InsightEngine vector DB functionality

A vector database, as the name suggests, is a database that stores vectors. Before introducing vector databases, let's first explain what a vector is.
The text and images we see every day are in a format that humans can understand, but large language models (LLMs) cannot process them directly. The data must therefore be converted into a format that LLMs can handle, and the result of that conversion is called a vector. A vector is an ordered list of floating-point numbers, and the distance between two vectors can be calculated. The closer two vectors are, the more similar the data they represent.
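As a small illustration of "distance between vectors", the sketch below computes two common measures, Euclidean distance and cosine similarity, using only the Python standard library. The three-dimensional vectors are hand-made for the example; real vectors are produced by an embedding model and typically have hundreds or thousands of dimensions. The article does not state which measure VAST InsightEngine uses, so neither should be read as its method.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: smaller means the vectors are closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made vectors standing in for model output.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
invoice = [0.0, 0.1, 0.9]

# "cat" is closer to "kitten" than to "invoice" under both measures.
near = euclidean_distance(cat, kitten)
far = euclidean_distance(cat, invoice)
similar = cosine_similarity(cat, kitten)
dissimilar = cosine_similarity(cat, invoice)
```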
A vector database stores vectors and, given a query vector, searches its stored vectors for the ones closest to it. In other words, a vector database makes it possible to retrieve data with similar properties. RAG uses this capability to find the stored content (vectors) closest to a question, and it is becoming an essential technology in modern AI applications.
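The search step can be sketched as a brute-force scan that scores every stored vector against the query and returns the best matches. This is a minimal sketch for illustration only: the VectorStore class, the sample texts, and the vectors are hypothetical, and production vector databases generally use index structures for approximate search rather than comparing against every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class VectorStore:
    """Toy vector store: keeps (text, vector) rows and scans them all on search."""

    def __init__(self):
        self.rows = []

    def add(self, text, vector):
        self.rows.append((text, vector))

    def search(self, query_vector, k=1):
        # Score every stored vector, then return the texts of the k best matches.
        scored = [(cosine_similarity(query_vector, vec), text)
                  for text, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Hypothetical document chunks with hand-made vectors.
store = VectorStore()
store.add("Reset the password from the account page.", [0.9, 0.1, 0.0])
store.add("Invoices are issued monthly.", [0.0, 0.2, 0.9])
store.add("Passwords must be at least 12 characters.", [0.7, 0.3, 0.1])

# Hand-made vector standing in for the question "How do I reset my password?"
question_vector = [0.85, 0.15, 0.05]
top_two = store.search(question_vector, k=2)
```

In a RAG application, the texts returned by the search would be passed to the LLM together with the question, so that the answer is grounded in the retrieved content.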
Traditionally, a vector database, like Kafka, has been built on dedicated servers using open-source software such as Milvus.
In contrast, VAST InsightEngine includes vector database functionality as a basic feature, making it possible to build a vector database without preparing a dedicated server.

Challenges that VAST InsightEngine can solve

As described above, VAST InsightEngine natively provides the Kafka and vector database functionality that a data pipeline needs, so you can build a data pipeline using VAST Data alone. A traditional data pipeline requires setting up dedicated servers for Kafka and the vector database, then installing and managing open-source software on them, which makes the system complex both physically and logically. As a result, building an in-house data pipeline takes a lot of time, which hinders the adoption of AI.

VAST InsightEngine, on the other hand, addresses this complexity and development overhead. It requires no additional servers and eliminates the need to install and manage multiple open-source software (OSS) packages, so you can build data pipelines quickly and simply.
In the next chapter, we will build an actual data pipeline using VAST InsightEngine.
