Contents of this episode

In episode 2, we explained the specific software and hardware required and how to deploy them.
In episode 3, we explain how to build a RAG pipeline from sample code using each container.
 
[RAG Chatbot Development Using NVIDIA NIM]
Episode 1: RAG system development using Microservices
Episode 2: What is the required software/hardware configuration for a RAG system?
Episode 3: RAG System Sample Code

RAG system sample code

We will explain how to use each container when developing a microservices-based system and how to create a simple application.
In this environment, the LLM is deployed on two H100 GPUs and the embedding model on one A100 GPU.
 
First, here is a video of the completed sample application:
When we asked llama3.1-70b about the DGX B200 without RAG, it did not have the information needed to generate a correct answer.
With RAG, however, it can answer questions about the DGX B200.

The rest of this article explains how to set up this demo application.

1. Installing the libraries

Create requirements.txt with the command below.

cat > requirements.txt << "EOF" 
fastapi==0.104.1
langchain==0.3.4
langchain-community==0.3.3
langchain-core==0.3.13
langchain-nvidia-ai-endpoints==0.3.3
numpy==1.26.4
sentence-transformers==2.2.2
unstructured==0.11.8
langchain_milvus==0.1.6
gradio==5.1.0
EOF

With requirements.txt created by the command above, install the required libraries with the following command.

pip install -r requirements.txt

Once you have confirmed that the libraries were installed successfully, proceed to the next step.

2. Libraries to import

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_milvus import Milvus
import gradio as gr

Next, we import the libraries installed in 1. Installing the libraries.
We use LangChain, an open-source framework for building applications powered by LLMs.
We also use Milvus, which was set up in episode 2, as the vector database.
The interface is built with Gradio.

3. LLM Settings

# LLM settings
llm = ChatNVIDIA(
    base_url="YOUR_LOCAL_ENDPOINT_URL",
    api_key="not-used",
    model="model_name",  # the video above uses meta/llama-3.1-70b-instruct
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)

# Embedding model settings
embedding_model = NVIDIAEmbeddings(
    base_url="YOUR_LOCAL_ENDPOINT_URL",
    api_key="not-used",
    model="model_name",
    truncate="NONE",
)

# Milvus URL
URI = "YOUR_DB_URL:19530"

# Prompt template settings
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a helpful and friendly AI!"
     "Your responses should be concise and no longer than two sentences."
     "Do not hallucinate. Say you don't know if you don't have this information."
     # "Answer the question using only the context"
     "\n\nQuestion:{question}\n\nContext:{context}"),
    ("user", "{question}")
])

We will now configure the LLM and the embedding model.
We build the sample application with the configuration explained in episode 1: the LLM NIM container, the vector database, the embedding model NIM container, and the prompt. Check your own environment and change the NIM container URLs as necessary.
Milvus listens on port 19530 by default, but this can be changed.
Here, the system prompt starts with "You are a helpful and friendly AI!". Changing the system prompt changes the LLM's output, so you can experiment with it to get better answers.

4. Functions for storing in a vector database

# Function to store documents in the vector DB
def process_pdf_to_milvus(pdf_file):
    # Load the PDF
    loader = PyPDFLoader(pdf_file)
    document = loader.load()
    # Split the text into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=100,
        separators=["\n\n", "\n", ".", ";", ",", " ", ""],
    )
    document_chunks = text_splitter.split_documents(document)
    # Store the chunks in the vector database
    vector_store = Milvus.from_documents(
        document_chunks,
        embedding=embedding_model,
        connection_args={"uri": URI},
        drop_old=True
    )
    return "The PDF has been processed and stored in Milvus!"

This section explains the function that stores documents in the vector database.
Since we are targeting PDFs, we load the PDF with PyPDFLoader. We then split the text with RecursiveCharacterTextSplitter, using a chunk size of 500 characters, an overlap of 100 characters between adjacent chunks, and the characters specified in separators as break points. Finally, we save the chunks to the vector database (Milvus).
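As a rough illustration of how chunk_size and chunk_overlap interact, here is a simplified character-window sketch. Note that this is not LangChain's actual splitting logic, which also tries to break at the listed separators before cutting mid-word:

```python
# Simplified character-window chunking: each chunk is chunk_size characters,
# and consecutive chunks share chunk_overlap characters.
def naive_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100):
    step = chunk_size - chunk_overlap  # each new chunk starts 400 characters later
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(chr(97 + i % 26) for i in range(1200))  # 1200-character dummy text
chunks = naive_chunks(text)
# 3 chunks covering characters 0-500, 400-900, and 800-1200;
# adjacent chunks overlap by 100 characters, so no sentence is lost at a boundary
```

The overlap exists so that a sentence falling on a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.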

5. Functions that generate answers

# Function to generate answers
def answer_question(question):
    vector_store_loaded = Milvus(
        embedding_function=embedding_model,
        connection_args={"uri": URI}
    )
    # Build the chain
    chain = (
        {
            "context": vector_store_loaded.as_retriever(),
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )
    answer = chain.invoke(question)
    return answer

We will now explain the function that generates answers.
First, it connects to the Milvus collection populated earlier.
It then builds a LangChain chain that retrieves relevant context from Milvus and passes it, together with the question, to the LLM to generate an answer.
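Conceptually, the chain wires the pieces together like the following plain-Python sketch, where fake_retriever and fake_llm are hypothetical stand-ins for the Milvus retriever and the NIM LLM endpoint:

```python
# Plain-Python equivalent of the LCEL chain, with stubs for the external services.
def fake_retriever(question: str) -> str:
    # Stand-in for vector_store_loaded.as_retriever(): returns relevant chunks.
    return "some retrieved context about the question"  # placeholder text

def fake_llm(prompt_text: str) -> str:
    # Stand-in for the NIM LLM endpoint: would generate text from the prompt.
    return f"Answer based on: {prompt_text}"

def answer_question_sketch(question: str) -> str:
    # Step 1: the dict at the head of the chain builds the input mapping;
    # RunnablePassthrough() forwards the question unchanged.
    inputs = {"context": fake_retriever(question), "question": question}
    # Step 2: the prompt template fills in the placeholders.
    prompt_text = f"Question:{inputs['question']}\n\nContext:{inputs['context']}"
    # Step 3: the LLM generates a response; StrOutputParser reduces it to a string.
    return fake_llm(prompt_text)
```

This is why the retriever receives the raw question: the input to chain.invoke is fanned out to every value in the leading dict.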

6. Launching the Gradio interface

with gr.Blocks() as demo:
    gr.Markdown("# PDF to Vector Database & Question Answering System")
    with gr.Tab("Question Answering"):
        question_input = gr.Textbox(label="Enter your question")
        answer_button = gr.Button("Ask")
        answer_output = gr.Textbox(label="Answer")
        answer_button.click(answer_question, inputs=[question_input], outputs=answer_output)
    with gr.Tab("Upload a PDF and store it in Milvus"):
        pdf_input = gr.File(label="Select a PDF file", type="filepath")
        save_button = gr.Button("Store PDF in Milvus")
        save_output = gr.Textbox(label="Result")
        save_button.click(process_pdf_to_milvus, inputs=[pdf_input], outputs=save_output)

demo.launch()

Finally, we use Gradio to wrap everything in a simple chatbot format.
The question-answering screen and the PDF upload screen are separated using gr.Tab.
demo.launch() starts the web server that serves the chatbot UI defined in the gr.Blocks context.

Summary

In this series, we explained microservices-based development using NIM and showed how to create an application from sample code.
This time we implemented only the minimum necessary features, but many extensions are possible, such as vector database management features, reranking models, and pipelines that use multiple LLMs.
We hope this article will be a step forward for you in developing your LLM application.

If you are considering introducing AI, please contact us.

For the introduction of AI, we offer a wide range of services, including the selection and support of NVIDIA GPU cards and GPU workstations, as well as algorithms for face recognition, trajectory analysis, and skeletal detection, and learning environment construction services. Please feel free to contact us if you have any questions.