NVIDIA NeMo™ Agent Toolkit

What is NVIDIA NeMo™ Agent Toolkit?

Have you ever wished for a system to comprehensively manage the development process of an AI agent that you built in-house on a trial basis, in order to put it into full-scale operation?

Or, when building an AI agent in-house, have you ever wanted a basic system to facilitate smooth development?

The software library that meets these needs is the NVIDIA NeMo™ Agent Toolkit, which we will introduce below.

The NVIDIA NeMo Agent Toolkit is an open source toolkit that covers the entire agent building process.

It comprehensively covers major AI application frameworks and enables centralized management of AI agents.

NVIDIA NeMo Agent Toolkit Features

Framework-independent

Works with LangChain, LlamaIndex, CrewAI, Microsoft Semantic Kernel, and even your own enterprise frameworks.
It is not dependent on any particular agent framework, database, or data source.

・Reusability

The NeMo Agent Toolkit includes agents, tools, and agent workflows.

They act as function calls that work together within your application and can be composed together, allowing you to build them once and reuse them in many different scenarios.

Rapid development

Start with a pre-built agent, tool, or agent workflow and customize it to suit your needs.

Profiling

Use the Profiler to profile the entire agent workflow.

By tracing input and output token timing, you can identify bottlenecks.

Observability

You can monitor and debug agent workflows using examples powered by W&B Weave and Phoenix.

・Evaluation system

Validate and maintain the accuracy of your agent workflows with built-in evaluation tools.

　
User interface

NeMo Agent Toolkit's UI chat interface allows you to interact with agents, visualize their output, and debug their workflows.

・MCP support

Compatible with Model Context Protocol (MCP). Use NeMo Agent Toolkit as an MCP client,
You can connect to a remote MCP server and use tools, or you can use NeMo Agent Toolkit as an MCP server and publish tools via MCP.

What can you do with NVIDIA NeMo Agent Toolkit?

Prototyping AI Agent Applications

The NeMo Agent Toolkit includes pre-built agents. As of August 2025, the following agents are available:

・ReAct Agent

・Tool Calling Agent

Reasoning Agent

・ReWOO Agent

You can quickly try out these pre-built agents by providing them with user data.

You will need a separate LLM that runs on the NVIDIA NIM™ API or an API compatible with OpenAI Chat Completion, but you can get the API key provided by NVIDIA for developers.

You can connect various LLMs to the NeMo Agent Toolkit and try them out (free for evaluation purposes, limited number of tokens per unit time).

When using the tools included in the NeMo Agent Toolkit, no special programming is required on the user's part; agent workflows can be built by simply editing configuration files.

Evaluating the performance of RAG

By preparing an LLM to be evaluated and an LLM to be used as an evaluator, the performance of RAG (Retrieval-Augmented Generation) can be automatically evaluated.

This is commonly called LLM-as-a-Judge and is built into the NeMo Agent Toolkit.

Regular manual evaluations require a great deal of effort, but automating them makes it possible to speed up the improvement cycle.

The evaluation indicators are as follows:

-Accuracy of answers

Contextual relevance

・Reasonableness of the answer

The obtained evaluation indicators allow users to take the following measures to improve the situation:

- Change chunk size for text extraction

- Changed the chunk separation method for text extraction

- Changed the text embedding model

- Changing the LLM or fine-tuning the LLM

・Introducing a model that re-evaluates (re-ranks) text chunks that are candidates for RAG

Identify bottlenecks

The profiler allows you to collect LLM usage statistics.

The collected statistical information is saved in a directory specified by the user, and by analyzing this information, the user can identify bottlenecks in the application.

Examples of items that can be analyzed are as follows:

- Prompt token usage predicts tool invocation failures and determines whether the LLM in use is appropriate for the application's tasks.

- Know whether each LLM is efficient from the workflow execution time.

Understand where LLMs spend most of their time and identify potential bottlenecks.

Debugging the Agent

Events in agent workflows can be traced and sent to external tools such as W&B Weave and Phoenix.

Within an agent workflow, many LLM calls and many tool calls occur.

This allows for efficient debugging to identify which malfunctions ultimately led to a drop in the accuracy of the answers.

The screenshot below shows the tracing results using W&B Weave.

Adding a data source to an existing agent

It allows you to connect data sources in a unified way to agents built into the NeMo Agent Toolkit and to user-developed agents plugged into the NeMo Agent Toolkit.

For example, you can connect to an MCP (Model Context Protocol) server to support new data sources.

How Macnica can help

AI agent construction support

AI agents are a technology that has the potential to create innovative businesses and services. I believe many people are maximizing their potential and turning their amazing ideas into reality.

However, as we take on new technologies, it can sometimes be difficult to balance our day-to-day work with the complexities of new technologies.

To support such challenges, Macnica offers an AI agent construction support service.

This is a two-month program that utilizes NVIDIA products, including the NVIDIA NeMo Agent Toolkit, to accompany you from the basics of AI agents to implementation tailored to specific use cases. If you are interested, please see the link below for more details.

AI agent construction support