Building an AI Agent with NVIDIA NIM and GPT-OSS Models in 5 Minutes
In this article, we'll explore how to perform inference and build a simple agent application using the NVIDIA-hosted NIM API for the new gpt-oss models from OpenAI. The gpt-oss-20b and gpt-oss-120b models are available as individual NIMs. You can interact with them via the NVIDIA-hosted NIM API or by downloading, deploying, and running them on your own infrastructure. This guide concentrates on the NVIDIA-hosted service available at build.nvidia.com, and also notes how to download and deploy the NIMs for local use.
Getting Started with NVIDIA NIM
To begin, navigate to build.nvidia.com to locate the models and learn how to integrate them into a Colab notebook. Once on the build.nvidia.com platform, you'll need to find the model you wish to use. For this demonstration, we will select the gpt-oss-20b model from OpenAI. Clicking on its corresponding tile will take you to the model's page.
The interface provides a chat-like environment for asking questions and observing the model's inference and reasoning. To access the model programmatically, select the 'View Code' option in the top-right corner and generate an API key, which you will need for the Colab notebook setup. With the API key from build.nvidia.com secured, open a Colab notebook and begin the implementation.
Building the Agent: A Step-by-Step Guide
Inside the notebook, we will cover how to perform inference using both the responses API and the chat completions API. Additionally, we'll construct a simple web search agent using the OpenAI agents SDK.
Step 1: Environment Setup
First, set the API key you obtained as an environment variable for secure and easy access. Next, install the OpenAI Python SDK to interact with the API programmatically.
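For example, in a Colab cell (the `!` prefix runs a shell command inside the notebook):

```python
# Install the OpenAI Python SDK in the notebook environment
!pip install openai
```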
Step 2: Client Configuration
Configuring the client is straightforward. Instantiate the OpenAI client, providing the base URL for the NVIDIA-hosted NIM API and your API key for authentication.
Here is an example of how to set up the client:

```python
import os

from openai import OpenAI

# Set the API key as an environment variable
os.environ["NVIDIA_API_KEY"] = "YOUR_API_KEY"

# Configure the client to point at the NVIDIA-hosted NIM API
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
```
It's worth noting that this implementation uses the responses API, which leverages the new Harmony response format. The NVIDIA NIM handles this integration seamlessly, so you can use the responses API as you normally would without any special configuration.
Step 3: Making API Calls
With the client configured, you can now make calls to the responses API. Simply provide the model name (`gpt-oss-20b`) and your prompt as the `input` to the `client.responses.create` method. The responses API also lets you specify a reasoning effort; for a simple query, a `"low"` effort is sufficient. While the model supports a context window of up to 128,000 tokens, you can set a lower `max_output_tokens` limit for concise answers.
For optimal performance, it is recommended to enable streaming (`stream=True`) and handle the streamed events accordingly. Upon sending the request, the model generates reasoning tokens before delivering the final response.
Here is an example API call:

```python
# Example API call with low reasoning effort and streaming enabled
response = client.responses.create(
    model="gpt-oss-20b",
    input="What are the main benefits of using NVIDIA NIM?",
    reasoning={"effort": "low"},
    max_output_tokens=1024,
    stream=True,
)

# Print the response text as it streams in
for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```
Tackling Complex Problems
For more challenging questions, such as a math problem from the AIME 25 dataset, you can set the reasoning effort to `"high"`. This prompts the model to generate a more extensive set of reasoning tokens to arrive at the correct solution, which it presents in a boxed format.
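For example (a sketch reusing the client from Step 2; the sample question is illustrative rather than an actual AIME 25 problem):

```python
# Harder question: allow the model more reasoning tokens with high effort
response = client.responses.create(
    model="gpt-oss-20b",
    input="How many ordered pairs of positive integers (a, b) satisfy a^2 + b^2 = 2025?",
    reasoning={"effort": "high"},
    stream=True,
)

for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```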
Next, we will examine how to construct a simple web search agent powered by the gpt-oss-20b NIM.
Building a Web Search Agent
Step 4: Install Dependencies
To create the web search agent, you need to install the OpenAI Agents SDK and the Tavily Python library for the search functionality. You will also need to acquire a Tavily API key and make it available to the notebook.
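For example, in a Colab cell (a minimal sketch; `openai-agents` and `tavily-python` are the PyPI package names, and the environment-variable name is a convention reused in the snippets below):

```python
# Install the OpenAI Agents SDK and the Tavily search client
!pip install openai-agents tavily-python

import os

# Key generated from your Tavily account
os.environ["TAVILY_API_KEY"] = "YOUR_TAVILY_API_KEY"
```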
Step 5: Initialize an Async Client and Search Tool
Initialize an asynchronous client, similar to the previous setup. Then, create a search tool by wrapping an async search function with the `@function_tool` decorator from the Agents SDK. This tool will use a Tavily client configured with your API key. Give the model clear instructions through the function's docstring and parameters. The function should return a newline-separated string containing the titles and content from the Tavily search results.
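The asynchronous client can be set up like this (a minimal sketch mirroring the Step 2 configuration; it reuses the `NVIDIA_API_KEY` environment variable set earlier):

```python
import os

from openai import AsyncOpenAI

# Async client pointed at the NVIDIA-hosted NIM endpoint
async_client = AsyncOpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
```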
Here is an example of the search tool:

```python
import os

from agents import function_tool
from tavily import TavilyClient

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@function_tool
async def search_web(query: str) -> str:
    """Performs a web search for the given query."""
    try:
        response = tavily_client.search(query=query)
        # Combine each result's title and content into one newline-separated string
        return "\n".join(f"{r['title']}: {r['content']}" for r in response["results"])
    except Exception as e:
        return f"Error during search: {e}"
```
Step 6: Use the Chat Completions Model for Tool Calling
Currently, the NVIDIA-hosted NIM API does not support tool calling through the responses API, so you must use the chat completions API for this purpose. The client setup is identical to the configuration described earlier, and the remainder of the implementation follows the standard OpenAI Agents SDK documentation. The agent is instructed to be a helpful assistant, providing responses tailored for enterprise executives and utilizing the `search_web` tool.
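A sketch of the agent definition, assuming the `async_client` from Step 5 and the `search_web` tool above (the agent name and instructions string are illustrative):

```python
from agents import Agent, OpenAIChatCompletionsModel

agent = Agent(
    name="Web Search Agent",
    instructions=(
        "You are a helpful assistant. Tailor your answers for enterprise "
        "executives, and use the search_web tool when you need current information."
    ),
    # Wrap the NIM endpoint as a chat completions model, which supports tool calling
    model=OpenAIChatCompletionsModel(model="gpt-oss-20b", openai_client=async_client),
    tools=[search_web],
)
```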
Step 7: Run the Agent
As a final configuration step, disable tracing to prevent trace data from being sent to OpenAI. You can then execute the agent with a specific query, such as, "Briefly describe the enterprise benefits of NVIDIA NIM, 200 words or less." The agent will process the request, call its web search tool, and generate a concise, markdown-formatted response outlining the benefits, as shown in the sketch below.
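A minimal sketch of this final step (`set_tracing_disabled` and `Runner` come from the Agents SDK; the query string matches the example above):

```python
import asyncio

from agents import Runner, set_tracing_disabled

# Keep trace data local instead of exporting it to OpenAI
set_tracing_disabled(True)

async def main() -> None:
    result = await Runner.run(
        agent,
        "Briefly describe the enterprise benefits of NVIDIA NIM, 200 words or less.",
    )
    print(result.final_output)

# In Colab, an event loop is already running, so use `await main()` instead
asyncio.run(main())
```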
This post has demonstrated how to leverage the gpt-oss-20b and gpt-oss-120b NIMs for inference and agentic tasks. To begin your own projects, visit build.nvidia.com to use the hosted API or download the NIMs for local deployment.