Building an AI Agent with NVIDIA NIM and GPT-OSS Models in 5 Minutes
In this article, we'll explore how to perform inference and build a simple agent application using the NVIDIA-hosted NIM API for the new gpt-oss models from OpenAI. The gpt-oss-20b and gpt-oss-120b models are available as individual NIMs. You can interact with them via the NVIDIA-hosted NIM API or by downloading, deploying, and running them on your own infrastructure. This guide concentrates on the NVIDIA-hosted service available at build.nvidia.com, and also notes how to download and deploy the NIMs for local use.
Getting Started with NVIDIA NIM
To begin, navigate to build.nvidia.com to locate the models and learn how to integrate them into a Colab notebook. Once on the build.nvidia.com platform, you'll need to find the model you wish to use. For this demonstration, we will select the gpt-oss-20b model from OpenAI. Clicking on its corresponding tile will take you to the model's page.
The interface provides a chat-like environment for asking questions and observing the model's inference and reasoning. To access the model programmatically, select the 'View Code' option in the top-right corner and generate an API key, which you will need for the Colab notebook setup. With the API key from build.nvidia.com secured, open a Colab notebook and begin the implementation.
Building the Agent: A Step-by-Step Guide
Inside the notebook, we will cover how to perform inference using both the responses API and the chat completions API. Additionally, we'll construct a simple web search agent using the OpenAI agents SDK.
Step 1: Environment Setup
First, set the API key you obtained as an environment variable for secure and easy access. Next, install the OpenAI Python SDK to interact with the API programmatically.
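For example, in a Colab cell (the `!` prefix runs a shell command inside the notebook):

```python
# Install the OpenAI Python SDK in the notebook environment
!pip install openai
```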
Step 2: Client Configuration
Configuring the client is straightforward. Instantiate the OpenAI client, providing the base URL for the NVIDIA-hosted NIM API and your API key for authentication.
Here is an example of how to set up the client:

```python
import os

from openai import OpenAI

# Set the API key as an environment variable
os.environ["NVIDIA_API_KEY"] = "YOUR_API_KEY"

# Configure the client to point at the NVIDIA-hosted NIM API
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
```
It's worth noting that this implementation uses the responses API, which leverages the new Harmony response format. The NVIDIA NIM handles this integration seamlessly, so you can use the responses API as you normally would without any special configuration.
Step 3: Making API Calls
With the client configured, you can now make calls to the responses API. Simply provide the model name (`gpt-oss-20b`) and your prompt as the `input` to the `client.responses.create` method. The responses API also lets you specify a reasoning effort; for a simple query, a `"low"` effort is sufficient. While the model supports a context window of up to 128,000 tokens, you can set a lower `max_output_tokens` limit for concise answers.
For optimal performance, it is recommended to enable streaming (`stream=True`) and handle the streamed events accordingly. Upon sending the request, the model generates reasoning tokens before delivering the final response.
Here is an example API call:

```python
# Example API call with low reasoning effort and streaming enabled
response = client.responses.create(
    model="gpt-oss-20b",
    input="What are the main benefits of using NVIDIA NIM?",
    reasoning={"effort": "low"},
    max_output_tokens=1024,
    stream=True,
)

# Print the response text as it streams in
for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```
Tackling Complex Problems
For more challenging questions, such as a math problem from the AIME 25 dataset, you can set the reasoning effort to `"high"`. This prompts the model to generate a more extensive set of reasoning tokens to arrive at the correct solution, which it presents in a boxed format.
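For example (a sketch reusing the client from Step 2; the sample question is illustrative rather than an actual AIME 25 problem):

```python
# Harder question: allow the model more reasoning tokens with high effort
response = client.responses.create(
    model="gpt-oss-20b",
    input="How many ordered pairs of positive integers (a, b) satisfy a^2 + b^2 = 2025?",
    reasoning={"effort": "high"},
    stream=True,
)

for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```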
Next, we will examine how to construct a simple web search agent powered by the gpt-oss-20b NIM.
Building a Web Search Agent
Step 4: Install Dependencies
To create the web search agent, you need to install the OpenAI Agents SDK and the Tavily Python library for the search functionality. You will also need to acquire a Tavily API key and make it available to the notebook.
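For example, in a Colab cell (a minimal sketch; `openai-agents` and `tavily-python` are the PyPI package names, and the environment-variable name is a convention reused in the snippets below):

```python
# Install the OpenAI Agents SDK and the Tavily search client
!pip install openai-agents tavily-python

import os

# Key generated from your Tavily account
os.environ["TAVILY_API_KEY"] = "YOUR_TAVILY_API_KEY"
```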
Step 5: Initialize an Async Client and Search Tool
Initialize an asynchronous client, similar to the previous setup. Then, create a search tool by wrapping an async search function with the `@function_tool` decorator from the Agents SDK. This tool will use a Tavily client configured with your API key. Give the model clear instructions through the function's docstring and parameters. The function should return a newline-separated string containing the titles and content from the Tavily search results.
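The asynchronous client can be set up like this (a minimal sketch mirroring the Step 2 configuration; it reuses the `NVIDIA_API_KEY` environment variable set earlier):

```python
import os

from openai import AsyncOpenAI

# Async client pointed at the NVIDIA-hosted NIM endpoint
async_client = AsyncOpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
```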
Here is an example of the search tool:

```python
import os

from agents import function_tool
from tavily import TavilyClient

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@function_tool
async def search_web(query: str) -> str:
    """Performs a web search for the given query."""
    try:
        response = tavily_client.search(query=query)
        # Combine each result's title and content into one newline-separated string
        return "\n".join(f"{r['title']}: {r['content']}" for r in response["results"])
    except Exception as e:
        return f"Error during search: {e}"
```
Step 6: Use the Chat Completions Model for Tool Calling
Currently, the NVIDIA-hosted NIM API does not support tool calling through the responses API, so you must use the chat completions API for this purpose. The client setup is identical to the configuration described earlier, and the remainder of the implementation follows the standard OpenAI Agents SDK documentation. The agent is instructed to be a helpful assistant, providing responses tailored for enterprise executives and utilizing the `search_web` tool.
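A sketch of the agent definition, assuming the `async_client` from Step 5 and the `search_web` tool above (the agent name and instructions string are illustrative):

```python
from agents import Agent, OpenAIChatCompletionsModel

agent = Agent(
    name="Web Search Agent",
    instructions=(
        "You are a helpful assistant. Tailor your answers for enterprise "
        "executives, and use the search_web tool when you need current information."
    ),
    # Wrap the NIM endpoint as a chat completions model, which supports tool calling
    model=OpenAIChatCompletionsModel(model="gpt-oss-20b", openai_client=async_client),
    tools=[search_web],
)
```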
Step 7: Run the Agent
As a final configuration step, disable tracing to prevent trace data from being sent to OpenAI. You can then execute the agent with a specific query, such as, "Briefly describe the enterprise benefits of NVIDIA NIM, 200 words or less." The agent will process the request, call its web search tool, and generate a concise, markdown-formatted response outlining the benefits, as shown in the sketch below.
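A minimal sketch of this final step (`set_tracing_disabled` and `Runner` come from the Agents SDK; the query string matches the example above):

```python
import asyncio

from agents import Runner, set_tracing_disabled

# Keep trace data local instead of exporting it to OpenAI
set_tracing_disabled(True)

async def main() -> None:
    result = await Runner.run(
        agent,
        "Briefly describe the enterprise benefits of NVIDIA NIM, 200 words or less.",
    )
    print(result.final_output)

# In Colab, an event loop is already running, so use `await main()` instead
asyncio.run(main())
```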
This post has demonstrated how to leverage the gpt-oss-20b and gpt-oss-120b NIMs for inference and agentic tasks. To begin your own projects, visit build.nvidia.com to use the hosted API or download the NIMs for local deployment.