Slash AI Agent Costs by 98% with This Novel Tool Discovery Method
Instead of pre-loading AI agents with a fixed set of tools, what if they could discover and create them on their own? A recent paper introduces a novel approach called MCP-0, an alternative way of equipping AI agents with tools. The technique has shown remarkable results, including a 98% reduction in AI agent costs, significantly faster performance, and a massive drop in token usage. This article explores the approach in detail and demonstrates how you can apply it to your own agentic use cases.
The Problem with Traditional Tooling
Currently, the common way to provide tools to an AI agent is to statically list every tool it might potentially use in its instructions or prompt. For use cases with a limited number of tools, this is acceptable. However, for many complex scenarios, this method presents significant challenges.
Let's examine the two main traditional approaches and their drawbacks.
Approach A: Static Tool Lists
In this method, we simply list all available tools and their corresponding servers directly in the agent's instructions. The challenges here are numerous:
- Wasted Tokens: If you have many tools, you create a very lengthy prompt, wasting a large number of tokens just to list options the agent may never use.
- Increased Confusion: A larger prompt with more tools increases the risk of confusing the AI agent, leading to a higher chance of failure or incorrect tool selection.
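To get a feel for the cost, here is a minimal sketch of how a static tool list inflates the prompt before the agent has done anything. The tool schemas and the rough four-characters-per-token estimate are illustrative assumptions, not numbers from the paper.

# Rough illustration of prompt bloat from a static tool list.
# The tool schemas and the ~4 chars/token heuristic are assumptions for illustration.
import json

tool_definitions = [
    {"name": "search_web", "description": "Search the web for a query.",
     "parameters": {"query": "string"}},
    {"name": "get_weather", "description": "Get the current weather for a city.",
     "parameters": {"city": "string"}},
    # ... imagine hundreds more entries, one per tool the agent might ever need
]

def build_static_prompt(tools: list[dict]) -> str:
    """Concatenate every tool schema into the agent's instructions."""
    listing = "\n".join(json.dumps(tool) for tool in tools)
    return f"You have access to the following tools:\n{listing}"

prompt = build_static_prompt(tool_definitions)
print(f"~{len(prompt) // 4} tokens spent before the user asks anything")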
Approach B: Retrieval Augmented Generation (RAG)
Another method uses Retrieval Augmented Generation (RAG) to select a tool. The system converts the user's query into an embedding and performs a similarity search to find a relevant tool name or description.
The primary challenge with this approach is that relying solely on the user's query for similarity search is often not enough. A user's query can be complex and may require multiple tools that cannot be fetched accurately with a single similarity search.
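A minimal sketch of Approach B makes the limitation visible: one search over the raw query surfaces a single best match, even when the task needs several tools. The get_embedding helper and the tool descriptions below are assumptions for illustration, not the paper's code.

# Sketch of query-based RAG tool selection; get_embedding and the
# tool descriptions below are assumptions for illustration.
import numpy as np

tool_descriptions = {
    "read_file": "Read a file from disk by its path.",
    "edit_file": "Apply edits to an existing file.",
    "execute_code": "Run a code snippet and return its output.",
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_tool(user_query: str, get_embedding) -> str:
    query_vec = get_embedding(user_query)
    scores = {name: cosine(query_vec, get_embedding(desc))
              for name, desc in tool_descriptions.items()}
    # One query, one search, one tool: "debug my code" really needs
    # read, edit, and execute tools, but this returns a single match.
    return max(scores, key=scores.get)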
The MCP-0 Solution: Agent-Led Tool Discovery
The MCP-0 paper proposes a more intelligent architecture. Instead of giving the agent any tools upfront, we let the agent tell us what it needs.
The workflow is as follows:
1. The user submits a query to the AI agent.
2. The agent analyzes the query and determines what kind of tool it needs to proceed, stating this need itself. For example, it might say, "To help you, I need a tool that can fork a repository."
3. The system then performs a similarity search based on the need specified by the agent, not the user's initial query.
4. It ranks the available tools and brings the most relevant one to the agent.
5. The agent uses the tool and the process continues iteratively: the agent states its next need, the system retrieves the matching tool, and the loop continues until the job is finished.
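A minimal sketch of that loop looks like the following; the agent interface and helper names are my own assumptions rather than the paper's API.

# Sketch of the MCP-0 style loop: the agent states a need, the system
# retrieves only that tool, the tool runs, and the cycle repeats.
# llm_agent, retrieve_tool, and run_tool are assumed callables, not the paper's API.

def solve(user_query: str, llm_agent, retrieve_tool, run_tool, max_steps: int = 10):
    context = [f"User request: {user_query}"]
    for _ in range(max_steps):
        # 1. The agent either declares a tool need or returns a final answer.
        step = llm_agent(context)  # e.g. {"need": "a tool that forks a repository"} or {"answer": "..."}
        if "answer" in step:
            return step["answer"]
        # 2. Similarity search over the agent's stated need, not the user's query.
        tool = retrieve_tool(step["need"])
        # 3. Execute the retrieved tool and feed the result back to the agent.
        result = run_tool(tool, step.get("arguments", {}))
        context.append(f"Used {tool['name']}: {result}")
    return "Stopped: step limit reached"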
The paper's authors tested this approach against a repository of almost 400 MCP servers exposing thousands of public tools. A single MCP server definition can exceed 4,000 tokens, so handing hundreds of these to an agent at once can mean well over a million tokens of definitions, which is incredibly inefficient. MCP-0 avoids this by retrieving only the one or two tools the agent needs at any given moment.
A Practical Example: Debugging Code
Here’s a step-by-step example of how MCP-0 works in a real-world scenario:
- User Query: "I want you to debug my code," followed by the code snippet. No tools are provided to the agent.
- Agent's First Need: The agent responds, "I can fix that for you, but first I need to read the file." It generates a tool requirement: "I need a tool from an MCP server that lets me read a file by its file path."
- System Retrieval: The system takes this requirement, performs a similarity search across its tool database, and finds the best match, a read_file tool. It provides this tool to the agent.
- Agent's Second Need: The agent invokes the tool, reads the file, and identifies the issue. It then says, "To fix this, I need to make some file changes. I need a tool to edit a file."
- System Retrieval: The system performs another similarity search and retrieves an edit_file tool.
- Agent's Final Need: The agent edits the code and concludes, "I just want to run the changes I made to ensure it is working properly. I need a tool that lets me execute code."
- System Retrieval: A final similarity search provides an execute_code tool. The agent runs the code and confirms the bug is fixed.
Note: You don't necessarily need to run a RAG process for every single tool. You could batch the agent's needs, retrieve a few related tools at once, and present them to the agent to reduce the number of search executions.
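One simple way to do that, sketched below with find_similar_tools as an assumed retrieval helper, is to collect several stated needs and hand the agent the union of their top matches in one go.

# Batched retrieval sketch: one pass gathers tools for several stated needs.
# find_similar_tools is an assumed helper that returns ranked tool dicts.

def gather_tools(needs: list[str], find_similar_tools, per_need: int = 2) -> list[dict]:
    gathered, seen = [], set()
    for need in needs:
        for tool in find_similar_tools(need)[:per_need]:
            if tool["name"] not in seen:
                seen.add(tool["name"])
                gathered.append(tool)
    return gathered

# e.g. gather_tools(["read a file by path", "edit a file", "execute code"], find_similar_tools)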
Astonishing Performance Gains
The benchmark results are compelling. When using traditional approaches with models like GPT-4, Gemini, and Claude, the average token cost increases significantly as more tools are added.
However, with an agent built using MCP-0, the token cost remains flat even with thousands of available tools, because the agent's context only ever contains the one or two tools it is actively using. The study demonstrated up to a 98% reduction in average token usage. This translates to:
- A 98% potential cost reduction.
- Shorter, more concise prompts.
- A more accurate and less confused agent.
- A significantly faster agent.
Putting MCP-0 to the Test: A Demo
To test this approach, I implemented two versions of an AI agent using Google's ADK framework.
The Old Way: Hardcoded Tools
The first implementation uses the traditional approach. A number of different tools for web searching, news analysis, and more are defined as Python functions and listed for the agent.
# 1.py - Traditional Approach
from google.adk.agents import Agent  # Google ADK agent class

# Dummy functions to mimic tools
def search_web(query: str) -> str:
    return "Static web search result"

def search_news(query: str) -> str:
    return "Static news result"

def analyze_page(url: str) -> str:
    return "Static page analysis"

# ... many more tool functions

# List of all tools provided to the agent
all_tools = [search_web, search_news, analyze_page]  # ... plus every other tool function

# Create agent with the full tool list
agent = Agent(
    name="traditional_agent",
    model="gemini-2.0-flash",
    tools=all_tools,
    instruction="You have access to these tools...",
)
The New Way: Dynamic Tool Retrieval
The second implementation uses the MCP-0 approach. The key difference is a function that embeds the tool requirement stated by the agent and performs a cosine similarity search against a database of tool embeddings.
# mcp_zero.py - MCP-0 Approach
from google.adk.agents import Agent  # Google ADK agent class

def get_embedding(text: str) -> list[float]:
    # ... returns the embedding vector for the text
    ...

def find_similar_tools(agent_tool_request: str) -> list:
    request_embedding = get_embedding(agent_tool_request)
    # ... performs a cosine similarity search against the tool database
    # ... returns the top 5 matching tools
    ...

# In the main agent loop:
agent_need = agent.ask("What tool do you need?")  # pseudo-call: the agent states its requirement
retrieved_tools = find_similar_tools(agent_need)

# Create agent dynamically with only the relevant tools
dynamic_agent = Agent(
    name="mcp_zero_agent",
    model="gemini-2.0-flash",
    tools=retrieved_tools,
    instruction="Here are the tools you requested...",
)
The Tool Universe: The JSON Data
The similarity search is performed on a JSON file containing definitions for over 300 MCP servers and thousands of associated tools, sourced from the official MCP GitHub repository.
[
  {
    "name": "AgentQL-MCP",
    "description": "AgentQL is a query language for web agents...",
    "url": "https://mcp.agentql.com",
    "summary": "This MCP provides tools for web automation and data extraction using AgentQL.",
    "tools": [
      {
        "name": "agentql.start_session",
        "description": "Starts a new web browsing session..."
      }
    ],
    "embeddings": [ ... ]
  }
]
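To make the stubs in mcp_zero.py concrete, here is one possible implementation of the retrieval side over a file like this. It assumes the embeddings field stores a single vector per server summary, that the file is named mcp_servers.json, and that a sentence-transformers model matching the stored vectors is available; all of these are my assumptions, not details from the paper or the repository.

# Hedged sketch: similarity search over the MCP server JSON shown above.
# Assumes "embeddings" holds one vector per server and that the agent's
# request is embedded with the same model that produced the stored vectors.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

EMBED_MODEL = SentenceTransformer("all-MiniLM-L6-v2")  # assumed; must match stored embeddings

with open("mcp_servers.json") as f:  # assumed file name
    servers = json.load(f)

def get_embedding(text: str) -> np.ndarray:
    return EMBED_MODEL.encode(text)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_similar_tools(agent_tool_request: str, top_k: int = 5) -> list[dict]:
    request_vec = get_embedding(agent_tool_request)
    scored = []
    for server in servers:
        score = cosine_similarity(request_vec, np.array(server["embeddings"]))
        for tool in server["tools"]:
            scored.append((score, {"server": server["name"], **tool}))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]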
Side-by-Side Comparison
Here are the results from running both approaches with a couple of test queries.
Query 1: "Search for information about Python programming"
Traditional Approach Results:
- Loaded all 17 available tools, including irrelevant ones like get_weather and get_github_issue.
- Estimated context tokens: 850.
- The agent reported confusion: "There are multiple search tools that have overlapping purposes... There are too many similar options I need to choose."
- Execution time: ~3 seconds.
MCP-0 Approach Results:
- Step 1 (Agent Need): The agent stated, "I need a tool to search the web for information about Python."
- Step 2 (Tool Retrieval): A similarity search returned the top 5 most relevant tools: web_search, documentation_search, github_search, etc. It selected the best match, web_search.
- Step 3 (Execution): The agent used the selected tool to get the result.
- Execution time: ~1.9 seconds. The context was much smaller, and the agent was not confused.
Query 2: "What's the weather like in San Francisco?"
Traditional Approach Results:
- Again, it listed all 17 tools, wasting context on irrelevant options.
MCP-0 Approach Results:
- Step 1 (Agent Need): The agent identified the need for a weather tool.
- Step 2 (Tool Retrieval): The similarity search correctly identified five highly relevant tools: get_current_weather, weather_forecast, get_temperature, etc., and selected the best one.
- Step 3 (Execution): The agent responded with the weather information.
- Execution time: ~1 second.
Conclusion
The MCP-0 approach is a creative and powerful solution for agentic use cases that involve a large number of tools. By allowing the agent to lead the tool discovery process, you can significantly improve precision, reduce costs, and build faster, more reliable AI agents. While it may not be the perfect fit for every scenario, it is a valuable technique worth considering for your next project.