AI Agents Explained: Build Your Own in Minutes
In this article, I'll walk you through the simplest way to understand what an AI agent is, its different components, how it takes action, and then I'll show you how to build a simple one in just a few minutes. There's a lot of information out there about AI agents, so I'm going to break it down as simply as possible. We'll dive into n8n, and I'll guide you step-by-step to get an email agent up and running. You might be surprised by how quickly you can get one running.
Understanding the Core Components of an AI Agent
An AI agent can be visualized with a diagram showing its different aspects. The core of this setup is the AI agent itself. Think of it as an entity, like a digital employee, that can understand instructions and has a 'brain'.
When we talk about an AI agent, there are several key components:
- The Brain: This is a large language model (LLM) like GPT-4, Claude 3.5, or similar. It's the core processing unit that understands language and makes decisions.
- Memory: Memory provides the agent with the context of the ongoing conversation, making it feel more human-like and conversational. Instead of treating each interaction as a new one, the agent maintains a running history of the conversation, allowing its future responses and actions to be more relevant.
- Instructions (System Prompt): These instructions define the agent's role, how it should act, and what tools it has available. This is commonly known as a system prompt or system message. This is distinct from user messages, which are the actual inputs from the user during an interaction. The user message is the dynamic input for each interaction, while the system message provides a constant set of instructions.
- Tools: This is where the magic really happens. Tools give the large language model the ability to take real-world actions. By adding multiple tools, the agent can use its brain and instructions to analyze the user's message and select the appropriate tool for the task.
- Output: Finally, the agent produces an output. This could be a confirmation that an action was taken or information retrieved using a tool.
This basic structure—input, agent (with brain and memory), and output—is similar to how a standard chatbot like ChatGPT works. The real power emerges when you introduce tools.
The agent we'll build today follows this model. The process starts with an input (a chat message). The agent, using its brain (a chat model) and memory, processes the user's request. It then uses a tool—in this case, a 'send email' function—to execute the task. Finally, it provides an output confirming the action, such as "The email was sent."
Building a Simple Email Agent with n8n
Let's walk through how to build this simple email agent in n8n.
Step 1: Setting Up the Agent and Its Brain
First, in n8n, add a new 'AI Agent' node. This will automatically connect to a 'Chat Message Received' trigger, which serves as our input. Initially, the agent has no 'brain,' so you'll need to add one. Under the agent's options, add a Chat Model. From the list of LLMs, select an 'OpenAI Chat Model'.
You'll need to connect your OpenAI account by creating a new credential and providing an API key. You can generate one from your OpenAI account dashboard under 'API keys'. Paste the key into n8n and save it. The connection should turn green, indicating success.
Note: Ensure your OpenAI account has sufficient credits to avoid 'insufficient funds' errors.
Now, you can test the agent by sending a 'hello' message. With its brain connected, it will respond.
Step 2: Adding Memory
With the brain functioning, the next step is to add memory. Without memory, if you tell the agent your name and then ask "What's my name?", it won't remember. It has no context from previous messages.
To fix this, add a Memory node to the agent. The 'Window Buffer Memory' is a simple and effective option for this use case. This node can be configured to remember a specific number of past interactions (e.g., the last five messages). After connecting the memory, you can repeat the test. Tell the agent your name, and when you ask again, it will remember.
Step 3: Adding a Tool to Send Emails
Now it's time to add a tool. This will allow the agent to perform actions. Add a Gmail tool to the agent. You'll need to connect your Google account credentials. n8n provides documentation to guide you through the OAuth setup.
To configure the tool, you need to specify the To, Subject, and Message fields for the email. To make these fields dynamic, use an Expression. n8n has a powerful function that allows the AI to extract information from the user's query. By setting the 'To' field to an expression like ``, you instruct the AI to find the recipient's email address in the user's message.
Apply the same logic for the subject and message body using different keys, such as subject
and email_body
. Name this tool node 'send_email' for clarity. This helps the agent identify what the tool does.
Step 4: Configuring the System Message
The final core component is the system message (or prompt). This provides the agent with its instructions. A good system message defines the agent's role, personality, and, most importantly, what tools it has and when to use them.
You can use an LLM like ChatGPT to help generate a detailed system prompt. Provide it with the context of your agent and the tools it has. A well-structured prompt, often in Markdown, might include sections for an overview, context, instructions, a list of tools and their functions, and examples. Paste the generated prompt into the 'System Message' field in the n8n agent node.
Step 5: Testing the Complete Agent
Now, it's time to test the fully assembled agent. Give it a prompt like: "Can you send an email to [email protected] asking him how his day was?"
The agent will process the input, use its brain and system message to understand the task, and trigger the email tool. By inspecting the tool's execution, you can see that it correctly extracted the recipient, subject, and body from the query.
You might notice the email has a generic sign-off. You can refine the system prompt by adding a final instruction, such as "Always sign off the emails from Frank," to make the output more consistent. Checking your Gmail, you'll find the email sent by the agent.
The agent's logs provide a step-by-step breakdown of its thought process: it receives the message, stores it in memory, consults the chat model and prompt, identifies the correct tool ('send_email'), extracts the necessary parameters, executes the tool, and then formulates a response to the user.
Enhancing the Agent with a Contact Database
To make the agent even more powerful, you can connect it to a contact database so you don't have to type the full email address every time. For this example, we'll use a simple Google Sheet as a contact database, but this could be any CRM, Airtable, or other data source.
Add a new Google Sheets tool to the agent. Configure it to read from your contact spreadsheet and name the tool 'contact_database'.
Of course, the agent doesn't know about this new tool yet. You need to update the system prompt. You can go back to your LLM and ask it to refine the previous prompt, adding instructions for the new 'contact_database' tool. The new instructions should tell the agent to use this tool to look up contact information like email addresses. Update the system message in n8n with this new, more detailed prompt.
Testing the Multi-Tool Agent
Now, you can test the agent with a query that requires multiple tools, like: "Can you send an email to Phil letting him know that I will not be at work today?"
The agent's logs will show a more complex chain of thought: it first queries the contact database to find Phil's email, then uses that information to execute the 'send_email' tool. The result is a correctly sent email, even though you only provided a name.
Conclusion and Next Steps
While this was a simple build, it demonstrates how quickly and easily you can create powerful AI agents. Features like n8n's fromAI
function make it straightforward to add more tools and expand your agent's capabilities.
The possibilities don't stop here. You can integrate various inputs and outputs. Instead of the built-in chat, you could connect the agent to Slack, Telegram, or even a phone number using services for voice. This allows you to build a complete backend application that you can interact with from any interface.