How to Clone Any Website with an AI Agent and Firecrawl
So I cloned the Notion landing page, but I didn't just tell an AI agent to generate a copy. Instead, I gave it the actual link to the website, and it extracted all the metadata from Notion's landing page. Using that metadata, it created a one-to-one clone. It even pulled in the exact same images from the original site.
In this article, I'll show you how to set up the MCP server I used to scrape the website and provide my agent in Cursor with the data it needs to clone it accurately. And if you're using Windsurf instead of Cursor, I'll also cover setting it up there. The process is nearly identical; you'll just need an API key, and I'll show you exactly where to get it.
The Firecrawl MCP Server
Let's start with the GitHub repo for the MCP server, because it's a pretty powerful tool. First, I'll guide you through the installation process and share some background on the tool. Then, I'll show you how to use it and how I used it to clone the Notion landing page.
If you scroll down the Firecrawl MCP server repo, you'll see that they've provided a very detailed installation guide. Everything is clearly explained, and I didn't find anything confusing.
Installation and Configuration
You can see the `npx` command, and this is the command you'll be using with Cursor. You'll also need to enter your API key, and I'll show you how to obtain that in a moment.
```bash
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```
The rate limits are actually quite generous, and you can easily integrate it with Cursor. The same goes for Windsurf. You can also install it locally and modify the source if needed, but either way, you'll still need the API key.
They've also provided the configuration steps for Windsurf, including the exact file path where the configuration needs to go. You can copy and paste the provided configuration into that file, insert your API key, and you'll be good to go.
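For reference, the Windsurf configuration is a small JSON entry. The shape below follows the repo's README at the time of writing, so treat it as a sketch and check the repo for the current format:

```json
{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}
```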
Further down, they've included a detailed explanation of the various tools the MCP server offers. However, the two main ones are the `scrape` tool and the `extract` tool.
Understanding the Tools
Here's the main difference between the `scrape` tool and the `extract` tool:
- Scrape Tool: This tool gathers all the raw data from the website. This includes HTML tags, section headers, and of course, the website's content, but in a raw format. Essentially, it captures both the structure and the content of the site, making it the best choice for our use case.
- Extract Tool: On the other hand, the `extract` tool focuses specifically on the content itself. For example, if you already have a website and you want to populate it with content from another site, you can use this tool. It extracts all the content, feeds it to the LLM, and then returns data in a structured format that you can use to populate your website.
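To make the difference concrete, here's roughly what a call to the `scrape` tool looks like from the agent's side. The parameter names follow the Firecrawl docs, but the URL and values here are illustrative:

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://www.notion.com",
    "formats": ["markdown", "html"],
    "onlyMainContent": false
  }
}
```

The `extract` tool, by contrast, takes a list of URLs plus a prompt and a JSON schema describing the structured output you want back.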
Getting Your API Key
Now, let's get the API key for the MCP server. Go to the Firecrawl website and sign in. This will take you to the dashboard. As you can see, I already have two pages scraped, and they're logged in my usage history.
Your API key is displayed right on the dashboard. You can simply copy it.
Note: Make sure to remove the `fc-` prefix from the key, as it is already included in the command. If you repeat it, the key will be malformed and the MCP server will reject the request with a 401 error.
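To make that concrete, here's a sketch with a made-up key, `fc-abc123` (real keys are longer):

```bash
# Key shown on the dashboard: fc-abc123
# The command template already contains the fc- prefix, so paste only the part after it:
env FIRECRAWL_API_KEY=fc-abc123 npx -y firecrawl-mcp      # correct
env FIRECRAWL_API_KEY=fc-fc-abc123 npx -y firecrawl-mcp   # wrong: doubled prefix -> 401
```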
Now, the `extract` tool has its own separate token allowance, which is also quite generous. That said, Firecrawl is ultimately a paid service: you can scrape up to 500 pages on the free tier before needing to upgrade to a paid plan. If you plan to use this regularly in your workflow, you'll eventually need to upgrade. But honestly, it's worth it. This tool is incredibly powerful; it extracts websites, gives you the data, and then you can do whatever you need with it, including asking your AI agent to generate full websites for you.
Integrating with Cursor
Once you have your API key, head over to Cursor. In the Cursor settings, under the MCP section, you'll need to add the MCP server. Once added, all the tools will become available to you.
Here is the configuration:
1. Name: This is just a nickname for the MCP server.
2. Type: Set this to `command`, since we're running the `npx` command.
3. Command: In the command field, paste the full `npx` command from above, with your API key inserted.
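If your Cursor version configures MCP servers through a JSON file (e.g. `.cursor/mcp.json`) instead of these form fields, the equivalent entry, assuming the same format as the repo's README, looks roughly like this:

```json
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}
```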
Once you've added it correctly (again, making sure not to repeat the `fc-` prefix), go ahead and save the settings. If everything is set up properly, the tool will become available, and the indicator light will turn green. If it doesn't, try restarting Cursor, double-checking your API key, or simply pressing the refresh button. One of these should resolve the issue. And just like that, you'll have the MCP server integrated into Cursor.
If you prefer, you can also set it up in Windsurf. The process is similar; you just need to paste the config into the specified file, and you'll have it up and running.
Cloning the Notion Landing Page in Action
All right, so I'm here in Cursor. You can see that I've already set up the MCP server that we'll be using. On this side, I have my Next.js app; it's a completely blank app, nothing inside it, just the standard template. And over here, I have the prompt that I'll be using to ask it to clone the Notion landing page using the Firecrawl MCP server.
Prompt: "Clone the Notion landing page using the Firecrawl mCP server."
Now, I have to name the Firecrawl MCP server explicitly, because if I don't, the agent will attempt to create the clone from its own knowledge instead of calling the MCP tool for the website data. Also, you need to be in agent mode for this to work properly. You can choose any model you like; I've just chosen the Claude 3.7 Sonnet model.
So, let's go ahead and give it the prompt.
You can see it's now calling the MCP server, specifically the Firecrawl `scrape` tool. It automatically detected the Notion webpage link and is going to use that. Now, let's go ahead and run the tool.
It looks like it needs a longer timeout, probably because the website, or the metadata it needs to extract, is larger than usual. So, it's making another call to the MCP tool. Let's go ahead and run it again.
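I can't see exactly what Cursor sends on the retry, but presumably it just bumps the scrape tool's `timeout` parameter (which the Firecrawl docs list in milliseconds), something like:

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://www.notion.com",
    "formats": ["markdown"],
    "timeout": 60000
  }
}
```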
All right, it has started the generation, which means it successfully extracted the metadata. If you open it up, you'll see all the extracted data right there. Now, it's generating the website based on that metadata. So, let's see what it comes up with.
At this point, it's requesting permission to create the `components` directory to store the various components it has identified from the metadata. I also want to point out that it's currently creating the `Features` component as well as the `Hero` component, and if you scroll down, you'll see it's generating the `AISection` component as well. Now, all of this is happening because it understands the website structure, including metadata and even the image links. So, it's able to generate a near-identical clone. Pretty cool.
Based on that metadata, it's also creating other components, such as the `ProductCard` component.
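For a sense of what these generated files look like, here's a minimal sketch of a `Hero` component in the style the agent produces. The headline, copy, and image path are stand-ins rather than the actual generated code, and the Tailwind classes assume the default Next.js template setup:

```tsx
// components/Hero.tsx -- illustrative sketch, not the agent's actual output
export default function Hero() {
  return (
    <section className="flex flex-col items-center gap-6 px-6 py-20 text-center">
      <h1 className="text-5xl font-bold">The happier workspace</h1>
      <p className="max-w-xl text-lg text-gray-600">
        Write. Plan. Collaborate. With a little help from AI.
      </p>
      <a
        href="#"
        className="rounded-md bg-blue-600 px-5 py-3 font-medium text-white"
      >
        Get Notion free
      </a>
      {/* The image URL scraped from the original site would go here */}
      <img
        src="/hero-placeholder.png"
        alt="Product screenshot"
        className="mt-8 max-w-3xl"
      />
    </section>
  );
}
```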
Now that all the components have been created, it's asking us to navigate into the directory and run the front-end server. So, let's go ahead and execute the command.
```bash
npm run dev
```
Everything has been cloned successfully. Now, let's open the provided link in our browser and see what it actually created.
The Result
All right, so this is the landing page that it cloned and returned, and I have to say, it's almost a perfect one-to-one clone with all the sections intact. However, it seems the styling wasn't fully captured in the metadata, because the clone defaulted to dark mode. I'm not sure why; maybe an error on the AI's part. Because of that, some elements look a bit off. For example, the logo and the icons further down the page don't appear quite right.
But aside from that, it's extremely accurate. It even pulled in all the same images from the Notion landing page. These are the exact images from the original site, meaning the MCP server successfully crawled through, extracted the data, and applied it to our website. It's a really powerful and impressive tool.
That said, there are a few issues, like images being cut off and the icons not contrasting well with the background. Some icons are even missing entirely.
Now, I was using Cursor for this, but I probably should have used Windsurf because of its new click-to-select-and-edit feature, which is amazing. However, I hadn't set up the MCP integration in Windsurf yet, which is why I went with Cursor instead.
Overall, though, this is an incredibly powerful MCP integration and a fantastic tool for front-end development; it simplifies website cloning like never before.