An Honest Review of Google's Gemini CLI: Hype vs. Reality
The paradox of choice is a psychological concept suggesting that the more options we have, the less satisfied we feel with our decision. Too many choices demand more mental effort, which leads to decision fatigue and regret. But consider this: you're picking potato chips in the supermarket aisle and find a bag from a brand you genuinely like, and it's free. Not only is it free, it's also delicious. Wouldn't that short-circuit your decision-making process right there and then?
A similar phenomenon seems to be happening with CLI coders right now. Claude CLI was the first, but then OpenAI released something similar. Then came AWS, and late in the game, Google arrived with Gemini. But Google knew they were very late. So not only were they the only ones to fully open-source the tool, but they also made it pretty much free.
Is this obvious advantage a good enough answer for its popularity? Is this really why everyone is going crazy over Gemini, or are we just hearing about it all of a sudden because someone wants us to?
Pushing the Limits: A Real-World Test
To find out, we took it for a spin. The goal was not just to build a tic-tac-toe game but to push things further. We wanted to see if it's possible to go full SOA (Simulated Over-employment Automation). Can one build something that would allow a developer to work like several senior developers simultaneously? In all seriousness, while this is an amusing task, we are also going to let Gemini do some agentic work and build something more complex. Let's see if the hype is real.
As mentioned, the Gemini CLI is fully open-source. It's a CLI written in TypeScript, and it can be installed globally or run directly with `npx`, which is pretty cool. It suggests a `GEMINI.md` file for guardrails and general context, which we'll touch on later. But for now, let's build.
Building a Production-Ready Code Review Agent
The first task was to build a small application that connects to GitHub. We're after a production-ready, autonomous code review agent. A long prompt was built to customize the PR title and address common issues.
It starts with a plan, without even having to be asked, which is then approved. It then asks for the GitHub repository and branch to monitor for changes. As with other CLI utilities, it needs permissions to perform different tasks, much like Amazon Q Developer.
In order to work properly, it requires a GitHub token and an OpenAI API key. It's interesting how a Gemini model running inside the Gemini CLI eventually builds an agent that runs with OpenAI.
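The walkthrough doesn't show how the generated agent consumes those secrets, but the sane pattern is to read them from the environment and fail fast if they're absent, rather than hardcoding them. A minimal sketch, assuming the variable names `GITHUB_TOKEN` and `OPENAI_API_KEY`:

```python
import os


def load_credentials() -> dict:
    """Read the secrets the agent needs from the environment.

    Fails with a clear error up front instead of surfacing later
    as a confusing 401 from GitHub or OpenAI.
    """
    required = ["GITHUB_TOKEN", "OPENAI_API_KEY"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```

As the Auth0 section later argues, even this is only a stopgap: long-lived personal tokens in the environment beat tokens in the code, but short-lived, consented tokens beat both.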
After a lot of back and forth, a proof of concept was reached. A Python script, `main.py`, deleted its own previously non-functional branch, created a review, pushed it, and suggested a pull request. Not 100% what was asked, but a great start. A quick look through GitHub, however, showed the PR's changes were quite dull, to say the least.
This is a common frustration with models like this. When you throw more than one task at them, they never seem to comprehend or follow everything. So, for lack of a better option, the instructions were repeated.
Running again on a new branch and heading over to GitHub, the result was a good start: a long list of review items, from hard-to-read code to unhandled edge cases, missing error handling, and more.
Testing Advanced Agentic Capabilities
This is not enough, though. How about we test its agentic capabilities? An idea was picked up from a Claude subreddit, where a user had come up with a prompt to see whether the tool can actually create separate, parallel tasks. How does it handle the concept of sub-agents, like you'd expect in platforms like n8n? Would it yield partial sub-agent documents serving as logs and then combine everything into one large analysis?
While Gemini did come up with something, the results speak for themselves. The prompt goes like this:
I want five agents reviewing source files in parallel. They should focus on potential bugs, security vulnerabilities, and best practices.
- Divide up the workload evenly between sub-agents.
- Have each sub-agent write their final analysis on a separate file.
- Lastly, run a few bash commands to concatenate everything into a master analysis file, then go through the sub-agents' markdown files and empty their contents.
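That last step — concatenate the reports, then empty them — is simple enough to sketch. The prompt asked for bash, but the equivalent logic in Python looks roughly like this (the file names are assumptions for illustration):

```python
from pathlib import Path


def merge_reports(report_paths: list[str], master: str = "master_analysis.md") -> str:
    """Concatenate each sub-agent's markdown report into one master file,
    then truncate the individual reports, as the prompt instructed."""
    master_path = Path(master)
    with master_path.open("w") as out:
        for path in report_paths:
            p = Path(path)
            out.write(f"## {p.name}\n\n")   # label each section by source file
            out.write(p.read_text())
            out.write("\n\n")
            p.write_text("")                # empty the sub-agent file afterwards
    return master_path.read_text()
```

Checking the three things the prompt demands — files exist, master has content, sub-agent files end up empty — is exactly how the run below is judged.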
This gives a good indication of how the instructions are followed. Are the files there? Do they have any content? And what's the master analysis file like?
After a few seconds of thinking, it had located all the Golang files and started working on the log files, with permissions to read, write, and create. At least according to Gemini, it runs in parallel. Fifty seconds in, it hit a bunch of self-inflicted errors around non-absolute paths, which it then fixed. It was allowed to write files to new paths, and it reached the final stage of concatenating the files into one and emptying the rest.
Let's see how it went. The master analysis file came out to over 200 lines of potential issues: DB connection management suggestions, hard-coded cron schedules, some more hard-coded values, and so on. All in all, a great review. The sub-agent files were emptied as instructed, which is quite impressive.
So, not bad work from Gemini, or should we say Gemini 2.5 Pro, which applied the instructions directly to a small codebase, came up with a review, followed the tasks, and, as we've seen, pushed it to a branch.
Securing the Agentic Workflow
This brings up a huge question. For the agent to comment on a GitHub PR, it needs to access GitHub as the user. How can this be done securely? The old way is terrible. One could give it a password or maybe a personal token and then save that secret directly in the code. That's a massive security hole just waiting to be exploited. How do we make sure this agentic flow doesn't lose touch with reality and do things it shouldn't?
Instead of giving a personal agent a password or token, we're going to let the agent request access to GitHub on the user's behalf.
- First, a quick setup in the Auth0 dashboard to create a new application and configure it to connect to GitHub.
- Now that the API is available, the Gemini agent is given the Auth0 domain and client ID, but notice, not the client secret. That stays behind in the environment.
- The last piece of the puzzle is the GitHub API, which needs to be registered in Auth0 before the agent can request access to it. Under APIs, a new one is created, with `api.github.com` as the identifier to work with.
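Under the hood, what the agent kicks off is a standard OAuth 2.0 authorization request. A minimal sketch of building that Auth0 authorize URL in Python — the domain, client ID, and redirect URI below are placeholders, not values from the walkthrough:

```python
from urllib.parse import urlencode


def build_authorize_url(domain: str, client_id: str, redirect_uri: str,
                        audience: str = "api.github.com") -> str:
    """Construct the Auth0 /authorize URL the agent opens in the browser.

    Note: only the client ID appears here; the client secret never
    leaves the server side, exactly as described above.
    """
    params = {
        "response_type": "code",   # authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "audience": audience,      # the API identifier configured in Auth0
        "scope": "openid profile",
    }
    return f"https://{domain}/authorize?" + urlencode(params)
```

The browser pop-up in the next step is simply the user visiting this URL; after consent, Auth0 redirects back with a code that the agent exchanges for the short-lived token.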
Once created, Auth0 provides a set of instructions, which are applied to the code. Now, the agent is asked to start the process of commenting on that PR. Here's what happens.
The script runs, and instead of just working, it asks for permission. It pops open the browser with a standard, secure consent screen hosted by Auth0. The user logs in with a username and password created in Auth0, and the application requests access to the user's GitHub account. The user is in complete control. After clicking accept and being redirected, it's back to the terminal, where the agent can continue. It has the access it needs.
So what just happened? The agent initiated the request. The user securely proved their identity and gave consent to Auth0. And then Auth0 handed a short-lived, secure pass directly to the agent that can now use GitHub. The agent never saw the password or any permanent secret. And the best part, it will get a new, fresh token every time it needs to run. No need to manage tokens, no need to store secrets in the code, and the agent can now securely access GitHub to finish its work.
The Importance of Guardrails
Before summing things up, it's important to reiterate the concept of guardrails for general work with models on a codebase. To save both tokens and hours of anger management courses, use a markdown file. Gemini accepts a `GEMINI.md` file, which can be placed globally at `~/.gemini/GEMINI.md`. And while global instructions are great, getting specific is always better.
A prompt was used to come up with a boilerplate, telling the model it is an expert developer required to analyze the directory and create the file.
Naturally, it comes up with the very basics: an overview of where general things are and which libraries are used. This saves you from supplying the same context in every new conversation.
But it's your job to enrich it with specific instructions: where to read the schemas, how to create a DB connection, or to read the user's name instead of inventing its own, which otherwise happens every time it runs.
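For illustration, here's a sketch of what such an enriched file might contain — every detail below is a made-up example of the kind of instruction described above, not content from the walkthrough:

```markdown
# Project context

- Python service; dependencies are pinned in requirements.txt.
- Database schemas live in db/schemas/ — read them, never invent columns.
- Create DB connections only through the existing pool helper, never ad hoc.
- Take the user's name from the session object; do not generate placeholder names.
- Run the test suite before declaring any change done.
```

A few lines like these, kept next to the code they describe, are usually cheaper than re-explaining the project at the start of every conversation.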
An Honest Verdict: Hype vs. Reality
So, how do we feel about the CLI? Let's be honest here, and this doesn't have to do with one CLI or another. It doesn't even have to do with a model or a provider. These tools can do a lot. They're great at front-end if you need simple work. They're okay with back-end if you can break the tasks down and chew them up first. However, and we have to be honest here, they're terrible with DevOps and architecture.
Here's a hot take: it feels like whoever hypes these tools has either never coded a feature, let alone a project, with them, or has something to gain or lose from doing it. The value is there, it really is. But as we see it, it's not there yet. The output can't live without a thorough review, which raises the question: is it actually saving time and improving efficiency?
Gemini is a beast. There's no doubt. Unfortunately, this isn't a trend you can count on. And by the time this article goes out, there will probably be three new crazy announcements. So, take these humble opinions with a pinch of salt.