
Calling It an "AI Agent" Was Never the Point


Everyone is building AI agents now. Or at least, that's what they're calling them.

The word has become a badge. Slap it on a product, a GitHub repo, a job description, and suddenly everything sounds more serious. More cutting-edge. More fundable. But here's what nobody stops to ask: what makes something an "agent" in the first place?

Because when you actually look under the hood — at the real architecture, the real decision flow, the real level of autonomy — the picture gets a lot more complicated. And a lot more interesting.

The Label Problem

"Agent" has become one of those words that means everything and therefore means nothing. Frameworks brand themselves with it. LinkedIn bios claim it. Product launches announce it. And somewhere in that noise, the actual definition got lost.

Which creates a real problem — not a philosophical one, but a practical one. If you don't know what an agent actually is, you can't make a conscious decision about whether to build one. You're just reaching for the trendiest word in the room.

So let's get precise.

What Actually Makes Something an Agent

Anthropic draws a clean line between two kinds of systems. The first is a workflow — where the code decides the sequence, and the LLM fills in content at designated steps. The second is an agent — where the LLM itself dynamically decides what to do next, which tools to call, and when the task is done.

The critical word is dynamically. Not whether an LLM is involved. Not whether the output sounds intelligent. But whether the LLM has actual decision-making authority over the execution flow.

Think of it this way. A GPS that follows a pre-planned route is a workflow. A GPS that detects a road closure mid-journey, re-evaluates its options, and reroutes itself — that's an agent. Both get you somewhere. Only one is truly in control.
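To make the distinction concrete, here is a minimal Python sketch. The `llm` function is a canned stand-in for a real model call, and the prompts and return values are invented for illustration — the point is only who owns the control flow. In the workflow, the code decides every step; in the agent loop, the model's output decides what happens next and when to stop.

```python
# Hypothetical stand-in for a real model call: returns a canned "decision"
# keyed on the prompt prefix. A real system would call an actual LLM here.
def llm(prompt: str) -> str:
    return {"plan": "reroute", "draft": "Route updated."}.get(prompt.split(":")[0], "done")

# Workflow: the CODE decides the sequence; the LLM only fills in content.
def workflow(destination: str) -> str:
    route = f"fixed route to {destination}"    # step 1: always computed
    message = llm(f"draft: describe {route}")  # step 2: LLM generates text
    return message                             # step 3: always returned

# Agent: the LLM decides what happens next, looping until it says "done".
def agent(destination: str) -> list[str]:
    steps = []
    action = llm(f"plan: get to {destination}")
    while action != "done":
        steps.append(action)   # execute whatever the model chose
        action = llm("next")   # ask the model for the next move
    return steps
```

Both functions produce output; only in the second does the model hold decision-making authority over the execution path.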

It's a Spectrum, Not a Switch

Here's where it gets more interesting — and more forgiving.

Agentic behavior isn't binary. It's a spectrum. Multiple industry frameworks — from Vellum to Sema4.ai to Anthropic's own engineering docs — describe it in graduated levels, from fully deterministic pipelines at one end to fully autonomous systems at the other.

The levels that matter most in real-world production today sit somewhere in the middle:

  • L1 — The LLM is a passenger. It generates content at a fixed step, but the code drives.

  • L2 — The LLM can decide which tools to call, but the overall path remains predictable.

  • L3 — The LLM observes results, plans multi-step actions, and adapts mid-execution.

  • L4+ — True autonomy. Rare in production. Mostly still theoretical.

Most of what's being built and called "AI agents" today sits between L1 and L3. That's not a criticism — it's just where the technology and the trust currently land.

So Where Do the Five Prototypes Actually Land?

This is where it gets honest. In my previous post, I introduced five working prototypes — what I called the "boring AI" stack — each built to solve a real operational friction point. I called them all agents. Now let's see which ones actually earned that label.

Holiday Notifier and PTO Tracker are L1 — and proudly so. Both follow a fixed sequence: fetch data, compose email, send. The LLM writes the email, but it has no say in the flow. Both use LangGraph — a framework built for complex agent orchestration — without touching a single one of its agentic features. That's not a failure of imagination. A holiday notifier should always fetch holidays. Hardcoding that decision is the right call.

Download Folder Monitor sits at L2 — but just barely. The LLM exercises real judgment on each file: remove, archive, or keep. But the process itself never changes: scan, evaluate, report. An agent by framework. A near-deterministic workflow by behavior.
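A minimal sketch of that L2 shape, with a stubbed-out classifier standing in for the real LLM call (the function names and file rules here are illustrative, not the prototype's actual code): the per-file verdict belongs to the model, but the scan → evaluate → report loop itself never varies.

```python
# Stub "LLM" verdict per file; a real system would call a model here.
def classify(filename: str) -> str:
    if filename.endswith(".tmp"):
        return "remove"
    if filename.endswith(".pdf"):
        return "archive"
    return "keep"

# The outer process is fixed: scan every file, evaluate it, report.
def monitor(files: list[str]) -> dict[str, list[str]]:
    report = {"remove": [], "archive": [], "keep": []}
    for name in files:                # scan (fixed)
        verdict = classify(name)      # evaluate (the LLM's judgment)
        report[verdict].append(name)  # report (fixed)
    return report
```
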

Timesheet Assistant is the most interesting case — a split personality. Its scan mode is a rigid, structured workflow. Its interpret mode is genuinely agentic: the LLM dynamically decides which tools to call based on natural language input. Same codebase, two different levels. The most pragmatic pattern in the set.
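That split personality can be sketched like this — the tool names and the `choose_tool` stub are hypothetical stand-ins, not the assistant's real implementation. One entry point runs a rigid pipeline; the other hands the tool choice to the model.

```python
# Hypothetical tools the interpret mode can choose between.
TOOLS = {
    "log_hours": lambda arg: f"logged {arg}h",
    "show_week": lambda arg: "Mon-Fri: 8h each",
}

# Stub for the LLM's tool selection; a real system would parse model output.
def choose_tool(utterance: str) -> tuple[str, str]:
    return ("log_hours", "8") if "log" in utterance else ("show_week", "")

# Scan mode: a rigid workflow -- the sequence is always the same.
def scan_mode(entries: list[int]) -> str:
    total = sum(entries)            # fixed step 1: aggregate
    return f"week total: {total}h"  # fixed step 2: report

# Interpret mode: the LLM decides which tool runs. Same codebase, higher level.
def interpret_mode(utterance: str) -> str:
    name, arg = choose_tool(utterance)
    return TOOLS[name](arg)
```

Keeping both modes in one codebase means each request pays only for the autonomy it actually needs.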

RAG Knowledge Agent is the clearest L3. Ask it a question and it might query the knowledge base directly — or notice the results are thin, ingest a new document, then re-query. It observes, plans, adapts. The steps per turn are genuinely open-ended — and that's not a design choice, it's a necessity.
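The observe–plan–adapt loop can be sketched like this, with a toy in-memory knowledge base and a trivial "thin results" check in place of real retrieval and a real planner (all names here are illustrative): the number of steps per turn depends on what the agent observes, not on a hardcoded sequence.

```python
# Toy in-memory knowledge base; ingestion adds documents at runtime.
KB = {"onboarding": "See the new-hire checklist."}

def query(topic: str):
    return KB.get(topic)  # act: retrieve whatever we have

def ingest(topic: str) -> None:
    KB[topic] = f"stub document about {topic}"  # adapt: add a new source

# L3 loop: the step count per turn is open-ended by design.
def rag_agent(topic: str) -> str:
    result = query(topic)
    while result is None:      # observe: the results are thin
        ingest(topic)          # plan + adapt: pull in a new document
        result = query(topic)  # re-query and observe again
    return result
```

A question the knowledge base already covers resolves in one step; a question it doesn't triggers ingestion and a second pass — the loop length is decided at runtime.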

The Counterintuitive Part

Notice something about that last point. The RAG agent isn't L3 because someone decided to build something impressive. It's L3 because the problem required that level of flexibility. And the Holiday Notifier isn't L1 because someone cut corners. It's L1 because the problem didn't need anything more.

This is the reframe that changes everything: the right level of autonomy is determined by the problem, not by ambition.

An over-engineered L3 system handling a task that never changes is just a liability — more moving parts, more surface area for failure, more chances for the LLM to reason its way into the wrong answer. A well-designed L1 pipeline that reliably does its job every Monday morning, without drama, is quietly one of the most valuable things you can build.

The spectrum isn't a leaderboard. It's a design tool.

The Label Was Never What Made It Worth Building

So is what you built a real AI agent? Maybe. Maybe not. Honestly, that's the wrong question.

The right question is simpler: does it solve the problem it was built for?

Because the spectrum of agentic behavior isn't a competition. L1 isn't losing and L3 isn't winning. Each level exists because each level is the right answer to a certain kind of problem. The mistake isn't building something "too simple." The mistake is building something mismatched — an over-engineered system chasing a label, solving a problem that didn't need that much intelligence to begin with.

Build to the problem. Not to the title.
