Six primitives behind autonomous execution: what an AI agent actually needs to run a landscape business

A landscape company foreman asks his phone to draft a proposal for a mulch top-up at a property he's standing on. The phone thinks for two seconds and sends back something that looks reasonable. The customer name is right. The property address is right. The proposal even sounds like the company's voice.

Look closer. The mulch price is wrong by $0.12 per square foot. The proposal quotes hardwood when this customer specifically asked for pine fines last fall. And there's no line item for the irrigation flag the same crew lead photographed two weeks ago, which the office has been waiting on a proposal for.

That's not a useless AI. It's a dangerous one. It's just useful enough to send. And operators are going to ship it because the alternative is typing the proposal themselves.

This post is about why some "AI agents" are safe to act and others aren't. Same word. Very different machine underneath.

TL;DR

The word "agent" has become a marketing label. Everyone uses it. Few platforms can deliver what it actually means.
A real AI agent that can act on landscape operations needs six things stitched together: context, agents, skills, memory, evals, self-learning.
Most vendors ship one or two of those well, call the thing an agent, and let the buyer find out the hard way.
If you're being sold an "AI agent" by anyone, this is the unbundling that tells you whether it's going to draft good proposals or expensive ones.
HeyFlora's platform is organized around all six primitives. They aren't optional features. They are the architecture.

"AI agent" has become a word doing too much work

Read any landscape software website right now. Half of them have shipped "agents." Almost none of them have shipped the architecture underneath that makes an agent worth trusting with real money.

Here's the test. An "agent" that does any of the following is doing real work:

Drafts a renewal proposal with the right pricing and the right tone, given that customer's history
Books an irrigation service call when it sees a brown patch in a snap photo
Reorganizes tomorrow's routes when a heat advisory crosses three properties
Catches a margin slip on a job before it gets invoiced

An "agent" that answers your question and produces a paragraph back is a chat assistant. Useful, but not the same thing.

The difference is whether the system can act, and whether you can trust what it's doing. That trust is the architecture.

The six primitives

The six pieces below are how HeyFlora is built, and (in our view) how any platform claiming to ship landscape AI agents in 2026 has to be built. We'll walk through each one. After the walk-through, there's a short test you can use to evaluate any vendor pitch.

1. Context: the situation the agent is acting inside

Context is what the agent knows before it does anything. Not the question you asked. Everything around it.

For a landscape business that includes:

Every property the company services, with all its history
The pricing book and labor rates
The crew leads and their certifications
The SOPs and standards
The integrations: Aspire, NetSuite, Fleetio, ADP, email, weather
The last twelve weeks of field signals: snaps, huddles, site walks

An agent without context is just a chatbot pulling from public web data. The output will sound plausible. It will not be right. The proposal will quote rates from somewhere on the internet, not from your pricing book. The schedule shift will not know which crew lead has a Class A license.

Most vendors connect to one or two systems. That's a partial agent at best. (This is the same connection problem we walked through in Post 1, one layer up.)
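
For readers who think in code, here is a minimal sketch of that idea: a context bundle that knows which sources it is missing, and a price lookup that refuses to quote anything not in the company's pricing book. The source names, the `ContextBundle` class, and the `unit_price` helper are all hypothetical, chosen for illustration rather than taken from HeyFlora's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical source names, loosely mirroring the list above.
REQUIRED_SOURCES = (
    "properties",
    "pricing_book",
    "crew_certifications",
    "sops",
    "integrations",
    "field_signals",
)


@dataclass
class ContextBundle:
    """Everything the agent can see before it acts (illustrative only)."""

    sources: dict = field(default_factory=dict)

    def missing(self) -> list:
        """Return the required sources that are absent or empty."""
        return [name for name in REQUIRED_SOURCES if not self.sources.get(name)]


def unit_price(bundle: ContextBundle, item: str) -> float:
    """Look up a price in the company's pricing book; never fall back to a guess."""
    book = bundle.sources.get("pricing_book") or {}
    if item not in book:
        raise LookupError(f"{item!r} is not in the pricing book; refusing to quote")
    return book[item]
```

The design choice worth noticing is the refusal: when the pricing book has no entry, the sketch raises an error instead of inventing a number. A bundle with only a pricing book loaded would report the other five sources as missing, which is roughly what "connected to one or two systems" looks like from the inside.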

2. Agents: specialized workers, not one big model

An agent in the architectural sense isn't a single AI talking. It's a specialized AI trained around one department, one role, one workflow.

At HeyFlora we ship distinct agents per function:

Flora Executive: strategy, market intelligence, executive briefs
Flora Sales: proposals, leads, pipeline, upsell scouts
Flora Field: crew support, field intelligence, quality
Flora Ops: scheduling, routing, equipment, resource optimization
Flora Finance: budgets, P&L flash, margin alerts
Flora Admin: docs, automations, follow-ups
Plus Design, Water, Care, Marketing, Procurement, Safety

Why this matters: a general-purpose AI is bad at all of these because it has no boundary. Asked to draft a proposal, it doesn't know whether it should pull from the pricing book first or the customer history. Specialized agents have those defaults wired in.

The opposite of this is "one chatbot does everything." That's a fine starting point. It is not where the category lives in 2026.
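
One way to picture "defaults wired in" is a small routing table: each task has an owning agent, and each agent has an ordered list of sources it consults first. This is a sketch under assumptions. The task names, the agent keys, and especially the source ordering are invented for illustration and do not describe how HeyFlora routes work internally.

```python
# Hypothetical defaults: which sources each specialized agent consults, in order.
AGENT_DEFAULTS = {
    "sales": ["customer_history", "pricing_book", "field_signals"],
    "ops": ["schedule", "crew_certifications", "weather"],
    "finance": ["job_costs", "budgets"],
}

# Hypothetical mapping from a task type to the agent that owns it.
TASK_OWNER = {
    "draft_proposal": "sales",
    "reroute_crews": "ops",
    "margin_alert": "finance",
}


def route(task: str) -> tuple:
    """Return (agent, ordered sources) for a task, or raise if no agent owns it."""
    if task not in TASK_OWNER:
        raise ValueError(f"no specialized agent owns task {task!r}")
    agent = TASK_OWNER[task]
    return agent, AGENT_DEFAULTS[agent]
```

The boundary is the point. A task with no owner fails loudly rather than being handed to a generalist that has to guess where to look first.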

3. Skills: the reusable actions an agent can take

Skills are the verbs. They're how agents do things, not just say things.

A skill is reusable. The same "draft a service proposal" skill should work for any property, any palette, any month. The skill knows how to:

Pull the customer history
Apply the right pricing book
Generate the bilingual version
Save the artifact in the right folder
Send for approval through the right workflow

Skills compound. Once you have "draft a proposal" working well, the same skill plugs into NightShift to auto-draft enhancement proposals overnight, into Vibescaping to draft the design narrative, and into the Huddle Agent to draft a follow-up after a crew huddle that flagged an opportunity. The work doesn't get re-built per surface.
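
The five steps listed above can be sketched as one reusable function that any surface calls. Everything here is a stand-in: the `Draft` record, the step functions, and the surface names passed in are hypothetical, and each step is a stub that only records that it ran.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    property_id: str
    surface: str  # which product surface invoked the skill
    steps_done: list = field(default_factory=list)
    status: str = "draft"


# Each step is a stub standing in for real work; names mirror the list above.
def pull_customer_history(draft: Draft) -> None:
    draft.steps_done.append("history")


def apply_pricing_book(draft: Draft) -> None:
    draft.steps_done.append("pricing")


def generate_bilingual_version(draft: Draft) -> None:
    draft.steps_done.append("bilingual")


def save_artifact(draft: Draft) -> None:
    draft.steps_done.append("saved")


def send_for_approval(draft: Draft) -> None:
    draft.steps_done.append("approval")
    draft.status = "pending_approval"


PROPOSAL_STEPS = [
    pull_customer_history,
    apply_pricing_book,
    generate_bilingual_version,
    save_artifact,
    send_for_approval,
]


def draft_service_proposal(property_id: str, surface: str) -> Draft:
    """Run the same ordered steps no matter which surface asked for the draft."""
    draft = Draft(property_id, surface)
    for step in PROPOSAL_STEPS:
        step(draft)
    return draft
```

Calling `draft_service_proposal("P-1", "nightshift")` and `draft_service_proposal("P-1", "huddle")` runs the identical step list. Only the `surface` label differs, which is what "not re-built per surface" means in practice.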

Vendors who ship "agents" without a skills layer tend to ship one workflow at a time. Every new use case is a new product. That's a maintenance pattern that doesn't scale.

4. Memory: the part that makes the second answer better than the first

This is the primitive most vendors skip, and the one that matters most over time.

Memory means the system remembers what happened. Not just chat history. Property history, customer preferences, recurring issues, past decisions, what the operator corrected last time. The longer it runs, the more it knows.

In practical terms: if a crew lead tagged a property in September as "irrigation issue on the south bed," and a snap in November shows the same brown patch, the agent should know the two are related. The proposal it drafts in December should reference the September flag.
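
A minimal sketch of that linkage, assuming signals are stored with a property ID and a zone: when a new signal arrives, look up earlier signals from the same property and zone. The `Signal` record and `related_history` function are hypothetical, and matching on exact zone text is a deliberate simplification. A real system would need fuzzier matching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    property_id: str
    zone: str       # e.g. "south bed"
    issue: str      # free-text label from the crew or the snap
    seen_on: date
    source: str     # e.g. "crew tag" or "snap"


def related_history(memory: list, new: Signal) -> list:
    """Earlier signals on the same property and zone, oldest first."""
    matches = [
        s for s in memory
        if s.property_id == new.property_id
        and s.zone == new.zone
        and s.seen_on < new.seen_on
    ]
    return sorted(matches, key=lambda s: s.seen_on)
```

With a September crew tag for "irrigation issue" stored against a property's south bed, a November snap of a brown patch in the same bed would return that tag as related history, even though the two labels differ. That is the record a December proposal would cite.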

Without memory, every interaction is the first one. The agent will ask the same questions, make the same beginner mistakes, and never get smarter from the operator who corrected it last week.

Memory is also what makes evaluations possible (next primitive). You can't measure whether the agent is improving if you don't remember what it did before.

5. Evals: how you know it's working

Evaluation is the accountability layer. It's what separates "we shipped an agent" from "we shipped an agent that's measurably right 99.99% of the time."

Evals measure accuracy on real outputs:

Did the proposal use the right pricing book?
Did the route optimization respect the crew lead's certifications?
Did the renewal warning match the customer's actual history?
Did the heat advisory route apply only to turf properties?

Evals run automatically. Every output is checked against company standards before it's surfaced to the operator. The ones that fail get flagged for review, not sent.
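
Here is a small sketch of such a gate, assuming each draft is a plain dictionary and each check is a function that returns true or false. The two checks, the field names, and the prices in the usage note are invented for illustration. They echo the opening example (a wrong unit price and a missing irrigation line item) rather than any real eval suite.

```python
def uses_pricing_book_price(output: dict) -> bool:
    """The quoted unit price must equal the pricing book's price."""
    return output["unit_price"] == output["pricing_book_price"]


def covers_open_flags(output: dict) -> bool:
    """Every open field flag must appear as a line item."""
    return set(output["open_flags"]) <= set(output["line_items"])


# Hypothetical check registry; a real one would be far longer.
CHECKS = {
    "pricing_book": uses_pricing_book_price,
    "open_flags": covers_open_flags,
}


def gate(output: dict, checks: dict = CHECKS) -> tuple:
    """Return ("surface", []) if all checks pass, else ("flag_for_review", failed)."""
    failed = [name for name, check in checks.items() if not check(output)]
    decision = "surface" if not failed else "flag_for_review"
    return decision, failed
```

A draft quoting 0.97 against a pricing book price of 0.85, with an open "irrigation repair" flag and no matching line item, would come back as `("flag_for_review", ["pricing_book", "open_flags"])`. The operator sees the reasons, and the customer never sees the draft.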

This is the difference between "draft a proposal" and "draft a proposal you can confidently put your customer's name on." One has a measurement layer. The other is hoping.

If a vendor can't tell you how they evaluate their AI's outputs, they're either being polite about not knowing, or there are no evals. Either way, you're the eval. Don't be the eval.

6. Self-learning: the compounding edge

The last primitive is the one that makes the platform get better over time without anyone retraining a model.

Every snap, huddle, ticket, correction, and completed workflow goes back into the system as a signal. The corrections especially. When an operator changes the price in a draft proposal, the system learns. When an account manager marks a renewal warning as a false positive, the system learns. When a crew lead overrides a route optimization, the system learns.

This is what separates a platform that ships better next quarter from a platform that ships the same thing forever. The agents in a healthy intelligence layer should be measurably better six months from now than they are today, on the same property, with the same operator.

Self-learning isn't magic. It's the loop. Capture the output, capture the correction, feed it back into context, memory, and evals.
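
The loop can be sketched in a few lines, with the caveat that this is the most naive possible version: it records each operator correction and replays the latest one onto the next draft for the same property. The `CorrectionLog` class and `next_draft` function are hypothetical. A production system would generalize from corrections rather than copy them back verbatim.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionLog:
    """Stores operator corrections keyed by (property_id, field_name)."""

    latest: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def record(self, property_id: str, field_name: str, drafted, corrected) -> None:
        """Capture both what the agent drafted and what the operator changed it to."""
        self.history.append((property_id, field_name, drafted, corrected))
        self.latest[(property_id, field_name)] = corrected


def next_draft(base: dict, property_id: str, log: CorrectionLog) -> dict:
    """Apply the most recent corrections for this property on top of a base draft."""
    draft = dict(base)  # copy so the base draft is not mutated
    for (pid, field_name), value in log.latest.items():
        if pid == property_id:
            draft[field_name] = value
    return draft
```

Keeping the full `history` alongside `latest` matters: the history is what evals and memory would read to measure whether drafts for a property are getting corrected less often over time.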

How to evaluate an "agent" you're being sold

When a landscape software vendor pitches you "AI agents" in the next twelve months, this is the test. Ask these questions in this order:

Which of the following can the agent see? Aspire, NetSuite, Fleetio, ADP, your email, your shared drive, your field snaps. If the answer is "we integrate with one of those," it's not an agent in the architectural sense. It's an integration with a chat interface.
Is it one general AI, or specialized agents per function? If you're being sold "one assistant for everything," you'll hit the boundary fast on real work.
What can it actually do, not just say? Ask for the skills list. Draft a proposal, queue a schedule shift, generate a service report, escalate a renewal. If the only verb is "answer questions," it's a chat product.
Does it remember what happened last time? Ask the vendor to show you a property whose history fed into an output last quarter. If they can't, the memory primitive is missing.
How is it evaluated? Ask for the eval framework. Accuracy on what, measured how often, with what gate before output reaches the operator. If there's no answer, you're the eval.
Does it learn from corrections? Ask whether yesterday's edit to a draft proposal teaches tomorrow's draft. If not, the platform is static and will be static next year.

A vendor who can answer all six well is shipping landscape AI in the way the category is going to demand by 2027. A vendor who can answer two might still be useful for one workflow, but you're paying for the future of the category without getting it.

Why we organized HeyFlora this way

When we started architecting HeyFlora, we made one decision that's shaped everything since: build the six primitives first, then ship the products on top of them.

The reason is operational, not theoretical. If we built one product on top of a half-finished primitive layer, the second product would inherit the gaps. By the time we shipped the fifth product, we'd be drowning in workarounds. Most landscape software stacks are exactly that, accidentally. They started with one workflow and grew outward without an architecture underneath.

The six primitives are the architecture underneath. They're laid out in detail on the execution-layer page. Every agent we ship (NightShift, the Huddle Agent, Flora Sales, Vibescaping, Flora Field) sits on the same context, the same memory, the same evals, and the same self-learning loop. That's why a correction made in the morning huddle improves a proposal drafted that afternoon. They're not separate products. They're surfaces over the same primitives.

What's coming

In the next post in this series I'll walk through what NightShift actually does between 9 p.m. and 6 a.m.: what it drafts, queues, alerts, and books, and what an operator wakes up to. It's the most concrete example we have of what the six primitives produce in the wild.

For now, if you've been hearing "AI agents" in landscape pitches and aren't sure what the word actually means, this is the unbundling. Bring the six questions to your next vendor demo.

Scott Hutcheon is the founder and CEO of HeyFlora. He's spent twenty-five years in design-build, green infrastructure, and landscape technology. HeyFlora is building Artificial Grounds Intelligence, the AI operating system for commercial landscape.

Read Post 1: The connection problem, or get the next Field Note in your inbox: Subscribe to Grounds Intelligence Weekly.