Jim Rosser's Thoughts

Measuring the Agentic SDLC

Sat, 23 May 2026 00:00:00 GMT

The conversation around AI in software delivery has largely narrowed to a single question: can we ship code faster? And because that's the question everyone is asking, it's the question most teams try to answer when they set up their metrics. Lines of code going out. Tickets closed. Deployment frequency.

None of those numbers tell you whether the rollout is actually working.

The reason is that they're measuring output, not outcomes. Output is easy to measure because it's visible. Outcomes require you to have defined what you were trying to achieve before you started, and many organizations skip that part entirely.

The Funnel Starts Above the IDE

The first mistake in most agentic SDLC rollouts is treating it as an engineering problem. Hand the dev team better tools, tell them to use AI in their workflow, and track the output. The engineers go faster, or they don't, and you've got your answer.

That framing misses the most important variable in the system, which is the quality of the work coming into the team in the first place.

I've written before about what I see as the real bottleneck in most engineering orgs. It isn't coding speed. It's that the work arriving at the team lacks basic context about what it's for and what success looks like. That was true before AI and it's still true with it. No amount of model capability translates a poorly defined problem into a good solution faster.

Here's a useful test. Take a story out of your current backlog and ask: could someone build this correctly using only what's in that ticket, the codebase, and your documented engineering standards? Not someone with years of institutional context baked in. Someone capable, but starting cold. If the answer is no, if completing the work requires reaching for knowledge that isn't written down anywhere, then you haven't defined the work well enough to hand it to an agent in any meaningful autonomous way. You've just handed the problem to a developer along with a more powerful typing tool.

The funnel starts at product definition. What do feature requests look like before they become stories? What do the stories contain? Are the acceptance criteria objective or interpretive? The returns from autonomous code generation scale directly with the quality of the input it's working from. Fix the input first.

Define the Measurements First

The next step is to define what success looks like before you start measuring anything.

It may sound pretty obvious, but unfortunately it's not consistently practiced.

"We're shipping more code" is not a success definition. Neither is "the team is moving faster." Those describe activity. A KPI is specific: if this metric, or this group of metrics, looks like this, then we can assume the rollout is working. If it looks like that, it isn't. You need to know what you're looking for before the data starts coming in, because humans interpret data, and without defined context for what the metrics actually mean in your specific rollout, that interpretation will drift toward whatever story is most convenient. Define the context of the data being used to measure with, upfront.

There's a range of options, each reflecting a different level of rigor about what you're actually trying to accomplish.

At the loosest end: features get built and they technically work. That's a floor.

Probably not what you're aiming for, but worth being honest if that's what you're willing to accept as validation.

Tighter: fewer than X percent of shipped features have a bug reported by a client within the first 15 days.

Now you've tied velocity to quality, and the source of the report matters. A bug caught by internal QA before release is a different signal than one a client finds in production. Both are worth tracking, but conflating them obscures what your process is actually catching and what it's missing. A feature that ships fast and breaks in a client's hands hasn't helped you.

Which definition fits depends on what you're actually trying to accomplish.

The Metrics Trap

Even when teams attempt to measure, the most common failure mode is picking metrics that are directionally related to performance but structurally incomplete.

Lines of code going up is a signal of activity, but it's one of the harder metrics to judge in an agentic SDLC context, and here's why. A skilled human engineer might use metaprogramming to ship an entire feature without adding significant lines of code, because the point of that approach was always to make the code more efficient and maintainable. An AI agent largely doesn't get used that way. It stamps out code because token count isn't a concern for it. So more lines doesn't mean better, and fewer lines doesn't mean worse.

That said, if the error rate is also climbing alongside output, the question isn't whether errors went up in absolute terms. Ship more code and you'll probably see more errors. What matters is whether the error rate as a percentage of total output is holding steady, improving, or getting worse. Five errors for every five thousand lines of code is the same rate as ten errors for every ten thousand. If that rate stays consistent as output scales, you're at least not losing ground.
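As a rough illustration of the rate-versus-count distinction, here is a minimal sketch that normalizes error counts to errors per thousand lines shipped. The function name and the third data point are made up for the example; only the 5-per-5,000 and 10-per-10,000 figures come from the paragraph above.

```python
def errors_per_kloc(errors: int, lines_of_code: int) -> float:
    """Errors per thousand lines of code shipped in a period."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return errors / (lines_of_code / 1000)


# The two cases from the text: twice the output, twice the errors, same rate.
before = errors_per_kloc(errors=5, lines_of_code=5_000)
after = errors_per_kloc(errors=10, lines_of_code=10_000)

# Hypothetical third case: output doubled but errors more than doubled.
worse = errors_per_kloc(errors=18, lines_of_code=10_000)

print(before, after, worse)
```

The absolute count doubled between the first two cases, but the rate did not move. Only the third case represents lost ground.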

But even that framing misses the most important variable, which is where the engineering time is actually going.

Consider this: you ship code faster, so deployment frequency goes up. But your error rate goes up too. Now suppose the error rate climbs high enough that your team is spending more time fixing problems than shipping new features. To understand whether that's actually happening, you need to look at cycle time per error: how long it takes from the moment an error is identified to the moment the fix is back in production. Then compare that total time investment across all errors against the time going toward net new features. That ratio is what tells you whether your team's increased output is going toward outcomes the business actually wanted, or toward cleaning up problems introduced after the rollout.
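One way to make that comparison concrete is sketched below. It assumes you log, for each error, when it was identified, when the fix was back in production, and how many engineer hours it consumed. The record shape, the field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ErrorRecord:
    identified_at: datetime
    resolved_in_prod_at: datetime
    engineer_hours: float  # assumed: hands-on time testing, fixing, re-releasing


def cycle_time_hours(error: ErrorRecord) -> float:
    """Elapsed hours from identification to the fix being live."""
    delta = error.resolved_in_prod_at - error.identified_at
    return delta.total_seconds() / 3600


def remediation_share(errors: list[ErrorRecord], feature_hours: float) -> float:
    """Fraction of tracked engineering time spent on remediation."""
    remediation = sum(e.engineer_hours for e in errors)
    total = remediation + feature_hours
    return remediation / total if total else 0.0


# Hypothetical sprint: two errors, 64 hours of net new feature work.
errors = [
    ErrorRecord(datetime(2026, 5, 4, 9, 0), datetime(2026, 5, 5, 15, 0), 6.0),
    ErrorRecord(datetime(2026, 5, 11, 10, 0), datetime(2026, 5, 11, 18, 0), 10.0),
]
share = remediation_share(errors, feature_hours=64.0)
print([cycle_time_hours(e) for e in errors], share)
```

Tracking the share per sprint or per month gives you the trend line; a single reading says little on its own.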

The ratio of error remediation work to net new feature work is one of the most telling signals in an agentic SDLC rollout. Some questions worth having answers to:

What percentage of total engineering time is going toward net new features?
What percentage is going toward error remediation, including testing, fixing, and re-releasing?
Is that ratio improving over time, holding steady, or getting worse?
Are you classifying errors consistently enough to track trends across time?
Is technical debt accumulating faster than it was before the rollout?

If the answer to any of those questions is "we're not tracking that," then you can't say with any certainty that the rollout is working. You might have a story, but it may not match reality.

The compounding interest of technical debt is the one that gets overlooked most often. Errors that don't get addressed don't sit still. They interact with new code, create surface area for new failures, and consume progressively more bandwidth to manage over time. A rollout that looks like a velocity win in month two can turn into a maintenance drag by month six if nobody is paying attention to the overall technical debt accumulating underneath it.

DORA Is a Framework

DORA metrics (deployment frequency, lead time for changes, change failure rate, time to restore) are a genuinely useful starting point. They give you a structured way to think about delivery performance. The problem is that how you define what falls into each category, and what you choose to focus on, determines whether any of it tells you something real.

Deployment frequency going up is a positive signal in isolation. Deployment frequency going up while change failure rate also climbs is a more complicated story. Is a bug reported in the first 24 hours a failure? The first 15 days? Does severity factor in? The definition isn't in the DORA framework. You have to supply it. If you don't, you're measuring the word, not the thing.
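To show how much the supplied definition matters, here is a small sketch in which the same deployments and bug reports produce different change failure rates under two definitions. The severity scale, class names, and sample data are assumptions for illustration, not part of DORA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailureDefinition:
    window: timedelta   # how long after a deploy a bug still counts
    min_severity: int   # hypothetical scale: 1 = cosmetic ... 4 = outage


@dataclass
class Bug:
    deployment_id: str
    reported_at: datetime
    severity: int


def change_failure_rate(
    deployments: dict[str, datetime],
    bugs: list[Bug],
    definition: FailureDefinition,
) -> float:
    """Share of deployments with at least one qualifying bug."""
    failed = set()
    for bug in bugs:
        deployed_at = deployments.get(bug.deployment_id)
        if deployed_at is None:
            continue
        elapsed = bug.reported_at - deployed_at
        in_window = timedelta(0) <= elapsed <= definition.window
        if in_window and bug.severity >= definition.min_severity:
            failed.add(bug.deployment_id)
    return len(failed) / len(deployments) if deployments else 0.0


start = datetime(2026, 5, 1, 12, 0)
deployments = {f"d{i}": start + timedelta(days=i) for i in range(1, 5)}
bugs = [
    Bug("d1", deployments["d1"] + timedelta(hours=6), severity=3),
    Bug("d2", deployments["d2"] + timedelta(days=10), severity=2),
    Bug("d3", deployments["d3"] + timedelta(days=3), severity=1),
]

strict = FailureDefinition(window=timedelta(hours=24), min_severity=3)
broad = FailureDefinition(window=timedelta(days=15), min_severity=2)
print(change_failure_rate(deployments, bugs, strict))
print(change_failure_rate(deployments, bugs, broad))
```

Neither number is wrong. They answer different questions, which is why the definition has to be written down before the data arrives.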

The same applies across the rest of the framework. If you're getting faster at deploying code but the code is failing more often and taking longer to recover from, the aggregate picture is not one of improvement. You'll never see that picture if you're celebrating deployment frequency in isolation.

The organizations getting real value from these metrics treat them as a system. They've defined what each category means in their specific context, they track the relationships between metrics rather than individual numbers, and they ask whether the trends make sense together, not just whether any one number went up.

Asking the Right Question

If the conversation started with "can we ship code faster," it started with the wrong question. If lines of code is the only thing you're measuring, you'll get an answer that tells you very little about whether an agentic SDLC rollout actually worked.

The right question is whether you set up the measurements to know. Did you define the work well enough for the funnel to function? Did you establish what the metrics meant in your context before the data started coming in? Did you track where the time was actually going, not just how much code went out the door?

If you did, the data will tell you what's working and what isn't. The discipline of setting context before interpreting signal is what separates a rollout that actually improved delivery from one that just generated a more compelling story about it.

Output is easy to celebrate. Outcomes require you to have known what you were looking for.

The Enterprise Digital Twin

Fri, 22 May 2026 00:00:00 GMT

The concept of a digital twin isn't new. Manufacturing and engineering have used them for years. A virtual replica of a physical system that updates in real time and lets you monitor, analyze, and simulate against it. What I've been noodling on for a couple of years now is what that concept looks like when you apply it to a business. Not a dashboard. Not a data warehouse. A living, continuously updated model of how your organization actually operates.

Where the idea started

The idea started taking shape for me when I was building Iris, a sales operations platform that lived in Slack and wired Salesforce, Gong, and Jira into a single conversational interface. The more systems I connected, the more obvious it became that the real value wasn't in any single integration. It was in the connections between them. A lead doesn't just appear in a CRM. It originated somewhere, touched specific content, was influenced by specific market conditions, and was handled by a specific seller running a specific process.

That's the enterprise digital twin. Your CRM, marketing automation, web analytics, ad spend, support systems, sales call transcripts, pricing engines, contract data, all feeding into a unified model where the relationships between them are mapped as they actually exist. You stop asking "how's the website doing" and start asking how your public brand presence is correlating with pipeline generation in a specific segment this quarter versus last.

An always-on analytical surface

Once the twin has enough signal, it becomes an analytical surface. This is the same idea from the sales call analysis work. Taking data that already exists and interrogating it with real specificity. What does the qualification process actually look like for sellers who close versus sellers who don't? What content sequences correlate with faster deal cycles? Which pricing structures produce better retention in which segments? These aren't questions you answer with a single query. They're questions the twin is always computing against.

Consistency matters here too. Human analysis of these patterns is inherently variable. Different managers interpret different signals differently, institutional knowledge lives in people's heads, judgment shifts with mood and context. The twin doesn't eliminate subjectivity entirely. The prompts and frameworks still reflect choices. But it applies those choices consistently across every data point, every time.

Ontology with lineage

But the part of this concept I keep coming back to is the ontology layer. And specifically, ontology with lineage.

Everyone talks about ontology as a way to give context to data. And it is. The twin needs a structured representation of what things mean within your organization and how they relate to each other. What counts as an "enterprise deal" in your organization? What's the boundary between marketing-qualified and sales-qualified? How do your product lines relate to your market segments? These definitions aren't static. They shift as the business evolves.

But most people stop there. They think of ontology as a flat map of definitions. What makes it powerful is the time dimension. Lineage. Tracking not just what things mean, but when those meanings changed and why.

When your company changes its pricing structure, redefines deal stages, reorganizes territories, or shifts its ICP, those changes are inflection points in your data. Without lineage, a model that compares Q1 to Q3 might be comparing two fundamentally different definitions of success without knowing it. With lineage, those changes become data points themselves. You can see that close rates shifted in Q3, and the twin knows that Q3 is also when the pricing model changed, the sales team was restructured, and the definition of "qualified" was updated. The correlation surface gets dramatically richer.
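A minimal sketch of what lineage could look like as a data structure: each definition is stored as a dated version with a reason attached, so a query can ask what a term meant on a given date and which definitions changed inside a comparison window. The class names, terms, dates, and reasons are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DefinitionVersion:
    term: str
    meaning: str
    effective_from: date
    reason: str  # why the definition changed


class Ontology:
    def __init__(self) -> None:
        self._versions: dict[str, list[DefinitionVersion]] = {}

    def record(self, version: DefinitionVersion) -> None:
        history = self._versions.setdefault(version.term, [])
        history.append(version)
        history.sort(key=lambda v: v.effective_from)

    def as_of(self, term: str, when: date) -> Optional[DefinitionVersion]:
        """The definition of a term that was in force on a given date."""
        current = None
        for version in self._versions.get(term, []):
            if version.effective_from <= when:
                current = version
        return current

    def changes_between(self, start: date, end: date) -> list[DefinitionVersion]:
        """Definition changes that took effect after start, up to and including end."""
        hits = [
            v
            for history in self._versions.values()
            for v in history
            if start < v.effective_from <= end
        ]
        return sorted(hits, key=lambda v: v.effective_from)


twin = Ontology()
twin.record(DefinitionVersion("qualified", "budget and need confirmed",
                              date(2025, 1, 1), "initial definition"))
twin.record(DefinitionVersion("qualified", "full BANT confirmed",
                              date(2025, 7, 15), "sales process update"))

q1 = twin.as_of("qualified", date(2025, 2, 1))
q3 = twin.as_of("qualified", date(2025, 8, 1))
shifts = twin.changes_between(date(2025, 3, 31), date(2025, 9, 30))
print(q1.meaning, "|", q3.meaning, "|", [s.reason for s in shifts])
```

A Q1-to-Q3 comparison can call `changes_between` first and flag that "qualified" did not mean the same thing at both ends of the window.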

This is what takes relatively flat data analysis and gives it real depth. Recording mandated changes, strategic pivots, and definitional shifts as first-class data turns organizational history from institutional memory into something queryable. The kind of context that usually lives in someone's head or gets lost when that person leaves.

Management and accountability

And it applies well beyond sales and marketing. Think about management. Employment records, org changes, who managed what team and when, what turnover looked like during those periods. Historically this stuff gets rolled up to the top based on feels. Based on what managers wanted to share. Based on what employees felt comfortable putting in their company or manager reviews. Now consider all of that as a data layer you can correlate against your larger organization's data.

Say your support team's ticket resolution rate starts dropping. You dig in and the decline started shortly after manager X took over that team. Then you start noticing attrition ticking up in the same group while the rest of the org is stable. These are patterns that exist in the data today but nobody is connecting them because the systems don't talk to each other and there's no time dimension tying it together.

Consider the level of accountability that provides. Not based on what someone chose to report upward. Based on what actually happened, tracked over time, correlated against everything else the organization knows about itself.

This isn't something that exists yet. Not the way I'm describing it. But the pieces are all there. The integrations, the analytical capability, the ability to model relationships between systems at scale. The interesting work is in the ontology and lineage layer, because that's what turns a collection of connected data sources into something that actually understands the business it's modeling.

Sticker Price Is Not Cost

Thu, 21 May 2026 00:00:00 GMT

Total cost of ownership is one of those concepts that sounds obvious but gets skipped frequently. Most people already understand it intuitively when it comes to something like a vehicle. A Toyota might cost more up front than a comparable Ford, and the parts might be pricier, but if the Ford breaks down twice as often, the cheaper purchase price doesn't mean much over five years of ownership. People think about maintenance, reliability, resale value.

When it comes to technical decisions, that same thinking often goes out the window. A team will compare two price tags, pick the lower one, and stop there, but the price tag only tells you what something costs to buy, not what it costs to own.

The Database Example

Here's a simple example I've used before to illustrate this. Say you need a MariaDB database on AWS. You need 100 GB of storage, 4 vCPUs, and 16 GiB of memory. You have two options.

Option A: run MariaDB yourself on two EC2 instances (a primary and a standby). Two m5.xlarge instances in us-east-1 run about $280 a month.

Option B: use RDS Multi-AZ, where Amazon manages the database for you. That same spec on RDS comes in around $520 a month.

On sticker price, EC2 wins by $240 a month. Obvious choice, right?

But with RDS, Amazon handles automated backups, replication, automatic failover, OS patching, and MariaDB updates. On EC2, your team handles all of that.

So what does that actually cost? Let's say the average salary plus benefits for an SRE is around $170k, which works out to roughly $82 an hour. If your SRE spends 2 hours a month on routine maintenance of the EC2 setup, plus another hour or two on unplanned work when things break, you're looking at 3-4 hours a month of engineering time. That's $246-$328 a month in labor, which brings your EC2 total cost to $526-$608.
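The arithmetic is short enough to write out. This sketch uses the figures from the example and assumes a 2,080-hour working year (40 hours times 52 weeks) to get the hourly rate. The break-even calculation at the end is an addition that follows directly from those same numbers.

```python
HOURS_PER_YEAR = 2080  # assumption: 40 hours x 52 weeks

ec2_monthly = 280        # two m5.xlarge instances, from the example
rds_monthly = 520        # RDS Multi-AZ, from the example
sre_annual_cost = 170_000

# 170,000 / 2,080 is about 81.73, rounded to $82 as in the text.
hourly_rate = round(sre_annual_cost / HOURS_PER_YEAR)


def ec2_total(hours_per_month: float) -> float:
    """Monthly EC2 cost once engineering time is included."""
    return ec2_monthly + hours_per_month * hourly_rate


low = ec2_total(3)
high = ec2_total(4)

# Hours of monthly SRE time at which the two options cost the same.
break_even_hours = (rds_monthly - ec2_monthly) / hourly_rate

print(hourly_rate, low, high, round(break_even_hours, 1))
```

At these rates the self-managed option stops being cheaper once it takes just under three hours of SRE time a month.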

The $240 a month in savings just disappeared, and you're still carrying the risk and on-call burden on top of it. When your self-managed database has a failover event at 2 AM, someone has to wake up and deal with it. RDS handles that automatically. Regular 2 AM pages contribute to burnout, and burnout contributes to turnover. Replacing an SRE costs a lot more than $82 an hour.

The Pattern

The database example is simple, but the pattern behind it shows up everywhere.

A team evaluates a SaaS tool that costs $2,000 a month versus building something in-house. The in-house option looks cheaper because the developers are already on salary. But the three months of development time, the ongoing maintenance, the on-call burden, and the opportunity cost of those developers not working on the product all factor into what it actually costs.

A company picks a cheaper vendor because the per-unit cost is lower. But the integration work, the support quality, and the time the team spends working around limitations that the more expensive vendor had already solved are all part of the real cost.

Or someone hires a cheaper contractor because the hourly rate is lower. But the ramp-up time, the rework, and the hours a senior engineer spends reviewing and correcting the output are all part of what that contractor actually costs.

The pattern is the same in every case. The sticker price got compared. The total cost of ownership did not.

Build vs Buy

This is especially common in build-vs-buy decisions. Engineers love to build. There's a gravitational pull toward building your own solution because it feels like the cheaper, more flexible option, and because building things is more interesting than evaluating vendors.

But building something means owning it forever. It means maintaining it when the person who built it leaves. It means debugging it at 3 AM when it breaks in a way nobody anticipated. It means every hour spent maintaining your custom solution is an hour not spent on the thing your company actually sells.

Sometimes building is the right call. When the thing you need is core to your product, when nothing on the market fits, when the cost of a vendor dependency is genuinely higher than the cost of ownership. It's worth being honest about whether that's actually the case though.

The question isn't "can we build this." The answer to that is almost always yes. The question is "should we own this." The first is a sticker price question. The second is the total cost of ownership question.

Signals in the Data

Wed, 20 May 2026 00:00:00 GMT

A colleague of mine, Horacio Fernandez, and I were sharing a familiar frustration one day. Sellers were bringing us into client calls that hadn't been qualified. We were walking into conversations cold, and it was happening often enough that it stopped feeling like an accident.

That frustration turned into curiosity. We decided to stop complaining about it and actually look at the data.

Defining "qualified"

The first thing we had to do was decide what "qualified" meant in concrete terms. We landed on BANT: Budget, Authority, Need, Timeline. Not because it's sophisticated. Because it's the floor. BANT is the bare minimum you'd expect from a seller who's doing their job. It's also well-defined enough that you can identify whether those things are being gathered during a conversation. You don't need a subjective judgment call. Either the seller asked about budget or they didn't. Either they identified the decision maker or they didn't.

That clarity mattered, because we weren't interested in grading calls on vibes. We wanted to know, with data, whether our theory was correct: that sellers were skipping qualification entirely.

The Tooling

All sales calls were already being recorded and transcribed. The data was sitting there. Nobody was using it for this purpose, but it existed. A wealth of recorded conversations, timestamped, transcribed, just waiting for someone to ask a question of it.

So we went to work. We built some basic tooling. Nothing fancy. The workflow was straightforward: batch the calls that were identified as new customer conversations, run each transcript through an LLM with a prompt that defined what to look for and how to identify it, and collect the results. BANT criteria mapped against what actually happened on each call.
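The shape of that workflow can be sketched in a few lines. This is not the tooling we built. It is a minimal illustration in which the model call is an injected function, so any LLM client could sit behind it. The prompt wording, the function names, and the keyword-matching stand-in used for the demo are all assumptions.

```python
from collections import Counter
from typing import Callable

BANT = ("budget", "authority", "need", "timeline")

PROMPT_TEMPLATE = (
    "For the sales call below, answer true or false for each item: did the "
    "seller ask about budget, identify the decision maker (authority), "
    "confirm the need, and establish a timeline?\n"
    "TRANSCRIPT:\n{transcript}"
)


def grade_calls(
    transcripts: dict[str, str],
    ask_model: Callable[[str], dict[str, bool]],
) -> dict[str, dict[str, bool]]:
    """Run every transcript through the model and keep one flag per criterion."""
    results = {}
    for call_id, text in transcripts.items():
        answer = ask_model(PROMPT_TEMPLATE.format(transcript=text))
        results[call_id] = {c: bool(answer.get(c, False)) for c in BANT}
    return results


def coverage(results: dict[str, dict[str, bool]]) -> dict[str, float]:
    """Share of calls on which each criterion was covered."""
    if not results:
        return {}
    counts = Counter()
    for graded in results.values():
        counts.update(c for c, hit in graded.items() if hit)
    return {c: counts[c] / len(results) for c in BANT}


def fake_model(prompt: str) -> dict[str, bool]:
    """Stand-in for a real LLM call: naive keyword checks on the transcript."""
    transcript = prompt.split("TRANSCRIPT:\n", 1)[1].lower()
    return {
        "budget": "budget" in transcript,
        "authority": "decision" in transcript,
        "need": "problem" in transcript,
        "timeline": "by when" in transcript,
    }


calls = {
    "call-1": "What budget have you set aside? Who makes the final decision?",
    "call-2": "Let me walk you through the demo.",
}
graded = grade_calls(calls, fake_model)
print(coverage(graded))
```

Swapping `fake_model` for a real client changes one argument. The criteria, the prompt, and the aggregation stay the same, which is what keeps the grading consistent across calls.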

It worked really well. We were able to analyze months of call data in a very short period of time and determine whether the pattern we'd been seeing anecdotally held up when you looked at it at scale.

What If It Ran Live

Once you've proven that this kind of analysis works in batch, the obvious next question is: what if it ran live? What if every sales call got graded automatically, and that data flowed back to sales leadership in something close to real time? Not as a gotcha. As a signal. A way to know, with actual evidence, whether qualification is happening or not.

That's the part that got us thinking bigger. Sales is one of the hardest functions to hold accountable with real data, because so much of the data depends on the seller's participation and honesty. CRM entries, pipeline updates, deal stages, all of it requires the seller to accurately report what's happening. And the role itself demands autonomy. Sellers need room to operate. They need to run their own process, build their own relationships, close on their own terms. You can't micromanage someone into closing a deal.

The Boring Work First

What Horacio and I stumbled into wasn't really a sales project. It was a proof point for something larger. Your organization is generating data all the time, in conversations, in workflows, in the normal course of doing business. Most of that data is sitting unused, or it's being used in the most surface-level way possible. AI gives you the ability to actually interrogate that data. To ask questions of it that would have taken a team of analysts weeks to answer manually, and get answers in hours.

The reason it came together so quickly is that we'd done the boring work first. We defined what we were looking for before we built anything. BANT gave us a concrete framework to measure against, not a vague direction like "assess call quality." That specificity is what made the tooling trivial to build. The LLM didn't need to make judgment calls. It needed to check whether specific things happened on a call. When you define the problem well, the technology part gets simple fast.

Other Places This Lives

The sales call example is concrete and specific, but the pattern is bigger than sales and bigger than accountability. It's about finding novel ways to use the data your organization is already generating to solve problems that were previously too expensive or too manual to touch.

Think about customer success. Every support call, every ticket, every chat interaction is generating signal about where your product is confusing, where your documentation is failing, where customers are getting stuck in the same spot over and over. Or think about engineering. Every PR, every code review, every incident retro has signal in it about where the team is strong and where it's struggling. You don't need a consultant to run a six-month assessment. The work product is already telling you if you know what to ask.

In a client engagement, my team used a similar approach to grade the quality of stories in an engineering backlog. The results were fascinating and uncomfortable. The overall quality of the work definition was so poor that AI tooling wouldn't have actually accelerated anything, because the entire system depended on a handful of people who carried enough institutional context in their heads to translate vague requirements into real work. The tooling didn't fix that problem, but it surfaced it in a way that months of standups and retros hadn't.

No matter what your industry, there's probably a version of this sitting in your organization. A problem everyone can feel but nobody has the data to prove. A process generating information that nobody's asking questions of. Most people are still thinking about AI as a way to do existing work faster. The more interesting use is pointing it at problems you couldn't practically solve before.

What Company Culture Actually Is

Tue, 19 May 2026 00:00:00 GMT

There's a version of "company culture" that shows up in job postings and employer branding decks, and then there's the real version. The one that determines whether people spend their energy on the work or on navigating each other.

A lot of companies treat culture like it's the fun stuff. Ping pong tables, happy hours, team-building retreats. Those things are fine. They can make a workplace more enjoyable. But without the right foundation underneath them, they're decorations on a building with no structure. You can have the best snack bar in the industry and still be a place where nothing gets done without kissing the right person's ring first.

What It Looks Like When It Works

I've been in environments where culture actually worked. Not because anyone was particularly focused on culture as a concept, but because the basics were in place. The mission was clear. People understood how their work connected to it. There was enough transparency that you could make decisions without having to guess what leadership actually wanted.

I've written before about what happened at Texas A&M, where a handful of technologists across departments pushed for changes that reshaped how the university ran its infrastructure. None of it was anyone's job. It worked because the culture let it work.

That's what healthy culture looks like in practice. Not a set of values on a wall. Just an environment where people can focus on the actual work instead of spending half their energy figuring out who to avoid, who to flatter, and whose feelings to protect before they can get something done.

What It Looks Like When It Doesn't

A broken culture is easy to recognize once you've been in it. It's an environment where the org chart says one thing but the actual power structure is something completely different. Where getting a decision made has less to do with the merit of the idea and more to do with whose name is attached to it.

The examples are everywhere. Sellers throwing fits to get projects prioritized because their commission depends on it. People spending weeks building political cover for decisions that should take an afternoon. Teams learning not to bring problems to leadership unless they've pre-sold the solution to everyone who might feel threatened by it. The problems that get surfaced are the ones that are safe to talk about, not necessarily the ones that matter most.

All of that is energy. Real, finite, human energy that's being spent on politics instead of outcomes. And the people burning that energy know exactly what they're doing. They're not confused about priorities. They've correctly identified that in their environment, managing relationships is more important than doing good work.

The Cost

The real cost of a political culture is everything that doesn't happen. The engineer who saw a production problem forming but didn't raise it because the last person who raised one got labeled as difficult. The project that everyone knows is failing but nobody will say out loud because the VP who sponsored it is untouchable. The senior developer who leaves not because they got a better offer, but because they got tired of spending more time managing personalities than writing code.

The underlying problem is that keeping the existing structure intact becomes more important than whether the structure is producing the right results. And the people at the top either don't realize it's happening because the uncomfortable information stopped reaching them a long time ago, or they do realize it and just don't want to hear it.

In my experience, a lot of it traces back to leadership egos. The mission gets clouded because someone at the top is more interested in being right than in getting the right outcome. The transparency disappears because admitting something isn't working feels like a personal failure. The hierarchy calcifies because it protects the people who built it. Sometimes it's about control. Sometimes it's about making sure nobody gets close enough to see just how bad things actually are.

What It Actually Takes

The foundation of a good culture isn't complicated. It's a clearly defined mission that people can understand and believe in. It's leaders who are actually there to lead, not to protect their own position. It's making sure every team and role can draw a line between what they do and why it matters. It's transparency about what's going on, what's working, and what isn't. And it's giving people room to solve problems through first principles instead of through hierarchy.

When the foundation is missing, no amount of happy hours will fill the gap. You'll have a nice time at the bar and then go back to the same broken environment on Monday, spending the first half of the week figuring out the politics and the second half trying to get actual work done in whatever time is left.

Culture isn't really about organizational design. It's about whether people can spend their time on the customer, the product, and the problems in front of them, or whether they're spending it figuring out whose ego to manage before they can get anything done. We all want to feel like we're contributing to something that matters. The job of leadership is to make that possible, not to build a system where the only way to survive is to play the game.

Direct, Own, Tune

Mon, 18 May 2026 00:00:00 GMT

I've spent many hours inside an agentic harness. Over time I started noticing that the sessions that went well had three things in common, and the sessions that went badly were always missing at least one of them. I'd either set clear intent before the session started, or I hadn't. I'd either reviewed the output at full bar, or I hadn't. I'd either fed what I learned back into the harness, or I hadn't.

Once those patterns were clear, the framework to address them was obvious. Direct, Own, Tune.

The framework

Direct means setting clear intent before the agent does anything. What are you trying to produce. What are the constraints. What patterns should it follow. What should it absolutely not touch. If you don't resolve ambiguity before the task starts, the agent "resolves" it for you.

Own means reviewing the output as if you wrote it yourself. Because you did. Your name is on it. Whether the agent produced code, copy, a user story, or an incident analysis, the ownership bar doesn't change because a model did the typing.

Tune means taking what you learned and feeding it back into the system. The brief that worked becomes a template. The boundary the agent drifted past becomes an explicit constraint. The review checklist that caught the problem gets encoded so it runs the same way next time. Tuning is what turns a one-off win into a repeatable one.

Direction

Direction is where you set yourself up to understand your own vision. Before the agent does anything, you should be able to articulate what you're trying to produce, what the constraints are, and where the boundaries are.

A PM turning a vague executive ask into a sprint-ready user story needs to know the audience, the acceptance criteria, and the scope before the session starts. "Write a user story for guest checkout. Audience is a returning shopper who abandoned cart at account creation. Acceptance criteria: complete purchase without an account, payment errors show inline, cart persists across sessions." The output might still need adjustment, but the adjustment is refinement, not rework.
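
One way to make that habit concrete is to treat the brief as a small structure that refuses to render until the basics are filled in. This is only a rough sketch: the Brief class, its field names, and the output layout are hypothetical, not a format any particular harness requires.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical brief structure; the field names are illustrative assumptions."""
    goal: str
    audience: str
    acceptance_criteria: list[str] = field(default_factory=list)
    do_not_touch: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """List the required parts that are still empty."""
        gaps = []
        if not self.goal.strip():
            gaps.append("goal")
        if not self.audience.strip():
            gaps.append("audience")
        if not self.acceptance_criteria:
            gaps.append("acceptance_criteria")
        return gaps

    def render(self) -> str:
        """Produce the text handed to the agent, or fail if intent is unresolved."""
        gaps = self.missing()
        if gaps:
            raise ValueError("brief is missing: " + ", ".join(gaps))
        lines = [f"Goal: {self.goal}", f"Audience: {self.audience}", "Acceptance criteria:"]
        lines += [f"- {item}" for item in self.acceptance_criteria]
        if self.do_not_touch:
            lines.append("Do not touch:")
            lines += [f"- {item}" for item in self.do_not_touch]
        return "\n".join(lines)


brief = Brief(
    goal="Write a user story for guest checkout",
    audience="Returning shopper who abandoned cart at account creation",
    acceptance_criteria=[
        "Complete purchase without an account",
        "Payment errors show inline",
        "Cart persists across sessions",
    ],
)
print(brief.render())
```

The point of raising on a missing field is that the ambiguity surfaces before the session starts, while it is still cheap to resolve.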

Ownership

The relief of handing over a hard problem to an agent is genuinely addictive. You get back something that looks complete, you skim it, you move on. The work is still yours to own.

Own is the structural answer to that instinct. Review the output at the same bar you'd apply to your own work.

For marketing copy, that means reading every word as if you wrote it. Does the headline actually communicate value, or is it just technically accurate. Is the tone consistent across channels, or did the agent shift register between the long-form and the short-form deliverables. Did it pull language from the product spec that would confuse a non-technical reader. Does it sound like something your team would produce, or does it sound like something an agent produced.

Tuning

Most of the time, once the work is done, people move on. The task shipped, the session's over, on to the next thing. Tuning is the step that breaks that pattern, and it's the one that pays the biggest dividends over time.

Without tuning, the same context gets re-explained every session. The architectural boundaries, the patterns to follow, the test commands. The agent drifts past the same boundary that was corrected the session before, because the correction lives in someone's head. The tooling produces value, but the value doesn't compound. Every session starts from roughly the same place.

Tuning is the step where you take what worked and encode it. The brief format that produced a clean user story on the first pass becomes a reusable skill. The architectural boundary the agent kept drifting past gets written into the context so the next session starts with that constraint already loaded. The review checklist that caught a problem becomes the standard checklist.

I built Runbook out of this same instinct. The agent kept inventing its own commands instead of using the ones that actually worked. So I stopped asking it to choose and gave it a deterministic tool that only does the right thing. That's tuning. You take a failure mode, turn it into a constraint, and remove the opportunity for it to happen again.
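
The general shape of that kind of constraint can be sketched in a few lines. To be clear, this is not Runbook's actual implementation. It is a minimal illustration of the idea, and the task names and commands in the table are made up for the example.

```python
import shlex

# Hypothetical allowlist: task name -> the one command known to work.
# Both the task names and the commands are illustrative assumptions.
APPROVED_COMMANDS = {
    "test": "pytest -q",
    "lint": "ruff check .",
    "build": "python -m build",
}


def resolve_command(task: str) -> list[str]:
    """Return the approved argv for a task, or raise if the task is unknown.

    The agent picks a task name; it never gets to compose the command itself.
    """
    if task not in APPROVED_COMMANDS:
        allowed = ", ".join(sorted(APPROVED_COMMANDS))
        raise KeyError(f"unknown task {task!r}; allowed tasks: {allowed}")
    return shlex.split(APPROVED_COMMANDS[task])


print(resolve_command("test"))
```

A caller would pass the returned list to something like subprocess.run. The design choice is that an unknown task fails loudly instead of letting the agent improvise a command.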

Across roles

The mental model works the same way regardless of the function or the level of technical proficiency. For example:

Customer success

Direct: A renewal is coming up and the account has had a rough quarter. Feed the agent the account history, the support ticket log, the usage data, and the CSM's notes from the last QBR. Define what you need back: a renewal strategy that accounts for the friction points and a talk track for the call.

Own: Read the strategy the agent produced. Does it reflect what actually happened on the account, or did the agent gloss over the rough spots. Is the talk track realistic for this customer's temperament. Would you walk into that call with this plan and feel prepared.

Tune: The account brief format that produced a strong renewal strategy becomes the standard template. The review questions that surfaced gaps get encoded so every CSM runs the same quality check before a renewal call.

I've written separately about how the rest of the business is hitting the same scaling wall that IT hit decades ago, and how the answer then was the same as the answer now: stop throwing bodies at the problem and start building systems that absorb the work. DOT is a mental model the rest of the business world can use to start pushing that mentality forward.

Getting real value out of AI takes discipline and intention. Clear direction produces better output. Thorough ownership keeps the bar high. Consistent tuning makes the harness sharper every cycle. That's DOT.

Throwing Bodies at Scale

Sun, 17 May 2026 00:00:00 GMT

For most of its existence, IT was a cost center. Not in the "we're being strategic about overhead" sense. In the "you don't make us money, so justify every dollar" sense. Every headcount request was a fight. Every budget cycle was a negotiation where you explained, again, why the systems that keep the entire company running need more than a skeleton crew to operate.

Meanwhile the infrastructure kept growing. More servers, more applications, more things that could break at 2 AM. And the answer was never going to be more people, because there was no money for more people. Imagine if IT had the same luxury the business side had, where the answer to "we're overwhelmed" was just "let's hire another person." A person per server. A person per application. It would have been absurd, and everyone knew it was absurd, which is exactly why IT never got to operate that way.

So IT did what you do when you can't throw bodies at a scaling problem. They automated.

Plenty of practitioners were already hand-rolling their own bash scripts and cobbling together whatever they could to manage their systems. By the early 90s, formal tooling to automate fleet management started to show up on the scene, beginning with CFEngine by Mark Burgess. Luke Kanies started building Puppet around 2005 and Adam Jacob launched Chef in 2009. Instead of a person logging into each machine and configuring it by hand, you write the configuration once and let software apply it everywhere. Instead of a runbook that says "SSH into the server and edit this file," you have code that does it. The machine does what the code says, every time, at whatever scale you need.
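
The core idea, stripped of any real tool, looks something like the toy below. It is a sketch under heavy assumptions: the fleet is just a dictionary of host names to config text held in memory, and ensure_line and converge are hypothetical helpers, not the API of CFEngine, Puppet, or Chef.

```python
def ensure_line(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = value` present exactly once."""
    wanted = f"{key} = {value}"
    out = []
    found = False
    for line in config_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not found:
                out.append(wanted)  # rewrite the first definition in place
                found = True
            # any later duplicate definitions of the key are dropped
        else:
            out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out)


def converge(fleet: dict[str, str], key: str, value: str) -> dict[str, str]:
    """Apply the same desired setting to every host's config."""
    return {host: ensure_line(text, key, value) for host, text in fleet.items()}


# Toy fleet held in memory; a real tool would read and write files on each host.
fleet = {"web1": "port = 80\nworkers = 2", "web2": "port = 80"}
once = converge(fleet, "workers", "4")
twice = converge(once, "workers", "4")
print(once == twice)
```

Running converge a second time changes nothing, which is the property that makes this style safe to apply everywhere, repeatedly, at whatever scale you need.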

That shift didn't come from a strategy deck. It came from practitioners who recognized that automation was simply the best way to manage infrastructure at scale. I was one of them. I picked up Chef because I like to automate. I wrote cookbooks, built tools around it, and eventually joined Rackspace's DevOps Automation group where the whole job was running client infrastructure with code instead of hands. The community that grew up around those tools was full of people who had independently arrived at the same conclusion: the manual way doesn't scale, and no one is going to give you the headcount to make it scale. So you write code instead.

The Rest of the Business Is Where IT Was

I think about the early infrastructure scaling era constantly right now, because I see the same pattern everywhere outside of IT.

Once you know the inputs and outputs, you can reason about the work without needing to audit every individual's Tuesday.

I was talking with a colleague recently about the challenge of figuring out where automation applies in a mid-size organization. A typical 5,000-person company has tens of thousands of workflows and processes. How do you even begin to reason about that?

Here's what I've come to think. Workflows are largely personal. Every individual has their own way of getting through their day, their own shortcuts, their own version of the process. Trying to map every person's workflow is a fool's errand. But that's not actually what you need to do.

Teams and business units are systems. They work the same way any other system works. They have inputs and they have outputs. A content management team in marketing takes in briefs, brand guidelines, and campaign goals. It puts out finished content, published on schedule, in the right channels. An accounts receivable team takes in invoices and puts out collected payments. The specifics of how each person inside that group does their piece will vary, and honestly, a lot of that variance doesn't matter. What matters is understanding what goes in, what comes out, and where the friction lives between those two points.

When you frame it that way, the problem gets a lot more tractable. You're not trying to document ten thousand personal workflows. You're trying to understand the function of each group, what they need to do their jobs, and what they're expected to produce. Once you have that, you can start asking where the bottleneck is, where a human is doing something a machine should be doing, where information is getting stuck because two systems don't talk to each other.

The Luxury IT Never Had

IT automated because they had no choice. The rest of the business hired because they could.

Historically, the pattern in how business departments grow is one that everyone recognizes once you name it. Someone is overloaded. They don't have time for everything on their plate. So they make the case to hire. The new person absorbs some of the workload. Things stabilize for a while. Then the work grows again, or the original person takes on something new, and now the new person is overloaded too. Repeat.

This is the luxury that IT was never afforded. When a sysadmin was drowning in work, the answer wasn't "let's get you some help." The answer was "figure it out." The budget wasn't there. IT was overhead. So IT figured it out, and the way they figured it out was by making the machines do the work that people had been doing by hand.

Business departments never faced that constraint. When marketing needed more content, they hired more people. When operations couldn't keep up with order volume, they hired more people. When finance was buried in reconciliations, they hired more people. Adding headcount to a revenue-generating or revenue-supporting team was "investment." Adding headcount to IT was "cost." That asymmetry meant the business side was never forced to question whether the work should be done by a person at all. The question was always "who can we hire to help with this," never "why does this require a human in the first place."

And the incentive structures reinforced it. Managing a bigger team is a career milestone. Managing a more efficient team that does the same work with fewer people is, at best, a footnote. Nobody got promoted for eliminating the need for three roles. They got promoted for managing twelve.

The rest of the business hasn't hit the wall IT hit, or at least hasn't recognized it as the same wall. But the economics are pointing in the same direction. The era of solving every scaling problem with a new hire is ending, not because companies suddenly got smarter, but because the math is getting harder to justify. And the ability to automate repetitive work is better than it's ever been.

The Mental Shift

The shift in IT wasn't really a technology change. The tools mattered, but what actually changed was how people thought about the work. Before, the mental model was: here's a server, a person manages it. After, the mental model became: here's a fleet, code manages it, and a person manages the code. The unit of work moved up a layer of abstraction.

That's the shift the broader business world needs to make. Not "replace people with AI." That framing is as wrong now as "replace sysadmins with scripts" was wrong then. We didn't get rid of the systems administrators. We changed what they spent their time on. Instead of configuring servers by hand, they wrote the code that configured servers. Instead of firefighting the same recurring issue every week, they fixed the root cause and automated the check. The work got more interesting because the tedious parts got absorbed by the tooling.

The same thing is available to every team in a modern organization. The marketing coordinator who spends four hours a week compiling a report from three different analytics tools should not be doing that. The operations team that manually reconciles data between two systems that don't share an API should not be doing that. The project manager who copies status updates from email threads into a slide deck every Friday should not be doing that.

These aren't hypotheticals. This is what many people's actual days look like. And just like the sysadmin who was hand-editing config files on 200 servers, most of them have stopped questioning it because it's just how the job works.

It Won't Come From the Top

The other thing I took from the early infrastructure scaling era is that the shift didn't start at the executive level. It started with practitioners. Individual contributors who were frustrated enough with the status quo to go find a better way. I've seen this play out in my own career more than once. At Texas A&M, a small group of technologists across departments pushed for infrastructure changes that ended up reshaping how the whole university operated. None of it was anyone's job description. It happened because the people doing the work could see what needed to change, and they had enough autonomy to act on it.

The parallel isn't perfect. Business processes are messier than server configurations, the tooling is less mature, and the organizational politics are thicker. But the underlying dynamic is the same. Too much repetitive work, not enough people to do it, and a set of tools that can absorb the repetitive parts if someone points them in the right direction.

The question for every organization is the same one IT has been facing since the use of computers grew beyond what any team could manage by hand. The work scales faster than the headcount. It always has. The difference is that now the rest of the business is running into the same wall, and the tools to do something about it are already here.

Before You Add AI, Try Explaining the Job

Sat, 16 May 2026 00:00:00 GMT

If you're thinking about where AI fits into your work, there are two things worth getting clear on before you start. The first is whether you understand the task well enough to hand it to something that has zero context about your business. The second is whether you understand enough about how the tool works to set it up for a useful result.

The Random Person Test

I use a thought exercise with clients when they're considering integrating AI into their business processes. Imagine you've pulled a random person off the street. Someone with no knowledge of your industry, your business, or the task you want performed. Now think about the job you want AI to do.

Could you explain the task clearly to this person? Could you define what a successful outcome looks like? Could you outline the boundaries and constraints? Could you teach them how to interpret the data they'd need to work with? And beyond just explaining it, is any of that written down somewhere you could point them to? Is it in one place or scattered across a dozen documents, wikis, and people's heads? Is it written down at all?

If a random person couldn't sit down and figure out how to do this job with what you've given them, a model can't either. It has no context about your business. It doesn't know your customers, your internal processes, your edge cases, or your definition of good. Everything it needs to do useful work has to come from you, and it has to come in a form that's explicit enough for something with zero institutional knowledge to act on.

This is the same problem I've written about before with scoping projects. A vague SOW kills a project before it starts because nobody defined what success looks like. AI makes this worse, not better, because the model will always produce something. It won't tell you that your instructions were unclear. It'll just give you a confident-looking output that may or may not have anything to do with what you actually needed. The burden of clarity is entirely on you.

The random person test usually reveals more about your own processes than it does about AI. If you can't articulate the task, the boundaries, and the expected outcome clearly enough for a stranger to attempt it, that's not an AI problem. That's a process problem. Fix that first.

The Intern Experiment

The second question is about the tool itself, and most people integrating AI into their work don't have a clear picture of what's actually happening under the hood. Not at a research level. Just at the level of understanding what you're working with and where the limits are.

Here's another thought experiment. You have a massive book of knowledge that needs to be understood. To accomplish this, you can hire interns who will study the content and be available afterwards for questions.

Option A: hire one intern who reads the entire book. They'll have complete context, but given the volume of information, their understanding of specific details might be more general.

Option B: hire multiple interns and divide the book into sections. Each intern deeply studies their assigned portion. You capture more specific detail, but you lose the interconnections between sections.

All the interns have identical intelligence and capabilities. There's no time pressure. The goal is complete comprehension of the content. Which approach leads to better understanding?

There's no clean answer, and that's the point. Each intern is a context window. This is the tradeoff at the center of how large language models process information. Every model has a limit on how much information it can process at once, and how you work within that limit involves real tradeoffs. One large context gives you breadth but loses detail. Multiple smaller contexts give you depth but lose the connections between pieces. How you chunk the information, what you include and what you leave out, how you stitch the results back together, these are architecture decisions that directly affect the quality of what comes back.
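
The chunking decision is easy to see in a small sketch. The function below is hypothetical and counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: real models tokenize differently. The overlap parameter is one common way to keep some of the connections between neighboring chunks.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `size` words.

    Consecutive chunks share `overlap` words, so a little context carries
    across each boundary at the cost of processing some words twice.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


book = "a b c d e f g h"
print(chunk_words(book, 100))            # one intern reads everything
print(chunk_words(book, 4))              # two interns, no shared pages
print(chunk_words(book, 4, overlap=2))   # three interns, shared pages at the seams
```

Neither setting is correct in general. A bigger size keeps more in one window, a smaller size with overlap trades repeated work for continuity, and the choice depends on what questions you need answered afterwards.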

Most people don't think about any of this. They just paste something into a chat window and expect the model to figure it out. When the output is shallow or misses something important, they assume the technology isn't ready. Sometimes it isn't. But often the real issue is that they handed the tool a problem without thinking about how the tool actually processes information.

Do The Boring Work First

AI is not magic. It's a tool that rewards clarity and punishes the absence of it. If you don't understand your own problem well enough to explain it to a stranger, the model isn't going to understand it either. If you don't understand the constraints of the tool, you're going to hit walls that feel like failures of the technology when they're really failures of approach.

Before you add AI to anything, try explaining the job to a stranger. If that goes well, think about how a tool with a fixed window of attention would process the information you're planning to give it. If you can get clear answers to both of those, you're in a much better position than most.

Shut Your Mouth When You're Talking to Me

Fri, 15 May 2026 00:00:00 GMT

I was once on the receiving end of this phrase and at the time it wasn't pleasant. It's from Wedding Crashers, but the person who said it to me wasn't joking. It took a while to land, but it ended up being some of the best advice I've carried into sales and presales work.

If we are talking, the client isn't. And if the client isn't talking, we are almost certainly missing something important.

Why You're in the Room

A lot of effort goes into getting a client on a call. By the time you're in the room, someone has done the outreach, the follow-ups, the scheduling. That meeting is expensive before anyone says a word. And the instinct for most people in that moment is to pitch. To fill the space with what you do, what you offer, how you're different.

The problem is that none of that matters yet. You don't know what they need. You don't know how they work. You don't know what they've tried before, what failed, why they're even on this call in the first place. And you're not going to learn any of that while you're talking.

The job in that room is to listen. Ask questions and then actually hear the answers. Not just the surface answers, but the way they describe their problems, the language they use, the things they emphasize and the things they skip over. All of that tells you something about what they actually need versus what they think they need.

The Deal

After Caylent had relaunched as an AWS partner, I was working a deal with a major company. Big name, big opportunity. The kind of client that could change the trajectory of the business.

I spent most of my time on those early calls just listening. They walked me through how their organization worked, how their teams were structured, and what they were looking for. They were using the Spotify agile model, squads and tribes, and it was deeply embedded in how they operated.

When I put the proposal together, I built the entire delivery approach around their way of working. Our team would embed into their structure, not the other way around. The process, the communication cadence, the team composition, all of it was shaped around what I'd heard them describe on those calls. The proposal wasn't about what Caylent could do. It was about what Caylent could do for them, specifically, given how they actually operated.

We won it. Not because we had the best technology or the lowest price. Because the proposal showed that we'd been listening. I have many stories like this one. The details change but the principle doesn't.

The Other Side

Listening in sales isn't passive. It's not sitting quietly while you wait for your turn to talk. It's active work. It's tracking what someone is telling you, connecting it to what you know you can deliver, and identifying the gaps between what they're asking for and what they actually need.

And when it is your turn to talk, because you've been listening, your questions are more targeted and more precise. The client feels that. They feel like their time is being used well, not wasted on generic questions you could have answered yourself if you'd been paying attention. That builds trust early.

I used to mentor inside sales reps, and at one point I brought my dad in as a would-be customer for them to pitch. He'd spent 40 years in senior IT leadership roles and could realistically have been a client. I wanted the reps to get firsthand feedback from someone who'd been on the receiving end of sales calls for decades.

The things he focused on as positive feedback had nothing to do with the pitches themselves. He was impressed that they introduced themselves and the company, then asked him to talk about himself, his concerns, and his needs. After 40 years of salespeople talking at him, he was genuinely delighted that someone wanted to hear what he had to say.

The more you understand about how a client works, what they've struggled with, what their real constraints are, the better positioned you are to align what you offer with what they need. That alignment is what wins deals. Not the pitch, not the deck, not the feature list. The proof that you heard them and built something around what they told you.

Cite Your Sources

Thu, 14 May 2026 00:00:00 GMT

"I'll defend you to the death, but I will always need the truth." That was one of my mom's rules when we were growing up. It wasn't negotiable. If you got in trouble at school, she wasn't going to hang you out to dry. But she needed to know what actually happened before she'd go to bat for you. The deal was simple: give her the real story, and she'd fight for you. Lie or hide something, and you were on your own.

That rule has followed me through my entire career, first as someone on the receiving end of leadership, and later as someone doing it. And the thing I keep coming back to is how rarely leaders hold themselves to the other side of it. They want the truth from the people under them, but when it comes time to deliver hard feedback, they don't do the work of making sure the feedback itself is true.

The Conversation

I once had an executive leader tell me, to my face, that no one wanted to work with me. Not "there's been some friction on a project" or "here's a specific situation we need to talk through." No one wanted to work with me.

That's a gut punch of a thing to hear. And my first reaction was to take it seriously, because I take feedback seriously. So I started digging. I went to colleagues I was close to inside the company, people I worked with regularly, and told them what I'd been told. The reaction was uniform. They thought it was absurd. These were people who actively sought me out to work on things. They had no idea where this was coming from.

So I kept pulling the thread. And what I eventually found was that the whole thing traced back to a single person. One of this executive's direct reports, someone they put a lot of stock in, sharing opinions about me. That was it. One person's perspective, unverified, unchallenged, laundered through an executive's authority and delivered to me as though the entire organization had weighed in.

The executive later admitted, to their credit, that the whole thing had been handled badly. But by the time that acknowledgment came, the damage was already done. I'd spent weeks questioning myself, questioning relationships that turned out to be completely fine, and carrying a weight that wasn't mine to carry.

It Wasn't Just Me

A colleague of mine at the same company went through his own version of the same thing with his direct manager. Different specifics, but the same underlying failure. Instead of validating what they were hearing, instead of tracking down what actually happened and whether the expectations had even been clearly set, his manager just passed it along. Here's the sentiment. Here's how bad you apparently are. Good luck with that.

Someone close to me experienced the same thing at a completely different company. Their direct manager told them that things were going incredibly badly on their project, that there were serious concerns. Made it sound dire. When they dug into it, the reality was far less severe than what had been presented. The manager had taken unverified feedback, inflated it, and dropped it on them like a bomb. No validation, no context, no specifics they could act on. Just a crisis that wasn't actually a crisis, delivered in a way that made them feel like the sky was falling.

Three people, three different situations, the same pattern. This isn't a single bad manager or a single broken culture. This is a failure mode that's everywhere. Leaders hear something negative, skip the part where they verify it, and pass it along, sometimes making it sound worse than it actually is, because relaying a complaint feels like doing something about it. It isn't.

What It Does to People

When you tell someone there's a problem but you can't tell them what specifically happened, what was expected, or what they should do differently, you haven't given them feedback. You've given them anxiety. They can't fix it because there's nothing concrete to fix. So instead of correcting a behavior and moving forward, they sit in it. They replay every interaction they've had recently, trying to figure out which one was the problem. They start second-guessing themselves in rooms where they were fine. The feedback doesn't make them better. It makes them smaller.

And it compounds. The next time they speak up in a meeting, there's a voice in the back of their head. The next time they push back on something, they wonder if this is the thing that's going to generate the next round of whispers. The vagueness is what makes it so corrosive. A specific piece of feedback, even a hard one, gives you something to work with. "You were too aggressive in that client call on Tuesday and here's why" is something I can reflect on, learn from, and change. "No one wants to work with you" is a hole with no bottom.

What I Do Differently

I've been on the leadership side of this enough times now to know what the job actually requires. When I hear feedback about someone on my team, my first move is to request cited examples. I want to know what the issue was, what was expected, what actually happened. I want to know that the expectation was explicitly defined and made available, no mind reading required, and that the person was given an opportunity to correct before anyone escalated it.

Because if I show up and tell someone they did something wrong, but I can't tell them what it was or how to fix it, I haven't led. I've just passed along noise.

I've also been on the other side of it in a different way. Defending the people doing the work when the complaints were coming from above. I once had a client start pushing back on a delivery team for a project I had sold and written the SOW for. The client kept using vague language about expectations, trying to get things added to the scope, implying that the team wasn't delivering what had been promised. One of their asks was essentially a complete rework of their data schemas, something that was nowhere near the agreed scope.

I went point by point. I argued on the specifics. I stood by the team because I knew the work, I knew the SOW, and the team was right. That's the job. If you're a leader and you believe your people are doing the right work, you defend them. But you can only do that if you've done the homework. If you don't know the specifics, you can't defend anyone or hold anyone accountable.

The Actual Rule

The principle is the same on both sides. Whether you're delivering feedback or receiving a complaint about someone, you owe it to the people involved to deal in specifics. Was the expectation clearly set? Was it documented? Did the person know what was expected before they fell short of it? If you can't answer those questions, you don't have feedback.

If you're a leader and someone comes to you with a complaint about a member of your team, don't just relay it. Go dig. Find out what actually happened. If the complaint holds up, you now have something you can actually work with. If it doesn't, you've just protected someone on your team from getting blindsided by something that wasn't real.

And if you're the one giving feedback about someone, do the work. Don't just say you're frustrated with them. Say what happened, what you expected, and what they actually did. Ask yourself whether the expectation was clearly communicated in the first place. If it wasn't, that's not their failure. That's yours.

My mom had it right. Defend the people you're responsible for. But make sure you're working from the truth first. The whole system falls apart without it.

The Rosser Controversy

Wed, 13 May 2026 00:00:00 GMT

When I joined Rackspace in 2013, I joined a group called DevOps Automation. It was a premium managed-service tier where Rackers ran client environments using Chef. We sat in a Slack shared with the client, we managed their infrastructure, and the whole pitch was that we'd make their day-to-day easier, including deployments.

By the time I got there I was pretty deep in the Chef community. I'd written some popular OSS cookbooks and a handful of libraries and tools around them. So I came in with strong opinions about how Chef was supposed to be used, and I came in genuinely excited about the role.

The Stacks

Here was the setup. There was a product team whose job was to build baseline cookbooks for the tech stacks that were popular at the time. They called these stacks. A Ruby on Rails stack, for example, or a Magento stack. The idea was that the support team, which is the one I'd joined, would take a stack the last mile to fit a specific client's application. The product team built the prescription; support delivered it.

It didn't work. The prescriptions were too narrow. A stack would commit you to Apache when the client was running Nginx, or commit you to a specific deployment pattern that the client's app didn't fit. And because the stacks were built as cookbooks rather than as something more composable like custom resources or providers, working around them was almost always worse than starting from scratch. You'd end up with a client cookbook that pulled in a few recipes from the stack, then a wall of inline attribute overrides to undo the assumptions you didn't agree with, then your own logic stapled on top. The result was a mess that was hard to read, hard to maintain, and hard to hand to anyone else.

The deeper problem wasn't the cookbooks. It was the philosophy. Instead of teaching the support team to be proficient in Chef and acknowledging that every client environment was different, the org pushed an "easy" prescriptive solution on the people closest to the work, and then shrugged when the prescription didn't fit.

Raising It The Normal Way First

I tried the obvious path first. I gave the product team direct feedback. I made the case that the prescriptive-stack approach was working against the support team more than it was helping us, and that we'd get better outcomes if the building blocks were more composable and the team was trusted to use Chef properly. It didn't move.

So I started working around it. I began teaching some of the more senior techs on support how to use community cookbooks directly, how to think about real customizations, how to build something maintainable instead of layering overrides on top of a stack that didn't fit. That kept growing. Other senior people on support started doing the same.

At some point we realized we needed somewhere to put the public community cookbooks we were maintaining for client work. So we made a GitHub organization for it. We were frustrated enough at that point that we named it The Galley and used a skull-and-crossbones for the avatar. It was a joke, but it wasn't really a joke. The flag said exactly what it looked like it said.

We ran like this for several months. The stacks became less and less relevant to what we actually did day to day. Our direct managers had a general sense of what was happening. They didn't actively bless it, but they didn't stop it either, which in that culture was its own kind of endorsement.

The Meeting

Eventually, a few other senior support engineers and I got pulled into a meeting by a team adjacent to the stacks product team. They were working on some new IaC ideas for provisioning and managing cloud customers (something that, in retrospect, looked a lot like modern-day CloudFormation) and they wanted our feedback.

Somewhere in the conversation, it came up that the support side had almost completely diverged from the stacks model. That was news to the room. The product team's understanding of how their own product was being used was several months out of date.

My manager later thanked me for the "week of meetings" he had directly after that one. Those meetings ended with the model getting overhauled. They brought me and the other senior support engineers from across the teams into what was left of the stacks product team. They replaced the leader of that group with someone I'm grateful to and learned a lot from. And they reshaped the offering into something much closer to what we'd been doing on the side.

What I Actually Took From It

This whole thing became known as the Rosser Controversy, coined by our GM at the time.

What it actually was: a product team built the wrong solution, the people closest to the work routed around it, and eventually the org noticed and corrected course. The interesting part of the story isn't that we built The Galley. The interesting part is what the org did once it found out.

There are a few things I take from it that have held up.

Treat your internal users like customers

The product team probably could have caught this much earlier just by paying attention to the volume of questions and concerns coming from the support side. They almost certainly read the eventual quiet as things getting better, when in reality it was the sound of people giving up on the product and going their own way. If they'd verified instead of assumed, they would have found out months earlier.

The deeper issue underneath that is they never really treated us like customers. We were the people who were supposed to implement what they built. That framing locked them out of the most useful information they had access to. A real customer relationship has feedback loops in it. Account check-ins, success metrics, an honest accounting of where the product is and isn't working. None of that existed between the product team and support, because internally we weren't customers, we were implementers. When your only model for the people downstream of you is "people who execute what we ship," you stop being curious about whether the thing you shipped is working for them.

If you're building internal tooling or platforms, the question "are the people I built this for actually using it, and how is it going for them" is one of the few that doesn't lie to you. But you only get an honest answer if you've set up the relationship to give you one.

Raise it normally first

I gave the product team direct feedback before I started routing around them. That mattered later, when the dust settled. Nobody could honestly say I hadn't tried the normal channels. If you're going to push hard, push through the front door first. Not because the front door always works, but because when it doesn't, you want the record to be clear about what you did before you stopped using it.

Leadership treated it as information, not insubordination

The reason this story ended well is that at every layer, leadership treated what we were doing as information about the work, not as a breach of process. That happened at two layers.

Our direct managers knew enough about what we were doing to stop us if they'd wanted to. They didn't. They let us run, and when the bigger meeting happened, they didn't throw us under it. The version of this story where direct leadership panics the moment it gets visible looks completely different and ends with people getting reprimanded for "going around process," not promoted into the team they were going around.

The second layer was the leadership above the product team. When it became clear the support team had built a parallel system, they treated that as information rather than as insubordination. They pulled the people who'd been doing the work into the room and rebuilt the offering around them. The version of this story where the org closes ranks around the original product team and disciplines support for going off-script is one I've seen play out in plenty of other places. It's a much worse story.

The throughline is the same at both layers. When leadership saw practitioners routing around an internal product, the response was curiosity about what they knew that wasn't being captured, not enforcement of the existing structure. That instinct is what made the difference.

None of this would have been necessary if the product team had been in a real relationship with us the whole time. The Galley existed because there was no honest channel between the people building the product and the people using it. The reorg fixed that, eventually, but a healthier setup would have meant the product never drifted that far from the work in the first place.

If you're building something for another team inside your own company, they are your customers. Not your implementers. Not your distribution channel. Customers. Treat the relationship like one.

From Practitioners to "Thought Leaders" and Back Again

Tue, 12 May 2026 00:00:00 GMT

The Question I Ask Every Candidate

I ask every candidate the same opening question. Talk to me about a technical system or solution you were responsible for architecting or building. What was the business problem it was solving, and then walk me through the solution, going as deep into the tech as you can.

It's not a trick question. There's no right answer. I'm not running a quiz. I'm asking someone to describe their own work, in their own words, at whatever depth they're comfortable going to. As they talk, I ask the kinds of follow-ups any architect would ask. Why did you pick that technology over the alternative. How did you know the system was healthy. What did you monitor. What broke, and what did you do about it.

What comes back, more often than I'd like, is shallow. The candidate can usually name the cloud services they used, typically whichever provider they're aligned with and certified on. AWS, GCP, Azure, the vocabulary is fluent. What's missing is the layer underneath. Why those services and not the alternatives. How they knew the system was actually working. What broke and what they did about it. Candidates who can recite the reference architecture but can't describe a single operational decision they made. Technologists who clearly haven't been close to a running system in a long time, if they ever were.

How Certs Became the Credential

Certifications have been a fixture in tech for a long time. Cisco, CompTIA, Microsoft, Red Hat. The industry has been hiring against credentials, and building muscle memory around screening for them, for decades. None of that was inherently a problem. A network engineer with a CCNA in 2005 had probably touched real gear. The cert lined up with something.

Cloud certifications fit into that existing pattern. AWS launched its certification program in 2013, and the other major providers followed. Bootcamps were already a well-trodden entry point. None of it was new. The cloud certs are tied to a specific vendor's product surface, and even the ones that nominally cover architectural patterns are mostly testing whether you can map a pattern to the right product applied in the prescribed way.

Cloud providers also run partnership programs, and the tier a partner sits in is partly a function of how many certified people they employ. That matters for consulting shops, professional services firms, SIs, anyone whose business depends on being a Gold or Premier or whatever-the-top-tier-is-called partner. So those companies were incentivized to find people who already held certs, or were willing to go get them.

That demand reshaped hiring in the broader market too. Certs had always been one signal among several. You'd see them on a resume next to a real portfolio of work, and they were treated as a useful supplement. Somewhere along the way, the supplement became the credential. Hiring teams started filtering on the cert directly. Recruiters built their pipelines around it. The signal got more and more weight until, for a lot of roles, it was the only thing being checked.

Fast-Tracked Past Building

Then came 2020. Suddenly a much larger group of people wanted into tech, mostly because they wanted to work remotely, and the on-ramp the industry had already standardized on was sitting right there. Cert up and get a job in tech.

The industry, hungry for headcount and pattern-matching on signals it already trusted, fast-tracked a lot of these people into roles that historically required years of practitioner experience. Architect. Engineer. Titles that used to imply you'd built and operated enough systems to have real opinions about why one approach beats another.

The result is a generation of workers in solutioning roles whose entire model of "designing a system" is picking the right boxes from a vendor's whiteboard template. Ask them what they'd do if the constraint were different, or if the platform weren't available, and the answer often falls apart.

To Be Fair

The people in these roles aren't villains. Many of them are doing what the industry rewarded. They optimized for the signals that got people hired and promoted, and those signals were "sounds like an architect" more than "can architect." If you'd told a smart, ambitious person in 2020 that the way up was to stop building things and start talking about them in the right vocabulary, a lot of them would have done exactly that.

For a stretch, the gap between practitioners and the "thought leader" class didn't cost the latter much. If your job was to produce decks, set direction, and translate vendor narratives into internal strategy, you didn't really need to be able to build the thing. There were practitioners under you who would handle that part.

The Title Inflation Got Absurd

For a while this worked, in the sense that nothing visibly broke. Then the titles started outrunning the work by a margin that was hard to ignore.

I've met people who went from intern to Architect with nothing in production behind them, just school projects and a cert track. This was especially common at the big cloud providers and the partner ecosystem around them, where the incentive to staff up on certified headcount was strongest. Engineer is the title you earn by building and operating things. Architect is the title you earn by having built and operated enough things to have real opinions about trade-offs.

The industry hit a saturation point. There were a lot of people in solutioning roles who could talk fluently about systems they'd never actually built, run, or fixed. That worked as long as someone underneath them could absorb the gap. The question was always going to be what happens when the cover gets pulled back.

AI Changes the Math

The promise people are reading into AI coding tools is that they reduce the value of practitioner skill, because now anyone can produce code. That's not what I've observed. These tools reward the context you bring to them and punish its absence. The person who already understands the work gets dramatically more out of the tool. The person who doesn't gets convincing-looking output they can't evaluate.

I've written before about a family member who used Claude Code to build a working app, then hit a wall the moment it needed real structure.

If you ask the AI for something, it will produce something. Whether that something is right is a judgment call, and the judgment lives in the practitioner, not the tool.

What this means, I think, is that the value of being able to actually do the work is going back up. The people who genuinely understand the systems and concepts they're working with, rather than parroting back the industry standards, are going to matter more.

I don't think this corrects overnight, but the inertia argument is weaker than it looks. The forces that protected the gap assumed companies could keep carrying that layer at the size they'd built it to. They can't. The major tech companies are shedding employees by the thousands every six months, and that alone breaks the pattern that let the gap stay hidden. The candidates who do well in my interviews are the ones who actually understand what they built. They can reason about the trade-offs. They can answer the probing questions, or they're honest enough to say they're not sure. The ones who don't really understand what they built fill the space with vocabulary and almost no depth.

The practitioner is coming back.

Transformation From the Ground Up

Sun, 10 May 2026 00:00:00 GMT

Where It Started

When I was a Unix systems administrator at Texas A&M's College of Architecture, our department wasn't particularly organized. That wasn't unusual for higher ed IT at the time. Tech departments across the university were growing organically, responsibilities were expanding, and most groups were figuring it out as they went. Around that time I read Gene Kim's The Phoenix Project, and like a lot of people who read it, I recognized way too much of my own environment in it.

Eventually I was able to convince my manager to read it, and he pushed for the entire department to read it before scheduling a group offsite. We stepped back and named every problem we could see. We looked at our processes and figured out what to cut, what to improve, and where software could do the job instead of a person. Then we made a plan and executed it.

That exercise is where the appetite for this kind of work came from for me. Once you've stepped back from the day-to-day, looked at how the work actually flows, and changed the shape of it, you start looking for the next place to apply the same thinking.

Wanting APIs Instead of Hardware

For me, the next place was my own infrastructure. We ran our own everything in the College of Architecture. Active Directory, mail, file servers, the whole stack. All of it out of a makeshift server room that had been retrofitted with extra cooling and not much else in the way of redundancy. I didn't want to keep managing hardware. I wanted APIs I could consume to provision what I needed.

Each College Ran Its Own

The way the university worked, central IT provided shared services for the institution, but each college and program had its own IT organization. Because each college controlled its own money, none of them were required to use central IT for anything. They could, but they didn't have to. And most of them didn't want to. There was real hostility toward the idea of moving anything to central IT, mostly because nobody wanted to give up control of something they'd been running themselves for years. So when I started pushing for our college to move that direction, I was pushing against the default position of most of my peers across the university.

Central IT was running VMware, same as we were, same as most groups across campus. The difference was that they were running it at scale, in a real data center, with the staff and processes to operate it as a platform. They were building it out with the intent to resell capacity to the colleges. So I pushed for the College of Architecture to move our compute over. We were already moving mail off-prem as part of a university-wide push, so the timing was right.

Crossing Department Lines

Offloading the hardware was a clear win, but it didn't get me out of running ancillary services. I still had a Git server, a Chef server, and Atlassian Confluence and Jira, none of which I had any interest in continuing to manage. So I started getting to know my counterparts in other colleges. The Library and the College of Liberal Arts had fellow technologists doing similar automation work, dealing with the same problems. We started meeting regularly.

I pushed for us to centralize some of these ancillary services and share the management burden across departments, with real representation and a voice for each group. The services themselves ran on central IT's infrastructure. They offered up resources to support the effort because it served their goals too.

That kept growing. We ended up running an official TAMU private GitHub instance that quickly became popular across the university. We pushed central IT to think about how they could support automation use cases that historically hadn't been provided for, because nobody had ever asked. We helped centralize services that were genuinely useful to a much wider group than the small handful of us who started the conversation.

It was a good run. All of it from the bottom. A few technologists across departments pushed for changes that ended up affecting the entire university.

What Actually Made It Work

There were a few specific conditions in place around us, and without any one of them the push to centralize across departments would have died on the vine.

Autonomy

The first was that we were treated like adults. My direct manager was largely absent. He let me focus on the work that made me valuable and trusted me to figure out the rest. The other technologists I worked with across departments had similar latitude. Nobody was asking us for status updates on whether we'd been talking to people in other departments. Nobody was checking whether centralizing a Chef server was on a roadmap somewhere. We were given enough room to notice problems and act on them.

That kind of autonomy gets undersold. If I'd been on a tightly managed team where every hour had to map to a ticket, I never would have had the slack to walk over and meet the Library's sysadmin in the first place.

Leadership Backing

The second was that our direct leadership backed the work once it became visible. Saving the university money and building something useful for the broader institution turned out to be an easy story for them to get behind. They didn't have to drive the work, and they didn't try to. What they did was endorse it once it had legs, give us cover when we needed it, and make it easy to keep going. The version of this that doesn't work is leadership that tries to steer it from the top, or that gets territorial because it didn't originate in their org.

Permission to Cross the Lines

The third is the one I think about the most. None of what we did was technically my job, or anyone else's job either. Reaching out to the Library wasn't on my goals. Pushing central IT to expose new capabilities wasn't a project anyone had funded. We were doing it because the work was useful and we could see it was useful. That only works in a culture that doesn't punish you for it. If at any point someone had pulled rank and told us to focus on our own departments, or made us feel like we were overstepping, the whole thing would have collapsed. What we had was a small group of people who had decided, individually, that the work mattered more than the turf. Central IT didn't feel threatened that technologists from other groups were proposing changes to their platform. That was rare, and we knew it was rare, which is part of why we worked to keep the group functioning.

You can't mandate that culture into existence. But you can absolutely kill it. One leader who treats a question from outside their reporting line as a challenge to their authority is enough to teach everyone around them to stop asking.

Transformation from the ground up sounds romantic when you write about it after the fact. In the moment it's just a handful of people noticing something is broken and choosing to do something about it. The hard part isn't finding those people. They exist in every organization. The hard part is whether the organization gets out of their way.

If you're a leader and you want this kind of energy in your organization, the work isn't to manufacture it. It's to make sure you're not the reason it can't happen.

AI Coding Tools: It's Complicated

Fri, 24 Apr 2026 00:00:00 GMT

I've seen a few people I genuinely respect write off AI coding tools. The concern I hear most often is the one I've written about before. Offloading the work means you stop developing the judgment that comes from doing the work, and when the tool fails, you don't have anywhere to stand. That's real. I've fallen into it myself and written about it. So I want to be upfront that I take the critics seriously.

I also want to be upfront that I've been using these tools heavily, and for me, they work.

Before I get into what that looks like, it's worth saying something about the baseline. A lot of the critique of AI-generated code compares it to idealized human code. I've spent most of my career working on code other humans wrote, and I've seen plenty of human-generated code that would have you thinking it was the early days of AI. Bad code is not an AI problem. It's a pervasive problem that predates AI by several decades. That doesn't excuse bad AI-generated code, but it does complicate the comparison.

What works

The first thing is that coding isn't actually my job. My job is to think about systems, make architectural decisions, and lead people. Coding is instrumental to that work, not the substance of it. The tools let me prototype an idea in an hour or two instead of a day, which means I can actually run my normal development loop (build a thing, use the thing, decide I hate it, rewrite it) at a faster cadence. My colleagues used to joke that I go in for a rewrite every six months on any active project. The tools make that cycle tighter.

The second thing is that software is never done. There are always bugs, there's always architectural work that should be happening, there's always the next feature. Most of that work doesn't get done because the team is too busy shipping new things. If the tools can absorb routine bug fixes and basic feature requests, that frees up real attention for the harder problems. How do you structure modules so they stay manageable as scope grows. How do you trace exceptions in a large event-driven app. That kind of work is more interesting than the mechanical part, and it matters far more for the long-term health of the codebase.

The third thing is that I've always hated the typing portion of coding. I appreciate the craft and I've practiced it for a long time, but I have better uses for my time. When I'm working with decent context, owning the architectural decisions and overall direction, and using good tooling around the model, it works really well. I'm not handing off the thinking. I'm handing off the mechanical execution of thinking I've already done.

What they don't fix

The thing AI coding tools don't fix, and I don't think they can, is poorly defined work. From what I've seen and what a lot of my colleagues have seen, the real bottleneck in most engineering orgs isn't coding speed. It's that the work coming into the team lacks basic context about what it's for and what success looks like. No amount of model capability translates a poorly defined problem into a good solution. You can produce code faster, but you can't produce the right code faster if no one has been clear about what the right code would do.

That's an organizational problem and it's the one I'd push on if someone asked me where to actually invest.

Your mileage will vary

I want to be careful about generalizing from my own experience here. I've been building and shipping software for a long time. I have opinions about architecture, I know what bad code looks like when I see it, and I've developed a strong internal sense of when a tool's output is going to hold up and when it isn't. A lot of what makes these tools work for me is the context I'm bringing to them, not the tools themselves.

One small proof point on that. I used to be on the highest tier of Claude that was available. I've gotten efficient enough with my token usage that I'm planning to downgrade. That efficiency came from building patterns around how I use the tools. Curating context well, using something like Runbook to handle the determinism I want, and staying in the loop on architectural decisions. None of that was obvious when I started. It took real time in the tool to figure out, which is itself a kind of expertise.

Closer to home, a family member got a copy of Claude Code and used it to build a small web app on GCP to make his work easier. He got it doing what he needed. What he didn't have was a sense for how the files should be broken out for maintainability, how to track the thing in version control, or how to tell whether the deployment was safe. When he started struggling to add features, he sent the code over. It was a single 5000-line HTML file with all the JavaScript and CSS inlined, and I read through it with some amusement. My experience let me verify the deployment, direct Claude Code through the cleanup, and hand him back something he could keep working on. Same tool he had. I just knew what to ask for.

So when I say the tools work for me, I mean specifically, they work for me, given the way I work and the experience I'm bringing. I'm genuinely not sure how much of that transfers to someone who's earlier in their career or working in a different mode. If you're a skeptic, I'm not going to try to talk you out of it. If you're a believer, I'd encourage you to notice how much of the value you're getting is coming from what you already know.

The tools are neither a panacea nor useless. They're a real capability that rewards the context you bring to them, and punishes its absence.

The Cost of Not Wanting to Hear It

Thu, 23 Apr 2026 00:00:00 GMT

I once raised quality control and scale problems to an executive leader. Real problems. The kind that affect clients and compound over time if you don't deal with them. The response was something along the lines of, "We are bigger than [our largest competitor] and doing significantly more revenue."

That was the whole answer. Not "here's how we're going to address it." Not "walk me through the specifics." Just a size comparison, delivered like it settled the matter.

I've thought about that moment a lot since. Not because the exec was uniquely bad, but because the reflex was so recognizable. When a leader's first move in the face of a hard truth is to reach for a reason it doesn't count, it's usually a sign they're more focused on the story they're telling than on the long-term cost of whatever debt is piling up underneath it. Technical debt, process debt, trust debt. It all compounds the same way, and it all gets more expensive the longer you avoid looking at it.

What it costs

The first thing that happens is the information environment around the leader degrades. People adapt to what gets rewarded. If raising problems gets you labeled a pessimist or a bottleneck, the smart move is to stop raising them, or to soften them until they're unrecognizable. The leader doesn't notice this, because from their vantage point everything looks fine. Meetings are productive. The updates are positive. What they've actually done is build a filter between themselves and reality, and the filter gets thicker every quarter.

The second thing is that the problems don't go away. They get absorbed. The ICs and middle managers who can see the issues end up carrying them, patching them, working around them. For a while, that works. The system holds together on the backs of the people closest to the work. But those people also know what's happening. They know the difference between leadership that engages with hard things and leadership that deflects. And the ones with options eventually use them. What's left is a team that's either too checked out to push back, or too inexperienced to know they should.

The third cost is the slowest and the most expensive. When leaders optimize for the story long enough, they lose the ability to tell when the story has stopped being true. The feedback loops that would have caught a problem early are gone. The people who would have flagged it have left or given up. By the time the issues surface in a way the leader can't deflect, usually through a client leaving, a number missing, or a key person walking out, the problem is no longer the one they could have addressed earlier. It's much bigger, and the options for fixing it are much more complicated.

To be fair

I want to be fair to the position, because it's genuinely hard. Everyone wants to believe the decisions they're making are the right ones. Sitting in the seat where everyone is looking at you for an answer, and then living with that answer after people inevitably push back on it, is not easy work. So you develop a narrative. You build a strategy everyone can rally around. It makes alignment easier. It makes the job livable. I understand the pull of that, and I don't think most leaders who fall into this pattern are doing it cynically.

That said, for a long stretch in tech, the incentives rewarded a version of leadership that didn't have to care much about the long term. If the role was a stepping stone to the next one, and the next one came quickly, you could grow some numbers that looked good on paper and move on before the debt came due. The story was the product. Someone else would inherit the rest.

A different cycle

The economy those incentives were built on isn't really the one we're in anymore. Job hopping at the senior level has gotten harder. The competitive landscape has gotten meaner. The runway for carrying a story that the work doesn't support is shorter than it used to be. Which makes it a reasonable moment to ask whether the leadership habits that worked in the last cycle are the ones worth keeping for this one. Optimizing for long-term growth and sustainability, even when it means hearing things you'd rather not hear, starts to look less like an ethical stance and more like a practical one.

The opposite habit is simple. When someone brings you a hard truth, hear it out, understand what they're actually concerned about, and address it. Even "that isn't something we can focus on right now," said with real context, is different from deflection, and it doesn't train people to stop bringing things up. Every time you deflect instead, you make it a little harder for the next real thing to reach you.

Introducing Runbook

Wed, 22 Apr 2026 00:00:00 GMT

I built runbook because of a specific annoyance I kept hitting with AI coding agents. You tell them the workflow. Here's the build command, here's the test command, here's the lint command. You write it in CLAUDE.md. You set up skills. Next time the agent needs to run tests, it still invents its own command. Sometimes wrapped in grep and sed to keep the token count down. Sometimes with the wrong flags. Sometimes in the wrong directory. Every invocation, a different command.

The root issue is that context isn't constraint. Even when the right command is documented in the context window, the model is still making a fresh decision every time it needs to run something. It doesn't remember that two minutes ago it ran npm run test:e2e successfully. It decides again, and maybe this time it's npx vitest piped through grep for "FAIL". Same task, different command, unpredictable result. The documentation helped sometimes, but it didn't fix the underlying behavior.

How runbook works

The premise is simple: if you don't want the model improvising the command, don't ask it to choose the command. Give it a deterministic tool that runs the command for it. Runbook is an MCP server. You define your project's workflow commands in a YAML file, add runbook as an MCP server in your agentic coding tool (mine's Claude Code), and each task becomes an MCP tool the agent can call. The agent isn't being convinced to use the right command. The right command is the only thing the tool can do.

A minimal config looks like this:

version: "1.0"

tasks:
  build:
    description: "Build the project"
    command: "npm run build"
    type: oneshot

  test:
    description: "Run tests"
    command: "npm run test"
    type: oneshot

  dev:
    description: "Start the dev server"
    command: "npm run dev"
    type: daemon

Drop that at .runbook/tasks.yaml, add runbook to .mcp.json, and the agent gets run_build, run_test, start_dev, stop_dev, status_dev, and logs_dev as tools.
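
To make the mapping from tasks to tools concrete, here is a rough Python sketch. It is not runbook's source, which this post doesn't show. The function name tool_names and the dict layout are assumptions for illustration; only the resulting tool names come from the list above.

```python
# Hypothetical sketch: derive MCP tool names from task definitions.
# The naming rules mirror the tool list in the post; the real
# implementation inside runbook may look nothing like this.


def tool_names(tasks: dict[str, dict]) -> list[str]:
    """Return the tool names exposed for each task, in definition order."""
    names: list[str] = []
    for task_name, spec in tasks.items():
        if spec.get("type") == "daemon":
            # Daemons get lifecycle tools instead of a single run tool.
            names += [f"{verb}_{task_name}" for verb in ("start", "stop", "status", "logs")]
        else:
            # Oneshot tasks get exactly one tool.
            names.append(f"run_{task_name}")
    return names


tasks = {
    "build": {"command": "npm run build", "type": "oneshot"},
    "test": {"command": "npm run test", "type": "oneshot"},
    "dev": {"command": "npm run dev", "type": "daemon"},
}
print(tool_names(tasks))
```

The point of the sketch is that the agent never sees the command string as something to choose. It only sees a fixed tool name per task.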

What matters most

Three features matter most in practice.

The first is that long-running task output never enters the context window unless the agent asks for it. When runbook executes a task, it writes the full output to a session log on disk and returns the agent a summary plus a session ID. A five-thousand-line build log doesn't cost the agent five thousand lines of context. If it needs to look at something specific, it calls read_session_log with the session ID and a regex filter, and gets back only the matching lines.
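
The log-on-disk idea can be sketched in a few lines of Python. This is an illustration under assumptions, not runbook's code: the LOG_DIR location, the summary fields, and the run_task helper are all made up for the example. Only the read_session_log name and its session ID plus regex arguments come from the description above.

```python
import re
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path

# Hypothetical log location, chosen only for this sketch.
LOG_DIR = Path(tempfile.gettempdir()) / "runbook-sketch-logs"


def run_task(command: list[str], tail: int = 5) -> dict:
    """Run a command, keep the full output on disk, return a short summary."""
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    session_id = uuid.uuid4().hex
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    (LOG_DIR / f"{session_id}.log").write_text(output)
    lines = output.splitlines()
    # Only this small dict would go back to the agent.
    return {
        "session_id": session_id,
        "exit_code": result.returncode,
        "line_count": len(lines),
        "tail": lines[-tail:],
    }


def read_session_log(session_id: str, pattern: str) -> list[str]:
    """Return only the log lines that match the regex."""
    regex = re.compile(pattern)
    text = (LOG_DIR / f"{session_id}.log").read_text()
    return [line for line in text.splitlines() if regex.search(line)]


# A fake "build" that prints 5,000 lines with one failure in the middle.
fake_build = [
    sys.executable,
    "-c",
    "for i in range(5000): print('FAIL step 2500' if i == 2500 else f'ok step {i}')",
]
summary = run_task(fake_build)
print(summary["line_count"], summary["exit_code"])
print(read_session_log(summary["session_id"], r"FAIL"))
```

The summary stays a handful of lines no matter how long the log gets, and the filter pulls back only what the agent asks for.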

The second is daemon supervision with cross-session clustering. For long-running services like dev servers and previews, runbook tracks PID and metadata on disk. If I open a second Claude Code session on the same project, it sees the dev server is already running and can query its status, read its logs, or stop it. The daemon isn't owned by any one session. It's owned by the project. Two sessions can work in parallel and both know what's already up.
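
One way to picture project-owned daemon state is a small metadata file per daemon that any session can read. The Python below is a sketch under assumptions: STATE_DIR, the JSON layout, and the helper names are hypothetical, and the liveness check is POSIX-only. It says nothing about how runbook actually stores or supervises processes.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical state location; a real tool would likely keep this per project.
STATE_DIR = Path(tempfile.gettempdir()) / "runbook-sketch-state"


def record_daemon(name: str, pid: int, command: str) -> None:
    """Persist daemon metadata so other sessions can find it."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    meta = {"pid": pid, "command": command}
    (STATE_DIR / f"{name}.json").write_text(json.dumps(meta))


def pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks that a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def daemon_status(name: str) -> dict:
    """Answer from disk, so the result is the same in every session."""
    path = STATE_DIR / f"{name}.json"
    if not path.exists():
        return {"name": name, "running": False}
    meta = json.loads(path.read_text())
    return {"name": name, "running": pid_alive(meta["pid"]), **meta}


# This process's own PID stands in for a real dev server.
record_daemon("dev", os.getpid(), "npm run dev")
print(daemon_status("dev")["running"])
```

Because the status comes from the file and the process table, not from anything held in memory, a second session gets the same answer as the first.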

The third is prompts. Runbook lets you define workflow prompts in YAML that template-reference your task tool names. So instead of telling the agent "run lint, then test, then build" in a CLAUDE.md, you write the workflow once as a prompt and it gets exposed through MCP. The prompt resolves the task names at the moment of use, so renaming a task doesn't break documentation that lives somewhere else. A small example:

prompts:
  ship-check:
    description: "Pre-deploy checks"
    content: |
      Run these in order, stop on first failure:
      1. {{run_task "lint"}}
      2. {{run_task "typecheck"}}
      3. {{run_task "test"}}
      4. {{run_task "build"}}

{{run_task "lint"}} resolves to run_lint, the actual tool name. The workflow lives where the project lives. New sessions see it without me re-typing anything.
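
The resolution step can be approximated with a regex substitution. This Python sketch is an assumption about the behavior, not runbook's template engine: resolve_prompt is a made-up name, it only handles the oneshot naming shown above (run_ plus the task name), and raising on an unknown task is my own choice for the example.

```python
import re

# Matches {{run_task "name"}} with flexible whitespace.
RUN_TASK = re.compile(r'\{\{\s*run_task\s+"([^"]+)"\s*\}\}')


def resolve_prompt(content: str, task_names: set[str]) -> str:
    """Swap each task reference for its tool name at the moment of use."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in task_names:
            # Failing loudly is an assumption made for this sketch.
            raise KeyError(f"prompt references unknown task: {name}")
        return f"run_{name}"

    return RUN_TASK.sub(replace, content)


content = '1. {{run_task "lint"}}\n2. {{run_task "test"}}'
print(resolve_prompt(content, {"lint", "test"}))
```

Since the lookup happens when the prompt is rendered, the prompt text never carries a hard-coded tool name that could go stale.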

Smaller details

A few smaller details are worth calling out. The same runbook binary is a CLI, so I can run runbook list or runbook logs dev --filter=ERROR from my terminal and get the same view of state the agent has. The refresh_config MCP tool hot-reloads the manifest, so I can edit a YAML file and have the agent pick up new or changed tools without restarting the agentic coding tool. Configs can be split across multiple YAML files in a .runbook/ directory (tasks in one file, daemons in another, prompts in a third), and runbook merges them. Task groups and the task dependency graph are exposed as MCP resources so agents can introspect the workflow structure when they need to.

How I use it

In practice I usually disable the Bash tool entirely when runbook is wired up. If the agent needs to do something that isn't already a task, it adds the task to the runbook YAML and calls refresh_config, which registers the new tool without restarting the agentic coding tool. The new task is what it uses from then on.

The combined effect is that I stopped re-explaining commands. The agent stopped silently changing the invocation between runs. Long builds and test runs stopped costing context. And checking on a background process later works the same way whether "later" is thirty seconds away in the same session or an hour away in a new one.

The underlying point is that you can reduce how much the AI has to guess by giving it fewer degrees of freedom at the places where guessing costs you most. Command invocation is one of those places. The model is great at reasoning about what to do next. It's less good at remembering that the test command on this project is not pytest because the project isn't Python. Take the choice away. Give it a tool. The tool does the right thing because you wrote it to.

Source: https://github.com/launchcg/runbook

Define Good Before You Build

Tue, 21 Apr 2026 00:00:00 GMT

I spent enough of my career reading and writing statements of work to know what vague language does to a project. You can look at a scope document before a single person has been assigned to it and tell whether the project is set up to succeed or set up to argue about later. "Implement a recommendation engine to improve user engagement" is not a scope. It's an aspiration. Somewhere in the next six months someone is going to spend real money finding out that "improve engagement" meant three different things to the three people who approved the SOW.

None of this is new. Vague scope has been killing projects since before any of us were in the industry. What's new is that AI is giving teams a way to skip the fundamentals and feel good about it.

The pattern

The pattern works like this. Someone has an idea: what if we could use AI to do some task. They fire up a model, get an output that looks impressive, show it in a meeting, and suddenly there's a new AI project with lots of expectations and hype attached to it. There's a demo. There's excitement. But the thing that should have happened before any of that never happens. Nobody stops to define the actual problem, who it's for, what the success criteria are, or how you'd know it's working. The demo becomes the justification. The excitement becomes the plan.

Exploration isn't a POC

This is where the language matters. Teams often call this early work a proof of concept. It isn't one. A proof of concept starts with at least a rough definition of a problem and tests whether a given technology can feasibly address it. What many AI projects are doing is exploration. They're feeding a model some data or a vague prompt to see what it can do, but there's no problem statement underneath any of it. Exploration is valuable. It shows you what's possible. But it doesn't validate anything, because there was nothing defined to validate against. Calling it a POC gives the project a false sense of maturity, as if a real question was asked and answered. The question "can this model produce something cool" is not the same question as "can this model solve the problem we need solved."

The irony is that people skip the definition step thinking it will slow them down, and then they end up doing the definition work anyway. Just later and at a much higher cost. This is what I often see playing out. A team is mid-build on an AI project, weeks or months in, and they start realizing they don't fully understand the boundaries of the problem they're solving. What are the actual inputs? What are the edge cases? What does the process look like end to end? These are questions that belong on a whiteboard before anyone writes code. Instead, they surface during a sprint, when the cost of answering them includes rework, wasted cycles, and a team that's already committed to an architecture that may not fit.

You can't actually avoid defining the problem. You can only choose whether you do it deliberately upfront or accidentally mid-build. One costs you days, the other quarters.

The underlying issue is that AI doesn't change the fundamentals of building things. You still need to know what you're building, for whom, what the success criteria are, and whether the environment can support it. What AI does change is how easy it is to skip those steps.

This matters more for AI than for traditional software because deterministic systems have a floor. You tell a function to add two numbers, and as long as the code is correct, it adds two numbers. AI doesn't have that floor. The same input can produce different outputs, and quality varies along dimensions that aren't always obvious. That variance makes the absence of a clear problem statement and success criteria worse. The more variance you have in what a system produces, the more you need to know what you're measuring it against.

What good scope looks like

So concretely: "use AI to improve marketing" is not a plan. What specific output are you expecting? What defines a successful result? What does a bad one look like? What volume, what tone, what audience?

Compare that with: build an application that helps marketing draft emails using the company's internal technical documentation as source material, with output reviewed before sending. Now you have source data, an acceptance step, a defined workflow, and enough structure that someone can build it, someone else can test it, and you can tell whether it works. The AI is a component inside a workflow that makes sense on its own terms.

Specific KPIs, not directions

The same specificity applies to the KPIs. "Improve customer satisfaction" is a direction, not a measurement. "Reduce average support ticket resolution time from four hours to two hours" is a KPI. You can build against it. You can test against it. You can tell six months in whether you hit it or not. Without that specificity, you end up retrofitting metrics after launch to justify the spend.

The cost of defining good before you build is a few days. A working session, some writing, some alignment with the actual users. The cost of not defining it is months of a team scrambling to justify a project that nobody can say succeeded or failed.

The question I want every AI project to answer before it starts is the one that sounds the most boring: what does good look like, specifically, in writing, with examples. If the team can answer that, the rest of the project has a chance.

Agents Don't Go in the Org Chart

Tue, 14 Apr 2026 00:00:00 GMT

There's a troubling framing right now about treating AI agents like employees. IBM published a piece titled When the AI Agent Joins the Org Chart. The article itself is more measured than the title, including quotes from researchers pushing back on the "AI coworker" framing and acknowledging that agents are not employees. But the headline leans hard into the framing anyway.

The piece that actually commits to the framing is Harvard Business Review's To Scale AI Agents Successfully, Think of Them Like Team Members. HBR doesn't literally argue for putting agents on the org chart either. But they do argue that you should "stop treating them as turnkey software that simply needs to be installed" and "treat your agents like a new kind of workforce that requires management" with roles, scopes of authority, sources of truth, and escalation rules. The whole piece is built around the analogy to managing human employees.

I want to say this plainly. An AI agent is software. It isn't a team member. It isn't an employee. The fact that this needs to be said out loud is the thing I find most concerning about the current moment.

Before I unpack what's wrong with the framing, I want to give HBR credit for something. The engineering substance of their piece is actually sound. The four pillars they lay out (identity, context, control, and accountability) are all widely accepted good practice for building agent-based software. I don't disagree with any of those prescriptions. I'd go further and say the Control section contains the strongest insight in the piece: that probabilistic systems should not directly mutate state, and that you should separate the generation of a recommendation from the execution of an action.

My objection is not to any of that substance. It's to the framing HBR uses to deliver it.

Identity

The article argues that agents shouldn't run under shared service accounts and should each have their own credentials and scoped access. All correct. But the framing is that organizations "should treat each AI agent as a distinct digital worker with its own identity." A scoped service account is an engineering pattern. Nothing about it requires you to think of the service as a digital worker.

Context

The article argues that agents need trustworthy, authoritative data sources. Also correct. Every system that depends on organizational data needs this, not just agents. Writing down tribal knowledge, resolving contradictory sources, and governing data provenance is something every business should already be doing. A billing engine pointed at a broken ledger will produce garbage too. Nobody says the billing engine needs to be part of the workforce to fix that.

Control

The pitch here is the strongest part of HBR's piece and worth calling out. They specifically prescribe separating the generation of a recommendation from the execution of an action: let the agent propose, have deterministic software validate, then let some other system execute. That's the right architectural response to probabilistic systems. It's also a well-understood pattern that predates agents entirely. You can build propose/validate/execute pipelines without ever treating the system as a team member. Validation is a change in how you wire things together, not a mental shift into colleague-hood.
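
A propose/validate/execute pipeline is small enough to sketch in plain Python. Everything here is hypothetical: the Proposal type, the refund policy, and the lambdas standing in for model output are invented for illustration and are not taken from the HBR piece.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    action: str
    amount: float


# Hypothetical policy, enforced by plain deterministic code.
ALLOWED_ACTIONS = {"refund"}
MAX_REFUND = 100.0


def validate(proposal: Proposal) -> list[str]:
    """Return policy violations; an empty list means the proposal may run."""
    errors = []
    if proposal.action not in ALLOWED_ACTIONS:
        errors.append(f"action not allowed: {proposal.action}")
    if proposal.amount > MAX_REFUND:
        errors.append(f"amount {proposal.amount} exceeds limit {MAX_REFUND}")
    return errors


def execute(proposal: Proposal, ledger: list[Proposal]) -> None:
    """The only code path that mutates state."""
    ledger.append(proposal)


def handle(propose: Callable[[], Proposal], ledger: list[Proposal]) -> str:
    proposal = propose()         # probabilistic step: suggest only
    errors = validate(proposal)  # deterministic step: check policy
    if errors:
        return "rejected: " + "; ".join(errors)
    execute(proposal, ledger)    # separate step: act
    return "executed"


ledger: list[Proposal] = []
# The lambdas stand in for whatever a model would propose.
print(handle(lambda: Proposal("refund", 250.0), ledger))
print(handle(lambda: Proposal("refund", 40.0), ledger))
print(len(ledger))
```

Nothing in that wiring needs a job title. The probabilistic part never touches the ledger, and the part that does is ordinary code with an owner.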

Accountability

This is the one where the article undercuts its own framing. The scenario they describe is a procurement agent that posts supplier performance to Slack and accidentally leaks confidential contract terms because it interpreted "share transparently" too broadly. HBR argues you need enough observability to reconstruct the agent's decision. True. But notice what the example actually shows: the real issue is that the software was given access to do things it should not have been able to do. That's a failure of the humans who built and deployed the agent. The humans are the ones who own that consequence.

They then cite Moffatt v. Air Canada, where a tribunal held the airline responsible for misinformation given by its chatbot. HBR cites it as evidence that agents need more accountability infrastructure. I would argue it's also a damning point against the thesis itself. The tribunal rejected the idea that the chatbot was a separate entity. It held the company responsible, because the chatbot is software that humans built and operate. That ruling isn't a call to treat agents like employees. It's a reminder that people who ship bad software own the consequences.

So back to the core question. If the engineering substance is sound, and it is, why do I object so strongly to the framing?

The first reason is commercial. If you can convince executives to think of agents as employees, you can start selling them like you sell human labor. Budget line items per agent. Consulting firms quoting projects by "agent headcount." License tiers priced against the salary of the person that agent is supposed to replace. The mental model does commercial work. Once a buyer starts comparing the monthly cost of an agent against the monthly cost of an employee, you get to price it like an employee.

The second reason is more fundamental. Calling the software a team member, at best, feels sociopathic to me. It simultaneously overstates what an LLM-based application can actually do and undercuts what a human contributes that no probabilistic system can. It implies these systems have something like judgment, responsibility, presence, continuity, or moral weight, when in fact they have none of those things. And it quietly flattens the value of human work by framing people and software as things that can be swapped into the same slot on a chart.

Agents are software. The right questions to ask are the same ones you'd ask about any other system: who owns it, how is it monitored, what happens when it fails, what are the guardrails, what's the blast radius if something goes wrong. None of those questions require an org chart, a role, or a headcount line on a P&L.

The real risk of the "team member" framing isn't that it's insulting to humans or silly on its face. It's that it bypasses the scrutiny that software actually deserves. An agent is a probabilistic system with access to real actions inside your business. It needs to be engineered, not employed.

How Do Experts Get Made Now?

Mon, 13 Apr 2026 00:00:00 GMT

I took drafting in high school. Three years of it. The first semester was entirely by hand, before they moved us to CAD. At the time I didn't think much of it, but looking back, that sequence mattered. The point of the class was never to learn AutoCAD. The point was to learn drafting. Creating plans, schematics, blueprints precise enough that someone else could build from them. Hand-drafting first meant that by the time I got to the software, I already understood what I was trying to produce. The tool just made it faster.

I think about that class a lot lately.

I got going in tech around 2010. Self-taught Linux, picked up capabilities without much context for what to do with them. It wasn't until I landed my first real job, as a student tech, that I started to apply any of it. From there I moved into automating as much as I could. Scripts, Chef, custom code, whatever made the repetitive parts go away.

The thing that made all of that work, in retrospect, wasn't the automation tooling. It was that I'd spent time in the weeds first. I knew how Unix signals worked. I'd worked on hypervisors, storage systems, networking gear. I have a much greater depth of understanding about how the internet actually works than people who showed up post-cloud, and that's not a brag, it's just a function of when I came up. I had to learn that stuff because the abstractions hadn't been built yet.

A concrete example

Here's a concrete example of what I mean. Years ago at Rackspace, we were helping a client roll out a new microservices-based application. We got it loaded onto the server and nothing worked. The app was just silently failing. No logs, no errors, nothing useful to go on. The client's team couldn't figure out what was happening, and from the outside there was nothing to figure out. The application wasn't telling anyone anything.

A couple of years before that, I'd spent time on OpenSolaris storage systems and gotten fascinated with DTrace, which got me reading about syscalls in general. So I knew there was a layer underneath the application I could look at directly. I straced the process, watched what it was actually doing at the syscall boundary, and found the failure. If I remember right, it couldn't write its logs, which was why we weren't seeing anything from the application itself. The client fixed it and we shipped.

The thing I want to point out about that story is that strace wasn't some exotic tool I had to go discover in the moment. I knew to reach for it because I'd been curious about something completely unrelated years earlier. The knowledge was already there waiting, and the only reason it was there was because I'd done a lot of slow, undirected learning about how the layers underneath actually worked. None of that learning was efficient. None of it was assigned. It just turned out to be the thing that solved the problem when the abstractions failed.

That context turns out to matter more than I realized at the time. A lot of engineers who came up entirely post-cloud are sharp and they ship fast. But there's a category of problem where the wheels come off, and it's always the same category. Something is wrong below the abstraction layer they were trained on, and they don't have a model for what's underneath. They may know how DNS works as it relates to whatever cloud tooling they use, but they may not understand how DNS actually works, including on their own machine. When the tooling can't explain the failure, they're stuck.

I do my best to close out gaps in my own knowledge about systems that showed up before my time, for the same reason. I want to understand the context of how we got here. Having that depth means I'm not beholden to any single tool that just manages things on my behalf. At the end of the day I know it's not magic, and I can work around things when I need to.

This is the part I've been turning over.

Where judgment came from

The traditional way you became an expert in this field was that you spent years doing tedious work, and somewhere in the middle of all that tedium, judgment kicked in. You'd seen enough broken systems to recognize a broken system. You'd written enough bad code to know what bad code felt like before you finished writing it. The slow work wasn't valuable on its own. It was the substrate that the judgment grew out of.

AI is very good at removing that slow work. That's most of what people are celebrating about it right now, and reasonably so. But I keep coming back to the drafting class, because the drafting class is the one place I can think of where someone made a deliberate decision to preserve the slow version even after the fast version existed. And they were right to. I don't know if anyone is making that decision now, in any field, on purpose.

So here's what I'm actually wondering about. If judgment came from the slow work, and the slow work is going away, where does the judgment come from instead? Is there a different substrate that grows the same instinct? Do people develop it faster because they're freed up to work on harder problems earlier? Or do we end up with practitioners who can produce competent output across a wide range of tasks but have judgment-shaped holes in places they don't know to look, and won't know until something breaks in a way the tool can't explain to them?

I don't have a clean answer. My whole career is an argument that the slow stuff matters and that the people who skipped it are missing something. But "the way I learned was the right way" is the laziest possible stance for someone in my position to hold, and every generation says some version of it about the one that came after. Maybe the kids are fine. Maybe they're developing a different kind of judgment that I'm not equipped to recognize because it doesn't look like mine.

What I'd want, if I were designing how someone learned this stuff today, is something like the drafting class. Not because I want to make people suffer through the slow version for nostalgia's sake, but because I want them to encounter the thing the tool is a tool for before they encounter the tool. I want them to know what they're trying to produce. I want them to have at least one experience of doing it without the magic, so that when the magic fails, and it will fail, they have somewhere to stand.

Whether that's actually how expertise gets made now, or whether it's just how it got made for me, I don't know yet.

We Stopped Expecting More From Our Computers

Fri, 10 Apr 2026 00:00:00 GMT

At some point, we decided that using a computer meant learning someone else's tool. Excel, Salesforce, SAP, whatever the platform was. You learned the tool, you got certified in the tool, and then you spent your career doing manual work inside the tool. Computer literacy got reduced to software proficiency.

That's a weird place to have landed. Computers are general purpose machines. They can be made to do almost anything. But for most people, "what can I do with a computer" became "what can I do inside this application." The idea that you could use a computer to reshape your own work, to build something that eliminates the tedious parts of your day, that stayed locked inside engineering departments. It never went mainstream.

And the incentives never pushed it that direction either. For decades, the career move wasn't to ask "how can I solve this so no one has to do it." It was to ask "how large of a team can I get to do it," because headcount looks better on a CV than efficiency. Managing twelve people doing manual work gets you a promotion. Eliminating the need for those twelve roles gets you a pat on the back and maybe a pizza party.

So we ended up where we are. People copying data from one system into another. Pulling numbers from an email and typing them into a spreadsheet and then again into a SaaS CRM. Reconciling information across three different tools because none of them talk to each other. All of it done manually, on machines that could be doing it automatically, by people whose time would be better spent on work that actually requires their brain.

What changed

What AI actually did, the thing worth paying attention to, is make the computer seem more accessible than it ever has. For the first time, an average business user can describe a problem in plain language and start exploring what's possible without needing to understand the deep ins and outs of how the technology works. The gap between "I have this problem" and "a computer could solve this" used to require an engineer in the middle. That gap is shrinking. People are starting to see the possibilities that were always there, just hidden behind a wall of technical knowledge they were never expected to climb.

Transformation for Everyone

Wed, 08 Apr 2026 00:00:00 GMT

For as long as I can remember, "transformation" has been something companies buy. They hire a consulting firm, spend a few million dollars if they are lucky, sit through months of workshops and discovery sessions, and at the end of it they get a PowerPoint deck, some renamed departments, and maybe a new tool nobody asked for. Everyone feels "transformed" for about a quarter, then it's back to business as usual.

That model is dying. Not because consulting is useless, but because the nature of the work is changing. The people closest to the problems are getting access to the tools that can actually solve them. And that changes who transformation belongs to.

I've spent most of my career automating repetitive work. It's been the through line in almost everything I've done, whether I was managing servers, supporting customers, closing deals, or helping clients figure out what to do with AI. The first question I ask when I look at a process is always some version of "why is a person doing this?" Most of the time, the honest answer is "because nobody has bothered to fix it yet." That's not a damning indictment of the people doing the work. It's just how organizations drift when nobody is actively pushing against the drift.

What it actually looks like

Think about what most people's days actually look like. You spend a chunk of your morning sorting through emails, flagging things, forwarding information to the right people. You copy data between systems that don't talk to each other. You build reports by pulling numbers from three different tools and assembling them in a slide deck. You do this every week. Sometimes every day. And at some point you stopped questioning it because that's just "the job."

But it doesn't have to be. The people who are going to do well in the next decade aren't the ones who get faster at the manual work. They're the ones who look at their day and start asking where a computer should be doing the work instead of them. Not so they can slack off, but so they can spend their time on the parts that actually need a human brain. Strategy. Judgment calls. Creative problem solving. The stuff they were supposedly hired for in the first place but never have time to do because they're buried in process.

The people I see doing this well aren't waiting for permission or a transformation initiative. They're just quietly looking at their own processes, finding the spots where a computer should be doing the work, and fixing it themselves. That's what transformation actually looks like when it's not a line item on a consulting SOW.

AI Doesn't Level the Playing Field

Fri, 03 Apr 2026 00:00:00 GMT

Everyone has access to the same AI tools. That was supposed to be the great equalizer, but we've had enough time now to see what's actually happening, and it's the opposite. The people getting real value from AI are the ones who already understood the work. They prompt better because they know what to ask for, they catch bad output because they know what good looks like, and they use AI to move faster along a direction they've already validated.

The data red herring

There's an obsession right now with data being the bottleneck. Your AI initiative will fail because your data is unclean, unstructured, not "AI-ready." Usable data matters, but it's a small part of the puzzle and the fixation on it is missing the bigger problem entirely. A row of numbers in a database could be anything. GDP of the world's richest countries, credit card numbers, random noise. That data is useless without someone who understands what it represents, why it was collected, and what questions it can actually answer. It's a domain knowledge problem, and no amount of cleaning fixes a lack of understanding.

And data is only one piece of the story anyway. Most of what AI and computers should be helping with goes well beyond analytics. It's connecting applications, moving information between systems automatically, handling the repetitive garbage that eats up people's days, proactively surfacing things that need attention instead of waiting for someone to go dig for them. The real value is in eliminating busywork, not just analyzing spreadsheets.

Unfortunately, most of the software we use every day was never built to be integrated with anything. The APIs are incomplete, poorly documented, or just don't exist. So even in cases where AI could provide immediate, obvious value (email management comes to mind), actually wiring it up means duct tape and bubble gum. You end up pointing Playwright at a browser and automating clicks like it's a screen macro from 2003. It works, technically. Until the vendor changes a button label and the whole thing falls over.

That's the real garbage in/garbage out. Not dirty data. Lack of context, lack of proper interfaces, and a software ecosystem that wasn't designed for the way things are headed. When someone without domain expertise sits down with an AI tool, they don't know what to ask, they can't evaluate what comes back, and they run with the first thing that sounds reasonable. The tool didn't fail them. They didn't have the context to operate it.

When the tools are commoditized

When every company is AI-first (and that's where this is going), the differentiator won't be the tools. They're commodities. It won't be the data. Everyone figures out the data part eventually. It will be the people who understand their problems deeply enough to point these tools in the right direction. The competitive advantage was never the technology. It's the thinking that directs it.

The Dumb Question

Fri, 27 Mar 2026 00:00:00 GMT

There's a Home Improvement episode where Tim tries teaching Jill how to fix a sink. He buries her in technical terminology, she gets frustrated, and eventually he realizes he was using jargon to protect his status as the expert in the room. It's played for laughs, but I've watched that exact scene play out in professional settings more times than I can count.

Early in my career I started noticing a pattern in meetings. Someone would use a term or reference a concept, and I could see on people's faces that they didn't follow. But nobody said anything. Everyone just nodded and moved on. The meeting would continue, decisions would get made, and half the room would leave without actually understanding what they'd agreed to. Then two weeks later the same group would be back in a room trying to figure out why the project was off track.

That pattern is what led me to where I am today, where I'll sometimes literally say, "I have a possibly dumb question." The irony is that the response almost always starts with "that's a great question" or "that's not actually a dumb question at all." Which kind of validates the whole point. If the "dumb" question consistently turns out to be the question everyone needed asked, it was never dumb. It was just the one nobody else wanted to be first to say out loud.

What jargon is actually doing

What I noticed over time is that the jargon problem isn't really about vocabulary. It's about what the jargon is doing in the conversation. Sometimes technical language is genuine shorthand between people who share context. When two engineers talk about race conditions or two painters talk about value structure, that's just efficient communication. The language earns its place because both people know what it means.

But a lot of the time, especially in cross-functional settings, jargon is doing something else entirely. It's a way to sound authoritative without being clear. It's a way to avoid the vulnerability of simple language, because simple language can be questioned and challenged in ways that jargon can't. If I say "we need to leverage our synergistic capabilities to optimize the value stream," nobody knows what to push back on. If I say "we should combine these two teams because they're doing the same work," now we're having a real conversation, and I might be wrong.

I've been on both sides of this. I've caught myself burying a point in technical language because I wasn't fully confident in the point itself. The jargon was insulation. If nobody could parse what I was saying, nobody could tell me it didn't make sense.

The thing that keeps striking me about it is how much time gets wasted. Not just in the meeting itself, but in all the follow-up conversations where people try to figure out what was actually said. Or worse, when they don't have those conversations and just build on assumptions that were never validated. I've seen entire project timelines slip because two groups left the same meeting with completely different understandings of what was decided, and both thought they were right because the language was vague enough to support either interpretation.

Plain, not dumbed down

The older I get, the more I value the people who make things plain. Not dumbed down. Plain. There's a difference. Dumbing something down strips out the nuance. Making it plain preserves the nuance but uses language that actually communicates it. The best technical communicators I've worked with can explain complex systems to non-technical people without losing anything important.

When I first started asking these questions, it felt like I was admitting I'd missed something everyone else had caught. Now when I get that feeling, I just ask.

Did You Know This at My Stage?

Tue, 24 Mar 2026 00:00:00 GMT

Early in my career, I was working with someone more junior and reacted with visible surprise when they didn't know something I considered basic. I don't remember the exact topic, but I remember their response: "Did you know this at my stage?"

That question stopped me. Because the honest answer was no. I didn't. I'd learned it at some point along the way, and at some point after that I'd forgotten that I ever had to learn it. It had just become background knowledge, and I'd unconsciously started treating it as something everyone should already have.

That moment exposed something I've watched play out everywhere since. The further you get from the beginning of your own learning, the easier it is to lose patience with people who are still in it. You forget how much of what you know came from stumbling through things, asking questions that felt obvious in hindsight, and having people around who gave you room to figure it out.

The instinct to react with "you don't know that?" or "how could you not know this already?" shows up fast, especially when you're stressed or under pressure. I've done it. Most people have. But every time it happens, it makes the other person less likely to ask the next question. And the question they don't ask is usually the one that would have prevented a problem down the road.

Beyond work

It's easy to frame this as a workplace thing, and it is. Teams where people are afraid to admit what they don't know are teams that move slower than they think they are, because everyone's spending energy performing competence instead of actually building it. But it's bigger than work. The assumption that someone is stupid or lesser because they don't know something you know is just a bad way to move through the world. Everyone is at a different point in their own trajectory. The things that are obvious to you weren't always obvious to you, and the things that are obvious to them might not be obvious to you either.

The best working relationships I've had were with people who made it safe to say "I don't know." Not in some corporate psychological safety initiative way. Just in the way where you could be honest about where your knowledge stopped and nobody made you feel small for it.

I think about that question a lot. "Did you know this at my stage?" It's a good one to keep in your back pocket, not to ask someone else, but to ask yourself when you feel that flash of impatience coming on.

The Trap

Wed, 18 Mar 2026 00:00:00 GMT

I'll admit it: AI has made me lazier at the exact moments when I need to be most thoughtful.

Here's what happened. I needed to implement a feature and asked Claude to research approaches. It came back with detailed code examples and explanations. Looked great. I ran with it. Two days later, I discovered the libraries it referenced were 2+ major versions behind, half the methods were deprecated, and the architectural approach didn't fit my actual use case. I threw it all out and started over.

The thing is, I knew better. I've been building software long enough to know that the research phase is where the real work happens. The hard part of coding was never the typing, at least that's how I view it. It's developing the plan, the patterns, and the architecture. The actual writing of lines is closer to manual labor. That's the part AI is genuinely good at accelerating. But I didn't hand off the manual labor. I handed off the thinking. "Figure this out for me" feels exactly like delegating to a competent colleague. Your brain lets go of the problem. You move on to something else. And by the time you come back to it, you've already accepted the answer without doing any of the thinking that would have told you it was wrong.

The trap is that offloading a thought to something else is genuinely addictive. There's a relief that comes with handing over a problem and getting back something that looks complete. Your brain wants to be done with it. You've got other things to focus on. So you accept what came back, maybe skim it, and move on. And you don't realize you skipped the part that mattered until something breaks.

It's a pattern, not a slip

I've caught myself doing this more than once now. Not with trivial tasks, where AI is genuinely great and saves real time. With the hard stuff. The decisions that actually require me to understand the problem space before I can evaluate a solution. Those are the exact moments where I need to be doing more thinking, not less. And those are the exact moments where the temptation to offload is strongest, because the hard stuff is the stuff you most want off your plate.

The uncomfortable part is that it doesn't feel like laziness when it's happening. It feels like efficiency. You're moving faster, covering more ground, getting things done. It's only in retrospect, after you've burned two days on a bad approach or shipped something that doesn't hold up, that you realize you weren't being efficient. You were just avoiding the hard part.

I don't think this is something you solve once and move past. It's more like a gravity you have to keep resisting. The pull to let the tool do the thinking is always there, and it gets stronger the more comfortable you get with the tool. The better AI gets, the more convincing the output, and the easier it is to trust it with decisions you shouldn't be trusting to anyone but yourself.