
For most of its existence, IT was a cost center. Not in the “we’re being strategic about overhead” sense. In the “you don’t make us money, so justify every dollar” sense. Every headcount request was a fight. Every budget cycle was a negotiation where you explained, again, why the systems that keep the entire company running need more than a skeleton crew to operate.

Meanwhile the infrastructure kept growing. More servers, more applications, more things that could break at 2 AM. And the answer was never going to be more people, because there was no money for more people. Imagine if IT had the same luxury the business side had, where the answer to “we’re overwhelmed” was just “let’s hire another person.” A person per server. A person per application. It would have been absurd, and everyone knew it was absurd, which is exactly why IT never got to operate that way.

So IT did what you do when you can’t throw bodies at a scaling problem. They automated.

Plenty of practitioners were already hand-rolling their own Bash scripts and cobbling together whatever they could to manage their systems. In the early 1990s, formal tooling for automating fleet management started to appear, beginning with Mark Burgess’s CFEngine. Luke Kanies started building Puppet around 2005, and Adam Jacob launched Chef in 2009. Instead of a person logging into each machine and configuring it by hand, you write the configuration once and let software apply it everywhere. Instead of a runbook that says “SSH into the server and edit this file,” you have code that does it. The machine does what the code says, every time, at whatever scale you need.
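To make “write the configuration once and let software apply it everywhere” concrete, here is a minimal Python sketch of the desired-state idea behind those tools. It is not CFEngine, Puppet, or Chef code. The helper names (`ensure_line`, `apply_everywhere`) and the temp files standing in for servers are assumptions made up for illustration. The code describes the end state and touches a file only when it differs, so running it a second time changes nothing.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it already matched.
    Hypothetical helper for illustration, not an API from any real tool.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def apply_everywhere(paths: list[Path], line: str) -> dict[str, bool]:
    """Apply the same desired state to every target and report what changed."""
    return {p.name: ensure_line(p, line) for p in paths}


def demo() -> tuple[dict[str, bool], dict[str, bool]]:
    """Three temp files play the role of three servers' config files."""
    with tempfile.TemporaryDirectory() as tmp:
        fleet = [Path(tmp) / f"server{i}.conf" for i in range(3)]
        fleet[0].write_text("PermitRootLogin no\n")  # one host is already correct
        first = apply_everywhere(fleet, "PermitRootLogin no")
        second = apply_everywhere(fleet, "PermitRootLogin no")
        return first, second


if __name__ == "__main__":
    first_run, second_run = demo()
    print("first run: ", first_run)
    print("second run:", second_run)
```

The first run reports changes only on the hosts that needed them, and the second run reports none. That property is what lets one person manage a fleet instead of a single machine.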

That shift didn’t come from a strategy deck. It came from practitioners who recognized that automation was simply the best way to manage infrastructure at scale. I was one of them. I picked up Chef because I like to automate. I wrote cookbooks, built tools around it, and eventually joined Rackspace’s DevOps Automation group where the whole job was running client infrastructure with code instead of hands. The community that grew up around those tools was full of people who had independently arrived at the same conclusion: the manual way doesn’t scale, and no one is going to give you the headcount to make it scale. So you write code instead.

The Rest of the Business Is Where IT Was

I think about the early infrastructure scaling era constantly right now, because I see the same pattern everywhere outside of IT.

Once you know the inputs and outputs, you can reason about the work without needing to audit every individual’s Tuesday.

I was talking with a colleague recently about the challenge of figuring out where automation applies in a mid-size organization. A typical 5,000-person company has tens of thousands of workflows and processes. How do you even begin to reason about that?

Here’s what I’ve come to think. Workflows are largely personal. Every individual has their own way of getting through their day, their own shortcuts, their own version of the process. Trying to map every person’s workflow is a fool’s errand. But that’s not actually what you need to do.

Teams and business units are systems. They work the same way any other system works. They have inputs and they have outputs. A content management team in marketing takes in briefs, brand guidelines, and campaign goals. It puts out finished content, published on schedule, in the right channels. An accounts receivable team takes in invoices and puts out collected payments. The specifics of how each person inside that group does their piece will vary, and honestly, a lot of that variance doesn’t matter. What matters is understanding what goes in, what comes out, and where the friction lives between those two points.

When you frame it that way, the problem gets a lot more tractable. You’re not trying to document tens of thousands of personal workflows. You’re trying to understand the function of each group, what they need to do their jobs, and what they’re expected to produce. Once you have that, you can start asking where the bottleneck is, where a human is doing something a machine should be doing, where information is getting stuck because two systems don’t talk to each other.
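One way to picture the framing is a rough Python sketch that treats a team as a system with inputs, outputs, and a few steps in between. Everything here is an assumption for illustration: the class names, the step names, and especially the weekly capacity numbers, which are invented and not measurements from any real team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    items_per_week: int  # how many units this step can handle in a week
    manual: bool         # True if a person does this step by hand


@dataclass(frozen=True)
class TeamSystem:
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    steps: tuple[Step, ...]

    def bottleneck(self) -> Step:
        """The step with the lowest weekly capacity limits the whole team."""
        return min(self.steps, key=lambda s: s.items_per_week)

    def automation_candidates(self) -> list[str]:
        """Manual steps, slowest first: places to look before hiring."""
        manual = [s for s in self.steps if s.manual]
        return [s.name for s in sorted(manual, key=lambda s: s.items_per_week)]


# Hypothetical accounts receivable team with invented capacities.
accounts_receivable = TeamSystem(
    name="Accounts receivable",
    inputs=("invoices",),
    outputs=("collected payments",),
    steps=(
        Step("issue invoice", items_per_week=400, manual=False),
        Step("match payments to invoices", items_per_week=150, manual=True),
        Step("follow up on overdue accounts", items_per_week=220, manual=True),
    ),
)

if __name__ == "__main__":
    print(accounts_receivable.bottleneck().name)
    print(accounts_receivable.automation_candidates())
```

Nothing in this model records how any individual gets through their day. It captures only what goes in, what comes out, and which step in between is slowest.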

The Luxury IT Never Had

IT automated because they had no choice. The rest of the business hired because they could.

Historically, the pattern in how business departments grow is one that everyone recognizes once you name it. Someone is overloaded. They don’t have time for everything on their plate. So they make the case to hire. The new person absorbs some of the workload. Things stabilize for a while. Then the work grows again, or the original person takes on something new, and now the new person is overloaded too. Repeat.

This is the luxury that IT was never afforded. When a sysadmin was drowning in work, the answer wasn’t “let’s get you some help.” The answer was “figure it out.” The budget wasn’t there. IT was overhead. So IT figured it out, and the way they figured it out was by making the machines do the work that people had been doing by hand.

Business departments never faced that constraint. When marketing needed more content, they hired more people. When operations couldn’t keep up with order volume, they hired more people. When finance was buried in reconciliations, they hired more people. Adding headcount to a revenue-generating or revenue-supporting team was “investment.” Adding headcount to IT was “cost.” That asymmetry meant the business side was never forced to question whether the work should be done by a person at all. The question was always “who can we hire to help with this,” never “why does this require a human in the first place.”

And the incentive structures reinforced it. Managing a bigger team is a career milestone. Managing a more efficient team that does the same work with fewer people is, at best, a footnote. Nobody got promoted for eliminating the need for three roles. They got promoted for managing twelve.

The rest of the business hasn’t hit the wall IT hit, or at least hasn’t recognized it as the same wall. But the economics are pointing the same direction. The era of solving every scaling problem with a new hire is ending, not because companies suddenly got smarter, but because the math is getting harder to justify. And the ability to automate repetitive work is better than it’s ever been.

The Mental Shift

The shift in IT wasn’t really a technology change. The tools mattered, but what actually changed was how people thought about the work. Before, the mental model was: here’s a server, a person manages it. After, the mental model became: here’s a fleet, code manages it, and a person manages the code. The unit of work moved up a layer of abstraction.

That’s the shift the broader business world needs to make. Not “replace people with AI.” That framing is as wrong now as “replace sysadmins with scripts” was wrong then. We didn’t get rid of the systems administrators. We changed what they spent their time on. Instead of configuring servers by hand, they wrote the code that configured servers. Instead of firefighting the same recurring issue every week, they fixed the root cause and automated the check. The work got more interesting because the tedious parts got absorbed by the tooling.

The same thing is available to every team in a modern organization. The marketing coordinator who spends four hours a week compiling a report from three different analytics tools should not be doing that. The operations team that manually reconciles data between two systems that don’t share an API should not be doing that. The project manager who copies status updates from email threads into a slide deck every Friday should not be doing that.
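The reconciliation case is small enough to sketch. Below is a minimal Python example, assuming each system can export its records as a mapping from an ID to an amount in cents. The function name `reconcile`, the record IDs, and the sample amounts are all hypothetical. A real pair of systems would need its own export step and matching rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    key: str
    system_a: Optional[int]  # amount in cents, or None if the record is missing
    system_b: Optional[int]


def reconcile(a: dict[str, int], b: dict[str, int]) -> list[Discrepancy]:
    """Return every record that is missing from one side or differs in amount."""
    issues = []
    for key in sorted(a.keys() | b.keys()):
        amount_a, amount_b = a.get(key), b.get(key)
        if amount_a != amount_b:
            issues.append(Discrepancy(key, amount_a, amount_b))
    return issues


# Hypothetical exports from two systems that do not share an API.
orders = {"INV-1001": 25000, "INV-1002": 7550, "INV-1003": 12000}
ledger = {"INV-1001": 25000, "INV-1002": 5750, "INV-1004": 4000}

if __name__ == "__main__":
    for issue in reconcile(orders, ledger):
        print(issue)
```

The person who used to compare the two exports line by line now reviews only the short list of mismatches, which is the part that needs human judgment.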

These aren’t hypotheticals. This is what many people’s actual days look like. And just like the sysadmin who was hand-editing config files on 200 servers, most of them have stopped questioning it because it’s just how the job works.

It Won’t Come From the Top

The other thing I took from the early infrastructure scaling era is that the shift didn’t start at the executive level. It started with practitioners. Individual contributors who were frustrated enough with the status quo to go find a better way. I’ve seen this play out in my own career more than once. At Texas A&M, a small group of technologists across departments pushed for infrastructure changes that ended up reshaping how the whole university operated. None of it was anyone’s job description. It happened because the people doing the work could see what needed to change, and they had enough autonomy to act on it.

The parallel isn’t perfect. Business processes are messier than server configurations, the tooling is less mature, and the organizational politics are thicker. But the underlying dynamic is the same. Too much repetitive work, not enough people to do it, and a set of tools that can absorb the repetitive parts if someone points them in the right direction.

The problem facing every organization is the one IT has faced ever since computer use grew beyond what any team could manage by hand: the work scales faster than the headcount. It always has. The difference is that now the rest of the business is running into the same wall, and the tools to do something about it are already here.