So everyone’s got a take on what AI in the enterprise should be, and it usually lands in one of two places. Either it’s agents, where you build a pile of them, orchestrate them, and the value shows up, or it’s simpler than that, just grab a Claude or OpenAI license and let your people knock out the boring stuff. Both are real, and I’m not knocking either, but they kind of miss the point, and the gap between what those pitches promise and what happens inside a real company is what I keep coming back to.
- Generic AI tools don't fail in the enterprise because the models are weak. They fail because they've never seen your undocumented systems, your tangled logic, or the knowledge living in three people's heads.
- The widely quoted 95% pilot “failure” rate is a learning gap, not a model problem: rushed, underfunded pilots dropped into organizations they can't learn from.
- A horizontal model doesn't become your product just because it's smart. Specializing it to your domain is the entire job.
- The shape that fits is compound AI: validated, expert-owned knowledge sets the boundaries, the model reasons inside them, and it runs end-to-end inside the work, never as a bolt-on.
The demo never shows you the mess
Because here’s what the demo never shows you. An enterprise is a mess: decisions made years ago by people who mostly don’t work there now, held together by integrations that run for reasons nobody remembers. There’s a field somewhere that can’t be null because of something somebody decided back in 2017, and now three other systems quietly depend on it. The one person who knows why the billing logic does what it does is out on PTO, and none of it’s written down because everyone who needed to know already knew. So there’s something funny about watching a generic AI tool stroll into that, all confident, having read the whole internet but not your change tickets. In an enterprise, the change tickets are where the truth lives. Nobody selling you enterprise AI has met your enterprise. The demo can’t show what it’s never seen.
Rushed, underfunded, set up to fail
I’ve sat across from 80-plus IT leaders by now, and it almost always comes down to the same two things. There’s real pressure from the top to adopt AI yesterday, and there’s a brutal budget constraint sitting right next to it. Put those together and you’ve basically written the recipe for failure: a pile of rushed, underfunded pilots never set up to land. Which is why that 95% number from the MIT NANDA study last year never shocked me, the one where around 95% of enterprise GenAI pilots showed no measurable impact. Everybody read it as proof the tech’s overhyped. I read it the other way, because the report was clear the models weren’t the problem. It was a learning gap: tools dropped into organizations they didn’t understand and couldn’t learn from.
A horizontal layer is not your product
And here’s what people keep underestimating: enterprise IT isn’t just constraint heavy, it’s expertise heavy, and that combination is exactly why the obvious shortcut doesn’t work. The question I get more than any other is some version of “can’t Claude or ChatGPT just figure this out, isn’t that what they’re for?” Look, those are horizontal layers, general-purpose reasoning you can point at almost anything, and they’re incredible at that. But a horizontal layer doesn’t magically turn into a purpose-built product for your enterprise just because it’s smart.
The model was trained on the public internet. It has never seen your proprietary processes, your undocumented systems, the logic that lives in three people’s heads, and you can’t paste all that into a prompt either, because most of it was never written down and the rest is too tangled to fit. So when you hand it something mission-critical under real constraints, it isn’t truly understanding the thing, it’s pattern-matching to whatever looks right, and “looks right” is what burns you when a bad config takes down billing at month end.
Getting a model to actually handle complex, mission-critical work isn’t a clever prompt, it’s a long, deliberate effort to specialize it to the domain, and that effort is basically the whole job.
Down the compound-AI rabbit hole
Honestly this is the whole reason I went down the compound AI rabbit hole. I’ve been chewing on it since early 2024, when the Berkeley folks put out that piece on the shift from models to compound AI systems, and it put words to something I’d felt for years doing integration work but never named. The future of useful AI inside a company was never going to be one bigger, smarter model. It was always going to be a system, the model’s reasoning working alongside retrieval and tools and some real control flow, all pointed at a specific job.
The only shape that fits the problem
Because a compound system is the only shape that fits the problem. The work is expertise heavy, so you need the real expertise in there, not loosely gestured at through a vector store, but curated and validated and signed off by the people who actually hold it, the seasoned hands who’ve done this across sectors and the people inside who know your specific shop.
And that part right there, the experts doing the curating and validating themselves, is honestly the whole reason it works. Somebody who genuinely knows the domain has to sit there and say yes this is right, no this is wrong, this is the edge case that’ll bite you. That validation loop is the difference between a system you’d trust with real work and a demo that falls apart the first time it hits something unexpected.
And in an enterprise that matters even more, because when it gets something wrong somebody has to point at why, and “the model figured it out” won’t fly. So the validated knowledge sets the boundaries, the model reasons inside them, and the expert stays right where their judgment has to be the final call. None of it comes free. It’s slow and unglamorous, and it’s the exact step the generic stuff skips, which is exactly why the generic stuff stalls.
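If it helps to see that split in code, here’s a minimal Python sketch of the pattern, not anybody’s actual implementation. Everything in it is hypothetical and made up for illustration: the `ValidatedRule` and `Decision` shapes, the `decide` function, the billing rule, and the `propose_change` stub, which stands in for wherever a real system would call a model. The idea is just that the expert-approved rules are the boundary, the model’s proposal gets checked against them, and anything that breaks a rule goes to a person along with the reason.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ValidatedRule:
    """A constraint an expert has reviewed and signed off on (hypothetical shape)."""
    name: str
    approved_by: str
    check: Callable[[dict], bool]  # True when the proposed change respects the rule


@dataclass
class Decision:
    change: dict
    status: str          # "accepted" or "needs_expert_review"
    violated: list[str]  # names of broken rules, kept so someone can point at why


def propose_change(task: str) -> dict:
    """Stand-in for the model's reasoning step; a real system would call an LLM here."""
    return {"table": "billing", "field": "account_ref", "nullable": True}


def decide(task: str, rules: list[ValidatedRule],
           propose: Callable[[str], dict] = propose_change) -> Decision:
    """Let the model propose, then hold the proposal against the validated rules."""
    change = propose(task)
    violated = [rule.name for rule in rules if not rule.check(change)]
    status = "needs_expert_review" if violated else "accepted"
    return Decision(change=change, status=status, violated=violated)


# Hypothetical rule: the kind of "this field can't be null" knowledge nobody wrote down.
rules = [
    ValidatedRule(
        name="billing.account_ref must stay non-null",
        approved_by="billing-owner",
        check=lambda c: not (
            c.get("table") == "billing"
            and c.get("field") == "account_ref"
            and c.get("nullable")
        ),
    )
]

decision = decide("relax constraints on billing fields", rules)
print(decision.status, decision.violated)
```

The stubbed proposal trips the billing rule, so it gets routed to an expert with the rule’s name attached instead of quietly shipping. The hard part the sketch skips is the one this whole section is about: getting those rules curated and signed off in the first place.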
It can’t be a bolt-on
And it can’t just sit off to the side as one more thing people have to use. The version that sticks runs inside the work already happening, from figuring out what you’re even working with, to planning and sequencing it and catching the risks before one blows up a cutover at 2am, to holding the build against the constraints nobody documented. End to end, not a bolt-on. The moment it turns into one more tool somebody has to open, you’ve lost the team.
The next frontier
So when somebody asks what AI in the enterprise actually means, my honest answer is it’s mostly not the thing being sold to you. It isn’t the marketplace of agents you snap together, and it isn’t the assistant clearing your inbox. Both are fine, they’re just the shallow end.
The real stuff demos terribly and builds slowly, because it’s the model’s reasoning fused with real expertise from two places at once, the people who’ve spent years doing this across sectors, and the people inside your own walls who know how your specific mess runs. That outside know-how used to sit siloed in a few heads, and the real shift is it can finally be captured, validated, and put to work instead of walking out when somebody retires.
And this is the next frontier, the thing that decides what enterprise AI looks like over the next few years. It won’t be whoever ships the biggest model or has the most agents running, it’ll be whoever learns to compound real reasoning with that hard-won, validated, human-owned expertise and wire it into how the business runs. Everybody waiting on a smarter model to save them will keep learning the model was never the hard part, and the ones who get compound AI right will set the standard everyone else chases.
- “The GenAI Divide: State of AI in Business 2025,” MIT NANDA. nanda.media.mit.edu
- M. Zaharia et al., “The Shift from Models to Compound AI Systems,” Berkeley AI Research (BAIR), Feb 2024. bair.berkeley.edu
