IT'S NOT ABOUT THE MODEL. IT'S NEVER BEEN ABOUT THE MODEL.

candyandgrim
May 11

April 2026 and everyone is shouting

I have never known a month like it in the AI space. GPT-5.5. Gemini updates. Apple quietly embedding intelligence into everything you already own. Microsoft doubling down on Copilot while half its enterprise users have quietly abandoned it. Grok making an acquisition move that raised more eyebrows than confidence. And Anthropic—the company behind the model I use every day—publishing a public post-mortem about a performance drop that most companies would have buried in silence.

Every week brings a new benchmark. A new claim. A new reason to switch. A new LinkedIn post telling you the model you're using is already obsolete.

I want to make a different argument.

Not about which model won this month. Not about benchmark scores that most people, including most professionals using these tools daily, cannot meaningfully interpret. But about what is actually happening in this industry, why the noise is getting louder rather than quieter, and why the most rational thing you can do in the middle of it is probably nothing.

This is not a review. It is a pattern I have lived through, and a conclusion I have arrived at the hard way.

My switching history... and what it actually tells you

I have used four AI tools seriously. Not dabbled. Not tested. Actually built them into my working life.

I started with Grammarly. Not recently—for years before GPT existed, before LLMs were part of any mainstream conversation. It was the right tool for a long time. A writing assistant that caught what I missed, quietly, in the background, without requiring anything from me beyond the work itself. I didn't leave because something shinier arrived. I left because the world moved and Grammarly didn't move with it. The ceiling became visible and I hit it.

ChatGPT was the obvious next move and for a while it was genuinely transformative. The capability jump was real. But the break came not from headlines or boardroom drama—it came from a specific professional failure. I was doing text-based video editing across sixteen hours of interview footage, searching transcripts for specific quotes to match specific narratives. It should have been straightforward. Instead ChatGPT repeatedly invented quotes that did not exist in the transcript. Plausible, well-formed, completely fabricated. I wasted significant time chasing words that were never said. That was the dealbreaker.

Grok came next. I was curious, and it succeeded immediately where ChatGPT had failed—fast, accurate, no hallucination on the transcript task. For a period it worked. But over time the values of the person running the company became impossible to separate from the product itself. When you are putting your thinking, your client work, your intellectual output through a tool—who owns that infrastructure matters. It stopped feeling like the right hands.

Then Claude. And I am still here.

What I notice looking back is that none of those moves was driven by a benchmark. None of them happened because I read that a new model scored higher on some test I don't fully understand. Every single switch was a trust failure: a dealbreaker in the product, the company, or the person at the top. Three different dealbreakers:

A capability ceiling—Grammarly simply not being built for what the world became.

A product failure—ChatGPT hallucinating quotes in a high-stakes production context, costing real time and real money.

A values failure—Grok's leadership becoming impossible to separate from the tool itself.

That pattern is not unique to me. I have had versions of this conversation with people across industries, company sizes, and levels of technical sophistication. The tools people abandon are rarely abandoned because a rival scored higher. They are abandoned because something broke that no benchmark measures.

Trust broke.

Your AI is like a marriage. Unless you are really shallow, you do not file for divorce because a more attractive model became available. You leave because something failed. Something that mattered.

The players, honestly

This is not a ranking. Rankings are for benchmarks and benchmarks are mostly noise. This is a character study—because character, it turns out, is what you are actually buying when you choose an AI platform.

Anthropic

The quietest of the major players and arguably the most deliberate. Founded by people who left OpenAI over safety concerns, which tells you something about values before you've seen a single product decision. Claude is not always the fastest or the flashiest. But when something went wrong earlier this year—a performance degradation that affected users across the platform—Anthropic published a forensic post-mortem. Named the specific changes. Named the dates. Explained the knock-on effects. Fixed it publicly.

That is not a PR move. That is an engineering culture applied to trust. In an industry built on confidence and opacity, it stands out.

OpenAI

Capable. Genuinely. The model quality is real and the pace of development is extraordinary. But the organisation is carrying significant weight right now—legal proceedings, internal turbulence, a leadership profile that generates as many headlines as the products do. For individual users none of that may matter. For enterprise procurement, for regulated industries, for any organisation that needs to answer a board question about who they are trusting with their data—it matters more than the benchmark.

GPT-5.5 launched this month. It is impressive. It is also arriving in a context that makes it harder to evaluate on its merits alone.

Google

The most underestimated player in the room. Gemini has not captured the cultural imagination the way Claude or ChatGPT have—but Google's infrastructure advantage, its search dominance, and its quiet embedding of AI into Workspace puts it in a position no pure-play AI company can match. It is playing a longer game than most people are watching.

The caveat applies to every US-headquartered platform: the same data sovereignty pressures, the same CLOUD Act exposure. Choosing Google over any other US platform does not resolve the underlying question. It just changes the logo.

Others are circling—some well-funded, some genuinely capable in specific contexts. But the top tier has largely settled.

The show and the runway

Fashion Week. Every season, new collections. Every season, the same question—who wore it best. The audience watches the models. The press scores the designers. Everyone has an opinion on who won.

But the most powerful people in the room aren't on the runway. They're not even in the front row. They're the ones who own the venue.

This is the play that separates two categories of AI player. There are those competing to put the best collection on the runway—and there are those quietly buying the building.

Microsoft has embedded Copilot across the entire Microsoft 365 stack—Word, Excel, PowerPoint, Outlook, Teams. Not because Copilot is the most capable model on the runway. It demonstrably isn't. But because if AI becomes the default layer through which people do their work, and that layer lives inside Microsoft's real estate, then Microsoft wins regardless of which model is technically the most stunning in any given season.

Apple is doing the same thing with more elegance and less noise. Intelligence embedded into the operating system. Into the keyboard. Into the camera. Into the apps already on the device already in your pocket. Apple is not building an AI people choose. It is building an AI people don't notice choosing.

But the most complete venue play belongs to Google. It is simultaneously a model maker and a runway builder. On-device AI running locally on Pixel and across Android, an install base of billions of devices, requires no data centre, no cloud dependency, no credit consumption. Intelligence that lives on the device itself. And underneath all of it, Google owns the street everyone walks down to get to the venue. Search. The starting point for most human inquiry on the internet, now increasingly powered by AI before you've even clicked a result.

The collections change every season. The venue endures.

It is also worth noting that all three venues are US-owned, operating under the same legislative exposure and the same data sovereignty pressures. Choosing one over another doesn't resolve the underlying question of who ultimately controls the space your work lives in. It just changes the postcode.

The show is dazzling. But know who owns the building.

The chaos corner

Not every player in this story is making a coherent long-term bet. Some are just making noise.

Grok made headlines this month by securing the right to acquire Cursor—the AI coding tool that has quietly become the preferred choice for serious developers. On the surface this looks like an aggressive strategic move. Look closer and it tells a different story. Grok's own engineers were using Cursor because Grok wasn't good enough for the job. The acquisition is not a power play. It is an admission—dressed up as one.

You don't buy the tool your own team prefers over yours if you're winning.

Copilot is the new Clippy. That is not a throwaway insult—it is a precise observation. Clippy was not a bad idea. It was a useful concept, badly implemented, that promised more than it delivered and inserted itself where it wasn't wanted. Every organisation I have spoken to reports the same experience with Copilot: tested, underwhelmed, quietly abandoned. The promise of organisational AI intelligence is real. The product, in its current form, does not yet deliver it—particularly for anyone whose work happens outside the Microsoft ecosystem, which is everyone.

Neither of these is a reason to panic. Neither is a reason to switch anything. They are data points. Signals about where confidence is real and where it is performed.

The chaos corner is always the loudest part of the show. It is rarely where the important decisions get made.

The trust layer

Performance is what gets you in the room. Trust is what keeps you there.

This is the conversation the AI industry is not having loudly enough—and the one that matters most for anyone making a serious decision about which tools to build their organisation around.

Anthropic did something unusual this year. When Claude's performance degraded—noticeably, across several weeks—they published a forensic post-mortem. Named the specific changes. Named the dates. Explained the knock-on effects. Fixed it publicly. No vague reassurance. No corporate non-statement. A named caching bug with a ship date.

But the more significant signal came in February 2026.

The US Department of Defense demanded Anthropic remove its prohibitions on using Claude for fully autonomous lethal weapons without human oversight and for mass surveillance of Americans. Anthropic's CEO stated the company could not in good conscience grant the request. The deadline passed. Anthropic did not move.

The Trump administration responded by labelling Anthropic a supply chain risk—a designation never before applied to an American company—and ordered federal agencies to stop using Claude. A $200 million Pentagon contract. Its entire federal government business. Gone, because the company would not cross two lines it had drawn around autonomous weapons and mass surveillance.

Anthropic sued. The case is still live. The company is still fighting.

What happened next is where the trust contrast becomes impossible to ignore.

Hours after Anthropic was banned, OpenAI announced a Pentagon deal, a dramatic reversal from its original 2023 usage policy, which explicitly prohibited military, weapons, and warfare applications. OpenAI had quietly revised its usage policies in January 2024 to remove the military prohibition, and spent the following two years systematically hiring defence insiders to position itself for exactly this kind of contract. Anthropic's exit was OpenAI's entrance. The groundwork had been laid long before the deadline arrived.

For individual users, none of this may feel relevant. For any organisation asking which AI company they want embedded in their workflows, their client data, their strategic thinking—this is the character test. Not the benchmark. Not the demo. The moment when holding the line cost something real.

And then there is Palantir.

Not an LLM company—but increasingly the infrastructure layer beneath enterprise and government AI decisions, which makes the stakes higher, not lower. The UK government has opened NHS patient data to an organisation whose business model benefits from demonstrating that public healthcare systems fail. That is not a privacy concern dressed up as politics. It is a structural conflict of interest baked into a government contract.

The point is not Palantir specifically. It is what it illustrates. When the stakes are high enough—health data, financial systems, national infrastructure—performance becomes almost irrelevant. The question is who owns the tool, how they behave under pressure, and what they are incentivised to do with access they have been granted.

Anthropic was offered a significant financial incentive to cross its own red lines. It declined.

That is the trust signal. Not the marketing. Not the benchmark. What a company does when saying no costs something.

The case for staying

There is a concept I have been building towards throughout this article that does not yet have a widely accepted name. I am going to call it symbiotic calibration.

It is not training—that word belongs to the engineers. It is not just memory—that is only one layer of it. It is the accumulated result of months of real work with a specific AI tool. Part rapport. Part institutional memory. Part agreed rules of engagement. The way a tool learns not just what you do but how you think. The shorthand that develops. The pushbacks that shaped the relationship. The micro-adjustments that happened across hundreds of sessions that were never written down because they didn't need to be—they just became part of how the work flows.

There are two layers to what symbiotic calibration actually contains.

The explicit layer—preferences, frameworks, tone of voice, project context, agreed output formats. The things that could theoretically be documented in a handover note. Painful to rebuild but possible.

The implicit layer—the things that cannot be documented because they emerged through friction. The moment something didn't land and got corrected. The understanding of how you think, not just what you think. The way a tool knows when to push back and when to execute. This layer cannot be exported. It cannot be summarised. It exists only in the pattern of the relationship itself.

This is the switching cost nobody talks about. Everyone discusses whether the new model scores higher. Nobody discusses what you are walking away from.

Switching is not just asking whether the grass is greener. It is deciding to abandon everything you have grown.

The consistency argument runs deeper than comfort or familiarity. Context compounds. A tool that knows your clients, your frameworks, your voice, your current projects, your preferred output formats, and your working style is not interchangeable with a tool that does not—regardless of how the benchmarks compare this week. The gap between a well-calibrated AI relationship and a new one is not a feature gap. It is a time gap. Months of work. Months of friction. Months of rebuilding something that already existed.

This is why the weekly noise about which model is currently ahead is largely irrelevant for anyone doing serious, sustained work. Marginal benchmark gains do not compensate for the loss of symbiotic calibration. The tool that knows you is almost always more valuable than the tool that scores slightly higher on a test you did not design, for a task that is not quite yours.

The rational position—arrived at through experience, not brand loyalty—is to stay until there is a genuine reason to leave. A trust failure. A capability gap so fundamental it cannot be worked around. A values position so untenable it cannot be ignored.

Not a benchmark. Not a launch event. Not a LinkedIn post telling you the future has arrived and you are already behind.

Where I've landed

I am still on Claude.

Not because it won a benchmark this week. Not because a comparison article told me it was the best. Not because I am loyal to a brand or indifferent to alternatives.

Because it has not given me a reason to leave.

The post-mortem was honest. The Pentagon situation was clarifying. The work we have done together—the symbiotic calibration built across months of real projects, real clients, real thinking—is not something I am willing to discard on the basis of noise.

I have switched before. I will switch again if the reason is real. A trust failure. A fundamental capability gap. A values position I cannot stand behind. Those are the terms. They are not low bars—but they are the right ones.

What I will not do is switch because April 2026 was loud. Every month is loud now. The launches will keep coming. The benchmarks will keep moving. The LinkedIn posts will keep insisting that whatever you are using is already obsolete.

Most of it is weather. Not climate.

The question worth asking is not which AI is winning this week. It is which one you are building with—and whether what you have built together is worth protecting.

For me, the answer is still yes.