
THE LLM LANDSCAPE - BEYOND THE HYPE

  • Writer: candyandgrim
  • Nov 18, 2025
  • 7 min read

MY JOURNEY (PROBABLY YOURS TOO)

I started with Grammarly for polish and Google for research. Then ChatGPT arrived in November 2022, and like millions of others, I thought: "Finally - one tool to rule them all."

Until it wasn't.


The breaking point? I asked ChatGPT to analyse a transcript and extract quotes on a specific topic. It delivered - except the quotes didn't exist. Not paraphrased. Not summarised. Completely fabricated.

That's not a "harmless hallucination." That's professional negligence.
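A check like the one below would have caught it in seconds. This is only a minimal sketch: the function names and the sample transcript are made up for illustration, and it assumes the model was asked for verbatim quotes, so anything that does not appear word for word in the source gets flagged.

```python
import re


def normalise(text: str) -> str:
    """Lowercase, unify curly quotes, and collapse runs of whitespace."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def find_fabricated_quotes(transcript: str, quotes: list[str]) -> list[str]:
    """Return the quotes that do not appear verbatim in the transcript."""
    haystack = normalise(transcript)
    return [q for q in quotes if normalise(q) not in haystack]


# Hypothetical sample data, not from a real transcript.
transcript = "We shipped late because testing took longer than planned."
quotes = [
    "testing took longer than planned",
    "the budget was never the problem",
]

print(find_fabricated_quotes(transcript, quotes))
# ['the budget was never the problem']
```

It only catches quotes that are missing outright. A quote that exists but is attributed to the wrong speaker still needs a human read.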

I needed an alternative immediately. Grok and DeepSeek were the hot options. Between Chinese government data access (DeepSeek) and Elon's ego (Grok), I chose Grok as the lesser evil.


Fast. Less filtered. But then the political bias became impossible to ignore - anything critical of Musk, Tesla, or Trump got sanitised. That's not an AI assistant; that's a PR shield.

Now? Primarily Claude for accuracy and citations, but I'm not monogamous. Different tools for different tasks:


  • Writing: Claude (accuracy) + Grammarly (polish)

  • Research: Perplexity (citations) or Claude

  • Quick queries: Whatever's fastest


The lesson: There is no "best" LLM. There's only "best for this specific task right now."

And when companies force Google Workspace or Microsoft 365 ecosystems top-to-bottom - Drive, Slides, Gemini/Copilot, the lot - I see lazy thinking disguised as efficiency.

Limit your stack? Absolutely. Prioritise interoperability? Yes. But Google Slides is objectively worse than PowerPoint or Keynote. Drive's clunkier than Dropbox. Forcing inferior tools because they share a subscription isn't strategy - it's surrender.


THE LANDSCAPE: WHO'S ACTUALLY WINNING?

Market share (November 2025):


  • ChatGPT: 60-82% (timing + Microsoft muscle)

  • Gemini: 13-24% (mostly passive Workspace adoption)

  • Claude: 3-21% (low consumer, high enterprise/dev)

  • Perplexity: 6.2%

  • DeepSeek: 0.5-5.9% (exploded January 2025, then crashed)

  • Grok: 0.8% (Twitter bubble, nothing more)


Why ChatGPT dominates:


  • First-mover advantage (November 2022, 100M users in 2 months)

  • Distribution muscle (free tier, mobile apps, Microsoft bundling)

  • "Good enough" trap (solves 80% of use cases, so why switch?)

  • Institutional lock-in (universities adopted it, students trained on it)


But dominance ≠ quality. Blind tests show Claude Sonnet 4 actually ranks higher on accuracy and reasoning.

THE DEEP DIVE: STRENGTHS, WEAKNESSES, ETHICS


CHATGPT (OpenAI)

Strengths:


  • Ubiquity - everyone knows it, shares prompts for it

  • Conversational tone - friendly, accessible

  • Plugin ecosystem - web browsing, code interpreter, custom GPTs

  • Microsoft backing - Bing, Office, GitHub Copilot integration


Weaknesses:


  • Hallucinations - confidently invents information (my transcript disaster)

  • No native citations - manual verification required for everything

  • Sycophantic - tells you what you want to hear, not always what's true

  • Rate limits on free tier - frustrating caps mid-workflow


Ethical concerns:


  • Training opacity - scraped the web without permission, lawsuits mounting

  • Catastrophic economics - loses £1.80 for every £1 earned, burned £10.8B in H1 2025. Microsoft life support only.

  • Carbon cost - training GPT-4 consumed an estimated 50 GWh (enough to power roughly 5,000 homes for a year)

  • IP muddle - can generate copyrighted-style content, putting users at risk


Verdict: The default choice, but increasingly a legacy play. Microsoft funding keeps it alive despite catastrophic unit economics.


CLAUDE (Anthropic)

Strengths:


  • Accuracy over speed - reasoning prioritises correctness

  • Native citations - cites sources when it searches the web or works from documents you supply, which enables verification

  • Longer context - 200K tokens vs GPT-4o's 128K (better for long documents)

  • Constitutional AI - trained with explicit ethics, less sycophantic

  • Artifacts - visual outputs (code, documents, diagrams) separate from chat


Weaknesses:


  • Slower - deliberate reasoning takes time

  • Less "fun" - more formal, less conversational than ChatGPT

  • Smaller ecosystem - fewer plugins, integrations

  • Phone verification required - barrier to casual trial


Ethical concerns:


  • Training data - also scraped the web, but emphasises Constitutional AI principles

  • Google backing - Alphabet invested £1.6B+, raises data-sharing questions

  • Carbon footprint - similar compute demands to GPT

  • Feature lag - often trails OpenAI on new capabilities


Verdict: The switchers' choice - people move to Claude after ChatGPT fails on accuracy-critical work. Enterprise and developer favourite.


DEEPSEEK (China)

Strengths:


  • Ridiculously cheap - API pricing undercuts everyone

  • Open-source R1 - released weights publicly (rare for frontier models)

  • Speed - optimised for fast inference


Weaknesses:


  • CCP censorship - blocks Tiananmen, Taiwan, Hong Kong, Uyghur topics

  • Data residency - all data stored in China, CCP access by law

  • Security holes - keystroke tracking, data exfiltration discovered

  • Geopolitical risk - US/EU warnings for sensitive work


Ethical concerns:


  • Mandatory censorship - not a bug, it's by design. Chinese law demands compliance.

  • Data sovereignty - UK/EU/US data crosses into CCP jurisdiction

  • Training mystery - scraped web + unknown Chinese datasets

  • Carbon - trained on Chinese grid (higher coal mix than US/EU)


Verdict: Peaked at 5.9% in January 2025, crashed to 0.5% after security revelations. Use at your own risk - fine for public content, catastrophic for proprietary work.


GROK (xAI / Elon Musk)

Strengths:


  • X integration - convenient if you live on the platform

  • Speed - optimised for fast responses

  • "Edgy" positioning - markets itself as less filtered


Weaknesses:


  • Political bias - manually tuned by Musk, censors criticism of Musk/Tesla/Trump

  • Accuracy issues - multiple hallucination reports

  • Twitter bubble - 0.8% market share despite hype


Ethical concerns:


  • Deliberate bias - Musk openly states he's tuning it "anti-woke"

  • Censorship - suppresses negative content about Musk, his companies, political allies

  • Training grab - scraped X/Twitter without user consent

  • Vanity risk - survives only as long as Musk's interest holds


Verdict: Avoid unless you're a Musk loyalist. Political bias makes it unsuitable for professional work. 0.8% market share tells the real story.


GEMINI (Google)

Strengths:


  • Search integration - can search web natively, cite sources

  • Android/Workspace ubiquity - baked into phones, Gmail, Docs

  • Multimodal - handles images, video, audio natively

  • Generous free tier - fewer rate limits than ChatGPT


Weaknesses:


  • Passive adoption - most users don't choose Gemini, it's just there

  • Trust issues - Google's surveillance capitalism reputation

  • Botched launch - image generation controversies damaged credibility

  • Corporate bloat - slower iteration than startups


Ethical concerns:


  • Data hoarding - Google's entire model is data collection

  • Training grab - scraped web, YouTube, Google Books (lawsuits pending)

  • Privacy theatre - claims GDPR compliance, incentives misaligned

  • Carbon - massive data centres, though offset with renewables


Verdict: Ambient AI, not chosen AI. If you're in Workspace, you'll use it by default. Otherwise, professionals actively choose ChatGPT or Claude.


PERPLEXITY

Strengths:


  • Research-focused - built for citations, source verification

  • Clean UI - no clutter, just search and answers

  • Fast - optimised for quick lookups

  • Pro mode - deeper reasoning when needed


Weaknesses:


  • Narrow use case - great for research, less useful for creative writing or coding

  • Smaller model - not as capable as GPT-4 or Claude for complex reasoning

  • Niche adoption - 6.2% market share


Ethical concerns:


  • Training opacity - less transparent than competitors

  • Sustainability unclear - VC-funded, unit economics unknown


Verdict: Excellent for research-specific tasks. Not a ChatGPT replacement, but a valuable specialist tool.


THE GEOPOLITICAL MINEFIELD

Using AI isn't just tech - it's politics.

DeepSeek: Data stored in China, CCP access guaranteed. Fine for public content, catastrophic for proprietary.

Grok: Musk's bias makes it unreliable for objectivity. If neutrality matters, avoid.

ChatGPT/Claude/Gemini: All US-based, subject to FISA, CLOUD Act. Europeans especially wary post-Snowden.

The uncomfortable truth: No LLM is geopolitically neutral. You're choosing whose laws, whose values, whose bias you'll accept.


THE CARBON PROBLEM

Training a frontier LLM consumes:


  • 50-100 GWh (powering a small city for a year)

  • Thousands of tonnes of CO2 (depends on grid carbon intensity)


Running inference adds up:


  • ChatGPT: ~0.001 kWh per query

  • At 100M+ daily users sending just one query each = 36.5 GWh/year for inference alone
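The 36.5 GWh figure only works if each of the 100M daily users sends a single query a day. Here is the arithmetic as a rough sketch. The per-query energy is the estimate quoted above, and the one-query-per-user rate is an assumption, so treat the output as an order-of-magnitude guide rather than a measurement.

```python
KWH_PER_QUERY = 0.001           # estimate quoted above
QUERIES_PER_DAY = 100_000_000   # assumption: one query per daily user
DAYS_PER_YEAR = 365
KWH_PER_GWH = 1_000_000


def annual_inference_gwh(kwh_per_query: float, queries_per_day: int) -> float:
    """Yearly inference energy in GWh for a given per-query cost and daily volume."""
    kwh_per_year = kwh_per_query * queries_per_day * DAYS_PER_YEAR
    return kwh_per_year / KWH_PER_GWH


# One query per user per day reproduces the figure above.
print(round(annual_inference_gwh(KWH_PER_QUERY, QUERIES_PER_DAY), 1))       # 36.5

# Ten queries per user per day (hypothetical) scales it tenfold.
print(round(annual_inference_gwh(KWH_PER_QUERY, 10 * QUERIES_PER_DAY), 1))  # 365.0
```

The total scales linearly with usage, so heavier users push the real number well past the headline figure.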


Who's doing better?


  • Google/Anthropic: Both claim carbon-neutral data centres (renewables + offsets)

  • OpenAI: Microsoft data centres moving to renewables, not there yet

  • DeepSeek/Grok: Zero transparency


Reality check: AI isn't "clean tech." Every query costs the planet something. Choose wisely.


SO WHICH LLM SHOULD YOU USE?

Honest answer: It depends.

For accuracy-critical work:  Claude (citations, transparency, reasoning)

For speed and "good enough":  ChatGPT (ubiquitous, fast, conversational)

For research with citations:  Perplexity (built specifically for this)

For Workspace users:  Gemini (it's already there, decent quality)

For Twitter addicts:  Grok (if you can stomach the bias)

For cheap API (non-sensitive only):  DeepSeek (understand the geopolitical risks)

For professional creative work:  Multiple tools - Claude for accuracy, ChatGPT for speed, Grammarly for polish


BECOME A TOOL POLYGAMIST

Don't marry one LLM. Build a tool-fitness strategy:


  1. Primary workhorse (Claude or ChatGPT - pick your poison)

  2. Specialist backup (Perplexity for research, Grammarly for polish)

  3. Emergency redundancy (if primary fails, what's your fallback?)


Why?


  • Tool failure is inevitable (outages, rate limits, policy shifts)

  • No single LLM excels at everything

  • Market consolidation is coming - tools will die, pivot, get acquired


The creatives who survive won't be loyal to ChatGPT. They'll be fluent in the logic that spans all LLMs.
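If you script any of your workflow, the primary/backup/redundancy idea can be sketched as a simple fallback chain. Everything here is hypothetical: `primary` and `backup` are stand-in functions, not real vendor APIs, and in practice each would wrap whichever client you actually use.

```python
from typing import Callable, Sequence

# A tool is a (name, function) pair; the function takes a prompt and returns text.
Tool = tuple[str, Callable[[str], str]]


def run_with_fallback(prompt: str, tools: Sequence[Tool]) -> tuple[str, str]:
    """Try each tool in order and return (tool_name, answer) from the first success."""
    errors = []
    for name, call in tools:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, rate limit, policy refusal, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all tools failed: " + "; ".join(errors))


# Hypothetical stand-ins for real clients.
def primary(prompt: str) -> str:
    raise TimeoutError("rate limited")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


print(run_with_fallback("Summarise this brief", [("primary", primary), ("backup", backup)]))
# ('backup', 'backup answer to: Summarise this brief')
```

Returning the tool name alongside the answer matters more than it looks: it tells you which tool produced the text, so you know what to verify and how hard.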


THE CORPORATE ECOSYSTEM TRAP

I've worked at two companies recently:


  • Company 1: Google-first (Drive, Docs, Slides, Gemini)

  • Company 2: Microsoft-first (OneDrive, Word, PowerPoint, Copilot)


Both claimed "efficiency through standardisation."

The reality:


  • Drive < Dropbox (sync issues, collaboration bugs)

  • Slides < PowerPoint/Keynote (animation limits, design tools)

  • Gemini/Copilot = IT's choice, not what's best for the work


This is security theatre masquerading as efficiency. Managing one vendor is easier than admitting some tools simply do the job better.

The solution: Limit your stack, yes. Prioritise interoperability, absolutely. But tool fitness > brand loyalty.

If Slides can't deliver, use Keynote and export to PDF. If ChatGPT hallucinates, use Claude and document your process. Work quality matters more than IT's convenience.


THE SWING VOTER PRINCIPLE

I'm not fickle - I'm a swing voter.

I only switch when:


  1. Current option becomes untenable (broken, expensive, unreliable)

  2. Alternative is dramatically better - not 10% better, but leaves the old guard in the dust


Small improvements don't move the needle. Revolutions do.

ChatGPT dominated because it was a revolution (November 2022). Claude gains ground because it solved ChatGPT's biggest weakness (accuracy). DeepSeek briefly exploded because it was 10x cheaper (then geopolitics killed it).

The next LLM that wins won't be "slightly better than ChatGPT." It'll solve a problem so fundamental, so painful, that switching becomes inevitable.

Until then? Tool polygamy. Multiple options. Zero loyalty.

What's your LLM journey? What made you switch (or what's keeping you on ChatGPT)? What would it take to change tools?


Drop your story below - let's map the real landscape, not the marketing hype.

 
 
 
