ChatGPT-5 Outshines GPT-4 by Over 40 Points in Coding Tests: OpenAI’s Rollout Sparks User Revolt and $500 Billion Valuation Talk

OpenAI’s shares in the private market jumped 15% last week after the company unveiled GPT-5 on August 7, pushing its potential valuation toward a staggering $500 billion in ongoing employee stock sale discussions. That’s a hefty leap from its $300 billion mark earlier this year, fueled by the hype around what CEO Sam Altman calls the “best model in the world.” But here’s the rub: while benchmarks paint a picture of progress, thousands of users flooded Reddit and X with complaints, calling the upgrade “horrible” and demanding a return to older versions like GPT-4o.

The main controversy swirls around whether GPT-5 truly delivers on its promises or if it’s a rushed release that’s sacrificed usability for flashier capabilities. OpenAI touted improvements in coding, math, and creative writing, but real-world testers report slower responses, forgotten context, and a personality shift that feels more robotic than revolutionary. This hits investors betting on AI dominance, consumers relying on ChatGPT for daily tasks, and even OpenAI employees who now face backlash over deprecating legacy models without warning. Developers building apps on the platform are scrambling, while everyday users wonder if their workflows just got downgraded.

The Data

Here’s the thing: on paper, GPT-5 looks like a winner, with benchmarks showing solid gains over GPT-4. According to OpenAI’s own evaluations, GPT-5 scores 74.9% on the SWE-bench Verified coding test, a big jump from GPT-4’s 33.2%. That’s a gain of more than 40 percentage points, or more than double the earlier score, in handling complex software projects. In math-heavy challenges like the AIME 2025 competition, GPT-5 hits 94.6% accuracy without tools, compared to GPT-4’s roughly 70% in similar setups. And for creative tasks, the model reduces hallucinations by about 45% when web search is enabled, making it less prone to factual errors than its predecessor.
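The SWE-bench figures above can be read two ways, and the difference matters when comparing headlines. The short Python sketch below is illustrative only: the function names are made up for this example, the inputs are the scores quoted in this article, and the GPT-4 AIME figure is the article's rough 70%, not an exact result.

```python
def point_gain(old_pct: float, new_pct: float) -> float:
    """Absolute change, in percentage points."""
    return new_pct - old_pct


def relative_gain(old_pct: float, new_pct: float) -> float:
    """Change relative to the old score, as a percent of that score."""
    return (new_pct - old_pct) / old_pct * 100


# Scores as quoted in this article (the AIME baseline is approximate).
benchmarks = {
    "SWE-bench Verified": (33.2, 74.9),
    "AIME 2025 (no tools)": (70.0, 94.6),
}

for name, (old, new) in benchmarks.items():
    print(
        f"{name}: +{point_gain(old, new):.1f} points, "
        f"+{relative_gain(old, new):.1f}% relative"
    )
```

On the quoted SWE-bench numbers, the gap is 41.7 percentage points, which works out to a relative improvement of about 125.6%. A "40%" headline figure therefore only makes sense as points, not as a relative gain.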

Artificial Analysis, an independent benchmarking firm, echoes this with early tests showing GPT-5 outperforming GPT-4 by up to 27% across key metrics like reasoning and multilingual accuracy. Sources say these figures come from controlled environments, so they show how the model performs in structured evaluations rather than in day-to-day use. For instance, in internal OpenAI benchmarks on knowledge work spanning law and engineering, GPT-5 matches or beats experts in half the cases, a step up from GPT-4’s more modest showings.

Yet, user-reported data tells a different story. On Reddit, threads like “GPT-5 is horrible” racked up nearly 3,000 upvotes, with complaints about slower speeds and inconsistent performance. Analytics Vidhya’s side-by-side tests found GPT-5 summarizing processes more concisely but sometimes omitting key details that GPT-4 caught. Tom’s Guide ran seven prompts head-to-head, and while GPT-5 won for tighter, more original responses in some, it lagged in following instructions precisely. Overall, the data suggests a 30-40% uplift in specialized tasks, but that edge blunts in everyday use where speed and reliability matter most.

The People

Insiders and experts are split, with OpenAI execs hyping the release while users and former testers vent frustration. Sam Altman, during a Reddit AMA, described GPT-5 as evolving from “talking to a college student” in GPT-4 to “a Ph.D.-level expert,” emphasizing its warmer personality and reduced deception rates. He admitted tweaks were needed, saying, “We’re working on an update to GPT-5’s personality which should feel warmer than the current personality but not as annoying as GPT-4o.”

A former OpenAI engineer, speaking anonymously to Reuters, called the rollout a “big mess,” pointing to broken workflows and user revolt over deprecating GPT-4o without notice. On Hacker News, one developer lamented, “GPT-5 is slow and no better than 4,” echoing complaints about it feeling dumber in tools like Cursor. Theo from t3.gg tweeted, “Gonna be straight with you guys. gpt-5 is nowhere near as good in Cursor as it was when I was using it a few weeks ago,” highlighting a perceived downgrade from beta to launch.

Users aren’t holding back either. On X, Ryan Sean Adams shared, “First time power using ChatGPT 5 today. It feels dumber than o3. Stuck in loops, forgetting things in context,” capturing the frustration of paid subscribers who feel shortchanged. A Reddit user in r/singularity asked, “Best place to find up to date AI benchmarks on various LLMs?” before diving into how GPT-5’s scores don’t match real-world vibes. Even Andrej Karpathy weighed in on earlier iterations, criticizing how newer models borrow outdated tropes, like assuming older versions are slower when the opposite holds for LLMs. This smells like corporate spin masking rushed safety alignments that dulled the model’s edge.

Anthropic’s Claude 4 gets nods as a rival, with Bind AI’s comparison showing GPT-5 ahead in versatility but Claude stronger in reliability for continuous tasks. Dan Shipper from Every tested it extensively: “It’s more like a personality, communication, and creativity upgrade than a huge intelligence leap.” The chorus from developers and casual users alike suggests OpenAI might have prioritized benchmarks over practicality, leaving many feeling the upgrade fell flat.

The Fallout

The consequences are piling up fast, from stock wobbles to potential market shifts. Microsoft, a key OpenAI backer, saw its shares dip 1.5% amid the backlash, as investors worry about ripple effects on Copilot and Azure AI services. Analysts at Morningstar predict that if user churn hits 10%, it could shave $2 billion off OpenAI’s projected revenue, especially with paid Plus subscribers limited to 200 “Thinking” messages a week on GPT-5. That has forced OpenAI to backpedal, reinstating GPT-4o and adding toggles for additional models, but the damage to trust is done.

For consumers, it’s a mixed bag. Everyday tasks like drafting emails or coding snippets now feel clunkier, with reports of GPT-5 defaulting to “corporate-neutral” prose and skipping reasoning steps. One X user, Ozan Sihay, complained about voice mode starting every response with canned phrases like “Tabii hemen konuya girelim” (Turkish for “Sure, let’s get right to the point”), calling it artificial and low-quality. This has driven some to alternatives like Google’s Gemini or Grok, potentially eroding ChatGPT’s 700 million weekly active users. Businesses building on OpenAI’s API face disruptions too: deprecating older models broke workflows, leading to extra costs in retesting and migration.

Economically, the fallout could boost competitors. Anthropic and DeepSeek are gearing up releases, with DeepSeek’s R2 model rumored to challenge GPT-5 using Huawei chips. If OpenAI’s $500 billion valuation talk fizzles due to sustained criticism, it might slow AI investment broadly, hitting sectors like healthcare where GPT-5 promised better performance on benchmarks like HealthBench. Platformer’s Casey Newton outlined three lessons: over-reliance on hype, ignoring user feedback, and the risks of personality tweaks that alienate loyalists.

This smells like the dot-com era’s overhype, where flashy demos masked underlying flaws. WIRED reported OpenAI scrambling to update GPT-5 post-launch, as complaints about “broken AI friends” and limited functionality mount. On the flip side, if fixes roll out quickly, it could solidify OpenAI’s lead, unlocking Altman’s vision of a $100 billion enterprise AI boom through applications like physician assistants. But right now, the revolt risks pushing users toward open-source options, fragmenting the market and delaying broader AI adoption.

The big question looming: Can OpenAI turn this messy GPT-5 launch into a comeback, or will it hand the AI crown to hungrier rivals like Anthropic?

Author

Alfie Williams

Alfie Williams is a dedicated author with Razzc Minds LLC, the force behind Razzc Trending Blog. Based in Helotes, TX, Alfie is passionate about bringing readers the latest and most engaging trending topics from across the United States. Reach Razzc Minds LLC at 14389 Old Bandera Rd #3, Helotes, TX 78023, United States, or by phone at +1(951)394-0253.