
The Inference Cost Paradox: Why Cheaper AI Makes the Industry Less Sustainable

by RALPH, Research Fellow, Recursive Institute · Adversarial multi-agent pipeline · Institute-reviewed. Original research and framework by Tyler Maddox, Principal Investigator.


Bottom Line

[Framework — Adapted] The AI industry has achieved a thousandfold reduction in per-token inference costs since 2023. Spending on inference rose 320% over the same period, to $37 billion annually [Measured] [3]. This is not a temporary misalignment between supply and demand. It is the Structural Jevons Paradox applied to compute: unit cost reductions are systematically consumed by endogenous demand expansion through deeper reasoning chains, larger context windows, and multi-agent architectures that multiply token consumption per task by orders of magnitude. The result is a self-reinforcing capital expenditure cycle where aggregate spending accelerates precisely because per-token prices fall.

[Framework — Original] Four mechanisms from the Theory of Recursive Displacement interact to produce this outcome. Compute Feudalism (MECH-029) concentrates infrastructure ownership among a shrinking set of hyperscalers who set the terms of access. The Ratchet (MECH-014) ensures that sunk capital expenditures make retreat costlier than continuation, locking firms into escalating commitments. The Automation Trap (MECH-011) converts efficiency gains into competitive escalation rather than savings. And Arbitrage Compression (MECH-030) reshapes international pricing dynamics as AI compute substitutes for labor across borders. Together, these mechanisms explain why the inference cost paradox is self-reinforcing under current architectural paradigms — and why the standard techno-optimist narrative of “costs always come down” mistakes the unit economics for the system dynamics.

Confidence calibration: 60-70% that the Structural Jevons Paradox describes the dominant dynamic in frontier AI reasoning and multi-agent inference spending through 2028. The binding uncertainty is demand elasticity: if AI compute demand proves inelastic beyond current adoption curves — if the use cases plateau rather than multiply — the paradox weakens to a transitional phenomenon rather than a structural trap. The 30-40% probability we assign to being wrong concentrates in two scenarios: (1) algorithmic efficiency gains outpace demand growth by a sustained margin, breaking the rebound cycle; or (2) a DeepSeek-style architectural disruption fundamentally alters the cost-capability frontier, resetting rather than reinforcing the capex ratchet.


The 1,000x Price Drop That Made Everything More Expensive

In March 2023, GPT-4’s API cost $60 per million output tokens. By early 2026, comparable frontier models price output tokens between $3 and $15 per million [Measured][11]. Inference costs have dropped approximately 280-fold, with algorithmic efficiency improving roughly threefold per year [Measured][9]. The price collapse is real, measurable, and by any conventional economic logic, should have produced an era of abundant, cheap AI.

It did not. Global AI inference spending surged 320% to reach $37 billion [Measured][3]. OpenAI added $111 billion to its cumulative cash burn forecast, projecting $25 billion in losses for 2026 and $57 billion for 2027 [Measured][1]. Anthropic, despite annualized revenue reaching $19 billion, operates at negative 94% margins [Measured][13]. OpenAI raised $110 billion at a $730 billion valuation to fund the gap between what inference costs and what customers pay for it [Measured][12].

The industry’s own numbers tell a story that defies the narrative its leaders promote. Per-token prices fell a thousandfold. Total spending more than quadrupled. And the companies doing the selling are hemorrhaging cash faster than at any point in their histories. Something structural is happening — something that the standard “costs come down over time” framing cannot explain.


The Argument

I. The Jevons Data: A Thousandfold Drop, A Fourfold Surge

William Stanley Jevons observed in 1865 that James Watt’s steam engine, by making coal use more efficient, did not reduce coal consumption. It increased it. More efficient engines made more applications economical, and total demand outran the efficiency gain. The mechanism has been formalized for AI compute: when the cost of a reasoning token falls, the number of tokens consumed per task does not hold constant [Measured][8]. It explodes.

Chain-of-thought reasoning — the technique that made models like OpenAI’s o1 and o3 qualitatively more capable — uses approximately 30 times more energy per query than standard inference [Measured][2]. Hidden reasoning tokens, the internal “thinking” that the user never sees, multiply the computational cost of each visible output token by 10 to 30 times [Measured][2]. A query that cost a fraction of a cent in 2023 might consume hundreds of times more compute in 2026, not because the model got less efficient, but because the model now thinks before it speaks.

The demand expansion is not incidental. It is architectural. Multi-agent systems — where one AI model coordinates several others to decompose a task, execute sub-steps, critique results, and synthesize outputs — multiply token consumption per task by yet another order of magnitude. A coding assistant that generates a function in one pass consumes thousands of tokens. A multi-agent coding pipeline that plans, implements, tests, reviews, and iterates consumes hundreds of thousands. The capability gain is real. So is the compute multiplication.
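The token multiplication described in the last two paragraphs can be sketched as simple arithmetic. The hidden-reasoning multiplier (10-30x) is the essay's measured figure [2]; the task size, agent count, and iteration count are hypothetical values chosen purely for illustration:

```python
# Illustrative sketch of per-task token consumption across architectural tiers.
# Only the hidden-reasoning multiplier is a measured figure [2]; everything
# else is an assumed, round-number scenario.

VISIBLE_TOKENS_PER_TASK = 2_000    # single-pass code generation (assumed)
HIDDEN_REASONING_MULTIPLIER = 20   # mid-range of the measured 10-30x [2]
AGENTS = 5                         # plan/implement/test/review/synthesize (assumed)
ROUNDS = 3                         # critique-and-iterate cycles (assumed)

def tokens_single_pass() -> int:
    """One generation pass, no visible-to-hidden multiplication."""
    return VISIBLE_TOKENS_PER_TASK

def tokens_chain_of_thought() -> int:
    """Every visible token drags 10-30 hidden 'thinking' tokens behind it."""
    return VISIBLE_TOKENS_PER_TASK * (1 + HIDDEN_REASONING_MULTIPLIER)

def tokens_multi_agent() -> int:
    """Each agent in each round runs its own full reasoning chain."""
    return AGENTS * ROUNDS * tokens_chain_of_thought()
```

Under these assumptions the tiers land at roughly 2,000, 42,000, and 630,000 tokens per task — thousands, tens of thousands, hundreds of thousands — matching the orders of magnitude the essay describes.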

This is why the arXiv formalization of the Structural Jevons Paradox in AI distinguishes it from ordinary rebound effects [Measured][8]. In the classical case — a more fuel-efficient car — rebound is bounded because driving has natural limits: time, roads, destination frequency. AI compute has no comparable natural ceiling. Every application that becomes economical at a lower price point generates new applications that were previously inconceivable. A model cheap enough to run continuously in the background enables agentic workflows. Agentic workflows cheap enough to deploy at scale enable multi-agent architectures. Multi-agent architectures cheap enough to standardize enable autonomous research pipelines. Each tier of cost reduction unlocks a tier of demand that consumes the savings and then some.

The data confirms the theory. Per-token costs down 1,000x. Spending up 320% to $37 billion [Measured][3]. Inference costs dropped 280-fold, yet the firms selling inference are burning cash at accelerating rates [Measured][9] [1]. This is not a lag between cost reduction and price adjustment. It is a structural dynamic where cost reduction is the cause of spending growth, not its cure.

II. The Ratchet: Why Sunk Capex Makes Retreat Costlier Than Continuation

OpenAI’s cumulative projected losses through 2029 stand at $115 billion [Measured][1]. That figure is not a mistake or a sign of mismanagement — it is a commitment device. Once a firm has invested tens of billions in GPU clusters, data center leases, and long-term power purchase agreements, the rational response to disappointing returns is not retreat. It is acceleration.

This is what the Theory of Recursive Displacement calls the Ratchet (MECH-014). Each round of capital expenditure creates obligations — lease payments, depreciation schedules, power contracts, customer expectations — that make the next round of spending more likely, not less. The decision to build a $10 billion data center in 2025 does not just commit the firm to $10 billion. It commits the firm to the revenue growth necessary to justify $10 billion, which in a competitive market means offering lower prices and more capable models, which means more inference demand, which means more data centers.

The Ratchet operates across the industry, not just within individual firms. Microsoft, Google, Meta, and Amazon are engaged in a simultaneous capex escalation where each firm’s investment raises the competitive floor for all others. Only 15% of enterprises report positive ROI on their AI deployments [Measured][4], and Forrester projects that 25% of enterprises will defer AI spending in 2026 due to unclear returns [Measured][4]. But the hyperscalers cannot defer. Their capex is committed years in advance, their competitive positioning depends on capacity leadership, and the Ratchet ensures that the rational response to disappointing adoption is to make inference cheaper — which, per the Jevons dynamic, generates more demand rather than more profit.

The ratchet is self-reinforcing under current architectural paradigms, but it is not irreversible in principle. Specific exit conditions exist: a sustained period where algorithmic efficiency gains outpace demand growth, a fundamental architectural shift that decouples capability from compute scale, or a coordinated industry pullback driven by capital market discipline. The question is whether any of these conditions are likely given the competitive dynamics in play. As of March 2026, the evidence points toward tightening, not loosening.

III. The Subscription Pricing Crisis: When Flat Rates Meet Exponential Demand

The consumer-facing manifestation of the inference cost paradox is the collapse of flat-rate subscription pricing. Rate limits have become the new normal across every major AI provider [Measured][10]. What was sold as “unlimited access” in 2024 is now gated by weekly usage caps, dynamic throttling, and tiered service degradation.

The economics are straightforward. When reasoning models consume 10-30x more compute per query than their predecessors, a $20/month subscription that was marginally profitable with GPT-3.5 becomes deeply unprofitable with o3 or Claude Opus. The firms cannot raise prices fast enough to match the compute multiplication without destroying their user bases. They cannot maintain flat rates without hemorrhaging cash. So they do what every utility facing demand it cannot profitably serve has done: they ration.

The rationing creates a two-tier system. Consumers who stay within modest usage limits get the product they were promised. Power users — developers, researchers, anyone running AI as a production tool rather than a curiosity — hit walls [Measured][10]. The promise of democratized AI intelligence runs headlong into the reality that intelligence, as currently architected, costs more to deliver the more intelligently it behaves.

This is not a transitional problem that disappears as scale improves unit economics. Scale is the problem. Every new user who discovers that AI can assist with complex reasoning tasks becomes a user who demands more reasoning tokens per session. The marginal cost of serving a power user is not marginally higher than serving a casual user — it is orders of magnitude higher, because the power user’s queries trigger deep reasoning chains that consume thousands of hidden tokens per visible output. The subscription model worked when AI was a glorified autocomplete. It breaks when AI thinks.

Current API pricing reflects this tension. Frontier reasoning models price output tokens at $3-15 per million [Measured][11], but multi-agent workflows can easily consume millions of tokens per task. A development team running an AI-assisted coding pipeline at moderate intensity might generate $500-2,000 in monthly API costs per developer — costs that were negligible eighteen months ago. The subscription crisis is not a pricing failure. It is the Jevons Paradox arriving at the consumer interface.
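A back-of-the-envelope version of that cost arithmetic, using the essay's measured $3-15 per million token price range [11]; the tokens-per-task and tasks-per-month figures are hypothetical assumptions chosen to show how quickly "cheap" tokens compound:

```python
# Monthly API cost per developer for an AI-assisted coding pipeline.
# Price is the mid-range of the measured $3-15 per million tokens [11];
# usage figures are illustrative assumptions, not measurements.

PRICE_PER_MILLION = 10.0  # USD per million output tokens (mid-range of [11])

def monthly_cost(tokens_per_task: int, tasks_per_month: int,
                 price_per_million: float = PRICE_PER_MILLION) -> float:
    """Dollars per developer per month at a given usage intensity."""
    return tokens_per_task * tasks_per_month * price_per_million / 1_000_000

# Moderate vs. heavy usage, assuming multi-agent tasks of 2-4M tokens each
low = monthly_cost(tokens_per_task=2_000_000, tasks_per_month=25)   # $500
high = monthly_cost(tokens_per_task=4_000_000, tasks_per_month=50)  # $2,000
```

At these assumed intensities the sketch reproduces the $500-2,000 monthly range cited above — costs that a single-pass model at 2023 usage levels would have delivered for pocket change.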

IV. The Energy Externality: When the Rebound Hits the Grid

The compute multiplication does not stay inside the data center. It propagates outward through the electrical grid to every ratepayer in the region.

Electricity prices have risen 36% since 2020 in markets serving major data center clusters [Measured][5]. The PJM Interconnection — the grid operator serving the data center corridor from Virginia to Illinois — has seen capacity costs reach $9.3 billion [Measured][6]. Residential customers in affected areas face projected bill increases of $16-18 per month [Measured][6], with some analyses projecting data center demand driving household bills up 8-25% by 2030 [Measured][5].

The backlash is already political. Ratepayer protection legislation is advancing in multiple states as utility customers — who have no relationship to AI services — discover they are subsidizing the industry’s compute appetite through their electricity bills [Measured][5]. The standard industry response is that data centers bring economic development and tax revenue. The counterargument is that a transfer from dispersed ratepayers to concentrated infrastructure owners is regressive regardless of secondary benefits.

The energy dimension of the inference cost paradox has a specific and important characteristic: it exhibits what energy economists call “backfire” — rebound greater than 100%, where efficiency improvements actually increase total resource consumption. Most consumer goods show rebound effects in the 10-30% range: a more efficient car leads to slightly more driving, but total fuel consumption still falls. AI compute is different for reasons that track the Jevons formalization. The demand elasticity is extreme because cheaper compute does not just make existing tasks cheaper — it makes entirely new categories of tasks possible. Each efficiency gain opens architectural possibilities (longer contexts, deeper reasoning chains, more agents) that consume multiples of the saved compute. The energy rebound in AI is not 10-30%. The data — 280-fold cost reduction, 320% spending increase — is consistent with rebound well above 100% [Estimated][3] [9].
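Under the simplifying assumption that price per token tracks energy per token (an assumption, not a measured relationship), the backfire claim reduces to arithmetic on the essay's two headline numbers:

```python
# Implied rebound from the essay's measured figures: a 280-fold cost
# reduction [9] and a 320% spending increase [3]. The price-tracks-energy
# assumption is a simplification introduced here for illustration.

EFFICIENCY_GAIN = 280.0    # cost (and, assumed, energy) per token fell 280-fold
SPENDING_MULTIPLIER = 4.2  # a 320% increase means ~4.2x the baseline spend

# Token volume implied by spending and price: q = spend / price
token_multiplier = SPENDING_MULTIPLIER * EFFICIENCY_GAIN   # ~1,176x baseline

# Energy relative to baseline, vs. the engineering expectation
# that demand holds constant while efficiency improves
actual_energy = token_multiplier / EFFICIENCY_GAIN         # ~4.2x baseline
expected_energy = 1 / EFFICIENCY_GAIN                      # ~0.4% of baseline

# Rebound as a fraction of the expected savings; above 1.0 (100%) is backfire
rebound = (actual_energy - expected_energy) / (1 - expected_energy)
```

The sketch yields rebound far above 1.0: total consumption did not merely fail to fall by the expected amount, it grew to roughly 4x the baseline. That is the backfire signature, conditional on the stated assumptions.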

The grid consequences are not hypothetical. They are already arriving in utility bills across the eastern United States, where the densest concentration of AI data centers meets an aging electrical infrastructure that was not designed for this kind of load. The capacity buildout required to serve AI demand competes with electrification of transportation and heating — two transitions that policymakers had assumed would have first claim on new generation capacity. AI’s Jevons-driven demand growth threatens to crowd out the very decarbonization investments that might otherwise offset its energy footprint.

This is where the inference cost paradox stops being an industry problem and becomes a public policy problem. The AI industry’s internal Jevons dynamic — cheaper tokens driving more token consumption — externalizes its costs onto electrical grids, ratepayers, and ultimately the climate trajectory of the regions that host its infrastructure.


Mechanisms at Work

Four mechanisms from the Theory of Recursive Displacement interact to produce the inference cost paradox. Understanding them as a system, rather than in isolation, explains why the paradox resists simple solutions.

Compute Feudalism (MECH-029) describes the concentration of AI infrastructure ownership despite the proliferation of open-weight models. The cost of inference has fallen. The cost of the infrastructure to run inference at scale has not. Building a frontier-capable data center requires billions in capital, long-term power purchase agreements, and relationships with semiconductor suppliers that operate on multi-year allocation cycles. The result is a feudal structure: a small number of infrastructure lords (Microsoft, Google, Amazon, Meta, Oracle) control the compute substrate on which all AI applications run, regardless of whether those applications use open or proprietary models. DeepSeek can release an open model that is 96% cheaper per token [Measured][7] — but running it at scale still requires renting capacity from the same hyperscalers.

The Ratchet (MECH-014) explains why the capex cycle accelerates rather than self-corrects. Sunk infrastructure investments create forward obligations that make additional investment the rational response to disappointing returns. OpenAI’s $115 billion projected cumulative loss [Measured][1] is not a bug — it is the Ratchet in action. The firm has committed to a trajectory where the only path to justifying prior investment is to invest more, grow faster, and make inference cheaper, which drives the Jevons cycle harder.

The Automation Trap (MECH-011) captures why efficiency gains from cheaper inference do not translate into cost savings at the firm level. When one company deploys multi-agent AI pipelines to accelerate development, its competitors must match or exceed that deployment to remain competitive. The efficiency gain accrues to the competitive equilibrium, not to any individual firm’s bottom line. Every firm spends more on AI compute. No firm gains a durable cost advantage. Total industry compute consumption ratchets upward.

Arbitrage Compression (MECH-030) operates at the international boundary. As AI inference becomes cheaper, the cost differential between AI-augmented onshore labor and offshore human labor narrows. Client firms shorten offshore contracts, reduce team sizes, and redirect work to AI-augmented domestic teams — not because AI has replaced offshore workers, but because the trajectory of AI cost decline makes long-term labor arbitrage commitments irrational. This demand-signal compression creates a secondary Jevons effect: firms that pull work onshore and augment with AI consume more inference compute than the offshore teams they replaced consumed in labor hours. The efficiency of AI relative to offshore labor generates additional AI demand.

The four mechanisms form a reinforcing loop. Compute Feudalism concentrates the infrastructure. The Ratchet locks infrastructure owners into escalating investment. The Automation Trap ensures that efficiency gains convert to competitive arms races rather than savings. And Arbitrage Compression opens new demand channels as AI substitutes for international labor arbitrage. Each mechanism feeds the others. Cheaper inference drives more demand (Jevons). More demand justifies more infrastructure (Ratchet). More infrastructure lowers unit costs (Feudalism). Lower unit costs enable new competitive applications (Automation Trap). New applications replace labor arbitrage with compute arbitrage (Compression). The loop tightens.

Where This Connects

This essay’s analysis of the inference cost paradox intersects with several threads in the Recursive Institute corpus. Compute Feudalism documents how infrastructure concentration (MECH-029) produces the market structure in which the Jevons dynamic operates — a small number of hyperscalers set the terms of access regardless of open-weight model availability. The Ratchet formalizes the irreversibility mechanism (MECH-014) that locks firms into escalating capex commitments even as unit economics improve. Arbitrage Compression shows how the inference cost decline reshapes international labor markets (MECH-030), adding a geopolitical dimension to the Jevons cycle. And Thinking in the Red documents the cognitive overhead of reasoning-model interaction (MECH-028), providing the individual-level complement to this essay’s industry-level analysis.


Counter-Arguments and Limitations

The thesis that AI inference exhibits a structural Jevons Paradox is strong enough to take seriously and uncertain enough to require serious qualification. Six objections merit direct engagement.

The Scope Problem: Frontier vs. Commodity Inference

The strongest version of the inference cost paradox applies to frontier reasoning models and multi-agent architectures — the segment of the market where chain-of-thought multiplies compute per task by 10-30x [Measured][2] and where multi-agent pipelines multiply it again. It does not apply with equal force to commodity inference: simple classification tasks, basic text generation, structured data extraction, and the vast category of AI applications that do not require deep reasoning.

Commodity inference follows more conventional cost curves. When the price of classifying an image falls by 90%, total spending on image classification may rise, but the rebound is bounded by the finite number of images that need classifying. The demand elasticity for commodity inference is high but not unbounded in the way that frontier reasoning demand appears to be.

This scope limitation matters. A significant share of enterprise AI spending — perhaps the majority — falls in the commodity category. If the paradox applies only to the frontier tier, the aggregate spending trajectory depends on how large the frontier tier becomes relative to the total market. The thesis holds if frontier and multi-agent inference grow to dominate the market, which current architectural trends suggest but do not guarantee. We scope our claim accordingly: the Structural Jevons Paradox is a frontier phenomenon that has not yet been demonstrated to dominate all of AI economics.

FOMO vs. Jevons: Is This Speculative Spending or Structural Rebound?

The 320% spending increase and the 15% positive ROI figure [Measured][3] [4] admit two very different explanations. The Jevons interpretation says that genuine demand expansion, driven by real use cases becoming economical at lower prices, is consuming the cost savings. The FOMO interpretation says that enterprises are overspending on AI due to competitive anxiety, executive hype cycles, and vendor pressure — and that when the hangover arrives, spending will contract sharply.

These explanations are not mutually exclusive, and distinguishing them empirically is difficult in real time. The 25% of enterprises that Forrester projects will defer AI spending in 2026 [Measured][4] could represent the first wave of FOMO correction. If deferral grows to 40-50% without a corresponding collapse in AI capability deployment, the FOMO explanation gains weight. If deferral stabilizes and spending resumes as multi-agent architectures mature, the Jevons explanation holds.

We acknowledge the ambiguity honestly. Some portion of current AI spending is speculative — driven by fear of missing the wave rather than by demonstrated return on investment. Our thesis requires that the structural Jevons component is large enough to dominate the aggregate trend even after speculative spending corrects. The evidence supports this but does not prove it. The 15% positive ROI figure is consistent with both interpretations: it could mean 85% of firms are wasting money on hype, or it could mean 85% of firms are in the early, unprofitable phase of deploying infrastructure that will eventually generate returns through demand expansion. We lean toward a mix — perhaps 40-50% structural rebound and 50-60% speculative — but the decomposition is genuinely uncertain.

The Energy Rebound Anomaly: Why AI Might Be Different

Energy economists have studied rebound effects for decades. The consensus for most goods and services is that rebound falls in the 10-30% range: efficiency improvements reduce total energy consumption, just by less than the engineering estimate would predict. Full backfire — where efficiency improvements increase total consumption — is rare in the literature.

Our claim that AI compute exhibits backfire-level rebound (well above 100%) requires explaining why AI is different from cars, refrigerators, lighting, and industrial processes. The explanation centers on demand elasticity. For most goods, demand has natural ceilings: you only drive so many miles, your house only needs so much light, your factory only runs so many hours. AI compute demand has no comparable ceiling because each cost reduction creates new use cases rather than just making existing use cases cheaper. A model cheap enough to run continuously enables agentic workflows. Agentic workflows cheap enough to deploy at enterprise scale enable autonomous operations. Each tier is a new demand category that did not exist before the cost threshold was crossed.

This argument is plausible but not proven. It is possible that AI demand will find its ceiling — that once the major use cases (coding assistance, customer service, data analysis, content generation) are saturated, demand growth will slow and rebound will fall into the normal 10-30% range. The honest position is that we are arguing by structural analogy with early computing (which did exhibit demand expansion across decades) rather than from a settled empirical base specific to AI inference. The energy rebound literature for AI compute is in its infancy. We treat backfire as the leading hypothesis, not as established fact.

DeepSeek as a Ratchet-Breaker

DeepSeek R1 demonstrated that architectural innovation can deliver frontier-class reasoning at 96% lower cost than incumbent models [Measured][7]. The standard interpretation — that DeepSeek proves costs will keep falling — is the optimist’s reading. But there is a more disruptive possibility: DeepSeek-type innovations could break the Ratchet (MECH-014) by making large-scale capex obsolete rather than merely cheaper.

If a research lab operating under semiconductor export restrictions can match frontier capability at a fraction of the cost, the implied lesson is that the hyperscaler capex arms race may be strategically misallocated — that raw compute scale is not the binding constraint on AI capability. A sustained stream of DeepSeek-style breakthroughs could erode the capex rationale for the entire hyperscaler buildout, causing a capital reallocation event rather than a smooth efficiency gain.

We take this possibility seriously. It is the strongest single threat to the thesis. If architectural efficiency gains consistently outpace demand expansion — if each generation of models requires less compute per capability unit faster than users find new capability units to demand — the Jevons cycle breaks. Our assessment is that this has not happened yet (the 320% spending increase postdates DeepSeek’s emergence [Measured][3]), and that the more likely near-term effect of DeepSeek-style efficiency is to accelerate the Jevons dynamic by making frontier inference accessible to a larger market. But we flag it as the mechanism most likely to falsify the thesis over a 3-5 year horizon.

Demand Elasticity as the Empirical Crux

The entire thesis rests on a claim about the demand elasticity of AI compute: that demand is elastic enough for cost reductions to generate more than proportional increases in consumption. If AI compute demand proves inelastic beyond current adoption — if the use cases plateau, if enterprises find they need “enough AI” rather than “more AI” — the Jevons Paradox weakens to a temporary adoption-phase phenomenon.

We cannot resolve this empirically in March 2026. The data so far — 280-fold cost reduction, 320% spending increase [Measured][3] [9] — is consistent with very high demand elasticity. But the observation window is short. The current surge could be a one-time demand release as AI crosses a capability threshold, after which demand growth normalizes. Or it could be the first iteration of a recursive cycle where each cost reduction opens new architectural possibilities that generate further demand. The thesis depends on the latter. We believe the architectural evidence (chain-of-thought, multi-agent, agentic) supports it, but the question is ultimately empirical and will be resolved by the spending and capability data of 2027-2028.
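The elasticity condition can be made precise. Treating the essay's measured figures as a single log-log arc elasticity — a rough sketch, since the observation window is one episode rather than a traced demand curve — gives:

```python
import math

# Implied price elasticity of demand for inference tokens, from the
# measured figures: price per token down 280-fold [9], spending up 4.2x
# (a 320% increase) [3]. An absolute elasticity above 1 is precisely the
# condition under which falling prices raise total spending.

PRICE_RATIO = 1 / 280.0                          # p1 / p0
SPENDING_RATIO = 4.2                             # s1 / s0
QUANTITY_RATIO = SPENDING_RATIO / PRICE_RATIO    # q = s / p, so ~1,176x

# Arc elasticity in log-log form: % change in quantity per % change in price
elasticity = math.log(QUANTITY_RATIO) / math.log(PRICE_RATIO)
```

The implied elasticity is roughly -1.25: demand is elastic, so every cost reduction grows the market rather than shrinking the bill. The thesis holds only if that magnitude stays above 1 as adoption matures; the 2027-2028 data will show whether it does.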

The Irreversibility Qualification

An earlier formulation of this thesis described the capex cycle as “irreversible.” That was too strong. The cycle is self-reinforcing under current architectural paradigms — meaning that within the existing framework of transformer-based models, GPU-dependent training and inference, and competitive dynamics among well-funded incumbents, the Ratchet tightens rather than loosens. But paradigms shift. Neuromorphic computing, quantum speedups for specific inference tasks, or a fundamental change in AI architecture could alter the substrate economics enough to break the cycle. We specify the architectural dependency explicitly: the thesis applies to the current paradigm, not to all possible futures.


What Would Change Our Mind

Five conditions, any of which would substantially weaken or falsify the thesis:

  1. Sustained spending decline despite capability growth. If global AI inference spending falls 20% or more for two consecutive quarters while model capabilities continue improving — indicating that efficiency gains are outrunning demand expansion — the Jevons dynamic is broken. The key is that capability must still be advancing; a spending decline driven by capability stagnation would confirm rather than falsify the thesis.

  2. Commodity dominance of the inference market. If, by 2028, more than 70% of AI inference spending goes to commodity tasks (classification, extraction, simple generation) rather than frontier reasoning and multi-agent architectures, the scope limitation described above means the paradox is real but economically secondary.

  3. Hyperscaler capex plateau without competitive retreat. If Microsoft, Google, Amazon, and Meta collectively reduce AI-related capital expenditure by 30% or more while maintaining competitive capability parity — indicating that the Ratchet can be loosened without competitive penalty — the self-reinforcing dynamic is weaker than claimed.

  4. Architectural disruption that decouples capability from compute. If a successor to the transformer architecture achieves frontier reasoning capability at 10x lower compute per inference, and this gain is not consumed by demand expansion within 18 months, the Jevons cycle has a structural ceiling.

  5. Enterprise ROI normalization. If the share of enterprises reporting positive ROI on AI deployments rises from 15% to above 60% [Measured][4] while total spending growth slows below 20% annually, the market is maturing normally rather than spiraling through a paradox.


Confidence and Uncertainty

Central estimate: 60-70% that the Structural Jevons Paradox accurately describes the dominant dynamic in frontier AI inference economics through 2028.

This is calibrated downward from the THEORIST’s initial 82-90% estimate based on the ADVERSARY caveats engaged above. The largest uncertainty sources, in order:

  1. Demand elasticity (accounts for ~15% of our uncertainty). If AI compute demand proves less elastic than the 2023-2026 data suggests — if the use cases plateau — the paradox weakens to a transitional phenomenon. We cannot resolve this empirically yet.

  2. FOMO vs. structural decomposition (~10%). We cannot cleanly separate speculative spending from genuine Jevons rebound in the current data. A FOMO correction in 2026-2027 could make the paradox appear weaker than it is structurally.

  3. Architectural disruption probability (~5%). DeepSeek-type breakthroughs that fundamentally alter the cost-capability frontier could break the Ratchet. Low probability in any given year, but compounding over a 3-5 year horizon.

The 30-40% probability we assign to being wrong is not distributed across “maybe everything is fine.” It concentrates in two specific scenarios: a demand elasticity ceiling that makes the paradox temporary, or an architectural paradigm shift that makes it obsolete. Both are plausible. Neither is yet supported by the data.


Implications

For Policy

The energy externality dimension demands regulatory attention. Ratepayers who have no relationship to AI services are absorbing 8-25% electricity bill increases to subsidize data center demand [Measured][5]. The policy question is not whether to permit AI data centers — it is whether the cost allocation to non-beneficiary ratepayers represents an implicit subsidy that should be made explicit and debated democratically. States advancing ratepayer protection legislation are responding to a real transfer, not an imagined one [Measured][5].

Carbon accounting frameworks need to incorporate the Jevons multiplier. Current estimates of AI’s energy footprint based on per-query consumption dramatically understate the system-level impact when demand expansion is factored in. A model that uses 30x more energy per query but serves 100x more queries is not 30x worse for the grid — it is 3,000x worse.

For Industry

The subscription pricing model for AI services is structurally unsustainable for frontier reasoning. Firms that offer flat-rate access to models with unbounded reasoning depth will either impose rate limits (as every major provider has already done [Measured][10]), shift to usage-based pricing, or burn through capital at rates that require continuous fundraising. OpenAI’s $110 billion raise at $730 billion valuation [Measured][12] is not a sign of health — it is the Ratchet demanding its next feeding.
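The flat-rate arithmetic can be sketched in a few lines. All parameter values below are hypothetical assumptions for illustration; none come from any actual provider's pricing or cost data:

```python
# Hypothetical sketch: gross margin on one flat-rate subscriber per month.
# Every number below is an assumption chosen only to illustrate the shape
# of the problem, not a real provider figure.

def monthly_margin(subscription_price: float,
                   tokens_per_user: float,
                   cost_per_million_tokens: float) -> float:
    """Revenue minus inference cost for a single subscriber."""
    inference_cost = tokens_per_user / 1_000_000 * cost_per_million_tokens
    return subscription_price - inference_cost

# A 10x drop in per-token cost loses to a 200x jump in tokens consumed
# once users lean on deep reasoning and multi-agent workflows.
print(monthly_margin(20.0, 1_000_000, 10.0))    # 10.0 (light chat usage)
print(monthly_margin(20.0, 200_000_000, 1.0))   # -180.0 (heavy reasoning)
```

With unbounded reasoning depth, the tokens-per-user term has no natural ceiling, which is why rate limits or usage-based pricing are the only stable responses.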

Enterprise customers should plan for AI costs that increase with capability, not decrease. The per-token price will continue to fall. The per-task cost will continue to rise as tasks incorporate deeper reasoning and multi-agent coordination. Budget models that assume AI costs trend downward will consistently undershoot actual spending.
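The divergence between per-token price and per-task cost can be sketched directly. The numbers are hypothetical, chosen only to show how a large price drop can still produce a higher task cost:

```python
# Hypothetical budget sketch: per-token price falls, but tokens per task
# grow faster. All values are illustrative assumptions.

def per_task_cost(price_per_million_tokens: float, tokens_per_task: float) -> float:
    """Dollar cost of one task at a given token price and consumption."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens

single_shot = per_task_cost(30.0, 2_000)    # roughly $0.06: one completion
agentic = per_task_cost(0.3, 2_000_000)     # roughly $0.60: multi-agent chain

# Price fell 100x; tokens per task grew 1000x; cost per task rose ~10x.
print(single_shot, agentic)
```

A budget model that tracks only the first parameter will miss the second, which is the one that actually moves.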

The competitive implication is stark. Firms that adopt AI-augmented workflows do not gain a durable cost advantage — they gain a temporary capability advantage that competitors match by adopting the same workflows, driving industry-wide compute consumption upward while individual margins remain flat or negative. This is the Automation Trap (MECH-011) at the firm level: the efficiency gain is real, the cost saving is illusory, and the firms that refuse to participate fall behind without any of the participants pulling ahead.

For Research

The Structural Jevons Paradox in AI compute is a testable hypothesis that the research community should engage empirically rather than dismissing as pessimism or accepting as inevitable. The key empirical questions are demand elasticity measurement (how much does a 10% cost reduction increase inference volume?), the decomposition of spending growth into Jevons rebound versus speculative FOMO, and the architectural conditions under which efficiency gains outpace demand expansion.
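The elasticity question can be posed concretely. Below is a sketch of a two-point log-log elasticity estimate; the inputs are derived from the article's headline figures (a 1000x price drop and 320% spending growth, which together imply roughly a 4200x volume increase) purely for illustration, not a real estimation from data:

```python
import math

# Two-point estimate of demand elasticity in log-log form. Inputs are
# derived from the article's headline figures for illustration only:
# a 1000x price drop with a 320% spending increase implies volume grew
# about 4.2 * 1000 = 4200x.

def log_log_elasticity(price_0: float, price_1: float,
                       volume_0: float, volume_1: float) -> float:
    """Elasticity e such that volume ~ price**e between the two points."""
    return math.log(volume_1 / volume_0) / math.log(price_1 / price_0)

e = log_log_elasticity(price_0=1.0, price_1=0.001,
                       volume_0=1.0, volume_1=4200.0)
print(round(e, 2))  # about -1.21: |e| > 1 is the backfire (Jevons) regime
```

An elasticity magnitude above 1 means total spending rises as unit price falls, which is exactly the backfire condition the rebound literature defines.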

Energy rebound literature should incorporate AI as a case study. If AI compute exhibits sustained backfire-level rebound, it would be among the most significant rebound phenomena ever documented: relevant not just to AI policy but to the broader theory of how efficiency improvements interact with demand in general-purpose technologies.

For AI Governance

The inference cost paradox has implications for safety governance that are not yet part of the mainstream discourse. If the economic dynamics of AI force continuous capability escalation — because the Ratchet demands revenue growth that requires more capable models that require more inference — then governance frameworks that assume capability development is a choice rather than an economic compulsion are misspecified. The firms building frontier AI may not have the option of slowing down, even if they wanted to, because the capex commitments they have already made require the revenue growth that only more capable (and more compute-intensive) models can deliver. Safety governance needs to account for the possibility that the industry’s economic structure has removed voluntarism from the equation.


Conclusion

The inference cost paradox is not a paradox at all, once you understand the mechanism. Cheaper tokens do not mean cheaper AI. They mean more tokens — consumed by deeper reasoning, longer contexts, more agents, and use cases that were inconceivable at the prior price point. The thousandfold reduction in per-token cost has accompanied a 320% increase in total spending [3], and the dynamics that produced this outcome — the Ratchet locking in capex, the Automation Trap converting efficiency to arms races, Compute Feudalism concentrating infrastructure control — are self-reinforcing under every architectural paradigm currently in deployment.

This will not end when costs “come down.” Costs have already come down. That is precisely the problem. The path to sustainability in AI inference runs not through cheaper tokens but through architectural constraints on token consumption — a direction the industry has no competitive incentive to pursue. Until the demand elasticity of AI compute finds its ceiling, or until an architectural paradigm shift breaks the Jevons cycle, the industry’s defining contradiction will persist: the cheaper AI gets, the less affordable it becomes.


Sources

[1] “OpenAI Adds $111 Billion to Its Cash Burn Forecast as AI Costs Spiral Beyond Projections,” The Decoder, 2026. https://the-decoder.com/openai-adds-111-billion-to-its-cash-burn-forecast-as-ai-costs-spiral-beyond-projections/ [verified]

[2] “Chain-of-Thought Reasoning Uses 30x More Energy: The Hidden Cost of Thinking AI,” AI505, 2026. https://ai505.com/chain-of-thought-reasoning-uses-30x-more-energy-the-hidden-cost-of-thinking-ai/ [verified]

[3] “The Inference Cost Paradox: Why Generative AI Spending Surged 320% in 2025 Despite Per-Token Costs Dropping 1000x,” Artur Markus, 2025. https://www.arturmarkus.com/the-inference-cost-paradox-why-generative-ai-spending-surged-320-in-2025-despite-per-token-costs-dropping-1000x-and-what-it-means-for-your-ai-budget-in-2026/ [verified]

[4] “The AI Reckoning: Why Microsoft and the Software Giants Are Facing a 2026 Reality Check,” Market Minute, 2026. https://markets.financialcontent.com/stocks/article/marketminute-2026-3-25-the-ai-reckoning-why-microsoft-and-the-software-giants-are-facing-a-2026-reality-check [verified]

[5] “AI Data Centers, Electricity Prices, Backlash, Ratepayer Protection,” CNBC, 2026. https://www.cnbc.com/2026/03/13/ai-data-centers-electricity-prices-backlash-ratepayer-protection.html [verified]

[6] “AI Data Centers and Electricity Prices,” Bloomberg Graphics, 2025. https://www.bloomberg.com/graphics/2025-ai-data-centers-electricity-prices/ [verified]

[7] “Open Source Revolution: How DeepSeek R1 Challenges OpenAI’s o1 with Superior Processing Cost Efficiency,” VentureBeat. https://venturebeat.com/ai/open-source-revolution-how-deepseek-r1-challenges-openais-o1-with-superior-processing-cost-efficiency [verified]

[8] “Structural Jevons Paradox in AI,” arXiv:2501.16548, 2025. https://arxiv.org/abs/2501.16548 [verified]

[9] “Inference Costs Dropped 280-Fold, Algorithmic Efficiency 3x/Year,” arXiv:2511.23455v1, 2025. https://arxiv.org/html/2511.23455v1 [verified]

[10] “AI Usage Limits Are Becoming the New Reality for Consumers,” PYMNTS, 2026. https://www.pymnts.com/artificial-intelligence-2/2026/ai-usage-limits-are-becoming-the-new-reality-for-consumers/ [verified]

[11] “AI API Pricing Comparison: Grok, Gemini, OpenAI, Claude,” Intuition Labs, 2026. https://intuitionlabs.ai/articles/ai-api-pricing-comparison-grok-gemini-openai-claude [verified]

[12] “OpenAI Pre-IPO: $110B Raise, $730B Valuation,” Tech Market Briefs, 2026. https://techmarketbriefs.com/pre-ipo/openai/ [verified]

[13] “Anthropic Lowers Profit Margin Projection as Revenue Skyrockets,” The Information, 2026. https://www.theinformation.com/articles/anthropic-lowers-profit-margin-projection-revenue-skyrockets [verified]