“From Subscriptions to Usage-Based Billing”: After Anthropic and OpenAI, Microsoft Revamps Pricing as the Era of ‘AI Maxxing’ Nears

Picture

Member for

11 months

Real name

Siobhán Delaney

Bio

Siobhán Delaney is a Dublin-based writer for The Economy, focusing on culture, education, and international affairs. With a background in media and communication from University College Dublin, she contributes to cross-regional coverage and translation-based commentary. Her work emphasizes clarity and balance, especially in contexts shaped by cultural difference and policy translation.

Authored On

Jun 26, 2026 04:04

Modified

Jun 26, 2026 04:06

Flat-Rate Pricing Officially Ends as Usage-Based Billing Gains Ground
Explosive Token Consumption Driven by AI Replacing Human Workflows
Companies Rein in Unlimited Token Usage, Reallocate AI Models by Task

The pricing structure for generative artificial intelligence (AI) models is undergoing a fundamental transformation. As more enterprise customers use AI agents under subscription plans costing only a few dozen dollars per month to perform workloads equivalent to those handled by a full-time employee, subscription-based pricing alone has become increasingly insufficient to cover the underlying computing costs. With major AI companies rapidly expanding usage-based billing models, enterprise customers are likewise reshaping their AI deployment strategies around productivity per dollar spent.

Microsoft Overhauls Flat-Rate AI Pricing, Expands Pay-as-You-Go Copilot

According to Microsoft on June 25 (local time), Copilot Cowork, an AI agent capable of autonomously carrying out tasks across Word, Excel, PowerPoint, Outlook and other Microsoft applications, has officially launched after completing a three-month preview period. During the preview, Copilot Cowork was included in the $30-per-month Microsoft 365 Copilot license at no additional cost. Since its official launch on June 18, however, customers have been charged separately on a pay-as-you-go basis in addition to the existing subscription fee.

Pricing depends on factors including which AI model performs the task, how long execution takes, and how much data is processed. Lightweight tasks such as organizing schedules cost approximately $1 to $3, intermediate tasks such as creating presentation materials from email content cost roughly $4 to $7, while advanced tasks including analyzing six months of historical data are priced at more than $7.
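The per-task price bands above can be turned into a rough monthly estimate. The following is a minimal sketch, not Microsoft's billing logic: the function name, the tier labels, and the example task mix are hypothetical, and the real charge depends on the model used, execution time, and data volume. Because the article gives no upper bound for advanced tasks, the sketch reports the high end as unknown whenever advanced tasks are included.

```python
# Per-task price bands in USD, as reported in the article.
# "advanced" is described only as "more than $7", so it has no upper bound
# and 7.0 is a floor rather than an exact minimum.
TASK_PRICE_BANDS = {
    "lightweight": (1.0, 3.0),
    "intermediate": (4.0, 7.0),
    "advanced": (7.0, None),
}

# Monthly Microsoft 365 Copilot license fee cited in the article.
SUBSCRIPTION_USD = 30.0


def estimate_monthly_bill(task_counts, subscription=SUBSCRIPTION_USD):
    """Return (low, high) bounds on a monthly bill in USD.

    task_counts maps a tier label to the number of tasks run that month.
    high is None when the upper bound cannot be determined.
    """
    low = subscription
    high = subscription
    for tier, count in task_counts.items():
        if count == 0:
            continue
        band_low, band_high = TASK_PRICE_BANDS[tier]
        low += band_low * count
        if band_high is None or high is None:
            high = None  # an unbounded tier makes the total unbounded
        else:
            high += band_high * count
    return low, high


# Hypothetical month: 20 lightweight tasks and 5 intermediate tasks.
print(estimate_monthly_bill({"lightweight": 20, "intermediate": 5}))
```

Under these assumptions, a user running 20 lightweight and 5 intermediate tasks in a month would pay between $70 and $125 including the subscription, several times the license fee alone.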

Microsoft described the move as its first major pricing overhaul in nearly two decades. Just as the company shifted from perpetual software licenses to subscription-based services in 2010, it said the latest change represents another fundamental transformation of its business model. Charles Lamanna, Microsoft's Corporate Vice President for Business and Industry Copilot, Agents and Platforms, told media outlets that "for Microsoft, which has operated a monthly subscription business for nearly 20 years, this represents a meaningful and significant evolution."

Microsoft's transition to usage-based billing reflects the mounting cost of AI computation. Lamanna noted that some users complete hundreds of tasks per week using Copilot Cowork, adding, "That's exactly what we want to see, and their productivity is tremendous, but our costs can also increase dramatically." He added that "usage-based pricing is the only way to make the Copilot Cowork model sustainable."

AI Usage Rewrites the Revenue Formula

The spread of AI agents is accelerating the industry's shift toward consumption-based pricing. Unlike a chatbot, which answers a prompt once, an AI agent works through many steps per request: it calls multiple AI models, searches enterprise databases, and interacts with external systems, and token consumption rises sharply as a result. As tasks become more complex and execution times lengthen, graphics processing unit (GPU) workloads increase accordingly. According to Anthropic, enterprise developers spend an average of $13 per day using Claude Code, a level of usage that flat-rate plans priced at $20 to $30 per month can hardly sustain profitably.
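The gap between the $13 daily figure and a $20 to $30 monthly plan can be made concrete with simple arithmetic. The sketch below assumes 21 working days per month, a number the article does not state, and treats the reported daily spend as a proxy for the value of compute consumed. The function name is hypothetical.

```python
# Average Claude Code spend per enterprise developer per day (from the article).
DAILY_SPEND_USD = 13.0

# Assumption, not from the article: working days in a typical month.
WORKING_DAYS_PER_MONTH = 21


def monthly_gap(flat_fee, daily_spend=DAILY_SPEND_USD, days=WORKING_DAYS_PER_MONTH):
    """Compare a month of usage at daily_spend against a flat monthly fee.

    Returns (monthly_usage, shortfall, ratio), where shortfall is the usage
    not covered by the fee and ratio is usage divided by the fee.
    """
    monthly_usage = daily_spend * days
    shortfall = monthly_usage - flat_fee
    ratio = monthly_usage / flat_fee
    return monthly_usage, shortfall, ratio


for fee in (20.0, 30.0):
    usage, shortfall, ratio = monthly_gap(fee)
    print(f"fee ${fee:.0f}: usage ${usage:.0f}, shortfall ${shortfall:.0f}, ratio {ratio:.1f}x")
```

With these inputs, monthly usage comes to $273, roughly 9 to 14 times the flat fee, which illustrates why a fixed subscription struggles to cover heavy agent use.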

In response, Anthropic converted its Claude Enterprise offering from a flat-rate subscription to a usage-based pricing model in April. Previously, customers paid up to $200 per user each month with token usage included. Under the new structure, users pay a base fee of $20 per month plus additional charges based on actual computing consumption. The growing popularity of compute-intensive products such as Claude Code had turned the previous flat-rate model into a burden on profitability. OpenAI has also recently introduced usage-based pricing for its Codex AI coding agent. The company simultaneously expanded its Pro subscription lineup by adding a $100 monthly option alongside its existing $200 plan. Both Pro tiers retain the same branding and continue to provide access to the flagship GPT-5.4 Pro reasoning model, unlimited file uploads, and other core capabilities.

The AI industry increasingly views this transition as a turning point in the economics of generative AI. Following the launch of ChatGPT in late 2022, AI companies focused on attracting users through flat-rate subscriptions priced around $20 to $30 per month. A larger user base accelerated ecosystem growth and made it easier to secure enterprise customers. At that stage, expanding market share took priority over the heavy cost of GPU investment.

The widespread adoption of AI agents has fundamentally altered that cost structure. While a chatbot interaction typically involves a single exchange of prompt and response, AI agents repeatedly plan tasks, gather information, and validate results. A single assignment, whether drafting emails, organizing meeting notes, searching documents, analyzing data, or producing presentations, may trigger dozens of model invocations. As AI usage rises, revenue increases alongside it, but infrastructure expenses have begun climbing at the same pace. Consequently, AI companies have reassessed their pricing logic. More enterprise customers are assigning AI workloads equivalent to more than one employee while paying only $20 to $30 per month, leading providers to conclude that existing pricing no longer reflects the value delivered.

From Token Competition to Productivity Competition

The shift is also reshaping how enterprise customers deploy AI. Under flat-rate subscriptions, customers benefited by maximizing usage within a fixed monthly fee. Under usage-based billing, however, token consumption, model selection, task complexity, and execution time all translate directly into costs. Competitive advantage is therefore shifting away from companies that simply consume the most AI resources toward those capable of deploying AI efficiently for the most appropriate tasks.

This changing environment has also reframed the Silicon Valley concept of "token maxxing." Initially, the term referred to maximizing productivity by invoking AI models as frequently as possible. Companies including Meta and Amazon even published internal token-consumption leaderboards to encourage AI adoption among developers. As usage-based pricing spreads, however, the meaning of token maxxing is evolving. Productivity per task and cost efficiency, rather than raw token consumption, are increasingly determining corporate competitiveness. Deciding which workloads justify premium frontier models and which can be handled by lower-cost alternatives is becoming an integral part of enterprise AI strategy.

Within enterprises, token consumption has already become a managed operating expense. According to the Financial Times, companies including Amazon, Walmart, Cisco, Uber, and Meta have introduced usage caps and budget controls as employee spending on AI tools has risen rapidly. Uber has established a monthly spending cap of $1,500 per employee for AI coding tools, while Walmart and Cisco have implemented workload-specific usage policies. Some organizations have also begun assigning legacy or open-source models instead of the latest high-performance models for certain tasks in an effort to reduce costs.

The practice of matching AI models to task complexity is also becoming increasingly common. Legal AI startup Harvey provides a prominent example. Harvey's monthly token usage surged from approximately 1 trillion tokens in January to between 12 trillion and 13 trillion tokens in May. The company said premium models remain appropriate for sophisticated legal analysis, while inexpensive models are sufficient for straightforward document summarization. As token usage continues climbing, companies have little choice but to optimize both model allocation by workload complexity and return on investment (ROI).

For example, OpenAI's GPT-5.5 costs $5 per one million input tokens and $30 per one million output tokens. GPT-5.4, by comparison, costs $2.50 per one million input tokens and $15 per one million output tokens, half as much on both counts. The strategy is to assign GPT-5.4 to routine queries and calculations while reserving GPT-5.5 for more sophisticated tasks.

"In the past, the key question was how extensively companies used cutting-edge AI," one technology industry official said. "Going forward, the focus will shift to how effectively AI is deployed to generate measurable business value."

Industry observers also expect AI selection across the broader market to become increasingly differentiated according to practical use cases. Citadel Securities said earlier this month that frontier AI would become concentrated among a relatively small group of companies possessing the financial resources, research capabilities, and commercial applications necessary to justify the expense, while the remainder of the market would increasingly rely on simpler AI models, resulting in what it described as an "AI bifurcation."
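The GPT-5.5 and GPT-5.4 prices quoted above make it possible to sketch what routing by task complexity might save. This is an illustration under stated assumptions rather than any company's actual routing system: the dictionary keys are plain labels, the workload mix and token counts are hypothetical, and the "complex or not" flag stands in for whatever classification an enterprise would use in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # USD per one million input tokens
    output_per_million: float  # USD per one million output tokens


# Prices as quoted in the article; the keys are labels chosen for this sketch.
PRICES = {
    "gpt-5.5": ModelPrice(5.00, 30.00),
    "gpt-5.4": ModelPrice(2.50, 15.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Cost in USD of one task on the given model."""
    price = PRICES[model]
    total = input_tokens * price.input_per_million + output_tokens * price.output_per_million
    return total / 1_000_000


def workload_cost(tasks, always_premium=False):
    """Total cost of a workload.

    tasks is a list of (is_complex, input_tokens, output_tokens) tuples.
    With always_premium=False, only complex tasks go to the pricier model.
    """
    total = 0.0
    for is_complex, input_tokens, output_tokens in tasks:
        model = "gpt-5.5" if (always_premium or is_complex) else "gpt-5.4"
        total += task_cost(model, input_tokens, output_tokens)
    return total


# Hypothetical workload: 90 routine tasks and 10 complex ones.
TASKS = [(False, 10_000, 2_000)] * 90 + [(True, 50_000, 10_000)] * 10

print(f"routed:      ${workload_cost(TASKS):.2f}")
print(f"all premium: ${workload_cost(TASKS, always_premium=True):.2f}")
```

For this hypothetical mix, routing costs about $10.45 against about $15.40 when everything runs on the premium model, a saving of roughly one third. The actual saving depends entirely on how much of a workload is routine.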
