G360 Technologies

Agentic AI Gets Metered: Vertex AI Agent Engine Billing Goes Live


On January 28, 2026, Google Cloud will begin billing for three core components of Vertex AI Agent Engine: Sessions, Memory Bank, and Code Execution. This change makes agent state, persistence, and sandboxed execution first-class, metered resources rather than implicitly bundled conveniences.

Vertex AI Agent Engine, formerly known as the Reasoning Engine, has been generally available since 2025, with runtime compute billed based on vCPU and memory usage. But key elements of agent behavior, including session history, long-term memory, and sandboxed code execution, operated without explicit pricing during preview and early GA phases.

In December 2025, Google updated both the Vertex AI pricing page and Agent Builder release notes to confirm that these components would become billable starting January 28, 2026. With SKUs and pricing units now published, the platform moves from a partially bundled cost model to one where agent state and behavior are directly metered.

How The Mechanism Works

Billing for Vertex AI Agent Engine splits across compute execution and agent state persistence.

Runtime compute is billed using standard Google Cloud units. Agent Engine runtime consumes vCPU hours and GiB-hours of RAM, metered per second with idle time excluded. Each project receives a monthly free tier of 50 vCPU hours and 100 GiB-hours of RAM, after which usage is charged at published rates.
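The free-tier offset above can be sketched as a simple estimator. The 50 vCPU-hour and 100 GiB-hour allowances come from the announcement; the per-hour rates below are hypothetical placeholders, not published prices, and the real figures live on the Vertex AI pricing page.

```python
FREE_VCPU_HOURS = 50.0       # monthly free tier per project
FREE_RAM_GIB_HOURS = 100.0

def runtime_cost(vcpu_seconds, ram_gib_seconds,
                 vcpu_rate=0.0994, ram_rate=0.0105):
    """Estimate monthly runtime compute cost after the free tier.

    vcpu_rate / ram_rate are HYPOTHETICAL $/hour placeholders.
    Metering is per second with idle time excluded, so the inputs
    should be active seconds only.
    """
    vcpu_hours = vcpu_seconds / 3600
    ram_gib_hours = ram_gib_seconds / 3600
    billable_vcpu = max(0.0, vcpu_hours - FREE_VCPU_HOURS)
    billable_ram = max(0.0, ram_gib_hours - FREE_RAM_GIB_HOURS)
    return billable_vcpu * vcpu_rate + billable_ram * ram_rate

# An agent active for 80 vCPU-hours and 160 GiB-hours in a month
# pays only for the 30 vCPU-hours and 60 GiB-hours above the free tier.
cost = runtime_cost(80 * 3600, 160 * 3600)
```

Usage that stays under both free-tier thresholds, such as 10 vCPU-hours and 10 GiB-hours, produces a zero estimate.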

Sessions are billed by stored, content-bearing events, not by duration. Billable events include user messages, model responses, function calls, and function responses. System control events, such as checkpoints, are explicitly excluded. Pricing follows a per-event storage model, with published examples quoted per 1,000 events, rather than compute time.
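The per-event rule can be made concrete with a small counter. The event type names here are illustrative stand-ins, not Vertex AI schema fields, and the per-1,000-event price is a hypothetical placeholder; the metering rule is the point: content-bearing events bill, system control events do not.

```python
# Event types that carry content and are therefore billable.
BILLABLE_TYPES = {"user_message", "model_response",
                  "function_call", "function_response"}

def billable_event_count(events):
    """Count stored events billed under the per-event model.

    System control events (e.g. checkpoints) fall through and
    are excluded, matching the announced billing rule.
    """
    return sum(1 for e in events if e["type"] in BILLABLE_TYPES)

def estimated_session_cost(events, price_per_1000=0.25):
    # price_per_1000 is a HYPOTHETICAL placeholder rate.
    return billable_event_count(events) / 1000 * price_per_1000

session = [
    {"type": "user_message"},
    {"type": "function_call"},
    {"type": "function_response"},
    {"type": "checkpoint"},       # control event: not billed
    {"type": "model_response"},
]
```

For the five-event session above, only four events count toward the bill.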

Memory Bank is billed based on the number of memories stored and returned. Unlike session events, which capture raw conversational turns, Memory Bank persists distilled, long-term information extracted from sessions. Configuration options determine what content is considered meaningful enough to store. Each stored or retrieved memory contributes to billable usage.
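A minimal sketch of that metering, assuming stored and retrieved memories are counted as separate billable operations (the method names below are illustrative, not SDK calls):

```python
from collections import Counter

class MemoryMeter:
    """Track Memory Bank operations that contribute to billable usage."""

    def __init__(self):
        self.ops = Counter()

    def record_store(self, n=1):
        # A distilled memory persisted from a session.
        self.ops["stored"] += n

    def record_retrieve(self, n=1):
        # A memory returned to an agent on a later turn or session.
        self.ops["retrieved"] += n

    def billable_units(self):
        # Each stored or retrieved memory contributes to billable usage.
        return self.ops["stored"] + self.ops["retrieved"]
```

Instrumenting both directions matters: a memory stored once but retrieved on every return visit accrues usage on each retrieval, not just at write time.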

Code Execution allows agents to run code in an isolated sandbox. This sandbox is metered similarly to runtime compute, using per-second vCPU and RAM consumption, with no charges for idle time. Code Execution launched in preview in 2025 and begins billing alongside Sessions and Memory Bank in January 2026.

What This Looks Like In Practice

Consider a customer service agent handling 10,000 conversations per month. Each conversation averages 12 events: a greeting, three customer messages, three agent responses, two function calls to check order status, two function responses, and a closing message.

That is 120,000 billable session events per month, before accounting for Memory Bank extractions or any code execution. If the agent also stores a memory for each returning customer and retrieves it on subsequent visits, memory operations add another layer of metered usage.
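The arithmetic above can be reproduced directly. The event breakdown comes from the example; the share of returning customers is an assumption added here for illustration.

```python
conversations = 10_000

# Greeting + 3 customer messages + 3 agent responses
# + 2 function calls + 2 function responses + closing message.
events_per_conversation = 1 + 3 + 3 + 2 + 2 + 1   # = 12

session_events = conversations * events_per_conversation  # 120,000/month

# Memory layer: one store per returning customer and one retrieval
# per repeat visit. The 40% returning share is an ASSUMPTION.
returning_share = 0.4
memory_stores = int(conversations * returning_share)
memory_retrievals = memory_stores
```

Even this modest single-agent workload generates six-figure monthly event counts before any code execution is metered.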

Now scale that to five agents across three departments, each with different verbosity levels and tool dependencies. The billing surface area expands across sessions, memory operations, and compute usage, and without instrumentation, teams may not see the accumulation until the invoice arrives.

Analysis

This change matters because it alters the economic model of agent design. During preview, teams could retain long session histories, extract extensive long-term memories, and rely heavily on sandboxed code execution without seeing distinct cost signals for those choices.

By introducing explicit billing for sessions and memories, Google is making agent state visible as a cost driver. The platform now treats conversational history, long-term context, and tool execution as resources that must be managed, not just features that come for free with inference.

Implications For Enterprises

For platform and engineering teams, cost management becomes a design concern rather than a post-deployment exercise. Session length, verbosity, and event volume directly affect spend. Memory policies such as summarization, deduplication, and selective persistence now have financial as well as architectural consequences.

From an operational perspective, autoscaling settings, concurrency limits, and sandbox usage patterns influence both performance and cost. Long-running agents, multi-agent orchestration, and tool-heavy workflows can multiply runtime hours, stored events, and memory usage.

For governance and FinOps teams, agent state becomes something that must be monitored, budgeted, and potentially charged back internally. Deleting unused sessions and memories is not just a data hygiene task but the primary way to stop ongoing costs.
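A retention sweep of that kind might look like the sketch below, assuming deletion stops ongoing per-event and per-memory charges. The session records and the listing shape are hypothetical, not Vertex AI SDK structures; the point is the policy, not the API.

```python
from datetime import datetime, timedelta, timezone

MAX_IDLE = timedelta(days=30)  # retention window chosen by the team

def sessions_to_delete(sessions, now=None):
    """Return IDs of sessions idle longer than the retention window.

    Each session is a dict with an "id" and a timezone-aware
    "last_event_at" timestamp (illustrative shape).
    """
    now = now or datetime.now(timezone.utc)
    return [s["id"] for s in sessions
            if now - s["last_event_at"] > MAX_IDLE]
```

Running a sweep like this on a schedule, and logging what it deletes, turns retention from an assumption into an auditable FinOps control.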

The Bigger Picture

Google is not alone in moving toward granular agent billing. As agentic architectures become production workloads, every major cloud provider faces the same question: how do you price something that thinks, remembers, and acts?

Token-based billing made sense when AI was stateless. But agents accumulate context over time, persist memories across sessions, and invoke tools that consume compute independently of inference. Metering these components separately reflects a broader industry shift: agents are not just models. They are systems, and systems have operational costs.

Similar pricing structures are increasingly plausible across AWS, Azure, and independent agent platforms as agentic workloads mature. The teams that build cost awareness into their agent architectures now will have an advantage when granular agent billing becomes standard.

Risks and Open Questions

Several uncertainties remain. Google's documentation does not yet clearly define default retention periods for sessions or memories, nor how quickly deletions translate into reduced billing. This creates risk for teams that assume state is short-lived by default.

Forecasting costs may also be challenging. Session and memory usage scales with user behavior, response verbosity, and tool invocation patterns, making spend less predictable than token-based inference alone.

Finally, as agent systems grow more complex, attributing costs to individual agents or workflows becomes harder, especially in multi-agent or agent-to-agent designs. This complicates optimization, internal chargeback, and accountability.

Further Reading

Google Cloud Vertex AI Pricing
Vertex AI Agent Builder Release Notes
Vertex AI Agent Engine Memory Bank Documentation
AI CERTs analysis on Vertex AI Agent Engine GA
Google Cloud blog on enhanced tool governance in Agent Builder