AI Model Tracker

OpenAI

GPT-5.5

Released: April 24, 2026

Context: 1.05M tokens

Input: $5.00 / 1M

Output: $30.00 / 1M

Cached input: $0.50 / 1M

Modality: Text + Image in

Reasoning: 5-level (none→xhigh)

Max output: 128K tokens

Knowledge cutoff: Dec 1, 2025

Anthropic

Claude Sonnet 4.6

Released: February 17, 2026

Context: 1M tokens (beta)

Input: $3.00 / 1M

Output: $15.00 / 1M

Max output: 128K tokens

Modality: Text + Image in

Reasoning: Hybrid (extended thinking)

Computer use: Yes (improved)

Caching savings: Up to 90%

Google

Gemini 3.1 Pro

Released: February 19, 2026

Context: 1M tokens

Input (≤200K): $2.00 / 1M

Output (≤200K): $12.00 / 1M

Input (>200K): $4.00 / 1M

Modality: Text, Image, Video, Audio

Reasoning: 3 thinking levels

ARC-AGI-2: 77.1%

Max output: 64K tokens
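The tiered Gemini 3.1 Pro input pricing above can be sketched as a small function. This is a minimal illustration, assuming the whole prompt is billed at the higher rate once it exceeds 200K tokens (the card does not say whether the surcharge is marginal). The function name and threshold constant are hypothetical, and output cost is left out because the card lists no output price for the >200K tier.

```python
# Sketch of Gemini 3.1 Pro tiered input pricing, using the card's figures.
# Assumption: once a prompt exceeds 200K tokens, ALL of its input tokens
# are billed at the higher rate (not just the tokens above the threshold).

TIER_THRESHOLD_TOKENS = 200_000
INPUT_PRICE_STANDARD = 2.00   # USD per 1M tokens, prompts <= 200K
INPUT_PRICE_LONG = 4.00       # USD per 1M tokens, prompts > 200K


def gemini_input_cost(prompt_tokens: int) -> float:
    """Return the estimated input cost in USD for a single prompt."""
    if prompt_tokens <= TIER_THRESHOLD_TOKENS:
        price = INPUT_PRICE_STANDARD
    else:
        price = INPUT_PRICE_LONG
    return prompt_tokens * price / 1_000_000


if __name__ == "__main__":
    for tokens in (50_000, 200_000, 500_000):
        print(f"{tokens:>7} tokens -> ${gemini_input_cost(tokens):.2f}")
```

Under this assumption, crossing the threshold doubles the input bill for the entire prompt, so trimming a retrieval context to stay at or below 200K tokens can matter more than the raw token count suggests.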

Side-by-Side Comparison

Dimension	GPT-5.4	Claude Sonnet 4.6	Gemini 3.1 Pro
Release date	Mar 5, 2026	Feb 17, 2026	Feb 19, 2026
Context window	1.05M tokens	1M (beta, API only)	1M tokens
Max output	Not disclosed	128K tokens	64K tokens
Input price / 1M	$2.50	$3.00	$2.00
Output price / 1M	$15.00	$15.00	$12.00
Cached input price	$0.25 / 1M	90% savings (est. ~$0.30)	Supported (tiered)
Input modalities	Text, Image	Text, Image	Text, Image, Video, Audio
Reasoning control	5-level configurable	Hybrid extended thinking	3 thinking levels
Computer use	Native API	Improved in 4.6	Not available
SWE-bench Verified	~80.0%	Approaches Opus-level	Not reported
ARC-AGI-2	Not reported	Not reported	77.1%
GPQA (reasoning)	93.2% (Pro variant)	Not reported	Not reported
Batch pricing	50% off standard	50% off standard	Available
Availability	API + ChatGPT Pro/Plus	API + Claude.ai + Bedrock + Vertex + Foundry	API + Gemini App + Vertex
Pro / Premium tier	$30 / $180 per 1M (GPT-5.4 Pro)	$15 / $75 per 1M (Opus 4.6)	Deep Think (Ultra subs)
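
To make the price rows in the table concrete, here is a rough per-request cost comparison. It is a sketch only: it uses the standard input and output prices from the table and ignores caching, batch discounts, and Gemini's >200K tier. The workload size (50K input tokens, 2K output tokens) is a made-up example.

```python
# Per-request cost comparison using the standard prices from the table.
# Prices are USD per 1M tokens: (input, output).
PRICES = {
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at standard (uncached) rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical RAG-style request: large prompt, short answer.
    for model in PRICES:
        cost = request_cost(model, input_tokens=50_000, output_tokens=2_000)
        print(f"{model:<18} ${cost:.3f} per request")
```

For input-heavy requests like this one, the input price dominates, which is why the $2.00 vs. $3.00 gap matters more than the output price for document-heavy pipelines.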

What Matters For Your Stack

⚙ Coding

Best pick: GPT-5.4 for complex, multi-file codebases. ~80% SWE-bench and 5-level reasoning control let you dial cost vs. quality per request. Sonnet 4.6 is the value play at $3 input — devs in early access often preferred it over Opus for frontend and financial code. Gemini 3.1 Pro is strong for agentic coding workflows with its efficient token usage.

🤖 Agents

Best pick: GPT-5.4 leads with native computer-use API and configurable reasoning (dial down for fast tool-calling loops, dial up for planning). Sonnet 4.6 has improved computer use and agent planning — great for multi-step browser automation. Gemini 3.1 Pro excels at long-horizon tool orchestration with its medium thinking mode.

📚 RAG

Best pick: Gemini 3.1 Pro — cheapest per-token ($2/$12) with a full 1M context window and native multimodal grounding (text, image, video, audio). Great for document-heavy pipelines. GPT-5.4 at 1.05M context is the largest window but costs more. Sonnet 4.6's 128K max output is ideal when you need long-form synthesis from retrieved docs.

🎨 Multimodal Apps

Best pick: Gemini 3.1 Pro is the clear leader — native text, image, video, and audio input with unified embeddings (gemini-embedding-2). GPT-5.4 handles text + image but no video/audio. Sonnet 4.6 is text + image only. For multimodal RAG or video analysis, Gemini is the only real option at the frontier.

Running Changelog

May 19

Google gemini-3.5-flash GA + Managed Agents preview + Antigravity Agent

Three releases at once. gemini-3.5-flash is GA — next-gen mid-tier model positioned between Flash-Lite and 3.1 Pro for general coding/RAG/agent work. Managed Agents (public preview) runs autonomous, stateful agents inside secure Google-hosted Linux sandboxes — Google operates the runtime, you orchestrate. antigravity-preview-05-2026 is a Managed Agent that can plan, code, manage files, and browse the web inside the sandbox. Together this is Google's enterprise answer to OpenAI Agents SDK and Anthropic Managed Agents — fully hosted, no sandbox infrastructure to run. Worth evaluating against the OpenAI/Anthropic stacks if you've been blocked on agent runtime ops. (Gemini API changelog)

May 19

OpenAI Secure MCP Tunnel GA for enterprise

Account-led GA of Secure MCP Tunnel — lets ChatGPT web, Codex, the Responses API, and AgentKit connect to private/on-prem MCP servers via a customer-hosted tunnel client. No public exposure of internal MCP endpoints, no inbound firewall holes, and works across the full OpenAI agent surface. Material unlock for regulated industries (finance, healthcare, gov) that couldn't expose internal tools to public AI APIs. Pair with self-hosted MCP servers for the most secure tool-use posture available from OpenAI today. (OpenAI API changelog)

May 19

Anthropic Claude Managed Agents: self-hosted sandboxes + MCP tunnels

Two major adds to Claude Managed Agents. Self-hosted sandboxes (public beta): tool execution runs in your infrastructure (or via Cloudflare / Daytona / Modal / Vercel) while orchestration stays on Anthropic — data and code never leave customer infra. MCP tunnels (research preview): lightweight gateway uses a single outbound connection (no inbound firewall rules, no public endpoints, encrypted end-to-end) and works in both Managed Agents and the Messages API. Same enterprise security pattern OpenAI shipped today — the three vendors are now competing on identical agent-runtime feature surfaces. (Claude Managed Agents update)

May 12

OpenAI DALL·E 2 and DALL·E 3 snapshots shut down today

All dall-e-2 and dall-e-3 snapshots are shut down as of May 12, 2026. Migrate any remaining image-generation calls to gpt-image-1 (frontier quality) or gpt-image-1-mini (cost/latency-optimized). Anything still pointing at DALL·E snapshots will now error — audit production code, marketing pipelines, and any user-facing image features. (OpenAI deprecations)

May 11

Anthropic Claude Platform launches on AWS

Anthropic announced Claude Platform on AWS — native AWS billing and IAM, with Messages, Files, Batches, Managed Agents, Skills, code execution, and tool use all supported. Big unlock for AWS-native shops: consolidated invoicing, IAM-scoped Claude access, and no separate vendor onboarding. Same Claude models (Sonnet 4.6, Opus 4.6/4.7) under the hood. Worth evaluating if you've been routing around procurement to use Anthropic direct. (Claude API release notes)

May 11

OpenAI return_token_budget for Responses API web search

New return_token_budget parameter on the Responses API web search tool allows GPT-5+ to spend more tokens reasoning over web search results before returning an answer. Directly relevant for deep-research agents and any RAG-over-web pipeline where the model was previously cutting reasoning short. Tune higher for harder multi-source questions; expect higher latency and token cost in exchange for answer quality. (OpenAI API changelog)

May 7

OpenAI Realtime 2 + Realtime Translate + Realtime Whisper released

Three new realtime/voice models for streaming speech agents. Realtime 2 is a speech-to-speech model with configurable reasoning. Realtime Translate handles streaming speech translation. Realtime Whisper is streaming speech-to-text. Together they round out the voice-agent stack — directly relevant for call-center bots, meeting assistants, and real-time copilots where latency and translation quality matter. (OpenAI API changelog)

May 7

OpenAI Self-serve fine-tuning eligibility tightened — sunset by Jan 6, 2027

OpenAI is winding down self-serve fine-tuning. Effective May 7, 2026: orgs that have never run fine-tuning can no longer create training jobs. Effective July 2, 2026: orgs that haven't run inference on a fine-tuned model in the past 60 days can no longer create new jobs. Hard cutoff Jan 6, 2027: even active customers can't create new fine-tuning jobs (inference continues until the base model is deprecated). If you depend on FT for task-specific coding/RAG models, schedule periodic inference to keep eligibility, or move to alternative customization (RFT, prompting, distillation) before July 2. (OpenAI deprecations)
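
The three eligibility gates above can be written down as a small check. This is a sketch of the rules as summarized in this entry, not an OpenAI API: the function and parameter names are hypothetical, and it assumes each date is inclusive and that "in the past 60 days" means 60 days or fewer since the last fine-tuned inference call.

```python
from datetime import date
from typing import Optional

# Dates from the entry above.
NEW_ORG_GATE = date(2026, 5, 7)     # orgs that never fine-tuned are blocked
INACTIVE_GATE = date(2026, 7, 2)    # orgs without recent FT inference are blocked
HARD_CUTOFF = date(2027, 1, 6)      # nobody can create new jobs

ACTIVITY_WINDOW_DAYS = 60


def can_create_ft_job(
    today: date,
    has_ever_fine_tuned: bool,
    days_since_ft_inference: Optional[int],
) -> bool:
    """Return True if an org could still create a fine-tuning job.

    days_since_ft_inference is None when the org has never run inference
    on a fine-tuned model.
    """
    if today >= HARD_CUTOFF:
        return False
    if today >= NEW_ORG_GATE and not has_ever_fine_tuned:
        return False
    if today >= INACTIVE_GATE:
        if days_since_ft_inference is None:
            return False
        if days_since_ft_inference > ACTIVITY_WINDOW_DAYS:
            return False
    return True


if __name__ == "__main__":
    # An org that fine-tuned before but last used the model 90 days ago.
    print(can_create_ft_job(date(2026, 8, 1), True, 90))   # blocked after July 2
    print(can_create_ft_job(date(2026, 8, 1), True, 10))   # still eligible
```

A scheduled job that calls a fine-tuned model at least once inside each 60-day window is the simplest way to keep the second condition satisfied until the hard cutoff.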

May 7

Google gemini-3.1-flash-lite goes GA; preview shuts down May 25

gemini-3.1-flash-lite is now GA. It is Google's speed/scale/cost workhorse for high-throughput RAG, embedding pipelines, and agent tool-routing where cost and latency dominate over peak quality. Migration timeline: gemini-3.1-flash-lite-preview is deprecated on May 11 and shuts down on May 25. If you're still on the preview SKU, switch to the GA name before May 25 to avoid an outage. (Gemini API changelog)

May 7

OpenAI OpenAI Developers plugin for Codex

A small but useful plugin for Codex that surfaces Platform access and API setup guidance directly inside the coding workflow. No API breaking changes — just smoother onboarding for devs building against OpenAI inside Codex. (OpenAI API changelog)

May 6

Anthropic Claude Opus API rate limits raised (effective immediately)

Anthropic announced significant API rate limit increases for Claude Opus models, effective immediately. Not a model spec or pricing change, but it directly reduces throttling pain for high-volume Opus workloads (long-context coding agents, document understanding, agent orchestration). Doesn't change Sonnet 4.6 limits. (Anthropic news)

May 6

Google Interactions API schema change — breaking, effective May 20

Gemini Interactions API is renaming outputs to steps and changing response_format configuration. The new schema becomes default on May 20, 2026, and the legacy schema is removed entirely on June 6, 2026. This is a breaking change for any agent framework built on the Interactions API — plan migration tests before May 20 and confirm production is updated before June 6 to avoid outages. (Gemini API changelog)
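
One way to ride out the transition window is a small compatibility shim that accepts either field name. The sketch below is an assumption-heavy illustration: only the outputs-to-steps rename comes from the entry above. The response is modeled as a plain dictionary, the item contents are invented, and the helper name is hypothetical. It does not cover the response_format change, whose details are not given here.

```python
from typing import Any, Dict, List


def get_steps(response: Dict[str, Any]) -> List[Any]:
    """Return the list of steps from a response in either schema.

    Prefers the new "steps" key and falls back to the legacy "outputs"
    key. The response shape is hypothetical; check the real payloads
    against the Gemini API changelog before relying on this.
    """
    if "steps" in response:
        return response["steps"]
    if "outputs" in response:
        return response["outputs"]
    raise KeyError("response has neither 'steps' nor 'outputs'")


if __name__ == "__main__":
    legacy = {"outputs": [{"type": "text", "text": "hello"}]}
    current = {"steps": [{"type": "text", "text": "hello"}]}
    # Both schemas yield the same list through the shim.
    print(get_steps(legacy) == get_steps(current))
```

Routing every read through one helper like this keeps the migration to a single code path, which can be deleted once the legacy schema is removed on June 6.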

May 6

OpenAI Agents SDK now available in TypeScript

The OpenAI Agents SDK has shipped a TypeScript implementation alongside the existing Python SDK, including sandbox agents and the open-source harness. Significant for Node/TS-native agent stacks — reduces friction for web apps integrating tool-using agents and standardizes harnessing/testing patterns. If you maintain agent infrastructure in TypeScript, evaluate parity with your Python setup before migrating production workloads. (OpenAI API changelog)

May 5

OpenAI chat-latest snapshot alias released

New moving-target alias that tracks the current Instant model used in ChatGPT. Useful for evaluating ChatGPT-style behavior and quick regression checks, but explicitly NOT for production — the underlying snapshot rotates without notice. For production, keep pinning specific snapshots (or use GPT-5.5 for the recommended frontier line). (OpenAI API changelog)
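
A simple guard can keep moving aliases such as chat-latest out of production configs. The sketch below assumes pinned snapshots end in a date suffix of the form YYYY-MM-DD, as gpt-5.5-2026-04-23 does elsewhere in this changelog. That naming convention is an assumption rather than a documented rule, and the function names are hypothetical.

```python
import re

# Assumption: pinned snapshot names end with a -YYYY-MM-DD date suffix.
_SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def is_pinned_snapshot(model: str) -> bool:
    """True if the model name looks like a dated, pinned snapshot."""
    return bool(_SNAPSHOT_SUFFIX.search(model))


def require_pinned(model: str, environment: str) -> str:
    """Return the model name, rejecting unpinned names in production."""
    if environment == "production" and not is_pinned_snapshot(model):
        raise ValueError(
            f"{model!r} is not a pinned snapshot; refusing to use it in production"
        )
    return model


if __name__ == "__main__":
    print(is_pinned_snapshot("gpt-5.5-2026-04-23"))  # dated snapshot
    print(is_pinned_snapshot("chat-latest"))         # moving alias
```

Running this check at config load time turns a silent behavior change into a startup error, which is usually the cheaper failure mode.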

May 5

Google Gemini File Search adds multimodal support + better grounding

Gemini API File Search now supports image embedding and retrieval via gemini-embedding-2 — first-party image RAG without rolling your own pipeline. Grounding metadata also improved with new media_id and page_numbers fields, enabling proper image- and page-level citations in attributed-answer UIs. If you're using Gemini File Search for RAG, consider indexing images and surfacing the new metadata as inline citations. (Gemini API changelog)

May 4

Google Gemini API adds Webhooks for Batch / long-running operations

Event-driven completion notifications for the Batch API and other long-running operations — no more polling. Cuts idle compute and simplifies orchestration for async workloads (batch evals, offline RAG indexing, large multimodal processing). Worth migrating any polling-based batch workflows to webhooks. (Gemini API changelog)

Apr 24

OpenAI GPT-5.5 launched — frontier reasoning at 2x the price

GPT-5.5 (snapshot gpt-5.5-2026-04-23) is now available in the Responses and Chat Completions APIs. 1,050,000 token context window, 128K max output, Dec 1 2025 knowledge cutoff. Pricing: $5/1M input, $0.50 cached, $30/1M output (standard, <272K context). Prompts >272K input tokens are billed at 2x input / 1.5x output for the full session. Reasoning effort supports none/low/medium/high/xhigh. Marketed as OpenAI's strongest model for spreadsheet creation, polished frontend code, hard math, document understanding, tool use, and multi-source research. Roughly 2x the price of GPT-5.4 — community reports suggest output efficiency gains may partially offset the increase. Plus tier users hit token limits in minutes on heavy workloads, so budget carefully. (Introducing GPT-5.5, API docs)
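
The long-context surcharge is easy to underestimate, so here is a worked sketch of the arithmetic. It uses the prices and multipliers quoted above and makes two simplifying assumptions: the surcharge is applied per request (the entry says "for the full session"), and a prompt of exactly 272,000 tokens is billed at the standard rate. Cached-input pricing is ignored, and the function name is hypothetical.

```python
# GPT-5.5 cost estimate with the long-context surcharge described above.
INPUT_PRICE = 5.00              # USD per 1M input tokens (standard)
OUTPUT_PRICE = 30.00            # USD per 1M output tokens (standard)
LONG_CONTEXT_THRESHOLD = 272_000
LONG_INPUT_MULTIPLIER = 2.0     # input billed at 2x above the threshold
LONG_OUTPUT_MULTIPLIER = 1.5    # output billed at 1.5x above the threshold


def gpt55_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one uncached GPT-5.5 request."""
    is_long = input_tokens > LONG_CONTEXT_THRESHOLD
    input_mult = LONG_INPUT_MULTIPLIER if is_long else 1.0
    output_mult = LONG_OUTPUT_MULTIPLIER if is_long else 1.0
    input_cost = input_tokens * INPUT_PRICE * input_mult
    output_cost = output_tokens * OUTPUT_PRICE * output_mult
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    print(f"${gpt55_cost(200_000, 10_000):.2f}")  # standard rates
    print(f"${gpt55_cost(300_000, 10_000):.2f}")  # long-context rates
```

With these numbers, growing the prompt from 200K to 300K tokens raises the request cost from $1.30 to $3.45, so a 50% larger prompt costs roughly 2.7x as much.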

Apr 24

OpenAI GPT-5.5 Pro released for highest-accuracy work

GPT-5.5 Pro is available in the Responses API (not Chat Completions). Pricing: $30/1M input, $180/1M output (standard, <272K context); $60/$270 per 1M for long context. Same 1.05M context window as GPT-5.5. Use case: high-stakes reasoning, agent orchestration, and tasks where accuracy beats latency or cost. At $180/1M output, reserve it for jobs where a wrong answer costs more than the model does. Same price as GPT-5.4 Pro, so the upgrade is free at the Pro tier. (OpenAI pricing)

Apr 22

Google gemini-embedding-2 goes GA

gemini-embedding-2 is now generally available via the Gemini API. Production-ready multimodal embeddings for RAG, semantic search, and classification workloads — no longer preview-only.

Apr 22

OpenAI Legacy model snapshot deprecations

OpenAI added new deprecation entries for legacy model snapshots. Check the deprecations tab and update any pinned snapshot references in production code.

Apr 21

Google Deep Research Preview + Max Preview models

deep-research-preview-04-2026 and deep-research-max-preview-04-2026 released. New capabilities: collaborative planning, visualization support, MCP server integration, and File Search. Designed for multi-step research agents that need tool access and structured outputs.

Apr 21

OpenAI GPT Image 2 released (gpt-image-2)

OpenAI launched gpt-image-2 for /v1/images/generations and /v1/images/edits. Features token-based image pricing and Batch API support (50% discount). Replaces prior image generation endpoints for production use.

Apr 16

Anthropic Claude Opus 4.7 launched

Anthropic released Claude Opus 4.7 via the Claude Platform. Successor to Opus 4.6 with improved reasoning and coding performance.

Apr 15

Google Gemini 3.1 Flash TTS Preview (text-to-speech)

Google released Gemini 3.1 Flash TTS Preview, adding native text-to-speech capabilities to the Gemini API. Relevant for voice-first agents and multimodal applications.

Apr 15

OpenAI Agents SDK: sandbox execution, memory controls

OpenAI Agents SDK now supports running agents in controlled sandboxes for safer code execution, plus explicit controls over when memories are created and where they’re stored. Key for compliance, per-tenant isolation, and secure coding agents.

Apr 14

Google Gemini Robotics ER 1.6 Preview; 1.5 Preview shutdown Apr 30

gemini-robotics-er-1.6-preview released for embodied reasoning in robotics. gemini-robotics-er-1.5-preview shut down April 30, 2026 at 9AM PST — production workloads must use 1.6.

Source: Gemini API changelog

Apr 7

Anthropic Project Glasswing: Claude Mythos Preview (gated research)

Anthropic launched a gated research preview of Claude Mythos Preview for defensive cybersecurity via Project Glasswing. Pricing for participants: $25/MTok input, $125/MTok output. Available via Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry.

Apr 6

Anthropic Google + Broadcom compute partnership (multi-GW TPU, 2027+)

Anthropic expanded its partnership with Google and Broadcom for multiple gigawatts of next-generation TPU compute starting in 2027. Signals massive infrastructure buildout for future Claude models.

Apr 2

Google Gemma 4 models available via Gemini API

Gemma 4 open-weight models are now accessible through the Gemini API, expanding options for self-hosted and fine-tuned deployments.

Apr 1

Google New inference tiers: Flex and Priority

Google introduced Flex and Priority inference tiers for Gemini API, giving developers control over latency vs. cost tradeoffs in production workloads.

Mar 31

Google Veo 3.1 Lite Preview + gemini-2.5-flash-lite shutdown

Veo 3.1 Lite Preview (veo-3.1-lite-generate-preview) launched for cost-efficient video generation. gemini-2.5-flash-lite-preview-09-2025 has been shut down — migrate to gemini-3.1-flash-lite-preview.

Mar 26

Google Gemini 3.1 Flash Live Preview (real-time audio)

gemini-3.1-flash-live-preview enables audio-to-audio real-time conversations via the Live API. Key for agent-to-agent (A2A) and voice-first applications.

Mar 25

Google Lyria 3 music generation models

Lyria 3 models generate music from text and image prompts. Expands Gemini's multimodal capabilities into audio generation.

Mar 24

OpenAI Sora 2 models + Videos API deprecated (shutdown Sep 24)

OpenAI announced deprecation of Sora 2 model aliases/snapshots and the entire Videos API. Shutdown date: September 24, 2026. Plan migrations now if using Sora in production.

Mar 23

Google AI Studio billing: Prepay and Postpay plans

Google introduced Prepay and Postpay billing plans for AI Studio, giving developers more flexible payment options for Gemini API usage.

Mar 18

Google Built-in tools + function calling in one call; Maps grounding for Gemini 3

Gemini API now supports using Gemini’s built-in tools alongside custom function calling tools in a single request; Google Maps grounding is now supported for Gemini 3 models going forward.

Mar 17

OpenAI GPT-5.4 mini + nano released

GPT-5.4 mini brings GPT-5.4-class capability to high-volume workloads with tool search, computer use, and compaction; GPT-5.4 nano is optimized for simple, high-volume tasks and supports compaction only.

Mar 16

Google Revamped Usage Tiers & billing spend caps

Google introduced new usage tier structure with billing account spend caps for Gemini API. Better control for production workloads.

Mar 12

Google AI Studio project-level spend caps

Added project-level spend caps for Gemini API billing in AI Studio, enabling tighter cost control for production experimentation.

Mar 10

Google Multimodal embedding model: gemini-embedding-2-preview

First multimodal embedding model supporting text, image, video, audio, and PDF inputs in a unified embedding space. Major for multimodal RAG and unified retrieval.

Mar 10

Google gemini-2.5-flash-lite-preview shutdown Mar 31

Deprecated. Will be shut down March 31, 2026. Migrate to gemini-3.1-flash-lite-preview.

Mar 9

Google gemini-3-pro-preview alias redirected

Gemini 3 Pro Preview was shut down; the gemini-3-pro-preview alias now points to gemini-3.1-pro-preview.

Mar 12

OpenAI Sora API: 20s generations, 1080p, edits endpoint; remix deprecated

Sora API expanded with character references, longer generations up to 20s, 1080p for sora-2-pro (billed at $0.70/sec), Batch API support, and a new /v1/videos/edits endpoint. /v1/videos/{video_id}/remix will be deprecated in 6 months.

Mar 5

OpenAI GPT-5.4 launched

1.05M context, configurable 5-level reasoning, native computer-use API. First general-purpose model with built-in computer use. Priced at $2.50/$15 per 1M tokens. GPT-5.4 Pro at $30/$180.

Mar 4

OpenAI Legacy GPT-4 models deprecated Mar 26

gpt-4-0314, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview all sunset March 26, 2026. Requests auto-route to gpt-4.1.

Mar 3

Google Gemini 3.1 Flash-Lite Preview

First Flash-Lite model in the Gemini 3 series. 1M context, $0.25/$1.50 pricing. Ultra-efficient for high-volume, cost-sensitive workloads.

Feb 26

Google Nano Banana 2 image model + Gemini 3 Pro deprecated

Nano Banana 2 (gemini-3.1-flash-image-preview) optimized for speed. Gemini 3 Pro Preview shut down March 9.

Feb 19

Google Gemini 3.1 Pro launched

Frontier reasoning model. 1M context, $2/$12 per 1M tokens. Medium thinking level for cost/speed/performance balance. Enhanced agentic coding and multimodal analysis.

Feb 17

Anthropic Claude Sonnet 4.6 launched

Most capable Sonnet yet. 1M context (beta), 128K max output, improved coding + computer use + long-context reasoning. $3/$15 per 1M tokens. Approaches Opus intelligence at Sonnet pricing.

Feb 13

OpenAI GPT-4o, GPT-4.1, o4-mini retired from ChatGPT

GPT-4o, GPT-4.1, GPT-4.1 mini, o4-mini, and GPT-5 (Instant/Thinking) removed from ChatGPT. Still available in API. GPT-4o in Custom GPTs until March 31.

Feb 12

Google Gemini 3 Deep Think major upgrade

Specialized reasoning mode upgraded to solve modern science, research, and engineering challenges. Available to Google AI Ultra subscribers.

Upcoming Deprecations & Deadlines

OpenAI — May 12, 2026 (SHUT DOWN)

Google — May 20, 2026 (BREAKING)

Google — May 25, 2026

OpenAI — July 2, 2026 → Jan 6, 2027

OpenAI — March 26, 2026

OpenAI — March 31, 2026

Google — April 30, 2026

Google — March 31, 2026

OpenAI — September 24, 2026

OpenAI — May 7, 2026

OpenAI — 2026 (date TBD)
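
A short script can turn the future-dated deadlines into a sorted countdown. The sketch below uses only dates and descriptions taken from the changelog entries on this page. The reference date, function name, and 30-day window are illustrative choices, and the list is not exhaustive.

```python
from datetime import date
from typing import List, Tuple

# (deadline, vendor, description), taken from the changelog entries above.
DEADLINES = [
    (date(2026, 5, 20), "Google", "Interactions API new schema becomes default"),
    (date(2026, 5, 25), "Google", "gemini-3.1-flash-lite-preview shuts down"),
    (date(2026, 6, 6), "Google", "Interactions API legacy schema removed"),
    (date(2026, 7, 2), "OpenAI", "Fine-tuning: inactive orgs lose job creation"),
    (date(2026, 9, 24), "OpenAI", "Sora 2 models and Videos API shut down"),
    (date(2027, 1, 6), "OpenAI", "Fine-tuning: no new jobs for anyone"),
]


def upcoming(today: date, within_days: int) -> List[Tuple[int, str, str]]:
    """Return (days_left, vendor, description) for deadlines in the window."""
    results = []
    for deadline, vendor, description in DEADLINES:
        days_left = (deadline - today).days
        if 0 <= days_left <= within_days:
            results.append((days_left, vendor, description))
    return sorted(results)


if __name__ == "__main__":
    for days_left, vendor, description in upcoming(date(2026, 5, 19), 30):
        print(f"{days_left:>3} days  {vendor:<7} {description}")
```

Running a check like this in CI or a daily job gives a fixed-lead-time warning instead of relying on someone rereading the changelog.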