AI Model Tracker

GPT-5.5 · Claude Sonnet 4.6 · Gemini 3.1 Pro

Updated: May 20, 2026
OpenAI
GPT-5.5
Released: April 24, 2026
Context: 1.05M tokens
Input: $5.00 / 1M
Output: $30.00 / 1M
Cached input: $0.50 / 1M
Modality: Text + Image in
Reasoning: 5-level (none→xhigh)
Max output: 128K tokens
Knowledge cutoff: Dec 1, 2025
Anthropic
Claude Sonnet 4.6
Released: February 17, 2026
Context: 1M tokens (beta)
Input: $3.00 / 1M
Output: $15.00 / 1M
Max output: 128K tokens
Modality: Text + Image in
Reasoning: Hybrid (extended thinking)
Computer use: Yes (improved)
Caching savings: Up to 90%
Google
Gemini 3.1 Pro
Released: February 19, 2026
Context: 1M tokens
Input (≤200K): $2.00 / 1M
Output (≤200K): $12.00 / 1M
Input (>200K): $4.00 / 1M
Modality: Text, Image, Video, Audio
Reasoning: 3 thinking levels
ARC-AGI-2: 77.1%
Max output: 64K tokens

Side-by-Side Comparison

| Dimension | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Release date | Mar 5, 2026 | Feb 17, 2026 | Feb 19, 2026 |
| Context window | 1.05M tokens | 1M (beta, API only) | 1M tokens |
| Max output | Not disclosed | 128K tokens | 64K tokens |
| Input price / 1M | $2.50 | $3.00 | $2.00 |
| Output price / 1M | $15.00 | $15.00 | $12.00 |
| Cached input price | $0.25 / 1M | 90% savings (est. ~$0.30) | Supported (tiered) |
| Input modalities | Text, Image | Text, Image | Text, Image, Video, Audio |
| Reasoning control | 5-level configurable | Hybrid extended thinking | 3 thinking levels |
| Computer use | Native API | Improved in 4.6 | Not available |
| SWE-bench Verified | ~80.0% | Approaches Opus-level | Not reported |
| ARC-AGI-2 | Not reported | Not reported | 77.1% |
| GPQA (reasoning) | 93.2% (Pro variant) | Not reported | Not reported |
| Batch pricing | 50% off standard | 50% off standard | Available |
| Availability | API + ChatGPT Pro/Plus | API + Claude.ai + Bedrock + Vertex + Foundry | API + Gemini App + Vertex |
| Pro / Premium tier | $30 / $180 per 1M (GPT-5.4 Pro) | $15 / $75 per 1M (Opus 4.6) | Deep Think (Ultra subs) |
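
To make the price rows concrete, here is a minimal sketch that turns the per-1M list prices from the table into a per-request estimate. The `Price` class and `request_cost` function are illustrative names, not part of any vendor SDK. The sketch ignores caching, batch discounts, and Gemini's higher rate above 200K input tokens, so treat the numbers as rough upper-level estimates for short prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# List prices copied from the comparison table (standard tier, no caching/batch).
PRICES = {
    "GPT-5.4": Price(2.50, 15.00),
    "Claude Sonnet 4.6": Price(3.00, 15.00),
    "Gemini 3.1 Pro": Price(2.00, 12.00),  # <=200K input tier only
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list price."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p.input_per_m + (output_tokens / 1e6) * p.output_per_m


if __name__ == "__main__":
    # Hypothetical workload: 50K tokens in, 2K tokens out.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 50_000, 2_000):.3f}")
```

For a 50K-in / 2K-out request this works out to roughly $0.155 for GPT-5.4, $0.180 for Sonnet 4.6, and $0.124 for Gemini 3.1 Pro, which matches the ordering the table suggests.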

What Matters For Your Stack

Coding

Best pick: GPT-5.4 for complex, multi-file codebases. ~80% SWE-bench and 5-level reasoning control let you dial cost vs. quality per request. Sonnet 4.6 is the value play at $3 input — devs in early access often preferred it over Opus for frontend and financial code. Gemini 3.1 Pro is strong for agentic coding workflows with its efficient token usage.

🤖 Agents

Best pick: GPT-5.4 leads with native computer-use API and configurable reasoning (dial down for fast tool-calling loops, dial up for planning). Sonnet 4.6 has improved computer use and agent planning — great for multi-step browser automation. Gemini 3.1 Pro excels at long-horizon tool orchestration with its medium thinking mode.
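
The "dial down for tool-calling loops, dial up for planning" idea can be sketched as a small lookup. The level names below come from the GPT-5.5 changelog entry (none/low/medium/high/xhigh); that GPT-5.4 uses the same names is an assumption. The step kinds, the mapping, and the retry escalation are hypothetical choices for illustration, and how the chosen level is passed to the API is deliberately not shown.

```python
from enum import Enum


class Effort(str, Enum):
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"


# Hypothetical mapping from agent step kind to a starting effort level.
EFFORT_BY_STEP = {
    "tool_call": Effort.LOW,     # fast loops: keep reasoning cheap
    "summarize": Effort.MEDIUM,
    "plan": Effort.HIGH,         # planning: spend more on reasoning
}


def pick_effort(step_kind: str, retries: int = 0) -> Effort:
    """Pick an effort level, escalating one level per failed retry (capped at xhigh)."""
    levels = list(Effort)  # Enum iteration preserves definition order
    base = EFFORT_BY_STEP.get(step_kind, Effort.MEDIUM)
    idx = min(levels.index(base) + retries, len(levels) - 1)
    return levels[idx]
```

Escalating on retry is one possible policy: a tool call that failed at low effort is retried at medium rather than paying for high effort on every call.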

📚 RAG

Best pick: Gemini 3.1 Pro — cheapest per-token ($2/$12) with a full 1M context window and native multimodal grounding (text, image, video, audio). Great for document-heavy pipelines. GPT-5.4 at 1.05M context is the largest window but costs more. Sonnet 4.6's 128K max output is ideal when you need long-form synthesis from retrieved docs.

🎨 Multimodal Apps

Best pick: Gemini 3.1 Pro is the clear leader — native text, image, video, and audio input with unified embeddings (gemini-embedding-2). GPT-5.4 handles text + image but no video/audio. Sonnet 4.6 is text + image only. For multimodal RAG or video analysis, Gemini is the only real option at the frontier.

Running Changelog

May 19
Google gemini-3.5-flash GA + Managed Agents preview + Antigravity Agent
Three releases at once. gemini-3.5-flash is GA — next-gen mid-tier model positioned between Flash-Lite and 3.1 Pro for general coding/RAG/agent work. Managed Agents (public preview) runs autonomous, stateful agents inside secure Google-hosted Linux sandboxes — Google operates the runtime, you orchestrate. antigravity-preview-05-2026 is a Managed Agent that can plan, code, manage files, and browse the web inside the sandbox. Together this is Google's enterprise answer to OpenAI Agents SDK and Anthropic Managed Agents — fully hosted, no sandbox infrastructure to run. Worth evaluating against the OpenAI/Anthropic stacks if you've been blocked on agent runtime ops. (Gemini API changelog)
May 19
OpenAI Secure MCP Tunnel GA for enterprise
Account-led GA of Secure MCP Tunnel — lets ChatGPT web, Codex, the Responses API, and AgentKit connect to private/on-prem MCP servers via a customer-hosted tunnel client. No public exposure of internal MCP endpoints, no inbound firewall holes, and works across the full OpenAI agent surface. Material unlock for regulated industries (finance, healthcare, gov) that couldn't expose internal tools to public AI APIs. Pair with self-hosted MCP servers for the most secure tool-use posture available from OpenAI today. (OpenAI API changelog)
May 19
Anthropic Claude Managed Agents: self-hosted sandboxes + MCP tunnels
Two major adds to Claude Managed Agents. Self-hosted sandboxes (public beta): tool execution runs in your infrastructure (or via Cloudflare / Daytona / Modal / Vercel) while orchestration stays on Anthropic — data and code never leave customer infra. MCP tunnels (research preview): lightweight gateway uses a single outbound connection (no inbound firewall rules, no public endpoints, encrypted end-to-end) and works in both Managed Agents and the Messages API. Same enterprise security pattern OpenAI shipped today — the three vendors are now competing on identical agent-runtime feature surfaces. (Claude Managed Agents update)
May 12
OpenAI DALL·E 2 and DALL·E 3 snapshots shut down today
All dall-e-2 and dall-e-3 snapshots are shut down as of May 12, 2026. Migrate any remaining image-generation calls to gpt-image-1 (frontier quality) or gpt-image-1-mini (cost/latency-optimized). Anything still pointing at DALL·E snapshots will now error — audit production code, marketing pipelines, and any user-facing image features. (OpenAI deprecations)
May 11
Anthropic Claude Platform launches on AWS
Anthropic announced Claude Platform on AWS — native AWS billing and IAM, with Messages, Files, Batches, Managed Agents, Skills, code execution, and tool use all supported. Big unlock for AWS-native shops: consolidated invoicing, IAM-scoped Claude access, and no separate vendor onboarding. Same Claude models (Sonnet 4.6, Opus 4.6/4.7) under the hood. Worth evaluating if you've been routing around procurement to use Anthropic direct. (Claude API release notes)
May 11
OpenAI return_token_budget for Responses API web search
New return_token_budget parameter on the Responses API web search tool allows GPT-5+ to spend more tokens reasoning over web search results before returning an answer. Directly relevant for deep-research agents and any RAG-over-web pipeline where the model was previously cutting reasoning short. Tune higher for harder multi-source questions; expect higher latency and token cost in exchange for answer quality. (OpenAI API changelog)
May 7
OpenAI Realtime 2 + Realtime Translate + Realtime Whisper released
Three new realtime/voice models for streaming speech agents. Realtime 2 is a speech-to-speech model with configurable reasoning. Realtime Translate handles streaming speech translation. Realtime Whisper is streaming speech-to-text. Together they round out the voice-agent stack — directly relevant for call-center bots, meeting assistants, and real-time copilots where latency and translation quality matter. (OpenAI API changelog)
May 7
OpenAI Self-serve fine-tuning eligibility tightened — sunset by Jan 6, 2027
OpenAI is winding down self-serve fine-tuning. Effective May 7, 2026: orgs that have never run fine-tuning can no longer create training jobs. Effective July 2, 2026: orgs that haven't run inference on a fine-tuned model in the past 60 days can no longer create new jobs. Hard cutoff Jan 6, 2027: even active customers can't create new fine-tuning jobs (inference continues until the base model is deprecated). If you depend on FT for task-specific coding/RAG models, schedule periodic inference to keep eligibility, or move to alternative customization (RFT, prompting, distillation) before July 2. (OpenAI deprecations)
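
The three cutoffs above can be read as a simple eligibility rule. The sketch below encodes that reading; it is not an official check. It assumes each "effective" date is inclusive and that "past 60 days" means the last fine-tuned inference happened at most 60 days before today. The function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

NEVER_TUNED_CUTOFF = date(2026, 5, 7)   # orgs that never fine-tuned are blocked
INACTIVE_CUTOFF = date(2026, 7, 2)      # orgs without recent FT inference are blocked
HARD_CUTOFF = date(2027, 1, 6)          # nobody can create new jobs
ACTIVITY_WINDOW = timedelta(days=60)


def can_create_ft_job(
    today: date,
    has_ever_fine_tuned: bool,
    last_ft_inference: Optional[date],
) -> bool:
    """Return True if the org could still create a fine-tuning job on `today`."""
    if today >= HARD_CUTOFF:
        return False
    if today >= NEVER_TUNED_CUTOFF and not has_ever_fine_tuned:
        return False
    if today >= INACTIVE_CUTOFF:
        # Assumption: eligibility needs FT inference within the last 60 days.
        if last_ft_inference is None:
            return False
        return today - last_ft_inference <= ACTIVITY_WINDOW
    return True
```

Under this reading, an org whose last fine-tuned inference was on April 1 would lose access on July 2, while one that ran inference on June 1 would keep it.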
May 7
Google gemini-3.1-flash-lite goes GA; preview shuts down May 25
gemini-3.1-flash-lite is now GA — Google's speed/scale/cost workhorse for high-throughput RAG, embedding pipelines, and agent tool-routing where cost and latency dominate over peak quality. Migration timeline: gemini-3.1-flash-lite-preview deprecates May 11 and shuts down May 25. If you're still on the preview SKU, switch to the GA name within ~2 weeks to avoid outage. (Gemini API changelog)
May 7
OpenAI OpenAI Developers plugin for Codex
A small but useful plugin for Codex that surfaces Platform access and API setup guidance directly inside the coding workflow. No API breaking changes — just smoother onboarding for devs building against OpenAI inside Codex. (OpenAI API changelog)
May 6
Anthropic Claude Opus API rate limits raised (effective immediately)
Anthropic announced significant API rate limit increases for Claude Opus models, effective immediately. Not a model spec or pricing change, but it directly reduces throttling pain for high-volume Opus workloads (long-context coding agents, document understanding, agent orchestration). Doesn't change Sonnet 4.6 limits. (Anthropic news)
May 6
Google Interactions API schema change — breaking, effective May 20
Gemini Interactions API is renaming outputs to steps and changing response_format configuration. The new schema becomes default on May 20, 2026, and the legacy schema is removed entirely on June 6, 2026. This is a breaking change for any agent framework built on the Interactions API — plan migration tests before May 20 and confirm production is updated before June 6 to avoid outages. (Gemini API changelog)
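
During the migration window, one low-risk pattern is a reader that accepts either field name. The sketch below works on an already-parsed JSON dict. That the renamed field sits at the top level of the response is an assumption, the `get_steps` helper is hypothetical, and the `response_format` changes are not covered here.

```python
from typing import Any


def get_steps(response: dict[str, Any]) -> list[Any]:
    """Return the step list from either the new ('steps') or legacy ('outputs') schema."""
    if "steps" in response:
        return response["steps"]
    if "outputs" in response:
        # Legacy schema: the changelog says it is removed on June 6, 2026.
        return response["outputs"]
    raise KeyError("response has neither 'steps' nor 'outputs'")
```

Routing all reads through one helper like this means the legacy branch can be deleted in a single place once production is confirmed on the new schema.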
May 6
OpenAI Agents SDK now available in TypeScript
The OpenAI Agents SDK has shipped a TypeScript implementation alongside the existing Python SDK, including sandbox agents and the open-source harness. Significant for Node/TS-native agent stacks — reduces friction for web apps integrating tool-using agents and standardizes harnessing/testing patterns. If you maintain agent infrastructure in TypeScript, evaluate parity with your Python setup before migrating production workloads. (OpenAI API changelog)
May 5
OpenAI chat-latest snapshot alias released
New moving-target alias that tracks the current Instant model used in ChatGPT. Useful for evaluating ChatGPT-style behavior and quick regression checks, but explicitly NOT for production — the underlying snapshot rotates without notice. For production, keep pinning specific snapshots (or use GPT-5.5 for the recommended frontier line). (OpenAI API changelog)
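
A deploy-time guard can catch moving aliases before they reach production. The sketch below assumes pinned OpenAI snapshots end in a YYYY-MM-DD suffix, as in gpt-5.5-2026-04-23; other vendors use different naming, so this check would need adjusting for them. It is also stricter than the entry above, since it rejects bare family names as well as chat-latest. Both function names are hypothetical.

```python
import re

# Assumption: a pinned snapshot ID ends with a date suffix like -2026-04-23.
_SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def is_pinned_snapshot(model_id: str) -> bool:
    """True if the model ID looks like a dated snapshot rather than a moving alias."""
    return bool(_SNAPSHOT_SUFFIX.search(model_id))


def assert_production_safe(model_id: str) -> None:
    """Raise if a production config points at an alias that can rotate without notice."""
    if not is_pinned_snapshot(model_id):
        raise ValueError(f"{model_id!r} is not a dated snapshot; pin a specific snapshot")
```

Calling `assert_production_safe("chat-latest")` in a config loader or CI step would fail fast, while `assert_production_safe("gpt-5.5-2026-04-23")` passes.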
May 5
Google Gemini File Search adds multimodal support + better grounding
Gemini API File Search now supports image embedding and retrieval via gemini-embedding-2 — first-party image RAG without rolling your own pipeline. Grounding metadata also improved with new media_id and page_numbers fields, enabling proper image- and page-level citations in attributed-answer UIs. If you're using Gemini File Search for RAG, consider indexing images and surfacing the new metadata as inline citations. (Gemini API changelog)
May 4
Google Gemini API adds Webhooks for Batch / long-running operations
Event-driven completion notifications for the Batch API and other long-running operations — no more polling. Cuts idle compute and simplifies orchestration for async workloads (batch evals, offline RAG indexing, large multimodal processing). Worth migrating any polling-based batch workflows to webhooks. (Gemini API changelog)
Apr 24
OpenAI GPT-5.5 launched — frontier reasoning at 2x the price
GPT-5.5 (snapshot gpt-5.5-2026-04-23) is now available in the Responses and Chat Completions APIs. 1,050,000 token context window, 128K max output, Dec 1 2025 knowledge cutoff. Pricing: $5/1M input, $0.50 cached, $30/1M output (standard, <272K context). Prompts >272K input tokens are billed at 2x input / 1.5x output for the full session. Reasoning effort supports none/low/medium/high/xhigh. Marketed as OpenAI's strongest model for spreadsheet creation, polished frontend code, hard math, document understanding, tool use, and multi-source research. Roughly 2x the price of GPT-5.4 — community reports suggest output efficiency gains may partially offset the increase. Plus tier users hit token limits in minutes on heavy workloads, so budget carefully. (Introducing GPT-5.5, API docs)
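
The long-context surcharge is easy to underestimate, so here is a worked sketch of the arithmetic. It uses only the rates quoted above and makes two assumptions: the surcharge starts strictly above 272,000 input tokens, and a "session" is treated as a single request. Cached-input pricing is left out because the entry does not say how the multiplier applies to it. The function name is hypothetical.

```python
INPUT_PER_M = 5.00        # USD per 1M input tokens (standard)
OUTPUT_PER_M = 30.00      # USD per 1M output tokens (standard)
LONG_CONTEXT_THRESHOLD = 272_000
LONG_INPUT_MULT = 2.0     # long-context input multiplier
LONG_OUTPUT_MULT = 1.5    # long-context output multiplier


def gpt55_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GPT-5.5 list-price cost for one request, including the long-context surcharge."""
    long_ctx = input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate = INPUT_PER_M * (LONG_INPUT_MULT if long_ctx else 1.0)
    out_rate = OUTPUT_PER_M * (LONG_OUTPUT_MULT if long_ctx else 1.0)
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
```

With 4K output tokens, a 200K-token prompt comes to about $1.12, while a 300K-token prompt comes to about $3.18: a 1.5x larger prompt costs nearly 3x as much once it crosses the threshold.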
Apr 24
OpenAI GPT-5.5 Pro released for highest-accuracy work
GPT-5.5 Pro is available in the Responses API (not Chat Completions). Pricing: $30/1M input, $180/1M output (standard, <272K context); $60/$270 per 1M for long context. Same 1M context window as GPT-5.5. Use case: high-stakes reasoning, agent orchestration, and tasks where accuracy beats latency or cost. At $180/1M output, reserve for jobs where a wrong answer is more expensive than the model. Same price as GPT-5.4 Pro — the upgrade is free at the Pro tier. (OpenAI pricing)
Apr 22
Google gemini-embedding-2 goes GA
gemini-embedding-2 is now generally available via the Gemini API. Production-ready multimodal embeddings for RAG, semantic search, and classification workloads — no longer preview-only.
Apr 22
OpenAI Legacy model snapshot deprecations
OpenAI added new deprecation entries for legacy model snapshots. Check the deprecations tab and update any pinned snapshot references in production code.
Apr 21
Google Deep Research Preview + Max Preview models
deep-research-preview-04-2026 and deep-research-max-preview-04-2026 released. New capabilities: collaborative planning, visualization support, MCP server integration, and File Search. Designed for multi-step research agents that need tool access and structured outputs.
Apr 21
OpenAI GPT Image 2 released (gpt-image-2)
OpenAI launched gpt-image-2 for /v1/images/generations and /v1/images/edits. Features token-based image pricing and Batch API support (50% discount). Replaces prior image generation endpoints for production use.
Apr 16
Anthropic Claude Opus 4.7 launched
Anthropic released Claude Opus 4.7 via the Claude Platform. Successor to Opus 4.6 with improved reasoning and coding performance.
Apr 15
Google Gemini 3.1 Flash TTS Preview (text-to-speech)
Google released Gemini 3.1 Flash TTS Preview, adding native text-to-speech capabilities to the Gemini API. Relevant for voice-first agents and multimodal applications.
Apr 15
OpenAI Agents SDK: sandbox execution, memory controls
OpenAI Agents SDK now supports running agents in controlled sandboxes for safer code execution, plus explicit controls over when memories are created and where they’re stored. Key for compliance, per-tenant isolation, and secure coding agents.
Apr 14
Google Gemini Robotics ER 1.6 Preview; 1.5 Preview shutdown Apr 30
gemini-robotics-er-1.6-preview released for embodied reasoning in robotics. gemini-robotics-er-1.5-preview shut down April 30, 2026 at 9AM PST — production workloads must use 1.6.
Apr 7
Anthropic Project Glasswing: Claude Mythos Preview (gated research)
Anthropic launched a gated research preview of Claude Mythos Preview for defensive cybersecurity via Project Glasswing. Pricing for participants: $25/MTok input, $125/MTok output. Available via Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry.
Apr 6
Anthropic Google + Broadcom compute partnership (multi-GW TPU, 2027+)
Anthropic expanded its partnership with Google and Broadcom for multiple gigawatts of next-generation TPU compute starting in 2027. Signals massive infrastructure buildout for future Claude models.
Apr 2
Google Gemma 4 models available via Gemini API
Gemma 4 open-weight models are now accessible through the Gemini API, expanding options for self-hosted and fine-tuned deployments.
Apr 1
Google New inference tiers: Flex and Priority
Google introduced Flex and Priority inference tiers for Gemini API, giving developers control over latency vs. cost tradeoffs in production workloads.
Mar 31
Google Veo 3.1 Lite Preview + gemini-2.5-flash-lite shutdown
Veo 3.1 Lite Preview (veo-3.1-lite-generate-preview) launched for cost-efficient video generation. gemini-2.5-flash-lite-preview-09-2025 has been shut down — migrate to gemini-3.1-flash-lite-preview.
Mar 26
Google Gemini 3.1 Flash Live Preview (real-time audio)
gemini-3.1-flash-live-preview enables audio-to-audio real-time conversations via the Live API. Key for agent-to-agent (A2A) and voice-first applications.
Mar 25
Google Lyria 3 music generation models
Lyria 3 models generate music from text and image prompts. Expands Gemini's multimodal capabilities into audio generation.
Mar 24
OpenAI Sora 2 models + Videos API deprecated (shutdown Sep 24)
OpenAI announced deprecation of Sora 2 model aliases/snapshots and the entire Videos API. Shutdown date: September 24, 2026. Plan migrations now if using Sora in production.
Mar 23
Google AI Studio billing: Prepay and Postpay plans
Google introduced Prepay and Postpay billing plans for AI Studio, giving developers more flexible payment options for Gemini API usage.
Mar 18
Google Built-in tools + function calling in one call; Maps grounding for Gemini 3
Gemini API now supports using Gemini’s built-in tools alongside custom function calling tools in a single request; Google Maps grounding is now supported for Gemini 3 models going forward.
Mar 17
OpenAI GPT-5.4 mini + nano released
GPT-5.4 mini brings GPT-5.4-class capability to high-volume workloads with tool search, computer use, and compaction; GPT-5.4 nano is optimized for simple, high-volume tasks and supports compaction only.
Mar 16
Google Revamped Usage Tiers & billing spend caps
Google introduced new usage tier structure with billing account spend caps for Gemini API. Better control for production workloads.
Mar 12
Google AI Studio project-level spend caps
Added project-level spend caps for Gemini API billing in AI Studio, enabling tighter cost control for production experimentation.
Mar 10
Google Multimodal embedding model: gemini-embedding-2-preview
First multimodal embedding model supporting text, image, video, audio, and PDF inputs in a unified embedding space. Major for multimodal RAG and unified retrieval.
Mar 10
Google gemini-2.5-flash-lite-preview shutdown Mar 31
Deprecated. Will be shut down March 31, 2026. Migrate to gemini-3.1-flash-lite-preview.
Mar 9
Google gemini-3-pro-preview alias redirected
Gemini 3 Pro Preview was shut down; the gemini-3-pro-preview alias now points to gemini-3.1-pro-preview.
Mar 12
OpenAI Sora API: 20s generations, 1080p, edits endpoint; remix deprecated
Sora API expanded with character references, longer generations up to 20s, 1080p for sora-2-pro (billed at $0.70/sec), Batch API support, and a new /v1/videos/edits endpoint. /v1/videos/{video_id}/remix will be deprecated in 6 months.
Mar 5
OpenAI GPT-5.4 launched
1.05M context, configurable 5-level reasoning, native computer-use API. First general-purpose model with built-in computer use. Priced at $2.50/$15 per 1M tokens. GPT-5.4 Pro at $30/$180.
Mar 4
OpenAI Legacy GPT-4 models deprecated Mar 26
gpt-4-0314, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview all sunset March 26, 2026. Requests auto-route to gpt-4.1.
Mar 3
Google Gemini 3.1 Flash-Lite Preview
First Flash-Lite model in the Gemini 3 series. 1M context, $0.25/$1.50 pricing. Ultra-efficient for high-volume, cost-sensitive workloads.
Feb 26
Google Nano Banana 2 image model + Gemini 3 Pro deprecated
Nano Banana 2 (gemini-3.1-flash-image-preview) optimized for speed. Gemini 3 Pro Preview shut down March 9.
Feb 19
Google Gemini 3.1 Pro launched
Frontier reasoning model. 1M context, $2/$12 per 1M tokens. Medium thinking level for cost/speed/performance balance. Enhanced agentic coding and multimodal analysis.
Feb 17
Anthropic Claude Sonnet 4.6 launched
Most capable Sonnet yet. 1M context (beta), 128K max output, improved coding + computer use + long-context reasoning. $3/$15 per 1M tokens. Approaches Opus intelligence at Sonnet pricing.
Feb 13
OpenAI GPT-4o, GPT-4.1, o4-mini retired from ChatGPT
GPT-4o, GPT-4.1, GPT-4.1 mini, o4-mini, and GPT-5 (Instant/Thinking) removed from ChatGPT. Still available in API. GPT-4o in Custom GPTs until March 31.
Feb 12
Google Gemini 3 Deep Think major upgrade
Specialized reasoning mode upgraded to solve modern science, research, and engineering challenges. Available to Google AI Ultra subscribers.

Upcoming Deprecations & Deadlines

OpenAI — May 12, 2026 (SHUT DOWN)

  • dall-e-2 and dall-e-3 snapshots — shut down May 12, 2026. Migrate to gpt-image-1 or gpt-image-1-mini. (OpenAI deprecations)

Google — May 20, 2026 (BREAKING)

  • Gemini Interactions API schema change: new schema (outputs → steps, updated response_format) becomes default May 20, 2026. Legacy schema removed June 6, 2026. Update agent frameworks before then. (Gemini API changelog)

Google — May 25, 2026

  • gemini-3.1-flash-lite-preview — deprecates May 11, shuts down May 25. Migrate to GA gemini-3.1-flash-lite (Gemini API changelog)

OpenAI — July 2, 2026 → Jan 6, 2027

  • Self-serve fine-tuning eligibility — May 7, 2026: orgs that never fine-tuned can’t create new jobs. July 2, 2026: orgs without FT inference in past 60 days lose access. Jan 6, 2027: all customers blocked from creating new fine-tuning jobs (existing inference continues until base model deprecates). (OpenAI deprecations)

OpenAI — March 26, 2026

  • gpt-4-0314 — sunset, auto-routes to gpt-4.1
  • gpt-4-1106-preview — sunset, auto-routes to gpt-4.1
  • gpt-4-0125-preview — sunset, auto-routes to gpt-4.1
  • gpt-4-turbo-preview — sunset, auto-routes to gpt-4.1

OpenAI — March 31, 2026

  • GPT-4o in Custom GPTs — fully retired across Business, Enterprise, Edu plans

Google — April 30, 2026

  • gemini-robotics-er-1.5-preview — shut down Apr 30, 2026 at 9AM PST (now offline). Migrate to gemini-robotics-er-1.6-preview (Gemini API changelog)

Google — March 31, 2026

  • gemini-2.5-flash-lite-preview-09-2025 — shutdown. Migrate to gemini-3.1-flash-lite-preview

OpenAI — September 24, 2026

  • Sora 2 models — all Sora 2 aliases and snapshots deprecated. Full shutdown Sep 24, 2026
  • Videos API (/v1/videos/*) — entire endpoint being removed. Migrate video generation workflows

OpenAI — May 7, 2026

  • Realtime API Beta — deprecated and will be removed. Migrate to production Realtime API

OpenAI — 2026 (date TBD)

  • Assistants API — being replaced by Responses API. Sunset date in 2026
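
Auditing a codebase against the lists above can be partly automated. The sketch below searches text for the model IDs named in this section, using only the Python standard library. The ID list is copied from the entries above, the file suffixes are an arbitrary starting point, and both function names are hypothetical. It only finds literal strings, so IDs assembled at runtime or stored in a database will not show up.

```python
import re
from pathlib import Path

# Model IDs copied from the deprecation lists above.
DEPRECATED_IDS = [
    "dall-e-2",
    "dall-e-3",
    "gpt-4-0314",
    "gpt-4-1106-preview",
    "gpt-4-0125-preview",
    "gpt-4-turbo-preview",
    "gemini-3.1-flash-lite-preview",
    "gemini-robotics-er-1.5-preview",
    "gemini-2.5-flash-lite-preview-09-2025",
]

# Longest IDs first; the lookarounds stop matches inside longer identifiers.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(i) for i in sorted(DEPRECATED_IDS, key=len, reverse=True))
    + r")(?![\w-])"
)


def find_deprecated(text: str) -> list[tuple[int, str]]:
    """Return (line_number, model_id) for each deprecated ID found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root: str, suffixes=(".py", ".ts", ".js", ".json", ".yaml", ".yml")):
    """Scan files under root and map each file path to its deprecated-ID hits."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = find_deprecated(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

Running `scan_tree(".")` from a repository root returns a dict keyed by file path, which can be printed in CI or used to fail a build when any hit remains.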