GPT-5.4 · Claude Sonnet 4.6 · Gemini 3.1 Pro
| Dimension | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Release date | Mar 5, 2026 | Feb 17, 2026 | Feb 19, 2026 |
| Context window | 1.05M tokens | 1M (beta, API only) | 1M tokens |
| Max output | Not disclosed | 128K tokens | 64K tokens |
| Input price / 1M | $2.50 | $3.00 | $2.00 |
| Output price / 1M | $15.00 | $15.00 | $12.00 |
| Cached input price | $0.25 / 1M | 90% savings (est. ~$0.30) | Supported (tiered) |
| Input modalities | Text, Image | Text, Image | Text, Image, Video, Audio |
| Reasoning control | 5-level configurable | Hybrid extended thinking | 3 thinking levels |
| Computer use | Native API | Improved in 4.6 | Not available |
| SWE-bench Verified | ~80.0% | Approaches Opus-level | Not reported |
| ARC-AGI-2 | Not reported | Not reported | 77.1% |
| GPQA (reasoning) | 93.2% (Pro variant) | Not reported | Not reported |
| Batch pricing | 50% off standard | 50% off standard | Available |
| Availability | API + ChatGPT Pro/Plus | API + Claude.ai + Bedrock + Vertex + Foundry | API + Gemini App + Vertex |
| Pro / Premium tier | $30 / $180 per 1M (GPT-5.4 Pro) | $15 / $75 per 1M (Opus 4.6) | Deep Think (Ultra subs) |
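The list prices in the table can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name `request_cost` and the dictionary layout are assumptions made for this example, the prices are copied from the table, and cached-input or tiered long-context rates are ignored. The 50% batch discount is applied only where the table states it; Gemini's batch rate is listed as "Available" without a figure, so the sketch refuses to guess.

```python
# Per-1M-token list prices (USD), copied from the comparison table.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "Gemini 3.1 Pro": {"input": 2.00, "output": 12.00},
}

# Batch multipliers only where the table gives a number ("50% off standard").
BATCH_MULTIPLIER = {
    "GPT-5.4": 0.5,
    "Claude Sonnet 4.6": 0.5,
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimate the USD cost of one request at list prices.

    Hypothetical helper: ignores cached-input and tiered pricing.
    """
    price = PRICES[model]
    cost = (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000
    if batch:
        if model not in BATCH_MULTIPLIER:
            # The table does not state a batch rate for this model.
            raise ValueError(f"no batch discount figure listed for {model}")
        cost *= BATCH_MULTIPLIER[model]
    return cost


if __name__ == "__main__":
    # Example workload: 200K input tokens, 4K output tokens per request.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 200_000, 4_000):.3f}")
```

For input-heavy workloads such as document pipelines, the input price dominates the total, which is why the $2.00 versus $3.00 gap matters more than the output price in that setting.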
Best pick: GPT-5.4 for complex, multi-file codebases. It scores ~80% on SWE-bench Verified, and its 5-level reasoning control lets you trade cost against quality per request. Sonnet 4.6 is the value play at $3 input; developers in early access often preferred it over Opus for frontend and financial code. Gemini 3.1 Pro is strong for agentic coding workflows thanks to its efficient token usage.
Best pick: GPT-5.4 leads with a native computer-use API and configurable reasoning (dial down for fast tool-calling loops, dial up for planning). Sonnet 4.6 has improved computer use and agent planning, which suits multi-step browser automation. Gemini 3.1 Pro excels at long-horizon tool orchestration with its medium thinking mode.
Best pick: Gemini 3.1 Pro — cheapest per-token ($2/$12) with a full 1M context window and native multimodal grounding (text, image, video, audio). Great for document-heavy pipelines. GPT-5.4 at 1.05M context is the largest window but costs more. Sonnet 4.6's 128K max output is ideal when you need long-form synthesis from retrieved docs.
Best pick: Gemini 3.1 Pro is the clear leader — native text, image, video, and audio input with unified embeddings (gemini-embedding-2). GPT-5.4 handles text + image but no video/audio. Sonnet 4.6 is text + image only. For multimodal RAG or video analysis, Gemini is the only real option at the frontier.
dall-e-2 and dall-e-3 snapshots are shut down as of May 12, 2026. Migrate any remaining image-generation calls to gpt-image-1 (frontier quality) or gpt-image-1-mini (cost/latency-optimized). Anything still pointing at DALL·E snapshots will now error, so audit production code, marketing pipelines, and any user-facing image features. (OpenAI deprecations)

The return_token_budget parameter on the Responses API web search tool allows GPT-5+ to spend more tokens reasoning over web search results before returning an answer. Directly relevant for deep-research agents and any RAG-over-web pipeline where the model was previously cutting reasoning short. Tune higher for harder multi-source questions; expect higher latency and token cost in exchange for answer quality. (OpenAI API changelog)

[…] outputs to steps and changing response_format configuration. The new schema becomes default on May 20, 2026, and the legacy schema is removed entirely on June 6, 2026. This is a breaking change for any agent framework built on the Interactions API: plan migration tests before May 20 and confirm production is updated before June 6 to avoid outages. (Gemini API changelog)

[…] media_id and page_numbers fields, enabling proper image- and page-level citations in attributed-answer UIs. If you're using Gemini File Search for RAG, consider indexing images and surfacing the new metadata as inline citations. (Gemini API changelog)
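For the DALL·E shutdown noted above, one way to start the audit is a plain text scan for the retired model names. This is a minimal sketch, not an official migration tool: the helper names `find_deprecated_models` and `scan_tree` and the default list of file suffixes are assumptions for illustration, and the scan only finds literal `dall-e-2` / `dall-e-3` strings (it will miss model names assembled at runtime or stored in a database).

```python
import re
from pathlib import Path

# Literal names of the retired snapshots; \b avoids matching e.g. "dall-e-30".
DEPRECATED = re.compile(r"dall-e-[23]\b", re.IGNORECASE)


def find_deprecated_models(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_name) for each retired model reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DEPRECATED.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str,
              suffixes: tuple[str, ...] = (".py", ".ts", ".js", ".json",
                                           ".yaml", ".yml", ".env"),
              ) -> dict[str, list[tuple[int, str]]]:
    """Walk a directory and collect hits per file (hypothetical helper)."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_deprecated_models(text)
            if hits:
                results[str(path)] = hits
    return results


if __name__ == "__main__":
    for file_path, hits in scan_tree(".").items():
        for lineno, name in hits:
            print(f"{file_path}:{lineno}: references {name}")
```

Each reported line is a candidate for switching to gpt-image-1 or gpt-image-1-mini; request parameters may differ between the old and new models, so check each call site rather than doing a blind find-and-replace.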