OpenAI's latest flagship: GPQA 92.4%, AIME 100%, first to score 90%+ on ARC-AGI
Strengths
Math 100, Reasoning 98, Coding 95
Best For
Complex Reasoning, Math Proof, Long Context
Avoid
Budget Sensitive, Simple Tasks
400K Context, $1.75/1M input, $14/1M output
Extended reasoning version with deeper thinking capabilities
Strengths
Math 100, Reasoning 99, Coding 96
Best For
Extreme Reasoning, Scientific Research, Complex Planning
Avoid
Real-time Chat, Cost Sensitive
400K Context, Higher (xhigh reasoning tier)
Previous flagship, still powerful with better cost-performance
Strengths
Math 94, Reasoning 92, Multimodal 92
Best For
General Purpose, Daily Coding, Multimodal
Avoid
Cutting-edge Reasoning
256K Context, $1.25/1M input, $10/1M output
First to reach SWE-bench 80%+, top-tier coding and reasoning
Strengths
Coding 99, Reasoning 98, Math 96
Best For
Complex Coding, System Architecture, Deep Reasoning
Avoid
Simple Tasks, High Frequency
200K Context, $15/1M input, $75/1M output
Claude Sonnet 4.5
Anthropic
Best coding model, SWE-bench 77.2%, top choice for agents and computer use
Strengths
Coding 98, Reasoning 95, Math 94
Best For
Coding Tasks, Agent Development, Computer Use
Avoid
Extremely Budget Sensitive
200K Context, $3/1M input, $15/1M output
Reliable coding model with excellent cost-performance
Strengths
Coding 94, Reasoning 92, Creativity 90
Best For
Daily Coding, Document Processing, Stability Priority
Avoid
Cutting-edge Capability Needs
200K Context, $3/1M input, $15/1M output
Google's strongest model, leading in multimodal and reasoning, 1M context
Strengths
Context 100, Multimodal 98, Reasoning 96
Best For
Multimodal Tasks, Video Understanding, Long Documents
Avoid
Low Latency Scenarios
1M Context, $1.25/1M input, $5/1M output
Ultra-fast responses, very high cost-performance, 1M context
Strengths
Context 100, Speed 98, Value 95
Best For
High Concurrency, Real-time Apps, Batch Processing
Avoid
Deep Reasoning, Complex Coding
1M Context, $0.075/1M input, $0.3/1M output
xAI's latest flagship, GPQA 87.5%, access to real-time information
Strengths
Reasoning 92, Creativity 92, Math 90
Best For
Real-time Info, Creative Chat, Social Analysis
Avoid
Enterprise Compliance, Cost Sensitive
128K Context, $5/1M input, $15/1M output
King of open source, coding capability near top-tier, extremely low cost
Strengths
Value 98, Coding 92, Math 90
Best For
Budget Sensitive, Coding Tasks, Math
Avoid
Multimodal, Creative Writing
128K Context, $0.27/1M input, $1.1/1M output
Open-source reasoning model with strong chain-of-thought
Strengths
Math 94, Reasoning 92, Value 92
Best For
Complex Reasoning, Math Proof, Coding Problems
Avoid
Multimodal, Creative Tasks
128K Context, $0.55/1M input, $2.2/1M output
Multimodal flagship, strong audio/video understanding
Strengths
Multimodal 95, Creativity 88, Speed 88
Best For
Multimodal Tasks, Real-time Chat, Image Understanding
Avoid
Text-only Deep Reasoning, Cutting-edge Coding
128K Context, $2.5/1M input, $10/1M output
Lightweight and fast, high value for simple tasks
Strengths
Speed 95, Value 92, Multimodal 82
Best For
Simple Tasks, High Concurrency, Classification/Extraction
Avoid
Complex Reasoning, Professional Coding
128K Context, $0.15/1M input, $0.6/1M output
Claude 3.5 Sonnet
Anthropic
Classic value choice, stable and reliable
Strengths
Context 90, Coding 88, Reasoning 85
Best For
Daily Coding, Document Processing, High Stability Requirements
Avoid
Cutting-edge Capability Needs
200K Context, $3/1M input, $15/1M output
Meta's latest open-source model, with significant multimodal improvements
Strengths
Value 100, Multimodal 88, Coding 86
Best For
Local Deployment, Privacy Sensitive, Multimodal
Avoid
Cutting-edge Performance Needs
128K Context, Free Open Source
Alibaba's latest flagship, strongest in Chinese-language and coding capabilities
Strengths
Value 98, Coding 92, Math 92
Best For
Chinese Scenarios, Coding, Math
Avoid
English Creative Writing
128K Context, Free Open Source
European open-source flagship, strong multilingual support and compliance
Strengths
Reasoning 86, Coding 85, Math 84
Best For
European Compliance, Multilingual, Local Deployment
Avoid
Ultra-long Context, Chinese Scenarios
128K Context, $2/1M input, $6/1M output
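The per-million-token prices on the cards above can be turned into a rough per-request cost. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes flat per-token pricing with no caching discounts, batch rates, or tiered pricing, which real providers may apply.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat pricing: each token is billed at the listed
    per-1M-token rate, with no discounts or surcharges.
    """
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# 20,000 input tokens and 2,000 output tokens
# at a card price of $3/1M input, $15/1M output.
cost = estimate_cost(20_000, 2_000, 3.0, 15.0)
print(f"${cost:.4f}")  # prints $0.0900
```

Because output tokens are priced several times higher than input tokens on most cards, a response-heavy workload can cost noticeably more than the input price alone suggests.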