Open
Description
According to Epoch AI, Claude Opus 4.6 is the first Anthropic model that performs well on hard math tasks (see https://x.com/epochairesearch/status/2019852613672665193).
Would it be possible to add Claude Opus 4.6 to some of the recent benchmarks (Project Euler, ArxivMath, Apex, Kangaroo 2025)? You added earlier Anthropic models to some benchmarks, but, as far as I can see, you no longer add their new models.
Anthropic is one of the industry leaders, so it would be great to see results for their best model as well, as you already do for OpenAI, Google, and open-source models, provided you have enough credits for it.