Claude Opus 4.6 and MathArena

According to Epoch AI, Claude Opus 4.6 is the first Anthropic model that performs well on hard math tasks (see https://x.com/epochairesearch/status/2019852613672665193).

Would it be possible to add Claude Opus 4.6 to some of the recent benchmarks (Project Euler, ArxivMath, Apex, Kangaroo 2025)? Earlier Anthropic models were added to some benchmarks, but as far as I can see, their newer models are no longer being added.

Anthropic is one of the industry leaders, so it would be great to see results for their best model as well, as you already do for OpenAI, Google, and open-source models, if you have enough credits for it.

Claude Opus 4.6 and MathArena #25
