Claude Opus 4.6 and MathArena #25

@Oktai15

According to Epoch AI, Claude Opus 4.6 is the first Anthropic model to perform well on hard math tasks (see https://x.com/epochairesearch/status/2019852613672665193).

Would it be possible to add Claude Opus 4.6 to some of the recent benchmarks (Project Euler, ArxivMath, Apex, Kangaroo 2025)? You evaluated earlier Anthropic models on some benchmarks, but as far as I can see, you no longer add their new models.

Anthropic is one of the industry leaders, so it would be great to see results for their best model too, as you already do for OpenAI, Google, and open-source models, provided you have enough credits for it.
