Popular repositories

  1. skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL 278 174

  2. benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    Python 178 16

  3. pokemon-gym Public

    Python 88 7

  4. jfkarena Public

    TypeScript 7

  5. llm-builds-linux Public

    Python 6 1

  6. paperbench Public

    Python 5 1

Repositories

Showing 7 of 7 repositories
  • skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    benchflow-ai/skillsbench’s past year of commit activity
    PDDL 278 Apache-2.0 174 15 212 Updated Jan 29, 2026
  • llm-builds-linux Public
    benchflow-ai/llm-builds-linux’s past year of commit activity
    Python 6 1 0 8 Updated Dec 20, 2025
  • benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    benchflow-ai/benchflow’s past year of commit activity
    Python 178 MIT 16 0 0 Updated Dec 19, 2025
  • pokemon-gym Public
    benchflow-ai/pokemon-gym’s past year of commit activity
    Python 88 7 0 0 Updated Jun 30, 2025
  • paperbench Public
    benchflow-ai/paperbench’s past year of commit activity
    Python 5 MIT 1 0 0 Updated Apr 15, 2025
  • jfkarena Public
    benchflow-ai/jfkarena’s past year of commit activity
    TypeScript 7 0 0 0 Updated Apr 1, 2025
  • jfk-ocr-demo Public
    benchflow-ai/jfk-ocr-demo’s past year of commit activity
    Python 0 0 0 0 Updated Mar 25, 2025