SRE-bench

We are building an SRE agent benchmark inspired by SWE-bench. It is an open, reproducible framework for evaluating agents on Kubernetes tasks: incident response, infrastructure changes, observability triage, and reliability improvements. The repository will host modular scenarios (fault injectors, manifests, observability specs), an evaluation harness, and baseline agents.

The goal is to measure practical agent capabilities through metrics such as time-to-diagnose, safe remediation rate, MTTR, and explainability.
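To make these metrics concrete, here is a minimal sketch of how per-run results might be aggregated. The `RunResult` record, its field names, and the `summarize` helper are hypothetical and are not taken from this repository's harness. MTTR is computed here as mean time from fault injection to recovery, which is one of several common readings of the term. Explainability is left out because it has no obvious numeric definition.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RunResult:
    """One agent run against one scenario (hypothetical record shape)."""
    fault_injected_at: float        # timestamp of fault injection, in seconds
    diagnosed_at: Optional[float]   # when the agent named the root cause; None if never
    resolved_at: Optional[float]    # when the service recovered; None if never
    remediation_safe: bool          # True if the fix caused no collateral damage


def summarize(runs: list[RunResult]) -> dict[str, Optional[float]]:
    """Aggregate per-run records into benchmark-level metrics."""
    # Time-to-diagnose and time-to-recover only count runs that got that far.
    ttd = [r.diagnosed_at - r.fault_injected_at for r in runs if r.diagnosed_at is not None]
    ttr = [r.resolved_at - r.fault_injected_at for r in runs if r.resolved_at is not None]
    # A run counts as a safe remediation only if it both resolved and was safe.
    safe = [r for r in runs if r.resolved_at is not None and r.remediation_safe]
    return {
        "mean_time_to_diagnose_s": mean(ttd) if ttd else None,
        "mttr_s": mean(ttr) if ttr else None,
        "safe_remediation_rate": len(safe) / len(runs) if runs else None,
    }


# Illustrative data only: two resolved runs (one unsafe) and one failed run.
runs = [
    RunResult(0.0, 40.0, 120.0, True),
    RunResult(0.0, 60.0, 180.0, False),
    RunResult(0.0, None, None, False),
]
summary = summarize(runs)
print(summary)
```

In this sketch the safe remediation rate is taken over all runs, so failed runs lower it; a harness could just as reasonably divide by resolved runs only.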

Purpose

This repository also serves as:

Benchmarking platform for evaluating SRE agent performance
Agentkube POC environment for testing autonomous Kubernetes agents
Community-driven scenario library - users can contribute diverse scenarios to test their own agents

See scenario documentation for available test cases and contribution guidelines.

Folders and files

docs
manifests
scenerio
scripts
.gitignore
LICENSE
README.md

agentkube/SRE-bench

Folders and files

Latest commit

History

Repository files navigation

SRE-bench

Purpose

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages