This repository contains my COMP 4905 Honours Project, completed for my B.C.S. at Carleton University.
ChaosSpec is a prototype resilience testing framework that enables developers to create scenarios, such as network faults and hardware stress tests, directly within existing test suites without knowledge of specialized tooling or infrastructure.
Today, resilience testing is uncommon outside large organizations because existing tools often require specialized knowledge and the cost of authoring scenarios often outweighs their perceived value compared to building new features.
ChaosSpec addresses these problems by hiding operational complexity behind a declarative interface. Developers describe services, faults, traffic, and assertions, while the runtime handles orchestration and fault injection automatically, so resilience tests can be written as quickly as any other unit or integration test.
In short: ChaosSpec enables resilience testing that feels like writing unit tests.
The repository is organized as follows, with each subdirectory containing its own README.md:
| 📁 Directory | Description |
|---|---|
| 📦 chaos-spec-lib | The ChaosSpec library, including runtime implementation and API. |
| 📊 evaluation | Artifacts like execution log time measurements and tests for ease-of-use comparison. |
| 🧪 example-service | A sample application used throughout the example tests and evaluations. |
| 📋 report.pdf | The final report documenting the project, its findings, and future work. |
To run the example ChaosSpec tests:
- Make sure you have Docker Desktop installed
- Ensure you have Node.js installed
- `cd` into `chaos-spec-lib`
- Run `npm i` to install necessary dependencies
- Run `npm run test:chaos` to run the ChaosSpec examples in `chaos-spec-lib/test`
Here is a simple ChaosSpec network latency example:
```typescript
describe("when redis experiences 500ms latency", () => {
  it("will still meet the SLO of 2000ms for a read request", async () => {
    const SLO_MS = 2000;

    // ======================================================
    // 1. Setup the services under test
    // ======================================================
    const redis = await createService("redis", {
      image: "redis:7.0-alpine",
      portToProxy: 6379,
    });

    const service = await createService("example-service", {
      image: EXAMPLE_SERVICE_IMAGE,
      portToExpose: 5000,
      environment: {
        REDIS_URL: `redis://${redis.getHost()}:${redis.getProxyPort()}`,
      },
    });

    // ======================================================
    // 2. Inject 500ms of network latency
    // ======================================================
    await redis.startNetworkLatency({ latencyMs: 500 });

    // ======================================================
    // 3. Send HTTP request and measure duration
    // ======================================================
    const serviceUrl = `http://localhost:${service.getMappedPort(5000)}/api/books`;
    const [response, duration] = await timedHttpRequest(serviceUrl, {
      method: "GET",
      headers: { "Content-Type": "application/json" },
    });

    // ======================================================
    // 4. Assert that the measured duration is within the SLO
    // ======================================================
    expect(response.ok).toBe(true);
    expect(duration).toBeLessThanOrEqual(SLO_MS);
  });
});
```
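Step 3 of the example relies on measuring how long a request takes. As a rough illustration of that idea only, the sketch below times an arbitrary async action. The helper name `timed` is hypothetical and is not part of ChaosSpec; the library's own `timedHttpRequest` may be implemented differently.

```typescript
// Hypothetical helper for illustration; ChaosSpec's own timedHttpRequest may differ.
// Runs an async action and returns its result together with the elapsed
// wall-clock time in milliseconds.
async function timed<T>(action: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const result = await action();
  return [result, Date.now() - start];
}
```

Under these assumptions, a test could wrap any request, for example `const [response, duration] = await timed(() => fetch(serviceUrl));` on Node.js 18 or later, where `fetch` is built in. `Date.now()` is used here for simplicity; it is not a monotonic clock, so `performance.now()` would be a more robust choice for precise measurements.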