Bloodhound

A deterministic simulation testing platform for hunting bugs in distributed systems.

Bloodhound uses a modified QEMU hypervisor to provide perfect reproducibility for containerized applications, enabling systematic exploration of failure scenarios that are difficult or impossible to reproduce reliably with traditional testing.

Version 0.2.0 - OCI image integration, registry auth, container auto-translation, property checking. See Release Notes.

Research Note: This project is co-authored with Claude (Anthropic) as an experiment in AI-assisted systems programming. See CLAUDE.md for project philosophy and AI collaboration guidelines.

Platform Support: Linux x86_64 fully tested with real VMs. macOS supports harness/simulation mode only. See Platform Support for details.

Features

Deterministic Execution: Same seed produces identical execution every time
Language Agnostic: Test any containerized application (Go, Rust, Java, Python, etc.)
Fault Injection: Network partitions, disk failures, process crashes, clock skew
Time-Travel Debugging: Full replay capability with GDB integration
Coverage-Guided Exploration: Intelligent state space exploration
Docker Compose Integration: Test multi-service stacks directly
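
To make the "same seed, same execution" idea concrete, here is a minimal sketch in Python. It is not Bloodhound code: the simulate function, the node names, and the event names are hypothetical. It only illustrates that when every random choice is drawn from one seeded generator, the resulting trace is a pure function of the seed.

```python
import random

def simulate(seed, steps=5):
    """Toy simulation whose only source of randomness is `seed` (hypothetical)."""
    rng = random.Random(seed)  # private generator; global random state is never used
    nodes = ["web", "redis", "postgres"]
    events = ["deliver", "drop", "crash"]
    trace = []
    for step in range(steps):
        node = rng.choice(nodes)
        event = rng.choice(events)
        trace.append(f"{step}:{node}:{event}")
    return trace

# Two runs with the same seed yield the same trace, so a failing seed can be replayed.
assert simulate(42) == simulate(42)
```

The same principle is what makes a failing seed useful as a bug report: re-running it reproduces the failure instead of a different interleaving.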

Architecture

┌─────────────────────────────────────────────────────────────┐
│                       BLOODHOUND                            │
│              (Modified QEMU Hypervisor)                     │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ Virtual Time │  │ Fault Inject │  │ State Snap   │      │
│  │ (TSC, HPET)  │  │ (Net, Disk)  │  │ (CoW, Tree)  │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
├─────────────────────────────────────────────────────────────┤
│                    GUEST VMs (Containers)                   │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐           │
│  │  web   │  │ redis  │  │postgres│  │  ...   │           │
│  └────────┘  └────────┘  └────────┘  └────────┘           │
└─────────────────────────────────────────────────────────────┘

Quick Start

Installation

Option 1: Download Release Binary (Recommended)

# Linux x86_64
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-linux-amd64.tar.gz
tar -xzf bloodhound-linux-amd64.tar.gz
sudo mv bloodhound /usr/local/bin/

# macOS Intel
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-darwin-amd64.tar.gz
tar -xzf bloodhound-darwin-amd64.tar.gz
sudo mv bloodhound /usr/local/bin/

# macOS Apple Silicon
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-darwin-arm64.tar.gz
tar -xzf bloodhound-darwin-arm64.tar.gz
sudo mv bloodhound /usr/local/bin/

# Verify installation
bloodhound --version

Option 2: Install from Source

# Using cargo
cargo install --git https://github.com/nerdsane/bloodhound

# Or build manually
git clone https://github.com/nerdsane/bloodhound.git
cd bloodhound
cargo build --release

Option 3: Full Setup with Real VMs (Linux Only)

# Clone the repository
git clone https://github.com/nerdsane/bloodhound.git
cd bloodhound

# Build the CLI
cargo build --release

# Build the deterministic kernel
./scripts/build-guest.sh

# Clone and build the patched QEMU hypervisor
git clone https://github.com/nerdsane/qemu.git
cd qemu
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)
cd ../..

# Verify installation
./target/release/bloodhound --version

Basic Usage

# Run with specific seed (deterministic)
bloodhound run --compose docker-compose.yml --seed 42

# Explore for bugs (coverage-guided)
bloodhound explore --compose docker-compose.yml --seeds 10000 --timeout 1h

# Debug a failure with time-travel debugging
bloodhound debug --seed 42 --gdb-port 1234

# Run as part of CI/CD
bloodhound test --compose docker-compose.yml --coverage-threshold 80%

Configuration

Create a bloodhound.yaml file in your project directory:

# Docker Compose file to test
compose: docker-compose.yml

# Simulation settings
simulation:
  max_time: 5m        # Maximum simulation time
  time_step: 10ms     # Time step granularity
  seeds: 1000         # Number of seeds to explore

# Workload configuration
workload:
  driver: http
  config:
    target: http://lb:80
    qps: 100
    duration: 60s
    operations:
      - type: put
        weight: 40
        key_pattern: "key-{random:1-10000}"
      - type: get
        weight: 50
      - type: delete
        weight: 10

# Properties to verify
properties:
  - name: no-data-loss
    kind: safety
    description: "Acknowledged writes must not be lost"
    check:
      type: linearizability
      operations: [put, get]

  - name: partition-recovery
    kind: liveness
    description: "Cluster must recover within 30s after partition heals"
    timeout: 30s
    check:
      type: http
      endpoint: http://lb:80/health
      expect:
        status: 200

# Fault injection
faults:
  network:
    drop_rate: 0.01
    delay_ms: 50
    delay_rate: 0.05
    partition_probability: 0.001
    partition_duration_ms: [1000, 10000]

  disk:
    write_fail_rate: 0.001
    partial_write_rate: 0.0005
    fsync_fail_rate: 0.001

  process:
    crash_probability: 0.0001
    pause_probability: 0.0005
    oom_probability: 0.00001

# Exploration settings
exploration:
  strategy: coverage-guided  # bfs, dfs, random, coverage-guided
  max_depth: 1000
  max_states: 100000
  parallel_workers: 8
  prioritize_coverage: true
  prioritize_near_violation: true

# Output settings
output:
  dir: ./output
  save_violation_traces: true
  save_interesting_seeds: true
  coverage_format: html
  summary_report: true

# Debugging
debug:
  gdb_enabled: true
  gdb_port: 1234
  trace_level: normal
  trace_events:
    - network_send
    - network_recv
    - disk_write
    - process_crash

Property Types

Safety Properties

Safety properties assert that "bad things never happen." They are checked continuously throughout execution.

- name: no-data-loss
  kind: safety
  check:
    type: linearizability
    operations: [put, get]
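
The sketch below shows the spirit of a "no acknowledged write is lost" check on a strictly sequential history. It is deliberately much weaker than a real linearizability checker, which must also handle overlapping operations, and it is not Bloodhound's implementation. The history format and the find_lost_writes name are assumptions made for illustration.

```python
def find_lost_writes(history):
    """Scan a sequential history of (op, key, value) tuples.

    For "put", value is the acknowledged written value.
    For "get", value is what the read returned (None if the key was missing).
    Returns a list of (index, key, expected, observed) violations.
    """
    latest = {}
    violations = []
    for index, (op, key, value) in enumerate(history):
        if op == "put":
            latest[key] = value
        elif op == "get":
            # A read must see the most recent acknowledged write to that key.
            if key in latest and value != latest[key]:
                violations.append((index, key, latest[key], value))
    return violations

good = [("put", "k", 1), ("get", "k", 1), ("put", "k", 2), ("get", "k", 2)]
bad = [("put", "k", 1), ("put", "k", 2), ("get", "k", 1)]
assert find_lost_writes(good) == []
assert find_lost_writes(bad) == [(2, "k", 2, 1)]
```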

Liveness Properties

Liveness properties assert that "good things eventually happen." Each one specifies a timeout within which the condition must become true.

- name: leader-election
  kind: liveness
  timeout: 5s
  check:
    type: custom
    script: ./checks/has-leader.sh
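
A liveness check with a timeout can be thought of as "the condition must become true at some observation no later than the deadline." The Python sketch below expresses that over a list of timestamped observations. The check_liveness function and the observation format are hypothetical, and time is assumed to be simulated time in milliseconds.

```python
def check_liveness(observations, timeout_ms):
    """Return the time at which the condition first held, or None on timeout.

    observations: list of (time_ms, condition_holds) pairs sorted by time.
    """
    for time_ms, holds in observations:
        if time_ms > timeout_ms:
            break  # deadline passed without the condition holding
        if holds:
            return time_ms
    return None

# A leader appears after 4.5s of simulated time, inside the 5s timeout.
assert check_liveness([(0, False), (2000, False), (4500, True)], 5000) == 4500
# No leader within 5s: the property is violated.
assert check_liveness([(0, False), (6000, True)], 5000) is None
```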

Invariants

Invariants are properties that must hold at every state.

- name: single-leader-per-term
  kind: invariant
  check:
    type: custom
    script: ./checks/single-leader.sh
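
For intuition, an invariant such as single-leader-per-term can be evaluated against each observed cluster state. The following Python sketch assumes a hypothetical state format (a mapping from node name to a role and term); the contract of the actual custom check scripts is not described here.

```python
from collections import defaultdict

def single_leader_per_term(state):
    """state maps node name -> (role, term). True if no term has two leaders."""
    leaders = defaultdict(set)
    for node, (role, term) in state.items():
        if role == "leader":
            leaders[term].add(node)
    return all(len(nodes) <= 1 for nodes in leaders.values())

def first_violation(states):
    """Return the index of the first state that breaks the invariant, or None."""
    for step, state in enumerate(states):
        if not single_leader_per_term(state):
            return step
    return None

states = [
    {"n1": ("leader", 1), "n2": ("follower", 1), "n3": ("follower", 1)},
    {"n1": ("leader", 1), "n2": ("leader", 2), "n3": ("follower", 2)},  # fine: different terms
    {"n1": ("leader", 2), "n2": ("leader", 2), "n3": ("follower", 2)},  # two leaders in term 2
]
assert first_violation(states) == 2
```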

Fault Injection

Bloodhound can inject various types of faults deterministically:

Network Faults

Packet drop: Randomly drop network packets
Packet delay: Add latency to network communication
Packet corruption: Corrupt packet data
Network partitions: Isolate nodes from each other

Disk Faults

Write failures: Fail disk writes
Partial writes: Simulate torn writes (power failure during write)
fsync failures: Fail fsync calls
Read corruption: Return corrupted data on reads

Process Faults

Crashes: Kill processes (SIGKILL)
Pauses: Pause processes (SIGSTOP)
OOM kills: Simulate out-of-memory conditions
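
Deterministic fault injection boils down to deriving every fault decision from the seed. The Python sketch below plans per-packet network faults using the drop_rate, delay_rate, and delay_ms values from the configuration example. The plan_packet_faults function is hypothetical, and treating drop and delay as mutually exclusive outcomes of a single roll is an assumption, not a description of Bloodhound's internals.

```python
import random

def plan_packet_faults(seed, packets, drop_rate=0.01, delay_rate=0.05, delay_ms=50):
    """Return a list of (packet_id, action, delay_ms) decided only by `seed`."""
    rng = random.Random(seed)
    plan = []
    for packet_id in range(packets):
        roll = rng.random()  # uniform in [0.0, 1.0)
        if roll < drop_rate:
            plan.append((packet_id, "drop", 0))
        elif roll < drop_rate + delay_rate:
            plan.append((packet_id, "delay", delay_ms))
        else:
            plan.append((packet_id, "deliver", 0))
    return plan

# The same seed always produces the same fault schedule.
assert plan_packet_faults(42, 1000) == plan_packet_faults(42, 1000)
```

Since the schedule depends only on the seed and the rates, a run that exposed a bug can be repeated with exactly the same drops and delays.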

Time-Travel Debugging

When a bug is found, Bloodhound can replay the exact execution:

# Start debugging session
bloodhound debug --seed 42 --gdb-port 1234

# In another terminal, connect with GDB
gdb -ex "target remote :1234"

The debugger supports:

Step forward/backward: Navigate through execution in either direction
Breakpoints: Set breakpoints that are honored during both forward and reverse execution
Watchpoints: Watch variables change over time
Reverse execution: Run backward from a failure to find where the bug originated
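
One common way to get backward stepping out of a deterministic system is to restore the nearest earlier snapshot and replay forward to the step just before the current one. Whether Bloodhound implements reverse execution exactly this way is not stated here; the ReplaySession class below is a hypothetical Python sketch of the general technique, and it assumes states are immutable values.

```python
class ReplaySession:
    """Step backward by restoring a snapshot and replaying (illustrative only)."""

    def __init__(self, step_fn, initial_state, snapshot_every=4):
        self.step_fn = step_fn              # deterministic: (state, position) -> state
        self.snapshot_every = snapshot_every
        self.snapshots = {0: initial_state}  # position -> saved state
        self.state = initial_state
        self.position = 0

    def step_forward(self):
        self.state = self.step_fn(self.state, self.position)
        self.position += 1
        if self.position % self.snapshot_every == 0:
            self.snapshots[self.position] = self.state

    def step_backward(self):
        if self.position == 0:
            return
        target = self.position - 1
        # Restore the latest snapshot at or before the target position.
        base = max(p for p in self.snapshots if p <= target)
        self.state = self.snapshots[base]
        self.position = base
        # Replay deterministically up to the target.
        while self.position < target:
            self.state = self.step_fn(self.state, self.position)
            self.position += 1

session = ReplaySession(lambda state, pos: state + pos + 1, initial_state=0)
for _ in range(6):
    session.step_forward()
session.step_backward()
assert (session.position, session.state) == (5, 15)
```

The approach only works because replay is deterministic: re-executing from a snapshot must arrive at the same state that was originally observed.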

CI/CD Integration

GitHub Actions

name: Bloodhound Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: datadog/bloodhound-action@v1
        with:
          compose: docker-compose.yml
          seeds: 1000
          timeout: 30m
          coverage-threshold: 80%

Examples

See the examples/ directory for complete examples:

redis-rust-demo/: Bug reproduction demo - Find and reproduce a CRDT consistency bug using async-VM mode. See Demo Documentation.
redis-rust/: Simple redis-rust cluster example
distributed-kv/: A distributed key-value store with Raft consensus
message-queue/: A distributed message queue
cache-cluster/: A distributed cache with consistent hashing

How It Works

Deterministic Hypervisor: Bloodhound uses a modified QEMU in TCG (Tiny Code Generator) mode to ensure deterministic execution. All sources of non-determinism (time, random numbers, I/O ordering) are controlled.
Virtual Time: Time is virtualized, so simulations can run faster than real time while maintaining correct behavior. A 5-minute simulation might complete in seconds.
Snapshot Tree: Bloodhound maintains a tree of VM snapshots using copy-on-write, enabling efficient exploration of different execution paths from the same starting point.
Coverage-Guided Exploration: Like a coverage-guided fuzzer, Bloodhound prioritizes execution paths that reach new code, which helps it find edge cases efficiently.
Property Checking: User-defined properties are checked at each step, with violations triggering detailed trace capture.
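
The interplay between the snapshot tree and coverage-guided exploration can be sketched as a best-first search: each node stands for a snapshot, its children are the executions that branch from it, and nodes that uncovered more new coverage are expanded first. The Python code below is a toy model under those assumptions. The explore function and its successors and coverage_of callbacks are hypothetical and do not reflect Bloodhound's actual scheduler.

```python
import heapq

def explore(initial, successors, coverage_of, max_states=100):
    """Best-first exploration that prefers states which added new coverage.

    successors(state) -> iterable of child states (branches from a snapshot)
    coverage_of(state) -> iterable of coverage ids hit when reaching that state
    Returns (covered_ids, states_in_expansion_order).
    """
    covered = set(coverage_of(initial))
    # Heap entries: (-new_coverage_count, insertion_order, state).
    frontier = [(-len(covered), 0, initial)]
    visited = {initial}
    order = 1
    expanded = []
    while frontier and len(expanded) < max_states:
        _, _, state = heapq.heappop(frontier)
        expanded.append(state)
        for child in successors(state):
            if child in visited:
                continue
            visited.add(child)
            new = set(coverage_of(child)) - covered
            covered |= new
            heapq.heappush(frontier, (-len(new), order, child))
            order += 1
    return covered, expanded

# Toy state space: a state is the tuple of faults injected so far, up to depth 2.
def toy_successors(state):
    return [state + (fault,) for fault in ("drop", "delay")] if len(state) < 2 else []

covered, expanded = explore((), toy_successors, lambda state: set(state))
assert covered == {"drop", "delay"}
assert len(expanded) == 7
```

The max_states bound plays the same role as the max_states setting in the exploration configuration: it caps how much of the tree is visited.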

Execution Modes

Mode	Status	Description
Simulation	Working	In-memory deterministic simulation with fault injection
Docker	Working	Container-based testing with real Docker Compose stacks
Async-VM	Working	Real QEMU VMs with deterministic execution via patched QEMU

Async-VM Mode

Async-VM mode runs real QEMU virtual machines with deterministic time control and fault injection via the custom bloodhound-ctrl QMP command implemented in the QEMU fork.

Features:

Deterministic time control (set_time, advance_time, freeze_time)
Deterministic RNG seeding (set_seed)
Fault injection (inject_network_fault, inject_disk_fault)
VM snapshots and restore

Example:

# Build the patched QEMU (required for async-VM mode)
git clone https://github.com/nerdsane/qemu.git
cd qemu && git checkout bloodhound
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)
cd ../..

# Run async-VM exploration
bloodhound explore \
  --compose docker-compose.yml \
  --async-vm \
  --base-image disk.qcow2 \
  --kernel guest/build/bzImage \
  --initrd guest/build/initramfs.cpio.gz \
  --qemu qemu/build/qemu-system-x86_64 \
  --seeds 10 \
  --max-depth 5

Other commands:

# Simulation mode (no QEMU required)
bloodhound explore --compose docker-compose.yml --seeds 1000

# Docker-based testing
bloodhound test --compose docker-compose.yml --config bloodhound.yaml

# Determinism verification
bloodhound verify --kernel guest/build/bzImage --initrd guest/build/initramfs.cpio.gz

Platform Support

Platform	Harness Mode	Real VM Mode	Status
Linux x86_64	✅ Full	✅ Full	Fully Tested
macOS Intel	✅ Full	⚠️ Experimental	Harness mode tested
macOS ARM	✅ Full	⚠️ Experimental	Harness mode tested

Linux x86_64 (Recommended)

Full support for all features
Real VM mode with QEMU/KVM tested and working
296 unit tests + 44 integration tests passing

macOS (Harness Mode Only)

Harness/simulation mode is fully working (all unit tests pass)
Real VM mode (--async-vm) is untested and requires:
- QEMU built with HVF (Hypervisor.framework) support
- Different VM configuration than Linux
Use bloodhound test --actor-mode for simulation testing

Requirements

Linux: KVM support recommended (x86_64)
macOS: Harness mode works out of the box
Docker (for container-based tests)
8GB+ RAM recommended
SSD recommended for snapshot storage

Optional: Patched QEMU (Linux Only)

For full deterministic async-VM execution, build the patched QEMU:

git clone https://github.com/nerdsane/qemu.git
cd qemu
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.

Documentation

Bug Reproduction Demo - Start here! End-to-end walkthrough finding and reproducing a real bug
FAQ - Why hypervisor over kernel, SMP limitations, TLB handling, and more
Known Limitations - Current limitations and workarounds
Async-VM Mode - Async-VM mode setup and bloodhound-ctrl integration
Deterministic Kernel - Building the guest kernel
QEMU Fork - Patched QEMU specification with bloodhound-ctrl QMP

Acknowledgments

Bloodhound is inspired by:

Antithesis - Deterministic simulation testing
FoundationDB Testing - Simulation testing at Apple
TigerBeetle - Tiger Style and simulation testing
Jepsen - Distributed systems testing

Disclaimer

This is a research project exploring deterministic simulation testing techniques. It is:

Not production-ready: Core features work but edge cases exist
Under active development: APIs may change without notice
Co-authored with AI: Developed collaboratively with Claude (Anthropic)

The goal is to explore what's possible with human-AI collaboration on complex systems programming, not to provide a production-ready tool. Use at your own risk.
