A deterministic simulation testing platform for hunting bugs in distributed systems.
Bloodhound uses a modified QEMU hypervisor to provide perfect reproducibility for containerized applications, enabling systematic exploration of failure scenarios that are impossible to reproduce with traditional testing.
Version 0.2.0 - OCI image integration, registry auth, container auto-translation, property checking. See Release Notes.
Research Note: This project is co-authored with Claude (Anthropic) as an experiment in AI-assisted systems programming. See CLAUDE.md for project philosophy and AI collaboration guidelines.
Platform Support: Linux x86_64 fully tested with real VMs. macOS supports harness/simulation mode only. See Platform Support for details.
- Deterministic Execution: Same seed produces identical execution every time
- Language Agnostic: Test any containerized application (Go, Rust, Java, Python, etc.)
- Fault Injection: Network partitions, disk failures, process crashes, clock skew
- Time-Travel Debugging: Full replay capability with GDB integration
- Coverage-Guided Exploration: Intelligent state space exploration
- Docker Compose Integration: Test multi-service stacks directly
┌─────────────────────────────────────────────────────────────┐
│                         BLOODHOUND                          │
│                 (Modified QEMU Hypervisor)                  │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Virtual Time │  │ Fault Inject │  │ State Snap   │       │
│  │ (TSC, HPET)  │  │ (Net, Disk)  │  │ (CoW, Tree)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
├─────────────────────────────────────────────────────────────┤
│                   GUEST VMs (Containers)                    │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐             │
│  │  web   │  │ redis  │  │postgres│  │  ...   │             │
│  └────────┘  └────────┘  └────────┘  └────────┘             │
└─────────────────────────────────────────────────────────────┘
Option 1: Download Release Binary (Recommended)
# Linux x86_64
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-linux-amd64.tar.gz
tar -xzf bloodhound-linux-amd64.tar.gz
sudo mv bloodhound /usr/local/bin/
# macOS Intel
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-darwin-amd64.tar.gz
tar -xzf bloodhound-darwin-amd64.tar.gz
sudo mv bloodhound /usr/local/bin/
# macOS Apple Silicon
curl -LO https://github.com/nerdsane/bloodhound/releases/latest/download/bloodhound-darwin-arm64.tar.gz
tar -xzf bloodhound-darwin-arm64.tar.gz
sudo mv bloodhound /usr/local/bin/
# Verify installation
bloodhound --version
Option 2: Install from Source
# Using cargo
cargo install --git https://github.com/nerdsane/bloodhound
# Or build manually
git clone https://github.com/nerdsane/bloodhound.git
cd bloodhound
cargo build --release
Option 3: Full Setup with Real VMs (Linux Only)
# Clone the repository
git clone https://github.com/nerdsane/bloodhound.git
cd bloodhound
# Build the CLI
cargo build --release
# Build the deterministic kernel
./scripts/build-guest.sh
# Clone and build the patched QEMU hypervisor
git clone https://github.com/nerdsane/qemu.git
cd qemu
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)
cd ../..
# Verify installation
./target/release/bloodhound --version
# Run with specific seed (deterministic)
bloodhound run --compose docker-compose.yml --seed 42
# Explore for bugs (coverage-guided)
bloodhound explore --compose docker-compose.yml --seeds 10000 --timeout 1h
# Debug a failure with time-travel debugging
bloodhound debug --seed 42 --gdb-port 1234
# Run as part of CI/CD
bloodhound test --compose docker-compose.yml --coverage-threshold 80%
Create a bloodhound.yaml file in your project directory:
# Docker Compose file to test
compose: docker-compose.yml
# Simulation settings
simulation:
  max_time: 5m  # Maximum simulation time
  time_step: 10ms  # Time step granularity
  seeds: 1000  # Number of seeds to explore
# Workload configuration
workload:
  driver: http
  config:
    target: http://lb:80
    qps: 100
    duration: 60s
  operations:
    - type: put
      weight: 40
      key_pattern: "key-{random:1-10000}"
    - type: get
      weight: 50
    - type: delete
      weight: 10
# Properties to verify
properties:
  - name: no-data-loss
    kind: safety
    description: "Acknowledged writes must not be lost"
    check:
      type: linearizability
      operations: [put, get]
  - name: partition-recovery
    kind: liveness
    description: "Cluster must recover within 30s after partition heals"
    timeout: 30s
    check:
      type: http
      endpoint: http://lb:80/health
      expect:
        status: 200
# Fault injection
faults:
  network:
    drop_rate: 0.01
    delay_ms: 50
    delay_rate: 0.05
    partition_probability: 0.001
    partition_duration_ms: [1000, 10000]
  disk:
    write_fail_rate: 0.001
    partial_write_rate: 0.0005
    fsync_fail_rate: 0.001
  process:
    crash_probability: 0.0001
    pause_probability: 0.0005
    oom_probability: 0.00001
# Exploration settings
exploration:
  strategy: coverage-guided  # bfs, dfs, random, coverage-guided
  max_depth: 1000
  max_states: 100000
  parallel_workers: 8
  prioritize_coverage: true
  prioritize_near_violation: true
# Output settings
output:
  dir: ./output
  save_violation_traces: true
  save_interesting_seeds: true
  coverage_format: html
  summary_report: true
# Debugging
debug:
  gdb_enabled: true
  gdb_port: 1234
  trace_level: normal
  trace_events:
    - network_send
    - network_recv
    - disk_write
    - process_crash
Safety properties assert that "bad things never happen." They are checked continuously throughout execution.
- name: no-data-loss
  kind: safety
  check:
    type: linearizability
    operations: [put, get]
Liveness properties assert that "good things eventually happen." They include a timeout.
- name: leader-election
  kind: liveness
  timeout: 5s
  check:
    type: custom
    script: ./checks/has-leader.sh
Invariants are properties that must hold at every state.
- name: single-leader-per-term
  kind: invariant
  check:
    type: custom
    script: ./checks/single-leader.sh
Bloodhound can inject various types of faults deterministically:
- Packet drop: Randomly drop network packets
- Packet delay: Add latency to network communication
- Packet corruption: Corrupt packet data
- Network partitions: Isolate nodes from each other
- Write failures: Fail disk writes
- Partial writes: Simulate torn writes (power failure during write)
- fsync failures: Fail fsync calls
- Read corruption: Return corrupted data on reads
- Crashes: Kill processes (SIGKILL)
- Pauses: Pause processes (SIGSTOP)
- OOM kills: Simulate out-of-memory conditions
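To make "deterministic" fault injection concrete, here is a minimal sketch in Python. It is not Bloodhound's implementation: the `FAULT_RATES` table and the `fault_schedule` function are hypothetical names, and the rates are borrowed from the sample configuration for illustration only. The point it illustrates is that when every fault decision is drawn from a single seeded generator, the same seed reproduces the same fault schedule.

```python
import random

# Hypothetical per-step fault rates, loosely mirroring the `faults` section
# of the sample bloodhound.yaml. The names are illustrative only.
FAULT_RATES = {
    "network_drop": 0.01,
    "disk_write_fail": 0.001,
    "process_crash": 0.0001,
}


def fault_schedule(seed, steps):
    """Return the (step, fault) pairs that fire for a given seed."""
    rng = random.Random(seed)  # private generator: no global or wall-clock state
    schedule = []
    for step in range(steps):
        # Iterate in a fixed order so the random draws line up across runs.
        for fault, rate in sorted(FAULT_RATES.items()):
            if rng.random() < rate:
                schedule.append((step, fault))
    return schedule


first = fault_schedule(seed=42, steps=10_000)
second = fault_schedule(seed=42, steps=10_000)
print(first == second)  # True: same seed, same schedule
print(len(first), "faults injected")
```

Any hidden source of randomness (wall-clock time, unordered iteration, a shared global generator) would break this equality, which is why a deterministic platform has to control all of them.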
When a bug is found, Bloodhound can replay the exact execution:
# Start debugging session
bloodhound debug --seed 42 --gdb-port 1234
# In another terminal, connect with GDB
gdb -ex "target remote :1234"
The debugger supports:
- Step forward/backward: Navigate through execution
- Breakpoints: Set breakpoints that work across time
- Watchpoints: Watch variables change over time
- Reverse execution: Step backward to find bug origins
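One common way to get backward stepping out of a deterministic system is to replay from the start up to the previous step. The sketch below illustrates that idea on a toy execution. It is an assumption about the general technique, not a description of how Bloodhound or its GDB integration works, and `run` and `ReplayDebugger` are hypothetical names.

```python
import random


def run(seed, steps):
    """Deterministically replay a toy execution; return the state after `steps`."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(steps):
        counter += rng.choice([-1, 1, 2])  # stand-in for one simulated event
    return counter


class ReplayDebugger:
    """Hypothetical debugger: 'stepping back' is a replay to an earlier step."""

    def __init__(self, seed):
        self.seed = seed
        self.step = 0

    def state(self):
        return run(self.seed, self.step)

    def step_forward(self):
        self.step += 1
        return self.state()

    def step_backward(self):
        self.step = max(0, self.step - 1)
        return self.state()  # recomputed from the seed, not undone in place


dbg = ReplayDebugger(seed=42)
seen = [dbg.step_forward() for _ in range(5)]
print(dbg.step_backward() == seen[3])  # True: replay lands on the earlier state
```

Replaying from step zero costs time proportional to the step count. Restoring the nearest saved snapshot and replaying from there is the usual way to bound that cost.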
name: Bloodhound Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: datadog/bloodhound-action@v1
        with:
          compose: docker-compose.yml
          seeds: 1000
          timeout: 30m
          coverage-threshold: 80%
See the examples/ directory for complete examples:
- redis-rust-demo/: Bug reproduction demo - Find and reproduce a CRDT consistency bug using async-VM mode. See Demo Documentation.
- redis-rust/: Simple redis-rust cluster example
- distributed-kv/: A distributed key-value store with Raft consensus
- message-queue/: A distributed message queue
- cache-cluster/: A distributed cache with consistent hashing
- Deterministic Hypervisor: Bloodhound uses a modified QEMU with TCG (Tiny Code Generator) mode to ensure deterministic execution. All sources of non-determinism (time, random numbers, I/O ordering) are controlled.
- Virtual Time: Time is virtualized so simulations run faster than real-time while maintaining correct behavior. A 5-minute simulation might complete in seconds.
- Snapshot Tree: Bloodhound maintains a tree of VM snapshots using copy-on-write, enabling efficient exploration of different execution paths from the same starting point.
- Coverage-Guided Exploration: Like fuzzing, Bloodhound prioritizes execution paths that discover new code coverage, efficiently finding edge cases.
- Property Checking: User-defined properties are checked at each step, with violations triggering detailed trace capture.
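The snapshot tree, coverage-guided prioritization, and per-step property checking described above can be sketched together on a toy system. This is a simplified illustration under stated assumptions, not Bloodhound's algorithm: the state is a single integer, "coverage" is the value modulo 7, and `ACTIONS`, `coverage_of`, `invariant`, `explore`, and `trace` are all hypothetical names.

```python
import heapq
import itertools

# Toy system: the whole "VM state" is one integer, and each action is one event.
ACTIONS = {
    "inc": lambda v: v + 1,
    "double": lambda v: v * 2,
    "reset": lambda v: 0,
}


def coverage_of(value):
    """Toy coverage signal: which 'branch' a state exercises."""
    return value % 7


def invariant(value):
    """Toy safety property; the explorer searches for a state that breaks it."""
    return value != 22


def trace(snapshots, snap_id):
    """Walk parent links back to the root to recover the action sequence."""
    actions = []
    while snapshots[snap_id][1] is not None:
        _, parent, action = snapshots[snap_id]
        actions.append(action)
        snap_id = parent
    return list(reversed(actions))


def explore(max_states=1000):
    # Each snapshot is (state, parent_id, action); the parent links form the tree.
    snapshots = {0: (0, None, None)}
    ids = itertools.count(1)
    seen_states = {0}
    covered = {coverage_of(0)}
    frontier = [(0, 0)]  # (priority, snapshot_id); lower priority is popped first
    while frontier and len(snapshots) < max_states:
        _, snap_id = heapq.heappop(frontier)
        state = snapshots[snap_id][0]
        for name, action in ACTIONS.items():
            nxt = action(state)  # "restore" the snapshot, then apply one action
            if nxt in seen_states:
                continue
            seen_states.add(nxt)
            new_id = next(ids)
            snapshots[new_id] = (nxt, snap_id, name)
            if not invariant(nxt):  # property checked at every step
                return trace(snapshots, new_id)
            # States that reach new coverage are explored before the rest.
            is_new = coverage_of(nxt) not in covered
            covered.add(coverage_of(nxt))
            heapq.heappush(frontier, (0 if is_new else 1, new_id))
    return None


print(explore())  # an action sequence that drives the state from 0 to 22
```

Because every snapshot records its parent and the action that produced it, a violation comes with a replayable trace from the initial state. In a real VM the snapshots would be copy-on-write images, not integers.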
| Mode | Status | Description |
|---|---|---|
| Simulation | Working | In-memory deterministic simulation with fault injection |
| Docker | Working | Container-based testing with real Docker Compose stacks |
| Async-VM | Working | Real QEMU VMs with deterministic execution via patched QEMU |
Async-VM mode runs real QEMU virtual machines with deterministic time control and fault injection via the custom bloodhound-ctrl QMP command implemented in the QEMU fork.
Features:
- Deterministic time control (set_time, advance_time, freeze_time)
- Deterministic RNG seeding (set_seed)
- Fault injection (inject_network_fault, inject_disk_fault)
- VM snapshots and restore
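As a rough mental model of the time-control operations, the following Python sketch shows a clock that moves only when told to. The method names echo the operations listed above, but the class, its signatures, and the freeze semantics are assumptions made for illustration. The real operations are issued through the bloodhound-ctrl QMP command, whose wire format is not shown here.

```python
class VirtualClock:
    """Toy guest clock driven only by explicit calls, never by the host clock."""

    def __init__(self):
        self.now_ns = 0
        self.frozen = False

    def set_time(self, ns):
        self.now_ns = ns

    def advance_time(self, delta_ns):
        # Assumed semantics: a frozen clock ignores advances.
        if not self.frozen:
            self.now_ns += delta_ns
        return self.now_ns

    def freeze_time(self, frozen=True):
        self.frozen = frozen


clock = VirtualClock()
clock.set_time(1_000_000_000)      # start at t = 1 s
for _ in range(30_000):            # 5 simulated minutes in 10 ms steps
    clock.advance_time(10_000_000)
print(clock.now_ns / 1e9)          # 301.0
```

The loop covers five simulated minutes without sleeping, which is the sense in which a virtual-time simulation can finish much faster than real time.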
Example:
# Build the patched QEMU (required for async-VM mode)
git clone https://github.com/nerdsane/qemu.git
cd qemu && git checkout bloodhound
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)
cd ../..
# Run async-VM exploration
bloodhound explore \
--compose docker-compose.yml \
--async-vm \
--base-image disk.qcow2 \
--kernel guest/build/bzImage \
--initrd guest/build/initramfs.cpio.gz \
--qemu qemu/build/qemu-system-x86_64 \
--seeds 10 \
--max-depth 5
Other commands:
# Simulation mode (no QEMU required)
bloodhound explore --compose docker-compose.yml --seeds 1000
# Docker-based testing
bloodhound test --compose docker-compose.yml --config bloodhound.yaml
# Determinism verification
bloodhound verify --kernel guest/build/bzImage --initrd guest/build/initramfs.cpio.gz
| Platform | Harness Mode | Real VM Mode | Status |
|---|---|---|---|
| Linux x86_64 | ✅ Full | ✅ Full | Fully Tested |
| macOS Intel | ✅ Full | Untested | Harness mode tested |
| macOS ARM | ✅ Full | Untested | Harness mode tested |
- Full support for all features
- Real VM mode with QEMU/KVM tested and working
- 296 unit tests + 44 integration tests passing
- Harness/simulation mode fully works (all unit tests pass)
- Real VM mode (--async-vm) is untested and requires:
  - QEMU built with HVF (Hypervisor.framework) support
  - Different VM configuration than Linux
- Use bloodhound test --actor-mode for simulation testing
- Linux: KVM support recommended (x86_64)
- macOS: Harness mode works out of the box
- Docker (for container-based tests)
- 8GB+ RAM recommended
- SSD recommended for snapshot storage
For full deterministic async-VM execution, build the patched QEMU:
git clone https://github.com/nerdsane/qemu.git
cd qemu
mkdir build && cd build
../configure --target-list=x86_64-softmmu --enable-bloodhound
make -j$(nproc)
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
MIT License. See LICENSE for details.
- Bug Reproduction Demo - Start here! End-to-end walkthrough finding and reproducing a real bug
- FAQ - Why hypervisor over kernel, SMP limitations, TLB handling, and more
- Known Limitations - Current limitations and workarounds
- Async-VM Mode - Async-VM mode setup and bloodhound-ctrl integration
- Deterministic Kernel - Building the guest kernel
- QEMU Fork - Patched QEMU specification with bloodhound-ctrl QMP
Bloodhound is inspired by:
- Antithesis - Deterministic simulation testing
- FoundationDB Testing - Simulation testing at Apple
- TigerBeetle - Tiger Style and simulation testing
- Jepsen - Distributed systems testing
This is a research project exploring deterministic simulation testing techniques. It is:
- Not production-ready: Core features work but edge cases exist
- Under active development: APIs may change without notice
- Co-authored with AI: Developed collaboratively with Claude (Anthropic)
The goal is to explore what's possible with human-AI collaboration on complex systems programming, not to provide a production-ready tool. Use at your own risk.