A high-performance, persistent key-value store inspired by Bitcask. KVStash provides a simple HTTP API for storing and retrieving data with strong durability guarantees, automatic compaction, and data corruption detection.
- Features
- Quick Start
- API Reference
- Architecture
- Design Decisions
- Load Testing
- Future Enhancements
- License
- Append-only log design - Simple, fast writes with strong durability
- In-memory index - O(1) lookups without scanning disk
- Tombstone-based deletion - Delete keys with persistent tombstone records
- Automatic log rotation - Prevents unbounded file growth
- Automatic compaction - Periodic garbage collection reclaims disk space
- Dual checksum validation - SHA-256 checksums for both metadata and data
- Thread-safe operations - Concurrent reads, writes, and deletes supported
- Corruption detection - Automatic detection and handling of corrupted data
- Graceful degradation - Tolerates corruption in active log during crash recovery
- Crash recovery - Automatic backup and recovery mechanisms
cd src
go build -o kvstash
./kvstash
The server will start on http://localhost:8080.
Edit src/constants/metadata.go and src/constants/segment.go:
// Database configuration
DBPath = "../db" // Database directory
MaxKeySize = 256 // Maximum key size (bytes)
MaxValueSize = 1048576 // Maximum value size (1 MB)
MaxKeysPerSegment = 3 // Writes per segment before rotation
CompactionInterval = 60 // Compaction interval (seconds)
Endpoint: POST /kvstash
Request:
{
"key": "username",
"value": "john_doe"
}
Response (201 Created):
{
"success": true,
"message": "",
"data": null
}
Error Responses:
- 400 Bad Request - Empty key, key/value too large, or invalid JSON
- 500 Internal Server Error - Write failure
Endpoint: GET /kvstash
Request:
{
"key": "username"
}
Response (200 OK):
{
"success": true,
"message": "",
"data": {
"key": "username",
"value": "john_doe"
}
}
Error Responses:
- 404 Not Found - Key doesn't exist
- 500 Internal Server Error - Read failure or data corruption
Endpoint: DELETE /kvstash
Request:
{
"key": "username"
}
Response (200 OK):
{
"success": true,
"message": "",
"data": null
}
Error Responses:
- 400 Bad Request - Empty key or key too large
- 404 Not Found - Key doesn't exist
- 500 Internal Server Error - Delete failure
How Deletion Works (Soft Delete), with a sketch after this list:
- Writes a tombstone record to the append-only log with the FlagDeleted marker
- Marks the key in the in-memory index with Deleted=true (soft delete)
- Key remains in index pointing to tombstone location (ensures compaction works even when all keys deleted)
- GET operations return 404 Not Found for soft-deleted keys
- Tombstone is replayed during recovery to restore the Deleted=true state
- During compaction, soft-deleted entries are skipped and not copied to the new store
- Physical disk space is reclaimed when old segments are removed during compaction
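To make the flow concrete, here is a minimal Go sketch of the delete path. The names (Store, IndexEntry, appendRecord, activeSegment, ErrKeyNotFound) are assumptions and may not match the actual source:

```go
// Sketch only: illustrates the soft-delete flow described above.
// Store, IndexEntry, appendRecord, activeSegment and ErrKeyNotFound are assumed names.
func (s *Store) Delete(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	entry, ok := s.index[key]
	if !ok || entry.Deleted {
		return ErrKeyNotFound // surfaces as 404 Not Found at the HTTP layer
	}

	// Write a tombstone record (value carries only the key) to the active log.
	tombstone := []byte(`{"key":"` + key + `","value":""}`)
	offset, size, err := s.appendRecord(key, tombstone, FlagDeleted)
	if err != nil {
		return err
	}

	// Soft delete: keep the entry in the index, but point it at the tombstone.
	entry.SegmentFile = s.activeSegment
	entry.Offset = offset
	entry.Size = size
	entry.Deleted = true
	return nil
}
```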
# Set a value
curl -X POST http://localhost:8080/kvstash \
-H "Content-Type: application/json" \
-d '{"key":"user:1","value":"Alice"}'
# Get a value
curl -X GET http://localhost:8080/kvstash \
-H "Content-Type: application/json" \
-d '{"key":"user:1"}'
# Update a value (same as set)
curl -X POST http://localhost:8080/kvstash \
-H "Content-Type: application/json" \
-d '{"key":"user:1","value":"Bob"}'
# Delete a key
curl -X DELETE http://localhost:8080/kvstash \
-H "Content-Type: application/json" \
-d '{"key":"user:1"}'KVStash uses an append-only log with the following structure:
[Metadata 120 bytes][Value N bytes][Metadata 120 bytes][Value M bytes]...
Metadata Structure (120 bytes):
- Offset (8 bytes) - Byte position of value data
- Size (8 bytes) - Length of value data
- Flags (8 bytes) - Operation flags (bit 0 = deleted/tombstone)
- SegmentFile (32 bytes) - Name of containing file
- Checksum (32 bytes) - SHA-256 of value data
- MChecksum (32 bytes) - SHA-256 of metadata
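As a Go sketch, the fixed-size header described above could be declared as follows (8 + 8 + 8 + 32 + 32 + 32 = 120 bytes). Field names follow the list, but the actual declaration and serialization in the source may differ:

```go
// Sketch of the on-disk metadata header; the real layout may differ.
type Metadata struct {
	Offset      uint64   // byte position of the value data within the segment
	Size        uint64   // length of the value data in bytes
	Flags       uint64   // operation flags; bit 0 marks a tombstone
	SegmentFile [32]byte // name of the segment file containing the value
	Checksum    [32]byte // SHA-256 over the value data and its location
	MChecksum   [32]byte // SHA-256 over the metadata fields above
}
```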
KVStash uses a soft-delete approach where deleted keys remain in the index but are marked as deleted.
Tombstone Structure:
- Metadata with FlagDeleted (bit 0 set)
- Value contains only the key: {"key":"username","value":""}
- Size reflects the JSON-encoded key size (~15-20 bytes)
Soft Delete Flow:
- DELETE request received for key "foo"
- Tombstone written to active log with FlagDeleted=true
- Index entry updated with Deleted=true (soft delete, entry stays in index)
- GET operations check the Deleted flag and return 404 if true
- On server restart: log replay processes tombstone and creates entry with Deleted=true
- During compaction: entries with Deleted=true are skipped (not copied)
- After compaction: old segments with tombstones are removed (space reclaimed)
Why Soft Delete:
- Ensures compaction works correctly even when all keys are deleted
- Allows tracking of disk space used by tombstones
- Maintains consistency between runtime and post-restart states
- Simplifies corruption handling (mark corrupted entries as deleted)
Example Log After Delete:
[Metadata][{"key":"foo","value":"bar"}] ← Original SET
[Metadata][{"key":"foo","value":"baz"}] ← UPDATE
[Metadata+FlagDeleted][{"key":"foo","value":""}] ← DELETE (tombstone)
When the active log reaches MaxKeysPerSegment writes:
- Current active log is closed (becomes an archived segment)
- New segment file is created (e.g., seg0.log → seg1.log → seg2.log)
- activeLogCount resets to 0
- Writes continue to the new active log
Segment naming: seg0.log, seg1.log, seg2.log, etc. (0-indexed)
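A minimal sketch of this rotation check, assuming hypothetical fields on the store (activeLog, activeLogCount, activeSegmentID); only the counter-and-rename flow above is illustrated:

```go
// Sketch of log rotation; field and helper names are assumptions.
func (s *Store) maybeRotate() error {
	if s.activeLogCount < MaxKeysPerSegment {
		return nil
	}
	// Close the current active log; it becomes an archived segment.
	if err := s.activeLog.Close(); err != nil {
		return err
	}
	// Open the next 0-indexed segment, e.g. seg0.log -> seg1.log.
	s.activeSegmentID++
	name := fmt.Sprintf("seg%d.log", s.activeSegmentID)
	f, err := os.OpenFile(filepath.Join(DBPath, name), os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	s.activeLog = f
	s.activeSegment = name
	s.activeLogCount = 0 // writes continue in the new active log
	return nil
}
```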
In-memory hash map for O(1) lookups with soft-delete support:
map[string]*IndexEntry{
    "active_key": {
        SegmentFile: "seg2.log",
        Offset:      1000,
        Size:        256,
        Checksum:    [32]byte{...},
        Deleted:     false, // Live key
    },
    "deleted_key": {
        SegmentFile: "seg3.log",
        Offset:      2000,
        Size:        18,
        Checksum:    [32]byte{...},
        Deleted:     true, // Soft-deleted (tombstone)
    },
}
Key Points:
- Deleted entries remain in the index with Deleted=true
- GET operations check the Deleted flag and return 404 if true (sketched below)
- Compaction skips entries with Deleted=true to reclaim space
- This ensures compaction works even when all keys are deleted
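A sketch of the corresponding read path, assuming the same hypothetical Store and IndexEntry types; valueChecksum is the dual-checksum helper sketched further down under the checksum section:

```go
// Sketch of the read path: consult the index, honour the Deleted flag, then
// read and verify the value. Store, IndexEntry, ErrKeyNotFound, ErrCorruptData
// and valueChecksum are assumed names.
func (s *Store) Get(key string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()

	entry, ok := s.index[key]
	if !ok || entry.Deleted {
		return nil, ErrKeyNotFound // soft-deleted keys look like missing keys (HTTP 404)
	}

	f, err := os.Open(filepath.Join(DBPath, entry.SegmentFile))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	buf := make([]byte, entry.Size)
	if _, err := f.ReadAt(buf, int64(entry.Offset)); err != nil {
		return nil, err
	}

	// Validate the value checksum on every read; a mismatch means corruption.
	if valueChecksum(entry.Offset, entry.Size, entry.SegmentFile, buf) != entry.Checksum {
		return nil, ErrCorruptData
	}
	return buf, nil
}
```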
Dual Checksum System:
- Value checksum = SHA-256(offset || size || fileName || data)
- Metadata checksum = SHA-256(offset || size || fileName || valueChecksum)
On Write:
- Both checksums computed and stored with metadata
- Offset rollback on partial write failures
On Read:
- Metadata checksum validated during index building
- Value checksum validated on every read operation
- Corrupted entries automatically purged from index
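One way the two checksums could be computed, following the formulas above; the field order matches the description, but the exact byte encoding (little-endian here) is an assumption:

```go
import (
	"crypto/sha256"
	"encoding/binary"
)

// valueChecksum sketches SHA-256(offset || size || fileName || data).
func valueChecksum(offset, size uint64, fileName string, data []byte) [32]byte {
	h := sha256.New()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], offset)
	h.Write(buf[:])
	binary.LittleEndian.PutUint64(buf[:], size)
	h.Write(buf[:])
	h.Write([]byte(fileName))
	h.Write(data)
	var sum [32]byte
	copy(sum[:], h.Sum(nil))
	return sum
}

// metadataChecksum sketches SHA-256(offset || size || fileName || valueChecksum).
func metadataChecksum(offset, size uint64, fileName string, valueSum [32]byte) [32]byte {
	h := sha256.New()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], offset)
	h.Write(buf[:])
	binary.LittleEndian.PutUint64(buf[:], size)
	h.Write(buf[:])
	h.Write([]byte(fileName))
	h.Write(valueSum[:])
	var sum [32]byte
	copy(sum[:], h.Sum(nil))
	return sum
}
```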
Active Log Corruption:
- Tolerates corruption in the active log (expected during crashes)
- Reads all valid entries before corruption point
- Logs error but continues startup
- Allows graceful degradation
Archived Segment Corruption:
- Fails fast on corruption in archived segments (unexpected)
- Clears entire index and returns error
- Prevents serving potentially incorrect data
- Requires manual intervention
KVStash implements periodic compaction to reclaim disk space from old/updated values.
Compaction Process (see the sketch after this list):
- Backup Creation - Database copied to BackupDBPath before any modifications
- New Store Creation - Temporary store created at TmpDBPath
- Data Copy - Only current (non-deleted, non-stale) key-value pairs copied to new store
- Atomic Swap - Old database replaced with compacted version
- Cleanup - Backup removed on success, restored on failure
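A minimal sketch of this cycle, assuming hypothetical helpers (copyDir, openStore, readValue, swapWith, restoreFromBackup) and abbreviated error handling:

```go
// Sketch of the compaction cycle described above; names are assumptions.
func (s *Store) compact() error {
	s.mu.Lock()
	defer s.mu.Unlock() // global lock held for the whole cycle

	// 1. Backup: copy the current database aside before touching anything.
	if err := copyDir(DBPath, BackupDBPath); err != nil {
		return err // skip this cycle, retry on the next interval
	}

	// 2. New store: a temporary store receives only live data.
	tmp, err := openStore(TmpDBPath)
	if err != nil {
		return err
	}

	// 3. Data copy: only current, non-deleted entries are carried over.
	for key, entry := range s.index {
		if entry.Deleted {
			continue // tombstones and stale data are dropped here
		}
		value, err := s.readValue(entry)
		if err != nil {
			tmp.Close()
			return err
		}
		if err := tmp.Set(key, value); err != nil {
			tmp.Close()
			return err
		}
	}

	// 4. Atomic swap; 5. Cleanup: drop the backup on success, restore on failure.
	if err := s.swapWith(tmp); err != nil {
		if rerr := s.restoreFromBackup(); rerr != nil {
			panic(rerr) // unrecoverable: backup restoration failed
		}
		return err
	}
	return os.RemoveAll(BackupDBPath)
}
```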
What Gets Removed:
- Old values for updated keys
- Tombstones for deleted keys
- Deleted key data (keys not in current index)
Lock Strategy:
- Global store lock held during entire compaction cycle
- Ensures data consistency during database swap
- Trade-off: All Get/Set operations blocked during compaction
- Compaction duration depends on number of active keys
Timing:
- Runs automatically every CompactionInterval seconds (default: 60s), as in the goroutine sketch below
- Only spawned for the main database (not temporary stores)
- Single compaction goroutine per store instance
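A sketch of what this background loop could look like; the stop channel and method names are assumptions, and only the "autoCompact: done" log line is taken from the actual output listed below:

```go
// Sketch of the periodic compaction goroutine; one runs per main store.
func (s *Store) startAutoCompact() {
	go func() {
		ticker := time.NewTicker(time.Duration(CompactionInterval) * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				if err := s.compact(); err != nil {
					log.Printf("autoCompact: %v", err) // skip this cycle, retry next tick
					continue
				}
				log.Println("autoCompact: done")
			case <-s.stop:
				return
			}
		}
	}()
}
```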
Disk Space Savings:
- Eliminates old values for updated keys
- Removes tombstones and deleted key data
- Removes stale entries from rotated segments
- Defragments data across fewer segment files
- Example: 1000 writes to same key → compacted to 1 entry
- Example: Deleted key with tombstone → completely removed
Recovery Mechanisms:
Disaster Recovery (On Startup):
Database missing but backup exists → Restore from backup → Delete backup
- Handles crash during compaction after backup created
- Automatic recovery, no manual intervention needed
- Panics if recovery fails (database unrecoverable)
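A sketch of the startup check implied above, assuming a hypothetical restoreDir helper and the BackupDBPath/DBPath constants:

```go
// Sketch of disaster recovery on startup; helper names are assumptions.
func recoverFromBackupIfNeeded() {
	_, dbErr := os.Stat(DBPath)
	_, bakErr := os.Stat(BackupDBPath)

	// Database missing but backup exists: a crash happened mid-compaction
	// after the backup was created, so restore the backup and discard it.
	if os.IsNotExist(dbErr) && bakErr == nil {
		if err := restoreDir(BackupDBPath, DBPath); err != nil {
			panic("disaster recovery failed: " + err.Error()) // unrecoverable
		}
		if err := os.RemoveAll(BackupDBPath); err != nil {
			panic("failed to remove backup after recovery: " + err.Error())
		}
	}
}
```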
Compaction Failure Recovery:
Compaction fails → Close new store → Restore from backup → Recreate writer
- Backup restored if compaction swap fails
- Database returned to pre-compaction state
- Panics if backup restoration fails
Error Handling:
- Backup creation failure: Skip cycle, retry next interval
- New store creation failure: Skip cycle, retry next interval
- Data copy failure: Clean up resources, skip cycle
- Database swap failure: Restore from backup
- Recovery failure: Panic (unrecoverable state)
Performance Impact:
- During compaction: Latency spike visible in P95/P99 metrics
- After compaction: Improved read/write performance due to fewer segment files
- Throughput: Temporary drop during lock acquisition
- Disk I/O: Spike during backup creation and data copy
Monitoring Compaction:
Watch server logs for compaction messages:
autoCompact: done # Successful compaction completed
autoCompact: backup failed: <error> # Backup failed, skipping
autoCompact: creating new store failed: <error> # Store creation failed
autoCompact: failed to fetch <key>: <error> # Data copy failed
autoCompact: failed to rename tmp db: <error> # Swap failed, recovering
autoCompact: skipping store replacement # Cleanup after failure
Success Indicator:
- Successful compaction logs autoCompact: done after completion
- Deleted entries are removed from disk
- Index is updated with the compacted store's metadata
- Simplicity - No in-place updates, no fragmentation
- Crash safety - Partial writes don't corrupt existing data
- Fast writes - Sequential I/O is faster than random
- Easy recovery - Replay log to rebuild index
- Fast lookups - O(1) without disk seeks
- Small footprint - Only metadata, not actual values
- Quick startup - Index rebuilt by scanning logs once
- Bounded file size - Prevents single file from growing too large
- Easier compaction - Can merge/compact old segments separately
- Better performance - Smaller files for OS to manage
- Metadata protection - Detect corruption during index building
- Value protection - Detect corruption during reads
- Fail fast - Identify corruption early rather than serving bad data
- Append-only compatibility - Deletions fit naturally into the log structure
- Crash safety - Tombstones are replayed on recovery to restore delete operations
- Compaction reliability - Soft-deleted entries ensure compaction works even when all keys deleted
- Space tracking - Index tracks disk space used by tombstones for monitoring
- Corruption handling - Corrupted entries can be marked as deleted and cleaned up during compaction
- Consistency - Same behavior before and after restart (deleted keys always have Deleted=true)
- Eventual cleanup - Deleted entries physically removed during next compaction cycle
- Compaction blocks operations - Global lock during compaction blocks all reads/writes/deletes
- Memory overhead - Entire index must fit in RAM (including soft-deleted entries until compaction)
- Single server - No replication or clustering (yet)
- Windows file handles - Requires delays for directory operations on Windows
- Tombstone overhead - Deleted keys occupy both disk space and memory until next compaction cycle
Advantages:
- ✅ Automatic disk space reclamation
- ✅ Backup/recovery mechanisms built-in
- ✅ No external compaction tools needed
- ✅ Guaranteed data consistency during swap
Disadvantages:
- ❌ All operations blocked during compaction (global lock)
- ❌ Periodic latency spikes every CompactionInterval seconds
- ❌ Requires 2x disk space (original + backup)
- ❌ Compaction time increases with number of keys
When Compaction Hurts Performance:
- High-frequency read/write workloads
- Latency-sensitive applications (SLA < 100ms)
- Large datasets (millions of keys)
- Low CompactionInterval values
Optimization Strategies:
- Increase CompactionInterval to reduce frequency (example below)
- Increase MaxKeysPerSegment to reduce segment count
- Run compaction during low-traffic windows
- Consider lock-free compaction (future enhancement)
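For example, a write-heavy deployment might tune the constants in src/constants/ roughly as follows; these are illustrative values only, and the right numbers depend on key count, value sizes, and traffic pattern:

```go
// Illustrative tuning for a write-heavy, latency-sensitive workload.
MaxKeysPerSegment  = 10000 // fewer rotations, fewer segment files
CompactionInterval = 900   // compact every 15 minutes instead of every 60s
```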
- Lock-free compaction (background incremental compaction)
- Range queries
- Point-in-time snapshots
- Replication and clustering
- Compression
- Bloom filters for faster negative lookups
- Metrics endpoint (Prometheus format)
- Optimized tombstone handling (batch deletion during compaction)
This is a personal learning project.