High-performance, thread-safe content-defined chunking (CDC) library for Go using the FastCDC algorithm with Gear hash.
- High Performance: ~1350 MB/s throughput; the fastest of the Go FastCDC implementations in our benchmarks
- Low Allocations: ~4 allocations/op with the convenient API, 0 allocations/op with the advanced API
- Thread-Safe: Per-instance hash tables eliminate data races
- Dual API: Simple streaming API for convenience, zero-allocation API for performance
- Normalized Chunking: Two-phase boundary detection for better chunk distribution
- Pure Go: No external dependencies, works on all platforms
Install:

```bash
go get github.com/kalbasit/fastcdc
```

Quick start with the streaming API:

```go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/kalbasit/fastcdc"
)

func main() {
	file, _ := os.Open("largefile.dat")
	defer file.Close()

	chunker, _ := fastcdc.NewChunker(file, fastcdc.WithTargetSize(64*1024))

	for {
		chunk, err := chunker.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}

		fmt.Printf("Chunk at offset %d: %d bytes, hash=%x\n",
			chunk.Offset, chunk.Length, chunk.Hash)

		// Process chunk.Data (valid until the next Next() call)
		processChunk(chunk.Data)
	}
}
```
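The examples in this README call a `processChunk` helper that is not part of the library; substitute your own logic. A minimal placeholder might look like this:

```go
// processChunk is a placeholder used by the examples in this README;
// replace it with whatever your application does with each chunk
// (hashing, storing, deduplicating, ...).
func processChunk(data []byte) {
	_ = data
}
```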
For performance-critical code where you manage buffers manually:

```go
package main
import (
	"fmt"
	"os"

	"github.com/kalbasit/fastcdc"
)

func main() {
	file, _ := os.Open("largefile.dat")
	defer file.Close()

	core, _ := fastcdc.NewChunkerCore(fastcdc.WithTargetSize(64*1024))
	buf := make([]byte, 1*1024*1024) // 1 MiB buffer

	for {
		n, err := file.Read(buf)
		if n == 0 {
			break
		}

		offset := 0
		for offset < n {
			boundary, hash, found := core.FindBoundary(buf[offset:n])
			if found {
				chunkData := buf[offset : offset+boundary]
				fmt.Printf("Chunk: %d bytes, hash=%x\n", len(chunkData), hash)
				processChunk(chunkData)
				offset += boundary
				core.Reset()
			} else {
				// No boundary in the remaining data: a full implementation would
				// carry buf[offset:n] forward (e.g. copy it to the front of the
				// buffer) and emit it as the final chunk once the input ends.
				break
			}
		}

		if err != nil {
			break
		}
	}
}
```
For concurrent processing with minimal allocations:

```go
package main
import (
	"bytes"
	"io"
	"sync"

	"github.com/kalbasit/fastcdc"
)

func main() {
	pool, _ := fastcdc.NewChunkerPool(fastcdc.WithTargetSize(64*1024))

	var wg sync.WaitGroup
	jobs := make(chan []byte, 100)

	// Worker pool
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for data := range jobs {
				chunker, _ := pool.Get(bytes.NewReader(data))
				for {
					chunk, err := chunker.Next()
					if err == io.EOF {
						break
					}
					if err != nil {
						break
					}
					processChunk(chunk.Data)
				}
				pool.Put(chunker)
			}
		}()
	}

	// Feed jobs
	// ... send data to jobs channel ...
	close(jobs)
	wg.Wait()
}
```
The chunker accepts the following options:

```go
// Size constraints
fastcdc.WithMinSize(16*1024)     // Minimum chunk size (default: 16 KiB)
fastcdc.WithTargetSize(64*1024)  // Target chunk size (default: 64 KiB)
fastcdc.WithMaxSize(256*1024)    // Maximum chunk size (default: 256 KiB)

// Normalization (affects chunk distribution)
fastcdc.WithNormalization(2)     // Level 0-8 (default: 2)
                                 // Higher = more uniform distribution
                                 // Lower = faster processing

// Custom seed (for different chunking patterns)
fastcdc.WithSeed(12345)          // Non-zero seed allocates a per-instance table

// Buffer size (streaming API only)
fastcdc.WithBufferSize(1*1024*1024)  // Default: 1 MiB
```
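As a rough sketch, several options can be combined when constructing a chunker. The values below are illustrative, and `file` stands in for any io.Reader (see the streaming example above):

```go
// Fragment: combining several options. Values are examples, not recommendations.
chunker, err := fastcdc.NewChunker(file,
	fastcdc.WithMinSize(16*1024),
	fastcdc.WithTargetSize(64*1024),
	fastcdc.WithMaxSize(256*1024),
	fastcdc.WithNormalization(2),
)
if err != nil {
	panic(err) // e.g. inconsistent size constraints (assumption)
}
_ = chunker
```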
Benchmarked on 10 MiB random data with a 64 KiB target chunk size:

| API | Throughput | Allocations | Use Case |
|---|---|---|---|
| `FindBoundary()` | ~1350 MB/s | 0 allocs/op | Performance-critical code |
| `Next()` | ~1200 MB/s | ~4 allocs/op | General-purpose streaming |
| Pool | ~1250 MB/s | ~1 alloc/op | High-throughput concurrent processing |
Benchmarked on Apple M4, 10 MiB random data, 64 KiB target chunk size:
| Library | Throughput | Allocations | Bytes/op | Algorithm |
|---|---|---|---|---|
| fastcdc (FindBoundary) | 1344.45 MB/s | 0 allocs/op | 0 B | Gear hash |
| fastcdc (Next) | 1200.41 MB/s | 4 allocs/op | 514 KiB | Gear hash |
| fastcdc (Next, no norm) | 1304.57 MB/s | 4 allocs/op | 514 KiB | Gear hash |
| fastcdc (Pool) | 1230.12 MB/s | 1 allocs/op | 1 KiB | Gear hash |
| jotfs/fastcdc-go | 1180.75 MB/s | 3 allocs/op | 512 KiB | Gear hash |
| buildbuddy-io/fastcdc-go | 1160.49 MB/s | 3 allocs/op | 512 KiB | Gear hash |
| restic/chunker | 434.04 MB/s | 31 allocs/op | 25533 KiB | Rabin fingerprint |
Note: The zero-allocation `FindBoundary()` API delivers the highest throughput; the streaming `Next()` API trades a few allocations per operation for convenience.
To run the comparison benchmark yourself:
```bash
cd benchmarks
./run_all.sh
```

This will benchmark all libraries with identical test data and configurations, displaying:
- Throughput (MB/s)
- Allocations per operation
- Bytes allocated per operation
- Algorithm used
The new benchmark structure isolates each library in its own directory with independent dependencies, making it easy to add or update library versions. Raw results are saved to benchmarks/all_results.txt.
Use the streaming `Next()` API when:

- You want a simple, easy-to-use streaming API
- You're processing files or streams sequentially
- A few allocations per operation are acceptable
- You want automatic buffer management
Use the zero-allocation `FindBoundary()` API when:

- Every allocation counts (high-frequency processing)
- You're willing to manage buffers manually
- You need maximum performance
- You're integrating with existing buffer pools
Use the `ChunkerPool` when:

- Processing many files concurrently
- You want to amortize allocations across operations
- You have high throughput requirements
- You're building a server or batch processor
FastCDC uses Gear hash instead of Rabin fingerprinting, making the rolling hash roughly 3x faster:

- Gear hash: 3 operations/byte (SHIFT, ADD, LOOKUP)
- Rabin: 7 operations/byte (OR, 2×XOR, 2×SHIFT, 2×LOOKUP)

Rolling hash update: `fingerprint = (fingerprint << 1) + table[current_byte]`
Two-phase boundary detection for better chunk distribution (a sketch follows this list):

- Skip phase [0, minSize): Fast-forward without checking
- Normalized phase [minSize, normSize): Check with a larger mask (harder to match)
- Standard phase [normSize, maxSize): Check with a smaller mask (easier to match)
- Hard limit: Force a cut at maxSize

This prevents excessive tiny chunks while maintaining a good size distribution.
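Below is a minimal, illustrative sketch of this two-phase search. It is not the library's internal code: the gear table contents, mask widths, and size parameters are placeholder assumptions.

```go
// Illustrative sketch of normalized (two-phase) cut-point detection.
package main

import "fmt"

// cutPoint returns the length of the next chunk within data.
func cutPoint(data []byte, table *[256]uint64, minSize, normSize, maxSize int, maskHard, maskEasy uint64) int {
	n := len(data)
	if n <= minSize {
		return n
	}
	if n > maxSize {
		n = maxSize
	}
	var fp uint64
	i := minSize // skip phase: never cut before minSize
	for ; i < normSize && i < n; i++ {
		fp = (fp << 1) + table[data[i]] // SHIFT, ADD, LOOKUP: 3 ops/byte
		if fp&maskHard == 0 {           // more mask bits: boundaries are rare here
			return i + 1
		}
	}
	for ; i < n; i++ {
		fp = (fp << 1) + table[data[i]]
		if fp&maskEasy == 0 { // fewer mask bits: boundaries become likely
			return i + 1
		}
	}
	return n // hard limit: forced cut at maxSize (or end of input)
}

func main() {
	var table [256]uint64
	for i := range table {
		table[i] = uint64(i)*0x9E3779B97F4A7C15 + 1 // toy values; real tables are pseudo-random
	}
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i * 31)
	}
	fmt.Println(cutPoint(data, &table, 16<<10, 64<<10, 256<<10, (1<<17)-1, (1<<15)-1))
}
```

The stricter mask below normSize keeps boundaries rare until the chunk approaches the target size, which is what suppresses tiny chunks.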
Each chunker instance has its own hash table, eliminating data races:
- Zero seed: Uses compile-time constant (no allocation, thread-safe)
- Custom seed: Allocates per-instance table (2 KiB, thread-safe)
- No global shared state
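For illustration, here is a sketch of two goroutines chunking different inputs concurrently, each with its own chunker and a custom seed. The file names and seed value are made up; the point is that each instance owns its state, so no synchronization is needed.

```go
// Sketch: independent chunkers in concurrent goroutines.
package main

import (
	"io"
	"os"
	"sync"

	"github.com/kalbasit/fastcdc"
)

func main() {
	var wg sync.WaitGroup
	for _, name := range []string{"a.dat", "b.dat"} { // illustrative file names
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			f, err := os.Open(name)
			if err != nil {
				return
			}
			defer f.Close()

			// Each goroutine gets its own chunker; a custom seed allocates a
			// private gear table, so there is no shared mutable state.
			chunker, err := fastcdc.NewChunker(f, fastcdc.WithSeed(12345))
			if err != nil {
				return
			}
			for {
				chunk, err := chunker.Next()
				if err == io.EOF {
					break
				}
				if err != nil {
					return
				}
				_ = chunk // process chunk.Data here
			}
		}(name)
	}
	wg.Wait()
}
```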
To run the tests and benchmarks:

```bash
# Run tests
go test -v
# Run tests with race detector
go test -race
# Run internal benchmarks
cd benchmarks
go test -bench=. -benchmem -benchtime=3s
# Run comparison benchmarks against other libraries
cd benchmarks
./run_comparison.sh
# Or for detailed analysis:
python3 analyze_benchmarks.py
# Test distribution
go test -v -run=TestChunkerDistribution
```

We provide two APIs to balance convenience with performance:
- Primary `Next()` API: most users want a simple streaming API. We keep allocations low (~4 allocs/op when streaming, ~1 with the pool, versus 3 for jotfs/fastcdc-go and 31 for restic/chunker in our benchmarks) while keeping the API clean.
- Advanced `FindBoundary()` API: performance-critical code can drop to zero allocations by managing buffers manually.
This gives users the best of both worlds: simple API for most use cases, with a zero-allocation escape hatch when needed.
Gear hash is 3x faster than Rabin fingerprinting with similar chunking quality. The performance difference comes from:
- Simpler operations (no XOR, no multi-table lookups)
- Better CPU pipeline utilization
- Smaller lookup table (256 entries vs 512+ for Rabin)
Thread-safety without locks:
- Global table: Fast but causes data races with custom seeds
- Mutex-protected table: Thread-safe but slow
- Per-instance table: Thread-safe and fast (our choice)
The 2 KiB overhead per instance is negligible for most use cases.
Contributions welcome! Please ensure:

- Tests pass: `go test -race ./...`
- Benchmarks don't regress: `cd benchmarks && go test -bench=.`
- Code is formatted: `gofmt -s -w .`
MIT License - see LICENSE file for details.
References:

- FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication (USENIX ATC 2016)
- Gear: A Simple and Efficient Rolling Hash
Inspired by:
- jotfs/fastcdc-go - Excellent Gear hash implementation
- restic/chunker - Battle-tested Rabin chunker