fix: prevent cache collisions when URLs return duplicate ETags#1949
fix: prevent cache collisions when URLs return duplicate ETags#1949javacruft wants to merge 5 commits intochainguard-dev:mainfrom
Conversation
Fixes chainguard-dev#1944 where multi-arch builds failed because Alpine Linux returns the same ETag for different signing keys. The cache used only ETags as keys, causing different files to collide. Changes: - Add URL hash to cache key: etag + sha256(url)[:4] - Each unique URL gets unique cache entry regardless of ETag - Add integration test simulating Alpine Linux duplicate ETags - Add concurrency test for multi-arch build scenario - Add backward compatibility test documenting cache format change - Simplify and remove low-value unit tests Test results: - All existing tests pass - Integration test verifies end-to-end cache behavior - Concurrency test validates multi-arch scenario with goroutines
- Replace if-else chain with switch statement (gocritic) - Use fmt.Fprintf instead of []byte(fmt.Sprintf()) (modernize) - Use range over int for nested loops (modernize)
Add a new workflow job and example config that specifically tests the scenario from chainguard-dev#1944 where Alpine Linux returns the same ETag for different architecture-specific signing keys. The new test: - Builds a multi-arch image (x86_64, aarch64) with Alpine repos - Includes alpine-keys package which triggers the ETag collision - Would have failed before the cache collision fix - Prevents future regressions of this issue Files added: - examples/alpine-multiarch-keys.yaml - Minimal config to trigger bug - .github/workflows/build-samples.yml - New job to test multi-arch build
|
|
||
| func cacheFileFromEtag(cacheFile, etag string) (string, error) { | ||
| cacheDir := filepath.Dir(cacheFile) | ||
| ext := ".etag" |
There was a problem hiding this comment.
The fact the the filename is no longer just an etag makes this feel a bit odd - might need a tweak
|
we write out keys inside the built image; thus we might as well store the keys by filename, as we have an existing bug that we cannot use two different keys with the same filename but different full-url / prefix. Unless we want to parse the key and store it by like it's RSA public key properties (or full sha256?) and solve the bug of distinct keys under the same name. |
The previous fix added URL hashes to cache keys to prevent ETag collisions, but broke offline mode. fetchOffline() was picking any file from the cache directory instead of filtering by URL hash. Fix: Filter cache files by expected URL hash suffix to ensure the correct cached file is returned for the specific URL.
Extract URL hash computation into cacheFileHash() to avoid duplication between cacheFileFromEtag() and fetchOffline(). Returns string for direct use in cache filenames.
The short hash is calculated using the full URL so I think this should avoid same filename collisions. |
|
If we structure the cache by URL already, and want to use URL subdirs, is it simply the case we want to use full URLs without using etags as the leaf masquerading for the filename? As in, I think it is more user friendly to simply use the filename from the URL, than truncated sha256 of the very same filename. |
|
I like the etags collision tests - in case alpine stops having key collision. I don't like the sha256 URL suffix. Looks odd. |
Fixes #1944 where multi-arch builds failed because Alpine Linux
returns the same ETag for different signing keys. The cache used
only ETags as keys, causing different files to collide.
Changes:
Test results: