Add cache stripe lock contention metric #12839

bryancall · 2026-01-29T15:41:39Z

Summary

Adds two new metrics to track cache lock contention:

proxy.process.cache.stripe.lock_contention - counts stripe mutex contention
proxy.process.cache.writer.lock_contention - counts writer VC mutex contention during read aggregation

The stripe metric is also available per volume as proxy.process.cache.volume_N.stripe.lock_contention.

Background

When ATS is configured with more threads than cache volumes, threads contend heavily for the stripe mutex, causing throughput degradation. These metrics make contention visible so operators can tune their configuration.

Benchmark Results

Testing on a 16-core system with 100 cached URLs:

| Threads | Volumes | Throughput | Contentions/s |
|---------|---------|------------|---------------|
| 16 | 1 | 476k req/s | 12,095k |
| 16 | 16 | 1,160k req/s | 177k |
| 24 | 32 | 1,260k req/s | 161k |

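With only 1 volume, 16 threads are slower than 4 threads because of stripe mutex contention (the 4-thread run is not listed in the table). Adding volumes relieves the bottleneck: at 16 threads, going from 1 to 16 volumes cuts contentions by about 68x and raises throughput by about 2.4x.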

Usage

# Stripe lock contention (global)
traffic_ctl metric get proxy.process.cache.stripe.lock_contention

# Stripe lock contention (per-volume)
traffic_ctl metric match 'volume.*stripe.lock_contention'

# Writer lock contention
traffic_ctl metric get proxy.process.cache.writer.lock_contention

Implementation

Stripe Lock Contention Call Sites (`VC_SCHED_LOCK_RETRY` / `VC_LOCK_RETRY_EVENT`)

All for stripe->mutex:

CacheRead.cc:

L210 - openReadClose
L428 - openReadReadDone
L456 - openReadReadDone
L653 - openReadMain
L705 - openReadMain
L766 - openReadStartEarliest
L932 - openReadVecWrite
L988 - openReadStartHead
L1210 - openReadDirDelete

CacheVC.cc:

L355 - openReadClose
L553 - die
L938 - scanOpenWrite

CacheWrite.cc:

L78 - handleWriteLock
L84 - handleWriteLock
L278 - openWriteCloseDir
L331 - openWriteCloseHeadDone
L410 - openWriteCloseDataDone
L504 - openWriteWriteDone
L648 - openWriteOverwrite
L681 - openWriteOverwrite
L794 - openWriteMain

Writer Lock Contention Call Site (`VC_SCHED_WRITER_LOCK_RETRY`)

For write_vc->mutex (not stripe):

CacheRead.cc:

L278 - openReadFromWriter (read aggregation)

Files Changed

P_CacheStats.h: Add stripe_lock_contention and writer_lock_contention counters
CacheProcessor.cc: Register both metrics
P_CacheInternal.h: Add metric increments to retry macros, add VC_SCHED_WRITER_LOCK_RETRY()
CacheRead.cc: Use VC_SCHED_WRITER_LOCK_RETRY() for writer mutex case

Adds proxy.process.cache.stripe.lock_contention counter that increments each time a thread fails to acquire the stripe mutex. This helps identify cache lock contention issues when tuning thread counts vs volume counts. Also available per-volume as proxy.process.cache.volume_N.stripe.lock_contention


bryancall added 3 commits January 29, 2026 07:41

Fix stripe lock contention metric accuracy

4bde886

Add VC_SCHED_LOCK_RETRY_NO_METRIC() macro for lock retries that are not for stripe->mutex (e.g., write_vc->mutex in read aggregation). This ensures the stripe_lock_contention metric only counts actual stripe mutex contention.

Add writer lock contention metric

f488fe8

Add proxy.process.cache.writer.lock_contention to track contention on the writer VC mutex during read aggregation (separate from stripe mutex).
