refactor: Backpressure system lifecycle enforcement and test coverage #488

hongquanli · 2026-02-01T08:17:31Z

Summary

Add lifecycle tracking with is_closed property and _warn_if_closed() helper
Use time.monotonic() instead of time.time() for timeout calculations
Add __all__ export list for explicit public API
Improve docstrings for properties (document None after close)
Make get_stats() atomic with nested locks
Simplify ownership model using BackpressureValues tuple instead of wrapper class
Add create_backpressure_values() factory function for pre-warming
Fix TOCTOU races with capture-reference pattern throughout
Convert print() statements to proper logging in multi_point_worker.py

Test Coverage (12 new tests)

Operations on closed controller (all methods return safe defaults)
Constructor with bp_values parameter (pre-warming path)
reset() warning when jobs pending
close() wakes blocked wait_for_capacity() threads
Concurrent close() thread safety

Test plan

All 26 BackpressureController tests pass
Manual testing in simulation mode - acquisition finishes promptly

🤖 Generated with Claude Code

…test coverage Backpressure Controller: - Add lifecycle tracking with is_closed property and _warn_if_closed() helper - Use time.monotonic() instead of time.time() for timeout calculations - Add __all__ export list for explicit public API - Improve docstrings for properties (document None after close) - Add lock ordering comments and clear/wait pattern explanation - Make get_stats() atomic with nested locks Code Quality: - Simplify ownership model by using BackpressureValues tuple instead of wrapper class - Add create_backpressure_values() factory function for pre-warming - Fix TOCTOU races with capture-reference pattern throughout - Convert print() statements to proper logging in multi_point_worker.py Test Coverage (12 new tests): - Operations on closed controller (all methods return safe defaults) - Constructor with bp_values parameter (pre-warming path) - reset() warning when jobs pending - close() wakes blocked wait_for_capacity() threads - Concurrent close() thread safety Tools: - Add measure_subprocess_startup.py for JobRunner startup time measurement Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR refactors the acquisition backpressure system to make its lifecycle safer and better defined, introduces support for pre-warmed JobRunner subprocesses that share backpressure state with controllers, and adds targeted tests and tooling to validate and measure these behaviours.

Changes:

Introduces a BackpressureValues tuple plus create_backpressure_values() factory, adds explicit lifecycle tracking (is_closed) and closed-safe behaviour to BackpressureController, and switches throttling timeouts to time.monotonic().
Extends JobRunner with readiness signalling (_ready_event, wait_ready, is_ready), more robust shutdown semantics, and safer Zarr writer cleanup, and wires this into MultiPointWorker/MultiPointController to optionally use a pre-warmed runner that shares backpressure primitives with the worker.
Adds a CLI tool to measure JobRunner subprocess startup time and 12 new backpressure tests to cover closed-controller semantics, pre-created backpressure values, logging on unsafe reset, and thread-safety/notification behaviour.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`software/control/core/backpressure.py`	Refactors `BackpressureController` to use a shared `BackpressureValues` tuple with an explicit lifecycle (`is_closed`), closed-safe accessors, atomic `get_stats()`, and monotonic timeout handling, plus adds a `create_backpressure_values()` factory and explicit public API via `__all__`.
`software/tests/control/core/test_backpressure.py`	Expands test coverage for the updated backpressure lifecycle, including operations on closed controllers, use of externally supplied `bp_values`, logging on unsafe `reset()`, thread-safe `close()`, and wakeup behaviour for blocked `wait_for_capacity()`.
`software/control/core/job_processing.py`	Adds `_ready_event` to `JobRunner` with `wait_ready()`/`is_ready()`, exposes setters for acquisition and Zarr writer info for use with pre-warmed runners, improves `shutdown()` with a queue sentinel, and makes `ZarrWriter.clear_writers()` robust via `try/finally` dictionary clearing.
`software/control/core/multi_point_worker.py`	Integrates the new `BackpressureValues` ownership model, optionally reuses a pre-warmed `JobRunner` and its shared primitives for the first job class, refactors job-runner shutdown to be non-blocking with round-robin pending checks, and replaces `print()`s with structured logging.
`software/control/core/multi_point_controller.py`	Pre-warms a `JobRunner` subprocess and matching `BackpressureValues` at controller initialization, exposes them via `get_prewarmed_job_runner()` for single-use handoff to `MultiPointWorker`, cleans them up safely on errors and controller close, and moves memory monitoring shutdown into a background thread.
`software/tools/measure_subprocess_startup.py`	Adds a standalone script to measure cold startup time of the `JobRunner` subprocess (create, start, ready, shutdown) over multiple iterations to quantify the benefit of pre-warming.

Comments suppressed due to low confidence (1)

software/tests/control/core/test_backpressure.py:289

This import of module threading is redundant, as it was previously imported on line 8.

        import threading

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Update DownsampledViewJob section: bytes now released immediately - Simplify Counter Lifecycle section - Add JobRunner Pre-warming section with known limitation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The TestMultiPointControllerCloseMethod tests use a mock controller that was missing the _prewarmed_job_runner and _prewarmed_bp_values attributes added in the main refactor. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

software/tests/control/core/test_backpressure.py:289

This import of module threading is redundant, as it was previously imported on line 8.

        import threading

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-01T09:33:07Z

software/docs/development/acquisition-backpressure.md

+```
+Timeline (with pre-warming):
+  [Acquisition completes] → [Pre-warm starts] → [~1.2s] → [Runner ready]
+                                                              ↓
+  [User clicks Start] ────────────────────────────────→ [Instant start]


The pre-warming timeline and "Known limitation" text here describe pre-warm starting after an acquisition completes, but the implementation in MultiPointController actually starts the next pre-warmed JobRunner when get_prewarmed_job_runner() is called at the beginning of run_acquisition (immediately when the next acquisition is started). To avoid confusion for developers trying to reason about warm-up latency, consider updating this section to match the actual behavior: a new pre-warm cycle begins when the worker for the current acquisition is created, not only after the previous acquisition has finished.

[Claude Code] Fixed in commit 184a9c1 - Updated the timeline diagram and known limitation text to accurately describe that pre-warming for N+1 starts when acquisition N begins, not when it completes.

Copilot · 2026-02-01T09:33:08Z

software/control/core/multi_point_controller.py

+        Known limitation: Pre-warming is started when the previous acquisition
+        completes. If the next acquisition starts before pre-warming finishes
+        (~1.2s), the worker will wait for the subprocess. This only affects
+        rapid-fire manual clicking; real workloads (full plate scans, time-lapse
+        with intervals >2s) are unaffected. Future optimization: start pre-warming
+        when the runner is handed off, not when acquisition completes.


The docstring here states that pre-warming is started "when the previous acquisition completes", but the actual control flow starts a new pre-warmed JobRunner in get_prewarmed_job_runner(), which is called at the start of run_acquisition (i.e., when a new acquisition is kicked off). This mismatch can be confusing for future readers; consider updating the description/"Known limitation" text to reflect that the next pre-warm cycle is started when the worker for the current acquisition is created, not when the previous acquisition finishes.

Suggested change

Known limitation: Pre-warming is started when the previous acquisition

completes. If the next acquisition starts before pre-warming finishes

(~1.2s), the worker will wait for the subprocess. This only affects

rapid-fire manual clicking; real workloads (full plate scans, time-lapse

with intervals >2s) are unaffected. Future optimization: start pre-warming

when the runner is handed off, not when acquisition completes.

Known limitation: The next pre-warm cycle is started when the worker for

the current acquisition is created (i.e., at the start of

``run_acquisition``), not immediately when the previous acquisition

finishes. If the next acquisition starts before pre-warming finishes

(~1.2s), the worker will wait for the subprocess. This only affects

rapid-fire manual clicking; real workloads (full plate scans, time-lapse

with intervals >2s) are unaffected. Future optimization: trigger

pre-warming for the next acquisition as soon as the previous one

completes.

[Claude Code] Fixed in commit 184a9c1 - Updated the docstring to correctly state that pre-warming for the NEXT acquisition starts when the CURRENT acquisition begins (when get_prewarmed_job_runner() is called).

- backpressure.py: Log when capacity event set fails during close - job_processing.py: Log when shutdown sentinel can't be sent - multi_point_controller.py: Replace bare except clauses with specific exceptions and logging for objective info retrieval - multi_point_controller.py: Add try-except to background memory monitor stop - multi_point_worker.py: Add try-except to daemon shutdown thread These fixes address code review findings where exceptions could be silently swallowed in background threads and callbacks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The documentation incorrectly stated that pre-warming starts when the previous acquisition completes. In reality, pre-warming for acquisition N+1 starts when acquisition N begins (when get_prewarmed_job_runner() is called and hands off the runner). Updated both the docstring in multi_point_controller.py and the acquisition-backpressure.md documentation to accurately describe this behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

hongquanli requested a review from Copilot February 1, 2026 08:19

Copilot started reviewing on behalf of hongquanli February 1, 2026 08:20 View session

Copilot AI reviewed Feb 1, 2026

View reviewed changes

hongquanli and others added 2 commits February 1, 2026 00:44

docs: Add known limitation comment for pre-warming timing

dbfceda

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs: Update acquisition-backpressure.md with current behavior

e797df9

- Update DownsampledViewJob section: bytes now released immediately - Simplify Counter Lifecycle section - Add JobRunner Pre-warming section with known limitation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

hongquanli requested a review from Copilot February 1, 2026 09:26

Copilot started reviewing on behalf of hongquanli February 1, 2026 09:27 View session

fix: Add missing prewarmed attributes to test mock

3d83861

The TestMultiPointControllerCloseMethod tests use a mock controller that was missing the _prewarmed_job_runner and _prewarmed_bp_values attributes added in the main refactor. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI reviewed Feb 1, 2026

View reviewed changes

hongquanli and others added 2 commits February 1, 2026 01:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Backpressure system lifecycle enforcement and test coverage #488

refactor: Backpressure system lifecycle enforcement and test coverage #488

Uh oh!

hongquanli commented Feb 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 1, 2026

Uh oh!

hongquanli Feb 1, 2026

Uh oh!

Copilot AI Feb 1, 2026

Uh oh!

hongquanli Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

refactor: Backpressure system lifecycle enforcement and test coverage #488

Are you sure you want to change the base?

refactor: Backpressure system lifecycle enforcement and test coverage #488

Uh oh!

Conversation

hongquanli commented Feb 1, 2026

Summary

Test Coverage (12 new tests)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

hongquanli Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

hongquanli Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants