-
Notifications
You must be signed in to change notification settings - Fork 47
refactor: Backpressure system lifecycle enforcement and test coverage #488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…test coverage Backpressure Controller: - Add lifecycle tracking with is_closed property and _warn_if_closed() helper - Use time.monotonic() instead of time.time() for timeout calculations - Add __all__ export list for explicit public API - Improve docstrings for properties (document None after close) - Add lock ordering comments and clear/wait pattern explanation - Make get_stats() atomic with nested locks Code Quality: - Simplify ownership model by using BackpressureValues tuple instead of wrapper class - Add create_backpressure_values() factory function for pre-warming - Fix TOCTOU races with capture-reference pattern throughout - Convert print() statements to proper logging in multi_point_worker.py Test Coverage (12 new tests): - Operations on closed controller (all methods return safe defaults) - Constructor with bp_values parameter (pre-warming path) - reset() warning when jobs pending - close() wakes blocked wait_for_capacity() threads - Concurrent close() thread safety Tools: - Add measure_subprocess_startup.py for JobRunner startup time measurement Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the acquisition backpressure system to make its lifecycle safer and better defined, introduces support for pre-warmed JobRunner subprocesses that share backpressure state with controllers, and adds targeted tests and tooling to validate and measure these behaviours.
Changes:
- Introduces a
BackpressureValuestuple pluscreate_backpressure_values()factory, adds explicit lifecycle tracking (is_closed) and closed-safe behaviour toBackpressureController, and switches throttling timeouts totime.monotonic(). - Extends
JobRunnerwith readiness signalling (_ready_event,wait_ready,is_ready), more robust shutdown semantics, and safer Zarr writer cleanup, and wires this intoMultiPointWorker/MultiPointControllerto optionally use a pre-warmed runner that shares backpressure primitives with the worker. - Adds a CLI tool to measure JobRunner subprocess startup time and 12 new backpressure tests to cover closed-controller semantics, pre-created backpressure values, logging on unsafe reset, and thread-safety/notification behaviour.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
software/control/core/backpressure.py |
Refactors BackpressureController to use a shared BackpressureValues tuple with an explicit lifecycle (is_closed), closed-safe accessors, atomic get_stats(), and monotonic timeout handling, plus adds a create_backpressure_values() factory and explicit public API via __all__. |
software/tests/control/core/test_backpressure.py |
Expands test coverage for the updated backpressure lifecycle, including operations on closed controllers, use of externally supplied bp_values, logging on unsafe reset(), thread-safe close(), and wakeup behaviour for blocked wait_for_capacity(). |
software/control/core/job_processing.py |
Adds _ready_event to JobRunner with wait_ready()/is_ready(), exposes setters for acquisition and Zarr writer info for use with pre-warmed runners, improves shutdown() with a queue sentinel, and makes ZarrWriter.clear_writers() robust via try/finally dictionary clearing. |
software/control/core/multi_point_worker.py |
Integrates the new BackpressureValues ownership model, optionally reuses a pre-warmed JobRunner and its shared primitives for the first job class, refactors job-runner shutdown to be non-blocking with round-robin pending checks, and replaces print()s with structured logging. |
software/control/core/multi_point_controller.py |
Pre-warms a JobRunner subprocess and matching BackpressureValues at controller initialization, exposes them via get_prewarmed_job_runner() for single-use handoff to MultiPointWorker, cleans them up safely on errors and controller close, and moves memory monitoring shutdown into a background thread. |
software/tools/measure_subprocess_startup.py |
Adds a standalone script to measure cold startup time of the JobRunner subprocess (create, start, ready, shutdown) over multiple iterations to quantify the benefit of pre-warming. |
Comments suppressed due to low confidence (1)
software/tests/control/core/test_backpressure.py:289
- This import of module threading is redundant, as it was previously imported on line 8.
import threading
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update DownsampledViewJob section: bytes now released immediately - Simplify Counter Lifecycle section - Add JobRunner Pre-warming section with known limitation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The TestMultiPointControllerCloseMethod tests use a mock controller that was missing the _prewarmed_job_runner and _prewarmed_bp_values attributes added in the main refactor. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
software/tests/control/core/test_backpressure.py:289
- This import of module threading is redundant, as it was previously imported on line 8.
import threading
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ``` | ||
| Timeline (with pre-warming): | ||
| [Acquisition completes] → [Pre-warm starts] → [~1.2s] → [Runner ready] | ||
| ↓ | ||
| [User clicks Start] ────────────────────────────────→ [Instant start] |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pre-warming timeline and "Known limitation" text here describe pre-warm starting after an acquisition completes, but the implementation in MultiPointController actually starts the next pre-warmed JobRunner when get_prewarmed_job_runner() is called at the beginning of run_acquisition (immediately when the next acquisition is started). To avoid confusion for developers trying to reason about warm-up latency, consider updating this section to match the actual behavior: a new pre-warm cycle begins when the worker for the current acquisition is created, not only after the previous acquisition has finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Claude Code] Fixed in commit 184a9c1 - Updated the timeline diagram and known limitation text to accurately describe that pre-warming for N+1 starts when acquisition N begins, not when it completes.
| Known limitation: Pre-warming is started when the previous acquisition | ||
| completes. If the next acquisition starts before pre-warming finishes | ||
| (~1.2s), the worker will wait for the subprocess. This only affects | ||
| rapid-fire manual clicking; real workloads (full plate scans, time-lapse | ||
| with intervals >2s) are unaffected. Future optimization: start pre-warming | ||
| when the runner is handed off, not when acquisition completes. |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring here states that pre-warming is started "when the previous acquisition completes", but the actual control flow starts a new pre-warmed JobRunner in get_prewarmed_job_runner(), which is called at the start of run_acquisition (i.e., when a new acquisition is kicked off). This mismatch can be confusing for future readers; consider updating the description/"Known limitation" text to reflect that the next pre-warm cycle is started when the worker for the current acquisition is created, not when the previous acquisition finishes.
| Known limitation: Pre-warming is started when the previous acquisition | |
| completes. If the next acquisition starts before pre-warming finishes | |
| (~1.2s), the worker will wait for the subprocess. This only affects | |
| rapid-fire manual clicking; real workloads (full plate scans, time-lapse | |
| with intervals >2s) are unaffected. Future optimization: start pre-warming | |
| when the runner is handed off, not when acquisition completes. | |
| Known limitation: The next pre-warm cycle is started when the worker for | |
| the current acquisition is created (i.e., at the start of | |
| ``run_acquisition``), not immediately when the previous acquisition | |
| finishes. If the next acquisition starts before pre-warming finishes | |
| (~1.2s), the worker will wait for the subprocess. This only affects | |
| rapid-fire manual clicking; real workloads (full plate scans, time-lapse | |
| with intervals >2s) are unaffected. Future optimization: trigger | |
| pre-warming for the next acquisition as soon as the previous one | |
| completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Claude Code] Fixed in commit 184a9c1 - Updated the docstring to correctly state that pre-warming for the NEXT acquisition starts when the CURRENT acquisition begins (when get_prewarmed_job_runner() is called).
- backpressure.py: Log when capacity event set fails during close - job_processing.py: Log when shutdown sentinel can't be sent - multi_point_controller.py: Replace bare except clauses with specific exceptions and logging for objective info retrieval - multi_point_controller.py: Add try-except to background memory monitor stop - multi_point_worker.py: Add try-except to daemon shutdown thread These fixes address code review findings where exceptions could be silently swallowed in background threads and callbacks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The documentation incorrectly stated that pre-warming starts when the previous acquisition completes. In reality, pre-warming for acquisition N+1 starts when acquisition N begins (when get_prewarmed_job_runner() is called and hands off the runner). Updated both the docstring in multi_point_controller.py and the acquisition-backpressure.md documentation to accurately describe this behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
is_closedproperty and_warn_if_closed()helpertime.monotonic()instead oftime.time()for timeout calculations__all__export list for explicit public APIget_stats()atomic with nested locksBackpressureValuestuple instead of wrapper classcreate_backpressure_values()factory function for pre-warmingprint()statements to proper logging in multi_point_worker.pyTest Coverage (12 new tests)
bp_valuesparameter (pre-warming path)reset()warning when jobs pendingclose()wakes blockedwait_for_capacity()threadsclose()thread safetyTest plan
🤖 Generated with Claude Code