Skip to content

Conversation

@hongquanli
Copy link
Contributor

Summary

  • Add lifecycle tracking with is_closed property and _warn_if_closed() helper
  • Use time.monotonic() instead of time.time() for timeout calculations
  • Add __all__ export list for explicit public API
  • Improve docstrings for properties (document None after close)
  • Make get_stats() atomic with nested locks
  • Simplify ownership model using BackpressureValues tuple instead of wrapper class
  • Add create_backpressure_values() factory function for pre-warming
  • Fix TOCTOU races with capture-reference pattern throughout
  • Convert print() statements to proper logging in multi_point_worker.py

Test Coverage (12 new tests)

  • Operations on closed controller (all methods return safe defaults)
  • Constructor with bp_values parameter (pre-warming path)
  • reset() warning when jobs pending
  • close() wakes blocked wait_for_capacity() threads
  • Concurrent close() thread safety

Test plan

  • All 26 BackpressureController tests pass
  • Manual testing in simulation mode - acquisition finishes promptly

🤖 Generated with Claude Code

…test coverage

Backpressure Controller:
- Add lifecycle tracking with is_closed property and _warn_if_closed() helper
- Use time.monotonic() instead of time.time() for timeout calculations
- Add __all__ export list for explicit public API
- Improve docstrings for properties (document None after close)
- Add lock ordering comments and clear/wait pattern explanation
- Make get_stats() atomic with nested locks

Code Quality:
- Simplify ownership model by using BackpressureValues tuple instead of wrapper class
- Add create_backpressure_values() factory function for pre-warming
- Fix TOCTOU races with capture-reference pattern throughout
- Convert print() statements to proper logging in multi_point_worker.py

Test Coverage (12 new tests):
- Operations on closed controller (all methods return safe defaults)
- Constructor with bp_values parameter (pre-warming path)
- reset() warning when jobs pending
- close() wakes blocked wait_for_capacity() threads
- Concurrent close() thread safety

Tools:
- Add measure_subprocess_startup.py for JobRunner startup time measurement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the acquisition backpressure system to make its lifecycle safer and better defined, introduces support for pre-warmed JobRunner subprocesses that share backpressure state with controllers, and adds targeted tests and tooling to validate and measure these behaviours.

Changes:

  • Introduces a BackpressureValues tuple plus create_backpressure_values() factory, adds explicit lifecycle tracking (is_closed) and closed-safe behaviour to BackpressureController, and switches throttling timeouts to time.monotonic().
  • Extends JobRunner with readiness signalling (_ready_event, wait_ready, is_ready), more robust shutdown semantics, and safer Zarr writer cleanup, and wires this into MultiPointWorker/MultiPointController to optionally use a pre-warmed runner that shares backpressure primitives with the worker.
  • Adds a CLI tool to measure JobRunner subprocess startup time and 12 new backpressure tests to cover closed-controller semantics, pre-created backpressure values, logging on unsafe reset, and thread-safety/notification behaviour.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
software/control/core/backpressure.py Refactors BackpressureController to use a shared BackpressureValues tuple with an explicit lifecycle (is_closed), closed-safe accessors, atomic get_stats(), and monotonic timeout handling, plus adds a create_backpressure_values() factory and explicit public API via __all__.
software/tests/control/core/test_backpressure.py Expands test coverage for the updated backpressure lifecycle, including operations on closed controllers, use of externally supplied bp_values, logging on unsafe reset(), thread-safe close(), and wakeup behaviour for blocked wait_for_capacity().
software/control/core/job_processing.py Adds _ready_event to JobRunner with wait_ready()/is_ready(), exposes setters for acquisition and Zarr writer info for use with pre-warmed runners, improves shutdown() with a queue sentinel, and makes ZarrWriter.clear_writers() robust via try/finally dictionary clearing.
software/control/core/multi_point_worker.py Integrates the new BackpressureValues ownership model, optionally reuses a pre-warmed JobRunner and its shared primitives for the first job class, refactors job-runner shutdown to be non-blocking with round-robin pending checks, and replaces print()s with structured logging.
software/control/core/multi_point_controller.py Pre-warms a JobRunner subprocess and matching BackpressureValues at controller initialization, exposes them via get_prewarmed_job_runner() for single-use handoff to MultiPointWorker, cleans them up safely on errors and controller close, and moves memory monitoring shutdown into a background thread.
software/tools/measure_subprocess_startup.py Adds a standalone script to measure cold startup time of the JobRunner subprocess (create, start, ready, shutdown) over multiple iterations to quantify the benefit of pre-warming.
Comments suppressed due to low confidence (1)

software/tests/control/core/test_backpressure.py:289

  • This import of module threading is redundant, as it was previously imported on line 8.
        import threading

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hongquanli and others added 2 commits February 1, 2026 00:44
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update DownsampledViewJob section: bytes now released immediately
- Simplify Counter Lifecycle section
- Add JobRunner Pre-warming section with known limitation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The TestMultiPointControllerCloseMethod tests use a mock controller
that was missing the _prewarmed_job_runner and _prewarmed_bp_values
attributes added in the main refactor.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

software/tests/control/core/test_backpressure.py:289

  • This import of module threading is redundant, as it was previously imported on line 8.
        import threading

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 151 to 155
```
Timeline (with pre-warming):
[Acquisition completes] → [Pre-warm starts] → [~1.2s] → [Runner ready]
[User clicks Start] ────────────────────────────────→ [Instant start]
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-warming timeline and "Known limitation" text here describe pre-warm starting after an acquisition completes, but the implementation in MultiPointController actually starts the next pre-warmed JobRunner when get_prewarmed_job_runner() is called at the beginning of run_acquisition (immediately when the next acquisition is started). To avoid confusion for developers trying to reason about warm-up latency, consider updating this section to match the actual behavior: a new pre-warm cycle begins when the worker for the current acquisition is created, not only after the previous acquisition has finished.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in commit 184a9c1 - Updated the timeline diagram and known limitation text to accurately describe that pre-warming for N+1 starts when acquisition N begins, not when it completes.

Comment on lines 271 to 276
Known limitation: Pre-warming is started when the previous acquisition
completes. If the next acquisition starts before pre-warming finishes
(~1.2s), the worker will wait for the subprocess. This only affects
rapid-fire manual clicking; real workloads (full plate scans, time-lapse
with intervals >2s) are unaffected. Future optimization: start pre-warming
when the runner is handed off, not when acquisition completes.
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring here states that pre-warming is started "when the previous acquisition completes", but the actual control flow starts a new pre-warmed JobRunner in get_prewarmed_job_runner(), which is called at the start of run_acquisition (i.e., when a new acquisition is kicked off). This mismatch can be confusing for future readers; consider updating the description/"Known limitation" text to reflect that the next pre-warm cycle is started when the worker for the current acquisition is created, not when the previous acquisition finishes.

Suggested change
Known limitation: Pre-warming is started when the previous acquisition
completes. If the next acquisition starts before pre-warming finishes
(~1.2s), the worker will wait for the subprocess. This only affects
rapid-fire manual clicking; real workloads (full plate scans, time-lapse
with intervals >2s) are unaffected. Future optimization: start pre-warming
when the runner is handed off, not when acquisition completes.
Known limitation: The next pre-warm cycle is started when the worker for
the current acquisition is created (i.e., at the start of
``run_acquisition``), not immediately when the previous acquisition
finishes. If the next acquisition starts before pre-warming finishes
(~1.2s), the worker will wait for the subprocess. This only affects
rapid-fire manual clicking; real workloads (full plate scans, time-lapse
with intervals >2s) are unaffected. Future optimization: trigger
pre-warming for the next acquisition as soon as the previous one
completes.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in commit 184a9c1 - Updated the docstring to correctly state that pre-warming for the NEXT acquisition starts when the CURRENT acquisition begins (when get_prewarmed_job_runner() is called).

hongquanli and others added 2 commits February 1, 2026 01:49
- backpressure.py: Log when capacity event set fails during close
- job_processing.py: Log when shutdown sentinel can't be sent
- multi_point_controller.py: Replace bare except clauses with specific
  exceptions and logging for objective info retrieval
- multi_point_controller.py: Add try-except to background memory monitor stop
- multi_point_worker.py: Add try-except to daemon shutdown thread

These fixes address code review findings where exceptions could be
silently swallowed in background threads and callbacks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The documentation incorrectly stated that pre-warming starts when the
previous acquisition completes. In reality, pre-warming for acquisition
N+1 starts when acquisition N begins (when get_prewarmed_job_runner()
is called and hands off the runner).

Updated both the docstring in multi_point_controller.py and the
acquisition-backpressure.md documentation to accurately describe this
behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants