mjc commented Jan 28, 2026

  • Add health check that warns at 30 min, kills stuck encoder at 1 hour
  • Display alerts on dashboard
  • Fix silent failures: Sonarr now always notified after encode
  • Replace manual retry loop with Core.Retry
  • Record encoding failures in FailureTracker

🤖 Generated with Claude Code

mjc and others added 4 commits January 28, 2026 10:51
Monitors the encoder pipeline for hung ab-av1/ffmpeg processes that
produce no progress. Warns at 30 minutes, automatically kills the
stuck process at 1 hour to allow the pipeline to continue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
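
The two thresholds in this commit message can be sketched as a pure classification function. This is a minimal illustration, not the code in health_check.ex: the module name `HealthCheckSketch` and the `classify/2` function are hypothetical, and only the 30 minute and 1 hour values come from the commit message.

```elixir
defmodule HealthCheckSketch do
  # Thresholds from the commit message: warn at 30 minutes, kill at 1 hour.
  @warn_after_ms :timer.minutes(30)
  @kill_after_ms :timer.minutes(60)

  @doc """
  Classifies the time elapsed since the last progress event.
  Both arguments are millisecond timestamps from the same monotonic clock.
  """
  def classify(last_progress_ms, now_ms) when now_ms >= last_progress_ms do
    elapsed = now_ms - last_progress_ms

    cond do
      elapsed >= @kill_after_ms -> :kill
      elapsed >= @warn_after_ms -> :warn
      true -> :ok
    end
  end
end
```

Keeping the decision separate from the GenServer makes the thresholds testable without waiting or spawning an encoder.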
Shows flash error notifications when the encoder health check
detects a stalled encoder (30 min warning) or kills a stuck
process (1 hour).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- PostProcessor: Always attempt Sonarr notification even if DB
  operations fail (fixes silent failure where encode succeeded
  but Sonarr was never notified)

- encode.ex: Replace 85-line manual retry loop with Core.Retry
  (exponential backoff with jitter, max 5 attempts, then move on)

- health_check.ex: Switch to event-based monitoring via PubSub
  instead of polling encoder state with :sys.get_state(). Only
  accesses encoder state when killing a stuck process.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
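
For readers unfamiliar with the retry strategy named above, here is a rough sketch of exponential backoff with jitter capped at 5 attempts. It does not reproduce the Core.Retry API, which is not shown in this PR: `RetrySketch`, its function names, the 100 ms base delay, and the injectable `sleep` argument are all assumptions made for illustration.

```elixir
defmodule RetrySketch do
  # 5 attempts matches the commit message; the base delay is an assumption.
  @max_attempts 5
  @base_delay_ms 100

  # Upper bound of the delay before the retry that follows `attempt` (1-based).
  def max_delay_ms(attempt), do: @base_delay_ms * Integer.pow(2, attempt - 1)

  # Runs `fun` until it returns {:ok, _} or the attempts are exhausted.
  # `sleep` is injectable so the sketch can be exercised without waiting.
  def retry(fun, sleep \\ &Process.sleep/1), do: attempt(fun, sleep, 1)

  defp attempt(fun, sleep, n) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _} = error when n >= @max_attempts ->
        # Give up and let the caller move on.
        error

      {:error, _} ->
        # Full jitter: a random delay between 1 and max_delay_ms(n).
        sleep.(:rand.uniform(max_delay_ms(n)))
        attempt(fun, sleep, n + 1)
    end
  end
end
```

The jitter spreads retries out so that several failing operations do not all retry at the same instant.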
notify_encoder_failure was a no-op; now delegates to
PostProcessor.process_encoding_failure which records
failures for visibility in the dashboard.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 28, 2026 18:28

Copilot AI left a comment

Pull request overview

This pull request adds encoder health monitoring and refactors error handling to improve reliability. The changes introduce a new HealthCheck GenServer that monitors the encoder pipeline for stuck states, automatically killing processes that show no progress for 60 minutes. Error handling is simplified by replacing manual retry loops with the Core.Retry module, and the PostProcessor is improved to ensure Sonarr sync attempts even when database operations fail.

Changes:

  • Added HealthCheck GenServer to monitor encoder for stuck states (warns at 30 min, kills at 60 min)
  • Replaced manual retry loop in encoder with Core.Retry module
  • Enhanced PostProcessor to attempt Sonarr sync even on DB failures
  • Added dashboard alerts for encoder health issues
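
The "attempt Sonarr sync even on DB failures" change follows a common pattern: capture the result of the fallible step, log it, and run the notification unconditionally. The sketch below illustrates that pattern only. `PostProcessSketch.finish/2` and its two zero-arity callbacks are hypothetical and do not mirror the actual PostProcessor functions.

```elixir
defmodule PostProcessSketch do
  require Logger

  # `update_db` and `notify` are hypothetical zero-arity callbacks standing in
  # for the database update and the Sonarr notification.
  def finish(update_db, notify) do
    db_result =
      try do
        update_db.()
      rescue
        e -> {:error, e}
      end

    case db_result do
      {:error, reason} -> Logger.warning("DB update failed: #{inspect(reason)}")
      _ -> :ok
    end

    # The notification runs regardless of how the DB step ended.
    {db_result, notify.()}
  end
end
```

Returning both results lets the caller see that the database step failed without that failure suppressing the notification.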

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| lib/reencodarr/encoder/health_check.ex | New GenServer that monitors encoder via PubSub events and kills stuck processes |
| lib/reencodarr/application.ex | Added HealthCheck to supervision tree |
| lib/reencodarr/ab_av1/encode.ex | Replaced manual retry logic with Core.Retry, now calls process_encoding_failure |
| lib/reencodarr/post_processor.ex | Ensures Sonarr sync attempted even when video reload or DB update fails |
| lib/reencodarr_web/live/dashboard_live.ex | Added handler for encoder health alerts with flash messages |

- Reset all state fields when killing stuck encoder (not just encoding flag)
- Replace :sys.get_state anti-pattern with proper PubSub approach:
  - Encode broadcasts os_pid when encoding starts
  - HealthCheck tracks os_pid from events instead of introspecting state
- Use Task.start for async kill command to avoid blocking GenServer
- Fix Path.basename(nil) error in dashboard health alert handler
- Add clarifying comment about why failures don't notify Sonarr

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
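
Several of the fixes in this commit (tracking os_pid from events, resetting every state field after a kill, and guarding against a nil path) can be illustrated with a small state reducer. This is a sketch under assumptions: the struct fields, the event shapes, and the `display_name/1` helper are hypothetical and are not taken from health_check.ex or dashboard_live.ex.

```elixir
defmodule HealthStateSketch do
  # Hypothetical state tracked by the health check.
  defstruct encoding?: false, os_pid: nil, last_progress_ms: nil, warned?: false

  # The encoder broadcasts its os_pid when encoding starts, so the monitor
  # never needs to introspect the encoder process.
  def apply_event(_state, {:encoding_started, os_pid, now_ms}),
    do: %__MODULE__{encoding?: true, os_pid: os_pid, last_progress_ms: now_ms}

  # Progress refreshes the timestamp and clears any earlier warning.
  def apply_event(%__MODULE__{encoding?: true} = state, {:progress, now_ms}),
    do: %{state | last_progress_ms: now_ms, warned?: false}

  # Completion or a kill resets every field, not just the encoding flag.
  def apply_event(_state, event) when event in [:encoding_completed, :killed],
    do: %__MODULE__{}

  # Unknown events leave the state unchanged.
  def apply_event(state, _other), do: state

  # Nil-safe label for alerts; Path.basename/1 raises when given nil.
  def display_name(nil), do: "unknown file"
  def display_name(path) when is_binary(path), do: Path.basename(path)
end
```

Building a fresh struct on reset, rather than updating one field, means no stale os_pid or timestamp can leak into the next encode.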
mjc merged commit 63be0c8 into main Jan 28, 2026
1 check passed
mjc deleted the feature/encoder-health-check branch January 28, 2026 19:18