Add encoder health check and refactor error handling #15
Monitors the encoder pipeline for hung ab-av1/ffmpeg processes that produce no progress. Warns at 30 minutes and automatically kills the stuck process at 1 hour so the pipeline can continue.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Shows flash error notifications when the encoder health check detects a stalled encoder (30-minute warning) or kills a stuck process (1 hour).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- PostProcessor: Always attempt Sonarr notification even if DB operations fail (fixes silent failure where encode succeeded but Sonarr was never notified)
- encode.ex: Replace 85-line manual retry loop with Core.Retry (exponential backoff with jitter, max 5 attempts, then move on)
- health_check.ex: Switch to event-based monitoring via PubSub instead of polling encoder state with :sys.get_state(). Only accesses encoder state when killing a stuck process.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
notify_encoder_failure was a no-op; it now delegates to PostProcessor.process_encoding_failure, which records failures so they are visible in the dashboard.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
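For readers unfamiliar with the retry strategy named in the encode.ex commit above, here is a minimal sketch of exponential backoff with jitter capped at 5 attempts. The module `RetrySketch`, its function names, and the `:base_ms` / `:max_attempts` options are hypothetical and chosen for illustration; the actual Core.Retry API is not shown in this PR and may differ.

```elixir
defmodule RetrySketch do
  @moduledoc "Hypothetical illustration only; not the project's Core.Retry."

  # Delay before the retry that follows attempt number `attempt` (1-based):
  # base * 2^(attempt - 1), capped at `max_ms`, plus up to 25% random jitter.
  def delay_ms(attempt, base_ms \\ 100, max_ms \\ 5_000) do
    capped = min(base_ms * Integer.pow(2, attempt - 1), max_ms)
    capped + :rand.uniform(div(capped, 4) + 1) - 1
  end

  # Calls `fun` until it returns {:ok, _} or the attempt budget is spent.
  # `fun` is assumed to return {:ok, value} or {:error, reason}.
  def retry(fun, opts \\ []) do
    max_attempts = Keyword.get(opts, :max_attempts, 5)
    base_ms = Keyword.get(opts, :base_ms, 100)
    do_retry(fun, 1, max_attempts, base_ms)
  end

  defp do_retry(fun, attempt, max_attempts, base_ms) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      # Budget exhausted: return the last error so the caller can move on.
      {:error, _} = error when attempt >= max_attempts ->
        error

      {:error, _} ->
        Process.sleep(delay_ms(attempt, base_ms))
        do_retry(fun, attempt + 1, max_attempts, base_ms)
    end
  end
end
```

The jitter spreads retries out so that several failing callers do not all hit the database again at the same instant, and returning the final `{:error, reason}` instead of raising matches the "then move on" behaviour described in the commit.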
Pull request overview
This pull request adds encoder health monitoring and refactors error handling to improve reliability. The changes introduce a new HealthCheck GenServer that monitors the encoder pipeline for stuck states, automatically killing processes that show no progress for 60 minutes. Error handling is simplified by replacing manual retry loops with the Core.Retry module, and the PostProcessor is improved to ensure Sonarr sync attempts even when database operations fail.
Changes:
- Added HealthCheck GenServer to monitor encoder for stuck states (warns at 30 min, kills at 60 min)
- Replaced manual retry loop in encoder with Core.Retry module
- Enhanced PostProcessor to attempt Sonarr sync even on DB failures
- Added dashboard alerts for encoder health issues
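The warn-then-kill policy from the first bullet can be sketched as a small pure module. This is an assumption-laden illustration: `HealthCheckSketch`, its state shape, and its function names are hypothetical, and only the 30-minute and 60-minute thresholds come from the PR. The real health_check.ex is a GenServer driven by PubSub events.

```elixir
defmodule HealthCheckSketch do
  @moduledoc "Hypothetical illustration of the warn/kill thresholds."

  # Thresholds taken from the PR description.
  @warn_after_ms :timer.minutes(30)
  @kill_after_ms :timer.minutes(60)

  # Maps time since the last progress event to an action.
  def classify(idle_ms) when idle_ms >= @kill_after_ms, do: :kill
  def classify(idle_ms) when idle_ms >= @warn_after_ms, do: :warn
  def classify(_idle_ms), do: :ok

  # Fresh tracking state; also used whenever a progress event arrives.
  def new(now_ms), do: %{last_progress_ms: now_ms, warned?: false}

  def progress(_state, now_ms), do: new(now_ms)

  # Periodic check: warns only once per stall, then escalates to :kill.
  def check(%{last_progress_ms: last, warned?: warned?} = state, now_ms) do
    case {classify(now_ms - last), warned?} do
      {:kill, _} -> {:kill, state}
      {:warn, false} -> {:warn, %{state | warned?: true}}
      _ -> {:ok, state}
    end
  end
end
```

Keeping the decision logic free of timers and process calls makes the thresholds easy to unit test; a GenServer would call `check/2` from a periodic `handle_info` and `progress/2` from its progress-event handler.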
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| lib/reencodarr/encoder/health_check.ex | New GenServer that monitors encoder via PubSub events and kills stuck processes |
| lib/reencodarr/application.ex | Added HealthCheck to supervision tree |
| lib/reencodarr/ab_av1/encode.ex | Replaced manual retry logic with Core.Retry, now calls process_encoding_failure |
| lib/reencodarr/post_processor.ex | Ensures Sonarr sync attempted even when video reload or DB update fails |
| lib/reencodarr_web/live/dashboard_live.ex | Added handler for encoder health alerts with flash messages |
- Reset all state fields when killing stuck encoder (not just encoding flag)
- Replace :sys.get_state anti-pattern with proper PubSub approach:
  - Encode broadcasts os_pid when encoding starts
  - HealthCheck tracks os_pid from events instead of introspecting state
- Use Task.start for async kill command to avoid blocking GenServer
- Fix Path.basename(nil) error in dashboard health alert handler
- Add clarifying comment about why failures don't notify Sonarr
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with Claude Code
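
The event-based os_pid tracking and asynchronous kill from the last commit could look roughly like the sketch below. Everything here is an assumption for illustration: the module name `OsPidTrackerSketch`, the message shapes (`{:encoding_started, %{os_pid: ...}}`, `{:encoding_completed, ...}`, `:kill_stuck`), and the injectable `:kill_fun` option are hypothetical. Plain process messages stand in for the PubSub broadcasts so the example needs only the standard library.

```elixir
defmodule OsPidTrackerSketch do
  @moduledoc "Hypothetical illustration; message shapes are assumptions."
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # The kill function is injectable so the sketch can be exercised without
    # touching real processes; the default shells out to the `kill` command.
    kill_fun = Keyword.get(opts, :kill_fun, &default_kill/1)
    {:ok, %{os_pid: nil, kill_fun: kill_fun}}
  end

  # Remember the OS pid announced by the encoder instead of reading the
  # encoder's internal state with :sys.get_state/1.
  @impl true
  def handle_info({:encoding_started, %{os_pid: os_pid}}, state) do
    {:noreply, %{state | os_pid: os_pid}}
  end

  def handle_info({:encoding_completed, _result}, state) do
    {:noreply, %{state | os_pid: nil}}
  end

  # Nothing is being tracked, so there is nothing to kill.
  def handle_info(:kill_stuck, %{os_pid: nil} = state), do: {:noreply, state}

  def handle_info(:kill_stuck, %{os_pid: os_pid, kill_fun: kill_fun} = state) do
    # Run the kill in a separate process so a slow command cannot block
    # this GenServer's message loop.
    Task.start(fn -> kill_fun.(os_pid) end)
    {:noreply, %{state | os_pid: nil}}
  end

  def handle_info(_other, state), do: {:noreply, state}

  defp default_kill(os_pid) do
    System.cmd("kill", ["-9", Integer.to_string(os_pid)])
  end
end
```

Because the tracker learns the pid from events, it never depends on the encoder's private state layout, which is the coupling the commit calls an anti-pattern.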