
Add disc15-19 fixtures and replace brittle heuristics with multi-signal detection#24

Merged
yxbh merged 3 commits into main from feature/disc15-19-fixtures-and-heuristic-fixes
Feb 28, 2026


yxbh (Owner) commented Feb 28, 2026

Summary

Add five new Blu-ray disc test fixtures (disc15–19) covering chapter-split episodes, OVA specials, digital archives, and single-movie discs. Replace brittle hard-coded thresholds in the analysis pipeline with multi-signal approaches that combine structural evidence from disc navigation with shape heuristics. Test count increases from 307 to 321, all passing.

Background

While adding fixtures from the "08th MS Team" Blu-ray set, several analysis bugs were uncovered where heuristics used single arbitrary thresholds that failed on new disc patterns. This PR fixes those bugs and replaces the thresholds with principled multi-signal detection that combines structural evidence (IG chapter marks, title hints, audio stream metadata) with shape heuristics.

Changes

Analysis pipeline fixes (bdpl/analyze/__init__.py)

  • Variant collapse: _maybe_collapse_variant_episodes() now checks clip overlap before collapsing — prevents unrelated duplicate specials from triggering episode collapse
  • Commentary false positive: Skip commentary detection when an IG page has buttons targeting both episode and non-episode playlists (navigation page, not commentary)
  • Digital archive dedup: Normalize ch_start=0 to None for the digital_archive category so IG-derived and title-hint entries deduplicate
  • Pass IG chapter marks to order_episodes(), and pass the set of title-hint MPLS playlists to classify_playlists()
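A minimal sketch of the three pipeline guards above as standalone predicates, assuming a simplified playlist model. The `Playlist` type, the helper names, and the rule that "overlap" means at least one shared clip ID are hypothetical illustrations, not the code in `bdpl/analyze/__init__.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Playlist:
    # Hypothetical minimal view of an MPLS playlist: its name and the clip IDs it plays.
    name: str
    clip_ids: tuple[str, ...]


def clips_overlap(a: Playlist, b: Playlist) -> bool:
    """Variant guard (assumed rule): only collapse playlists that share at least one clip."""
    return bool(set(a.clip_ids) & set(b.clip_ids))


def is_navigation_page(button_targets: list[str], episode_playlists: set[str]) -> bool:
    """Commentary guard: buttons hitting both episode and non-episode playlists mean navigation."""
    targets = set(button_targets)
    hits_episode = bool(targets & episode_playlists)
    hits_other = bool(targets - episode_playlists)
    return hits_episode and hits_other


def dedup_key(playlist: str, category: str, ch_start: int | None) -> tuple[str, str, int | None]:
    """Dedup key: for digital archives, a chapter start of 0 is treated the same as None."""
    if category == "digital_archive" and ch_start == 0:
        ch_start = None
    return (playlist, category, ch_start)
```

With these helpers, an IG-derived archive entry keyed with `ch_start=0` and a title-hint entry keyed with `ch_start=None` produce the same key, while episode entries keep the distinction.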

Chapter-split detection (bdpl/analyze/ordering.py)

  • _episodes_from_chapters() accepts optional ig_chapter_marks parameter
  • Splitting requires either IG mark confirmation (structural evidence) or est_count >= 3 (strong duration signal)
  • Prevents ~50 min single movies from being incorrectly split into 2 episodes
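The split gate can be sketched as a single predicate. This assumes a duration-only estimate based on a nominal 24-minute episode and treats IG confirmation as menu buttons targeting at least two distinct chapters; `estimate_episode_count`, `should_split_by_chapters`, and both of those rules are hypothetical, not what `_episodes_from_chapters()` actually does.

```python
from __future__ import annotations

# Assumption: nominal episode length used for the duration-only estimate.
NOMINAL_EPISODE_SECONDS = 24 * 60


def estimate_episode_count(duration_seconds: float) -> int:
    """Duration-only estimate of how many episodes a single playlist holds."""
    return round(duration_seconds / NOMINAL_EPISODE_SECONDS)


def should_split_by_chapters(
    duration_seconds: float,
    ig_chapter_marks: set[int] | None = None,
) -> bool:
    """Split only with structural evidence (IG marks) or a strong duration signal."""
    est_count = estimate_episode_count(duration_seconds)
    if est_count < 2:
        return False  # assumed precondition: fits in one episode, nothing to split
    # Hypothetical confirmation rule: menu buttons jump to at least two distinct chapters.
    ig_confirmed = ig_chapter_marks is not None and len(ig_chapter_marks) >= 2
    return ig_confirmed or est_count >= 3
```

Under these assumptions a 50-minute playlist estimates to 2 episodes, so it stays whole unless the menu confirms chapter entry points, while a ~96-minute playlist estimates to 4 and splits on duration alone.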

Digital archive detection (bdpl/analyze/classify.py)

Three independent signals are recognized, each combined with the base shape checks (avg ≤ 0.5s, unique ratio ≥ 0.8). An item count of at least 20 is sufficient on its own; either of the other two signals lowers the item-count floor from 20 to 5:

| Signal | Source | What it proves |
|---|---|---|
| Item count ≥ 20 | Playlist shape | Strong shape, sufficient alone |
| Title hint | index.bdmv navigation | Disc considers it real content |
| No audio streams | Play item codecs | Still images, not video |
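The detection rule in the table can be sketched as below, assuming each play item exposes a clip ID, a duration in seconds, and an audio stream count, and that "unique ratio" means distinct clip IDs divided by item count. `PlayItem` and `looks_like_digital_archive` are hypothetical names; the thresholds come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass

STRONG_FLOOR = 20  # item count that is sufficient on its own
HINTED_FLOOR = 5   # floor when a title hint or missing audio corroborates the shape


@dataclass(frozen=True)
class PlayItem:
    # Hypothetical view of one play item: clip ID, duration, and audio stream count.
    clip_id: str
    duration_s: float
    audio_streams: int


def looks_like_digital_archive(items: list[PlayItem], has_title_hint: bool) -> bool:
    if not items:
        return False
    # Base shape checks, applied regardless of which signal fires.
    avg = sum(i.duration_s for i in items) / len(items)
    unique_ratio = len({i.clip_id for i in items}) / len(items)  # assumed definition
    if avg > 0.5 or unique_ratio < 0.8:
        return False
    # Either corroborating signal lowers the floor; otherwise item count must carry it alone.
    no_audio = all(i.audio_streams == 0 for i in items)
    floor = HINTED_FLOOR if (has_title_hint or no_audio) else STRONG_FLOOR
    return len(items) >= floor
```

With these rules a 17-item still gallery is accepted when it carries no audio or has a title hint (the disc19 case), while a 44-item gallery passes on item count alone.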

New fixtures (tests/fixtures/disc15–19/)

  • disc15: 4 chapter-split episodes, 0 specials
  • disc16: 4 chapter-split episodes, 4 specials (2 extras + 2 creditless EDs)
  • disc17: 1 OVA episode + 1 digital archive (44 still-image items)
  • disc18: 1 movie + 2 specials (1 extra + 1 creditless ED)
  • disc19: 1 OVA episode + 1 digital archive (17 items, hint-backed detection)

Test infrastructure

  • 5 per-disc integration test files
  • Session-scoped fixtures in conftest.py
  • All 6 parametrizations in test_disc_matrix.py updated

Testing

  • ruff check . — all checks passed
  • ruff format --check . — 58 files already formatted
  • pytest tests/ -q — 321 passed in ~2s
  • Manual verification of each disc against expected episode/special counts from Blu-ray menus

Additional Notes

  • All fixtures contain only structural metadata (MPLS, CLPI, index.bdmv, MovieObject.bdmv, ICS data, generic disc title XML) — no copyrighted media content
  • Fixture directories use generic names (disc15–19) with anonymized titles (TEST DISC N)

yxbh and others added 3 commits February 28, 2026 20:51
Replace brittle hard-coded thresholds with multi-signal approaches that
combine structural evidence from disc navigation with shape heuristics.

Chapter-split detection (ordering.py):
- Accept optional ig_chapter_marks from IG menu buttons
- Require est_count >= 3 OR IG mark confirmation for splitting
- Prevents single ~50min movies from being split into 2 episodes

Digital archive detection (classify.py):
- Three independent signals lower the item-count floor:
  1. Item count >= 20 (strong shape, sufficient alone)
  2. Title hint from disc navigation (lowers floor to 5)
  3. No audio streams in play items (lowers floor to 5)
- All combined with base shape checks (avg <= 0.5s, unique ratio >= 0.8)

Analysis pipeline fixes (__init__.py):
- Pass IG chapter marks through to order_episodes()
- Compute title_hint_mpls set from hints for classify_playlists()
- Fix variant collapse to check clip overlap before collapsing
- Skip commentary detection on navigation pages with mixed targets
- Normalize ch_start=0 to None for digital_archive dedup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five new Blu-ray disc fixtures covering chapter-split episodes, OVA
specials, digital archives, and single-movie discs:

- disc15: 4 chapter-split episodes, no specials (similar to disc14)
- disc16: 4 chapter-split episodes + 4 specials (2 extras, 2 creditless EDs)
- disc17: 1 OVA episode + 1 digital archive (44 still-image items)
- disc18: 1 movie + 2 specials (1 extra, 1 creditless ED)
- disc19: 1 OVA episode + 1 digital archive (17 items, hint-backed)

Each fixture contains only structural metadata (MPLS, CLPI, index.bdmv,
MovieObject.bdmv, ICS menu data, generic disc title XML). No copyrighted
media content is included.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per-disc integration tests for each new fixture:
- test_disc15_scan.py: chapter-split, 4 episodes, no specials
- test_disc16_scan.py: chapter-split, 4 episodes, 4 specials
- test_disc17_scan.py: single OVA + digital archive (44 items)
- test_disc18_scan.py: single movie + 2 specials
- test_disc19_scan.py: single OVA + hint-backed digital archive (17 items)

Updated conftest.py with session-scoped path and analysis fixtures.
Updated all 6 parametrizations in test_disc_matrix.py.

Test count: 307 -> 321, all passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yxbh merged commit 409e651 into main Feb 28, 2026
1 check passed
yxbh deleted the feature/disc15-19-fixtures-and-heuristic-fixes branch February 28, 2026 11:02