Note: This project is archived and not actively maintained. The research and architecture documentation may still be useful to others working on similar problems.
A pragmatic, human-in-the-loop system for panel-by-panel comic reading from local, DRM-free files (CBZ/PDF), combining classic computer vision, modern foundation models, and a lightweight cross-platform viewer.
Motivation: there is currently no good open solution for Guided-View-style reading of local comics. Existing readers rely on heuristics; research code stops at detection; commercial solutions are locked ecosystems.
This project aims to close that gap.
| Tool | Method | Output | Notes |
|---|---|---|---|
| Kumiko | OpenCV contours | JSON bboxes | Active. Good reference for the CV approach. Editor planned but not implemented. |
| C.A.P.E. | OpenCV + Electron editor | JSON bboxes | Closest to this project's vision. Has human-in-loop editor. No viewer. Desktop only. Semi-abandoned. |
| DeepPanel | CNN (TFLite) | Bboxes | Mobile-optimized (Android/iOS libs). ~400ms/page. No ordering. Apache 2.0. |
| best-comic-panel-detection | YOLOv12 | Bboxes | mAP ~99%. Apache 2.0. Drop-in for Stage 2. |
| segment-anything-comic | SAM 1-3 | Polygons | Handles irregular panels. SAM 2 (6x faster) and SAM 3 (text-prompted segmentation) are viable fallbacks. |
| Magi | Deep learning | Panels + order + OCR | Only open tool that does reading order. Manga-focused. Apache 2.0. |
| Tool | Detection | Viewer | Editing | Status |
|---|---|---|---|---|
| BDReader | Heuristics | Desktop (Qt) | None | Abandoned (~2015) |
| Comic Smart Panels | Manual only | Windows app | Full manual | Abandoned (2015) |
| Panels app | ML (proprietary) | iOS/Android | None | Commercial, experimental |
| Smart Comic Reader | Heuristics/CV | iOS | None | Commercial, active |
| Comic Trim | Heuristics | Android | None | Discontinued. ~25% accuracy. |
- Comixology Guided View — Manual curation by publishers. Gold standard UX, but locked ecosystem.
- Marvel Smart Panels — Similar to Comixology, proprietary.
- Smart Comic Reader — Automated "Smart Mode" for iOS.
| Stage | Kumiko | C.A.P.E. | Magi | BDReader | Panels app | This project |
|---|---|---|---|---|---|---|
| CV detection | ✓ | ✓ | — | ✓ | — | ✓ |
| ML fallback | — | — | ✓ | — | ✓ | ✓ |
| Reading order | Heuristic | Heuristic | ✓ | Heuristic | ? | ✓ (VLM) |
| Human editing | Planned | ✓ | — | — | — | ✓ |
| Viewer | Basic | — | — | ✓ | ✓ | ✓ (PWA) |
| Cross-platform | Desktop | Desktop | Desktop | Desktop | Mobile | Web/Mobile |
Differentiation from C.A.P.E.: C.A.P.E. is a desktop Electron app with no viewer, no ML fallback, and no reading-order inference. This project is web-first (PWA), includes a viewer, uses ML for hard pages, and optionally a VLM for ordering.
Split the problem into what each technique is actually good at:
- Classic CV (on device / offline) — Fast, cheap, interpretable. Handles most "easy" pages.
- Deep learning (offline or cloud) — Robust panel segmentation for irregular layouts.
- Vision-capable LLMs (cloud, optional) — Infer reading order using semantic + visual context.
- Human-in-the-loop overrides — Required for correctness and UX.
- Cross-platform PWA viewer — Reads cached metadata; no native code required.
Detection is treated as a proposal, not ground truth.
CBZ / PDF
│
▼
[ Page extraction ]
│
├─▶ Classic CV (reference: Kumiko)
│ ├─ confidence ≥ threshold → order heuristically → panels.json
│ └─ confidence < threshold ↓
│
└─▶ ML detection (YOLO or SAM-comic)
└─ panel bboxes/polygons
│
├─▶ Heuristic ordering (row-major LTR/RTL)
│ └─ if unambiguous → panels.json
│
└─▶ VLM ordering (optional, for complex layouts)
└─ ordered panels → panels.json
│
▼
PWA Viewer (panel-by-panel)
│
▼
Human overrides → panels.json (updated)
All results are cached. Models are run once per book, not per read.
Goal: cover the majority of pages without ML.
Pipeline (per page image):
- Grayscale → Gaussian blur
- Adaptive threshold (handles varying page backgrounds)
- Contour detection → filter by area, aspect ratio
- Gutter analysis (whitespace between panels)
- Recursive XY-cut or connected components
- Bounding box extraction
- Heuristic ordering (row-major LTR, or RTL for manga)
- Confidence estimation (contour clarity, gutter strength, layout regularity)
Reference implementation: Kumiko's approach, extended with confidence scoring.
Pros: Fast, offline, explainable, no API costs.
Cons: Fails on borderless panels, irregular layouts, art bleeding into gutters.
Output: panels with confidence scores; low-confidence pages flagged for Stage 2.
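A minimal sketch of this stage, assuming OpenCV; the threshold values, area/aspect filters, and confidence formula are illustrative placeholders, not the project's tuned parameters:

```python
import cv2

def detect_panels_cv(page_bgr, min_area_ratio=0.01):
    """Propose panel bounding boxes on one page image (illustrative thresholds)."""
    gray = cv2.cvtColor(page_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Adaptive threshold copes with uneven page backgrounds; invert so ink is foreground.
    binary = cv2.adaptiveThreshold(
        blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 51, 10
    )
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    page_area = page_bgr.shape[0] * page_bgr.shape[1]
    panels = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        area = w * h
        if area < min_area_ratio * page_area:   # drop speech-bubble-sized blobs
            continue
        if not 0.2 < w / h < 5.0:               # drop extreme aspect ratios (borders, gutters)
            continue
        # Crude confidence proxy: how much of the bbox the contour actually fills.
        fill = cv2.contourArea(contour) / max(area, 1)
        panels.append({"bbox": [x, y, w, h], "confidence": round(float(fill), 2)})
    return panels
```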
Goal: robust geometry for pages where CV fails.
Candidate models (ready to use):
| Model | Output | Speed | Notes |
|---|---|---|---|
| best-comic-panel-detection | Bboxes | Fast | YOLOv12, mAP 99%, easiest integration |
| SAM 2 / 3 | Polygons | Medium | SAM 2 is 6x faster than v1; SAM 3 adds text-prompting |
| DeepPanel | Bboxes | ~400ms | Best for mobile/native if needed |
Recommendation: Start with the YOLO model. Add SAM-comic for irregular layouts if needed.
Trigger: CV confidence below threshold, or user-flagged pages.
Output: panel bboxes (or polygons), unordered.
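A sketch of the ML fallback, assuming the Ultralytics runtime and a comic-panel YOLO checkpoint (the weights path below is a placeholder, not a file shipped with this project):

```python
from ultralytics import YOLO

# Placeholder path: a comic-panel YOLO checkpoint downloaded from HuggingFace.
model = YOLO("weights/comic-panels.pt")

def detect_panels_ml(page_path, conf=0.25):
    """Run the detector on one page; return unordered panel bboxes as [x, y, w, h]."""
    result = model(page_path, conf=conf)[0]
    panels = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        panels.append({
            "bbox": [int(x1), int(y1), int(x2 - x1), int(y2 - y1)],
            "confidence": float(box.conf[0]),
        })
    return panels
```

Because results are cached per book, batch inference speed matters more than per-page latency here.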
Goal: determine panel sequence.
- Row-major: top-to-bottom, then LTR (western) or RTL (manga)
- Works for 80%+ of pages
- Free, instant, deterministic
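A sketch of the row-major heuristic, assuming bboxes in [x, y, w, h] form; the row-grouping tolerance is an illustrative value:

```python
def order_panels(panels, rtl=False, row_tolerance=0.5):
    """Row-major ordering: group panels into rows, then sort each row horizontally.

    row_tolerance is the fraction of a panel's height by which its top edge may
    differ from the row anchor and still join that row (illustrative value).
    """
    remaining = sorted(panels, key=lambda p: p["bbox"][1])  # sort by top edge
    ordered = []
    while remaining:
        anchor = remaining[0]
        row = [p for p in remaining
               if abs(p["bbox"][1] - anchor["bbox"][1]) < row_tolerance * anchor["bbox"][3]]
        row.sort(key=lambda p: p["bbox"][0], reverse=rtl)   # LTR, or RTL for manga
        ordered.extend(row)
        remaining = [p for p in remaining if p not in row]
    return ordered
```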
Options:
- Adapt Magi's ordering logic — extract and reuse their ordering component
- VLM inference — send annotated image to GPT-4V/Claude, ask for order
- Train lightweight classifier — predict layout type, apply corresponding rule
VLM approach:
- Input: page image with labeled panel bboxes
- Prompt: "Given panels A-F, provide reading order for [LTR/RTL] comic. Note any ambiguity."
- Output: ordered list + confidence
Recommendation: Start with heuristics. Add VLM only for flagged pages (cost control).
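An illustrative sketch of the VLM call, assuming the OpenAI Python SDK; the model name and prompt wording are placeholders, the panel labels are assumed to be drawn on the image beforehand, and any vision-capable model could be swapped in:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def vlm_panel_order(annotated_page_png, labels, direction="ltr"):
    """Ask a vision model for reading order on a page whose panels are labeled A, B, C, ..."""
    with open(annotated_page_png, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    prompt = (
        f"Given panels {labels[0]}-{labels[-1]}, provide the reading order for a "
        f"{'left-to-right' if direction == 'ltr' else 'right-to-left'} comic. "
        "Reply with a comma-separated list of labels and note any ambiguity."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```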
No automation is perfect. This is what turns "90% correct" into "100% usable".
Reference: C.A.P.E.'s Electron editor (study the UI patterns).
The viewer must support:
- Drag-to-reorder panels
- Adjust panel boundaries (resize/move)
- Merge panels (e.g., splash pages detected as multiple)
- Split panels (single detection covering two)
- Mark as "full page" (skip panel-by-panel)
- Flag page for re-processing
Override behavior:
- Stored separately from auto-detected data
- Never overwritten by re-running detection
- Exportable (for sharing corrections without sharing images)
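A sketch of how overrides might be applied at load time, consistent with the schema below: the auto-detected data is never mutated, and user edits are layered on top (the function and merging rules are assumptions, not the project's implementation):

```python
import copy

def apply_overrides(page, overrides):
    """Return a display-ready copy of a page with user overrides layered on top.

    The auto-detected `page` dict is left untouched, so re-running detection can
    refresh it without clobbering user edits (keys follow the panels.json sketch below).
    """
    merged = copy.deepcopy(page)
    # Per-panel geometry overrides, keyed by panel id (e.g. "p0-A").
    for panel in merged["panels"]:
        if panel["id"] in overrides:
            panel.update(overrides[panel["id"]])
    # Page-level reading-order override, keyed by e.g. "page_3".
    page_key = f"page_{merged['index']}"
    if page_key in overrides and "order" in overrides[page_key]:
        merged["order"] = overrides[page_key]["order"]
        merged["user_override"] = True
    return merged
```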
Why PWA:
- No native Android/iOS development
- Installable ("Add to Home Screen")
- Works on desktop, Android, iOS
- File API sufficient for "open file" workflow
Core features:
- Open local CBZ/PDF (via `<input type="file">`)
- Load/display `panels.json`
- Panel-by-panel navigation (tap/swipe/keys)
- Smooth zoom transitions
- Reading progress persistence (IndexedDB)
- Override editing UI
Backend (optional):
- Static hosting or FastHTML for app shell
- Comics never leave device
iOS notes:
- No File System Access API (can't "open folder")
- PWA storage can be evicted under pressure
- Workaround: select files individually, cache in IndexedDB
{
"version": 1,
"book_hash": "sha256:...",
"pages": [
{
"index": 0,
"size": [1800, 2700],
"panels": [
{"id": "p0-A", "bbox": [100, 50, 800, 600], "confidence": 0.95},
{"id": "p0-B", "bbox": [900, 50, 800, 600], "confidence": 0.91}
],
"order": ["p0-A", "p0-B"],
"order_confidence": 0.88,
"source": "cv",
"user_override": false
}
],
"overrides": {
"p0-A": {"bbox": [110, 55, 790, 590]},
"page_3": {"order": ["p3-B", "p3-A", "p3-C"]}
},
"metadata": {
"reading_direction": "ltr",
"created": "2025-01-15T10:30:00+00:00",
"tool_version": "0.1.0"
}
}

Design notes:
- `overrides` separate from auto-detected data (never clobbered)
- `book_hash` for cache invalidation if source changes
- `source` tracks provenance (cv / yolo / sam / vlm / manual)
- Compatible with C.A.P.E.'s `.cpanel` format where possible
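A minimal loading and validation sketch for this schema, using only the standard library; the required keys mirror the example above, and the error handling is illustrative:

```python
import hashlib
import json
from pathlib import Path

REQUIRED_PAGE_KEYS = {"index", "size", "panels", "order", "source"}

def load_panels(path, book_path=None):
    """Load panels.json and run basic sanity checks against the schema above."""
    data = json.loads(Path(path).read_text())
    if data.get("version") != 1:
        raise ValueError(f"unsupported panels.json version: {data.get('version')}")
    for page in data["pages"]:
        missing = REQUIRED_PAGE_KEYS - page.keys()
        if missing:
            raise ValueError(f"page {page.get('index')} missing keys: {missing}")
        panel_ids = {p["id"] for p in page["panels"]}
        if set(page["order"]) != panel_ids:
            raise ValueError(f"page {page['index']}: order does not match panel ids")
    # Optional cache-invalidation check against the source book (book_hash design note).
    if book_path is not None:
        digest = hashlib.sha256(Path(book_path).read_bytes()).hexdigest()
        if data["book_hash"] != f"sha256:{digest}":
            raise ValueError("book_hash mismatch: source file changed, re-run detection")
    return data
```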
- Replacing Comixology / Kindle / commercial readers
- Perfect automation without human input
- Real-time on-device deep learning (batch processing is fine)
- DRM circumvention
- Cloud-hosted comic storage
- Native mobile apps (PWA is sufficient)
- Confidence calibration — how to reliably estimate when CV is "good enough"?
- YOLO vs SAM — is bbox sufficient, or do we need polygons for irregular panels?
- Magi integration — can we extract just the ordering component?
- Override sharing — format for sharing corrections without copyrighted images?
- C.A.P.E. compatibility — worth maintaining `.cpanel` format compatibility?
- CV detector with confidence scoring (reference: Kumiko)
- YOLO model integration (HuggingFace)
- JSON schema + validation
- CLI for batch processing CBZ
- PWA shell
- Panel-by-panel navigation
- Basic touch/swipe/keyboard controls
- Reading progress persistence
- Override UI (reference: C.A.P.E.)
- Drag-to-reorder, resize, merge/split
- Override persistence
- Mobile gesture refinement
- VLM ordering for flagged pages
- SAM-comic for irregular panels
- Override export/import
📦 Archived
This project reached an early prototyping stage with working CV and YOLO detection. Development is paused indefinitely, but the codebase and research documentation remain available for reference.
uv sync # install core dependencies
uv run ruff format . # format
uv run ruff check . # lint
uv run pyright # type check
uv run pytest # test (80% coverage required)

# CPU-only (CI, macOS, or no GPU)
uv sync --extra ml-cpu --extra dev
# CUDA (NVIDIA GPU) — two-step install
uv sync --extra ml --extra dev
uv pip install --reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
uv run --no-sync panelizer preview ... # use --no-sync to preserve CUDA wheels

See AGENTS.md for contribution guidelines.
- Kumiko — CV panel extraction
- C.A.P.E. — CV + editor (study the UI)
- best-comic-panel-detection — YOLOv12
- SAM 1-3 (Meta AI) — Foundation models for polygon segmentation
- segment-anything-comic — SAM fine-tuned for comics
- DeepPanel — Mobile CNN
- Magi — Panel ordering + OCR
- Max Halford's tutorial — scikit-image approach
- Manga109 dataset
- CoMix benchmark — multi-task comic understanding
- Comics Understanding survey — comprehensive 2024 overview
- Panels app — ML-based guided view
- Comixology Guided View — manual curation benchmark