Crawlith is a high-performance Node.js + TypeScript CLI tool and Web Dashboard for crawling websites, auditing infrastructure, and generating internal link graphs.
- Crawls websites using a BFS algorithm
- Respects `robots.txt` and rate limits
- Generates interactive D3.js visualizations and HTML reports
- Unified export system (JSON, CSV, Markdown)
- Detects critical issues (orphans, broken links, redirect chains, crawl traps, soft 404s)
- Production-grade SQLite persistent storage for crawls (`~/.crawlith/crawlith.db`)
- Snapshot-based metrics and history tracking
- NEW: Interactive Web Dashboard for visualizing snapshots (`crawlith ui`)
- NEW: Deep infrastructure auditing (TLS, DNS, Security) (`crawlith audit`)
- NEW: Monorepo architecture with a core library and decoupled clients
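Because the crawl is breadth-first, the depth recorded for each page is its shortest click distance from the start URL. The idea can be sketched as follows (a minimal illustration, not Crawlith's actual implementation; `LinkMap` and `bfsCrawl` are hypothetical names, and the in-memory link map stands in for real fetching and HTML parsing):

```typescript
// Hypothetical sketch: a BFS frontier over an in-memory link map.
// In a real crawler, neighbors would come from fetching and parsing each page.
type LinkMap = Record<string, string[]>;

function bfsCrawl(start: string, links: LinkMap, maxDepth: number): Map<string, number> {
  // First discovery of a URL fixes its depth = shortest click distance.
  const depths = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const url = queue.shift()!;
    const depth = depths.get(url)!;
    if (depth >= maxDepth) continue; // honor the click-depth limit
    for (const next of links[url] ?? []) {
      if (!depths.has(next)) {
        depths.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  return depths;
}
```

A FIFO queue is what guarantees the shortest-path property; a stack (DFS) would assign arbitrary, often inflated, depths to the same pages.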
As this is an npm workspaces monorepo:

```bash
npm install
npm run build
```

To use the CLI globally, link it:

```bash
npm link ./plugins/cli
```

(Note: in development, you can use `npm run crawlith -- <command>` to run the CLI directly from the root.)
Crawl a site and run analysis:

```bash
crawlith crawl https://example.com [options]
```

Launch the interactive web dashboard to review your recent crawls:

```bash
crawlith ui
```

Run transport, TLS, and DNS checks against a target:

```bash
crawlith audit https://example.com
```

To export the latest completed crawl data for a domain using the unified exporter:

```bash
crawlith export https://example.com --export json,html,visualize,csv
```

Options:

- `--limit <number>`: Max pages to crawl (default: 500)
- `--depth <number>`: Max click depth (default: 5)
- `--output <path>`: Output directory
- `--incremental`: Re-crawl efficiently using ETag/Last-Modified caching from the last snapshot
- `--detect-soft404`: Detect soft 404 pages using heuristics
- `--detect-traps`: Identify limitless dynamic parameter spaces
- `--export [formats]`: Comma-separated list of export formats to generate
- `--fail-on-critical`: Exit with code 1 if critical issues are found
- `--format <type>`: Output format (`pretty` or `json`). Default: `pretty`
- `--log-level <level>`: Logging level (`normal`, `verbose`, or `debug`). Default: `normal`
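The `--incremental` flag builds on standard HTTP validators: the `ETag` and `Last-Modified` values stored in the previous snapshot are sent back as conditional request headers, and a `304 Not Modified` response lets the crawler reuse the stored copy instead of re-downloading and re-parsing the page. A minimal sketch of that mechanism (illustrative only; `CachedPage` and the function names are assumptions, not Crawlith's API):

```typescript
// Hypothetical shape of a cached snapshot entry holding HTTP validators.
interface CachedPage {
  etag?: string;
  lastModified?: string;
}

// Build conditional headers so the server can answer 304 Not Modified
// when the page is unchanged since the last crawl.
function conditionalHeaders(cached?: CachedPage): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached?.etag) headers["If-None-Match"] = cached.etag;
  if (cached?.lastModified) headers["If-Modified-Since"] = cached.lastModified;
  return headers;
}

// A 304 status means the cached copy is still valid and can be reused.
function shouldReuseCache(status: number): boolean {
  return status === 304;
}
```

Pages crawled for the first time have no validators, so `conditionalHeaders(undefined)` returns an empty object and the fetch proceeds unconditionally.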
Crawlith is divided into workspaces:
- `plugins/core`: The heavy-lifting engine (database, graph algorithms, crawler, security boundaries).
- `plugins/cli`: The terminal user interface.
- `plugins/web`: The React-based dashboard frontend.
All data is stored locally in an SQLite database at `~/.crawlith/crawlith.db`.
```bash
npm install
npm run build
npm run test
```

MIT