Crawlith

Crawlith is a modular, production-grade SEO crawl intelligence platform built for serious technical SEO analysis.

It is not another surface-level site scanner.
It is a structured crawl engine designed to extract graph intelligence, structural weaknesses, content risks, and authority signals from any website — at scale.


What Crawlith Is

Crawlith is a multi-package architecture composed of:

  • Core Engine – High-performance crawler and scoring system
  • CLI Interface – Power-user command-line control
  • Server Layer – Lightweight API for automation and integrations
  • Web UI – Clean, modern interface for visual analysis

It is designed to scale from single-site audits to multi-site crawl intelligence environments.

Core Philosophy

Crawlith focuses on:

  • Structural crawl accuracy (BFS-based site graph)
  • Deterministic URL normalization
  • Scalable multi-site support
  • Snapshot-based comparisons
  • Clean, analyzable JSON / SQLite outputs
  • Modular extensibility (future plugin-ready)
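Deterministic URL normalization is what makes the BFS site graph reproducible: two crawls of the same site must map equivalent URL spellings to a single node. Crawlith's actual normalization rules are not published; the sketch below shows one common, deterministic rule set (lowercased scheme/host, default ports stripped, query parameters sorted, fragments dropped) as an illustrative assumption:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Deterministically normalize a URL so equivalent forms share one graph node."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    scheme = scheme.lower()
    netloc = netloc.lower()
    # Strip default ports so http://a.com:80/ and http://a.com/ collapse together.
    if (scheme, netloc.rsplit(":", 1)[-1]) in (("http", "80"), ("https", "443")):
        netloc = netloc.rsplit(":", 1)[0]
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 collapse together.
    query = urlencode(sorted(parse_qsl(query, keep_blank_values=True)))
    # Collapse an empty path to "/" and drop the fragment entirely.
    path = path or "/"
    return urlunsplit((scheme, netloc, path, query, ""))
```

Because every rule is a pure function of the input, the same crawl input always yields the same graph keys, which is what "deterministic > probabilistic" requires.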

No bloat. No vague “AI magic.”
Just measurable crawl intelligence.


Core Capabilities

Crawl Engine

  • Depth control
  • Page limits
  • Redirect handling
  • MIME filtering
  • Concurrency safeguards
  • Incremental crawling (ETag / Last-Modified when available)

Intelligence Layer

  • Authority scoring
  • Orphan detection
  • Crawl efficiency metrics
  • Entropy analysis
  • Hub identification
  • Content clustering (cannibalization detection)
  • Duplicate detection
  • Redirect chain analysis
  • Broken link detection
  • Canonical / noindex / nofollow extraction
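Orphan detection follows directly from the BFS site graph: a page is an orphan if it is known to exist (e.g. it appears in a sitemap) but is unreachable through internal links from the start page. An illustrative sketch, assuming an adjacency-list representation of the internal link graph:

```python
from collections import deque

def reachable(start: str, links: dict[str, list[str]]) -> set[str]:
    """BFS over the site's internal link graph from the start page."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def orphans(known_pages: set[str], start: str, links: dict[str, list[str]]) -> set[str]:
    """Pages that exist but receive no internal link path from the start page."""
    return known_pages - reachable(start, links)
```

The same traversal yields hub identification for free: pages with unusually high in-degree or out-degree in `links` are the graph's hubs.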

Analysis (Planned / Expanding)

  • Title / Meta evaluation
  • H1 validation
  • Thin content detection
  • Image alt auditing
  • External link ratio
  • Structured data detection
  • E-E-A-T signal extraction
  • Transport & HTTP diagnostics (SSL, HTTP/2, protocol validation)

Who It’s For

  • Technical SEO professionals
  • Agencies running repeated audits
  • Developers building SEO tooling
  • Teams managing large site architectures
  • Builders who want control over crawl data

If you just want a colorful dashboard with meaningless “health scores,” this isn’t it.

If you want crawl intelligence you can trust — welcome.


Design Principles

  • Deterministic > Probabilistic
  • Transparent scoring
  • Snapshot-based comparison model
  • Multi-site first-class support
  • Production-safe CLI design
  • Clean output formats (JSON / SQLite)
  • Extensible without breaking core
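The snapshot-based comparison model reduces to a set diff over two crawl snapshots keyed by normalized URL. A minimal sketch (the per-page record shape is an assumption; any comparable dict of extracted fields works):

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Diff two crawl snapshots keyed by normalized URL: added, removed, changed pages."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(url for url in old.keys() & new.keys() if old[url] != new[url])
    return {"added": added, "removed": removed, "changed": changed}
```

Because URL keys are deterministic, a diff between two snapshots reflects real site changes rather than crawl noise.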

Long-Term Vision

Crawlith aims to become:

The open crawl intelligence engine powering serious SEO infrastructure.

CLI for precision.
Server for automation.
Web UI for clarity.
Core for power.


Status

Active development.
Architecture stabilized.
Modular migration complete.
Scaling intelligence layer.


License

To be defined based on distribution strategy (OSS / Hybrid / Pro modules).


Crawlith
Crawl deep. Think structurally.
