Is there a way to estimate or predict deep crawl completion time in Crawl4AI? #1741
Unanswered
aman-chicmic asked this question in Forums - Q&A

Hi everyone,
I’m using Crawl4AI for deep crawling websites and was wondering if there’s a recommended way to determine or estimate the crawl completion time.
Since the total number of pages and links is often unknown before the crawl starts, is there:
- Any built-in support for ETA or progress estimation?
- A best practice for estimating remaining time during an active crawl?
- Suggested strategies (e.g., sampling, limits, metrics) to make crawl duration more predictable?
I’m currently relying on depth and page limits, but I’d like to better understand if dynamic ETA calculation or progress tracking is possible or recommended.
Thanks in advance for any insights or guidance!

Replies: 2 comments
@unclecode Can you please review my question above? Thanks!
Hey! There's no built-in ETA for deep crawls, since the total number of pages isn't known upfront. But you can track progress yourself by streaming results and computing a running rate:

```python
import asyncio
import time

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, BrowserConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def deep_crawl_with_progress():
    max_pages = 30  # Set limit for predictability

    strategy = BFSDeepCrawlStrategy(
        max_depth=2,
        include_external=False,
        max_pages=max_pages,
    )
    config = CrawlerRunConfig(
        deep_crawl_strategy=strategy,
        prefetch=True,  # Fast mode
        stream=True,    # Required for progress tracking
    )

    start_time = time.time()
    count = 0
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        async for result in await crawler.arun(
            "https://docs.crawl4ai.com/",
            config=config,
        ):
            count += 1
            elapsed = time.time() - start_time
            rate = count / elapsed if elapsed > 0 else 0
            eta = (max_pages - count) / rate if rate > 0 else 0
            print(
                f"[{count}/{max_pages}] "
                f"Depth: {result.metadata.get('depth', 0)} | "
                f"Rate: {rate:.1f}/s | "
                f"ETA: {eta:.0f}s | "
                f"{result.url[:50]}"
            )
    print(f"\nDone! {count} pages in {time.time() - start_time:.1f}s")


asyncio.run(deep_crawl_with_progress())
```

Tips for predictable crawls:
- Always set max_pages; it gives the ETA a fixed denominator, and without it there is no meaningful "remaining" count.
- Keep max_depth small (2 or 3); the number of discovered pages grows roughly exponentially with depth.
- Use stream=True so results arrive as they complete instead of all at once at the end.
- Treat the first few ETA readings as noise; the pages-per-second rate only stabilizes after a dozen or so pages.
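One refinement on top of that, since the raw count/elapsed rate can swing a lot early in a crawl: a small helper that smooths per-page timings with an exponential moving average gives a steadier readout. This is just a sketch, not part of Crawl4AI; the SmoothedETA class and its alpha parameter are hypothetical names.

```python
# Hypothetical helper (not part of Crawl4AI): smooths the pages/second
# rate with an exponential moving average so the printed ETA jumps
# around less on sites where page load times vary widely.
import time
from typing import Optional


class SmoothedETA:
    def __init__(self, total: int, alpha: float = 0.3):
        self.total = total    # expected page count, e.g. max_pages
        self.alpha = alpha    # weight given to the newest sample
        self.count = 0        # pages completed so far
        self.rate: Optional[float] = None  # smoothed pages/second
        self.last_ts = time.time()

    def update(self) -> Optional[float]:
        """Record one completed page and return the ETA in seconds."""
        now = time.time()
        dt, self.last_ts = now - self.last_ts, now
        self.count += 1
        if dt <= 0:
            return None
        sample = 1.0 / dt  # instantaneous pages/second
        self.rate = sample if self.rate is None else (
            self.alpha * sample + (1 - self.alpha) * self.rate
        )
        remaining = max(self.total - self.count, 0)
        return remaining / self.rate
```

To use it, create `tracker = SmoothedETA(max_pages)` before the crawl and call `eta = tracker.update()` inside the async for loop in place of the raw rate arithmetic above.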
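For the "sampling" strategy mentioned in the question, one rough approach is a cheap depth-1 pre-pass: count the pages it finds, treat that as the branching factor, and extrapolate a geometric-series page count for the full depth. This is a sketch under the same API assumptions as the example above (non-streaming deep crawls returning a list of results), not a Crawl4AI feature, and the estimate_total_pages function is a hypothetical name.

```python
# Sketch of a sampling pre-pass (assumes the same Crawl4AI API as the
# example above): crawl to depth 1 only, derive a branching factor b,
# and estimate a depth-d crawl as roughly 1 + b + b**2 + ... + b**d.
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, BrowserConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def estimate_total_pages(url: str, target_depth: int = 2) -> int:
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=1,
            include_external=False,
        ),
    )
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        # With stream left off, the deep crawl collects results into a list.
        results = await crawler.arun(url, config=config)
    b = max(len(results) - 1, 1)  # children of the root page
    # Geometric-series extrapolation; real sites fan out unevenly, so
    # treat this as an order-of-magnitude guess, not a guarantee.
    return sum(b ** d for d in range(target_depth + 1))


if __name__ == "__main__":
    estimate = asyncio.run(estimate_total_pages("https://docs.crawl4ai.com/"))
    print(f"Estimated pages at depth 2: ~{estimate}")
```

Multiplying that estimate by the seconds-per-page observed during the pre-pass gives a rough total duration before the real crawl starts; cap the real crawl with max_pages as usual so the guess can't run away.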