LLMExtractionStrategy returns empty results despite successful crawl & scrape #1458
Hi, I'm testing Crawl4AI 0.7.4 with LLMExtractionStrategy. Here are two examples I tried; in both, the crawl, scrape, and extract steps report success, but print(result.extracted_content) outputs an empty list.

Example 1 – OpenAI (gpt-5-mini)

--- Extracting Structured Data with openai/gpt-5-mini ---
[INIT].... → Crawl4AI 0.7.4
[FETCH]... ↓ https://openai.com/api/pricing/ | ✓ | ⏱: 1.24s
[SCRAPE].. ◆ https://openai.com/api/pricing/ | ✓ | ⏱: 0.03s
[EXTRACT]. ■ https://openai.com/api/pricing/ | ✓ | ⏱: 27.78s
[COMPLETE] ● https://openai.com/api/pricing/ | ✓ | ⏱: 29.06s
[]

Example 2 – OpenAI (gpt-5-mini) with another site

--- Extracting Structured Data with openai/gpt-5-mini ---
[INIT].... → Crawl4AI 0.7.4
[FETCH]... ↓ https://www.migros.com.tr/ | ✓ | ⏱: 0.96s
[SCRAPE].. ◆ https://www.migros.com.tr/ | ✓ | ⏱: 0.03s
[EXTRACT]. ■ https://www.migros.com.tr/ | ✓ | ⏱: 12.54s
[COMPLETE] ● https://www.migros.com.tr/ | ✓ | ⏱: 13.53s
[]

Code Snippet

import os
import json
import asyncio
from pydantic import BaseModel, Field
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
from crawl4ai import LLMExtractionStrategy, CacheMode, BrowserConfig
from typing import Dict, List


class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input token for the OpenAI model.")
    output_fee: str = Field(
        ..., description="Fee for output token for the OpenAI model."
    )


async def extract_structured_data_using_llm(
    provider: str, api_token: str = None, extra_headers: Dict[str, str] = None
):
    print(f"\n--- Extracting Structured Data with {provider} ---")
    if api_token is None and provider != "ollama":
        print(f"API token is required for {provider}. Skipping this example.")
        # return
    browser_config = BrowserConfig(headless=True)
    extra_args = {"temperature": 1, "max_tokens": 2000}
    if extra_headers:
        extra_args["extra_headers"] = extra_headers
    crawler_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        word_count_threshold=1,
        page_timeout=80000,
        extraction_strategy=LLMExtractionStrategy(
            llm_config=LLMConfig(provider=provider, api_token=api_token),
            schema=OpenAIModelFee.model_json_schema(),
            extraction_type="schema",
            instruction="""From the crawled content, extract all mentioned model names along with their fees for input and output tokens.
            Do not miss any models in the entire content.""",
            extra_args=extra_args,
        ),
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://openai.com/api/pricing/", config=crawler_config
        )
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(
        extract_structured_data_using_llm(
            provider="openai/gpt-5-mini", api_token=os.getenv("OPENAI_API_KEY")
        )
    )
Expected behavior

result.extracted_content should contain the extracted model names and fees, but it is always an empty list ([]).

Environment

Crawl4AI 0.7.4

Question

Why does the LLM extraction return an empty result even though the crawl and scrape succeed? Thanks!
GPT-5 models are not officially supported (see https://docs.crawl4ai.com/core/browser-crawler-config/#3-llmconfig-essentials), so I suspect that is causing your issue. I'm personally using gpt-4o-mini without any problems; try that model and see if it works.
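For reference, here is a minimal sketch of that swap. It mirrors the snippet from your question; the only real changes are the provider string and reading the key from an OPENAI_API_KEY environment variable, which is an assumption about your setup.

import os
import asyncio
from pydantic import BaseModel, Field
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
from crawl4ai import LLMExtractionStrategy, CacheMode, BrowserConfig


class OpenAIModelFee(BaseModel):
    model_name: str = Field(..., description="Name of the OpenAI model.")
    input_fee: str = Field(..., description="Fee for input tokens.")
    output_fee: str = Field(..., description="Fee for output tokens.")


async def main():
    strategy = LLMExtractionStrategy(
        llm_config=LLMConfig(
            provider="openai/gpt-4o-mini",          # supported model instead of gpt-5-mini
            api_token=os.getenv("OPENAI_API_KEY"),  # assumes the key is in the environment
        ),
        schema=OpenAIModelFee.model_json_schema(),
        extraction_type="schema",
        instruction="Extract every model name with its input and output token fees.",
        extra_args={"temperature": 1, "max_tokens": 2000},  # same extra_args as in the question
    )
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        word_count_threshold=1,
        extraction_strategy=strategy,
    )
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        result = await crawler.arun(
            url="https://openai.com/api/pricing/", config=config
        )
        # With a supported model this should print a JSON list of fee objects
        # rather than [].
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(main())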