Document AI v1beta3 update_dataset LRO: "Unexpected state: Long-running operation had neither response nor error set" #15557

@victoriapedlar

Description

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

Context

I am configuring a dataset for a Custom Classifier processor using the
google-cloud-documentai v1beta3 Python client (DocumentServiceClient.update_dataset).

The underlying REST PATCH call to
projects/{project}/locations/{location}/processors/{processor}/dataset
succeeds and the dataset is correctly attached to the processor (visible in
the Console UI). However, the Python long‑running operation helper raises:

GoogleAPICallError('Unexpected state: Long-running operation had neither response nor error set.')

Expected Behavior

operation.result() on client.update_dataset(...) should complete without
raising an exception when the backend successfully updates the dataset, and
the returned Operation should have either a response or an error set.

Actual Behavior

The dataset is attached correctly (Console shows the dataset GCS URI and the
dataset works for later training), but operation.result() raises
GoogleAPICallError: None Unexpected state: Long-running operation had neither response nor error set.

This makes it difficult to distinguish real failures from this client‑side LRO quirk.

API client name and version

google-cloud-documentai==3.7.0, google-api-core==2.28.1
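
To confirm which client versions are actually installed in the failing environment, a short check along these lines could be run inside the container. This is only a sketch: `installed_versions` is a hypothetical helper name, and it relies solely on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The two packages named in this report.
    report = installed_versions(["google-cloud-documentai", "google-api-core"])
    for name, version in report.items():
        print(f"{name}=={version}")
```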

Reproduction steps: code

file: setup.py

from google.cloud import documentai_v1beta3 as documentai
from google.api_core.client_options import ClientOptions

# Your settings
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
PROCESSOR_ID = "PROCESSOR_ID"
gcs_uri_prefix = "GCS_URI"

# Setup the endpoint correctly
client_options = ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")

# Initialize the synchronous Document AI client (operation.result() below is a blocking call)
client = documentai.DocumentServiceClient(client_options=client_options)

# Build the full dataset resource name
dataset_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/{PROCESSOR_ID}/dataset"

# Prepare the dataset configuration
dataset = documentai.Dataset(
    name=dataset_name,
    gcs_managed_config=documentai.Dataset.GCSManagedConfig(
        gcs_prefix=documentai.GcsPrefix(
            gcs_uri_prefix=gcs_uri_prefix
        )
    ),
    spanner_indexing_config=documentai.Dataset.SpannerIndexingConfig()
)

# Prepare the update request
update_request = documentai.UpdateDatasetRequest(
    dataset=dataset
)

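# Call the update_dataset API; this returns a long-running operation
operation = client.update_dataset(request=update_request)

# Block until the operation finishes; this is the call that raises
response = operation.result(timeout=600)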

Reproduction steps: supporting files

Reproduction steps: actual results

Log output:

Traceback:
  File "setup.py", line 128, in add_processor_dataset
    response = operation.result(timeout=600)
  File ".../google/api_core/future/polling.py", line 261, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unexpected state:
Long-running operation had neither response nor error set.

Despite this, the Console shows the dataset configured correctly under the
processor.

Reproduction steps: expected results

operation.result() should either:

  • Return a Dataset (or empty response) when the update succeeds, or
  • Raise an error with details from the backend (e.g. permission denied, invalid name).

It should not raise GoogleAPICallError: ... neither response nor error set
when the dataset is successfully attached and visible in the UI.
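
Until the behavior changes, one possible workaround is to treat this specific exception as "finished without a payload" and re-raise everything else. The sketch below is an assumption-laden illustration, not part of the client library: `result_or_none` and `LRO_EMPTY_RESULT_MARKER` are hypothetical names, and matching on the message text is fragile because that text is not a stable contract. It only requires an object with a `result(timeout=...)` method.

```python
# Substring of the error message quoted in this report (assumed to stay stable).
LRO_EMPTY_RESULT_MARKER = "neither response nor error set"


def result_or_none(operation, timeout=600):
    """Wait for a long-running operation.

    Returns the operation result, or None when the operation finished with
    neither a response nor an error set. Any other exception is re-raised.
    """
    try:
        return operation.result(timeout=timeout)
    except Exception as exc:
        # In real code this could be narrowed to
        # google.api_core.exceptions.GoogleAPICallError.
        if LRO_EMPTY_RESULT_MARKER in str(exc):
            return None
        raise
```

With the reproduction code above, the call would be `response = result_or_none(operation)`. A `None` return then means "done, but no payload", so the dataset state still has to be verified separately, for example in the Console.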

OS & version + platform

Base image: python:3.11-slim (Debian-based)
Platform: Cloud Run job (containerized)

Python environment

Python 3.11.x

Python dependencies

Package Version Editable project location


aiohappyeyeballs 2.6.1
aiohttp 3.13.3
aiosignal 1.4.0
annotated-types 0.7.0
async-timeout 5.0.1
asyncpg 0.31.0
attrs 25.4.0
backports-asyncio-runner 1.2.0
bottleneck 1.6.0
certifi 2026.1.4
cffi 2.0.0
charset-normalizer 3.4.4
cryptography 46.0.4
db-dtypes 1.5.0
decorator 5.2.1
deprecated 1.3.1
exceptiongroup 1.3.1
frozenlist 1.8.0
fsspec 2026.2.0
gcsfs 2026.2.0
google-api-core 2.29.0
google-api-python-client 2.189.0
google-auth 2.48.0
google-auth-httplib2 0.3.0
google-auth-oauthlib 1.2.4
google-cloud-bigquery 3.40.0
google-cloud-core 2.5.0
google-cloud-documentai 3.9.0
google-cloud-documentai-toolbox 0.15.0a0
google-cloud-secret-manager 2.26.0
google-cloud-storage 3.9.0
google-cloud-storage-control 1.9.0
google-cloud-vision 3.12.1
google-crc32c 1.8.0
google-resumable-media 2.8.0
googleapis-common-protos 1.72.0
greenlet 3.3.1
grpc-google-iam-v1 0.14.3
grpcio 1.78.0
grpcio-status 1.78.0
httplib2 0.31.2
idna 3.11
immutabledict 4.2.2
iniconfig 2.3.0
intervaltree 3.2.1
jinja2 3.1.6
joblib 1.5.3
llvmlite 0.46.0
lxml 6.0.2
markupsafe 3.0.3
multidict 6.7.1
numba 0.63.1
numexpr 2.14.1
numpy 2.2.6
oauthlib 3.3.1
packaging 26.0
pandas 2.3.3
pandas-gbq 0.33.0
pikepdf 10.3.0
pillow 11.3.0
pluggy 1.6.0
propcache 0.4.1
proto-plus 1.27.1
protobuf 6.33.5
psutil 7.2.2
pyarrow 22.0.0
pyasn1 0.6.2
pyasn1-modules 0.4.2
pycparser 3.0
pydantic 2.12.5
pydantic-core 2.41.5
pydantic-settings 2.12.0
pydata-google-auth 1.9.1
pygments 2.19.2
pyparsing 3.3.2
pytest 9.0.2
pytest-asyncio 1.3.0
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
pytz 2025.2
requests 2.32.5
requests-oauthlib 2.0.0
rsa 4.9.1
scikit-learn 1.7.2
scipy 1.15.3
setuptools 82.0.0
six 1.17.0
sortedcontainers 2.4.0
sqlalchemy 2.0.46
tabulate 0.9.0
threadpoolctl 3.6.0
tifffile 2025.5.10
tomli 2.4.0
tqdm 4.67.3
typing-extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.3
uritemplate 4.2.0
urllib3 2.6.3
wrapt 2.1.1
yarl 1.22.0

Additional context

If I call the REST API directly with:

  curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://eu-documentai.googleapis.com/v1beta3/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/dataset"

and the JSON body:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/dataset",
  "gcs_managed_config": {
    "gcs_prefix": {
      "gcs_uri_prefix": "gs://..."
    }
  },
  "spanner_indexing_config": {}
}

the operation succeeds and the dataset is attached, with no error.
Only the Python client’s operation.result() reports the invalid LRO state.
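
To check whether the backend itself returns a finished operation with neither field set, the JSON printed by the curl call can be inspected directly. The following sketch assumes the body follows the `google.longrunning.Operation` JSON shape, in which `done` is a boolean and at most one of `error` or `response` is present. The function name and the status strings are illustrative only.

```python
def classify_operation(op: dict) -> str:
    """Classify a google.longrunning.Operation decoded from JSON."""
    if not op.get("done", False):
        # "done" is omitted or false while the operation is still running.
        return "pending"
    if "error" in op:
        return "failed"
    if "response" in op:
        return "succeeded"
    # Finished, but neither field of the result oneof is set. This is the
    # state the Python client reports as "Unexpected state".
    return "done_without_result"
```

If the curl response is classified as `done_without_result`, the empty result originates on the server side and the Python client is only surfacing it. If it is `succeeded`, the difference lies in how the client handles the response.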

Metadata

Labels: triage me, type: bug
