
Update text search best practices regarding scalability #1496

@Josipmrden

Description


per @DavIvek

I’ve experimented with imports against a text index and found that parallel imports can be extremely slow: when a text index exists and is updated from parallel transactions, the import can be around 20x slower than the same import without a text index.

This happens because the text index takes a unique lock during commit, and commits can be expensive in the text index context, so only one transaction can write to the index at a time. Since these writes also persist data to disk, the slowdown is expected. This can be optimized in the future, but it is the current behavior.

Running the entire import in a single transaction is significantly faster than many small transactions, because writes don’t block each other and everything is flushed in a single commit.
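For reference, here is a minimal sketch of the single-transaction approach using the Neo4j Python driver over Bolt (Memgraph accepts Bolt connections). The connection details, the `:Document` label, and the node payload are placeholder assumptions, and the text index is assumed to already exist on that label:

```python
# Hypothetical single-transaction import: all 100k nodes are created and the
# text index is flushed in one commit. Label, property names, and connection
# details are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))
rows = [{"id": i, "body": f"document {i}"} for i in range(100_000)]

with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run(
            "UNWIND $rows AS row CREATE (:Document {id: row.id, body: row.body})",
            rows=rows,
        )
        tx.commit()  # one commit, so the text index is flushed once

driver.close()
```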

That said, practical testing with 100k nodes showed that the best import speed comes from batching transactions rather than using a single large transaction. Batches of 10k nodes per worker gave the fastest import of 100k nodes:

  • ~50% faster than importing everything in a single transaction.
  • ~30% faster than using 1k-node batches with 10 parallel workers.
  • The worst performance occurred when each node was created in its own transaction.

Conclusion: batching is strongly preferred. For large imports with a text index, fewer transactions with larger batches provide the best balance between parallelism and commit overhead (see the sketch below). This would be worth calling out explicitly in the import best practices section of the documentation.
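A minimal sketch of the batched approach, with the same placeholder connection details, label, and payload as above: 100k nodes split into 10k-node batches, each batch committed in its own transaction by a separate worker thread.

```python
# Hypothetical batched import: 10 workers, 10k nodes per batch, one commit per
# batch. Label, property names, and connection details are placeholders.
from concurrent.futures import ThreadPoolExecutor
from neo4j import GraphDatabase

BATCH_SIZE = 10_000
TOTAL_NODES = 100_000

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

def import_batch(rows):
    # Each worker opens its own session; the batch is written in a single
    # transaction, so the text-index commit happens once per 10k nodes.
    with driver.session() as session:
        session.execute_write(
            lambda tx: tx.run(
                "UNWIND $rows AS row CREATE (:Document {id: row.id, body: row.body})",
                rows=rows,
            ).consume()
        )

batches = [
    [{"id": i, "body": f"document {i}"} for i in range(start, start + BATCH_SIZE)]
    for start in range(0, TOTAL_NODES, BATCH_SIZE)
]

with ThreadPoolExecutor(max_workers=len(batches)) as pool:
    list(pool.map(import_batch, batches))

driver.close()
```

The exact batch size that works best will depend on available memory and on how expensive each text-index commit is, so it is worth benchmarking against the target dataset.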
