Description
per @DavIvek
I’ve experimented with imports while text search is enabled and found that parallel imports can be extremely slow when a text index is present: when the index is updated in parallel, imports can be 20x slower than running without a text index.
This happens because the text index takes a unique lock during commits, and those commits can be expensive in the text index context, so only one transaction can write to the index at a time. Since these writes also persist data to disk, the slowdown is expected. This can be optimized in the future, but it’s the current behavior.
Running the entire import in a single transaction is significantly faster than running many small transactions, because writes don’t block each other and everything is flushed in a single commit.
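For illustration, here is a minimal sketch of the single-transaction approach, assuming Memgraph is reachable over Bolt and the neo4j Python driver is used; the connection details, node label (`Document`), and properties are illustrative and not from the original report:

```python
from neo4j import GraphDatabase

# Illustrative connection details and schema; adjust for your deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))
nodes = [{"id": i, "title": f"Document {i}"} for i in range(100_000)]

# The whole import runs in one transaction, so the text index lock is
# taken (and data flushed to disk) once at commit instead of once per node.
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run(
            "UNWIND $rows AS row CREATE (:Document {id: row.id, title: row.title})",
            rows=nodes,
        )
        tx.commit()

driver.close()
```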
That said, based on practical testing with 100k nodes, the best import speed was achieved by batching transactions rather than using a single large transaction. Splitting the import into larger batches yielded the best results, and 10k nodes per worker provided the fastest import for 100k nodes (a sketch of this pattern follows the list below):
- ~50% faster than importing everything in a single transaction.
- ~30% faster than using 1k-node batches with 10 parallel workers.
- The worst performance occurred when each node was created in its own transaction.
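A minimal sketch of the batched approach, again assuming the neo4j Python driver over Bolt; the label, properties, and connection details are illustrative, while the batch size and worker count mirror the 100k-node measurements above:

```python
from concurrent.futures import ThreadPoolExecutor
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # illustrative connection details
BATCH_SIZE = 10_000             # 10k nodes per worker, as in the tests above
NODES = [{"id": i, "title": f"Document {i}"} for i in range(100_000)]

driver = GraphDatabase.driver(URI, auth=("", ""))

def import_batch(batch):
    # One session per worker; the whole batch is committed in a single
    # transaction, so the text index lock is taken once per batch rather
    # than once per node.
    with driver.session() as session:
        session.execute_write(
            lambda tx, rows: tx.run(
                "UNWIND $rows AS row "
                "CREATE (:Document {id: row.id, title: row.title})",
                rows=rows,
            ).consume(),
            batch,
        )

batches = [NODES[i:i + BATCH_SIZE] for i in range(0, len(NODES), BATCH_SIZE)]
with ThreadPoolExecutor(max_workers=len(batches)) as pool:
    list(pool.map(import_batch, batches))

driver.close()
```

Each worker pays the commit (and index write) cost once per 10k nodes instead of once per node, while the workers still overlap on everything that happens before commit.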
Conclusion: batching is strongly preferred. For large imports with a text index, using fewer transactions with larger batches provides the best balance between parallelism and commit overhead. This would be valuable to call out explicitly in the import best practices section of the documentation.