This repository was archived by the owner on May 30, 2022. It is now read-only.
Using a partial function and tweaking how the arguments are passed makes it possible to apply lru_cache to the method. The function is still called for each field of each row, but the set of possible argument values is much smaller, so caching is very effective here.
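A minimal sketch of the technique described above. The function and field names here are illustrative, not target-postgres's actual code; the point is that lru_cache is applied to a plain function of hashable arguments, and functools.partial binds the per-load invariants so each cell lookup hits the cache:

```python
from functools import lru_cache, partial

@lru_cache(maxsize=128)
def serialize_field(value, nullable):
    # Hypothetical per-field serialization. Because field values repeat
    # heavily across rows, most calls are cache hits.
    if value is None:
        return 'NULL' if nullable else ''
    return str(value)

def serialize_rows(rows, nullable=True):
    # Bind the invariant argument once, then apply the cached
    # function to every field of every row.
    serialize = partial(serialize_field, nullable=nullable)
    return [[serialize(value) for value in row] for row in rows]

rows = [['a', None], ['a', 'b'], ['a', None]]
print(serialize_rows(rows))  # → [['a', 'NULL'], ['a', 'b'], ['a', 'NULL']]
print(serialize_field.cache_info())
```

With only three distinct field values, the cache serves half the lookups in this toy input; on a 50k-row CSV with few distinct values per column the hit rate is far higher.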
ericboucher approved these changes on Feb 9, 2022.
The review comment below refers to this diff excerpt:

RESERVED_NULL_DEFAULT = 'NULL'
@lru_cache(maxsize=128)
Let's make maxsize a constant at the top, maybe?
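The suggested change could look like the following sketch (the constant name is hypothetical, not from the PR):

```python
from functools import lru_cache

# Hypothetical module-level constant, per the review suggestion,
# so the cache size is documented and tunable in one place.
SERIALIZE_CACHE_MAXSIZE = 128

@lru_cache(maxsize=SERIALIZE_CACHE_MAXSIZE)
def serialize_field(value):
    return 'NULL' if value is None else str(value)
```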
Initially posted as datamill-co#204.
This PR addresses the changes proposed in datamill-co#203.
As described in the issue, loading data seemed unnecessarily slow. On a test CSV file containing 50k rows, each with two columns of fixed-length text, target-postgres was spending about 20 seconds loading the data.
Some profiling helped identify the bottlenecks, as described in the issue.
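The profiling mentioned above can be reproduced with the standard-library cProfile module; this is a generic sketch (the load function here is a stand-in, not the actual target-postgres load path):

```python
import cProfile
import io
import pstats

def load_rows(rows):
    # Stand-in for the load path being profiled: per-field
    # string work dominates, mimicking the reported bottleneck.
    return [','.join(str(value) for value in row) for row in rows]

rows = [(i, 'x' * 10) for i in range(50_000)]

profiler = cProfile.Profile()
profiler.enable()
load_rows(rows)
profiler.disable()

# Print the top hotspots sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(10)
print(stream.getvalue())
```

Running the real load under this kind of instrumentation adds overhead, which is why the profiled run below is slower than the unprofiled baseline.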
This PR proposes three distinct improvements, in decreasing order of time savings (based on my test case). Note that with cProfile instrumentation enabled, the load took approximately 32 seconds instead of 20.
In the end, with these speedups applied, the same data loads in under 4 seconds instead of the original 20.