BF: Improve TRX loading when local file headers have extra bytes #103
+130
−4
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
this fix #86.
ZIP/TRX files have two places where they store info about each file inside:
The old code assumed these two always match. But they don't have to! Some tools add extra bytes to local headers that aren't in the central directory.
So when loading, we calculated the wrong position to read data from. which was causing the crash on uncompressed data that @neurolabusc encounter.
Instead of guessing where the data starts, we now read the actual local header to find out exactly where each file's data begins.
Added a test that creates a ZIP with mismatched headers - fails before fix, passes after. You can also test with the data shared in the issue #86.
Note: maybe we should add @neurolabusc dataset in testdata (if you are ok, of course) to make sure we do not encounter this issue in other languages.