Skip to content

Fixing Json character offsets via wrapping the underlying character buffer"#2643

Merged
jurgenvinju merged 26 commits intomainfrom
fix/json-offsets
Feb 18, 2026
Merged

Fixing Json character offsets via wrapping the underlying character buffer"#2643
jurgenvinju merged 26 commits intomainfrom
fix/json-offsets

Conversation

@jurgenvinju
Copy link
Member

@jurgenvinju jurgenvinju commented Feb 11, 2026

This continues where #2638 left off.

  • stable fix and validation of offsets and lengths, via wrapping the underlying character stream
  • when called from Rascal all read and parse functions offer accurate error locations based on the above offset tracking, even when trackOrigins is off
  • the internal API which starts from a JsonReader never tracks origins and does not produce accurate offsets in error messages.
    • internal API now produces ParseError(locationTop or |unknown:///|, line, column, cause, path) instead of ParseError(locWithOffsetLineColumn, cause, path)
  • proper implementation and effect of trackOrigins=false for the public Rascal API.
    • for the internal API trackOrigins has no effect. It is always off.
  • setLenient was not implemented correctly, and needed to be updated to non-deprecated API
  • add tests for line, column numbers (start is internal, end is external bound for columns)
  • test which also triggers parseJSON with originTracking enabled creates broken offset/length information if the input exceeds 1024 characters #2633 on unix systems
  • added several tests with sizes longer than 1024 to cover more corner cases with offset tracking.
  • added test by @DavyLandman which searchers for the corner case where the buffer is not entirely read yet, but a fillBuffer with a requested size over the edge is requested anyway.
  • improved error handling for unexpected NPE's

This fix does not change the previous behavior of the Json IO library other than fixing two complex issues:

This API still streams pretty quickly. Could not measure a degredation in speed on smaller files like we have in the tests. On very larger files sometimes it's 1% slower and sometimes 1% faster, so that seems to be noise.

This is for another PR

  • fix offsets of high-low surrogate pairs (optional)

@jurgenvinju jurgenvinju self-assigned this Feb 11, 2026
@codecov
Copy link

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 78.72340% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 46%. Comparing base (98c58e4) to head (199423e).
⚠️ Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
...pl/library/lang/json/internal/JsonValueReader.java 79% 9 Missing and 6 partials ⚠️
.../rascalmpl/exceptions/RuntimeExceptionFactory.java 25% 3 Missing ⚠️
src/org/rascalmpl/library/lang/json/IO.java 88% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              main   #2643   +/-   ##
=======================================
  Coverage       46%     46%           
+ Complexity    6682    6680    -2     
=======================================
  Files          795     795           
  Lines        65892   65899    +7     
  Branches      9877    9878    +1     
=======================================
+ Hits         30710   30716    +6     
+ Misses       32810   32802    -8     
- Partials      2372    2381    +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member Author

@jurgenvinju jurgenvinju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remains finicky code, because the internal pos field of GsonReader is not a proper cursor into the file. The new fix with a wrapped Reader helps a lot in getting this under control.

Nevertheless there are seemingly magic "-1"'s and "+1"'s spread throughout JsonValueReader to make up for a lack of control over de pos field in GsonReader. I'd rather have those -1's spread throughout the code, where you can see how much they depend semantically on the calls to GsonReader.{beginObject, endObject, fillBuffer} etc and their effect on the pos variable. A kneejerk reaction could be to factor this in a method; but would be a false abstraction. This is the real complexity of it.

@jurgenvinju jurgenvinju marked this pull request as ready for review February 17, 2026 10:00
Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: this looks good 👍🏼

@sonarqubecloud
Copy link

@jurgenvinju jurgenvinju merged commit 57d8bcd into main Feb 18, 2026
7 checks passed
@jurgenvinju jurgenvinju deleted the fix/json-offsets branch February 18, 2026 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments