Skip to content

Conversation

@tompng
Copy link
Member

@tompng tompng commented Apr 11, 2025

#1024
Migrate nesting analysis from Ripper to Prism

Original nesting calculation with Ripper

Tokenize with ripper
Parse the tokens mainly focusing on open/close tokens
Collect open tokens per line

New nesting calculation with Prism

Traverse syntax tree
Check open_loc and closing_loc for a node that makes nesting, create a token-like object for it
Collect open token-like objects per line

Gemfile

Exclude prism-1.8.0 because test fails. (ruby/prism#3851)

@tompng tompng force-pushed the ripper_to_prism_nesting branch 3 times, most recently from be7dfe6 to c3a31b8 Compare January 16, 2026 16:52
@tompng tompng marked this pull request as ready for review January 16, 2026 17:28
Copy link
Member

@st0012 st0012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we nudge Prism maintainers to cut v1.8.1 for the fix, and add prism as a dependency and require 1.3.0+?

# scan_opens without block will return a list of open tokens at last token position
scan_opens(tokens)
# Return a list of open nestings at last token position
def open_nestings(parse_lex_result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this or parse_by_line take (code, local_variables: []) and call Prism.parse_lex here?
I feel Prism should be hidden from the callers of this class. Then we only need to require it here instead of all the places that'd use a nesting parser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a followup pull request #1160, parse_lex_result will be also used in RubyLex#should_continue?.
To avoid parsing the same code two times in dynamic_prompt calculation, argument needs to be parse_lex_result.

@context.io.dynamic_prompt do |lines|
  parse_lex_result = Prism.parse_lex(code, scopes: [@context.local_variables])
  line_results = IRB::NestingParser.parse_by_line(parse_lex_result)
  ...
  line_results.map.with_index do |...|
    # This part requires lex result
    continue = @scanner.should_continue?(tokens_until_line, line, line_num_offset + 1)
    ...
  end
end

Maybe we can find a better way to separate after migrating to Prism, I think.

@st0012
Copy link
Member

st0012 commented Jan 16, 2026

Oh you did this in #1091 already. I guess we just need to wait for the release then

@tompng tompng force-pushed the ripper_to_prism_nesting branch from c3a31b8 to c4882f3 Compare January 17, 2026 06:42
first_token_on_line = true
elsif t.event != :on_sp
first_token_on_line = false
@heredocs[line_index]&.sort_by { |_node, (_line, col)| col }&.reverse_each do |elem|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

Suggested change
@heredocs[line_index]&.sort_by { |_node, (_line, col)| col }&.reverse_each do |elem|
@heredocs[line_index]&.sort_by { |elem| elem.pos[1] }&.reverse_each do |elem|

If so, let's add a test case that catches the failure? Something like:

    def test_multiple_heredocs_same_line_ordering
      code = <<~'RUBY'
        x = <<B + <<A
        B
        A
      RUBY
      line_results = parse_by_line(code)

      _prev, opens_line1, _min = line_results[0]
      assert_equal(['<<B', '<<A'], opens_line1.map(&:tok))
    end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! It was sorting with nil, no error was raised but not correctly sorted.
I'll add a test case that needs sort <<~A if <<B (<<B appears before <<A in the syntax tree)

preserve_indent
end
elsif prev_open_token&.event == :on_embdoc_beg || next_open_token&.event == :on_embdoc_beg
if prev_open_token&.event == next_open_token&.event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the code correctly, these "tokens" are actually nesting elements? Can we update these variable names to reflect that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's nesting elements. Updated 👍

It was a token, but changed to NestingParser::NestingElement
end

def test_heredoc_sorting
# Heredocs appears in the ordef B,A,D,C in syntax tree, but should be processed in A,B,C,D order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Heredocs appears in the ordef B,A,D,C in syntax tree, but should be processed in A,B,C,D order.
# Heredocs appears in the order of B,A,D,C in syntax tree, but should be processed in A,B,C,D order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈 (thanks)

@tompng tompng force-pushed the ripper_to_prism_nesting branch from 0acd285 to eec5fa8 Compare January 18, 2026 16:29
@tompng tompng merged commit a0e7fba into ruby:master Jan 18, 2026
38 of 40 checks passed
@tompng tompng deleted the ripper_to_prism_nesting branch January 18, 2026 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants