Skip to content

Add glob package for pattern matching#2947

Open
bobzhang wants to merge 1 commit intomainfrom
hongbo/wip_glob
Open

Add glob package for pattern matching#2947
bobzhang wants to merge 1 commit intomainfrom
hongbo/wip_glob

Conversation

@bobzhang
Copy link
Contributor

@bobzhang bobzhang commented Nov 13, 2025

This PR adds a comprehensive glob pattern matching package to the MoonBit standard library.

Features

  • Wildcards: * (zero or more chars), ? (exactly one char)
  • Character classes: [abc], [a-z], [!abc], [^abc]
  • Brace expansion: {a,b,c}
  • Recursive patterns: ** (matches across directories)
  • Escape sequences: \ to escape special characters

Implementation

  • Clean, efficient recursive backtracking implementation
  • Proper handling of path separators (/)
  • Full Unicode support via Int::unsafe_to_char

Testing

  • 150+ test cases covering:

    • All wildcard and pattern features
    • Character classes and negation
    • Brace expansion with wildcards
    • Recursive ** patterns
    • Edge cases and error conditions
    • Real-world use cases (gitignore patterns, file matching)
  • All tests passing: 5572/5572

  • Zero warnings after moon check

Documentation

Includes comprehensive README.mbt.md with:

  • Feature overview
  • Usage examples for each pattern type
  • Real-world examples
  • API reference
  • Performance notes

Examples

@glob.is_match("*.txt", "hello.txt")                    // true
@glob.is_match("file[0-9].log", "file5.log")            // true
@glob.is_match("{jpg,png,gif}", "photo.jpg")            // true
@glob.is_match("**/*.rs", "src/lib/mod.rs")             // true
@glob.is_match("src/**/test/*.txt", "src/a/b/test/x.txt") // true

Open with Devin

Implement comprehensive glob pattern matching with support for wildcards,
character classes, brace expansion, recursive patterns, and escape sequences.
Includes 150+ test cases covering all features and real-world use cases.
@bobzhang bobzhang marked this pull request as draft November 13, 2025 14:10
@coveralls
Copy link
Collaborator

Pull Request Test Coverage Report for Build 1867

Details

  • 68 of 76 (89.47%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.01%) to 91.559%

Changes Missing Coverage Covered Lines Changed/Added Lines %
glob/glob.mbt 68 76 89.47%
Totals Coverage Status
Change from base Build 1862: -0.01%
Covered Lines: 10359
Relevant Lines: 11314

💛 - Coveralls

@bobzhang bobzhang marked this pull request as ready for review February 14, 2026 04:13
@peter-jerry-ye
Copy link
Collaborator

@codex

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 472d21b646

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +101 to +103
for i = t_idx; i < t_len; i = i + 1 {
if match_impl(pattern, text, next_p, i + 1, true) {
return true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict **/ matching to path-segment boundaries

The ** branch currently tries every character offset in text (i + 1) without checking for / boundaries, so patterns that are documented as matching path segments can match inside a segment. For example, @glob.is_match("**/test", "mytest") will return true by jumping directly to the t in mytest, even though there is no directory separator. This breaks core glob accuracy for directory-style patterns like **/name and src/**/test/*.txt.

Useful? React with 👍 / 👎.

Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

let mut found_close = false
while i < p_len && not(found_close) {
let curr = Int::unsafe_to_char(pattern[i])
if curr == ']' && i != start_idx + 1 && i != start_idx + 2 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Character class closing ] detection fails for single-element non-negated classes like [a]

The condition i != start_idx + 2 at line 219 prevents ] from being recognized as a closing bracket when it appears at position start_idx + 2, even in non-negated character classes. This check is intended to allow ] as a literal first character in negated classes like [!]...] or [^]...], but it incorrectly also blocks the closing ] in non-negated single-element classes.

Root Cause and Impact

For a pattern like [a] (start_idx=0):

  • i = 1: processes a as a literal, i becomes 2
  • i = 2: curr = ']', but the check i != start_idx + 2 evaluates to 2 != 2 = false, so ] is NOT recognized as the closing bracket
  • ] is instead treated as a literal character to match against
  • i becomes 3, exceeding p_len (3), so the loop exits without finding a close
  • The "Find closing ]" loop at line 244 also starts at i = 3 >= p_len, so it returns None
  • match_char_class returns None, and the caller at line 153 returns false

Impact: Any single-element character class like [a], [5], [.], [/] will always fail to match, returning false regardless of input. The existing tests don't catch this because they all use multi-element classes like [abc] or ranges like [a-z] where ] appears at index ≥ 3.

The fix should make the start_idx + 2 exclusion conditional on negation:

if curr == ']' && i != start_idx + 1 && (not(negate) || i != start_idx + 2) {
Suggested change
if curr == ']' && i != start_idx + 1 && i != start_idx + 2 {
if curr == ']' && i != start_idx + 1 && (not(negate) || i != start_idx + 2) {
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants