Parsing structured text (e.g. program source code) sounds more complicated than it actually is.
This repo exemplifies how to quickly write a parser combinator that can be used to do just that. See also this video for a quick overview of what we're trying to achieve.
Design goals:
- Simple code, minimal comments (let the code do the talking)
- Clear separation of concerns between classes
- Easy to adapt to your own needs (core parser functionality is less than 100 lines and can be reused in your own projects)
- Meant to be evocative, not exhaustive (things like error handling are greatly simplified)
- 100% working, but intentionally incomplete (so you can have fun extending the code and tinkering with it).
The code in this repo is able to parse somewhat complex BASIC programs such as:
```
10 LET SomeVariable = "John" + " " + "Smith"
20 PRINT "Hello, ", SomeVariable ,"! How are you?" : PRINT "This is another print statement." : LET SomeNumber123 = 37 + 1 * (4 * 5)
30 FOR ALoopVariable = 37 + SomeNumber123 To 1000
40 LET Temp = ALoopVariable MOD 5
50 IF ALoopVariable > 400 AND Temp = 2 THEN PRINT "This is a conditional print: ", ALoopVariable
60 NEXT
```
Feel free to play around and add support for more BASIC features, such as:
- User input via the `INPUT` command (should be similar to how the `PRINT` command is already parsed)
- User functions via `DEF FN` (how would you declare and parse user arguments in this case?)
- Actually running the parsed BASIC programs (the parser returns a mostly-usable AST, but expression evaluation, not to be confused with expression parsing, which is already handled, needs to be carefully thought out)
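As a starting point for the last idea, here is a minimal sketch of evaluating a numeric expression tree. The node types (`NumberNode`, `VariableNode`, `BinaryNode`) and the `Evaluator` class are hypothetical and are not the entities this repo actually produces; the sketch also assumes C# 9 or later (records) and handles numbers only, not strings.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical AST node shapes, for illustration only.
// The entities returned by this repo's parser may look different.
public abstract record ExprNode;
public sealed record NumberNode(double Value) : ExprNode;
public sealed record VariableNode(string Name) : ExprNode;
public sealed record BinaryNode(string Op, ExprNode Left, ExprNode Right) : ExprNode;

public static class Evaluator
{
    // Recursively evaluates a numeric expression tree against a variable table.
    public static double Evaluate(ExprNode expr, IReadOnlyDictionary<string, double> variables) => expr switch
    {
        NumberNode n => n.Value,
        VariableNode v => variables.TryGetValue(v.Name, out var value)
            ? value
            : throw new InvalidOperationException($"Undefined variable: {v.Name}"),
        BinaryNode b => Apply(b.Op, Evaluate(b.Left, variables), Evaluate(b.Right, variables)),
        _ => throw new NotSupportedException($"Unknown node: {expr.GetType().Name}")
    };

    // Applies a single binary operator to two already-evaluated operands.
    private static double Apply(string op, double left, double right) => op switch
    {
        "+" => left + right,
        "-" => left - right,
        "*" => left * right,
        "/" => left / right,
        "MOD" => left % right,
        _ => throw new NotSupportedException($"Unknown operator: {op}")
    };
}
```

Because operator precedence is already encoded in the shape of the tree, the evaluator needs no precedence rules of its own. For example, a tree built for `37 + 1 * (4 * 5)` with `*` nested under `+` evaluates to 57.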
This repo takes the form of a single Visual Studio solution. The solution itself is split into two parts:
- A universal parser combinator. Simple, concise, and portable. Can be reused in your own projects if you want to. Not meant to be feature complete.
- A BASIC program parser. Uses the universal parser functionality to parse actual BASIC programs.
The relevant files and their descriptions are as follows:
| File | Description |
|---|---|
| Parser.cs | Core parser functionality (`While()`, `Until()`) and combinators (`Union()`, `Optional()`) |
| ParserExtras.cs | Non-core but widely used parser helpers |
| TextWithPointer.cs | Represents source code with a "current pointer" |
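To give a feel for how these pieces could fit together, here is a rough sketch of a text cursor plus four small combinators. It borrows the names `While`, `Until`, `Union` and `Optional` from the table above, but the signatures, the `TextCursor` class, the `Parser` delegate and the `CombinatorDemo` helper are all assumptions made for illustration; the actual code in `Parser.cs` and `TextWithPointer.cs` may be organised quite differently.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for "source code with a current pointer".
public sealed class TextCursor
{
    private readonly string _text;
    public int Position { get; private set; }

    public TextCursor(string text) { _text = text; }

    public bool AtEnd => Position >= _text.Length;
    public char Current => _text[Position];
    public void Advance() => Position++;
    public void Reset(int position) => Position = position;
    public string Slice(int start) => _text.Substring(start, Position - start);
}

// A parser returns the matched text, or null when it does not match.
public delegate string? Parser(TextCursor cursor);

public static class Combinators
{
    // Consumes characters while the predicate holds; fails if nothing was consumed.
    public static Parser While(Func<char, bool> predicate) => cursor =>
    {
        int start = cursor.Position;
        while (!cursor.AtEnd && predicate(cursor.Current)) cursor.Advance();
        return cursor.Position > start ? cursor.Slice(start) : null;
    };

    // Consumes characters until the predicate holds or the input ends.
    public static Parser Until(Func<char, bool> predicate) => While(c => !predicate(c));

    // Tries each parser in order and returns the first successful result.
    public static Parser Union(params Parser[] parsers) => cursor =>
    {
        foreach (var parser in parsers)
        {
            int start = cursor.Position;
            var result = parser(cursor);
            if (result != null) return result;
            cursor.Reset(start); // rewind before trying the next alternative
        }
        return null;
    };

    // Never fails: yields an empty string when the inner parser does not match.
    public static Parser Optional(Parser parser) => cursor =>
    {
        int start = cursor.Position;
        var result = parser(cursor);
        if (result == null) cursor.Reset(start);
        return result ?? "";
    };
}

public static class CombinatorDemo
{
    // Reads a keyword, optional whitespace, then either a word or a number.
    public static string?[] Run(string source)
    {
        var cursor = new TextCursor(source);
        var word = Combinators.While(char.IsLetter);
        var number = Combinators.While(char.IsDigit);
        var spaces = Combinators.Optional(Combinators.While(char.IsWhiteSpace));
        return new[] { word(cursor), spaces(cursor), Combinators.Union(word, number)(cursor) };
    }
}
```

Calling `CombinatorDemo.Run("PRINT 42")` yields the three pieces `"PRINT"`, `" "` and `"42"`. The design choice worth noting is that every combinator rewinds the cursor on failure, which is what lets `Union` try alternatives safely.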
The following files use the universal parser functionality described above to implement a simple BASIC program parser:
| File | Description |
|---|---|
| BasicParser.cs | Used as a high-level wrapper/parser over a BASIC program |
| BasicProgramEntities.cs | Entities used in BasicParser.cs |
| CommandArgumentsReader.cs | Parser for BASIC command arguments (e.g. `PRINT`, `FOR`, `LET`) |
| CoreEntitiesReader.cs | Parsers for BASIC core entities such as strings and integers |
| ExpressionReader.cs | Parser for BASIC math-like expressions |
| ListReader.cs | Parser for BASIC lists, in particular lists of command arguments |
| Misc.cs | Various helper functions |
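To illustrate the kind of problem a list parser has to solve, here is a deliberately simplified sketch that splits a raw argument string on commas while leaving commas inside quoted strings alone. It does not use the combinators and is not how `ListReader.cs` works; the `ArgumentListSketch` class is hypothetical, and the sketch ignores parentheses and returns a single empty item for empty input.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ArgumentListSketch
{
    // Splits on top-level separators, ignoring separators inside double-quoted strings.
    public static List<string> Split(string arguments, char separator = ',')
    {
        var items = new List<string>();
        var current = new StringBuilder();
        bool insideString = false;

        foreach (char c in arguments)
        {
            // A double quote toggles "inside string" mode and is kept in the item.
            if (c == '"') insideString = !insideString;

            if (c == separator && !insideString)
            {
                items.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Whatever remains after the last separator is the final item.
        items.Add(current.ToString().Trim());
        return items;
    }
}
```

Applied to the `PRINT` arguments from line 20 of the sample program, `"Hello, ", SomeVariable ,"! How are you?"`, this returns three items: the two quoted strings (quotes included) and `SomeVariable`. Passing `':'` as the separator would split a line into its individual statements in the same way.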