Skip to content

Unify lexing/preprocessing/parsing into hand-rolled scanner #261

Open
@Cirras

Description

@Cirras

Prerequisites

  • This improvement has not already been suggested.
  • This improvement would be generally useful, not specific to my code or setup.

Engine area

Delphi language support

Improvement description

Background

  • The current parsing is implemented in 3 steps:
    • Lexing (DelphiLexer generated by ANTLR)
    • Preprocessing (DelphiPreprocessor)
    • Parsing (DelphiParser generated by ANTLR)
  • In addition, we have a hand-rolled CompilerDirectiveParser with corresponding ExpressionLexer and ExpressionParser for handling constant expressions within {$if} and {$elseif} directives.

Problem

While this approach keeps the concerns cleanly separated and sounds nice on paper, it doesn't reflect reality at all.

Delphi's parsing is a fairly streamlined single-pass implementation that parses high-level AST structures and constructs the symbol table while reading tokens on-demand from the input data and processing compiler directives as they appear. In other words, it's all mashed together into a single process.

There are many subtle deficiencies in the current parsing that we can trace back to this fundamental difference in how the process is modeled.

Solution

Merge the 3 parsing steps into a hand-rolled DelphiScanner:

  • Lex tokens on-demand instead of doing it upfront.
  • Evaluate compiler directives as they're lexed from the input data instead of afterward.
  • Additionally, perform symbol table construction simultaneously instead of afterward.

Rationale

Enable full support for:

  • caret notation (Parsing error on caret-escaped special characters #111)
  • compiler directives nested in directive expressions (Expression lexer failing to compile file in recursive directives #260)
  • symbol information in directive expressions
    • references to constants
    • Declared function
  • "vexing" parses that require symbol information to resolve
    // [[OPTION 1]]
    // In this scenario, `Write` is called with 2 arguments
    // parsed as binary expressions: `Foo < T` and `T > (B)` 
    {
      type
        A = class
          procedure Bar;
        end;
       
      const
        Foo = 0;
        T = 0;
        B = 0;
    }
    
    // [[OPTION 2]]
    // In this scenario, `Write` is called with 1 argument
    // parsed as an invocation of generic function `Foo`.
    {  
      type
        A = class
          function Foo<U, V>(X: Integer): Integer;
          procedure Bar;
        end;
       
        T = class end;
       
      const
        B = 0;
    }
    
    procedure A.Bar;
    begin
      Write(Foo<T, T>(B));
    end;

Make it easier to implement:

Metadata

Metadata

Assignees

No one assigned

    Labels

    engineImprovements to the core engineenhancementImprovements to an existing feature

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions