Skip to content

Proposal to a few fixes/improvements in the ini parser #7420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

dryabov
Copy link
Contributor

@dryabov dryabov commented Aug 28, 2021

In this PR, I suggest to "unify" the processing of escaped characters in the ini format lexer. The ini format is used not only for settings (like php.ini), but also for language files in the Joomla! CMS, that's why a clear set of rules is necessary. (in particular, this PR resulted from the discussion in the related Joomla's JED Checker PR)

  1. Currently, double-quoted strings are processed twice: first time in the <ST_DOUBLE_QUOTES>[^] lexer rule (to get string length), and then in the zend_ini_escape_string function. The problem is that strings are processed differently: lexer rule uses a look-behind approach to check double quote is escaped, and zend_ini_escape_string skips escaped characters in a usual way (skip-next-char approach, like in PHP's strings parser). As a result, in some cases there is no way to escape the final backslash in a string, e.g. in the case of string followed by anything except linebreak:

    KEY1 = "prefix\\" ; Warning: syntax error, unexpected end of file, expecting TC_DOLLAR_CURLY or TC_QUOTED_STRING or '"'
    KEY2 = "prefix\\" ACONST

    There is a special check in PHP for the case of a double-quoted string directly followed by linebreak (as far as I can see, it was implemented to support Windows paths like "C:\path\" as a value):

    KEY = "prefix\"

    For consistency, I'd like to switch to the PHP-way and require to escape each of the special chars (", $, \) in a usual (skip-next-char) way, without a look-behind approach. The only exception is the above-mentioned special check for Windows paths that should be kept for backward compatibility. Note it may lead to a backward incompatibility in data that use a sequence like \\" (instead of \\\") to get backslash followed by double quote (see the summary table below), but unlikely it's widely used in the wild.

  2. In the <ST_DOUBLE_QUOTES>[^] lexer rule, the token is processed starting from YYCURSOR position instead of yytext, and as a result, the first character is not taken into account. In turn, it leads to no way to escape the leading dollar followed by the open curly brace:

    KEY = "\${" ; Warning: syntax error, unexpected end of file, expecting TC_VARNAME

    With the current PR it is fixed (and meanwhile, I've fixed possible out of buffer read in the former *YYCURSOR == '{' check).

    The following table summarizes how this patch affects the processing of escaped characters:

    INI W/O PATCH WITH PATCH
    A = "aaa\" aaa\ aaa\
    A = "aaa\"; error error
    A = "aaa\\" aaa\ aaa\
    A = "aaa\\"; error aaa\
    A = "aaa\\"bbb" aaa\"bbb error
    A = "aaa\\\"bbb" aaa\"bbb aaa\"bbb
    A = "\${" error ${

    All existing parse_ini-related tests are passed.

  3. Finally (unrelated to this request, but I welcome any comments), I'd suggest to replace current short PHP docs about these escaping rules

    ; \ is used to escape a value.
    newline_is = "\\n" ; results in the string "\n", not a newline character.
    with quotes = "She said \"Exactly my point\"." ; Results in a string with quote marks in it.

    with the following detailed explanation:

    Example # 5 escaping characters

    Some characters have special meaning in double-quoted strings and must be escaped by the backslash prefix. First of all, these are the double quote " as the boundary marker, and the backslash \ (if followed by one of the special characters):

    quoted = "She said \"Exactly my point\"." ; Results in a string with quote marks in it.
    hint = "Use \\\" to escape double quote" ; Results in: Use \" to escape double quote

    There is an exception made for Windows-like paths: it's possible to don't escape trailing backslash if the quoted string is directly followed by a linebreak:

    save_path = "C:\Temp\"

    If one does need to escape double quote followed by linebreak in a multiline value, it's possible to use value concatenation in the following way (there is one double-quoted string directly followed by another one):

    long_text = "Lorem \"ipsum\"""
     dolor" ; Results in: Lorem "ipsum"\n dolor

    Another character with special meaning is $ (the dollar). It must be escaped if followed by the open curly brace:

    code = "\${test}"

    Escaping characters is not supported in the INI_SCANNER_RAW mode (in this mode all characters in are processed "as is").

    Note that the ini parser doesn't support standard escape sequences (\n, \t, etc.). If necessary, post-process the result of parse_ini_file with stripcslashes() function.

@Girgias
Copy link
Member

Girgias commented Aug 28, 2021

The doc change can be done regardless of this PR, please open an issue/PR on https://github.com/php/doc-en

@dryabov
Copy link
Contributor Author

dryabov commented Aug 28, 2021

The doc change can be done regardless of this PR, please open an issue/PR on https://github.com/php/doc-en

I know, but

  1. I'd like to get the opinions of developers who know how the ini parser and lexer work, maybe I missed something.
  2. Perhaps the developers would like to keep some parts of the documentation unpublished, especially if later they plan to make some changes that break backward compatibility.
  3. It is unknown if this patch will be accepted, but one example depends on this.

@nikic
Copy link
Member

nikic commented Aug 30, 2021

These changes look reasonable to me. Can you please add some tests for the differing behavior?

@dryabov
Copy link
Contributor Author

dryabov commented Aug 30, 2021

@nikic Is it ok to alter existing ext/standard/tests/general_functions/parse_ini_basic.phpt/.data test?

@nikic
Copy link
Member

nikic commented Aug 30, 2021

@dryabov yes

@dryabov
Copy link
Contributor Author

dryabov commented Aug 30, 2021

@nikic Done

@nikic nikic closed this in d3a6054 Aug 30, 2021
dryabov added a commit to dryabov/doc-en that referenced this pull request Sep 3, 2021
cmb69 pushed a commit to php/doc-en that referenced this pull request Sep 4, 2021
mumumu added a commit to php/doc-ja that referenced this pull request Sep 4, 2021
marcovtwout pushed a commit to marcovtwout/doc-en that referenced this pull request Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants