Proposal to a few fixes/improvements in the ini parser #7420

dryabov · 2021-08-28T13:47:58Z

In this PR, I suggest to "unify" the processing of escaped characters in the ini format lexer. The ini format is used not only for settings (like php.ini), but also for language files in the Joomla! CMS, that's why a clear set of rules is necessary. (in particular, this PR resulted from the discussion in the related Joomla's JED Checker PR)

Currently, double-quoted strings are processed twice: first time in the <ST_DOUBLE_QUOTES>[^] lexer rule (to get string length), and then in the zend_ini_escape_string function. The problem is that strings are processed differently: lexer rule uses a look-behind approach to check double quote is escaped, and zend_ini_escape_string skips escaped characters in a usual way (skip-next-char approach, like in PHP's strings parser). As a result, in some cases there is no way to escape the final backslash in a string, e.g. in the case of string followed by anything except linebreak:
```
KEY1 = "prefix\\" ; Warning: syntax error, unexpected end of file, expecting TC_DOLLAR_CURLY or TC_QUOTED_STRING or '"'
KEY2 = "prefix\\" ACONST
```
There is a special check in PHP for the case of a double-quoted string directly followed by linebreak (as far as I can see, it was implemented to support Windows paths like "C:\path\" as a value):
```
KEY = "prefix\"
```
For consistency, I'd like to switch to the PHP-way and require to escape each of the special chars (", $, \) in a usual (skip-next-char) way, without a look-behind approach. The only exception is the above-mentioned special check for Windows paths that should be kept for backward compatibility. Note it may lead to a backward incompatibility in data that use a sequence like \\" (instead of \\\") to get backslash followed by double quote (see the summary table below), but unlikely it's widely used in the wild.

In the <ST_DOUBLE_QUOTES>[^] lexer rule, the token is processed starting from YYCURSOR position instead of yytext, and as a result, the first character is not taken into account. In turn, it leads to no way to escape the leading dollar followed by the open curly brace:

KEY = "\${" ; Warning: syntax error, unexpected end of file, expecting TC_VARNAME

With the current PR it is fixed (and meanwhile, I've fixed possible out of buffer read in the former *YYCURSOR == '{' check).

The following table summarizes how this patch affects the processing of escaped characters:

INI	W/O PATCH	WITH PATCH
`A = "aaa\"`	`aaa\`	`aaa\`
`A = "aaa\";`	❌error	❌error
`A = "aaa\\"`	`aaa\`	`aaa\`
`A = "aaa\\";`	❌error	`aaa\`
`A = "aaa\\"bbb"`	`aaa\"bbb`	❌error
`A = "aaa\\\"bbb"`	`aaa\"bbb`	`aaa\"bbb`
`A = "\${"`	❌error	`${`

All existing parse_ini-related tests are passed.

Finally (unrelated to this request, but I welcome any comments), I'd suggest to replace current short PHP docs about these escaping rules

; \ is used to escape a value.
newline_is = "\\n" ; results in the string "\n", not a newline character.
with quotes = "She said \"Exactly my point\"." ; Results in a string with quote marks in it.

with the following detailed explanation:

Example # 5 escaping characters

Some characters have special meaning in double-quoted strings and must be escaped by the backslash prefix. First of all, these are the double quote " as the boundary marker, and the backslash \ (if followed by one of the special characters):

quoted = "She said \"Exactly my point\"." ; Results in a string with quote marks in it.
hint = "Use \\\" to escape double quote" ; Results in: Use \" to escape double quote

There is an exception made for Windows-like paths: it's possible to don't escape trailing backslash if the quoted string is directly followed by a linebreak:

save_path = "C:\Temp\"

If one does need to escape double quote followed by linebreak in a multiline value, it's possible to use value concatenation in the following way (there is one double-quoted string directly followed by another one):

long_text = "Lorem \"ipsum\"""
dolor" ; Results in: Lorem "ipsum"\n dolor

Another character with special meaning is $ (the dollar). It must be escaped if followed by the open curly brace:

code = "\${test}"

Escaping characters is not supported in the INI_SCANNER_RAW mode (in this mode all characters in are processed "as is").

Note that the ini parser doesn't support standard escape sequences (\n, \t, etc.). If necessary, post-process the result of parse_ini_file with stripcslashes() function.

Girgias · 2021-08-28T14:45:36Z

The doc change can be done regardless of this PR, please open an issue/PR on https://github.com/php/doc-en

dryabov · 2021-08-28T15:16:29Z

The doc change can be done regardless of this PR, please open an issue/PR on https://github.com/php/doc-en

I know, but

I'd like to get the opinions of developers who know how the ini parser and lexer work, maybe I missed something.
Perhaps the developers would like to keep some parts of the documentation unpublished, especially if later they plan to make some changes that break backward compatibility.
It is unknown if this patch will be accepted, but one example depends on this.

nikic · 2021-08-30T10:36:09Z

These changes look reasonable to me. Can you please add some tests for the differing behavior?

dryabov · 2021-08-30T13:19:54Z

@nikic Is it ok to alter existing ext/standard/tests/general_functions/parse_ini_basic.phpt/.data test?

nikic · 2021-08-30T13:22:55Z

@dryabov yes

dryabov · 2021-08-30T14:10:32Z

@nikic Done

See merged PR php/php-src#7420 (php/php-src@d3a6054)

See also merged PR php/php-src#7420 (php/php-src@d3a6054) Closes GH-898.

See also merged PR php/php-src#7420 (php/php-src@d3a6054) php/doc-en@33225f7

See also merged PR php/php-src#7420 (php/php-src@d3a6054) Closes phpGH-898.

Proposal to a few fixes/improvements in the ini parser

64f3fb7

add test of characters escaping in parse_ini_file

6f64189

nikic approved these changes Aug 30, 2021

View reviewed changes

nikic closed this in d3a6054 Aug 30, 2021

dryabov added a commit to dryabov/doc-en that referenced this pull request Sep 3, 2021

Add a separate example section for escaping of characters

9d684db

See merged PR php/php-src#7420 (php/php-src@d3a6054)

dryabov mentioned this pull request Sep 3, 2021

Add a separate example section for escaping of characters php/doc-en#898

Merged

cmb69 pushed a commit to php/doc-en that referenced this pull request Sep 4, 2021

Add a separate example section for escaping of characters

33225f7

See also merged PR php/php-src#7420 (php/php-src@d3a6054) Closes GH-898.

mumumu added a commit to php/doc-ja that referenced this pull request Sep 4, 2021

Add a separate example section for escaping of characters

7e2fd55

See also merged PR php/php-src#7420 (php/php-src@d3a6054) php/doc-en@33225f7

marcovtwout pushed a commit to marcovtwout/doc-en that referenced this pull request Dec 24, 2024

Add a separate example section for escaping of characters

237ab8d

See also merged PR php/php-src#7420 (php/php-src@d3a6054) Closes phpGH-898.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Proposal to a few fixes/improvements in the ini parser #7420

Proposal to a few fixes/improvements in the ini parser #7420

Uh oh!

dryabov commented Aug 28, 2021

Uh oh!

Girgias commented Aug 28, 2021

Uh oh!

dryabov commented Aug 28, 2021

Uh oh!

nikic commented Aug 30, 2021

Uh oh!

dryabov commented Aug 30, 2021

Uh oh!

nikic commented Aug 30, 2021

Uh oh!

dryabov commented Aug 30, 2021

Uh oh!

Uh oh!

Proposal to a few fixes/improvements in the ini parser #7420

Proposal to a few fixes/improvements in the ini parser #7420

Uh oh!

Conversation

dryabov commented Aug 28, 2021

Uh oh!

Girgias commented Aug 28, 2021

Uh oh!

dryabov commented Aug 28, 2021

Uh oh!

nikic commented Aug 30, 2021

Uh oh!

dryabov commented Aug 30, 2021

Uh oh!

nikic commented Aug 30, 2021

Uh oh!

dryabov commented Aug 30, 2021

Uh oh!

Uh oh!