Proposal for a few fixes/improvements in the ini parser #7420
In this PR, I propose to unify the processing of escaped characters in the INI format lexer. The INI format is used not only for settings (such as php.ini) but also for language files in the Joomla! CMS, so a clear set of rules is necessary. (In particular, this PR grew out of the discussion in the related Joomla JED Checker PR.)
Currently, double-quoted strings are processed twice: first in the `<ST_DOUBLE_QUOTES>[^]` lexer rule (to get the string length), and then in the `zend_ini_escape_string` function. The problem is that the two steps process strings differently: the lexer rule uses a look-behind approach to check whether a double quote is escaped, while `zend_ini_escape_string` skips escaped characters in the usual way (the skip-next-char approach, as in PHP's string parser). As a result, in some cases there is no way to escape the final backslash in a string, for example when the string is followed by anything other than a line break.

There is a special check in PHP for a double-quoted string directly followed by a line break (as far as I can see, it was implemented to support Windows paths like `"C:\path\"` as a value).

For consistency, I'd like to switch to the PHP way and require each of the special characters (`"`, `$`, `\`) to be escaped in the usual (skip-next-char) way, without look-behind. The only exception is the above-mentioned special check for Windows paths, which should be kept for backward compatibility. Note that this may break data that uses a sequence like `\\"` (instead of `\\\"`) to get a backslash followed by a double quote (see the summary table below), but such usage is unlikely to be widespread in the wild.

In the `<ST_DOUBLE_QUOTES>[^]` lexer rule, the token is processed starting from the `YYCURSOR` position instead of `yytext`, so the first character is not taken into account. This in turn leaves no way to escape a leading dollar sign followed by an open curly brace. This PR fixes that (and, while at it, I've fixed a possible out-of-buffer read in the former `*YYCURSOR == '{'` check).

The following table summarizes how this patch affects the processing of escaped characters:
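| INI line | Before | After |
|---|---|---|
| `A = "aaa\"` | `aaa\` | `aaa\` |
| `A = "aaa\";` | | |
| `A = "aaa\\"` | `aaa\` | `aaa\` |
| `A = "aaa\\";` | | `aaa\` |
| `A = "aaa\\"bbb"` | `aaa\"bbb` | |
| `A = "aaa\\\"bbb"` | `aaa\"bbb` | `aaa\"bbb` |
| `A = "\${"` | | `${` |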
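
The difference between the look-behind and skip-next-char approaches discussed in this PR can be modelled with a short Python sketch. This is only an illustration, not the actual lexer code: the function names `end_lookbehind` and `end_skip_next` are hypothetical, the input is the text that follows the opening double quote, and the special handling of a quote directly followed by a line break is deliberately left out.

```python
def end_lookbehind(body: str) -> int:
    """Simplified look-behind model: the string ends at the first
    double quote that is not directly preceded by a backslash.
    Returns the index of that quote, or -1 if none is found."""
    for i, ch in enumerate(body):
        if ch == '"' and (i == 0 or body[i - 1] != "\\"):
            return i
    return -1


def end_skip_next(body: str) -> int:
    """Simplified skip-next-char model: a backslash always consumes
    the character after it, whatever that character is.
    Returns the index of the closing quote, or -1 if none is found."""
    i = 0
    while i < len(body):
        if body[i] == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if body[i] == '"':
            return i
        i += 1
    return -1


if __name__ == "__main__":
    # Each sample is the text after the opening quote of an INI value.
    samples = [r'aaa\\";', r'aaa\\"bbb"', r'aaa\\\"bbb"']
    for body in samples:
        print(body, end_lookbehind(body), end_skip_next(body))
```

In this model, `aaa\\";` has no closing quote under look-behind (the quote is always treated as escaped) but ends at index 5 under skip-next-char, while `aaa\\"bbb"` ends at index 9 under look-behind and at index 5 under skip-next-char, leaving `bbb"` outside the string. The fully escaped form `aaa\\\"bbb"` ends at index 10 under both models.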
All existing parse_ini-related tests pass.
Finally (unrelated to this request, but I welcome any comments), I'd suggest replacing the current short PHP documentation on these escaping rules
with the following detailed explanation: