-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Deprecate zend_atol() / add zend_ini_parse_quantity() #7951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Zend/zend.c
Outdated
zend_long val = zend_atol(ZSTR_VAL(new_value), ZSTR_LEN(new_value)); | ||
zend_long val = zend_ini_parse_quantity(new_value, &errstr); | ||
if (errstr) { | ||
zend_error(E_WARNING, "Invalid \"%s\" setting: %s", ZSTR_VAL(entry->name), ZSTR_VAL(errstr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are existing examples of E_WARNING in ini handlers with similar error formats, for example:
php-src/ext/opcache/zend_accelerator_module.c
Line 210 in 2b1f8ab
zend_error(E_WARNING, "Invalid \"%s\" setting. Should be between 0 and %d", ZSTR_VAL(entry->name), |
Shouldn't the new tests be in |
I'm not sure why we have tests there in the first place. zend_test is an extension which offers auxilliary code which is needed to test some behavior of other code. It's that other code that is to be tested, and so the tests should go with that code. Making sure that zend_test is available when the test is executed, needs to be done anyway. |
2da0d7e
to
953b7be
Compare
I think because it is easier to find what is testing what, and possibly easier to run the tests, as running all of Zend while making a change to that might be a bit annoying? I personally find the tests in Zend/ to be very disorganised and kinda annoying to deal with when I'm iterating on a change. But this is just my opinion. I don't really mind where they are, but because the other ones are in zend_test it makes some sense to me for them to be with the others there. |
Would #8454 supersede or complement this? |
I think that it would complete this by exposing the functionality of zend_atol() / zend_ini_parse_quantity() to user land. |
e4aac50
to
67bc4cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the phrasing of the error message and a question this LGTM just by staring at the code
@@ -53,6 +53,7 @@ ZEND_BEGIN_MODULE_GLOBALS(zend_test) | |||
int register_passes; | |||
bool print_stderr_mshutdown; | |||
zend_test_fiber *active_fiber; | |||
zend_long quantity_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be set in any test.
Zend/zend.c
Outdated
EG(fiber_stack_size) = zend_atol(ZSTR_VAL(new_value), ZSTR_LEN(new_value)); | ||
EG(fiber_stack_size) = zend_ini_parse_quantity(new_value, &errstr); | ||
if (errstr) { | ||
zend_error(E_WARNING, "Invalid \"%s\" setting: %s", ZSTR_VAL(entry->name), ZSTR_VAL(errstr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a variant that also emits the error to avoid repeating this code? It also wouldn't need the temporary error string, which would simplify the API for the common use case where the warning is emitted.
Zend/zend_ini.h
Outdated
* character K, M, or G (for 2**10, 2**20, and 2**30, respectively). | ||
* | ||
* The digits are parsed as decimal unless the first character is '0', in which | ||
* case they are parsed as octal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Octal numbers aren't tested.
Zend/zend_ini.c
Outdated
@@ -30,6 +30,8 @@ static HashTable *registered_zend_ini_directives; | |||
#define NO_VALUE_PLAINTEXT "no value" | |||
#define NO_VALUE_HTML "<i>no value</i>" | |||
|
|||
#define ZEND_IS_WHITESPACE(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n') || ((c) == '\r') || ((c) == '\v') || ((c) == '\f')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this an inlined function? It's easy to mess up if using the macro like this: ZEND_IS_WHITESPACE(*x++);
Zend/zend_ini.c
Outdated
* For now overflow is silently ignored -- not clear what else can be | ||
* done here, especially as the final result of this function may be | ||
* used in an unsigned context (e.g. "memory_limit=3G", which overflows | ||
* zend_long on 32-bit, but not size_t). */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could warn on overflow and on truncation to 32-bit. Detecting overflow is pretty simple. https://stackoverflow.com/a/1815371/1320374
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. I wanted to do this just after this PR, but I will do it now
Zend/zend_ini.c
Outdated
while (ZEND_IS_WHITESPACE(*digits_end)) ++digits_end; | ||
|
||
/* No exponent suffix. */ | ||
if (!*digits_end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can NUL
bytes be inserted via ini setting? If so, the rest of the string would be ignored. Not a major issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted! This would have been a breaking change (minor, but still a breaking change)
To carry the conversation from #8454 here, should we not go ahead and expose Further, what implications do we have by introducing a split between signed and unsigned shorthand parsing? Is that a non-issue because every case where we've been parsing the "shorthand notation" is now being parsed with |
6a20b55
to
e1531b3
Compare
6b8ce18
to
5b15654
Compare
--FILE-- | ||
<?php | ||
|
||
var_dump(ini_set("zend_test.quantity_value", "1MB")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also do a test where this is in the --INI--
section to see if that triggers the warning?
Zend/zend_ini.c
Outdated
} | ||
|
||
*errstr = NULL; | ||
return (zend_long) retval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting unsigned numbers to signed numbers when out-of-range is implementation defined. Do all architectures supported by PHP agree on this behavior?
Edit: Actually what I was reading was about implicit conversion, I'm not sure what applies to casting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right that it's UB. The behavior of implicit conversions and casts is defined by the same notion of conversion.
I would expect that all architectures/compilers supported by PHP do agree on this behavior. If my understanding is correct, in twos complement architectures this behavior is obtained by not changing the representation / doing nothing, so I would expect that this is what compilers do. The behavior is documented at least in GCC. An other consideration in favor of this is that we depend on it in a few places, including in zend_atol().
I've removed this conversion anyway, so that the function is UB-free at least in the zend_ini_parse_uquantity() variant :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior is implementation defined, not undefined. We absolutely need to avoid the latter, but relying on the former can be okay. It is in this case (conversion/cast from unsigned to signed).
Sounds good, thanks @arnaud-lb. I'll keep my branch updated and try to adjust after this merges.
This also seems fair but it takes a step backwards again that I was hoping to get past. I think (maybe my understanding was wrong) before this change those internals were shielded because "shorthand notation" was treated the same everywhere? Afterwards we have one exception, Am I wrong in that this is not just refactoring the parsing code but also introducing a breaking change on The whole "shorthand notation" thing and INI directive parsing is funny: given that PHP doesn't indicate in its docs what that means and how it doesn't expose how it interprets it I'm surprised this hasn't been more of a problem (and the most popular projects out there try to guess the value but do so in a very wrong way which leads people to thing that Any thoughts on the matter as it concerns application-level code? |
This is hopefully not introducing a breaking change.
Agreed :) This PR is one step to solve this. The warnings will help users to notice when their ini settings are not interpreted as they though they were.
I agree with #8454 (comment):
Unsigned quantities complicates this a bit though. Users will need to explicitly chose between signed or unsigned. And unsigned quantities do not fit in a PHP These two aspects would make this error prone, especially since doing the wrong thing would work most of the time (like casting to int or using the wrong signed-ness). This is also inconvenient. The return value can not be used directly without An alternative to exposing ini_parse_quantity() would be to expose a function that can compare quantities, like version_compare() but for ini quantities. This seems to match the use-case described in #7944, and eliminates some risk (wrongly casting/comparing to int) and some inconveniences (needing function ini_quantity_less_than(string $a, string $b): bool;
function ini_quantity_less_than_unsigned(string $a, string $b): bool;
// Use-case from #7944
function memory_limit_set_at_least(string $at_least): void
{
if (ini_quantity_less_than_unsigned(ini_get('memory_limit'), $at_least)) {
ini_set('memory_limit', $at_least);
}
} |
Adding just
|
Apart from the unsigned bit this is the approach I've been taking in WordPress to handle these values better. Currently we've been comparing the value returned from Complications I've found is knowing what's most appropriate to pass through as the result of these comparison functions. Return the parsed value or return the string ini value? Returning whichever ini value is less than/greater than seems to work okay so that's what I've been pursuing.
|
zend_atol() and zend_atoi() don't just do number parsing. They also check for a 'K', 'M', or 'G' at the end of the string, and multiply the parsed value out accordingly. Unfortunately, they ignore any other non-numerics between the numeric component and the last character in the string. This means that numbers such as the following are both valid and non-intuitive in their final output. * "123KMG" is interpreted as "123G" -> 132070244352 * "123G " is interpreted as "123 " -> 123 * "123GB" is interpreted as "123B" -> 123 * "123 I like tacos." is also interpreted as "123." -> 123 Currently, in php-src these functions are used only for parsing ini values. In this change we deprecate zend_atol(), zend_atoi(), and introduce a new function with the same behaviour, but with the ability to report invalid inputs to the caller. The function's name also makes the behaviour less unexpected: zend_ini_parse_quantity(). Co-authored-by: Sara Golemon <pollita@php.net>
5b15654
to
1d4dfa5
Compare
Agreed. This solution is a good compromise and matches the use-case. My personal preference would be that #8454 is changed to:
Then, functions such as |
@arnaud-lb this is missing the point for which #8454 was created though. it's important for projects to be able to convey to their users what the actual limits are. granted, we can do this with the point is that projects like WordPress need to be able to show someone on an upload screen that "you can upload files smaller than 128 MiB" if 128MiB is the limit. for signed quantities that don't involve the change-of-behavior that the unsigned parsing brings I think that's fine. the interpretation which maybe we can add |
Trying to revisit #4132:
@sgolemon noticed that zend_atol/zend_atoi do more than what their name imply, and have non-intuitive behaviors:
Surprisingly, zend_atol/zend_atoi are used exclusively for parsing ini values (at least in php-src). They are used primarily in the
OnUpdateLong
handler and other variants of this handler.The behavior described above is causing ill-formatted ini values to be interpreted in non-intuitive ways and to be silently accepted.
In this PR, I want to address two things:
zend_atol()
/zend_atoi()
in the future, since their behavior is not reflected by their nameThe non-intuitive behavior is made more visible to users by triggering an E_WARNING when an ini value is ill-formatted, without otherwise changing the behavior itself (so, all values continue to be interpreted and accepted as before). This will make it easier for users to spot mistakes in ini settings.
I didn't add the warning directly in
zend_atol()
/zend_atoi()
, for two reasons:So, this PR introduces a new function:
zend_ini_parse_quantity()
, markszend_atol()
/zend_atoi()
as deprecated so that they could be removed later, and replaces every call tozend_atol()
/zend_atoi()
in php-src by a call tozend_ini_parse_quantity()
.Visible changes from a user point of view:
Setting an ini parameter with an ill-formatted value in an ini file or via ini_set() may trigger a E_WARNING (the value is accepted and its interpretation is not changed) (full list of affected ini parameters: https://gist.github.com/arnaud-lb/47572a0ed8ae458704de2659ce03ff78).
Visible changes from an extension point of view:
The
zend_atol()
/zend_atoi()
functions are marked as deprecated, so this may generate a compiler warning.I believe that these changes are backwards compatible.