Skip to content

Deprecate zend_atol() / add zend_ini_parse_quantity() #7951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 17, 2022

Conversation

arnaud-lb
Copy link
Member

Trying to revisit #4132:

@sgolemon noticed that zend_atol/zend_atoi do more than what their name imply, and have non-intuitive behaviors:

zend_ato[il] don't just do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.

Unfortunately, they ignore any other non-numerics between the
numeric component and the last character in the string.
This means that numbers such as the following are both valid
and non-intuitive in their final output.

"123KMG" is interpreted as "123G" -> 132070244352
"123G " is interpreted as "123 " -> 123
"123GB" is interpreted as "123B" -> 123
"123 I like tacos." is also interpreted as "123." -> 123 

Surprisingly, zend_atol/zend_atoi are used exclusively for parsing ini values (at least in php-src). They are used primarily in the OnUpdateLong handler and other variants of this handler.

The behavior described above is causing ill-formatted ini values to be interpreted in non-intuitive ways and to be silently accepted.

In this PR, I want to address two things:

  • Making the non-intuitive behavior more visible to users, so that mistakes can be more easily noticed and fixed
  • Preventing accidental usage of zend_atol() / zend_atoi() in the future, since their behavior is not reflected by their name

The non-intuitive behavior is made more visible to users by triggering an E_WARNING when an ini value is ill-formatted, without otherwise changing the behavior itself (so, all values continue to be interpreted and accepted as before). This will make it easier for users to spot mistakes in ini settings.

I didn't add the warning directly in zend_atol() / zend_atoi(), for two reasons:

  • Adding the warning directly in these functions could lead to unwanted side-effects for unsuspecting code outside of php-src
  • These functions must be renamed or removed anyways because they have evolved towards something every ini-specific

So, this PR introduces a new function: zend_ini_parse_quantity(), marks zend_atol() / zend_atoi() as deprecated so that they could be removed later, and replaces every call to zend_atol() / zend_atoi() in php-src by a call to zend_ini_parse_quantity().

Visible changes from a user point of view:

Setting an ini parameter with an ill-formatted value in an ini file or via ini_set() may trigger a E_WARNING (the value is accepted and its interpretation is not changed) (full list of affected ini parameters: https://gist.github.com/arnaud-lb/47572a0ed8ae458704de2659ce03ff78).

Visible changes from an extension point of view:

The zend_atol() / zend_atoi() functions are marked as deprecated, so this may generate a compiler warning.

I believe that these changes are backwards compatible.

@arnaud-lb arnaud-lb mentioned this pull request Jan 16, 2022
Zend/zend.c Outdated
zend_long val = zend_atol(ZSTR_VAL(new_value), ZSTR_LEN(new_value));
zend_long val = zend_ini_parse_quantity(new_value, &errstr);
if (errstr) {
zend_error(E_WARNING, "Invalid \"%s\" setting: %s", ZSTR_VAL(entry->name), ZSTR_VAL(errstr));
Copy link
Member Author

@arnaud-lb arnaud-lb Jan 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing examples of E_WARNING in ini handlers with similar error formats, for example:

zend_error(E_WARNING, "Invalid \"%s\" setting. Should be between 0 and %d", ZSTR_VAL(entry->name),

@Girgias
Copy link
Member

Girgias commented Jan 16, 2022

Shouldn't the new tests be in ext/zend_test/tests ?

@cmb69
Copy link
Member

cmb69 commented Jan 16, 2022

Shouldn't the new tests be in ext/zend_test/tests ?

I'm not sure why we have tests there in the first place. zend_test is an extension which offers auxilliary code which is needed to test some behavior of other code. It's that other code that is to be tested, and so the tests should go with that code. Making sure that zend_test is available when the test is executed, needs to be done anyway.

@arnaud-lb arnaud-lb force-pushed the zend-atol-cleanup-v2 branch 2 times, most recently from 2da0d7e to 953b7be Compare January 16, 2022 22:03
@Girgias
Copy link
Member

Girgias commented Jan 17, 2022

Shouldn't the new tests be in ext/zend_test/tests ?

I'm not sure why we have tests there in the first place. zend_test is an extension which offers auxilliary code which is needed to test some behavior of other code. It's that other code that is to be tested, and so the tests should go with that code. Making sure that zend_test is available when the test is executed, needs to be done anyway.

I think because it is easier to find what is testing what, and possibly easier to run the tests, as running all of Zend while making a change to that might be a bit annoying?

I personally find the tests in Zend/ to be very disorganised and kinda annoying to deal with when I'm iterating on a change. But this is just my opinion. I don't really mind where they are, but because the other ones are in zend_test it makes some sense to me for them to be with the others there.

@ramsey
Copy link
Member

ramsey commented May 27, 2022

Would #8454 supersede or complement this?

@ramsey ramsey added this to the PHP 8.2 milestone May 27, 2022
@arnaud-lb
Copy link
Member Author

I think that it would complete this by exposing the functionality of zend_atol() / zend_ini_parse_quantity() to user land.

@arnaud-lb arnaud-lb force-pushed the zend-atol-cleanup-v2 branch 2 times, most recently from e4aac50 to 67bc4cf Compare June 3, 2022 16:03
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the phrasing of the error message and a question this LGTM just by staring at the code

@@ -53,6 +53,7 @@ ZEND_BEGIN_MODULE_GLOBALS(zend_test)
int register_passes;
bool print_stderr_mshutdown;
zend_test_fiber *active_fiber;
zend_long quantity_value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be set in any test.

Zend/zend.c Outdated
EG(fiber_stack_size) = zend_atol(ZSTR_VAL(new_value), ZSTR_LEN(new_value));
EG(fiber_stack_size) = zend_ini_parse_quantity(new_value, &errstr);
if (errstr) {
zend_error(E_WARNING, "Invalid \"%s\" setting: %s", ZSTR_VAL(entry->name), ZSTR_VAL(errstr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a variant that also emits the error to avoid repeating this code? It also wouldn't need the temporary error string, which would simplify the API for the common use case where the warning is emitted.

Zend/zend_ini.h Outdated
* character K, M, or G (for 2**10, 2**20, and 2**30, respectively).
*
* The digits are parsed as decimal unless the first character is '0', in which
* case they are parsed as octal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Octal numbers aren't tested.

Zend/zend_ini.c Outdated
@@ -30,6 +30,8 @@ static HashTable *registered_zend_ini_directives;
#define NO_VALUE_PLAINTEXT "no value"
#define NO_VALUE_HTML "<i>no value</i>"

#define ZEND_IS_WHITESPACE(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n') || ((c) == '\r') || ((c) == '\v') || ((c) == '\f'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an inlined function? It's easy to mess up if using the macro like this: ZEND_IS_WHITESPACE(*x++);

Zend/zend_ini.c Outdated
* For now overflow is silently ignored -- not clear what else can be
* done here, especially as the final result of this function may be
* used in an unsigned context (e.g. "memory_limit=3G", which overflows
* zend_long on 32-bit, but not size_t). */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we could warn on overflow and on truncation to 32-bit. Detecting overflow is pretty simple. https://stackoverflow.com/a/1815371/1320374

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. I wanted to do this just after this PR, but I will do it now

Zend/zend_ini.c Outdated
while (ZEND_IS_WHITESPACE(*digits_end)) ++digits_end;

/* No exponent suffix. */
if (!*digits_end) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can NUL bytes be inserted via ini setting? If so, the rest of the string would be ignored. Not a major issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted! This would have been a breaking change (minor, but still a breaking change)

@dmsnell
Copy link
Contributor

dmsnell commented Jun 9, 2022

To carry the conversation from #8454 here, should we not go ahead and expose ini_parse_quantity() to the user space so applications can know what the parsed value is? That function/interface doesn't depend on ATOL in any way, but it does depend on "how PHP parses the 'shorthand notation'"

Further, what implications do we have by introducing a split between signed and unsigned shorthand parsing? Is that a non-issue because every case where we've been parsing the "shorthand notation" is now being parsed with ini_parse_quantity() instead of ini_parse_uquantity()? or should the difference be that applications which want to know the parsed value should also know if they are read internally as signed or unsigned values?

@arnaud-lb arnaud-lb force-pushed the zend-atol-cleanup-v2 branch 2 times, most recently from 6b8ce18 to 5b15654 Compare June 10, 2022 16:25
--FILE--
<?php

var_dump(ini_set("zend_test.quantity_value", "1MB"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also do a test where this is in the --INI-- section to see if that triggers the warning?

Zend/zend_ini.c Outdated
}

*errstr = NULL;
return (zend_long) retval;
Copy link
Member

@iluuu1994 iluuu1994 Jun 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting unsigned numbers to signed numbers when out-of-range is implementation defined. Do all architectures supported by PHP agree on this behavior?

Edit: Actually what I was reading was about implicit conversion, I'm not sure what applies to casting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right that it's UB. The behavior of implicit conversions and casts is defined by the same notion of conversion.

I would expect that all architectures/compilers supported by PHP do agree on this behavior. If my understanding is correct, in twos complement architectures this behavior is obtained by not changing the representation / doing nothing, so I would expect that this is what compilers do. The behavior is documented at least in GCC. An other consideration in favor of this is that we depend on it in a few places, including in zend_atol().

I've removed this conversion anyway, so that the function is UB-free at least in the zend_ini_parse_uquantity() variant :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is implementation defined, not undefined. We absolutely need to avoid the latter, but relying on the former can be okay. It is in this case (conversion/cast from unsigned to signed).

@dmsnell
Copy link
Contributor

dmsnell commented Jun 13, 2022

Sounds good, thanks @arnaud-lb. I'll keep my branch updated and try to adjust after this merges.

Note that valid values returned by zend_ini_parse_uquantity may overflow the PHP int type. We could return a string when exposing this to userspace.

This also seems fair but it takes a step backwards again that I was hoping to get past. I think (maybe my understanding was wrong) before this change those internals were shielded because "shorthand notation" was treated the same everywhere? Afterwards we have one exception, memory_size, and probably will have more in the future. That eliminates the convenience that I was taking advantage of, that you wouldn't need to know which ini directive you were looking at in order to know how PHP would interpret it.

Am I wrong in that this is not just refactoring the parsing code but also introducing a breaking change on memory_size?

The whole "shorthand notation" thing and INI directive parsing is funny: given that PHP doesn't indicate in its docs what that means and how it doesn't expose how it interprets it I'm surprised this hasn't been more of a problem (and the most popular projects out there try to guess the value but do so in a very wrong way which leads people to thing that 14mb means 14MiB when in fact it means 14 bytes).

Any thoughts on the matter as it concerns application-level code?

@arnaud-lb
Copy link
Member Author

Am I wrong in that this is not just refactoring the parsing code but also introducing a breaking change on memory_size?

This is hopefully not introducing a breaking change. memory_limit shorthand values should result in the same effective value as before.

The whole "shorthand notation" thing and INI directive parsing is funny: given that PHP doesn't indicate in its docs what that means and how it doesn't expose how it interprets it I'm surprised this hasn't been more of a problem (and the most popular projects out there try to guess the value but do so in a very wrong way which leads people to thing that 14mb means 14MiB when in fact it means 14 bytes).

Agreed :) This PR is one step to solve this. The warnings will help users to notice when their ini settings are not interpreted as they though they were.

Any thoughts on the matter as it concerns application-level code?

I agree with #8454 (comment):

While I usually would suggest to write this function in PHP, given that zend_atol()'s behavior might change over time (I certainly hope so), exposing that function to userland is fine for me.

Unsigned quantities complicates this a bit though.

Users will need to explicitly chose between signed or unsigned.

And unsigned quantities do not fit in a PHP int, so these can not be returned as ints.

These two aspects would make this error prone, especially since doing the wrong thing would work most of the time (like casting to int or using the wrong signed-ness).

This is also inconvenient. The return value can not be used directly without gmp or similar functionality.

An alternative to exposing ini_parse_quantity() would be to expose a function that can compare quantities, like version_compare() but for ini quantities. This seems to match the use-case described in #7944, and eliminates some risk (wrongly casting/comparing to int) and some inconveniences (needing gmp or similar).

function ini_quantity_less_than(string $a, string $b): bool;
function ini_quantity_less_than_unsigned(string $a, string $b): bool;

// Use-case from #7944
function memory_limit_set_at_least(string $at_least): void
{
    if (ini_quantity_less_than_unsigned(ini_get('memory_limit'), $at_least)) {
       ini_set('memory_limit', $at_least);
    }
}

@arnaud-lb
Copy link
Member Author

Adding just ini_quantity_less_than and ini_quantity_less_than_unsigned is enough for all comparison use-cases. Though something closer to version_compare() may be easier to use:

ini_quantity_compare(string $a, string $b, string $operator): bool;
ini_quantity_compare_unsigned(string $a, string $b, string $operator): bool;

@dmsnell
Copy link
Contributor

dmsnell commented Jun 16, 2022

An alternative to exposing ini_parse_quantity() would be to expose a function that can compare quantities, like version_compare() but for ini quantities.

Apart from the unsigned bit this is the approach I've been taking in WordPress to handle these values better. Currently we've been comparing the value returned from ini_get which actually is a bit bonkers since we might end up comparing something like -1 to something like 256m and expect it to work.

https://github.com/WordPress/wordpress-develop/pull/2660/files#diff-d70ed9709be54eb5bf0913ef8f4d26e837f696dd42d08032222e755878e203cbR80-R112

Complications I've found is knowing what's most appropriate to pass through as the result of these comparison functions. Return the parsed value or return the string ini value? Returning whichever ini value is less than/greater than seems to work okay so that's what I've been pursuing.

function ini_quantity_min(string $a, string $b): string;
function ini_quantity_min_unsigned(string $a, string $b): string;

ini_quantity_min('999', '1m') == '999'

arnaud-lb and others added 17 commits June 17, 2022 12:51
zend_atol() and zend_atoi() don't just do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.

Unfortunately, they ignore any other non-numerics between the
numeric component and the last character in the string.
This means that numbers such as the following are both valid
and non-intuitive in their final output.

* "123KMG" is interpreted as "123G" -> 132070244352
* "123G " is interpreted as "123 " -> 123
* "123GB" is interpreted as "123B" -> 123
* "123 I like tacos." is also interpreted as "123." -> 123

Currently, in php-src these functions are used only for parsing ini values.

In this change we deprecate zend_atol(), zend_atoi(), and introduce a new
function with the same behaviour, but with the ability to report invalid inputs
to the caller. The function's name also makes the behaviour less unexpected:
zend_ini_parse_quantity().

Co-authored-by: Sara Golemon <pollita@php.net>
@arnaud-lb arnaud-lb force-pushed the zend-atol-cleanup-v2 branch from 5b15654 to 1d4dfa5 Compare June 17, 2022 11:06
@arnaud-lb
Copy link
Member Author

arnaud-lb commented Jun 17, 2022

@dmsnell

Returning whichever ini value is less than/greater than seems to work okay so that's what I've been pursuing.

Agreed. This solution is a good compromise and matches the use-case.

My personal preference would be that #8454 is changed to:

  • Not expose zend_ini_parse_quantity, zend_ini_parse_uquantity directly
  • Add two new PHP functions: ini_quantity_compare and ini_quantity_compare_unsigned

Then, functions such as ini_quantity_min or memory_limit_set_at_least could be implemented in pure PHP (Edit: I mean in PHP code, by using ini_quantity_compare and ini_quantity_compare_unsigned)

@arnaud-lb arnaud-lb merged commit efc8f0e into php:master Jun 17, 2022
@dmsnell
Copy link
Contributor

dmsnell commented Jun 20, 2022

Not expose zend_ini_parse_quantity, zend_ini_parse_uquantity directly

@arnaud-lb this is missing the point for which #8454 was created though. it's important for projects to be able to convey to their users what the actual limits are. granted, we can do this with ini_quantity_compare() by starting at 1 and increasing until we are no longer less-than 😆

the point is that projects like WordPress need to be able to show someone on an upload screen that "you can upload files smaller than 128 MiB" if 128MiB is the limit. for signed quantities that don't involve the change-of-behavior that the unsigned parsing brings I think that's fine. the interpretation which ini_parse_quantity allows is wildly better than the various ways some of these projects are doing it today, and the unsigned bit just leaves one particular scenario in which it is likely to be just as bad as it was before.

maybe we can add ini_quantity_compare in a follow-up PR to keep #8454 small and focused?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants