Skip to content

zend_atoi/zend_atol cleanup #4132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

zend_atoi/zend_atol cleanup #4132

wants to merge 2 commits into from

Conversation

sgolemon
Copy link
Member

@sgolemon sgolemon commented May 8, 2019

zend_ato[il]() don't just do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.

Unfortunately, they ignore any other non-numerics between the
numeric component and the last character in the string.
This means that numbers such as the following are both valid
and non-intuitive in their final output.

  • "123KMG" is interpreted as "123G" -> 132070244352
  • "123G " is interpreted as "123 " -> 123
  • "123GB" is interpreted as "123B" -> 123
  • "123 I like tacos." is also interpreted as "123." -> 123

This diff primarily adds warnings for these cases when the output
would be a potentially unexpected, and unhelpful value.

Additionally, several places in php-src use these functions
despite not actually wanting their KMG behavior such as
session.upload_progress.freq which will happily parse "1 banana"
as a valid value.

For these settings, I've switched them to ZEND_STRTOL which preserves
their existing /intended/ behavior.

  • It won't respect KMG suffixes, but they never really wanted that logic anyway.
  • It will ignore non-numeric suffixes so as to not introduce new failures.

We should probably reexamine that second bullet point separately.

Lastly, with these changes, zend_atoi() is no longer used in php-src,
but I left it as a valid API for 3rd party extensions.
Note that I did make it a proxy to zend_atol() since deferring the
truncation till the end is effectively the same as truncation during
multiplication, but this avoid code duplication.

I think we should consider removing zend_atoi() entirely (perhaps in 8.0?)
and rename zend_atol() to something reflecting it's KMG specific behavior.

@@ -149,7 +149,7 @@ static ZEND_INI_MH(OnUpdateAssertions) /* {{{ */

p = (zend_long *) (base+(size_t) mh_arg1);

val = zend_atol(ZSTR_VAL(new_value), ZSTR_LEN(new_value));
val = ZEND_STRTOL(ZSTR_VAL(new_value), NULL, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit these fixes separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to split 'em up, sure.

}
}
return retval;
zend_error(E_WARNING, "Invalid numeric string '%s', interpreting as '%s%s'", ZSTR_VAL(strcopy), ZSTR_VAL(numonly), suffixstr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use %.*s here with explicitly passed length to avoid those string copies? Can also use %c instead of the suffixstr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. %.*s was the format I was trying to recall. Thanks.

On the suffixstr one, I don't want an empty space introduced when it's \0.

@nikic
Copy link
Member

nikic commented May 9, 2019

What happens if you pass just M for example?

Some tests would be nice :) Otherwise this looks good.

@sgolemon
Copy link
Member Author

sgolemon commented May 9, 2019

What happens if you pass just M for example?

The same thing that happened before. ZEND_STRTOL() gives us 0 which is multiplied by 1024 * 1024, yielding 0. We should probably add a warning for that as well.

Some tests would be nice :) Otherwise this looks good.

Yeah, in my email to internals I mentioned that tests are coming, I just wanted to get the conversation going first.

@@ -828,7 +828,7 @@ int zend_startup(zend_utility_functions *utility_functions) /* {{{ */
{
char *tmp = getenv("USE_ZEND_DTRACE");

if (tmp && zend_atoi(tmp, 0)) {
if (tmp && ZEND_STRTOL(tmp, NULL, 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the base argument to ZEND_STRTOL be 10 (elsewhere as well)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these changes I chose not to make any unnecessary changes.
If we set "10" here, then ONLY base 10 strings will be respected.
That's probably fine for most of these sites, but I don't have a pressing need to say "You can't use octal/hex".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed. I thought zend_atoi() would only accept base-10 numbers (the misnomer is rather confusing).

@Llbe
Copy link

Llbe commented May 11, 2019

Will this make INI memory_limit more strict?

E.g. var_dump(ini_set('memory_limit', 'foo'), ini_get('memory_limit')); → "foo" is accepted as a value.

@sgolemon
Copy link
Member Author

sgolemon commented May 12, 2019

Will this make INI memory_limit more strict?

Yes and no. It'll warn on bad values, but it'll still accept them.

E.g. var_dump(ini_set('memory_limit', 'foo'), ini_get('memory_limit'));foo is accepted as a value.

It's a value, but it's a terrible one. :)
You're setting your memory limit to 0 there.

Specifically, you'd get:
Warning: Invalid numeric string 'foo', no valid leading digits, interpreting as '0'

$leadingWS, $sign, $midWS, $exp, $trailingWS);
var_dump($setting);

// Technically, negative values don't make sense for socket timout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo? timout -> timeout

sgolemon added 2 commits May 15, 2019 12:06
zend_ato[il]() don't *just* do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.

None of these use cases actually call for this behavior,
so invoke ZEND_STRTOL() separately.
zend_ato[il]() don't just do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.

Unfortunately, they ignore any other non-numerics between the
numeric component and the last character in the string.
This means that numbers such as the following are both valid
and non-intuitive in their final output.

* "123KMG" is interpreted as "123G" -> 132070244352
* "123G " is interpreted as "123 " -> 123
* "123GB" is interpreted as "123B" -> 123
* "123 I like tacos." is also interpreted as "123." -> 123

This diff primarily adds warnings for these cases when the output
would be a potentially unexpected, and unhelpful value.

Additionally, with these changes, zend_atoi() is no longer used in php-src,
but I left it as a valid API for 3rd party extensions.
Note that I did make it a proxy to zend_atol() since deferring the
truncation till the end is effectively the same as truncation during
multiplication, but this avoid code duplication.

I think we should consider removing zend_atoi() entirely (perhaps in 8.0?)
and rename zend_atol() to something reflecting it's KMG specific behavior.
@arnaud-lb
Copy link
Member

I like this change

@sgolemon do you mind if I take over ?

arnaud-lb added a commit to arnaud-lb/php-src that referenced this pull request Dec 18, 2021
zend_atol() parses integers with size suffixes (like "128M"). Here zend_atol() is
replaced with ZEND_ATOL() when the intent was not to parse a size or quantity.

This builds on 989205e and phpGH-4132.
arnaud-lb added a commit to arnaud-lb/php-src that referenced this pull request Dec 18, 2021
zend_atol() parses integers with size suffixes (like "128M"). Here zend_atol() is
replaced with ZEND_ATOL() when the intent was not to parse a size or quantity.

This builds on 989205e and phpGH-4132.
arnaud-lb added a commit to arnaud-lb/php-src that referenced this pull request Dec 18, 2021
zend_atol() parses integers with size suffixes (like "128M"). Here zend_atol() is
replaced with ZEND_ATOL() when the intent was not to parse a size or quantity.

This builds on 989205e and phpGH-4132.
@arnaud-lb arnaud-lb mentioned this pull request Dec 18, 2021
break;
default:
/* Unknown suffix */
zend_error(E_WARNING, "Invalid numeric string '%.*s', interpreting as '%.*s'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the warning message ("Invalid numeric string"). Should that be more like "invalid suffix"? Same a few lines below.

@ramsey
Copy link
Member

ramsey commented May 27, 2022

There are several PRs floating around right now that deal with this area:

I suspect that #7788 supersedes this PR and #7951 might even negate the need for it.

Thoughts on closing this one?

@cmb69
Copy link
Member

cmb69 commented May 27, 2022

Yeah, this PR should already have been closed.

@cmb69 cmb69 closed this May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants