-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add ini_parse_quantity to convert ini directive "byte" shorthand to int #8454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for the PR! To fix CI, please re-generate basic_functions_arginfo.h again, so that it gets a new stub hash (see beginning of the file). It seems to me this is the most pragmatic approach for #7944 and related issues. While I usually would suggest to write this function in PHP, given that This should still be discussed on the internals mailing list. Are you willing to write a message to that list? |
Much obliged for the review @cmb69. I'll definitely get this updated and regenerate the stub hash (sorry I missed that when editing Will try and write a message on the mailing list today or early next week. 🙇♂️ |
There is no need to edit this file manually. You can just do:
|
@dmsnell thanks for this proposal. I like this function and I wish I had it in the past when I need it for some projects. My minor comment is about the proposed function name:
|
@javiereguiluz I'm not the biggest fan of it either. I considered more verbose names as you suggest, though I found the brevity of another one I thought of decoupled it from the "bytes" language because it appears like it's really not related to bytes: decoupled because multiple non-byte values use this same parsing, anything running
I'm getting ready to post on the mailing list; I'll ask for feedback on the name because I think it's not entirely clear to me what we should be calling this since the existing naming around these values (byte sizes) doesn't fully overlap how they are used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the internal function name has nothing to do with INI parsing why call it ini_get
? The purpose ofc is to parse string representation of values written with suffixes, but IMO this could also be called atol(string $value): int
or something different w/o ini_
in name. Any thoughts?
@@ -493,6 +493,8 @@ function ini_alter(string $option, string|int|float|bool|null $value): string|fa | |||
|
|||
function ini_restore(string $option): void {} | |||
|
|||
function ini_bytes(string $option): int {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO name of the parameter should be something like $value
instead of $option
as according to your examples in the PR description what you pass here is a string representation of a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the good catch. That was an artifact of starting by trying to return the internal value for a php.ini directive before I gave up and realized that was too broad of a scope.
Renamed to $shorthand
in 7f626f1
ext/standard/basic_functions.c
Outdated
Z_PARAM_STRING(shorthand, shorthand_len) | ||
ZEND_PARSE_PARAMETERS_END(); | ||
|
||
RETVAL_LONG(zend_atol(shorthand, shorthand_len)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it fail? What happens if it fails? I see no test for valid and non-valid values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that the function can fail and there are no truly invalid values. That is, for values that aren't valid from a use standpoint they will return 0
.
For certain parameters like post_max_size
that means if we don't have a legitimate configuration value we won't apply any limit, whereas a legitimate value is any value that returns > 0 from this function, but negative values and outright ridiculous values are still allowed.
I would highly prefer #7951 over this and expose the new zend_ini parser function to userland in ext/std instead of having just testing it in zend_test then have this function which relies on |
@brzuchal this does have everything to do with INI parsing and that's why I picked a related name. That is, I wouldn't want to expose some That brings up what @Girgias mentioned in #7951
Actually I'm all in favor of that too, though I would think there's still some reason to abstract it behind a function related to the existing On that note, I do like the ring of |
Mailing list thread, for reference: |
There wasn't much feedback on the mailing list, and the feedback you have received has been positive, so I'm adding this to the 8.2. milestone. Please write tests for the functionality. |
@ramsey thanks for your help. I've added a test file, and read through the documentation on adding tests, but I'm not very confident I made the right choices, and I didn't see a way to test locally so I'm not sure - maybe I'm missing some major things here. Would you mind giving me at least a high-level review of the test file and let me know generally if I should be doing this differently? (If the approach is sound I would like to add more test cases that do a better job covering the expected behaviors, but don't want to do all that up-front if the general approach is wrong). |
1048576 | ||
1073741824 | ||
14 | ||
0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks good, but it'll fail. 😄
To run tests locally, use the run-tests.php
script in the root of php-src. After you build PHP with your changes (using make
), you can run the test like this:
php run-tests.php -p sapi/cli/php tests/basic/ini_parse_quantity_basic.phpt
Where -p sapi/cli/php
is pointing to the new PHP executable that you just built that includes your changes.
There are many options you can pass to run-tests.php
. You can check them out by entering php run-tests.php -h
.
When you have a failing test, run-tests.php
will create a bunch of additional files in the same directory alongside your test file, which you can use for debugging.
For example, when tests/basic/ini_parse_quantity_basic.php
fails, you'll see the following new files created:
tests/basic/ini_parse_quantity_basic.diff
tests/basic/ini_parse_quantity_basic.expm
tests/basic/ini_parse_quantity_basic.log
tests/basic/ini_parse_quantity_basic.out
tests/basic/ini_parse_quantity_basic.php
tests/basic/ini_parse_quantity_basic.sh
Take a look at them to see what all they contain. *.out
will tell you what the text actually printed out, while *.diff
will show a diff of what was expected vs. what was actually printed.
For more information, check out the following:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all great, and I'm quite happily surprised at how easy the testing setup is. Nothing wrong with test that fail before they succeed either 😄
These run locally for me, but I couldn't build them when I rebased against master
(I think because of changes introduced in 8.1? Could not build PHP with the following error on make
make: *** No rule to make target `/Users/dmsnell/code/php-src/main/php_stdint.h', needed by `ext/opcache/ZendAccelerator.lo'. Stop.
I'll try rebasing again in a work day or two as I presume this will be fixed upstream quickly (it's there for me in master
anyway).
Maybe I'll hold off on a more exhaustive test suite until the interplay with #7951 is clearer. The zend_test_ini_parse_quantity
functions there I think could be renamed as ini_parse_quantity
and replace this entire patch 🙃
Note that a new API is in the works, deprecating |
Thank you @iluuu1994 - I've been following that patch and have referenced it a number of times in this PR. Based on what I've seen it's not incompatible with this function, or possibly someone will want to incorporate this public function into that patch and close my work (which would be a bummer, since I was hoping I'd get at least a mention for this contribution, but I understand). The question I have is whether these are both slated to appear in the same release, assuming I can update all the tests. I've been absent for a couple weeks due to a WordPress conference that took my attention. If so, maybe we should merge the two PRs so that there's never a time when we have to update this code to account for those changes. If not, maybe this is still okay to push to Concerning the change of behavior introduced in #7951 I think we're just going to have to deal with it and that's fine too. It's just a breaking change. But, I still don't see that being any different with or without this function; just a matter of whether we expose this API sooner or later. Am I missing something? I'm more curious about @ramsey I'm going to try and fix the tests here regardless, but maybe this work should be joined with #7951? Can you share your thoughts on that? Also, I'd still like to update the documentation to indicate what this is and how it's possible to read the parsed value; I don't see that in this repository - can you (or anyone) point me in the direction to find those? |
@cmb69 for the life of me I haven't been able to build this in recent
Would like to rebase this and update it with the changes from #7951 and I guess I can just rely on the tests and CI if I need to. Would be nice to run and test locally 😄 |
@dmsnell Have you tried |
And if |
yes thank you @cmb69! I should have known as I've been burnt by hiding files before. |
y'all have been super helpful. I've updated against #7951 and added a new test just to make sure we throw warnings when given some invalid/inappropriate values. for now I think what this needs is a final look over and any instruction you want to give me as for what you want to see in the tests. there was that final thought that maybe we should just rename the function to |
Headers are tracked now and trigger recompilation, the issue is when build dependencies are changed (i.e. new files added or removed) then |
Not sure why the Travis build is failing. Seems unrelated but I don't appear to be able to restart the job to check. |
@dmsnell Travis frequently fails for no reason. You don't need to restart. I just triggered the GitHub tests, let's see if those pass. |
There is indeed a single notation, but different accepted ranges. Most ini settings accept values in the range [-2^63, 2^63-1] (on 64bit systems), but memory_limit accepts values in the range [0, 2^64]. The functions zend_ini_parse_quantity() and zend_ini_parse_uquantity() use different range checks, but otherwise behave the same. Shorthand notations outside of the range trigger a warning and the return value is unspecified. Note that this doesn't change the behavior that existed before #7951, appart from the new warnings. Unfortunately, using only It's fine with me if others agree. Some solutions which I suggested in #7951 would be to add comparison functions similar to |
Thanks for the feedback @arnaud-lb
as noted over in that other PR this is a solution for a different problem, but related to knowing the limits. even with those comparison functions though I guess won't we still have to know if PHP is using signed or unsigned parsing? that is, if we provide a for now I'm tempted to leave it hanging as a mystery given what you said, "doing the wrong thing would work most of the time." that is, maybe you can check me on this:
mainly this catches negative values for why did we not use In summary, my take from #7951 is that we should continue to leave it up to the application developer to realize that This approach is the scope-limited alternative to passing in the |
With WordPress we have seen a variety of bugs that stem from the fact that we don't reliably parse the memory size values from php.ini directives. We want to know those values in order to communicate to site users what the file upload limits are. In this patch I'm adding a new function `ini_bytes( $shorthand )` to return whatever PHP evaluates that value to be internally, thus exposing the actual byte size to the outside world. This eliminates the need for WordPress and other projects to port the config value parser since they could directly call it. Additionally it will reliably reproduce any quirks due to the platform build and overflow behaviors when handling the `g`, `m`, and `k` suffixes on `post_max_size` and `upload_max_filesize` (and others). This function is a shallow wrapper around `zend_atoi()` and could become out of sync with the actual ini directive parsing. I considered a variation of this function which would take a directive's name and return the stored internal value directly, keeping it in sync. ```php switch ( param_name ) { case 'post_max_size': RETVAL_LONG(SG(post_max_size)); } ``` After looking briefly at this option I dismissed it for being a bit too coupled to those options, too demanding to add all the possible config options to a big switch statement, and too complicated for returning all of the possible return types stored in the SAPI globals. This is my first attempt at patching PHP and I don't know what I'm doing :) I'm not thrilled at the function name but I needed something. I'm not aware of what I need to write for testing, documentation, etc... but I'm happy to add whatever is required. The expecation is to use it in situations roughly like this: ```php function max_upload_size() { $u_bytes = ini_bytes( ini_get( 'upload_max_filesize' ) || 0 ); $p_bytes = ini_bytes( ini_get( 'post_max_size' ) || 0 ); return min( $u_bytes, $p_bytes ); } ``` This patch also makes it easier to perform mathematic comparisons and operations on the config values, which again previously required duplicating the parsing outside of PHP and likely will fail for a variety of config values. ```php function validate_upload_size() { $u_bytes = ini_bytes( ini_get( 'upload_max_filesize' ) || 0 ); $p_bytes = ini_bytes( ini_get( 'post_max_size' ) || 0 ); return $u_bytes <= $p_bytes ? [ true ] : [ false, 'upload_max_filesize must be smaller than post_max_size' ]; } ```
Of all the naming suggestions so far this one seems to fit best with the purpose of the function.
ini_parse_quantity() allows values up to 2GiB in 32bit systems, and ini_parse_uquantity() allows values up to 4GiB So for example a I think that it's ok to add only |
right! I thought I had to have something wrong there. well, I don't like having to leak the internal implementation but I think this is the best we can get for the compromise. what needs to happen for this PR before merge? |
The implementation looks good to me, and the API is good for all ini settings but memory_limit. We will need other functions to handle the full range of @ramsey should we merge ? |
@arnaud-lb I think you can just merge it. If there are no objections, there is no need to wait for RM to confirm pre feature freeze. |
Merged. Thank you @dmsnell ! |
With WordPress we have seen a variety of bugs that stem from the fact that we
don't reliably parse the memory size values from php.ini directives. We want
to know those values in order to communicate to site users what the file
upload limits are.
In this patch I'm adding a new function
ini_bytes( $shorthand )
to returnwhatever PHP evaluates that value to be internally, thus exposing the actual
byte size to the outside world. This eliminates the need for WordPress and
other projects to port the config value parser since they could directly
call it. Additionally it will reliably reproduce any quirks due to the
platform build and overflow behaviors when handling the
g
,m
, andk
suffixes on
post_max_size
andupload_max_filesize
(and others).Related
Add memory_get_limit() function #8380
Adding a new function to get the memory size when I think we could use
ini_bytes()
to read the existing value reported for it instead.zend_atoi/zend_atol cleanup #4132
Highlighting some of the nuances of parsing values set for byte sizes
in the config. Some other non-byte-size configs also work through
atol
and can be similarly surprising when parsed.
Alternative approaches
This function is a shallow wrapper around
zend_atol()
and could becomeout of sync with the actual ini directive parsing. I considered a variation
of this function which would take a directive's name and return the stored
internal value directly, keeping it in sync.
After looking briefly at this option I dismissed it for being a bit too
coupled to those options, too demanding to add all the possible config
options to a big switch statement, and too complicated for returning all
of the possible return types stored in the SAPI globals.
This is my first attempt at patching PHP and I don't know what I'm doing :)
I'm not thrilled at the function name but I needed something.
If this idea is remotely valuable to people and the approach is fine
then I'll attempt to add all the necessary tests and cleanups according
to the contributing guidelines.
Expected uses
The expecation is to use it in situations roughly like this:
This patch also makes it easier to perform mathematic comparisons and
operations on the config values, which again previously required duplicating
the parsing outside of PHP and likely will fail for a variety of config values.
A strong benefit to this approach is that it does avoid having to answer
questions about the scope of the change, about handling details of the
internal representation of these config values, about predicting how a
different shorthand value will be interpreted, etc... In essence we leave
it to the PHP developer to choose to parse the values as they wish.
Something not in this patch which might be worth adding if the approach
and idea are valuable to the maintainers is to update the PHP docs to
link to this function where it mentions the shorthand but doesn't explain
how to interpret it or figure out what it actually is.