Fix inconsistency between grapheme_substr() and substr() #6163

Closed
wants to merge 1 commit into from
4 changes: 3 additions & 1 deletion ext/intl/grapheme/grapheme_string.c
@@ -375,7 +375,9 @@ PHP_FUNCTION(grapheme_substr)
         RETURN_THROWS();
     }

-    if ( OUTSIDE_STRING(lstart, str_len)) {
+    if (str_len == 0 && lstart == 0) {
+        RETURN_EMPTY_STRING();
+    } else if (OUTSIDE_STRING(lstart, str_len)) {
Member

I think the real problem is in OUTSIDE_STRING, which should only consider lstart > str_len to be out-of-bounds, not lstart >= str_len. "One past the end" is generally considered a valid offset (also in strpos etc.).

Though this is just a "definitely wrong" check, as this is a grapheme cluster offset, the actual offset check happens later...
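To make the `>=` versus `>` distinction concrete, here is a minimal sketch in C. Both helpers are hypothetical stand-ins written for illustration; they are not the actual OUTSIDE_STRING macro, whose full definition is not shown in this diff, and they ignore negative offsets entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the real OUTSIDE_STRING macro):
 * treats an offset equal to the length as out of bounds. */
static bool outside_string_strict(size_t offset, size_t len)
{
    return offset >= len;
}

/* Hypothetical helper: accepts the "one past the end" offset,
 * so only offsets strictly greater than the length are rejected. */
static bool outside_string_relaxed(size_t offset, size_t len)
{
    return offset > len;
}
```

Under the strict variant, offset 0 in an empty string and offset 3 in a 3-byte string are both rejected; under the relaxed variant both are accepted. As the comment above points out, comparing a grapheme offset against a byte length can only ever be a coarse "definitely wrong" filter, so the precise check still has to happen later.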

Contributor

@chschneider, Sep 21, 2020

I think even lstart > str_len should be allowed for grapheme_substr() (and probably other functions). Whether it should return "" or false in that case is up for debate but throwing an exception IMHO cripples the API and makes it inconsistent with substr/mb_substr.
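As an illustration of the "return an empty string instead of throwing" option, the following C sketch clamps an out-of-range start to the end of the string. It is a byte-based model only: `substr_or_empty` is a hypothetical name, it is not grapheme-aware, and it does not settle whether "" or false is the better return value.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte-based helper, not part of ext/intl.
 * Returns a newly allocated string (caller frees), or NULL if
 * allocation fails. A start beyond the end yields "" rather
 * than an error. */
static char *substr_or_empty(const char *s, size_t start, size_t length)
{
    size_t len = strlen(s);

    /* Clamp the start so that "past the end" becomes "at the end". */
    if (start > len) {
        start = len;
    }

    /* Clamp the length to what is actually available. */
    size_t avail = len - start;
    if (length > avail) {
        length = avail;
    }

    char *out = malloc(length + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s + start, length);
    out[length] = '\0';
    return out;
}
```

With this shape, callers can slice without first checking the length, which is the ergonomic argument made in the comment above; the cost is that a genuinely wrong offset is no longer reported.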

Member Author

I'll look into this tonight

Member

I've applied a fix for the main problem in 1312c41 (there are still issues for the non-ASCII case). For the grapheme_substr() case, I would first like to decide on a final behavior for substr(), before we adjust this one (#6182).

         zend_argument_value_error(2, "must be contained in argument #1 ($string)");
         RETURN_THROWS();
     }
31 changes: 31 additions & 0 deletions ext/intl/tests/grapheme_substr.phpt
@@ -0,0 +1,31 @@
--TEST--
Test grapheme_substr() function
--SKIPIF--
<?php if( !extension_loaded( 'intl' ) ) print 'skip'; ?>
--FILE--
<?php

ini_set("intl.error_level", E_WARNING);

var_dump(grapheme_substr("", 0, 5));

try {
    grapheme_substr("", 1, 5);
} catch (ValueError $exception) {
    echo $exception->getMessage() . "\n";
}

var_dump(grapheme_substr("abc", 0, 5));

try {
    grapheme_substr("abc", 3, 5);
} catch (ValueError $exception) {
    echo $exception->getMessage() . "\n";
}

?>
--EXPECT--
string(0) ""
grapheme_substr(): Argument #2 ($start) must be contained in argument #1 ($string)
string(3) "abc"
grapheme_substr(): Argument #2 ($start) must be contained in argument #1 ($string)