Skip to content

Fix bug in mb_get_substr_slow (sometimes outputs wrong number of characters) #12983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 21, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions ext/mbstring/mbstring.c
Original file line number Diff line number Diff line change
Expand Up @@ -2092,9 +2092,10 @@ static zend_string* mb_get_substr_slow(unsigned char *in, size_t in_len, size_t
if (from >= out_len) {
from -= out_len;
} else {
enc->from_wchar(wchar_buf + from, MIN(out_len - from, len), &buf, !in_len || out_len >= len);
size_t needed_codepoints = MIN(out_len - from, len);
enc->from_wchar(wchar_buf + from, needed_codepoints, &buf, !in_len || out_len >= len);
from = 0;
len -= MIN(out_len, len);
len -= needed_codepoints;
}
}

Expand Down
6 changes: 6 additions & 0 deletions ext/mbstring/tests/mb_substr.phpt
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,11 @@ echo "Regression:\n";
$str = "\xbd\xbd\xbd\xbd\xbd\xbd\xbd\xbe\xbd\xbd\xbd\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x89\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x00\x00\x00\x00\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8b\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8b\x8b\x8b\xbd\xbd\xbd\xbd\xbd\xbd\xbd\xbe\x01:O\xaa\xd3";
echo bin2hex(mb_substr($str, 0, 128, "JIS")), "\n";

/* Alex messed up when reimplementing mb_substr and, in cases where `from` is non-zero and
* the number of characters to extract is more than 128, miscalculated where to end the substring
* Thanks to Maurício Fauth for finding the issue */
var_dump(mb_substr('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus feugiat ex non cursus. Pellentesque vestibulum tellus sit lectus.', 19, -1));
Comment on lines +136 to +139
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this a separate test file altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamil-tekiela Thanks for the suggestion. Personally I prefer to have this particular test here rather than multiplying the number of different test files, but certainly some might prefer to have each test case in a different file. In any case, I do appreciate your bringing out this suggestion.


?>
--EXPECT--
EUC-JP:
Expand Down Expand Up @@ -213,3 +218,4 @@ Testing agreement with mb_strpos on invalid UTF-8 string:
?AAA
Regression:
1b28493d3d3d3d3d3d3d3e3d3d3d1b28423f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f000000003f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f1b28493d3d3d3d3d3d3d3e1b2842013a4f1b28492a1b2842
string(121) "it amet, consectetur adipiscing elit. Vestibulum dapibus feugiat ex non cursus. Pellentesque vestibulum tellus sit lectus"