
Commit c1a37c4

Optimize mb_strcut for text encodings with mblen_table
For legacy text encodings where mb_strcut is implemented using an mblen_table (such as the various SJIS variants), mb_strcut is now ~30% faster on small strings (about 10 bytes). This is because we are now avoiding an extra, unnecessary copy operation on the output string. When used on large strings, the difference in performance is negligible, as almost the entire runtime is spent stepping through the string to find the starting and ending cut points.
1 parent 775fb31 commit c1a37c4

1 file changed

+23
-0
lines changed


ext/mbstring/mbstring.c

@@ -2455,6 +2455,29 @@ PHP_FUNCTION(mb_strcut)
 		RETURN_STR(zend_string_init_fast((const char*)(string.val + from), len & -char_len));
 	}
 
+	if (enc->mblen_table) {
+		const unsigned char *mbtab = enc->mblen_table;
+		const unsigned char *p, *q, *end;
+		int m = 0;
+		/* Search for start position */
+		for (p = (const unsigned char*)string.val, q = p + from; p < q; p += (m = mbtab[*p]));
+		if (p > q) {
+			p -= m;
+		}
+		const unsigned char *start = p;
+		/* Search for end position */
+		if (len >= string.len - (start - (const unsigned char*)string.val)) {
+			end = (const unsigned char*)(string.val + string.len);
+		} else {
+			for (q = p + len; p < q; p += (m = mbtab[*p]));
+			if (p > q) {
+				p -= m;
+			}
+			end = p;
+		}
+		RETURN_STR(zend_string_init_fast((const char*)start, end - start));
+	}
+
 	ret = mbfl_strcut(&string, &result, from, len);
 	ZEND_ASSERT(ret != NULL);
 	RETVAL_STRINGL((char *)ret->val, ret->len); /* the string is already strdup()'ed */
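
The cut-point search added in this hunk can be tried outside of PHP. The following is a minimal standalone sketch, not the code from the commit: `build_sjis_like_table` and `find_cut` are hypothetical names, and the table is a simplified approximation of a Shift JIS length table (lead bytes 0x81-0x9F and 0xE0-0xFC count as 2 bytes, everything else as 1). It assumes `from <= slen` and that the buffer has at least one readable byte after the string, such as a NUL terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill a 256-entry table giving the byte length of the
 * character that starts with each byte value (simplified, Shift JIS-like). */
static void build_sjis_like_table(unsigned char tab[256])
{
	for (int i = 0; i < 256; i++) {
		tab[i] = ((i >= 0x81 && i <= 0x9F) || (i >= 0xE0 && i <= 0xFC)) ? 2 : 1;
	}
}

/* Hypothetical helper: compute the byte offset and byte length of the cut,
 * moving both boundaries back to the nearest character boundary. */
static void find_cut(const unsigned char *tab, const unsigned char *s, size_t slen,
                     size_t from, size_t len, size_t *out_start, size_t *out_len)
{
	const unsigned char *p = s, *q = s + from, *start, *end;
	unsigned int m = 0;

	assert(from <= slen);

	/* Step character by character until the requested start is reached */
	while (p < q) {
		m = tab[*p];
		p += m;
	}
	if (p > q) {
		p -= m; /* overshot: 'from' pointed into the middle of a character */
	}
	start = p;

	if (len >= slen - (size_t)(start - s)) {
		/* The cut runs to the end of the string; no second scan is needed */
		end = s + slen;
	} else {
		q = p + len;
		while (p < q) {
			m = tab[*p];
			p += m;
		}
		if (p > q) {
			p -= m; /* do not split the last character */
		}
		end = p;
	}

	*out_start = (size_t)(start - s);
	*out_len = (size_t)(end - start);
}
```

With the 4-byte input `'a', 0x82, 0xA0, 'b'`, a request for `from = 2, len = 2` starts inside the two-byte character, so the start is moved back to offset 1 and the result covers bytes 1 and 2. Because only an offset and a length are produced, the caller can build the result string directly from the input buffer, which is where the commit message says the extra copy is saved.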
