Skip to content

Cache pcre subpattern table #14424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 3, 2024
Merged

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Jun 1, 2024

Recreating this over and over is pointless, cache this as well.

Fixes GH-14423.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea seems to be sensible

subpat_names[name_idx] = zend_string_init(name, strlen(name), persistent);
if (persistent) {
GC_MAKE_PERSISTENT_LOCAL(subpat_names[name_idx]);
}
if (is_numeric_string(ZSTR_VAL(subpat_names[name_idx]), ZSTR_LEN(subpat_names[name_idx]), NULL, NULL, 0) > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete aside, but this could use is_numeric_str_function as a future refactoring

Copy link
Member Author

@nielsdos nielsdos Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to cover this branch in a test, but I'm not so sure we can get here. Compilation seems to fail with a similar error before we get here it seems.
EDIT: this was tested in the past but since ab32d36 this isn't covered anymore, seems like the pcre upgrade made this obsolete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this code (and redundant test I added) as per my previous comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly merge those commits already and rebase?

@@ -767,6 +767,9 @@ PHP 8.4 UPGRADE NOTES
- MySQLnd:
. Improved the performance of MySQLnd quoting.

- PCRE:
. Improved the performance of named capture groups.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the 1st commit - 569c291 - wrong GH issue is mentioned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multitasking gone wrong, I'll fix it on merge when I squash this.

@mvorisek
Copy link
Contributor

mvorisek commented Jun 2, 2024

Looks very good 😍 and thank you @nielsdos very much for this PR!

@nielsdos nielsdos marked this pull request as ready for review June 2, 2024 10:41
@@ -163,6 +167,9 @@ static void php_free_pcre_cache(zval *data) /* {{{ */
{
pcre_cache_entry *pce = (pcre_cache_entry *) Z_PTR_P(data);
if (!pce) return;
if (pce->subpats_table) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it s better if free_subpats_table carries the burden to check it wdyt ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like free functions that accept NULL, so I'd prefer it as-is.

@@ -49,9 +49,11 @@ char *php_pcre_version;

struct _pcre_cache_entry {
pcre2_code *re;
/* Pointer is not NULL when there are named captures.
* NULL-terminated, length is equal to capture_count + 1. */
Copy link
Member

@arnaud-lb arnaud-lb Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not NULL-terminated (I believe that length is capture_count+1 to account for group 0)

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from @arnaud-lb 's comment, LGTM!

php_error_docref(NULL, E_WARNING, "Numeric named subpatterns are not allowed");
free_subpats_table(subpat_names, num_subpats);
return NULL;
/* Note: if we're making persistent strings, they will only be used within this thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we may clarify that they are persistent so they survive until the next request, rather than for being shared across threads.

while (ni++ < name_cnt) {
unsigned short name_idx = 0x100 * (unsigned char)name_table[0] + (unsigned char)name_table[1];
const char *name = name_table + 2;
subpat_names[name_idx] = zend_string_init(name, strlen(name), 0);
if (is_numeric_string(ZSTR_VAL(subpat_names[name_idx]), ZSTR_LEN(subpat_names[name_idx]), NULL, NULL, 0) > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure: Was this code dead? PCRE seems to requrie:

Compilation failed: subpattern name must start with a non-digit at offset 8

for digits, or:

Compilation failed: subpattern name expected at offset 8

For symbols like ./+/- etc.

Copy link
Member Author

@nielsdos nielsdos Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is dead because it is checked during compilation of the regular expression and this check in php-src executes after compilation. The compilation failure causes this code not to execute in the first place.
EDIT: I'll commit it anyway as per Girgias's request, so even if this would be wrong after all we can easily revert it.

nielsdos added a commit that referenced this pull request Jun 3, 2024
This isn't reachable since ab32d36, because since then the library
itself checks this condition during compilation. The compilation failure
that results of it makes this code not reachable.

This is split off of GH-14424.
Recreating this over and over is pointless, cache this as well.
Fixes phpGH-14423.
@nielsdos nielsdos force-pushed the pcre-cachesubpats branch from 076464d to ae5444e Compare June 3, 2024 18:05
@nielsdos nielsdos merged commit 2597441 into php:master Jun 3, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache PCRE named group names
6 participants