Skip to content

Normalize mb_ereg() return value #6331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

nikic
Copy link
Member

@nikic nikic commented Oct 13, 2020

mb_ereg()/mb_eregi() currently have an inconsistent return value based on whether the $matches parameter is passed or not:

Returns the byte length of the matched string if a match for pattern was found in string, or FALSE if no matches were found or an error occurred.

If the optional parameter regs was not passed or the length of the matched string is 0, this function returns 1.

Coupling this behavior to the $matches parameter doesn't make sense -- we know the match length either way, there is no technical reason to distinguish them. However, returning the match length is not particularly useful either, especially due to the need to convert 0-length into 1-length to satisfy "truthy" checks. We could always return 1, which would kind of match the behavior of preg_match() -- however, preg_match() actually returns the number of matches, which is 0 or 1 for preg_match(), while false signals an error. However, mb_ereg() returns false both for no match and for an error. This would result in an odd 1|false return value.

The proposal here is to canonicalize mb_ereg() to always return a boolean, where true indicates a match and false indicates no match or error. This also matches the behavior of the mb_ereg_match() and mb_ereg_search() functions.

@nikic
Copy link
Member Author

nikic commented Oct 13, 2020

If we don't do this or something similar we can just mark the argument as UNKNOWN, I just thought fixing the discrepancy would be better...

Not sure who to ask about this, maybe @kocsismate @cmb69 ?

@cmb69
Copy link
Member

cmb69 commented Oct 13, 2020

Well, the current behavior is terrible, so returning bool makes sense, but I'm slightly concerned about the potential BC break; some code may rely on the returned length. That code would likely have to be adjusted to check for strlen($regs[0]) instead. Might not be much of a problem in practice, though.

@nikic
Copy link
Member Author

nikic commented Oct 13, 2020

Right, there certainly is some potential for breakage. That said, I just checked all usages of mb_ereg/mb_eregi in the top 2k composer packages, and none of them would be affected by this change. That doesn't mean it affects nobody, but at least the impact should be fairly limited.

With that in mind, I'm going to land this now to make sure it's in RC2, and may back this out again if I'm estimating the impact incorrectly.

@php-pulls php-pulls closed this in 5582490 Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants