Skip to content

Remove long-deprecated mbstring INI settings #12445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

alexdowad
Copy link
Contributor

@Girgias very kindly opened a PR for the same issue a few years ago: #5334. Apparently, the INI settings mbstring.http_input, mbstring.http_output, and mbstring.internal_encoding were deprecated in PHP 5.6 but have never been completely removed yet. It seems that it is high time to remove them.

I just reviewed the PR from @Girgias, and discovered that it makes some changes which are not needed and which will definitely cause BC breaks. This is the reason why the changes in #5334 are not OK:

The internal mbstring functionality which could previously be accessed by (for example) setting mbstring.http_input was not accessible solely in that way. Rather than using mbstring.http_input, you could also set the INI setting input_encoding, and it would do the same thing. Likewise for mbstring.http_output; the same functionality could also be activated using the function mb_http_output. Or for mbstring.internal_encoding, you could call the function mb_internal_encoding.

Since none of these features (such as the function mb_internal_encoding) are deprecated, we cannot remove any of the internal mbstring code which helps to implement them. All we can remove is the code which directly handles reads/writes to the obsolete INI settings.

Some thorough review would really be appreciated here. Also, I would like to ask what section of UPGRADING it would be best to note this removal in. Or do we normally put a note in UPGRADING when deprecated features are removed?

@nikic @Girgias @cmb69 @bukka @kamil-tekiela @youkidearitai

@alexdowad
Copy link
Contributor Author

Sorry, just to give credit where credit is due... I am looking through the code from @Girgias again and it doesn't have as many problems as I first thought. 👍🏻

I do think this PR is a little bit better though.

@kamil-tekiela
Copy link
Member

kamil-tekiela commented Oct 15, 2023 via email

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should also formally deprecate mb_http_input(), mb_http_output(), mb_internal_encoding (), and other functions which tweak the INI settings.

IIRC the reason we didn't remove them in 8.0, is that until 7.4 MBString and Iconv didn't actually use the global INI settings.

@@ -4,7 +4,6 @@ Bug #48697 (mb_internal_encoding() value gets reset by parse_str() or mb_parse_s
mbstring
--FILE--
<?php
ini_set('mbstring.internal_encoding', 'ISO-8859-15');
ini_set('mbstring.encoding_translation', true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: This could use the INI PHPT section to set the INI config

@@ -3,7 +3,6 @@ htmlentities() should not be influenced by mb_internal_encoding()
--INI--
default_charset=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test can also be deleted.

Comment on lines 109 to -112
bool internal_encoding_set;
bool http_output_set;
bool http_input_set;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the two other fields not removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because while http_input_set was only written from the handler for writes to mbstring.http_input, http_output_set (for example) is also written by the function mb_http_output. Same for internal_encoding_set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, we probably should deprecate the functions that affect those fields.

@Girgias
Copy link
Member

Girgias commented Oct 15, 2023

Also, what's the point of mb_language() and the corresponding INI setting?

@alexdowad
Copy link
Contributor Author

Also, what's the point of mb_language() and the corresponding INI setting?

I'd have to check the code to refresh my memory on everything which is affected by mb_language(), but just from memory, one of the things which is affected is the default behavior of mb_detect_encoding if you don't specify what the candidate encodings which you want to "detect" are.

@Girgias
Copy link
Member

Girgias commented Oct 15, 2023

Also, what's the point of mb_language() and the corresponding INI setting?

I'd have to check the code to refresh my memory on everything which is affected by mb_language(), but just from memory, one of the things which is affected is the default behavior of mb_detect_encoding if you don't specify what the candidate encodings which you want to "detect" are.

Right, from a quick glance at the docs it also seems to affect mb_send_mail() ? But that seems strange?
I think getting rid of this INI setting is probably a good thing.

@derickr
Copy link
Member

derickr commented Oct 15, 2023 via email

@youkidearitai
Copy link
Contributor

It should never have been a "feature".

I agree.

By the way, mb_internal_encoding() is affected mbstring.internal_encoding. My understanding, this PR not modify it is. mb_internal_encoding() is use it often. Should I change mbstring.internal_encoding in mb_internal_encoding to default_charset?

@Girgias
Copy link
Member

Girgias commented Oct 16, 2023

It should never have been a "feature".

I agree.

By the way, mb_internal_encoding() is affected mbstring.internal_encoding. My understanding, this PR not modify it is. mb_internal_encoding() is use it often. Should I change mbstring.internal_encoding in mb_internal_encoding to default_charset?

Yes, either the default_charset or internal_encoding INI settings should be used instead.
Similarly for the input/output encodings one can use input_encoding or output_encoding

@youkidearitai
Copy link
Contributor

@Girgias Thank you for your addition.

I was worried about mb_internal_encoding why is because I often used this function to set the desired character encoding, At least old (legacy) PHP code.
This is confused that searched google "mb_internal_encoding" in Japanese.

Anyway, using input_encoding and output_encoding is correct.

@alexdowad
Copy link
Contributor Author

Dear @Girgias, @derickr, @youkidearitai, @kamil-tekiela... all of your comments are very helpful.

Do you think we want to keep mb_output_handler or not? (https://www.php.net/manual/en/function.mb-output-handler.php)

@kamil-tekiela
Copy link
Member

kamil-tekiela commented Oct 17, 2023 via email

@Girgias
Copy link
Member

Girgias commented Oct 18, 2023

mb_output_handler

Let me ask @dseguy to see if this is actually used.

@Girgias
Copy link
Member

Girgias commented Oct 23, 2023

Dear @Girgias, @derickr, @youkidearitai, @kamil-tekiela... all of your comments are very helpful.

Do you think we want to keep mb_output_handler or not? (https://www.php.net/manual/en/function.mb-output-handler.php)

Forgot to update the PR, but @dseguy looked into it and there are 0 occurrences of this in close to 3000 projects, ranging from composer packages, large open source projects and private code bases. So I think this is safe to put on the chopping board.

@cmb69
Copy link
Member

cmb69 commented Jul 11, 2024

Removing it in 8.4 is not recommended. If someone is still using it, it could prove to be a serious BC. Such removals should only be done in major releases.

I fully agree, and would like to put a PHP 9 milestone onto this PR. That might help a bit to not forget about this removal again. Or would it be better to track that in the Wiki (or elsewhere)?

@Girgias Girgias added this to the PHP 9 milestone Jul 12, 2024
@Girgias
Copy link
Member

Girgias commented Jul 12, 2024

I created the milestone and added the PR to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants