Skip to content

Do not inherit LC_CTYPE locale from environment #5488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Apr 29, 2020

Apparently, the way setlocale inheritance currently works in PHP is as follows:

  • LC_ALL is defaulted to "C".
  • LC_CTYPE is inherited from environment.
  • Some code (like PCRE) does not inherit LC_CTYPE from environment, it will assume "C" by default instead.

This PR removes locale inheritance completely. On startup we will stay at the default of LC_ALL="C". Changing the locale requires an explicit setlocale() call.

@cmb69
Copy link
Member

cmb69 commented Apr 29, 2020

net-snmp's init_snmp() may call setlocale(LC_CTYPE, "");. We could counteract:

 ext/snmp/snmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ext/snmp/snmp.c b/ext/snmp/snmp.c
index 8464e74aad..47a47f559b 100644
--- a/ext/snmp/snmp.c
+++ b/ext/snmp/snmp.c
@@ -49,6 +49,7 @@
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#include <locale.h>
 
 #ifndef __P
 #ifdef __GNUC__
@@ -1971,6 +1972,7 @@ PHP_MINIT_FUNCTION(snmp)
 	le_snmp_session = zend_register_list_destructors_ex(php_snmp_session_destructor, NULL, PHP_SNMP_SESSION_RES_NAME, module_number);
 
 	init_snmp("snmpapp");
+	setlocale(LC_ALL, "C");
 
 #ifdef NETSNMP_DS_LIB_DONT_PERSIST_STATE
 	/* Prevent update of the snmpapp.conf file */

@cmb69
Copy link
Member

cmb69 commented Apr 29, 2020

Seems to work as desired. We would have to fix 2 Windows tests, or apply PR #5481.

@nikic
Copy link
Member Author

nikic commented Apr 29, 2020

For some reason the ext/standard/tests/strings/basename_invalid_path.phpt test fails on MacOS (even after retriggering the build, doesn't seem spurious). There's no diff shown though, just fails.

@nikic
Copy link
Member Author

nikic commented Apr 29, 2020

php_basename uses php_mblen so maybe basename() is locale-sensitive. I don't get why there is no diff though.

@kocsismate
Copy link
Member

kocsismate commented Apr 29, 2020

@nikic I ran this test locally on MacOS, and it shows the following diff:

001+ string(1) "�"
001- string(0) ""

@cmb69
Copy link
Member

cmb69 commented Apr 29, 2020

Maybe the diff can't be rendered due to invalid UTF-8?

Anyhow, from the docs:

basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function.

Isn't that lovely. ;)

@nikic
Copy link
Member Author

nikic commented Apr 29, 2020

The basename() issue should be fixed by 90705d4. That was an implementation bug that got uncovered by this change.

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! :)

@KalleZ
Copy link
Member

KalleZ commented Apr 30, 2020

Big +1 for this, it is a good step forward

@php-pulls php-pulls closed this in c4ad8be Apr 30, 2020
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request May 7, 2020
Related to php#5488
which makes php-src start out with the C locale for all locales, including
LC_CTYPE in php 8.

Example: An application sets `setlocale(LC_CTYPE, "C")` to work around
issues with strtolower not working on turkish, and unexpectedly gets slightly
worse performance.
@carusogabriel carusogabriel added this to the PHP 8.0 milestone May 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants