Changes between Version 3 and Version 4 of Ticket #8313, comment 7
- Timestamp: Jan 18, 2023, 4:53:16 PM
Ticket #8313, comment 7
v3 v4
- In Python 3.8, locale.getpreferredencoding() calls _locale.nl_langinfo(_locale.CODESET) and this is implemented in C and makes the C library call nl_langinfo(CODESET). That C library call indeed returns ANSI_X3.4-1968 (official name for ASCII) after running a prediction with minimization. Predictions without minimization have it return the correct 'UTF-8'. Hours of testing and study of C library locale documentation did not reveal how this could be. The setlocale(LC_ALL, "") system call should copy the locale from the environment variables. Those give LANG=en_US.UTF-8. But still the nl_langinfo() Python call gives ANSI. I tried a separate C program compiled and run in the Google Colab terminal for the broken Colab session and it returned "UTF-8".
+ In Python 3.8, locale.getpreferredencoding() calls _locale.nl_langinfo(_locale.CODESET) and this is implemented in C and makes the C library call nl_langinfo(CODESET). That C library call indeed returns ANSI_X3.4-1968 (official name for ASCII) after running a prediction with minimization. Predictions without minimization have it return the correct 'UTF-8'. Hours of testing and study of C library locale documentation did not reveal how this could be. The setlocale(LC_ALL, "") C library call should copy the locale from the environment variables. Those give LANG=en_US.UTF-8. But still the nl_langinfo() Python call gives ANSI. I tried a separate C program compiled and run in the Google Colab terminal for the broken Colab session and it returned "UTF-8".

  {{{
  …
  }}}

- The system locale definition files in /usr/share/i18n/locales have not been modified (all dated April 7 2022).
+ The C library locale definition files in /usr/share/i18n/locales have not been modified (all dated April 7 2022).

  There was no significant difference in the environment variables between a Google Colab session that runs correctly without minimization and one that has run a prediction with minimization.
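
The values compared in this comment can be collected from inside the affected Python process with the standard `locale` module. The following is only a minimal sketch, not the test actually used for this ticket. The helper name `codeset_report` and the dictionary keys are made up for illustration. Running it before and after a prediction with minimization would show whether the reported codeset changes while the environment variables stay the same.

```python
import locale
import os


def codeset_report():
    """Gather the locale values the comment compares, for the current process."""
    report = {
        # Environment variables that setlocale(LC_ALL, "") consults.
        "env": {k: os.environ.get(k) for k in ("LC_ALL", "LC_CTYPE", "LANG")},
        # With no second argument this only queries the active LC_CTYPE locale.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
        # False: report the encoding without touching the locale setting.
        "preferred_as_is": locale.getpreferredencoding(False),
        # True (the default): Python may first switch LC_CTYPE to the
        # locale named by the environment before looking up the encoding.
        "preferred_from_env": locale.getpreferredencoding(True),
    }
    # nl_langinfo is not available on every platform, so guard the lookup.
    if hasattr(locale, "nl_langinfo"):
        report["nl_langinfo_codeset"] = locale.nl_langinfo(locale.CODESET)
    return report


if __name__ == "__main__":
    for key, value in codeset_report().items():
        print(f"{key}: {value}")
```

If `env` still shows LANG=en_US.UTF-8 while the encoding entries report ANSI_X3.4-1968, that would match the behaviour described above and point at the process locale state rather than the environment.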