Changes between Version 1 and Version 2 of Ticket #8313, comment 7
Timestamp: Jan 18, 2023, 4:51:33 PM
Ticket #8313, comment 7
In Python 3.8, locale.getpreferredencoding() calls _locale.nl_langinfo(_locale.CODESET), which is implemented in C and makes the C library call nl_langinfo(CODESET). That call indeed returns ANSI_X3.4-1968 (the official name for ASCII) after running a prediction with minimization. After predictions without minimization, it returns the correct 'UTF-8'. Hours of testing and study of the C library locale documentation did not reveal how this could be. The setlocale(LC_ALL, "") C library call should copy the locale from the environment variables, and those give LANG=en_US.UTF-8, yet the nl_langinfo() call from Python still gives ANSI_X3.4-1968. I also compiled a separate C program and ran it in the Google Colab terminal for the broken Colab session, and it returned "UTF-8".

{{{