Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#8313 closed defect (fixed)

AlphaFold: A UTF-8 locale is required. Got ANSI_X3.4-1968

Reported by: minioreo@… Owned by: Tom Goddard
Priority: normal Milestone:
Component: Structure Prediction Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

Dear Developer,


I recently ran into a bug that many people have also reported on GitHub, but none of the suggested solutions actually fix it. When predicting a structure with all three options selected (1. Use PDB templates…… 2. Energy-minimize predicted structures 3. Trim-fetched structure……), the Colab window reports an error after the prediction finishes: NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968. This causes the automatic download of the result files to fail.


Thank you for taking the time to read my email. If you have any solutions, please let me and the other people plagued by this bug know.




Change History (9)

comment:1 by Eric Pettersen, 3 years ago

Component: Unassigned → Structure Prediction
Owner: set to Tom Goddard
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: Report a Bug of ChimeraX → AlphaFold: NotImplementedError: A UTF-8 locale is required

comment:2 by Tom Goddard, 3 years ago

I am investigating this AlphaFold error now. It happens when energy minimization is enabled, so the easy way to avoid it for now is to not enable energy minimization. The AlphaFold run completes successfully, but making a zip file from the results fails due to the text-encoding error.

The problem appears to be a bug in Google Colab where all shell commands break if the locale that controls the text encoding (UTF-8 vs. ANSI) is changed away from UTF-8. It looks like OpenMM, which performs the energy minimization, somehow changes the locale. I suspect this bug was introduced into Google Colab when they updated from Python 3.7 to Python 3.8 last month.

Long ago I saw this same error, but it would only happen after OpenMM failed.

https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/6961

But now it happens even when OpenMM energy minimization succeeds. It might also be caused by an update to the version of OpenMM used by AlphaFold. In any case, the real problem is that Google Colab cannot execute any shell commands when the text encoding is anything other than UTF-8. That is a bug, but Google is unlikely to fix it any time soon, so I will look for a way to work around it.

comment:3 by Tom Goddard, 3 years ago

Here is a copy of the Google Colab error shown in the AlphaFold Run window when energy minimization is enabled. The ChimeraX Python code running on Google Colab tries to package the predicted models using shell commands to copy the files, and it fails in /usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py because Google Colab checks whether the locale encoding is UTF-8, and it is not, apparently because OpenMM changed it.

2023-01-17 20:06:21,275 reranking models by plddt
2023-01-17 20:06:22,725 Done
Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-1-650ce46df63c> in <module>
    411 remove_from_list(seq_list, 'prokaryote')  # Obsolete "prokaryote" flag
    412 
--> 413 run_prediction(seq_list, use_templates = use_templates, energy_minimize = not dont_minimize)

4 frames
<ipython-input-1-650ce46df63c> in run_prediction(sequences, job_name, msa_mode, pair_mode, use_templates, custom_template_path, energy_minimize, model_type, num_recycles, dpi, install_log)
    100 
    101     print('Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold')
--> 102     download_results(energy_minimize)
    103 
    104 # ================================================================================================

<ipython-input-1-650ce46df63c> in download_results(energy_minimize)
    271 def download_results(energy_minimize):
    272   relax = 'relaxed' if energy_minimize else 'unrelaxed'
--> 273   get_ipython().system('cp -p *_{relax}_rank_1_model_*.pdb best_model.pdb')
    274   get_ipython().system('cp -p *_unrelaxed_rank_1_model_*_scores.json best_model_pae.json')
    275 

/usr/local/lib/python3.8/dist-packages/google/colab/_shell.py in system(self, *args, **kwargs)
     93       kwargs.update({'also_return_output': True})
     94 
---> 95     output = _system_commands._system_compat(self, *args, **kwargs)  # pylint:disable=protected-access
     96 
     97     if pip_warn:

/usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py in _system_compat(shell, cmd, also_return_output)
    434   # is expected to call this function, thus adding one level of nesting to the
    435   # stack.
--> 436   result = _run_command(
    437       shell.var_expand(cmd, depth=2), clear_streamed_output=False)
    438   shell.user_ns['_exit_code'] = result.returncode

/usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py in _run_command(cmd, clear_streamed_output)
    161   locale_encoding = locale.getpreferredencoding()
    162   if locale_encoding != _ENCODING:
--> 163     raise NotImplementedError(
    164         'A UTF-8 locale is required. Got {}'.format(locale_encoding))
    165 

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968

comment:4 by Tom Goddard, 3 years ago

This problem is discussed in the following AlphaFold Github issue

https://github.com/deepmind/alphafold/issues/483

The Dec 30, 2022 comment on that issue works around the problem by avoiding Google Colab shell magic ("!") and instead performing the shell operations (zip, cp) with Python library calls. I will try that work-around.
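A sketch of that work-around, using the filename patterns from the traceback above (hypothetical helper, not the actual ChimeraX script): the two cp shell-magic calls in download_results() could be replaced with glob and shutil, which run in-process and never reach Colab's locale check.

```python
# Hypothetical rewrite of download_results() without Colab shell magic.
# glob + shutil run in-process, so the UTF-8 locale check in
# google/colab/_system_commands.py is never reached.
import glob
import shutil

def download_results(energy_minimize):
    relax = 'relaxed' if energy_minimize else 'unrelaxed'
    # Equivalent of: !cp -p *_{relax}_rank_1_model_*.pdb best_model.pdb
    for src in glob.glob(f'*_{relax}_rank_1_model_*.pdb'):
        shutil.copy2(src, 'best_model.pdb')  # copy2 preserves metadata, like cp -p
    # Equivalent of: !cp -p *_unrelaxed_rank_1_model_*_scores.json best_model_pae.json
    for src in glob.glob('*_unrelaxed_rank_1_model_*_scores.json'):
        shutil.copy2(src, 'best_model_pae.json')
```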

comment:5 by Tom Goddard, 3 years ago

I tried converting the download shell magic to Python, but the ChimeraX Google Colab AlphaFold script uses shell magic in many places, so that fix would require changing a lot of code.

So instead I tried a different fix: redefining locale.getpreferredencoding() to return 'UTF-8' whenever it is not already returning that value. This is an ugly fix, but I was not able to track down how OpenMM changes the preferred encoding. Python implements getpreferredencoding() as a C library call. The only environment variable set that seems to affect it is LANG=en_US.UTF-8; LC_ALL is not set. While this fix repairs Google Colab shell magic, the Python open() function and pathlib.Path.open() used by ColabFold still get encoding = 'ANSI_X3.4-1968', and a second run of AlphaFold in the same Google Colab session fails while attempting to write citations to a file:

Please cite ColabFold: Making protein folding accessible to all. Nature Methods (2022) if you use these predictions.
2023-01-18 03:34:54,527 Starting prediction on 2023-01-18 UTC time
2023-01-18 03:34:54,527 Installing ColabFold on Google Colab virtual machine.
Using Tesla T4 graphics processor
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-c96829b279cc> in <module>
    425 remove_from_list(seq_list, 'prokaryote')  # Obsolete "prokaryote" flag
    426 
--> 427 run_prediction(seq_list, use_templates = use_templates, energy_minimize = not dont_minimize)

2 frames
<ipython-input-2-c96829b279cc> in run_prediction(sequences, job_name, msa_mode, pair_mode, use_templates, custom_template_path, energy_minimize, model_type, num_recycles, dpi, install_log)
     77 
     78     from colabfold.batch import run
---> 79     run(
     80       queries=queries,
     81       result_dir='.',

/usr/local/lib/python3.8/dist-packages/colabfold/batch.py in run(queries, result_dir, num_models, num_recycles, model_order, is_complex, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, use_amber, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, stop_at_score, recompile_padding, recompile_all_models, zip_results, prediction_callback, save_single_representations, save_pair_representations, training, use_gpu_relax, stop_at_score_below, dpi, max_msa)
   1270     )
   1271 
-> 1272     bibtex_file = write_bibtex(
   1273         model_type, use_msa, use_env, use_templates, use_amber, result_dir
   1274     )

/usr/local/lib/python3.8/dist-packages/colabfold/citations.py in write_bibtex(model, use_msa, use_env, use_templates, use_amber, result_dir, bibtex_file)
    129     with bibtex_file.open("w") as writer:
    130         for i in to_cite:
--> 131             writer.write(citations[i])
    132             writer.write("\n")
    133 

UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 53: ordinal not in range(128)

So it looks like I actually have to figure out how the default encoding is being changed and prevent that.
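A minimal sketch of the getpreferredencoding() override described in this comment (assumed shape of the patch; as noted above, it fixes Colab shell magic but not the default encoding used by open()):

```python
# Monkey-patch locale.getpreferredencoding() so Colab's shell-magic
# check sees UTF-8 even when the C library reports an ANSI codeset.
# Note: this does NOT change the default encoding used by open().
import locale

_original_getpreferredencoding = locale.getpreferredencoding

def _force_utf8(do_setlocale=True):
    enc = _original_getpreferredencoding(do_setlocale)
    # Pass UTF-8 through unchanged; replace anything else with 'UTF-8'.
    return enc if enc.upper().replace('-', '') == 'UTF8' else 'UTF-8'

locale.getpreferredencoding = _force_utf8
```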

Last edited 3 years ago by Tom Goddard (previous) (diff)

comment:6 by Tom Goddard, 3 years ago

Summary: AlphaFold: NotImplementedError: A UTF-8 locale is required → AlphaFold: A UTF-8 locale is required. Got ANSI_X3.4-1968

comment:7 by Tom Goddard, 3 years ago

In Python 3.8, locale.getpreferredencoding() calls _locale.nl_langinfo(_locale.CODESET), which is implemented in C and makes the C library call nl_langinfo(CODESET). That C library call indeed returns ANSI_X3.4-1968 (the official name for ASCII) after running a prediction with minimization; predictions without minimization have it return the correct 'UTF-8'. Hours of testing and study of the C library locale documentation did not reveal how this could be. The setlocale(LC_ALL, "") C library call should copy the locale from the environment variables, which give LANG=en_US.UTF-8, but the nl_langinfo() Python call still reports ANSI. I also tried _locale.setlocale(_locale.LC_CTYPE, "en_US.UTF-8"), which completed without error, but the encoding was still reported as ANSI. A separate C program, compiled and run in the Google Colab terminal of the broken Colab session, returned "UTF-8", as shown here.

#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    setlocale(LC_CTYPE, "");

    printf("%d\n", CODESET);
    printf("%s\n", nl_langinfo(CODESET));

    exit(EXIT_SUCCESS);
}

gcc -o nli nli.c
./nli
14
UTF-8

The Python setlocale() call indicates UTF-8, just like in the unbroken Google Colab runs:

import _locale
_locale.setlocale(_locale.LC_ALL)
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

Using the locale command from the Google Colab terminal indicated nothing wrong

/content# locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The C library locale definition files in /usr/share/i18n/locales have not been modified (all dated April 7 2022).

There was no significant difference in the environment variables between a Google Colab run that works correctly without minimization and one with minimization.

os.environ without minimization, preferred encoding UTF-8:

environ{'SHELL': '/bin/bash',
        'NV_LIBCUBLAS_VERSION': '11.4.1.1043-1',
        'NVIDIA_VISIBLE_DEVICES': 'all',
        '__EGL_VENDOR_LIBRARY_DIRS': '/usr/lib64-nvidia:/usr/share/glvnd/egl_vendor.d/',
        'NV_NVML_DEV_VERSION': '11.2.152-1',
        'NV_CUDNN_PACKAGE_NAME': 'libcudnn8',
        'GLIBCXX_FORCE_NEW': '1',
        'CGROUP_MEMORY_EVENTS': '/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events',
        'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.8.4-1+cuda11.2',
        'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.8.4-1',
        'VM_GCE_METADATA_HOST': '169.254.169.253',
        'HOSTNAME': '3eda09ac2179',
        'TBE_RUNTIME_ADDR': '172.28.0.1:8011',
        'GCE_METADATA_TIMEOUT': '3',
        'NVIDIA_REQUIRE_CUDA': 'cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451',
        'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-11-2=11.4.1.1043-1',
        'NV_NVTX_VERSION': '11.2.152-1',
        'NV_CUDA_CUDART_DEV_VERSION': '11.2.152-1',
        'NV_LIBCUSPARSE_VERSION': '11.4.1.1152-1',
        'NV_LIBNPP_VERSION': '11.3.2.152-1',
        'NCCL_VERSION': '2.8.4-1',
        'KMP_LISTEN_PORT': '6000',
        'TF_FORCE_GPU_ALLOW_GROWTH': 'true',
        'ENV': '/root/.bashrc',
        'PWD': '/',
        'TBE_EPHEM_CREDS_ADDR': '172.28.0.1:8009',
        'TBE_CREDS_ADDR': '172.28.0.1:8008',
        'NV_CUDNN_PACKAGE': 'libcudnn8=8.1.1.33-1+cuda11.2',
        'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility',
        'LAST_FORCED_REBUILD': '20221207',
        'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-11-2=11.2.152-1',
        'NV_LIBNPP_PACKAGE': 'libnpp-11-2=11.3.2.152-1',
        'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev',
        'TCLLIBPATH': '/usr/share/tcltk/tcllib1.19',
        'GLIBCPP_FORCE_NEW': '1',
        'NV_LIBCUBLAS_DEV_VERSION': '11.4.1.1043-1',
        'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-11-2',
        'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4',
        'NV_CUDA_CUDART_VERSION': '11.2.152-1',
        'HOME': '/root',
        'LANG': 'en_US.UTF-8',
        'CUDA_VERSION': '11.2.2',
        'CLOUDSDK_CONFIG': '/content/.config',
        'NV_LIBCUBLAS_PACKAGE': 'libcublas-11-2=11.4.1.1043-1',
        'COLAB_RELEASE_TAG': 'release-colab-20230117-060106-RC00',
        'KMP_TARGET_PORT': '9000',
        'KMP_EXTRA_ARGS': '--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https://colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-2cxtzs57e8ahs --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true',
        'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-11-2=11.3.2.152-1',
        'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-11-2',
        'CLOUDSDK_PYTHON': 'python3',
        'NV_LIBNPP_DEV_VERSION': '11.3.2.152-1',
        'NO_GCE_CHECK': 'False',
        'PYTHONPATH': '/env/python',
        'NV_LIBCUSPARSE_DEV_VERSION': '11.4.1.1152-1',
        'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs',
        'NV_CUDNN_VERSION': '8.1.1.33',
        'SHLVL': '0',
        'NV_CUDA_LIB_VERSION': '11.2.2-1',
        'NVARCH': 'x86_64',
        'DATALAB_SETTINGS_OVERRIDES': '{"kernelManagerProxyPort":6000,"kernelManagerProxyHost":"172.28.0.12","jupyterArgs":["--ip=172.28.0.12","--transport=ipc"],"debugAdapterMultiplexerPath":"/usr/local/bin/dap_multiplexer","enableLsp":true}',
        'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.1.1.33-1+cuda11.2',
        'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-11-2',
        'NV_LIBNCCL_PACKAGE': 'libnccl2=2.8.4-1+cuda11.2',
        'LD_LIBRARY_PATH': '/usr/lib64-nvidia',
        'GCS_READ_CACHE_BLOCK_SIZE_MB': '16',
        'NV_NVPROF_VERSION': '11.2.152-1',
        'PATH': '/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin',
        'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2',
        'NV_LIBNCCL_PACKAGE_VERSION': '2.8.4-1',
        'PYTHONWARNINGS': 'ignore:::pip._internal.cli.base_command',
        'DEBIAN_FRONTEND': 'noninteractive',
        'COLAB_BACKEND_VERSION': 'next',
        'OLDPWD': '/',
        'JPY_PARENT_PID': '86',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'PAGER': 'cat',
        'GIT_PAGER': 'cat',
        'MPLBACKEND': 'module://ipykernel.pylab.backend_inline',
        'ENABLE_DIRECTORYPREFETCHER': '1',
        'USE_AUTH_EPHEM': '1',
        'PYDEVD_USE_FRAME_EVAL': 'NO',
        'TF_FORCE_UNIFIED_MEMORY': '1',
        'XLA_PYTHON_CLIENT_MEM_FRACTION': '2.0',
        'TF_CPP_MIN_LOG_LEVEL': '1',
        'TF2_BEHAVIOR': '1'}
os.environ after run with minimization, preferred encoding ANSI_X3.4-1968:

environ{'SHELL': '/bin/bash',
        'NV_LIBCUBLAS_VERSION': '11.4.1.1043-1',
        'NVIDIA_VISIBLE_DEVICES': 'all',
        '__EGL_VENDOR_LIBRARY_DIRS': '/usr/lib64-nvidia:/usr/share/glvnd/egl_vendor.d/',
        'NV_NVML_DEV_VERSION': '11.2.152-1',
        'NV_CUDNN_PACKAGE_NAME': 'libcudnn8',
        'GLIBCXX_FORCE_NEW': '1',
        'CGROUP_MEMORY_EVENTS': '/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events',
        'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.8.4-1+cuda11.2',
        'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.8.4-1',
        'VM_GCE_METADATA_HOST': '169.254.169.253',
        'HOSTNAME': 'e42b2f982929',
        'TBE_RUNTIME_ADDR': '172.28.0.1:8011',
        'GCE_METADATA_TIMEOUT': '3',
        'NVIDIA_REQUIRE_CUDA': 'cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451',
        'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-11-2=11.4.1.1043-1',
        'NV_NVTX_VERSION': '11.2.152-1',
        'NV_CUDA_CUDART_DEV_VERSION': '11.2.152-1',
        'NV_LIBCUSPARSE_VERSION': '11.4.1.1152-1',
        'NV_LIBNPP_VERSION': '11.3.2.152-1',
        'NCCL_VERSION': '2.8.4-1',
        'KMP_LISTEN_PORT': '6000',
        'TF_FORCE_GPU_ALLOW_GROWTH': 'true',
        'ENV': '/root/.bashrc',
        'PWD': '/',
        'TBE_EPHEM_CREDS_ADDR': '172.28.0.1:8009',
        'TBE_CREDS_ADDR': '172.28.0.1:8008',
        'NV_CUDNN_PACKAGE': 'libcudnn8=8.1.1.33-1+cuda11.2',
        'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility',
        'LAST_FORCED_REBUILD': '20221207',
        'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-11-2=11.2.152-1',
        'NV_LIBNPP_PACKAGE': 'libnpp-11-2=11.3.2.152-1',
        'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev',
        'TCLLIBPATH': '/usr/share/tcltk/tcllib1.19',
        'GLIBCPP_FORCE_NEW': '1',
        'NV_LIBCUBLAS_DEV_VERSION': '11.4.1.1043-1',
        'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-11-2',
        'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4',
        'NV_CUDA_CUDART_VERSION': '11.2.152-1',
        'HOME': '/root',
        'LANG': 'en_US.UTF-8',
        'CUDA_VERSION': '11.2.2',
        'CLOUDSDK_CONFIG': '/content/.config',
        'NV_LIBCUBLAS_PACKAGE': 'libcublas-11-2=11.4.1.1043-1',
        'COLAB_RELEASE_TAG': 'release-colab-20230117-060106-RC00',
        'KMP_TARGET_PORT': '9000',
        'KMP_EXTRA_ARGS': '--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https://colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-1azqgql3u2gw3 --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true',
        'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-11-2=11.3.2.152-1',
        'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-11-2',
        'CLOUDSDK_PYTHON': 'python3',
        'NV_LIBNPP_DEV_VERSION': '11.3.2.152-1',
        'NO_GCE_CHECK': 'False',
        'PYTHONPATH': '/env/python',
        'NV_LIBCUSPARSE_DEV_VERSION': '11.4.1.1152-1',
        'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs',
        'NV_CUDNN_VERSION': '8.1.1.33',
        'SHLVL': '0',
        'NV_CUDA_LIB_VERSION': '11.2.2-1',
        'NVARCH': 'x86_64',
        'DATALAB_SETTINGS_OVERRIDES': '{"kernelManagerProxyPort":6000,"kernelManagerProxyHost":"172.28.0.12","jupyterArgs":["--ip=172.28.0.12","--transport=ipc"],"debugAdapterMultiplexerPath":"/usr/local/bin/dap_multiplexer","enableLsp":true}',
        'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.1.1.33-1+cuda11.2',
        'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-11-2',
        'NV_LIBNCCL_PACKAGE': 'libnccl2=2.8.4-1+cuda11.2',
        'LD_LIBRARY_PATH': '/usr/lib64-nvidia',
        'GCS_READ_CACHE_BLOCK_SIZE_MB': '16',
        'NV_NVPROF_VERSION': '11.2.152-1',
        'PATH': '/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin',
        'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2',
        'NV_LIBNCCL_PACKAGE_VERSION': '2.8.4-1',
        'PYTHONWARNINGS': 'ignore:::pip._internal.cli.base_command',
        'DEBIAN_FRONTEND': 'noninteractive',
        'COLAB_BACKEND_VERSION': 'next',
        'OLDPWD': '/',
        'JPY_PARENT_PID': '88',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'PAGER': 'cat',
        'GIT_PAGER': 'cat',
        'MPLBACKEND': 'module://ipykernel.pylab.backend_inline',
        'ENABLE_DIRECTORYPREFETCHER': '1',
        'USE_AUTH_EPHEM': '1',
        'PYDEVD_USE_FRAME_EVAL': 'NO',
        'TF_FORCE_UNIFIED_MEMORY': '1',
        'XLA_PYTHON_CLIENT_MEM_FRACTION': '2.0',
        'TF_CPP_MIN_LOG_LEVEL': '1',
        'TF2_BEHAVIOR': '1'}

I also tested whether merely importing OpenMM, without running an AlphaFold prediction, causes the encoding to change. I had the script install AlphaFold, Conda, and OpenMM as it normally does when energy minimization is requested, and then return. Then I imported OpenMM (from simtk import openmm). getpreferredencoding() still gave the correct "UTF-8", so additional steps beyond the import are needed to produce the ANSI encoding.

Last edited 3 years ago by Tom Goddard (previous) (diff)

comment:8 by Tom Goddard, 3 years ago

As indicated in the preceding comment, Python returns _locale.nl_langinfo(CODESET) = 'ANSI_X3.4-1968' instead of the expected 'UTF-8', for no reason I could figure out. Based on the documented behavior of the C library calls nl_langinfo() and setlocale() used by Python 3.8, this behavior seems wrong; no doubt I am missing something.

A new hack to work around this problem is to replace _locale.nl_langinfo(CODESET) in Python so that it always returns UTF-8. I have tested that, and it works across multiple AlphaFold prediction runs with minimization in the same Google Colab session. Files opened for writing with the default pathlib and open() calls now correctly get UTF-8 encoding. My previous work-around of monkey-patching locale.getpreferredencoding() did not fix the encoding of those opened files, so only the first AlphaFold prediction worked with that solution.

Last edited 3 years ago by Tom Goddard (previous) (diff)

comment:9 by Tom Goddard, 3 years ago

Resolution: fixed
Status: assigned → closed

Fixed.

I monkey-patched Python's _locale.nl_langinfo(CODESET) to force it to return UTF-8, as I could find no better way to work around this problem. This fixes the error even for multiple AlphaFold runs with energy minimization in the same Google Colab session. There are AlphaFold and ColabFold GitHub issues for the same problem that have not reported any solution:

https://github.com/deepmind/alphafold/issues/483

https://github.com/sokrypton/ColabFold/issues/237
