#8313 closed defect (fixed)
AlphaFold: A UTF-8 locale is required. Got ANSI_X3.4-1968
Reported by: | | Owned by: | Tom Goddard
---|---|---|---
Priority: | normal | Milestone: |
Component: | Structure Prediction | Version: |
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
Dear Developer, I recently ran into a bug that many people have also reported on GitHub, but none of the suggested solutions really fixes it. When predicting a structure with all three options selected (1. Use PDB templates…… 2. Energy-minimize predicted structures 3. Trim fetched structure……), the Colab window reports an error after the prediction finishes: NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968. This causes the automatic download of the result files to fail. Thank you for taking the time to read my email. If you have any solution, please let me and the other people plagued by this bug know.
Change History (9)
comment:1 by , 3 years ago
Component: | Unassigned → Structure Prediction |
---|---|
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | Report a Bug of ChimeraX → AlphaFold: NotImplementedError: A UTF-8 locale is required |
comment:2 by , 3 years ago
I am investigating this AlphaFold error now. It happens when energy minimization is enabled, so the easy way to avoid it for now is to not enable energy minimization. The AlphaFold run completes successfully, but making a zip file from the results fails due to the text encoding error.
The problem appears to be a bug in Google Colab: all shell commands break if the locale that controls the text encoding (UTF-8 vs ANSI) is changed away from UTF-8. And it looks like OpenMM, which does the energy minimization, somehow changes the locale. I suspect this bug was introduced into Google Colab when it updated from Python 3.7 to Python 3.8 last month.
Long ago I saw this same error, but it would only happen after OpenMM failed.
Now it happens even when OpenMM energy minimization succeeds, which might also be caused by an update to the version of OpenMM used by AlphaFold. At any rate, the real problem is that Google Colab cannot execute any shell commands when the text encoding is anything other than UTF-8. That is a bug, but Google is unlikely to fix it any time soon, so I will look for a way to work around it.
comment:3 by , 3 years ago
Here is a copy of the Google Colab error shown in the AlphaFold Run window when energy minimization is enabled. The ChimeraX Python code running on Google Colab tries to package the predicted models using shell commands to copy the files, and it fails in /usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py because Google Colab checks that the locale encoding is UTF-8 and it isn't, apparently because OpenMM changed it.
2023-01-17 20:06:21,275 reranking models by plddt
2023-01-17 20:06:22,725 Done
Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-1-650ce46df63c> in <module>
    411 remove_from_list(seq_list, 'prokaryote')    # Obsolete "prokaryote" flag
    412 
--> 413 run_prediction(seq_list, use_templates = use_templates, energy_minimize = not dont_minimize)

4 frames

<ipython-input-1-650ce46df63c> in run_prediction(sequences, job_name, msa_mode, pair_mode, use_templates, custom_template_path, energy_minimize, model_type, num_recycles, dpi, install_log)
    100 
    101     print('Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold')
--> 102     download_results(energy_minimize)
    103 
    104 # ================================================================================================

<ipython-input-1-650ce46df63c> in download_results(energy_minimize)
    271 def download_results(energy_minimize):
    272     relax = 'relaxed' if energy_minimize else 'unrelaxed'
--> 273     get_ipython().system('cp -p *_{relax}_rank_1_model_*.pdb best_model.pdb')
    274     get_ipython().system('cp -p *_unrelaxed_rank_1_model_*_scores.json best_model_pae.json')
    275 

/usr/local/lib/python3.8/dist-packages/google/colab/_shell.py in system(self, *args, **kwargs)
     93       kwargs.update({'also_return_output': True})
     94 
---> 95     output = _system_commands._system_compat(self, *args, **kwargs)  # pylint:disable=protected-access
     96 
     97     if pip_warn:

/usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py in _system_compat(shell, cmd, also_return_output)
    434   # is expected to call this function, thus adding one level of nesting to the
    435   # stack.
--> 436   result = _run_command(
    437       shell.var_expand(cmd, depth=2), clear_streamed_output=False)
    438   shell.user_ns['_exit_code'] = result.returncode

/usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py in _run_command(cmd, clear_streamed_output)
    161   locale_encoding = locale.getpreferredencoding()
    162   if locale_encoding != _ENCODING:
--> 163     raise NotImplementedError(
    164         'A UTF-8 locale is required. Got {}'.format(locale_encoding))
    165 

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968
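The check that raises the error is the locale.getpreferredencoding() comparison at the bottom of the traceback. It can be reproduced directly in a Colab cell (a diagnostic sketch, not part of the ChimeraX script):

import locale
print(locale.getpreferredencoding())   # 'UTF-8' in a good session, 'ANSI_X3.4-1968' in the broken one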
comment:4 by , 3 years ago
This problem is discussed in the following AlphaFold GitHub issue:
https://github.com/deepmind/alphafold/issues/483
The Dec 30, 2022 comment on that issue works around the problem by avoiding Google Colab shell magic "!" and instead doing the shell operations (zip, cp) with Python library calls. I will try that work-around.
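For reference, a minimal sketch of that kind of replacement for the failing download_results() shell commands shown in the traceback above (not the actual ChimeraX or ColabFold code; the glob patterns are taken from the traceback):

import glob, shutil

def download_results(energy_minimize):
    relax = 'relaxed' if energy_minimize else 'unrelaxed'
    # Copy the top-ranked model and its PAE scores with Python library calls
    # instead of Colab "!" shell magic, which is what raises the locale error.
    for pattern, dest in [(f'*_{relax}_rank_1_model_*.pdb', 'best_model.pdb'),
                          ('*_unrelaxed_rank_1_model_*_scores.json', 'best_model_pae.json')]:
        matches = glob.glob(pattern)
        if matches:
            shutil.copy2(matches[0], dest)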
comment:5 by , 3 years ago
I tried converting the download shell magic to Python, but the ChimeraX Google Colab AlphaFold script uses shell magic in many places, so that fix would require changing a lot of code.
So instead I tried a different fix: I redefined locale.getpreferredencoding() to return 'UTF-8' whenever it is not already returning that value. This is an ugly fix, but I was not able to track down how OpenMM changes the preferred encoding. Python implements getpreferredencoding() as a C library call. The only environment variable set that affects this seems to be LANG=en_US.UTF-8; there is no LC_ALL set. While this work-around fixes Google Colab shell magic, the Python open() function and pathlib.Path.open() used by ColabFold still get encoding 'ANSI_X3.4-1968', and a second run of AlphaFold in the same Google Colab session fails attempting to write citations to a file:
Please cite ColabFold: Making protein folding accessible to all. Nature Methods (2022) if you use these predictions.
2023-01-18 03:34:54,527 Starting prediction on 2023-01-18 UTC time
2023-01-18 03:34:54,527 Installing ColabFold on Google Colab virtual machine.
Using Tesla T4 graphics processor
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-c96829b279cc> in <module>
    425 remove_from_list(seq_list, 'prokaryote')    # Obsolete "prokaryote" flag
    426 
--> 427 run_prediction(seq_list, use_templates = use_templates, energy_minimize = not dont_minimize)

2 frames

<ipython-input-2-c96829b279cc> in run_prediction(sequences, job_name, msa_mode, pair_mode, use_templates, custom_template_path, energy_minimize, model_type, num_recycles, dpi, install_log)
     77 
     78     from colabfold.batch import run
---> 79     run(
     80         queries=queries,
     81         result_dir='.',

/usr/local/lib/python3.8/dist-packages/colabfold/batch.py in run(queries, result_dir, num_models, num_recycles, model_order, is_complex, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, use_amber, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, stop_at_score, recompile_padding, recompile_all_models, zip_results, prediction_callback, save_single_representations, save_pair_representations, training, use_gpu_relax, stop_at_score_below, dpi, max_msa)
   1270         )
   1271 
-> 1272         bibtex_file = write_bibtex(
   1273             model_type, use_msa, use_env, use_templates, use_amber, result_dir
   1274         )

/usr/local/lib/python3.8/dist-packages/colabfold/citations.py in write_bibtex(model, use_msa, use_env, use_templates, use_amber, result_dir, bibtex_file)
    129     with bibtex_file.open("w") as writer:
    130         for i in to_cite:
--> 131             writer.write(citations[i])
    132             writer.write("\n")
    133 

UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 53: ordinal not in range(128)
So it looks like I actually have to figure out how the default encoding is being changed and prevent that.
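For reference, the interim work-around described above amounts to roughly this (a sketch; the code actually added to the Colab script may differ):

import locale
if locale.getpreferredencoding() != 'UTF-8':
    # Force the value that Google Colab's shell magic checks for.
    locale.getpreferredencoding = lambda do_setlocale=True: 'UTF-8'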
comment:6 by , 3 years ago
Summary: | AlphaFold: NotImplementedError: A UTF-8 locale is required → AlphaFold: A UTF-8 locale is required. Got ANSI_X3.4-1968 |
---|---|
comment:7 by , 3 years ago
In Python 3.8, locale.getpreferredencoding() calls _locale.nl_langinfo(_locale.CODESET), which is implemented in C and makes the C library call nl_langinfo(CODESET). That C library call indeed returns ANSI_X3.4-1968 (the official name for ASCII) after running a prediction with minimization; predictions without minimization return the correct 'UTF-8'. Hours of testing and study of the C library locale documentation did not reveal how this could be. The setlocale(LC_ALL, "") C library call should copy the locale from the environment variables, and those give LANG=en_US.UTF-8, yet the nl_langinfo() Python call still gives ANSI. I also tried _locale.setlocale(_locale.LC_CTYPE, "en_US.UTF-8"), which succeeded without error, but the encoding is still reported as ANSI. I also compiled and ran a separate C program in the Google Colab terminal of the broken Colab session, and it returned "UTF-8", as shown here.
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  setlocale(LC_CTYPE, "");
  printf("%d\n", CODESET);
  printf("%s\n", nl_langinfo(CODESET));
  exit(EXIT_SUCCESS);
}

gcc -o nli nli.c
./nli
14
UTF-8
The Python setlocale() call indicates UTF-8, just like in the unbroken Google Colab runs:
import _locale
_locale.setlocale(_locale.LC_ALL)
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
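Condensed into Python, the checks described above look roughly like this (a summary sketch; the output values are those reported in this comment):

import _locale
_locale.setlocale(_locale.LC_CTYPE, "en_US.UTF-8")   # succeeds without error
print(_locale.nl_langinfo(_locale.CODESET))          # still prints ANSI_X3.4-1968 in the broken session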
Running the locale command from the Google Colab terminal indicated nothing wrong:
/content# locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
The C library locale definition files in /usr/share/i18n/locales have not been modified (all dated April 7 2022).
Comparing the environment variables between a Google Colab run that works correctly (without minimization) and a broken run (with minimization) shows no significant difference.
os.environ without minimazation, preferred encoding is UTF-8: environ{'SHELL': '/bin/bash', 'NV_LIBCUBLAS_VERSION': '11.4.1.1043-1', 'NVIDIA_VISIBLE_DEVICES': 'all', '__EGL_VENDOR_LIBRARY_DIRS': '/usr/lib64-nvidia:/usr/share/glvnd/egl_vendor.d/', 'NV_NVML_DEV_VERSION': '11.2.152-1', 'NV_CUDNN_PACKAGE_NAME': 'libcudnn8', 'GLIBCXX_FORCE_NEW': '1', 'CGROUP_MEMORY_EVENTS': '/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events', 'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.8.4-1+cuda11.2', 'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.8.4-1', 'VM_GCE_METADATA_HOST': '169.254.169.253', 'HOSTNAME': '3eda09ac2179', 'TBE_RUNTIME_ADDR': '172.28.0.1:8011', 'GCE_METADATA_TIMEOUT': '3', 'NVIDIA_REQUIRE_CUDA': 'cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451', 'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-11-2=11.4.1.1043-1', 'NV_NVTX_VERSION': '11.2.152-1', 'NV_CUDA_CUDART_DEV_VERSION': '11.2.152-1', 'NV_LIBCUSPARSE_VERSION': '11.4.1.1152-1', 'NV_LIBNPP_VERSION': '11.3.2.152-1', 'NCCL_VERSION': '2.8.4-1', 'KMP_LISTEN_PORT': '6000', 'TF_FORCE_GPU_ALLOW_GROWTH': 'true', 'ENV': '/root/.bashrc', 'PWD': '/', 'TBE_EPHEM_CREDS_ADDR': '172.28.0.1:8009', 'TBE_CREDS_ADDR': '172.28.0.1:8008', 'NV_CUDNN_PACKAGE': 'libcudnn8=8.1.1.33-1+cuda11.2', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'LAST_FORCED_REBUILD': '20221207', 'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-11-2=11.2.152-1', 'NV_LIBNPP_PACKAGE': 'libnpp-11-2=11.3.2.152-1', 'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev', 'TCLLIBPATH': '/usr/share/tcltk/tcllib1.19', 'GLIBCPP_FORCE_NEW': '1', 'NV_LIBCUBLAS_DEV_VERSION': '11.4.1.1043-1', 'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-11-2', 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4', 'NV_CUDA_CUDART_VERSION': '11.2.152-1', 'HOME': '/root', 'LANG': 'en_US.UTF-8', 'CUDA_VERSION': '11.2.2', 'CLOUDSDK_CONFIG': '/content/.config', 'NV_LIBCUBLAS_PACKAGE': 'libcublas-11-2=11.4.1.1043-1', 'COLAB_RELEASE_TAG': 'release-colab-20230117-060106-RC00', 'KMP_TARGET_PORT': '9000', 'KMP_EXTRA_ARGS': '--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https://colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-2cxtzs57e8ahs --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true', 'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-11-2=11.3.2.152-1', 'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-11-2', 'CLOUDSDK_PYTHON': 'python3', 'NV_LIBNPP_DEV_VERSION': '11.3.2.152-1', 'NO_GCE_CHECK': 'False', 'PYTHONPATH': '/env/python', 'NV_LIBCUSPARSE_DEV_VERSION': '11.4.1.1152-1', 'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs', 'NV_CUDNN_VERSION': '8.1.1.33', 'SHLVL': '0', 'NV_CUDA_LIB_VERSION': '11.2.2-1', 'NVARCH': 'x86_64', 'DATALAB_SETTINGS_OVERRIDES': '{"kernelManagerProxyPort":6000,"kernelManagerProxyHost":"172.28.0.12","jupyterArgs":["--ip=172.28.0.12","--transport=ipc"],"debugAdapterMultiplexerPath":"/usr/local/bin/dap_multiplexer","enableLsp":true}', 'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.1.1.33-1+cuda11.2', 'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-11-2', 'NV_LIBNCCL_PACKAGE': 'libnccl2=2.8.4-1+cuda11.2', 'LD_LIBRARY_PATH': '/usr/lib64-nvidia', 'GCS_READ_CACHE_BLOCK_SIZE_MB': '16', 'NV_NVPROF_VERSION': '11.2.152-1', 'PATH': '/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin', 'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2', 
'NV_LIBNCCL_PACKAGE_VERSION': '2.8.4-1', 'PYTHONWARNINGS': 'ignore:::pip._internal.cli.base_command', 'DEBIAN_FRONTEND': 'noninteractive', 'COLAB_BACKEND_VERSION': 'next', 'OLDPWD': '/', 'JPY_PARENT_PID': '86', 'TERM': 'xterm-color', 'CLICOLOR': '1', 'PAGER': 'cat', 'GIT_PAGER': 'cat', 'MPLBACKEND': 'module://ipykernel.pylab.backend_inline', 'ENABLE_DIRECTORYPREFETCHER': '1', 'USE_AUTH_EPHEM': '1', 'PYDEVD_USE_FRAME_EVAL': 'NO', 'TF_FORCE_UNIFIED_MEMORY': '1', 'XLA_PYTHON_CLIENT_MEM_FRACTION': '2.0', 'TF_CPP_MIN_LOG_LEVEL': '1', 'TF2_BEHAVIOR': '1'}
os.environ after run with minimization, preferred encoding ANSI_X3.4-1968: environ{'SHELL': '/bin/bash', 'NV_LIBCUBLAS_VERSION': '11.4.1.1043-1', 'NVIDIA_VISIBLE_DEVICES': 'all', '__EGL_VENDOR_LIBRARY_DIRS': '/usr/lib64-nvidia:/usr/share/glvnd/egl_vendor.d/', 'NV_NVML_DEV_VERSION': '11.2.152-1', 'NV_CUDNN_PACKAGE_NAME': 'libcudnn8', 'GLIBCXX_FORCE_NEW': '1', 'CGROUP_MEMORY_EVENTS': '/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events', 'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.8.4-1+cuda11.2', 'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.8.4-1', 'VM_GCE_METADATA_HOST': '169.254.169.253', 'HOSTNAME': 'e42b2f982929', 'TBE_RUNTIME_ADDR': '172.28.0.1:8011', 'GCE_METADATA_TIMEOUT': '3', 'NVIDIA_REQUIRE_CUDA': 'cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451', 'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-11-2=11.4.1.1043-1', 'NV_NVTX_VERSION': '11.2.152-1', 'NV_CUDA_CUDART_DEV_VERSION': '11.2.152-1', 'NV_LIBCUSPARSE_VERSION': '11.4.1.1152-1', 'NV_LIBNPP_VERSION': '11.3.2.152-1', 'NCCL_VERSION': '2.8.4-1', 'KMP_LISTEN_PORT': '6000', 'TF_FORCE_GPU_ALLOW_GROWTH': 'true', 'ENV': '/root/.bashrc', 'PWD': '/', 'TBE_EPHEM_CREDS_ADDR': '172.28.0.1:8009', 'TBE_CREDS_ADDR': '172.28.0.1:8008', 'NV_CUDNN_PACKAGE': 'libcudnn8=8.1.1.33-1+cuda11.2', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'LAST_FORCED_REBUILD': '20221207', 'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-11-2=11.2.152-1', 'NV_LIBNPP_PACKAGE': 'libnpp-11-2=11.3.2.152-1', 'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev', 'TCLLIBPATH': '/usr/share/tcltk/tcllib1.19', 'GLIBCPP_FORCE_NEW': '1', 'NV_LIBCUBLAS_DEV_VERSION': '11.4.1.1043-1', 'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-11-2', 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4', 'NV_CUDA_CUDART_VERSION': '11.2.152-1', 'HOME': '/root', 'LANG': 'en_US.UTF-8', 'CUDA_VERSION': '11.2.2', 'CLOUDSDK_CONFIG': '/content/.config', 'NV_LIBCUBLAS_PACKAGE': 'libcublas-11-2=11.4.1.1043-1', 'COLAB_RELEASE_TAG': 'release-colab-20230117-060106-RC00', 'KMP_TARGET_PORT': '9000', 'KMP_EXTRA_ARGS': '--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https://colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-1azqgql3u2gw3 --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true', 'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-11-2=11.3.2.152-1', 'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-11-2', 'CLOUDSDK_PYTHON': 'python3', 'NV_LIBNPP_DEV_VERSION': '11.3.2.152-1', 'NO_GCE_CHECK': 'False', 'PYTHONPATH': '/env/python', 'NV_LIBCUSPARSE_DEV_VERSION': '11.4.1.1152-1', 'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs', 'NV_CUDNN_VERSION': '8.1.1.33', 'SHLVL': '0', 'NV_CUDA_LIB_VERSION': '11.2.2-1', 'NVARCH': 'x86_64', 'DATALAB_SETTINGS_OVERRIDES': '{"kernelManagerProxyPort":6000,"kernelManagerProxyHost":"172.28.0.12","jupyterArgs":["--ip=172.28.0.12","--transport=ipc"],"debugAdapterMultiplexerPath":"/usr/local/bin/dap_multiplexer","enableLsp":true}', 'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.1.1.33-1+cuda11.2', 'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-11-2', 'NV_LIBNCCL_PACKAGE': 'libnccl2=2.8.4-1+cuda11.2', 'LD_LIBRARY_PATH': '/usr/lib64-nvidia', 'GCS_READ_CACHE_BLOCK_SIZE_MB': '16', 'NV_NVPROF_VERSION': '11.2.152-1', 'PATH': '/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin', 'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2', 
'NV_LIBNCCL_PACKAGE_VERSION': '2.8.4-1', 'PYTHONWARNINGS': 'ignore:::pip._internal.cli.base_command', 'DEBIAN_FRONTEND': 'noninteractive', 'COLAB_BACKEND_VERSION': 'next', 'OLDPWD': '/', 'JPY_PARENT_PID': '88', 'TERM': 'xterm-color', 'CLICOLOR': '1', 'PAGER': 'cat', 'GIT_PAGER': 'cat', 'MPLBACKEND': 'module://ipykernel.pylab.backend_inline', 'ENABLE_DIRECTORYPREFETCHER': '1', 'USE_AUTH_EPHEM': '1', 'PYDEVD_USE_FRAME_EVAL': 'NO', 'TF_FORCE_UNIFIED_MEMORY': '1', 'XLA_PYTHON_CLIENT_MEM_FRACTION': '2.0', 'TF_CPP_MIN_LOG_LEVEL': '1', 'TF2_BEHAVIOR': '1'}
I also tested whether just importing OpenMM, without running an AlphaFold prediction, causes the encoding to change. I had it install AlphaFold, Conda, and OpenMM as it normally does when energy minimization is requested, and then return. Then I imported OpenMM (from simtk import openmm). getpreferredencoding() still gave the correct "UTF-8". So it takes additional steps to get the ANSI encoding.
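The import-only test, roughly (a sketch of the check described above; the AlphaFold/Conda/OpenMM installation steps are omitted):

from simtk import openmm   # OpenMM installed as the minimization setup installs it
import locale
print(locale.getpreferredencoding())   # still 'UTF-8' right after the import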
comment:8 by , 3 years ago
As indicated in the preceding comment, Python returns _locale.nl_langinfo(CODESET) = 'ANSI_X3.4-1968' instead of the expected 'UTF-8' for no reason I could figure out. Based on the documented behavior of the C library calls nl_langinfo() and setlocale() that Python 3.8 uses, this behavior seems wrong. No doubt I am missing something.
A new hack to work around this problem is to replace _locale.nl_langinfo(CODESET) in Python so that it always returns UTF-8. I have tested that, and it seems to work across multiple AlphaFold prediction runs with minimization in the same Google Colab session. Files opened for writing with the default pathlib and open() encodings now correctly get UTF-8. My previous work-around of monkey patching locale.getpreferredencoding() did not fix the encoding of those opened files, so only the first AlphaFold prediction worked with that solution.
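A sketch of that kind of patch, assuming Python 3.8 on Linux as in the Colab runtime (the code actually added to the ChimeraX Colab script may differ in detail):

import _locale
import locale

_original_nl_langinfo = _locale.nl_langinfo

def _nl_langinfo_utf8(item):
    # Always report UTF-8 for the CODESET query; pass other queries through.
    if item == _locale.CODESET:
        return 'UTF-8'
    return _original_nl_langinfo(item)

# Patch the C-module attribute (consulted by the default open()/pathlib.Path.open()
# encoding machinery) and the copy bound inside the locale module at import time
# (consulted by locale.getpreferredencoding(), which Colab's shell magic checks).
_locale.nl_langinfo = _nl_langinfo_utf8
locale.nl_langinfo = _nl_langinfo_utf8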
comment:9 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed.
I monkeypatched Python's _locale.nl_langinfo(CODESET) to force it to return UTF-8 to work around this problem; I could find no better solution. This fixes it even for multiple AlphaFold runs with energy minimization in the same Google Colab session. There are AlphaFold and ColabFold GitHub issues for the same problem that have not reported any solution.