Opened 3 years ago

Closed 3 years ago

#6959 closed defect (fixed)

AlphaFold prediction energy OpenMM minimization always fails

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone:
Component: Structure Prediction
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Tomas Fernandez noted in a YouTube comment on a ChimeraX AlphaFold video that AlphaFold runs that used to work are now failing in energy minimization.

Tomas writes:

"Nevertheless, and in order to confirm that this error is dependent on the predicted structure, I tried to predict the structure of a 70 amino acid protein whose model has been calculated several times through AlphaFold in classes by many students, and the error still pops up: ValueError Traceback (most recent call last) in () 692 seq_list = seq_list[1:] 693 --> 694 run_prediction(seq_list, energy_minimize = not dont_minimize) 5 frames /usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in _run_one_iteration(pdb_string, max_iterations, tolerance, stiffness, restraint_set, max_attempts, use_gpu, exclude_residues) 417 logging.info(e) 418 if not minimized: --> 419 raise ValueError(f"Minimization failed after {max_attempts} attempts.") 420 ret["opt_time"] = time.time() - start 421 ret["min_attempts"] = attempts ValueError: Minimization failed after 100 attempts."

I also tested the 128 amino acid 7mrx chain A, and it failed in the same way, although it minimizes correctly with the full AlphaFold installation on minsky.

So it looks like OpenMM is somehow broken on Google Colab now. ChimeraX installs fixed versions of AlphaFold, OpenMM and all dependencies, but possibly some implicit dependency without a specified version number was updated and broke things. Another possibility is that Google Colab changed, perhaps by updating its CUDA version, and that broke OpenMM.

Change History (12)

comment:1 by Tom Goddard, 3 years ago

Here is the output of a ChimeraX AlphaFold run showing the failure:

Using AlphaFold 2.2.0
Sequence length 128
Have Colab GPU runtime
Installing HMMER for computing sequence alignments
Installing matplotlib to plot sequence alignment coverage
Installing AlphaFold
Installing OpenMM for structure energy minimization
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Finding fastest mirror for sequence databases using europe
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Merging chunk sequence alignments for uniref90
Searching smallbfd sequence database, 17 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Merging chunk sequence alignments for smallbfd
Searching mgnify sequence database, 71 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71
Merging chunk sequence alignments for mgnify

Computing structures using 5 AlphaFold parameter sets:
 model_1_ptm model_2_ptm model_3_ptm model_4_ptm model_5_ptm
Energy minimizing best structure model_4_ptm with OpenMM and Amber forcefield
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-7cd5db441766> in <module>()
    692     seq_list = seq_list[1:]
    693 
--> 694 run_prediction(seq_list, energy_minimize = not dont_minimize)

5 frames
<ipython-input-1-7cd5db441766> in run_prediction(sequences, model_names, energy_minimize, output_dir, install_log)
    661     if best_model_name:
    662         if energy_minimize:
--> 663             minimize_best_model(best_model_name, output_dir)
    664         # Copy best model and pae files.
    665         pdb_suffix = '_relaxed.pdb' if energy_minimize else '_unrelaxed.pdb'

<ipython-input-1-7cd5db441766> in minimize_best_model(best_model_name, output_dir)
    528     with open(unrelaxed_path, 'r') as f:
    529         best_unrelaxed_protein = protein.from_pdb_string(f.read())
--> 530         relaxed_pdb = energy_minimize_structure(best_unrelaxed_protein)
    531         # Write out PDB file
    532         write_pdb(relaxed_pdb, best_model_name + '_relaxed.pdb', output_dir)

<ipython-input-1-7cd5db441766> in energy_minimize_structure(pdb_model)
    541         max_outer_iterations=1,
    542         use_gpu=True)
--> 543     relaxed_pdb, _, _ = amber_relaxer.process(prot=pdb_model)
    544     return relaxed_pdb
    545 

/usr/local/lib/python3.7/dist-packages/alphafold/relax/relax.py in process(self, prot)
     64         exclude_residues=self._exclude_residues,
     65         max_outer_iterations=self._max_outer_iterations,
---> 66         use_gpu=self._use_gpu)
     67     min_pos = out['pos']
     68     start_pos = out['posinit']

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, use_gpu, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
    481         restraint_set=restraint_set,
    482         max_attempts=max_attempts,
--> 483         use_gpu=use_gpu)
    484     prot = protein.from_pdb_string(ret["min_pdb"])
    485     if place_hydrogens_every_iteration:

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in _run_one_iteration(pdb_string, max_iterations, tolerance, stiffness, restraint_set, max_attempts, use_gpu, exclude_residues)
    417       logging.info(e)
    418   if not minimized:
--> 419     raise ValueError(f"Minimization failed after {max_attempts} attempts.")
    420   ret["opt_time"] = time.time() - start
    421   ret["min_attempts"] = attempts

ValueError: Minimization failed after 100 attempts.

comment:2 by Tom Goddard, 3 years ago

If I change to use_gpu=False in the AmberRelaxation() call in the Colab Python notebook, the energy minimization succeeds. So the problem may have to do with OpenMM using the GPU.
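One way to keep GPU minimization where it works while avoiding a hard failure is to retry on the CPU. The sketch below assumes a hypothetical `minimize(model, use_gpu=...)` callable standing in for the notebook's minimization step (the real `energy_minimize_structure` in the traceback takes only the model), and `minimize_with_cpu_fallback` is likewise an illustrative name. It catches `ValueError` because that is what AlphaFold's `amber_minimize` raises after all attempts fail.

```python
import logging
from typing import Any, Callable


def minimize_with_cpu_fallback(minimize: Callable[..., Any], model: Any) -> Any:
    """Run minimize(model, use_gpu=True); if it raises ValueError, retry with use_gpu=False.

    `minimize` is a hypothetical stand-in for the notebook's energy-minimization
    function, assumed here to accept a use_gpu keyword argument.
    """
    try:
        return minimize(model, use_gpu=True)
    except ValueError as e:
        # AlphaFold raises "Minimization failed after N attempts." as a ValueError.
        logging.warning('GPU minimization failed (%s); retrying with use_gpu=False', e)
        return minimize(model, use_gpu=False)


# Usage with a fake minimizer that fails on the GPU, as on Colab in this ticket.
def fake_minimize(model, use_gpu):
    if use_gpu:
        raise ValueError('Minimization failed after 100 attempts.')
    return model + '_relaxed'


print(minimize_with_cpu_fallback(fake_minimize, 'model_4_ptm'))  # model_4_ptm_relaxed
```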

comment:3 by Tom Goddard, 3 years ago

Enabling INFO-level logging in AlphaFold with

from absl import logging
logging.set_verbosity(logging.INFO)

shows that OpenMM cannot use CUDA:

Energy minimizing best structure model_4_ptm with OpenMM and Amber forcefield
INFO:absl:alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 127 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
INFO:absl:Minimizing protein, attempt 1 of 100.
INFO:absl:Restraining 1022 / 1991 particles.
INFO:absl:No compatible CUDA device is available
INFO:absl:Minimizing protein, attempt 2 of 100.
INFO:absl:Restraining 1022 / 1991 particles.
INFO:absl:No compatible CUDA device is available
... (the same messages repeat for all 100 attempts, then minimization fails)
Version 0, edited 3 years ago by Tom Goddard

comment:4 by Tom Goddard, 3 years ago

Installing OpenMM with conda pulled in cudatoolkit 11.7.0, which was released 6 days ago. Maybe it is incompatible with the older OpenMM 7.5.1:

/opt/conda/bin/conda install -qy -c conda-forge python=3.7 openmm=7.5.1 pdbfixer
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - openmm=7.5.1
    - pdbfixer
    - python=3.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2022.5.18.1|       ha878542_0         144 KB  conda-forge
    certifi-2022.5.18.1        |   py37h89c1867_0         150 KB  conda-forge
    cffi-1.14.6                |   py37hc58025e_0         225 KB  conda-forge
    colorama-0.4.4             |     pyh9f0ad1d_0          18 KB  conda-forge
    conda-4.12.0               |   py37h89c1867_0         1.0 MB  conda-forge
    conda-package-handling-1.8.1|   py37h540881e_1         1.0 MB  conda-forge
    cryptography-37.0.2        |   py37h38fbfac_0         1.5 MB  conda-forge
    cudatoolkit-11.7.0         |      hd8887f6_10       831.6 MB  conda-forge
    fftw-3.3.10                |nompi_h77c792f_102         6.4 MB  conda-forge
    libblas-3.9.0              |14_linux64_openblas          12 KB  conda-forge
    libcblas-3.9.0             |14_linux64_openblas          12 KB  conda-forge
    libgfortran-ng-12.1.0      |      h69a702a_16          23 KB  conda-forge
    libgfortran5-12.1.0        |      hdcd56e2_16         1.8 MB  conda-forge
    liblapack-3.9.0            |14_linux64_openblas          12 KB  conda-forge
    libopenblas-0.3.20         |pthreads_h78a6416_0        10.1 MB  conda-forge
    numpy-1.21.6               |   py37h976b520_0         6.1 MB  conda-forge
    ocl-icd-2.3.1              |       h7f98852_0         119 KB  conda-forge
    ocl-icd-system-1.0.0       |                1           4 KB  conda-forge
    openmm-7.5.1               |   py37h96c4ddf_1        10.7 MB  conda-forge
    openssl-1.1.1o             |       h166bdaf_0         2.1 MB  conda-forge
    pdbfixer-1.7               |     pyhd3deb0d_0         167 KB  conda-forge
    pip-22.1.1                 |     pyhd8ed1ab_0         1.5 MB  conda-forge
    pycosat-0.6.3              |py37h540881e_1010         107 KB  conda-forge
    pysocks-1.7.1              |   py37h89c1867_5          28 KB  conda-forge
    python-3.7.10              |hffdb5ce_100_cpython        57.3 MB  conda-forge
    python_abi-3.7             |          2_cp37m           4 KB  conda-forge
    ruamel_yaml-0.15.80        |py37h5e8e339_1006         270 KB  conda-forge
    setuptools-62.3.2          |   py37h89c1867_0         1.4 MB  conda-forge
    six-1.16.0                 |     pyh6c4a22f_0          14 KB  conda-forge
    tqdm-4.64.0                |     pyhd8ed1ab_0          81 KB  conda-forge
    urllib3-1.25.8             |   py37hc8dfbb8_1         160 KB  conda-forge
    ------------------------------------------------------------
                                           Total:       934.1 MB

The following NEW packages will be INSTALLED:

  colorama           conda-forge/noarch::colorama-0.4.4-pyh9f0ad1d_0
  cudatoolkit        conda-forge/linux-64::cudatoolkit-11.7.0-hd8887f6_10
  fftw               conda-forge/linux-64::fftw-3.3.10-nompi_h77c792f_102
  libblas            conda-forge/linux-64::libblas-3.9.0-14_linux64_openblas
  libcblas           conda-forge/linux-64::libcblas-3.9.0-14_linux64_openblas
  libgfortran-ng     conda-forge/linux-64::libgfortran-ng-12.1.0-h69a702a_16
  libgfortran5       conda-forge/linux-64::libgfortran5-12.1.0-hdcd56e2_16
  liblapack          conda-forge/linux-64::liblapack-3.9.0-14_linux64_openblas
  libopenblas        conda-forge/linux-64::libopenblas-0.3.20-pthreads_h78a6416_0
  numpy              conda-forge/linux-64::numpy-1.21.6-py37h976b520_0
  ocl-icd            conda-forge/linux-64::ocl-icd-2.3.1-h7f98852_0
  ocl-icd-system     conda-forge/linux-64::ocl-icd-system-1.0.0-1
  openmm             conda-forge/linux-64::openmm-7.5.1-py37h96c4ddf_1
  pdbfixer           conda-forge/noarch::pdbfixer-1.7-pyhd3deb0d_0
  python_abi         conda-forge/linux-64::python_abi-3.7-2_cp37m
  six                conda-forge/noarch::six-1.16.0-pyh6c4a22f_0

The following packages will be REMOVED:

  brotlipy-0.7.0-py39h27cfd23_1003

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2022.4.26-~ --> conda-forge::ca-certificates-2022.5.18.1-ha878542_0
  conda-package-han~ pkgs/main::conda-package-handling-1.8~ --> conda-forge::conda-package-handling-1.8.1-py37h540881e_1
  cryptography       pkgs/main::cryptography-37.0.1-py39h9~ --> conda-forge::cryptography-37.0.2-py37h38fbfac_0
  pip                pkgs/main/linux-64::pip-21.2.4-py39h0~ --> conda-forge/noarch::pip-22.1.1-pyhd8ed1ab_0
  pycosat            pkgs/main::pycosat-0.6.3-py39h27cfd23~ --> conda-forge::pycosat-0.6.3-py37h540881e_1010
  pysocks            pkgs/main::pysocks-1.7.1-py39h06a4308~ --> conda-forge::pysocks-1.7.1-py37h89c1867_5
  setuptools         pkgs/main::setuptools-61.2.0-py39h06a~ --> conda-forge::setuptools-62.3.2-py37h89c1867_0

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            pkgs/main::certifi-2022.5.18.1-py39h0~ --> conda-forge::certifi-2022.5.18.1-py37h89c1867_0
  cffi                pkgs/main::cffi-1.15.0-py39hd667e15_1 --> conda-forge::cffi-1.14.6-py37hc58025e_0
  conda              pkgs/main::conda-4.12.0-py39h06a4308_0 --> conda-forge::conda-4.12.0-py37h89c1867_0
  openssl              pkgs/main::openssl-1.1.1o-h7f8727e_0 --> conda-forge::openssl-1.1.1o-h166bdaf_0
  python                pkgs/main::python-3.9.12-h12debd9_0 --> conda-forge::python-3.7.10-hffdb5ce_100_cpython
  ruamel_yaml        pkgs/main::ruamel_yaml-0.15.100-py39h~ --> conda-forge::ruamel_yaml-0.15.80-py37h5e8e339_1006
  tqdm               pkgs/main/linux-64::tqdm-4.64.0-py39h~ --> conda-forge/noarch::tqdm-4.64.0-pyhd8ed1ab_0
  urllib3            pkgs/main::urllib3-1.26.9-py39h06a430~ --> conda-forge::urllib3-1.25.8-py37hc8dfbb8_1


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

done
+ cd /opt/conda/lib/python3.7/site-packages/
+ patch -p0
patching file simtk/openmm/app/topology.py
Hunk #1 succeeded at 353 (offset -3 lines).
+ ln -s /opt/conda/lib/python3.7/site-packages/simtk .
+ ln -s /opt/conda/lib/python3.7/site-packages/pdbfixer .
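Because conda chooses the cudatoolkit version silently inside the package plan, it can help to check the captured install output for it. A minimal sketch, assuming the conda output is available as a string (for example read from the install log); the function name and regular expression are illustrative, not part of the ChimeraX install script.

```python
import re
from typing import Optional

# Matches package specs such as "cudatoolkit-11.7.0-hd8887f6_10" or "cudatoolkit-11.7.0".
_CUDATOOLKIT_RE = re.compile(r'\bcudatoolkit-(\d+(?:\.\d+)+)')


def planned_cudatoolkit(conda_output: str) -> Optional[str]:
    """Return the first cudatoolkit version mentioned in conda output, or None."""
    match = _CUDATOOLKIT_RE.search(conda_output)
    return match.group(1) if match else None


line = '  cudatoolkit        conda-forge/linux-64::cudatoolkit-11.7.0-hd8887f6_10'
print(planned_cudatoolkit(line))  # 11.7.0
```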
Last edited 3 years ago by Tom Goddard

comment:5 by Tom Goddard, 3 years ago

I tried installing cudatoolkit 11.6.0 (released 4 months ago) with conda instead of cudatoolkit 11.7.0, but got the same OpenMM error "No compatible CUDA device is available".

comment:6 by Tom Goddard, 3 years ago

Full AlphaFold has use_gpu_relax false by default, but if I turn it on and run 7mrx chain A on minsky, it minimizes without errors. So the problem appears to be with OpenMM using the GPU on Google Colab: something is wrong with the GPU CUDA configuration on Colab.

comment:7 by Tom Goddard, 3 years ago

AlphaFold's standard Google Colab notebook (provided by DeepMind) also fails in OpenMM minimization in the same way. This test was also run on my Colab Pro account, with an Nvidia P100 GPU.

AMBER relaxation: 86%
6/7 [elapsed: 26:10 remaining: 04:04]
/usr/local/lib/python3.7/dist-packages/jax/_src/tree_util.py:189: FutureWarning: jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() instead as a drop-in replacement.
  'instead as a drop-in replacement.', FutureWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-f89616eedc2d> in <module>()
     88         max_outer_iterations=3,
     89         use_gpu=True)
---> 90     relaxed_pdb, _, _ = amber_relaxer.process(prot=unrelaxed_proteins[best_model_name])
     91   else:
     92     print('Warning: Running without the relaxation stage.')

2 frames
/opt/conda/lib/python3.7/site-packages/alphafold/relax/relax.py in process(self, prot)
     64         exclude_residues=self._exclude_residues,
     65         max_outer_iterations=self._max_outer_iterations,
---> 66         use_gpu=self._use_gpu)
     67     min_pos = out['pos']
     68     start_pos = out['posinit']

/opt/conda/lib/python3.7/site-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, use_gpu, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
    481         restraint_set=restraint_set,
    482         max_attempts=max_attempts,
--> 483         use_gpu=use_gpu)
    484     prot = protein.from_pdb_string(ret["min_pdb"])
    485     if place_hydrogens_every_iteration:

/opt/conda/lib/python3.7/site-packages/alphafold/relax/amber_minimize.py in _run_one_iteration(pdb_string, max_iterations, tolerance, stiffness, restraint_set, max_attempts, use_gpu, exclude_residues)
    417       logging.info(e)
    418   if not minimized:
--> 419     raise ValueError(f"Minimization failed after {max_attempts} attempts.")
    420   ret["opt_time"] = time.time() - start
    421   ret["min_attempts"] = attempts

ValueError: Minimization failed after 100 attempts.
Last edited 3 years ago by Tom Goddard

comment:8 by Tom Goddard, 3 years ago

Testing whether CUDA works on Google Colab with numba from conda, using the command "numba -s", suggests that it is working, but this is not conclusive. What does "CUDA NVIDIA Bindings Available: False" mean?

# /opt/conda/bin/numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2022-05-26 00:50:04.490246
UTC start time                                : 2022-05-26 00:50:04.490258
Running time (s)                              : 0.661247

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake-avx512
CPU Count                                     : 2
Number of accessible CPUs                     : 2
List of accessible CPUs cores                 : 0-1
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 avx512bw
                                                avx512cd avx512dq avx512f avx512vl
                                                bmi bmi2 clflushopt clwb cmov cx16
                                                cx8 f16c fma fsgsbase fxsr invpcid
                                                lzcnt mmx movbe pclmul popcnt
                                                prfchw rdrnd rdseed rtm sahf sse
                                                sse2 sse3 sse4.1 sse4.2 ssse3
                                                xsave xsavec xsaveopt xsaves

Memory Total (MB)                             : 12986
Memory Available (MB)                         : 2171

__OS Information__
Platform Name                                 : Linux-5.4.188+-x86_64-with-debian-buster-sid
Platform Release                              : 5.4.188+
OS Name                                       : Linux
OS Version                                    : #1 SMP Sun Apr 24 10:03:06 PDT 2022
OS Specific Version                           : ?

CUDA NVIDIA Bindings Available                : False
CUDA NVIDIA Bindings In Use                   : False
CUDA Detect Output:
Found 1 CUDA devices
id 0    b'Tesla P100-PCIE-16GB'                              [SUPPORTED]
                      Compute Capability: 6.0
                           PCI Device ID: 4
                              PCI Bus ID: 0
                                    UUID: GPU-5f0dd9f9-fda9-7793-69c8-720784ac49d6
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 2
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
Finding nvvm from Conda environment
        named  libnvvm.so.4.0.0
        trying to open library...       ok
Finding cudart from Conda environment
        named  libcudart.so.11.6.55
        trying to open library...       ok
Finding cudadevrt from Conda environment
        named  libcudadevrt.a
Finding libdevice from Conda environment
        searching for compute_20...     ok
        searching for compute_30...     ok
        searching for compute_35...     ok
        searching for compute_50...     ok


__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda not available.

__Installed Packages__
Package                Version
---------------------- -----------
brotlipy               0.7.0
certifi                2022.5.18.1
cffi                   1.15.0
charset-normalizer     2.0.4
conda                  4.12.0
conda-package-handling 1.8.1
cryptography           37.0.1
idna                   3.3
llvmlite               0.38.0
numba                  0.55.1
numpy                  1.21.6
OpenMM                 7.5.1
pdbfixer               1.7
pip                    21.2.2
pycosat                0.6.3
pycparser              2.21
pyOpenSSL              22.0.0
PySocks                1.7.1
requests               2.27.1
ruamel-yaml-conda      0.15.100
setuptools             61.2.0
tqdm                   4.64.0
urllib3                1.26.9
wheel                  0.37.1

No errors reported.


__Warning log__
Warning: Conda not available.
 Error was [Errno 2] No such file or directory: 'conda': 'conda'

Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================
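The CUDA part of that report can also be produced from Python inside the notebook. A sketch using numba's `cuda.is_available()` and `cuda.detect()`; the wrapper function name is illustrative. As the OpenMM test in comment:9 shows, a passing numba check does not guarantee that OpenMM's CUDA platform works, since numba saw the P100 while OpenMM did not.

```python
def numba_cuda_report() -> bool:
    """Print numba's view of the CUDA devices; return True if any supported device is found."""
    try:
        from numba import cuda  # numba is an optional dependency here
    except ImportError:
        print('numba is not installed')
        return False
    if not cuda.is_available():
        # numba found no usable CUDA driver or device.
        print('numba reports CUDA is not available')
        return False
    # detect() prints each device with a [SUPPORTED] tag plus a summary,
    # and returns whether any supported device was found.
    return bool(cuda.detect())


print(numba_cuda_report())
```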

comment:9 by Tom Goddard, 3 years ago

Testing OpenMM on Google Colab gives the same "No compatible CUDA device is available" error:

# /opt/conda/bin/python -m simtk.testInstallation

OpenMM Version: 7.5.1
Git Revision: a9cfd7fb9343e21c3dbb76e377c721328830a3ee

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Error computing forces with CUDA platform
4 OpenCL - Error computing forces with OpenCL platform

CUDA platform error: No compatible CUDA device is available

OpenCL platform error: Error initializing context: clCreateContext (-6)

Median difference in forces between platforms:

Reference vs. CPU: 6.29577e-06

All differences are within tolerance.

Testing OpenMM on minsky shows that it can use CUDA:

goddard@minsky:~/ucsf/af/runs$ python -m simtk.testInstallation

OpenMM Version: 7.5.1
Git Revision: a9cfd7fb9343e21c3dbb76e377c721328830a3ee

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.29406e-06
Reference vs. CUDA: 6.73166e-06
CPU vs. CUDA: 7.34172e-07
Reference vs. OpenCL: 6.75475e-06
CPU vs. OpenCL: 8.05102e-07
CUDA vs. OpenCL: 2.71517e-07

All differences are within tolerance.
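A notebook could run a reduced version of that check itself, for example to decide up front whether to request GPU minimization. A sketch using standard OpenMM Python classes (`Platform`, `System`, `VerletIntegrator`, `Context`, `Vec3`); the function name and the one-particle test system are illustrative assumptions, and unlike `simtk.testInstallation` it does not compare forces between platforms.

```python
def check_openmm_platforms() -> dict:
    """Map each OpenMM platform name to None (context created) or its error message."""
    try:
        import openmm  # package name in OpenMM 7.6 and later
    except ImportError:
        try:
            from simtk import openmm  # OpenMM 7.5.x, the version used in this ticket
        except ImportError:
            return {}  # OpenMM is not installed
    results = {}
    for i in range(openmm.Platform.getNumPlatforms()):
        platform = openmm.Platform.getPlatform(i)
        # A single particle with no forces is enough to initialize the platform.
        system = openmm.System()
        system.addParticle(1.0)
        integrator = openmm.VerletIntegrator(0.001)  # each Context needs its own integrator
        try:
            context = openmm.Context(system, integrator, platform)
            context.setPositions([openmm.Vec3(0.0, 0.0, 0.0)])
            context.getState(getForces=True)
            del context
            results[platform.getName()] = None
        except Exception as e:  # e.g. "No compatible CUDA device is available"
            results[platform.getName()] = str(e)
    return results


for name, error in check_openmm_platforms().items():
    print(name, 'OK' if error is None else 'FAILED: ' + error)
```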

comment:10 by Tom Goddard, 3 years ago

Google's AlphaFold Colab notebook also failed on 7mrx chain A when run on an Nvidia T4 with regular Colab instead of Colab Pro (using my other Google account).

comment:11 by Tom Goddard, 3 years ago

I think the problem is that conda has decided to install cudatoolkit version 11.7.0, which is too new for the Google Colab Nvidia driver. Versions 11.6.0, 11.5.0 and 11.3.1 are also too new, but 11.2.2 and 11.0.3 worked. I tested these cudatoolkit versions with OpenMM on Google Colab, without AlphaFold, using the following notebook code to install OpenMM and run the OpenMM test code. nvcc reports the Colab CUDA version as 11.1, and maybe that is supposed to match the conda cudatoolkit version.

def run_shell_commands(commands, filename, install_log):
    '''Write commands to a bash script and run it, appending output to install_log.'''
    with open(filename, 'w') as f:
        f.write(commands)

    # The -x option logs each command with a prompt in front of it.
    !bash -x "{filename}" >> "{install_log}" 2>&1
    # The notebook shell stores the exit status of the last ! command in _exit_code.
    if _exit_code != 0:
        raise RuntimeError('Error running shell script %s, output in log file %s'
                           % (filename, install_log))

def install_openmm(
        conda_install_sh = 'https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh',
        install_log = 'install_log.txt'):
    '''Must install alphafold first since an openmm patch from alphafold is used.'''
    # Install Conda
    import os.path
    conda_install = os.path.join('/tmp', os.path.basename(conda_install_sh))
    cmds = f'''
# Exit if any command fails
set -e

wget -q -P /tmp {conda_install_sh} \
    && bash "{conda_install}" -b -p /opt/conda -f \
    && rm "{conda_install}"

# Install Python, OpenMM and pdbfixer in Conda
/opt/conda/bin/conda update -qy conda && \
    /opt/conda/bin/conda install -qy -c conda-forge python=3.7 openmm=7.5.1 cudatoolkit=11.0.3
'''
    run_shell_commands(cmds, 'install_openmm.sh', install_log)

install_openmm()

!/opt/conda/bin/python -m simtk.testInstallation
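Outside Colab or IPython, the `!bash` shell escape and the `_exit_code` variable used by `run_shell_commands` above are not available. A minimal plain-Python sketch of the same helper using `subprocess.run` in their place; it keeps the notebook's function name, arguments and error message, but the subprocess version is a substitution, not code from this ticket.

```python
import subprocess


def run_shell_commands(commands: str, filename: str, install_log: str) -> None:
    """Write commands to a bash script, run it, and append its output to install_log."""
    with open(filename, 'w') as f:
        f.write(commands)
    with open(install_log, 'a') as log:
        # bash -x echoes each command (prefixed with "+") into the log before running it.
        result = subprocess.run(['bash', '-x', filename],
                                stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        raise RuntimeError('Error running shell script %s, output in log file %s'
                           % (filename, install_log))
```

It can be called exactly the way `install_openmm()` calls the notebook version.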
Last edited 3 years ago by Tom Goddard

comment:12 by Tom Goddard, 3 years ago

Resolution: fixed
Status: assigned → closed

Fixed.

Specified conda cudatoolkit version 11.2 in the Colab OpenMM install script to ensure it is compatible with the Colab CUDA driver.

It is not clear whether a change in Google Colab or in Conda stopped conda from automatically choosing a compatible cudatoolkit version. It is also not clear how long it has been broken.
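The fix pins the version by hand. A sketch of deriving the pin instead, under the assumption (not verified in this ticket) that a conda cudatoolkit works when its major.minor version is no newer than the CUDA version the driver reports in the `CUDA Version:` field of the `nvidia-smi` banner. All function names are illustrative. If the Colab driver reported 11.2, this rule would reproduce the results from comment:11 (11.0.3 and 11.2.2 worked; 11.3.1 and newer failed), whereas the `nvcc` version of 11.1 would have excluded 11.2.2.

```python
import re
from typing import Iterable, Optional, Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    """'11.2.2' -> (11, 2, 2)"""
    return tuple(int(part) for part in version.split('.'))


def driver_cuda_version(nvidia_smi_output: str) -> Optional[str]:
    """Extract X.Y from a 'CUDA Version: X.Y' field in nvidia-smi output, if present."""
    match = re.search(r'CUDA Version:\s*(\d+\.\d+)', nvidia_smi_output)
    return match.group(1) if match else None


def pick_cudatoolkit(available: Iterable[str], driver_cuda: str) -> Optional[str]:
    """Newest toolkit whose major.minor is not newer than the driver's CUDA version."""
    limit = _version_tuple(driver_cuda)[:2]
    compatible = [v for v in available if _version_tuple(v)[:2] <= limit]
    return max(compatible, key=_version_tuple) if compatible else None


tested = ['11.0.3', '11.2.2', '11.3.1', '11.5.0', '11.6.0', '11.7.0']
print(pick_cudatoolkit(tested, '11.2'))  # 11.2.2
```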
