Opened 4 years ago

Last modified 4 years ago

#5043 assigned enhancement

Add an AlphaFold structure prediction tool

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone:
Component: Structure Analysis
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Would like the user to be able to specify a sequence and compute a predicted structure using AlphaFold.

An example of a server that does this is the AlphaFold Google Colab

https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb

which uses free virtual machines with GPUs provided by Google that run Python in a web browser.

Running AlphaFold has several requirements that are difficult to meet on a researcher's desktop computer, so the compute job needs to run on a server. Some of the requirements are:

1) An Nvidia GPU supporting CUDA is needed. I believe AlphaFold cannot run without this because it uses the JAX library for just-in-time compilation to the GPU and derivative calculation.

2) Linux operating system. It appears that AlphaFold has not been run on Windows. Macs do not support Nvidia GPUs.

3) Large sequence and structure databases, ~2.2 Tbytes, ideally on a fast SSD. The entire sequence database is scanned to create a multiple sequence alignment many thousands of sequences deep. Smaller databases can be used. The above-mentioned Google Colab server uses 150 Gbytes that are streamed from the web on every prediction.

4) Multi-hour computation for each structure prediction. A 500 amino acid test with Google Colab took about 2 hours. A ChimeraX service might run hundreds of jobs per day, requiring several hundred GPU compute hours per day. That amount of compute will be expensive to maintain. Google Colab is free even for commercial use, but the service terms are subject to change and there are no guarantees of available resources; even getting a virtual machine with a GPU depends on availability. Google offers a $10 per month Colab Pro service that gives priority access to resources, but again with no guarantees. Users could be encouraged to use their own Colab Pro account if ChimeraX used Google Colab resources.
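
As a rough illustration of the estimate in requirement 4, the sketch below multiplies an assumed job count by the roughly 2 hours per prediction measured in the Colab test. The figure of 200 jobs per day is only a stand-in for "hundreds", and the function name is hypothetical.

```python
import math

def gpu_demand(jobs_per_day, hours_per_job):
    """Return (GPU hours per day, GPUs needed if each one runs 24 hours a day)."""
    gpu_hours = jobs_per_day * hours_per_job
    gpus_needed = math.ceil(gpu_hours / 24)
    return gpu_hours, gpus_needed

# Assumed load: 200 jobs per day at about 2 hours each
# (the time taken by the 500 amino acid Colab test).
hours, gpus = gpu_demand(jobs_per_day=200, hours_per_job=2)
print(f"{hours} GPU hours per day, at least {gpus} GPUs running continuously")
```

Under these assumptions the service would need the equivalent of 17 GPUs busy around the clock, before allowing for uneven load during the day.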

Change History (18)

comment:1 by Tom Goddard, 4 years ago

I tried making an AlphaFold prediction tool in ChimeraX using a web viewer panel that shows Google Colab with the chosen sequence pasted into an AlphaFold notebook and run, and the prediction downloaded and displayed. Tested with nanobody structure 3dwt chain A, about 1 hour to predict a structure. It works pretty nicely. Here are some of the drawbacks.

D1) Requires a standard Google account (used for Drive / Calendar / Colab...) so the user can use Google Colab. Most users will already have this.
D2) Requires logging into the Google account from the ChimeraX AlphaFold browser. This only has to be done one time -- the login cookies are remembered.
D3) You have to click "Run Anyway" on a popup warning that the interactive Python notebook comes from GitHub rather than Google and may not be safe. This has to be done in each ChimeraX session that uses AlphaFold prediction.
D4) The full Google Colab web user interface is shown, which can be intimidating. The user does not need to interact with it since ChimeraX pastes in the sequence, runs the notebook, and downloads the result, so it functions mainly as a log showing the calculation progress.
D5) The web panel always floats on top of the ChimeraX window and cannot be iconified. This is an annoying limitation of all our ChimeraX tool panels. The panel can be docked but needs to be pretty large to fit the Colab UI and log messages. A possible fix would be a general AlphaFold tool panel offering database search and prediction to start the job, with an iconify/deiconify button "Prediction Log".
D6) The ChimeraX tool uses implementation details of the Google Colab site to inject the protein sequence and run the notebook without user interaction. That will likely break if Google changes the web site. There are no APIs to programmatically control Colab. We should do a status check against an RBVI URL that can announce if the service is broken and recommend updating to a newer, working ChimeraX.
D7) Google Colab does not guarantee service and free accounts seem to be limited to 2 hours per day, so about 1 or 2 structure predictions. I hit this limit during development so bought Colab Pro service for $10 / month which tries to provide 12 hours / day service and priority access to faster GPUs. We should describe this cheap service in the docs and link to it from a general AlphaFold gui. Code will need to be improved to gracefully handle session expiration.
D8) The Colab AlphaFold uses reduced sequence databases (150 Gbytes) compared to full AlphaFold (2.2 Tbytes) and uses no structure templates. This can result in worse predictions. We should offer another way to do a full calculation possibly using AWS or RBVI servers.
D9) The tool needs weeks of improvement to handle errors gracefully, improve the user interface, and offer better output such as all 5 models that are computed. Currently it is run with the "alphafold predict <chain-spec>" command.
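
The status check proposed in D6 might look something like the sketch below. Everything specific in it is an assumption for illustration: the URL, the JSON keys ('working', 'min_version', 'message') and the function names are hypothetical, and no such RBVI endpoint is defined in this ticket.

```python
import json
from urllib.request import urlopen

# Hypothetical URL and JSON layout, for illustration only.
STATUS_URL = 'https://www.rbvi.ucsf.edu/chimerax/alphafold_status.json'

def fetch_status(url=STATUS_URL, timeout=5):
    """Fetch and decode the status document, or return None if that fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode('utf-8'))
    except (OSError, ValueError):
        # OSError covers network failures, ValueError covers bad JSON or encoding.
        return None

def parse_version(text):
    """Convert a version string such as '1.3' to a comparable tuple (1, 3)."""
    return tuple(int(part) for part in text.split('.'))

def status_message(status, chimerax_version):
    """Return a warning to show the user, or None if prediction should work."""
    if status is None:
        return None  # Status unknown: do not block the user.
    if not status.get('working', True):
        return status.get('message', 'AlphaFold prediction is currently unavailable.')
    minimum = status.get('min_version')
    if minimum and parse_version(chimerax_version) < parse_version(minimum):
        return 'AlphaFold prediction requires ChimeraX %s or newer.' % minimum
    return None

# Offline example using a status document of the assumed form.
example = {'working': True, 'min_version': '1.3'}
print(status_message(example, '1.2.5'))
```

Returning None when the status cannot be fetched keeps a failure of the status server itself from blocking predictions.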

This Colab AlphaFold solution is the easiest way I could come up with to provide the heavy compute resources (Linux + Nvidia GPU + hundreds of Gbytes to Tbytes of disk + hour long run times per job). I think it is a pretty usable first try.

comment:2 by Tom Goddard, 4 years ago

More testing of AlphaFold reveals various usability problems with longer sequences than the 100 amino acid test sequences I was using.

P1) Jackhmmer sequence search has a very long startup time for longer sequences. I tried MYOM1_HUMAN (1685 amino acids) and the first 1 GB of the 150 GB sequence search had not completed in 30 minutes. With only the first 1000 residues it took 30 minutes to an hour to search the first 1 GB, but the search then proceeded at a few Gbytes per minute, similar to the speed for short 100 aa sequences.

P2) Runs out of memory. It ran out of memory after 2 hours on the first 1000 amino acids of MYOM1_HUMAN. It had completed the multiple sequence alignment and started the AlphaFold run using TensorFlow, which allocated too much memory. There were ~130,000 sequences in the multiple alignment, about 10x more than in the small protein runs I tried previously. I believe the memory use is proportional to sequence length squared and to the number of sequences in the alignment (from reading the methods paper). The standard Google Colab virtual machine has 12 Gbytes. With Colab Pro I can increase it to 25 Gbytes. That also ran out of memory, and the log showed TensorFlow allocating five 0.5 Gbyte chunks, then two 12 Gbyte blocks. With Colab Pro+ ($50/month) 52 Gbytes of memory can be used. Users will only have a free Colab account. I added code to limit each of the 3 databases (uniref90, smallbfd, mgnify) to at most 10,000 unique sequence hits. That kept the memory use down to 5 Gbytes.

P3) The CUDA compiler fails when energy minimizing the structure, in the AlphaFold code get_violation_metrics(). OpenMM minimization iterates until there are no violations. The traceback for the failure is below. Memory use was 9 Gbytes on a 25 Gbyte VM, so memory does not seem to be the problem. The Colab log suggests setting an XLA_FLAGS environment variable to force single-threaded CUDA compilation, which it said will be slower. The TensorFlow variable TF_FORCE_UNIFIED_MEMORY=1 set by our notebook might be a cause. It could be that the CUDA driver on my Colab VM got into a bad state. This needs more testing to find a workaround. I got this error with the first 1000 amino acids and with the first 500 of MYOM1_HUMAN.

P4) The Colab notebook code sometimes chooses the Asia mirror instead of the United States one, which probably leads to a very slow fetch of the 150 Gbytes of databases. I took this code from the Google AlphaFold notebook. It simply queries the US, Asia and Europe mirrors and uses the first to deliver a 1 Gbyte chunk. We may need to let the user force this choice instead.
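
The per-database cap described in P2 can be sketched as below. This is not the notebook's actual code; the function name and the assumption that hits arrive as a list of sequence strings in search order are only for illustration.

```python
def limit_unique_hits(sequences, max_hits=10000):
    """Keep at most max_hits distinct sequences, preserving their original order."""
    seen = set()
    kept = []
    for seq in sequences:
        if len(kept) >= max_hits:
            break
        if seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept

# Small example with a cap of 2: duplicates are dropped and the
# search stops collecting once two distinct hits have been kept.
hits = ['MKV', 'MKV', 'MRV', 'MKL', 'MRV']
print(limit_unique_hits(hits, max_hits=2))
```

Applying the cap separately to each database bounds the alignment depth at 30,000 sequences for the three databases, which limits the term in the memory use that grows with the number of aligned sequences.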

Traceback for failed energy minimization (P3) with first 1000 amino acids of MYOM1_HUMAN:

RuntimeError                              Traceback (most recent call last)
<ipython-input-1-374326cb1a27> in <module>()
    417 #sequence = 'QVQLVESGGGSVQAGGSLRLSCTASGGSEYSYSTFSLGWFRQAPGQEREAVAAIASMGGLTYYADSVKGRFTISRDNAKNTVTLQMNNLKPEDTAIYYCAAVRGYFMRLPSSHNFRYWGQGTQVTVSSRGR'  #@param {type:"string"}
    418 
--> 419 run_prediction(sequence)

20 frames
<ipython-input-1-374326cb1a27> in run_prediction(sequence, output_dir, INSTALL_LOG)
    400         predict_and_save(sequence, databases, output_dir, model_names = ['model_1'])
    401     else:
--> 402         predict_and_save(sequence, databases, output_dir)
    403 
    404     # Make a zip file of the predictions

<ipython-input-1-374326cb1a27> in predict_and_save(sequence, databases, output_dir, model_names)
    363     # Write out PDB files and predicted errors
    364     write_unrelaxed_pdbs(unrelaxed_proteins, pae_outputs, output_dir)
--> 365     write_best_pdb(plddts, unrelaxed_proteins, output_dir)
    366 
    367     print ('Structure prediction completed.')

<ipython-input-1-374326cb1a27> in write_best_pdb(plddts, unrelaxed_proteins, output_dir)
    279     # AMBER relax the best model
    280     print('Energy minimizing best structure with OpenMM')
--> 281     relaxed_pdb = energy_minimize_structure(unrelaxed_proteins[best_model_name])
    282 
    283     # Write out the prediction

<ipython-input-1-374326cb1a27> in energy_minimize_structure(pdb_model)
    293         exclude_residues=[],
    294         max_outer_iterations=20)
--> 295     relaxed_pdb, _, _ = amber_relaxer.process(prot=pdb_model)
    296     return relaxed_pdb
    297 

/usr/local/lib/python3.7/dist-packages/alphafold/relax/relax.py in process(self, prot)
     60         tolerance=self._tolerance, stiffness=self._stiffness,
     61         exclude_residues=self._exclude_residues,
---> 62         max_outer_iterations=self._max_outer_iterations)
     63     min_pos = out['pos']
     64     start_pos = out['posinit']

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
    480     else:
    481       pdb_string = ret["min_pdb"]
--> 482     ret.update(get_violation_metrics(prot))
    483     ret.update({
    484         "num_exclusions": len(exclude_residues),

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in get_violation_metrics(prot)
    354 def get_violation_metrics(prot: protein.Protein):
    355   """Computes violation and alignment metrics."""
--> 356   structural_violations, struct_metrics = find_violations(prot)
    357   violation_idx = np.flatnonzero(
    358       structural_violations["total_per_residue_violations_mask"])

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in find_violations(prot_np)
    341       config=ml_collections.ConfigDict(
    342           {"violation_tolerance_factor": 12,  # Taken from model config.
--> 343            "clash_overlap_tolerance": 1.5,  # Taken from model config.
    344           }))
    345   violation_metrics = folding.compute_violation_metrics(

/usr/local/lib/python3.7/dist-packages/alphafold/model/folding.py in find_structural_violations(batch, atom14_pred_positions, config)
    765       residue_index=batch['residue_index'],
    766       overlap_tolerance_soft=config.clash_overlap_tolerance,
--> 767       overlap_tolerance_hard=config.clash_overlap_tolerance)
    768 
    769   # Compute all within-residue violations (clashes,

/usr/local/lib/python3.7/dist-packages/alphafold/model/all_atom.py in between_residue_clash_loss(atom14_pred_positions, atom14_atom_exists, atom14_atom_radius, residue_index, overlap_tolerance_soft, overlap_tolerance_hard)
    832   # shape (N, 14)
    833   per_atom_loss_sum = (jnp.sum(dists_to_low_error, axis=[0, 2]) +
--> 834                        jnp.sum(dists_to_low_error, axis=[1, 3]))
    835 
    836   # Compute the hard clash mask.

/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2145                     bool_op=lax.bitwise_or, upcast_f16_for_computation=True,
   2146                     axis=axis, dtype=dtype, out=out, keepdims=keepdims,
-> 2147                     initial=initial, where_=where, parallel_reduce=lax.psum)
   2148 
   2149 @_wraps(np.prod, skip_params=['out'])

/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _reduction(a, name, np_fun, op, init_val, has_identity, preproc, bool_op, upcast_f16_for_computation, axis, dtype, out, keepdims, initial, where_, parallel_reduce)
   2098     result = parallel_reduce(a, dims)
   2099   else:
-> 2100     result = lax.reduce(a, init_val, op, dims)
   2101   if initial is not None:
   2102     result = op(_reduction_init_val(a, initial), result)

/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in reduce(operands, init_values, computation, dimensions)
   1274     # monoid reducers bypass the weak_type_rule, so we set it explicitly.
   1275     weak_type = dtypes.is_weakly_typed(*flat_operands) and dtypes.is_weakly_typed(*flat_init_values)
-> 1276     return _convert_element_type(monoid_reducer(*flat_operands, dimensions),
   1277                                  weak_type=weak_type)
   1278   else:

/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in _reduce_sum(operand, axes)
   1346 
   1347 def _reduce_sum(operand: Array, axes: Sequence[int]) -> Array:
-> 1348   return reduce_sum_p.bind(operand, axes=tuple(axes))
   1349 
   1350 def _reduce_prod(operand: Array, axes: Sequence[int]) -> Array:

/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, *args, **params)
    263         args, used_axis_names(self, params) if self._dispatch_on_params else None)
    264     tracers = map(top_trace.full_raise, args)
--> 265     out = top_trace.process_primitive(self, tracers, params)
    266     return map(full_lower, out) if self.multiple_results else full_lower(out)
    267 

/usr/local/lib/python3.7/dist-packages/jax/core.py in process_primitive(self, primitive, tracers, params)
    608 
    609   def process_primitive(self, primitive, tracers, params):
--> 610     return primitive.impl(*tracers, **params)
    611 
    612   def process_call(self, primitive, f, tracers, params):

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in apply_primitive(prim, *args, **params)
    271 def apply_primitive(prim, *args, **params):
    272   """Impl rule that compiles and runs a single primitive 'prim' using XLA."""
--> 273   compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
    274   return compiled_fun(*args)
    275 

/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in wrapper(*args, **kwargs)
    184         return f(*args, **kwargs)
    185       else:
--> 186         return cached(config._trace_context(), *args, **kwargs)
    187 
    188     wrapper.cache_clear = cached.cache_clear

/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in cached(_, *args, **kwargs)
    177     @functools.lru_cache(max_size)
    178     def cached(_, *args, **kwargs):
--> 179       return f(*args, **kwargs)
    180 
    181     @functools.wraps(f)

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in xla_primitive_callable(prim, *arg_specs, **params)
    320       device_assignment=device and (device.id,))
    321   options.parameter_is_tupled_arguments = tuple_args
--> 322   compiled = backend_compile(backend, built_c, options)
    323   if nreps == 1:
    324     return partial(_execute_compiled_primitive, prim, compiled, handle_result)

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)
    383   # we use a separate function call to ensure that XLA compilation appears
    384   # separately in Python profiling results
--> 385   return backend.compile(built_c, compile_options=options)
    386 
    387 def _execute_compiled_primitive(prim, compiled, result_handler, *args):

RuntimeError: Unknown: an illegal memory access was encountered
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(57): 'cuLinkCreate(0, nullptr, nullptr, &link_state)'

Last edited 4 years ago by Tom Goddard

comment:3 by Tom Goddard, 4 years ago

One more Google Colab problem

P5) If the internet connection drops, the Colab GUI breaks. My wifi dropped for 10 minutes. The Colab GUI showed the calculation with a red play button and the log was not updating. Pressing that button advanced the calculation log by 20 one-Gbyte sequence blocks, so the job was still running. But other parts of the Colab GUI did not recover. The elapsed time is frozen at 15 minutes and says the job completed, even though it is still running. The memory and disk use indicator says Busy and the View Resource menu shows no info. The GUI seems to have lost track of the job. I wonder whether it would have terminated the job after some time if the browser connection is lost.

comment:4 by Tristan Croll, 4 years ago

Looks really promising! It would be great if models produced in this way provided some method to (a) identify them as an AlphaFold model, and (b) programmatically get the associated PAE matrix without the user having to worry about it. I can already do this for models fetched from the AlphaFold DB by parsing the UniProt ID from their mmCIF metadata (and use it to auto-adjust the weights of ISOLDE's distance restraints)... if it can be made similarly automatic for user-generated models that's one less piece of interface complexity to worry about.
(Sent by email in reply to the ticket notification for comment:3, dated 21 August 2021.)

comment:5 by Tom Goddard, 4 years ago

Tristan mentions an AlphaFold / RoseTTAFold implementation using MMseqs2 that is much faster than Jackhmmer.

Begin forwarded message:

From: Tristan Croll
Subject: Re: AlphaFold GUI and commands
Date: August 24, 2021 at 9:51:06 AM PDT
To: Tom Goddard , Elaine Meng
Cc: Chimera Staff <chimera-staff@…>

Hi Tom,

You might find it worthwhile to take a look at https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1. In particular, it claims that using MMseqs2 for the multiple sequence alignment gives a more accurate prediction while also being about 16 times faster than jackhmmer.

Best regards,
Tristan

comment:6 by Tom Goddard, 4 years ago

Test comment

comment:7 by Tom Goddard, 4 years ago

AlphaFold failed twice fetching databases on 3nos chain A during the NIAID meeting demo. First it gave Content Unavailable when AlphaFold jackhmmer was trying to fetch chunks of the uniref90 sequence database. Then I ran again and it read the first 4 chunks and then gave ContentTooShortError, traceback below. It appears Google deleted and replaced all the sequence database files, so the first failure was because a chunk file was missing and the second because only part of a file had been written. It is surprising that they did not stage the update in another directory and switch it over so there would be no failures, but I guess it is not a money maker so they don't care much.

I should add code to my Colab script to catch sequence database fetch errors and report that sequence data was not available instead of showing a traceback.
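
A minimal sketch of that error handling follows, assuming the download failure reaches the caller as one of the standard urllib exceptions, as it does in the traceback below. The helper names are hypothetical, and the wording of the messages is only a suggestion.

```python
from urllib.error import ContentTooShortError, HTTPError, URLError

def describe_fetch_error(error):
    """Turn a urllib download error into a one-line message for the user."""
    if isinstance(error, ContentTooShortError):
        detail = 'a database file was only partly downloaded'
    elif isinstance(error, HTTPError):
        detail = 'the server returned HTTP status %d' % error.code
    else:
        detail = 'the server could not be reached'
    return ('Sequence data was not available (%s). '
            'Please try the prediction again later.' % detail)

def search_with_message(search, *args, **kwargs):
    """Run search(); on a download failure print a message and return None."""
    try:
        return search(*args, **kwargs)
    except URLError as error:  # Also catches HTTPError and ContentTooShortError.
        print(describe_fetch_error(error))
        return None

# Stand-in for a search whose database download is cut short.
def failing_search():
    raise ContentTooShortError(
        'retrieval incomplete: got only 5 out of 10 bytes', None)

result = search_with_message(failing_search)
```

In the notebook the wrapped call would be the sequence search itself, for example search_with_message(jackhmmer_sequence_search, seq_file, databases), with the prediction skipped when None is returned.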

Sequence length 427
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4
---------------------------------------------------------------------------
ContentTooShortError                      Traceback (most recent call last)
<ipython-input-2-89adc2f6833b> in <module>()
    429 #sequence = 'QVQLVESGGGSVQAGGSLRLSCTASGGSEYSYSTFSLGWFRQAPGQEREAVAAIASMGGLTYYADSVKGRFTISRDNAKNTVTLQMNNLKPEDTAIYYCAAVRGYFMRLPSSHNFRYWGQGTQVTVSSRGR'  #@param {type:"string"}
    430 
--> 431 run_prediction(sequence)

7 frames
<ipython-input-2-89adc2f6833b> in run_prediction(sequence, output_dir, INSTALL_LOG)
    412         predict_and_save(sequence, databases, output_dir, model_names = ['model_1'])
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 
    416     # Make a zip file of the predictions

<ipython-input-2-89adc2f6833b> in predict_and_save(sequence, databases, output_dir, model_names)
    361     print ('Searching sequence databases (%d Gbytes).' % nchunks)
    362     print ('Search will take %d minutes or more.' % max(1,nchunks//5))
--> 363     dbs = jackhmmer_sequence_search(seq_file, databases)
    364 
    365     # Make multiple sequence alignment.

<ipython-input-2-89adc2f6833b> in jackhmmer_sequence_search(seq_file, databases, jackhmmer_binary_path)
    178             streaming_callback = progress_cb,
    179             z_value=db['z value'])
--> 180         dbs.append((db_name, jackhmmer_runner.query(seq_file), db['max hits']))
    181         print ('')
    182 

/usr/local/lib/python3.7/dist-packages/alphafold/data/tools/jackhmmer.py in query(self, input_fasta_path)
    187 
    188         # Run Jackhmmer with the chunk
--> 189         future.result()
    190         chunked_output.append(
    191             self._query_chunk(input_fasta_path, db_local_chunk(i)))

/usr/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

/usr/lib/python3.7/concurrent/futures/thread.py in run(self)
     55 
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

/usr/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    286         raise ContentTooShortError(
    287             "retrieval incomplete: got only %i out of %i bytes"
--> 288             % (read, size), result)
    289 
    290     return result

ContentTooShortError: <urlopen error retrieval incomplete: got only 83886080 out of 1073742897 bytes>

Last edited 4 years ago by Tom Goddard

comment:8 by Tom Goddard, 4 years ago

Cc: Tristan Croll, chimera-programmers, Scooter Morris, Tom Ferrin, Elaine Meng removed

Remove CC recipients to avoid spamming them with technical details.

comment:9 by Tom Goddard, 4 years ago

Tried to get a 1000 amino acid sequence prediction. Failed.

Long sequence 7eir failed about 2 hours 50 minutes in, almost at the end, with CUDA_ERROR_ILLEGAL_ADDRESS. The Python3 process was showing 9 Gbytes resident and 34 Gbytes virtual, so maybe it was out of memory. Will try again with the high-memory runtime.

Sequence length 1021
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Searching smallbfd sequence database, 17 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Searching mgnify sequence database, 71 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71
Computing multiple sequence alignment
1735 similar sequences found (1180 uniref90, 55 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets:
 model_1 model_2 model_3 model_4
---------------------------------------------------------------------------
UnfilteredStackTrace                      Traceback (most recent call last)
<ipython-input-4-e2d1039274c0> in <module>()
    430 
--> 431 run_prediction(sequence)

17 frames
<ipython-input-4-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 

<ipython-input-4-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
    373     # Predict structures
--> 374     unrelaxed_proteins, plddts, pae_outputs =         predict_structure(sequence, msas, deletion_matrices, model_names)
    375 

<ipython-input-4-e2d1039274c0> in predict_structure(sequence, msas, deletion_matrices, model_names)
    250         processed_feature_dict = model_runner.process_features(feature_dict, random_seed=0)
--> 251         prediction_result = model_runner.predict(processed_feature_dict)
    252 

/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
    132                  tree.map_structure(lambda x: x.shape, feat))
--> 133     result = self.apply(self.params, jax.random.PRNGKey(0), feat)
    134     # This block is to ensure benchmark timings are accurate. Some blocking is

/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
    161     try:
--> 162       return fun(*args, **kwargs)
    163     except Exception as e:

/usr/local/lib/python3.7/dist-packages/jax/_src/api.py in cache_miss(*args, **kwargs)
    407         device=device, backend=backend, name=flat_fun.__name__,
--> 408         donated_invars=donated_invars, inline=inline)
    409     out_pytree_def = out_tree()

/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, fun, *args, **params)
   1613   def bind(self, fun, *args, **params):
-> 1614     return call_bind(self, fun, *args, **params)
   1615 

/usr/local/lib/python3.7/dist-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
   1604   tracers = map(top_trace.full_raise, args)
-> 1605   outs = primitive.process(top_trace, fun, tracers, params)
   1606   return map(full_lower, apply_todos(env_trace_todo(), outs))

/usr/local/lib/python3.7/dist-packages/jax/core.py in process(self, trace, fun, tracers, params)
   1616   def process(self, trace, fun, tracers, params):
-> 1617     return trace.process_call(self, fun, tracers, params)
   1618 

/usr/local/lib/python3.7/dist-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
    612   def process_call(self, primitive, f, tracers, params):
--> 613     return primitive.impl(f, *tracers, **params)
    614   process_map = process_call

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _xla_call_impl(***failed resolving arguments***)
    621   try:
--> 622     out = compiled_fun(*args)
    623   except FloatingPointError:

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
    911           if x is not token and i in kept_var_idx))
--> 912   out_bufs = compiled.execute(input_bufs)
    913   check_special(xla_call_p.name, out_bufs)

UnfilteredStackTrace: RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-e2d1039274c0> in <module>()
    429 #sequence = 'QVQLVESGGGSVQAGGSLRLSCTASGGSEYSYSTFSLGWFRQAPGQEREAVAAIASMGGLTYYADSVKGRFTISRDNAKNTVTLQMNNLKPEDTAIYYCAAVRGYFMRLPSSHNFRYWGQGTQVTVSSRGR'  #@param {type:"string"}
    430 
--> 431 run_prediction(sequence)

<ipython-input-4-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
    412         predict_and_save(sequence, databases, output_dir, model_names = ['model_1'])
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 
    416     # Make a zip file of the predictions

<ipython-input-4-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
    372 
    373     # Predict structures
--> 374     unrelaxed_proteins, plddts, pae_outputs =         predict_structure(sequence, msas, deletion_matrices, model_names)
    375 
    376     # Write out PDB files and predicted errors

<ipython-input-4-e2d1039274c0> in predict_structure(sequence, msas, deletion_matrices, model_names)
    249         model_runner = model.RunModel(cfg, params)
    250         processed_feature_dict = model_runner.process_features(feature_dict, random_seed=0)
--> 251         prediction_result = model_runner.predict(processed_feature_dict)
    252 
    253         if 'predicted_aligned_error' in prediction_result:

/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
    131     logging.info('Running predict with shape(feat) = %s',
    132                  tree.map_structure(lambda x: x.shape, feat))
--> 133     result = self.apply(self.params, jax.random.PRNGKey(0), feat)
    134     # This block is to ensure benchmark timings are accurate. Some blocking is
    135     # already happening when computing get_confidence_metrics, and this ensures

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
    910           for i, x in enumerate(args)
    911           if x is not token and i in kept_var_idx))
--> 912   out_bufs = compiled.execute(input_bufs)
    913   check_special(xla_call_p.name, out_bufs)
    914   return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]

RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

comment:10 by Tom Goddard, 4 years ago

Rerunning 1021 amino acid prediction with High-RAM Google Colab runtime. Failed.

Rerunning with the high-RAM (25 GB) runtime, it got to model_1 in under 55 minutes, much faster.
Possibly it was thrashing with the standard runtime (12 GB) in the sequence search phase.
At about 1 hour 55 minutes it finished all 5 models with no error and started OpenMM minimization.
top showed virtual memory use at 37 GB, resident 7.5 GB, but the Colab RAM icon showed only 3.9 GB of 25 GB used.
OpenMM minimization failed with an illegal memory access in the CUDA compiler. top reports memory use
at 0.013t (13 GB?) and virtual 43 GB. The RAM icon reports 8 GB of 25 GB. Maybe OpenMM just blew up,
sending atoms into deep space.

Sequence length 1021
Have Colab GPU runtime
Installing HMMER for computing sequence alignments
Installing AlphaFold
Installing OpenMM for structure energy minimization
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Searching smallbfd sequence database, 17 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Searching mgnify sequence database, 71 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71
Computing multiple sequence alignment
1735 similar sequences found (1180 uniref90, 55 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets:
 model_1 model_2 model_3 model_4 model_5
Energy minimizing best structure with OpenMM
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-e2d1039274c0> in <module>()
    429 #sequence = 'QVQLVESGGGSVQAGGSLRLSCTASGGSEYSYSTFSLGWFRQAPGQEREAVAAIASMGGLTYYADSVKGRFTISRDNAKNTVTLQMNNLKPEDTAIYYCAAVRGYFMRLPSSHNFRYWGQGTQVTVSSRGR'  #@param {type:"string"}
    430 
--> 431 run_prediction(sequence)

20 frames
<ipython-input-1-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
    412         predict_and_save(sequence, databases, output_dir, model_names = ['model_1'])
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 
    416     # Make a zip file of the predictions

<ipython-input-1-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
    376     # Write out PDB files and predicted errors
    377     write_unrelaxed_pdbs(unrelaxed_proteins, pae_outputs, output_dir)
--> 378     write_best_pdb(plddts, unrelaxed_proteins, output_dir)
    379 
    380     print ('Structure prediction completed.')

<ipython-input-1-e2d1039274c0> in write_best_pdb(plddts, unrelaxed_proteins, output_dir)
    292     # AMBER relax the best model
    293     print('Energy minimizing best structure with OpenMM')
--> 294     relaxed_pdb = energy_minimize_structure(unrelaxed_proteins[best_model_name])
    295 
    296     # Write out the prediction

<ipython-input-1-e2d1039274c0> in energy_minimize_structure(pdb_model)
    306         exclude_residues=[],
    307         max_outer_iterations=20)
--> 308     relaxed_pdb, _, _ = amber_relaxer.process(prot=pdb_model)
    309     return relaxed_pdb
    310 

/usr/local/lib/python3.7/dist-packages/alphafold/relax/relax.py in process(self, prot)
     60         tolerance=self._tolerance, stiffness=self._stiffness,
     61         exclude_residues=self._exclude_residues,
---> 62         max_outer_iterations=self._max_outer_iterations)
     63     min_pos = out['pos']
     64     start_pos = out['posinit']

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
    480     else:
    481       pdb_string = ret["min_pdb"]
--> 482     ret.update(get_violation_metrics(prot))
    483     ret.update({
    484         "num_exclusions": len(exclude_residues),

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in get_violation_metrics(prot)
    354 def get_violation_metrics(prot: protein.Protein):
    355   """Computes violation and alignment metrics."""
--> 356   structural_violations, struct_metrics = find_violations(prot)
    357   violation_idx = np.flatnonzero(
    358       structural_violations["total_per_residue_violations_mask"])

/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in find_violations(prot_np)
    341       config=ml_collections.ConfigDict(
    342           {"violation_tolerance_factor": 12,  # Taken from model config.
--> 343            "clash_overlap_tolerance": 1.5,  # Taken from model config.
    344           }))
    345   violation_metrics = folding.compute_violation_metrics(

/usr/local/lib/python3.7/dist-packages/alphafold/model/folding.py in find_structural_violations(batch, atom14_pred_positions, config)
    765       residue_index=batch['residue_index'],
    766       overlap_tolerance_soft=config.clash_overlap_tolerance,
--> 767       overlap_tolerance_hard=config.clash_overlap_tolerance)
    768 
    769   # Compute all within-residue violations (clashes,

/usr/local/lib/python3.7/dist-packages/alphafold/model/all_atom.py in between_residue_clash_loss(atom14_pred_positions, atom14_atom_exists, atom14_atom_radius, residue_index, overlap_tolerance_soft, overlap_tolerance_hard)
    832   # shape (N, 14)
    833   per_atom_loss_sum = (jnp.sum(dists_to_low_error, axis=[0, 2]) +
--> 834                        jnp.sum(dists_to_low_error, axis=[1, 3]))
    835 
    836   # Compute the hard clash mask.

/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2145                     bool_op=lax.bitwise_or, upcast_f16_for_computation=True,
   2146                     axis=axis, dtype=dtype, out=out, keepdims=keepdims,
-> 2147                     initial=initial, where_=where, parallel_reduce=lax.psum)
   2148 
   2149 @_wraps(np.prod, skip_params=['out'])

/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _reduction(a, name, np_fun, op, init_val, has_identity, preproc, bool_op, upcast_f16_for_computation, axis, dtype, out, keepdims, initial, where_, parallel_reduce)
   2098     result = parallel_reduce(a, dims)
   2099   else:
-> 2100     result = lax.reduce(a, init_val, op, dims)
   2101   if initial is not None:
   2102     result = op(_reduction_init_val(a, initial), result)

/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in reduce(operands, init_values, computation, dimensions)
   1274     # monoid reducers bypass the weak_type_rule, so we set it explicitly.
   1275     weak_type = dtypes.is_weakly_typed(*flat_operands) and dtypes.is_weakly_typed(*flat_init_values)
-> 1276     return _convert_element_type(monoid_reducer(*flat_operands, dimensions),
   1277                                  weak_type=weak_type)
   1278   else:

/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in _reduce_sum(operand, axes)
   1346 
   1347 def _reduce_sum(operand: Array, axes: Sequence[int]) -> Array:
-> 1348   return reduce_sum_p.bind(operand, axes=tuple(axes))
   1349 
   1350 def _reduce_prod(operand: Array, axes: Sequence[int]) -> Array:

/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, *args, **params)
    263         args, used_axis_names(self, params) if self._dispatch_on_params else None)
    264     tracers = map(top_trace.full_raise, args)
--> 265     out = top_trace.process_primitive(self, tracers, params)
    266     return map(full_lower, out) if self.multiple_results else full_lower(out)
    267 

/usr/local/lib/python3.7/dist-packages/jax/core.py in process_primitive(self, primitive, tracers, params)
    608 
    609   def process_primitive(self, primitive, tracers, params):
--> 610     return primitive.impl(*tracers, **params)
    611 
    612   def process_call(self, primitive, f, tracers, params):

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in apply_primitive(prim, *args, **params)
    271 def apply_primitive(prim, *args, **params):
    272   """Impl rule that compiles and runs a single primitive 'prim' using XLA."""
--> 273   compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
    274   return compiled_fun(*args)
    275 

/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in wrapper(*args, **kwargs)
    184         return f(*args, **kwargs)
    185       else:
--> 186         return cached(config._trace_context(), *args, **kwargs)
    187 
    188     wrapper.cache_clear = cached.cache_clear

/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in cached(_, *args, **kwargs)
    177     @functools.lru_cache(max_size)
    178     def cached(_, *args, **kwargs):
--> 179       return f(*args, **kwargs)
    180 
    181     @functools.wraps(f)

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in xla_primitive_callable(prim, *arg_specs, **params)
    320       device_assignment=device and (device.id,))
    321   options.parameter_is_tupled_arguments = tuple_args
--> 322   compiled = backend_compile(backend, built_c, options)
    323   if nreps == 1:
    324     return partial(_execute_compiled_primitive, prim, compiled, handle_result)

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)
    383   # we use a separate function call to ensure that XLA compilation appears
    384   # separately in Python profiling results
--> 385   return backend.compile(built_c, compile_options=options)
    386 
    387 def _execute_compiled_primitive(prim, compiled, result_handler, *args):

RuntimeError: Unknown: an illegal memory access was encountered
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(57): 'cuLinkCreate(0, nullptr, nullptr, &link_state)'



The Colab logs suggest forcing single-threaded GPU compilation.
Timestamp	Level	Message
Aug 26, 2021, 9:19:13 PM	WARNING	2021-08-27 04:19:13.486258: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:980] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.
Aug 26, 2021, 9:19:04 PM	WARNING	tcmalloc: large alloc 2451824640 bytes == 0x556848c0a000 @ 0x7f83a2d391e7 0x7f839a1af46e 0x7f839a1ffc7b 0x7f839a1ffd18 0x7f839a2bbd79 0x7f839a2bee4c 0x7f839a3dde7f 0x7f839a3e3fb5 0x7f839a3e5e3d 0x7f839a3e7516 0x55671d910280 0x55671d90fe59 0x7f839a2c60db 0x55671d9f8e52 0x55671d97f834 0x55671d91065a 0x55671d97eb0e 0x55671d97dc35 0x55671d91073a 0x55671d97f93b 0x55671d97dc35 0x55671d91073a 0x55671d97f93b 0x55671d91065a 0x55671d97eb0e 0x55671d91065a 0x55671d97eb0e 0x55671d97dc35 0x55671d91073a 0x55671d97f93b 0x55671d97e235
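
The warning above gives the flag text itself. A minimal sketch of applying it from Python, assuming the variable has to be set before jax is first imported in the notebook. The helper name add_xla_flag is hypothetical and not part of the Colab script.

```python
import os

def add_xla_flag(flag, env=os.environ):
    """Append flag to XLA_FLAGS unless it is already present."""
    current = env.get('XLA_FLAGS', '').split()
    if flag not in current:
        current.append(flag)
    env['XLA_FLAGS'] = ' '.join(current)
    return env['XLA_FLAGS']

# Flag text copied from the Colab warning. Call this before "import jax"
# so the setting is in place when the XLA backend starts up.
add_xla_flag('--xla_gpu_force_compilation_parallelism=1')
```

Appending rather than overwriting keeps any XLA flags that were already set in the runtime. As the warning says, compilation is expected to take longer without multi-threading.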

comment:11 by Tom Goddard, 4 years ago

Tried an 800 amino acid prediction. It failed on the standard-memory runtime, running out of memory while computing the 4th model, and succeeded on the high-RAM runtime. Here is the failed run.

Trying 7egf_E 800 amino acid AlphaFold prediction. Unfortunately it is using the Europe mirror, which is probably why
it has only handled 25 of 147 sequence chunks in 1 hour. Need to change the logic for
determining which mirror to use. Saw it choose Asia before too. Forced it to use United States
but it was still very slow, about 2 hours for the first 100 Gbytes. GPU is a P100 (2016 model, 16 GB), enthusiast
level, $2000. After 3 hours, Colab says the runtime crashed after using all available RAM, with model_4
being computed.

Sequence length 800
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Searching smallbfd sequence database, 17 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Searching mgnify sequence database, 71 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71
Computing multiple sequence alignment
20500 similar sequences found (10000 uniref90, 10000 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets:
 model_1 model_2 model_3 model_4

Restarted with the high-RAM runtime. Completed successfully in 2 hours 20 minutes.
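
One possible way to make the mirror choice less sensitive to a single unlucky measurement is to time each mirror a few times and take the median. This is only a sketch: the current selection logic in the Colab script is not shown in this ticket, and pick_mirror and measure_seconds are hypothetical names. The probe that performs a small test download is supplied by the caller.

```python
from statistics import median

def pick_mirror(mirrors, measure_seconds, trials=3):
    """Return the mirror with the lowest median probe time.

    measure_seconds(mirror) is a caller-supplied function that times one
    small test download from that mirror and returns the time in seconds.
    """
    def score(mirror):
        return median(measure_seconds(mirror) for _ in range(trials))
    return min(mirrors, key=score)

# Example with fixed, made-up timings standing in for real downloads.
fake_times = {'united states': 0.5, 'europe': 2.0, 'asia': 1.0}
print(pick_mirror(list(fake_times), lambda m: fake_times[m]))
```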

comment:12 by Tom Goddard, 4 years ago

To run AlphaFold on Wynton, Singularity images can be built using the definition files at

https://github.com/hyoo/alphafold_singularity

comment:13 by Tom Goddard, 4 years ago

Checked AWS EC2 (Elastic Compute Cloud) cost for GPU nodes. Cost is about $1 per hour with 8 vCPU, 1 Nvidia GPU, 16 GB VRAM, 32 GB RAM, 225 GB SSD disk (instance type g4dn.2xlarge). So AlphaFold structure predictions on AWS, assuming roughly 1 to 2 hours each, would cost about $1-2 per structure, which is rather expensive.
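
The estimate is plain arithmetic. A small sketch, assuming the roughly $1 per hour rate quoted above and 1 to 2 GPU hours per prediction. The 200 jobs per day figure is illustrative only, not a measured number.

```python
def prediction_cost(hours_per_job, dollars_per_hour=1.0):
    """Cost in dollars of one prediction on a single GPU instance."""
    return hours_per_job * dollars_per_hour

def daily_cost(jobs_per_day, hours_per_job, dollars_per_hour=1.0):
    """Cost in dollars of running jobs_per_day predictions."""
    return jobs_per_day * prediction_cost(hours_per_job, dollars_per_hour)

for hours in (1.0, 2.0):
    print(f'{hours:.0f} hour job: ${prediction_cost(hours):.2f}')
print(f'200 jobs per day at 2 hours each: ${daily_cost(200, 2.0):.2f} per day')
```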

comment:14 by Tom Goddard, 4 years ago

If I start a prediction, close the prediction window, quit ChimeraX, restart, press Predict, and then choose Cancel instead of Run Anyway on the Colab warning about running a notebook not from Google, my previous prediction job is still running. That is pretty cool. Not sure if it would eventually time out. I am using Colab Pro.

comment:15 by Tom Goddard, 4 years ago

Ran AlphaFold from Windows in the ChimeraX Sept 8, 2021 daily build. Got a CUDA error computing model_3, with a sequence length of only 137. It might help to run each model calculation in a separate subprocess.

Sequence length 137
Have Colab GPU runtime
Installing HMMER for computing sequence alignments
Installing AlphaFold
Installing OpenMM for structure energy minimization
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes).
Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Searching smallbfd sequence database, 17 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Searching mgnify sequence database, 71 Gbytes
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71
Computing multiple sequence alignment
10220 similar sequences found (10000 uniref90, 74 smallbfd, 146 mgnify)
Computing structures using 5 AlphaFold parameter sets:
 model_1 model_2 model_3
---------------------------------------------------------------------------
UnfilteredStackTrace                      Traceback (most recent call last)
<ipython-input-1-042e9fd225a4> in <module>()
    430 
--> 431 run_prediction(sequence)

17 frames
<ipython-input-1-042e9fd225a4> in run_prediction(sequence, output_dir, INSTALL_LOG)
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 

<ipython-input-1-042e9fd225a4> in predict_and_save(sequence, databases, output_dir, model_names)
    373     # Predict structures
--> 374     unrelaxed_proteins, plddts, pae_outputs =         predict_structure(sequence, msas, deletion_matrices, model_names)
    375 

<ipython-input-1-042e9fd225a4> in predict_structure(sequence, msas, deletion_matrices, model_names)
    250         processed_feature_dict = model_runner.process_features(feature_dict, random_seed=0)
--> 251         prediction_result = model_runner.predict(processed_feature_dict)
    252 

/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
    132                  tree.map_structure(lambda x: x.shape, feat))
--> 133     result = self.apply(self.params, jax.random.PRNGKey(0), feat)
    134     # This block is to ensure benchmark timings are accurate. Some blocking is

/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
    161     try:
--> 162       return fun(*args, **kwargs)
    163     except Exception as e:

/usr/local/lib/python3.7/dist-packages/jax/_src/api.py in cache_miss(*args, **kwargs)
    407         device=device, backend=backend, name=flat_fun.__name__,
--> 408         donated_invars=donated_invars, inline=inline)
    409     out_pytree_def = out_tree()

/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, fun, *args, **params)
   1613   def bind(self, fun, *args, **params):
-> 1614     return call_bind(self, fun, *args, **params)
   1615 

/usr/local/lib/python3.7/dist-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
   1604   tracers = map(top_trace.full_raise, args)
-> 1605   outs = primitive.process(top_trace, fun, tracers, params)
   1606   return map(full_lower, apply_todos(env_trace_todo(), outs))

/usr/local/lib/python3.7/dist-packages/jax/core.py in process(self, trace, fun, tracers, params)
   1616   def process(self, trace, fun, tracers, params):
-> 1617     return trace.process_call(self, fun, tracers, params)
   1618 

/usr/local/lib/python3.7/dist-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
    612   def process_call(self, primitive, f, tracers, params):
--> 613     return primitive.impl(f, *tracers, **params)
    614   process_map = process_call

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _xla_call_impl(***failed resolving arguments***)
    621   try:
--> 622     out = compiled_fun(*args)
    623   except FloatingPointError:

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
    911           if x is not token and i in kept_var_idx))
--> 912   out_bufs = compiled.execute(input_bufs)
    913   check_special(xla_call_p.name, out_bufs)

UnfilteredStackTrace: RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-1-042e9fd225a4> in <module>()
    429 #sequence = 'QVQLVESGGGSVQAGGSLRLSCTASGGSEYSYSTFSLGWFRQAPGQEREAVAAIASMGGLTYYADSVKGRFTISRDNAKNTVTLQMNNLKPEDTAIYYCAAVRGYFMRLPSSHNFRYWGQGTQVTVSSRGR'  #@param {type:"string"}
    430 
--> 431 run_prediction(sequence)

<ipython-input-1-042e9fd225a4> in run_prediction(sequence, output_dir, INSTALL_LOG)
    412         predict_and_save(sequence, databases, output_dir, model_names = ['model_1'])
    413     else:
--> 414         predict_and_save(sequence, databases, output_dir)
    415 
    416     # Make a zip file of the predictions

<ipython-input-1-042e9fd225a4> in predict_and_save(sequence, databases, output_dir, model_names)
    372 
    373     # Predict structures
--> 374     unrelaxed_proteins, plddts, pae_outputs =         predict_structure(sequence, msas, deletion_matrices, model_names)
    375 
    376     # Write out PDB files and predicted errors

<ipython-input-1-042e9fd225a4> in predict_structure(sequence, msas, deletion_matrices, model_names)
    249         model_runner = model.RunModel(cfg, params)
    250         processed_feature_dict = model_runner.process_features(feature_dict, random_seed=0)
--> 251         prediction_result = model_runner.predict(processed_feature_dict)
    252 
    253         if 'predicted_aligned_error' in prediction_result:

/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
    131     logging.info('Running predict with shape(feat) = %s',
    132                  tree.map_structure(lambda x: x.shape, feat))
--> 133     result = self.apply(self.params, jax.random.PRNGKey(0), feat)
    134     # This block is to ensure benchmark timings are accurate. Some blocking is
    135     # already happening when computing get_confidence_metrics, and this ensures

/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
    910           for i, x in enumerate(args)
    911           if x is not token and i in kept_var_idx))
--> 912   out_bufs = compiled.execute(input_bufs)
    913   check_special(xla_call_p.name, out_bufs)
    914   return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]

RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

App log from Google Colab

Timestamp	Level	Message
Sep 8, 2021, 12:33:23 PM	WARNING	2021-09-08 19:33:23.788837: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2085] Execution of replica 0 failed: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Sep 8, 2021, 12:23:25 PM	WARNING	2021-09-08 19:23:25.270763: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 19734784 exceeds 10% of free system memory.
Sep 8, 2021, 12:23:25 PM	WARNING	2021-09-08 19:23:25.202852: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 122410048 exceeds 10% of free system memory.
Sep 8, 2021, 12:23:24 PM	WARNING	2021-09-08 19:23:24.779899: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 19734784 exceeds 10% of free system memory.
Sep 8, 2021, 12:23:24 PM	WARNING	2021-09-08 19:23:24.639914: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 122410048 exceeds 10% of free system memory.
Sep 8, 2021, 12:23:24 PM	WARNING	2021-09-08 19:23:24.635982: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 123212320 exceeds 10% of free system memory.
Sep 8, 2021, 11:51:13 AM	INFO	Adapting to protocol v5.1 for kernel fee088be-c797-4edb-a119-84b03a828045
Sep 8, 2021, 11:51:11 AM	INFO	Kernel started: fee088be-c797-4edb-a119-84b03a828045
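
The idea of running each model calculation in a separate subprocess, mentioned at the top of this comment, could look roughly like the following. It is a sketch under assumptions: WORKER is a stand-in for the real per-model prediction code, and run_models_isolated is a hypothetical name. Only the standard library subprocess module is used.

```python
import subprocess
import sys

# Stand-in for the real per-model prediction code (hypothetical): it exits
# with status 1 for model_3 to imitate the CUDA failure in the log above.
WORKER = """
import sys
name = sys.argv[1]
if name == 'model_3':
    sys.exit(1)
print('finished', name)
"""

def run_models_isolated(model_names, worker_code=WORKER):
    """Run worker_code in a fresh Python process for each model name.

    Returns {model_name: returncode}. A crash in one model's process does
    not affect the processes used for the other models.
    """
    status = {}
    for name in model_names:
        proc = subprocess.run([sys.executable, '-c', worker_code, name],
                              capture_output=True, text=True)
        status[name] = proc.returncode
    return status

print(run_models_isolated(['model_1', 'model_2', 'model_3', 'model_4']))
```

Because each model gets its own process, GPU state left behind by a failed model is discarded when that process exits, and the failed model could be retried on its own. Whether this actually avoids the CUDA_ERROR_ILLEGAL_ADDRESS failure has not been tested here.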

comment:16 by Tom Goddard, 4 years ago

Revised the Colab script to save alignments and models and to restart from the saved files. This allows restarting after an error without recomputing results that took a long time. Tried it on the first 984 residues of myomesin; it got through all 5 models in 4 hours but then disconnected, probably because the computer had been sleeping for a few hours. Reconnected and restarted, and it has been energy minimizing for 60 minutes so far.
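
The save-and-restart pattern can be sketched as below. This is not the code in the ChimeraX Colab script. load_or_compute and the file names in the usage note are hypothetical, and pickle is used only as a convenient stand-in for whatever file formats the script really writes.

```python
import os
import pickle

def load_or_compute(path, compute):
    """Return the saved result at path, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(result, f)
    # Rename only after the write finishes, so an interrupted run never
    # leaves a partial file under the final name.
    os.replace(tmp, path)
    return result
```

A restarted run would then call, for example, load_or_compute('msas.pkl', search_databases) and load_or_compute('model_1.pkl', predict_model_1), where both callables are placeholders, and only the steps without a saved file would be recomputed.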

May need to add max_iterations = 1000(?) to the script to get minimization to complete. Currently it uses max_iterations = 0, which iterates until convergence.


comment:17 by Tom Goddard, 4 years ago

Need to add back the plot of sequence coverage depth (from the original Google Colab script). Looking at the alignment files by hand in an editor, the 984 amino acid myomesin alignments appear to cover only a small part of the sequence. Need to see the coverage to understand the reliability of the model. Will need to add matplotlib to the install.
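
Coverage depth is the number of aligned sequences that have a residue (not a gap) at each query position. A minimal sketch of that count, assuming the alignment is available as equal-length strings with '-' for gaps and the query in the first row. coverage_depth is a hypothetical name and the three-row alignment is made up for illustration.

```python
def coverage_depth(msa, gap='-'):
    """Count, for each alignment column, the sequences with a residue there.

    msa is a list of equal-length aligned strings; row 0 is the query.
    """
    length = len(msa[0])
    depth = [0] * length
    for seq in msa:
        if len(seq) != length:
            raise ValueError('aligned sequences must have equal length')
        for i, ch in enumerate(seq):
            if ch != gap:
                depth[i] += 1
    return depth

msa = ['MKTAYIAK', 'MKT-----', '-KTAY---']
print(coverage_depth(msa))  # [2, 3, 3, 2, 2, 1, 1, 1]
```

The resulting list could be passed to matplotlib, for example plt.plot(depth), to show which stretches of the sequence have few aligned sequences.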

comment:18 by Tom Goddard, 4 years ago

The AlphaFold prediction capability broke on November 2, 2021 because the AlphaFold GitHub source was updated to version v2.1.0, which changed the return value of parse_stockholm(), a function used by the ChimeraX Colab script. This was reported on the ChimeraX users mailing list. I fixed it by changing the ChimeraX Colab script to clone AlphaFold v2.0.1 so it will not automatically use the latest version.
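
An alternative to pinning the version would be a small compatibility wrapper that accepts either return style. This is a sketch only and not what was done in the script. unpack_msa is a hypothetical name, and the attribute names sequences, deletion_matrix and descriptions for the v2.1.0 Msa object are an assumption that should be checked against the AlphaFold source.

```python
def unpack_msa(parsed):
    """Return (sequences, deletion_matrix, names) from either return style."""
    if isinstance(parsed, tuple):
        # v2.0.x style: parse_stockholm() returned a 3-tuple.
        return parsed
    # Assumed v2.1.0 style: an Msa object carrying the same three pieces
    # of data as attributes.
    return parsed.sequences, parsed.deletion_matrix, parsed.descriptions
```

In the script this would wrap the existing call, as in msa, deletion_matrix, target_names = unpack_msa(parsers.parse_stockholm(result['sto'])). Pinning to v2.0.1 remains the safer fix, since other parts of the API may also have changed.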

From: "Moustafa, Ibrahim M. via ChimeraX-users" <chimerax-users@cgl.ucsf.edu>
Subject: [chimerax-users] alpha-fold error
Date: November 4, 2021 at 1:07:42 PM PDT
To: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu>
Reply-To: "Moustafa, Ibrahim M." 

Dear all,

  I am trying to use alpha fold in ChimeraX but I get the following error during computing multiple sequence alignmenet step:

--------------------------------------------------------
 TypeError  
2 frames
<ipython-input-1-a242dc837c9a> in multiple_seq_align(dbs)
    247       for i, result in enumerate(db_results):
    248         from alphafold.data import parsers
--> 249         msa, deletion_matrix, target_names = parsers.parse_stockholm(result['sto'])
    250         e_values_dict = parsers.parse_e_values_from_tblout(result['tbl'])
    251         e_values = [e_values_dict[t.split('/')[0]] for t in target_names]

TypeError: cannot unpack non-iterable Msa object  
---------------------------------------------------------

  I followed the instructions of ChimeraX tutorial. I was able to use it successfully before, but not sure why I keep getting this error.

  Any idea what could be wrong?

I use ChimeraX version 1.3 (09-08-2021)

  Thanks for help
   Ibrahim