Opened 4 years ago
Last modified 4 years ago
#5043 assigned enhancement
Add an AlphaFold structure prediction tool
Reported by: | Tom Goddard | Owned by: | Tom Goddard
---|---|---|---
Priority: | moderate | Milestone: |
Component: | Structure Analysis | Version: |
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
Would like the user to be able to specify a sequence and compute a predicted structure using AlphaFold.
An example of a server that does this is the AlphaFold Google Colab notebook
https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb
which runs Python on free Google-provided virtual machines with GPUs, driven from a web browser.
There are several requirements for running AlphaFold that are difficult to meet on a researcher's desktop computer, so the compute job needs to run on a server. Some of the requirements are:

1) An Nvidia GPU supporting CUDA. I believe AlphaFold cannot run without one because it uses DeepMind's JAX library for just-in-time compilation to the GPU and for derivative calculations. (A minimal GPU-check sketch follows this list.)

2) A Linux operating system. It appears that AlphaFold has not been run on Windows, and macOS does not support Nvidia GPUs.

3) Large sequence and structure databases, ~2.2 Tbytes, ideally on a fast SSD. The entire sequence database is scanned to create a multiple sequence alignment many thousands of sequences deep. Smaller databases can be used; the Google Colab server mentioned above uses 150 Gbytes streamed from the web on every prediction.

4) Multi-hour computation for each structure prediction. A 500 amino acid test on Google Colab took about 2 hours. A ChimeraX service might see hundreds of jobs per day, requiring several hundred GPU compute hours per day, and that amount of compute resources will be expensive to maintain. Google Colab is free even for commercial use, but the service terms are subject to change and there are no guarantees of available resources -- even getting a virtual machine with a GPU depends on availability. Google offers a $10/month Colab Pro service that gives priority access to resources, again with no guarantees. Users could be encouraged to use their own Colab Pro accounts if ChimeraX utilized Google Colab resources.
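As a quick illustration of requirement 1, here is a minimal sketch (assuming a CUDA-enabled JAX install, as on a Colab GPU runtime) of checking that JAX actually sees a GPU before starting a prediction:

```python
# Sanity check that JAX sees a CUDA GPU; AlphaFold cannot run without one.
import jax

backend = jax.default_backend()   # 'gpu' on a working CUDA runtime, else 'cpu'
print('JAX backend:', backend)
print('JAX devices:', jax.devices())
if backend != 'gpu':
    raise RuntimeError('No CUDA GPU visible to JAX; AlphaFold cannot run.')
```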
Change History (18)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
More testing of AlphaFold reveals various usability problems with sequences longer than the 100 amino acid test sequences I was using.

P1) Jackhmmer sequence search has a very long startup time for longer sequences. Tried MYOM1_HUMAN, 1685 amino acids: the first 1 GB of the 150 GB sequence search had not completed in 30 minutes. Reducing to the first 1000 residues, it still took 30 minutes to an hour to search the first 1 GB, but then proceeded at a few Gbytes per minute, similar to the speed for short 100 aa sequences.

P2) Runs out of memory. Ran out of memory after 2 hours on the first 1000 amino acids of MYOM1_HUMAN. It had completed the multiple sequence alignment and started the AlphaFold run using TensorFlow, which allocated too much memory. There were ~130,000 sequences in the multiple alignment, about 10x more than in the previous small-protein runs I tried. I believe the memory use is proportional to sequence length squared and to the number of sequences in the alignment (from reading the methods paper). A standard Google Colab virtual machine has 12 Gbytes; with Colab Pro I can increase that to 25 Gbytes. That also ran out of memory, and the log showed TensorFlow allocating five 0.5 Gbyte chunks, then two 12 Gbyte blocks. With Colab Pro+ ($50/month), 52 Gbytes of memory are available, but users will only have free Colab accounts. I added code to limit each of the 3 databases (uniref90, smallbfd, mgnify) to at most 10,000 unique sequence hits (see the sketch below). That kept the memory use down to 5 Gbytes.
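A minimal sketch of that kind of cap (the helper name and exact data layout are illustrative, not the actual notebook code): an alignment here is a list of aligned sequence strings with a parallel deletion matrix, as returned by AlphaFold's Stockholm parser.

```python
# Cap an alignment at max_hits unique sequences to bound AlphaFold memory use.
def limit_alignment(msa, deletion_matrix, max_hits=10000):
    seen = set()
    kept_msa, kept_del = [], []
    for seq, del_row in zip(msa, deletion_matrix):
        if seq in seen:
            continue                    # drop duplicate hits
        seen.add(seq)
        kept_msa.append(seq)
        kept_del.append(del_row)
        if len(kept_msa) >= max_hits:
            break                       # stop once the cap is reached
    return kept_msa, kept_del
```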
P3) The CUDA compiler fails when energy minimizing the structure in the AlphaFold code get_violation_metrics(). OpenMM minimization iterates until there are no violations. The traceback for the failure is below. Memory use was 9 Gbytes on a 25 Gbyte VM, so that does not seem to be the problem. The Colab log suggests setting an XLA_FLAGS environment variable to force single-threaded CUDA compilation, which it says will be slower. The TensorFlow variable TF_FORCE_UNIFIED_MEMORY=1 set by our notebook might be a cause, or the CUDA driver on my Colab VM may have gotten into a bad state. Needs more testing to find a workaround. Got this error with both the first 1000 and the first 500 amino acids of MYOM1_HUMAN.

P4) The Colab notebook code sometimes chooses the Asia mirror instead of the United States one, which probably leads to a very slow fetch of the 150 Gbytes of databases. I took this code from the Google AlphaFold notebook; it simply queries the US, Asia and Europe mirrors and uses the first one to deliver a 1 Gbyte chunk. May need to allow the user to force this choice instead (a sketch follows).
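A sketch of that mirror race plus the proposed user override (the URLs are placeholders, not the real database locations):

```python
import time
import urllib.request

# Placeholder test-chunk URLs, one per mirror region.
MIRRORS = {
    'united states': 'https://example.com/us/uniref90_0001',
    'europe':        'https://example.com/eu/uniref90_0001',
    'asia':          'https://example.com/asia/uniref90_0001',
}

def choose_mirror(force_region=None, test_bytes=1 << 20):
    '''Return the region whose mirror delivers a test read fastest,
    unless the user forces a specific region.'''
    if force_region is not None:
        return force_region
    timings = {}
    for region, url in MIRRORS.items():
        try:
            start = time.time()
            with urllib.request.urlopen(url, timeout=30) as f:
                f.read(test_bytes)
            timings[region] = time.time() - start
        except OSError:
            pass        # mirror unreachable, skip it
    if not timings:
        raise RuntimeError('No sequence database mirror is reachable.')
    return min(timings, key=timings.get)
```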
Traceback for failed energy minimization (P3) with first 1000 amino acids of MYOM1_HUMAN:
RuntimeError Traceback (most recent call last)
<ipython-input-1-374326cb1a27> in <module>()
<ipython-input-1-374326cb1a27> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-1-374326cb1a27> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-1-374326cb1a27> in write_best_pdb(plddts, unrelaxed_proteins, output_dir)
<ipython-input-1-374326cb1a27> in energy_minimize_structure(pdb_model)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/relax.py in process(self, prot)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in get_violation_metrics(prot)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in find_violations(prot_np)
/usr/local/lib/python3.7/dist-packages/alphafold/model/folding.py in find_structural_violations(batch, atom14_pred_positions, config)
/usr/local/lib/python3.7/dist-packages/alphafold/model/all_atom.py in between_residue_clash_loss(atom14_pred_positions, atom14_atom_exists, atom14_atom_radius, residue_index, overlap_tolerance_soft, overlap_tolerance_hard)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in sum(a, axis, dtype, out, keepdims, initial, where)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _reduction(a, name, np_fun, op, init_val, has_identity, preproc, bool_op, upcast_f16_for_computation, axis, dtype, out, keepdims, initial, where_, parallel_reduce)
/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in reduce(operands, init_values, computation, dimensions)
/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in _reduce_sum(operand, axes)
/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process_primitive(self, primitive, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in apply_primitive(prim, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in wrapper(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in cached(_, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in xla_primitive_callable(prim, *arg_specs, **params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)

RuntimeError: Unknown: an illegal memory access was encountered in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(57): 'cuLinkCreate(0, nullptr, nullptr, &link_state)'
comment:3 by , 4 years ago
One more Google Colab problem:

P5) If the internet connection drops, the Colab GUI breaks. My wifi dropped for 10 minutes. The Colab GUI showed the calculation with a red play button and the log was not updating. Pressing that button updated the calculation log by 20 one-Gbyte sequence blocks processed, so the job was still running. But other parts of the Colab GUI did not recover: the elapsed time was frozen at 15 minutes and said the job had completed even though it was still running, the memory and disk use indicator said Busy, and the View Resources menu showed no info. The GUI seems to have lost track of the job. I wonder if it would have terminated the job after some time with the browser connection lost.
comment:4 by , 4 years ago
Looks really promising! It would be great if models produced in this way provided some method to (a) identify them as an AlphaFold model, and (b) programmatically get the associated PAE matrix without the user having to worry about it. I can already do this for models fetched from the AlphaFold DB by parsing the UniProt ID from their mmCIF metadata (and use it to auto-adjust the weights of ISOLDE's distance restraints)... if it can be made similarly automatic for user-generated models, that's one less piece of interface complexity to worry about.
comment:5 by , 4 years ago
Tristan mentions an AlphaFold / RoseTTAFold implementation using MMseqs2 for the sequence search that is much faster than Jackhmmer.
Begin forwarded message:
From: Tristan Croll
Subject: Re: AlphaFold GUI and commands
Date: August 24, 2021 at 9:51:06 AM PDT
To: Tom Goddard , Elaine Meng
Cc: Chimera Staff <chimera-staff@…>
Hi Tom,
You might find it worthwhile to take a look at https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1. In particular, it claims that using MMseqs2 for the multiple sequence alignment gives a more accurate prediction while also being about 16 times faster than jackhmmer.
Best regards,
Tristan
comment:7 by , 4 years ago
AlphaFold failed fetching databases on 3nos chain A during the NIAID meeting demo, twice. First it gave Content Unavailable when the AlphaFold Jackhmmer step was trying to fetch chunks of the uniref90 sequence database. Then I ran again and it read the first 4 chunks and then gave ContentTooShortError, traceback below. It appears Google deleted and replaced all the sequence database files, so the first failure was because a chunk file was missing and the second because only half a file had been written. It is surprising that they did not stage the update in another directory and switch it over so there would be no failures -- but I guess it is not a money maker so they don't care much.
Should put code in my colab script to catch sequence database fetch errors and give a message that the sequence data was not available instead of a traceback (see the sketch after the traceback below).
Sequence length 427
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes). Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
1 2 3 4

ContentTooShortError Traceback (most recent call last)
<ipython-input-2-89adc2f6833b> in <module>()
<ipython-input-2-89adc2f6833b> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-2-89adc2f6833b> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-2-89adc2f6833b> in jackhmmer_sequence_search(seq_file, databases, jackhmmer_binary_path)
/usr/local/lib/python3.7/dist-packages/alphafold/data/tools/jackhmmer.py in query(self, input_fasta_path)
/usr/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
/usr/lib/python3.7/concurrent/futures/thread.py in run(self)
/usr/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)

ContentTooShortError: <urlopen error retrieval incomplete: got only 83886080 out of 1073742897 bytes>
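A sketch of that error handling (the function name is hypothetical; the real script fetches chunks through AlphaFold's jackhmmer wrapper, which uses urlretrieve):

```python
from urllib.error import ContentTooShortError, HTTPError, URLError
from urllib.request import urlretrieve

def fetch_database_chunk(url, path):
    '''Fetch one sequence database chunk, turning network failures into
    a readable message instead of a raw traceback.'''
    try:
        urlretrieve(url, path)
    except (ContentTooShortError, HTTPError, URLError) as e:
        raise RuntimeError('The AlphaFold sequence databases are temporarily '
                           'unavailable (%s). Try the prediction again later. '
                           '(%s)' % (url, e)) from None
```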
comment:8 by , 4 years ago
Cc: removed

Remove CC recipients to avoid spamming them with technical details.
comment:9 by , 4 years ago
Trying to get a 1000 amino acid sequence prediction. Failed.

The long sequence 7eir failed about 2 hours 50 minutes in, almost at the end, with CUDA_ERROR_ILLEGAL_ADDRESS. The python3 process was showing 9 Gbytes resident, 34 Gbytes virtual, so maybe it was out of memory. Try again with a high-memory runtime.
Sequence length 1021
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes). Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
1 2 3 ... 59
Searching smallbfd sequence database, 17 Gbytes
1 2 3 ... 17
Searching mgnify sequence database, 71 Gbytes
1 2 3 ... 71
Computing multiple sequence alignment
1735 similar sequences found (1180 uniref90, 55 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets: model_1 model_2 model_3 model_4

UnfilteredStackTrace Traceback (most recent call last)
<ipython-input-4-e2d1039274c0> in <module>()
<ipython-input-4-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-4-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-4-e2d1039274c0> in predict_structure(sequence, msas, deletion_matrices, model_names)
/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/_src/api.py in cache_miss(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, fun, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process(self, trace, fun, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _xla_call_impl(***failed resolving arguments***)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)

UnfilteredStackTrace: RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
<ipython-input-4-e2d1039274c0> in <module>()
<ipython-input-4-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-4-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-4-e2d1039274c0> in predict_structure(sequence, msas, deletion_matrices, model_names)
/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)

RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
comment:10 by , 4 years ago
Rerunning the 1021 amino acid prediction with a High-RAM Google Colab runtime. Failed.

Rerunning with the high-RAM (25 GB) runtime, it got to model_1 in under 55 minutes, much faster. Possibly it was thrashing with the standard runtime (12 GB) during the sequence search phase. At about 1 hour 55 minutes it finished all 5 models with no error and started the OpenMM minimization. Top showed virtual memory use at 37 G, resident 7.5 G, but the RAM icon showed only 3.9 GB of 25 GB used. The OpenMM minimization then failed with an illegal memory access in the CUDA compiler. Top reported resident memory at 0.013t (13 GB?) and virtual 43 GB; the RAM icon reported 8 GB of 25 GB. Maybe OpenMM just blew up, sending atoms into deep space.
Sequence length 1021
Have Colab GPU runtime
Installing HMMER for computing sequence alignments
Installing AlphaFold
Installing OpenMM for structure energy minimization
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes). Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
1 2 3 ... 59
Searching smallbfd sequence database, 17 Gbytes
1 2 3 ... 17
Searching mgnify sequence database, 71 Gbytes
1 2 3 ... 71
Computing multiple sequence alignment
1735 similar sequences found (1180 uniref90, 55 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets: model_1 model_2 model_3 model_4 model_5
Energy minimizing best structure with OpenMM

RuntimeError Traceback (most recent call last)
<ipython-input-1-e2d1039274c0> in <module>()
<ipython-input-1-e2d1039274c0> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-1-e2d1039274c0> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-1-e2d1039274c0> in write_best_pdb(plddts, unrelaxed_proteins, output_dir)
<ipython-input-1-e2d1039274c0> in energy_minimize_structure(pdb_model)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/relax.py in process(self, prot)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in get_violation_metrics(prot)
/usr/local/lib/python3.7/dist-packages/alphafold/relax/amber_minimize.py in find_violations(prot_np)
/usr/local/lib/python3.7/dist-packages/alphafold/model/folding.py in find_structural_violations(batch, atom14_pred_positions, config)
/usr/local/lib/python3.7/dist-packages/alphafold/model/all_atom.py in between_residue_clash_loss(atom14_pred_positions, atom14_atom_exists, atom14_atom_radius, residue_index, overlap_tolerance_soft, overlap_tolerance_hard)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in sum(a, axis, dtype, out, keepdims, initial, where)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _reduction(a, name, np_fun, op, init_val, has_identity, preproc, bool_op, upcast_f16_for_computation, axis, dtype, out, keepdims, initial, where_, parallel_reduce)
/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in reduce(operands, init_values, computation, dimensions)
/usr/local/lib/python3.7/dist-packages/jax/_src/lax/lax.py in _reduce_sum(operand, axes)
/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process_primitive(self, primitive, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in apply_primitive(prim, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in wrapper(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/_src/util.py in cached(_, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in xla_primitive_callable(prim, *arg_specs, **params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)

RuntimeError: Unknown: an illegal memory access was encountered in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(57): 'cuLinkCreate(0, nullptr, nullptr, &link_state)'

Colab logs suggest setting GPU compilation to single threaded:

Aug 26, 2021, 9:19:13 PM WARNING 2021-08-27 04:19:13.486258: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:980] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.
Aug 26, 2021, 9:19:04 PM WARNING tcmalloc: large alloc 2451824640 bytes == 0x556848c0a000
comment:11 by , 4 years ago
Tried an 800 amino acid prediction. It failed on the standard-memory runtime, running out of memory while computing the 4th model, and succeeded on the High-RAM runtime. Here is the failed run.
Trying a 7egf_E 800 amino acid AlphaFold prediction. It chose the Europe mirror, which is probably why it handled only 25 of 147 sequence chunks in 1 hour. Need to change the logic for determining which mirror to use; saw it choose Asia before too. Forced it to use the United States mirror, but it was still very slow, about 2 hours for the first 100 Gbytes. The GPU is a P100 (2016 model, 16 GB), a $2000 enthusiast-level card. After 3 hours, Colab says the runtime crashed after using all available RAM while showing model_4 being computed.
Sequence length 800
Have Colab GPU runtime
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes). Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
1 2 3 ... 59
Searching smallbfd sequence database, 17 Gbytes
1 2 3 ... 17
Searching mgnify sequence database, 71 Gbytes
1 2 3 ... 71
Computing multiple sequence alignment
20500 similar sequences found (10000 uniref90, 10000 smallbfd, 500 mgnify)
Computing structures using 5 AlphaFold parameter sets: model_1 model_2 model_3 model_4
Restarted with high-ram runtime. Completed successfully in 2 hours 20 minutes.
comment:12 by , 4 years ago
To run AlphaFold on Wynton, we can make Singularity images using definition files.
comment:13 by , 4 years ago
Checked AWS EC2 (Elastic Compute Cloud) cost for GPU nodes. The cost is about $1 per hour for 8 vCPUs, 1 Nvidia GPU with 16 GB VRAM, 32 GB RAM, and a 225 GB SSD (instance type g4dn.2xlarge). So AlphaFold structure predictions on AWS would cost about $1-2 per structure -- rather expensive.
comment:14 by , 4 years ago
If I start a prediction, close the prediction window, quit ChimeraX, restart, press Predict, and then choose Cancel instead of Run Anyway on the Colab warning about running a notebook not from Google, my previous prediction job is still running. That is pretty cool. Not sure if it would time out eventually. I am using Colab Pro.
comment:15 by , 4 years ago
Ran AlphaFold from Windows in the ChimeraX Sept 8, 2021 daily build. Got a CUDA error computing model_3, with a sequence length of only 137. It might help to run each model calculation in a separate subprocess (see the sketch after the log below).
Sequence length 137
Have Colab GPU runtime
Installing HMMER for computing sequence alignments
Installing AlphaFold
Installing OpenMM for structure energy minimization
Finding fastest mirror for sequence databases using united states
Searching sequence databases (147 Gbytes). Search will take 29 minutes or more.
Searching uniref90 sequence database, 59 Gbytes
1 2 3 ... 59
Searching smallbfd sequence database, 17 Gbytes
1 2 3 ... 17
Searching mgnify sequence database, 71 Gbytes
1 2 3 ... 71
Computing multiple sequence alignment
10220 similar sequences found (10000 uniref90, 74 smallbfd, 146 mgnify)
Computing structures using 5 AlphaFold parameter sets: model_1 model_2 model_3

UnfilteredStackTrace Traceback (most recent call last)
<ipython-input-1-042e9fd225a4> in <module>()
<ipython-input-1-042e9fd225a4> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-1-042e9fd225a4> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-1-042e9fd225a4> in predict_structure(sequence, msas, deletion_matrices, model_names)
/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/_src/api.py in cache_miss(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/core.py in bind(self, fun, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process(self, trace, fun, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _xla_call_impl(***failed resolving arguments***)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)

UnfilteredStackTrace: RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
<ipython-input-1-042e9fd225a4> in <module>()
<ipython-input-1-042e9fd225a4> in run_prediction(sequence, output_dir, INSTALL_LOG)
<ipython-input-1-042e9fd225a4> in predict_and_save(sequence, databases, output_dir, model_names)
<ipython-input-1-042e9fd225a4> in predict_structure(sequence, msas, deletion_matrices, model_names)
/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)

RuntimeError: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

App log from Google Colab:

Sep 8, 2021, 12:33:23 PM WARNING 2021-09-08 19:33:23.788837: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2085] Execution of replica 0 failed: Internal: Failed to launch CUDA kernel: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Sep 8, 2021, 12:23:24-25 PM WARNING tensorflow/core/framework/cpu_allocator_impl.cc:80] five warnings: Allocation of 19734784 / 122410048 / 123212320 bytes exceeds 10% of free system memory.
Sep 8, 2021, 11:51:13 AM INFO Adapting to protocol v5.1 for kernel fee088be-c797-4edb-a119-84b03a828045
Sep 8, 2021, 11:51:11 AM INFO Kernel started: fee088be-c797-4edb-a119-84b03a828045
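A sketch of the per-model subprocess idea (predict_one_model.py is a hypothetical driver script that computes one model and writes its PDB file):

```python
import subprocess

# Run each AlphaFold model in its own Python process so a CUDA crash kills
# only that process and the remaining models can still be computed.
for model_name in ('model_1', 'model_2', 'model_3', 'model_4', 'model_5'):
    result = subprocess.run(['python3', 'predict_one_model.py', model_name],
                            capture_output=True, text=True)
    if result.returncode != 0:
        tail = result.stderr.strip().splitlines()[-1:] or ['(no output)']
        print('%s failed: %s' % (model_name, tail[0]))
```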
comment:16 by , 4 years ago
Revised the colab script to save alignments and to restart using saved alignments and models. This allows restarting after an error without recomputing results that took a long time. Tried it on the 984 initial residues of myomesin: it got through all 5 models in 4 hours but then disconnected, probably because the computer had been sleeping for a few hours. Reconnected, restarted, and it has been energy minimizing for 60 minutes so far.

May need to add max_iterations = 1000(?) to the script to get minimization to complete; currently it uses max_iterations = 0, which iterates until convergence (see the sketch below).
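A sketch of that change, using the AmberRelaxation arguments visible in the tracebacks above (the tolerance and stiffness values are illustrative):

```python
from alphafold.relax import relax

def energy_minimize_structure(pdb_model):
    amber_relaxer = relax.AmberRelaxation(
        max_iterations=1000,     # proposed cap; 0 means iterate to convergence
        tolerance=2.39,          # illustrative energy tolerance
        stiffness=10.0,          # illustrative restraint stiffness
        exclude_residues=[],
        max_outer_iterations=20)
    relaxed_pdb, _, _ = amber_relaxer.process(prot=pdb_model)
    return relaxed_pdb
```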
comment:17 by , 4 years ago
Need to add the plot of sequence coverage depth back (from the original Google Colab script). The 984 amino acid myomesin alignments look like they cover only a small part of the sequence, judging from inspecting the alignment files by hand in an editor. Need to see coverage to understand the reliability of a model (see the sketch below). Will need to add matplotlib to the install.
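A sketch of such a coverage plot, assuming the alignment is available as a list of query-length aligned sequence strings with '-' for gaps:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_sequence_coverage(msa, query_length):
    '''Plot how many aligned sequences cover each residue of the query.'''
    covered = np.array([[c != '-' for c in seq[:query_length]] for seq in msa])
    depth = covered.sum(axis=0)
    plt.plot(np.arange(1, query_length + 1), depth)
    plt.xlabel('Residue position')
    plt.ylabel('Number of aligned sequences')
    plt.title('Sequence alignment coverage')
    plt.show()
```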
comment:18 by , 4 years ago
The AlphaFold prediction capability broke on November 2, 2021 because the AlphaFold GitHub source updated to version v2.1.0 and changed the return value of parse_stockholm(), which was used by the ChimeraX colab script. This was reported on the ChimeraX-users mailing list. I fixed it by changing the ChimeraX colab script to clone AlphaFold v2.0.1 so it will not automatically use the latest version (see the sketch below).
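The fix amounts to pinning the clone to the v2.0.1 tag in the install step, along these lines (a sketch; the actual install cell differs):

```python
import subprocess

# Clone a fixed AlphaFold release instead of the default branch so upstream
# API changes (like the new Msa return type) cannot break the script.
subprocess.run(['git', 'clone', '--branch', 'v2.0.1', '--depth', '1',
                'https://github.com/deepmind/alphafold.git'], check=True)
```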
From: "Moustafa, Ibrahim M. via ChimeraX-users" <chimerax-users@cgl.ucsf.edu> Subject: [chimerax-users] alpha-fold error Date: November 4, 2021 at 1:07:42 PM PDT To: ChimeraX Users Help <chimerax-users@cgl.ucsf.edu> Reply-To: "Moustafa, Ibrahim M." Dear all, I am trying to use alpha fold in ChimeraX but I get the following error during computing multiple sequence alignmenet step: -------------------------------------------------------- TypeError 2 frames <ipython-input-1-a242dc837c9a> in multiple_seq_align(dbs) 247 for i, result in enumerate(db_results): 248 from alphafold.data import parsers --> 249 msa, deletion_matrix, target_names = parsers.parse_stockholm(result['sto']) 250 e_values_dict = parsers.parse_e_values_from_tblout(result['tbl']) 251 e_values = [e_values_dict[t.split('/')[0]] for t in target_names] TypeError: cannot unpack non-iterable Msa object --------------------------------------------------------- I followed the instructions of ChimeraX tutorial. I was able to use it successfully before, but not sure why I keep getting this error. Any idea what could be wrong? I use ChimeraX version 1.3 (09-08-2021) Thanks for help Ibrahim
I tried making an AlphaFold prediction tool in ChimeraX using a web viewer panel that shows Google Colab, with the chosen sequence pasted into an AlphaFold notebook and run, and the prediction downloaded and displayed. Tested with nanobody structure 3dwt chain A: about 1 hour to predict a structure. It works pretty nicely. Here are some of the drawbacks.
D1) Requires a standard Google account (used for Drive / Calendar / Colab...) so the user can use Google Colab. Most users will already have one.

D2) Requires logging into the Google account from the ChimeraX AlphaFold browser. This only has to be done once -- the login cookies are remembered.

D3) Have to click "Run Anyway" on a popup that warns that the interactive Python notebook does not come from Google (it comes from GitHub) and may not be safe. This has to be done in each ChimeraX session that uses AlphaFold prediction.

D4) The full Google Colab web user interface is shown, which can be intimidating. The user does not need to interact with it in any way, since ChimeraX pastes in the sequence, runs the notebook, and downloads the result; it functions mainly as a log showing the calculation's progress.

D5) The web panel always floats on top of the ChimeraX window and cannot be iconified. This is an annoying limitation of all our ChimeraX tool panels. The panel can be docked, but it needs to be pretty large to fit the Colab UI and log messages. A possible fix would be a general AlphaFold tool panel, offering database search and prediction to start the job, with an iconify/deiconify button for a "Prediction Log" panel.

D6) The ChimeraX tool uses implementation details of the Google Colab site to inject the protein sequence and run the notebook without user interaction. That will likely break if Google changes the web site, and there are no APIs for programmatically controlling Colab. We should do a status check against an RBVI URL that can announce when the service is broken and recommend updating to a newer, working ChimeraX.

D7) Google Colab does not guarantee service, and free accounts seem to be limited to 2 hours of GPU use per day, so about 1 or 2 structure predictions. I hit this limit during development, so I bought the Colab Pro service for $10/month, which tries to provide 12 hours/day of service and priority access to faster GPUs. We should describe this cheap service in the docs and link to it from a general AlphaFold GUI. The code will need to be improved to gracefully handle session expiration.

D8) The Colab AlphaFold uses reduced sequence databases (150 Gbytes versus 2.2 Tbytes for full AlphaFold) and uses no structure templates. This can result in worse predictions. We should offer another way to run a full calculation, possibly using AWS or RBVI servers.

D9) The tool needs weeks of improvement to handle errors gracefully, improve the user interface, and offer better output such as all 5 computed models. Currently it is run with the "alphafold predict <chain-spec>" command (see the example below).
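For reference, a sketch of invoking that command from ChimeraX Python ('session' is the variable ChimeraX provides in its Python shell; '/A' is an example chain specifier):

```python
# Run the AlphaFold prediction command programmatically in ChimeraX.
from chimerax.core.commands import run
run(session, 'alphafold predict /A')
```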
This Colab AlphaFold solution is the easiest way I could come up with to provide the heavy compute resources needed (Linux + Nvidia GPU + hundreds of Gbytes to Tbytes of disk + hour-long run times per job). I think it is a pretty usable first try.