Opened 3 years ago

Closed 3 years ago

#7260 closed enhancement (fixed)

Change AlphaFold prediction to use ColabFold for 5x faster runs

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone:
Component: Structure Prediction
Version:
Keywords:
Cc: Elaine Meng
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The ColabFold Google Colab page for running AlphaFold jobs uses a separate sequence alignment compute server and other tricks described in the ColabFold paper to speed up predictions about 5-10 fold with no loss of quality. Using the ColabFold prediction method in ChimeraX would be a great improvement, saving users a lot of time.

Change History (7)

comment:1 by Tom Goddard, 3 years ago

We certainly need to ask the ColabFold authors if this is ok, since it may significantly increase the load on their sequence server.

ChimeraX prediction job statistics for 3 months April, May, June 2022: 12376 jobs (137 per day) submitted by 2696 unique IP addresses.

comment:2 by Tom Goddard, 3 years ago

I sent email to the ColabFold authors to see if it is ok to use their server.

From: Tom Goddard 
Subject: Using ColabFold in ChimeraX
Date: July 11, 2022 at 6:32:36 PM PDT
To: milot.mirdita at MPG, so at Harvard, martin.steinegger at SNU Korea

Hi Milot, Sergey, Martin,

  I have been amazed by ColabFold's fast and accurate calculations and would love to use it in ChimeraX.  Would it be ok if I added ColabFold structure prediction in ChimeraX?  I would use my own Colab script, but would use your fast sequence alignment server.  Currently researchers run about 140 AlphaFold jobs from ChimeraX per day (last 3 months 12376 jobs from 2696 unique IP addresses).  With your faster service the use may go up, maybe as much as 500 jobs per day. There are about 10000 active ChimeraX users.

  My main interest is making it easier for ChimeraX users to predict structures and analyze them and use them in cryoEM modeling.  I added AlphaFold prediction to ChimeraX in August 2021 using a modified version of Google's original slow Colab script that streams the sequence databases and uses jackhmmer and hhblits.  Since then I've added tools to look at PAE plots and PAE at interfaces, compute domains from PAE, and use AlphaFold Database models.  I'm trying to make it easier for researchers to use AlphaFold models.

  Thanks for the amazing work you've done with ColabFold.  This is revolutionizing structural biology.

	Tom Goddard
	ChimeraX developer
	University of California, San Francisco

comment:3 by Tom Goddard, 3 years ago

Sergey responded it is fine to use ColabFold from ChimeraX.

From: "Ovchinnikov, Sergey"
Subject: Re: Using ColabFold in ChimeraX
Date: July 11, 2022 at 6:45:01 PM PDT
To: Tom Goddard , milot.mirdita, martin.steinegger

Hi Tom,
Good to hear from you! You are more than welcome to incorporate any parts of ColabFold into ChimeraX!

 Besides just faster MSA generation, we've also done work to speed up AlphaFold itself. More specifically, we reduced the compile time from ~5 mins to ~20 seconds (and avoiding compile time altogether). This I think will be helpful for users with smaller proteins < 256. We also moved recycles outside of alphafold, into a manual recycle loop. The advantage of this is that you can technically save all the pdbs at each recycle... and potentially keep running for more recycles if you are not happy without having to restart the job from scratch (though the latter is not yet implemented).

See here:
https://github.com/steineggerlab/alphafold/blob/main/alphafold/model/model.py#L184

I love the 3D-PAE plots! I was actually thinking about implementing something similar to this but using the predicted contacts from AlphaFold for every pair of residues (instead of actual distance from model as you have, I was thinking summing the probabilities of contacts < 8 angstroms), then coloring/scaling by the probability of contact for every pair of positions. This should also capture the confidence between protein pairs/domains, but maybe a little easier to interpret and provide slightly more sparse distribution of contacts. 😄

Tell us if there is anything we can do to help!
-Sergey
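
The contact probability Sergey describes (summing the predicted distance probabilities below 8 angstroms for a residue pair) can be sketched as below. This is only an illustration and not code from ColabFold or ChimeraX: the function name, the plain-list inputs, and the binning convention (N logits with N-1 ascending bin edges, the first bin open below the first edge and the last bin open above the last edge) are assumptions.

```python
import math

def contact_probability(logits, bin_edges, cutoff=8.0):
    """Probability that one residue pair is closer than `cutoff` angstroms.

    Assumed layout (hypothetical): `logits` has one value per distance bin,
    and `bin_edges[k]` is the upper distance bound of bin k, so there is one
    fewer edge than bins because the last bin is open-ended.
    """
    if len(logits) != len(bin_edges) + 1:
        raise ValueError("expected one more logit than bin edges")
    # Softmax, shifted by the maximum logit for numerical stability.
    biggest = max(logits)
    weights = [math.exp(x - biggest) for x in logits]
    total = sum(weights)
    # zip() stops at the last edge, so the open-ended far bin is never counted.
    near = sum(w for w, upper in zip(weights, bin_edges) if upper <= cutoff)
    return near / total

# Toy example: four equally likely bins (<4, 4-8, 8-12, >12 angstroms).
p = contact_probability([0.0, 0.0, 0.0, 0.0], [4.0, 8.0, 12.0])
```

Computing this value for every residue pair would give the matrix that could be used for coloring or scaling, in place of the model distances used by the current 3D PAE display.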

comment:4 by Tom Goddard, 3 years ago

I read the ColabFold Colab notebook script. It is pretty short, 400 lines, containing mostly just the user interface code, with the actual code that runs AlphaFold in a pip-installed colabfold module. This separation of the Google Colab GUI and the code that runs AlphaFold is very nice. I can easily replace the Google Colab GUI with a ChimeraX version. Both the notebook and the colabfold module are in Sergey's ColabFold repository on GitHub:

https://github.com/sokrypton/ColabFold

I didn't realize pip can install directly from GitHub without a prebuilt wheel. pip uses the pyproject.toml (or legacy setup.py) in the GitHub project to build the wheel itself.
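
As a minimal sketch of such an install, the helper below builds a pip command that points at a GitHub repository using pip's "git+https://" VCS URL form. The helper names are hypothetical, and the command is only constructed here, not executed. Running it requires git to be installed, and the optional ref (branch, tag, or commit) is an illustration rather than something the ColabFold notebook is known to use.

```python
import sys

def github_pip_url(owner, repo, ref=None):
    """Build a pip VCS requirement for a GitHub repository.

    pip clones the repository and builds a wheel from its pyproject.toml
    (or legacy setup.py). `ref` optionally pins a branch, tag, or commit.
    """
    url = f"git+https://github.com/{owner}/{repo}"
    if ref:
        url += f"@{ref}"
    return url

def pip_install_command(requirement):
    """Return the argument list for installing with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", requirement]

command = pip_install_command(github_pip_url("sokrypton", "ColabFold"))
# To actually install, pass the list to subprocess.run(command, check=True).
```

Invoking pip as "python -m pip" through sys.executable ensures the package lands in the same Python environment that is running the script.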

The colabfold module uses the Steinegger lab fork of AlphaFold, which is on PyPI as alphafold-colabfold and on GitHub here:

https://github.com/steineggerlab/alphafold

It was updated 6 times in June, versions 2.1.9 to 2.1.14.

comment:5 by Tom Goddard, 3 years ago

Here's what the Google Colab /content directory contains after running the standard ColabFold notebook. The run was named "test" in the ColabFold GUI, and the hash "3219c" was appended to that name.

/content# ls -l .
total 1672
-rw-r--r-- 1 root   root    2697 Jul 12 17:36 cite.bibtex
-rw-r--r-- 1 root   root       0 Jul 12 17:36 COLABFOLD_READY
-rw-r--r-- 1 root   root     704 Jul 12 17:36 config.json
-rw-r--r-- 1 root   root     857 Jul 12 17:38 log.txt
drwxrwxr-x 2 652857 89939   4096 Jul 12 17:36 params
drwxr-xr-x 1 root   root    4096 Jul  6 13:22 sample_data
-rw-r--r-- 1 root   root  703424 Jul 12 17:36 test_3219c.a3m
-rw-r--r-- 1 root   root   98040 Jul 12 17:38 test_3219c_coverage.png
-rw-r--r-- 1 root   root      54 Jul 12 17:35 test_3219c.csv
-rw-r--r-- 1 root   root       0 Jul 12 17:38 test_3219c.done.txt
drwxr-xr-x 2 root   root    4096 Jul 12 17:36 test_3219c_env
-rw-r--r-- 1 root   root   45670 Jul 12 17:38 test_3219c_PAE.png
-rw-r--r-- 1 root   root  128679 Jul 12 17:38 test_3219c_plddt.png
-rw-r--r-- 1 root   root    9516 Jul 12 17:38 test_3219c_predicted_aligned_error_v1.json
-rw-r--r-- 1 root   root  541439 Jul 12 17:38 test_3219c.result.zip
-rw-r--r-- 1 root   root   18711 Jul 12 17:38 test_3219c_unrelaxed_rank_1_model_4.pdb
-rw-r--r-- 1 root   root    6346 Jul 12 17:38 test_3219c_unrelaxed_rank_1_model_4_scores.json
-rw-r--r-- 1 root   root   18711 Jul 12 17:38 test_3219c_unrelaxed_rank_2_model_3.pdb
-rw-r--r-- 1 root   root    6397 Jul 12 17:38 test_3219c_unrelaxed_rank_2_model_3_scores.json
-rw-r--r-- 1 root   root   18711 Jul 12 17:38 test_3219c_unrelaxed_rank_3_model_1.pdb
-rw-r--r-- 1 root   root    6387 Jul 12 17:38 test_3219c_unrelaxed_rank_3_model_1_scores.json
-rw-r--r-- 1 root   root   18711 Jul 12 17:38 test_3219c_unrelaxed_rank_4_model_2.pdb
-rw-r--r-- 1 root   root    6381 Jul 12 17:38 test_3219c_unrelaxed_rank_4_model_2_scores.json
-rw-r--r-- 1 root   root   18711 Jul 12 17:38 test_3219c_unrelaxed_rank_5_model_5.pdb
-rw-r--r-- 1 root   root    6398 Jul 12 17:38 test_3219c_unrelaxed_rank_5_model_5_scores.json
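
The structure file names in this listing encode both the rank and the AlphaFold model number. A minimal sketch of recovering them follows. The regular expression is inferred only from the names shown above, so other ColabFold versions or settings (relaxed models, for example) may name files differently, and the function name is hypothetical.

```python
import re

# Pattern inferred from the listing above, e.g.
# "test_3219c_unrelaxed_rank_1_model_4.pdb" -> job "test_3219c", rank 1, model 4.
RESULT_PATTERN = re.compile(
    r"^(?P<job>.+)_unrelaxed_rank_(?P<rank>\d+)_model_(?P<model>\d+)\.pdb$"
)

def ranked_models(filenames):
    """Return (rank, model, filename) tuples for structure files, best rank first."""
    found = []
    for name in filenames:
        match = RESULT_PATTERN.match(name)
        if match:  # skip plots, JSON scores, the alignment, and so on
            found.append((int(match["rank"]), int(match["model"]), name))
    return sorted(found)

listing = [
    "test_3219c_PAE.png",
    "test_3219c_unrelaxed_rank_2_model_3.pdb",
    "test_3219c_unrelaxed_rank_1_model_4.pdb",
    "test_3219c_unrelaxed_rank_1_model_4_scores.json",
]
best_rank, best_model, best_file = ranked_models(listing)[0]
```

With the files above, the first entry is the rank 1 structure, which came from model 4.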

comment:6 by Tom Goddard, 3 years ago

Cc: Elaine Meng added

Done.

I made the ChimeraX daily builds, 1.4, and 1.3 use ColabFold, including improved logging of results with the colabfold matplotlib plots. I tweeted it and made a video on YouTube showing its use to make a 5 minute prediction of a 200 amino acid dimer.

https://twitter.com/UCSFChimeraX/status/1549069503513866240

https://youtu.be/gIbCAcMDM7E

The daily build AlphaFold GUI also has a new option, shown after the Options button is pressed: "Use PDB templates when predicting structures", default false.

I also changed the default for the energy minimize option of the AlphaFold GUI and command to false, since it is now relatively quick to rerun the prediction with it set to true (ticket #7236).

comment:7 by Tom Goddard, 3 years ago

Resolution: fixed
Status: assigned → closed