Opened 3 years ago
Closed 3 years ago
#7260 closed enhancement (fixed)
Change AlphaFold prediction to use ColabFold for 5x faster runs
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Prediction | Version: | |
| Keywords: | | Cc: | Elaine Meng |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
The ColabFold Google Colab page for running AlphaFold jobs uses a separate sequence alignment compute server, along with other tricks described in the ColabFold paper, to speed up predictions about 5-10 fold with no loss of quality. It would be a great improvement, saving users a lot of time, if ChimeraX could use the ColabFold prediction method.
Change History (7)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
I sent email to the ColabFold authors to see if it is ok to use their server.
From: Tom Goddard
Subject: Using ColabFold in ChimeraX
Date: July 11, 2022 at 6:32:36 PM PDT
To: milot.mirdita at MPG, so at Harvard, martin.steinegger at SNU Korea

Hi Milot, Sergey, Martin,

I have been amazed by ColabFold's fast and accurate calculations and would love to use it in ChimeraX. Would it be ok if I added ColabFold structure prediction in ChimeraX? I would use my own Colab script, but would use your fast sequence alignment server.

Currently researchers run about 140 AlphaFold jobs from ChimeraX per day (last 3 months 12376 jobs from 2696 unique IP addresses). With your faster service the use may go up, maybe as much as 500 jobs per day. There are about 10000 active ChimeraX users.

My main interest is making it easier for ChimeraX users to predict structures and analyze them and use them in cryoEM modeling. I added AlphaFold prediction to ChimeraX in August 2021 using a modified version of Google's original slow Colab script that streams the sequence databases and uses jackhmmer and hhblits. Since then I've added tools to look at PAE plots and PAE at interfaces, compute domains from PAE, and use AlphaFold Database models. I'm trying to make it easier for researchers to use AlphaFold models.

Thanks for the amazing work you've done with ColabFold. This is revolutionizing structural biology.

Tom Goddard
ChimeraX developer
University of California, San Francisco
comment:3 by , 3 years ago
Sergey responded it is fine to use ColabFold from ChimeraX.
From: "Ovchinnikov, Sergey"
Subject: Re: Using ColabFold in ChimeraX
Date: July 11, 2022 at 6:45:01 PM PDT
To: Tom Goddard , milot.mirdita, martin.steinegger

Hi Tom,

Good to hear from you! You are more than welcome to incorporate any parts of ColabFold into ChimeraX!

Besides just faster MSA generation, we've also done work to speedup AlphaFold itself. More specifically, we reduced the compile time from ~5 mins to ~20 seconds (and avoiding compile time all together). This I think will be helpful for users with smaller proteins < 256.

We also moved recycles outside of alphafold, into a manual recycle loop. The advantage of this is that you can technically save all the pdbs at each recycle... and potentially keep running for more recycles if you are not happy without having to restart the job from scratch (though the latter is not yet implemented). See here: https://github.com/steineggerlab/alphafold/blob/main/alphafold/model/model.py#L184

I love the 3D-PAE plots! I was actually thinking about implementing something similar to this but using the predicted contacts from AlphaFold for every pair of residues (instead of actual distance from model as you have, I was thinking summing the probabilities of contacts < 8 angstroms), then coloring/scaling by the probability of contact for every pair of positions. This should also capture the confidence between protein pairs/domains, but maybe a little easier to interpret and provide slightly more sparse distribution of contacts. 😄

Tell us if there is anything we can do to help!

-Sergey
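The contact-probability idea in Sergey's reply (summing the predicted probabilities of distances below 8 angstroms for a residue pair) might look roughly like the toy sketch below. It is not ColabFold or AlphaFold code: the function name, the bin edges, and the probabilities are all made up for illustration, and it assumes each residue pair comes with a probability per distance bin.

```python
def contact_probability(bin_probs, bin_upper_edges, cutoff=8.0):
    """Sum the probability mass of distance bins lying entirely at or below cutoff.

    bin_probs:        probability assigned to each distance bin for one residue pair
    bin_upper_edges:  upper distance edge of each bin, in angstroms (hypothetical values)
    """
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= cutoff)


# Toy residue pair with four distance bins (illustrative numbers only).
edges = [4.0, 8.0, 12.0, 16.0]
probs = [0.25, 0.5, 0.125, 0.125]
pair_contact = contact_probability(probs, edges)
print(pair_contact)
```

Computing this value for every residue pair would give a matrix that could be used for coloring or scaling in place of PAE.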
comment:4 by , 3 years ago
I read the ColabFold Colab notebook script. It is pretty short, about 400 lines, and contains mostly user interface code; the code that actually runs AlphaFold lives in a pip-installed colabfold module. This separation of the Google Colab GUI from the code that runs AlphaFold is very nice. I can easily replace the Google Colab GUI with a ChimeraX version. Both the notebook and the colabfold module are in Sergey's ColabFold repository on GitHub.
I didn't realize pip can install directly from GitHub and does not even need a wheel on GitHub. pip can use the pyproject.toml (or legacy setup.py) in the GitHub project to build the wheel.
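As a minimal sketch of that kind of install, the helper below builds a pip command line that points at a git repository instead of a published wheel. The helper name is hypothetical, and the repository URL is filled in from memory, so check it before use. This is not necessarily the exact command the ColabFold notebook runs.

```python
import sys


def pip_git_install_args(repo_url, package=None, ref=None):
    """Build a pip command line that installs straight from a git repository."""
    source = "git+" + repo_url
    if ref:
        source += "@" + ref  # optional branch, tag, or commit
    # "name @ url" is the direct-reference requirement form; a bare URL also works.
    requirement = f"{package} @ {source}" if package else source
    return [sys.executable, "-m", "pip", "install", requirement]


args = pip_git_install_args("https://github.com/sokrypton/ColabFold", package="colabfold")
print(" ".join(args[1:]))
# To actually run the install: subprocess.run(args, check=True)
```

pip clones the repository and builds the wheel locally from the project's pyproject.toml or setup.py, which is why no prebuilt wheel is needed.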
comment:5 by , 3 years ago
Here's what the Google Colab /content directory contains after running the standard ColabFold notebook. The name of the run in the ColabFold GUI was "test", and ColabFold appended the hash "3219c" to the name.
/content# ls -l .
total 1672
-rw-r--r-- 1 root root 2697 Jul 12 17:36 cite.bibtex
-rw-r--r-- 1 root root 0 Jul 12 17:36 COLABFOLD_READY
-rw-r--r-- 1 root root 704 Jul 12 17:36 config.json
-rw-r--r-- 1 root root 857 Jul 12 17:38 log.txt
drwxrwxr-x 2 652857 89939 4096 Jul 12 17:36 params
drwxr-xr-x 1 root root 4096 Jul 6 13:22 sample_data
-rw-r--r-- 1 root root 703424 Jul 12 17:36 test_3219c.a3m
-rw-r--r-- 1 root root 98040 Jul 12 17:38 test_3219c_coverage.png
-rw-r--r-- 1 root root 54 Jul 12 17:35 test_3219c.csv
-rw-r--r-- 1 root root 0 Jul 12 17:38 test_3219c.done.txt
drwxr-xr-x 2 root root 4096 Jul 12 17:36 test_3219c_env
-rw-r--r-- 1 root root 45670 Jul 12 17:38 test_3219c_PAE.png
-rw-r--r-- 1 root root 128679 Jul 12 17:38 test_3219c_plddt.png
-rw-r--r-- 1 root root 9516 Jul 12 17:38 test_3219c_predicted_aligned_error_v1.json
-rw-r--r-- 1 root root 541439 Jul 12 17:38 test_3219c.result.zip
-rw-r--r-- 1 root root 18711 Jul 12 17:38 test_3219c_unrelaxed_rank_1_model_4.pdb
-rw-r--r-- 1 root root 6346 Jul 12 17:38 test_3219c_unrelaxed_rank_1_model_4_scores.json
-rw-r--r-- 1 root root 18711 Jul 12 17:38 test_3219c_unrelaxed_rank_2_model_3.pdb
-rw-r--r-- 1 root root 6397 Jul 12 17:38 test_3219c_unrelaxed_rank_2_model_3_scores.json
-rw-r--r-- 1 root root 18711 Jul 12 17:38 test_3219c_unrelaxed_rank_3_model_1.pdb
-rw-r--r-- 1 root root 6387 Jul 12 17:38 test_3219c_unrelaxed_rank_3_model_1_scores.json
-rw-r--r-- 1 root root 18711 Jul 12 17:38 test_3219c_unrelaxed_rank_4_model_2.pdb
-rw-r--r-- 1 root root 6381 Jul 12 17:38 test_3219c_unrelaxed_rank_4_model_2_scores.json
-rw-r--r-- 1 root root 18711 Jul 12 17:38 test_3219c_unrelaxed_rank_5_model_5.pdb
-rw-r--r-- 1 root root 6398 Jul 12 17:38 test_3219c_unrelaxed_rank_5_model_5_scores.json
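A client that collects these results has to pick the top-ranked model out of the directory. The sketch below shows one way to do that, with the filename pattern inferred only from the listing above. The function name and regular expression are illustrative assumptions, not the code ChimeraX uses, and other ColabFold versions may name files differently.

```python
import re

# Pattern inferred from names such as test_3219c_unrelaxed_rank_1_model_4.pdb
MODEL_RE = re.compile(
    r"^(?P<job>.+)_unrelaxed_rank_(?P<rank>\d+)_model_(?P<model>\d+)\.pdb$"
)


def ranked_models(filenames):
    """Return (rank, model_number, filename) tuples sorted by rank, best first."""
    found = []
    for name in filenames:
        m = MODEL_RE.match(name)
        if m:  # skips the *_scores.json files and everything else
            found.append((int(m.group("rank")), int(m.group("model")), name))
    return sorted(found)


files = [
    "log.txt",
    "test_3219c_unrelaxed_rank_2_model_3.pdb",
    "test_3219c_unrelaxed_rank_1_model_4.pdb",
    "test_3219c_unrelaxed_rank_1_model_4_scores.json",
]
best = ranked_models(files)[0]
print(best)
```

In this run the rank 1 structure came from model 4, so rank order and model number are independent.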
comment:6 by , 3 years ago
| Cc: | added |
|---|
Done.
I made the ChimeraX daily builds, 1.4, and 1.3 use ColabFold, including improved logging of results with the ColabFold matplotlib plots. I tweeted about it and made a YouTube video showing its use to make a 5-minute prediction of a 200 amino acid dimer.
https://twitter.com/UCSFChimeraX/status/1549069503513866240
The daily build AlphaFold GUI also has a new option, shown after the Options button is pressed: "Use PDB templates when predicting structures", default false.
I also changed the default for the energy minimize option in the AlphaFold GUI and command to false, since it is now relatively quick to rerun the prediction with it set to true (ticket #7236).
comment:7 by , 3 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
We certainly need to ask the ColabFold authors whether this is OK, since it may significantly increase the load on their sequence server.
ChimeraX prediction job statistics for the 3 months of April, May, and June 2022: 12376 jobs (137 per day) submitted by 2696 unique IP addresses.