Opened 4 years ago
Last modified 4 years ago
#4969 assigned enhancement
Improvements to fetching AlphaFold models
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Analysis | Version: | |
| Keywords: | | Cc: | Elaine Meng |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Many improvements to fetching AlphaFold models would be nice.
1) Allow fetching by sequence search for chains of open structures, using a web service to do the search. This might have options such as a minimum sequence identity.
2) Allow opening an AlphaFold model by UniProt name, e.g. LDLR_HUMAN instead of P01130. This will need a web service.
3) Handle UniProt IDs that have been superseded. For instance, PDB entry 3p5b chain L lists the obsolete Q59FQ1, which is superseded by P01130. This will need a web service.
4) Allow fetching an AlphaFold model directly for a specified sequence.
5) Make a user interface panel to fetch AlphaFold models for an open structure, UniProt ID, or sequence. A panel is easier to learn than a command.
6) Allow coloring an AlphaFold model per residue in new ways, such as by distance from the experimental model, or by highlighting residues not in the experimental model.
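A minimal sketch of item 3, assuming the web service can be reduced to a lookup from an obsolete accession to the accession that replaced it. The `SUPERSEDED` table and `resolve_uniprot_id` are hypothetical stand-ins, not an existing ChimeraX or UniProt API; the only entry is the example from item 3. Replacements are followed repeatedly in case a replacement was itself later superseded. A real service may report that an obsolete entry was split into several accessions, which this sketch does not handle.

```python
# Hypothetical table standing in for a web-service lookup:
# obsolete UniProt accession -> accession that replaced it.
SUPERSEDED = {
    "Q59FQ1": "P01130",
}


def resolve_uniprot_id(accession, superseded=SUPERSEDED, max_steps=10):
    """Hypothetical helper: follow replacement links to a current accession."""
    seen = set()
    current = accession
    for _ in range(max_steps):
        if current in seen:
            raise ValueError(f"cycle in replacement table at {current}")
        seen.add(current)
        replacement = superseded.get(current)
        if replacement is None:
            return current  # not superseded, so this is the current accession
        current = replacement
    raise ValueError(f"too many replacement steps starting from {accession}")


print(resolve_uniprot_id("Q59FQ1"))  # P01130
print(resolve_uniprot_id("P01130"))  # P01130
```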
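For item 6, the per-residue values to color by could be computed as below. This sketch assumes the AlphaFold model has already been superimposed on the experimental chain (for example by matchmaker) and that residues correspond by residue number; in general the correspondence should come from the sequence alignment instead. `residue_distances` is a hypothetical helper, and mapping the distances to colors is left out.

```python
import numpy as np


def residue_distances(predicted_ca, experimental_ca):
    """Hypothetical helper: per-residue CA distance, model vs. experiment.

    predicted_ca: dict residue number -> (x, y, z) for the AlphaFold model,
        already superimposed on the experimental structure.
    experimental_ca: dict residue number -> (x, y, z) for the experimental chain.
    Residues are matched by residue number (an assumption of this sketch).
    Returns (distances, missing): distances maps residue number to distance in
    the coordinate units; missing lists model residues absent from experiment.
    """
    distances = {}
    missing = []
    for resnum, xyz in sorted(predicted_ca.items()):
        if resnum in experimental_ca:
            delta = np.asarray(xyz, float) - np.asarray(experimental_ca[resnum], float)
            distances[resnum] = float(np.linalg.norm(delta))
        else:
            missing.append(resnum)  # candidates for "not in experimental model"
    return distances, missing
```

The `missing` list covers the second coloring idea in item 6: those residues could be given a highlight color instead of a distance color.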
Change History (11)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Added a chain table to the log after fetching AlphaFold chains matching a structure. It gives the RMSD per chain, which is useful. I would like to report percent sequence identity too, but currently can't get that out of the Python MatchMaker command (ticket #4979).
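Until percent identity can be read out of MatchMaker (#4979), one way to compute it from a pairwise alignment is sketched below. This is an illustration, not how MatchMaker computes it: it assumes gapped aligned sequences of equal length and divides by the number of columns where neither sequence has a gap. Other tools divide by the length of the shorter sequence or of the whole alignment, so the numbers can differ.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two equal-length gapped aligned sequences.

    Only columns where neither sequence has a gap ('-' or '.') are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    gaps = set("-.")
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b)
              if a not in gaps and b not in gaps]
    if not paired:
        return 0.0
    same = sum(1 for a, b in paired if a.upper() == b.upper())
    return 100.0 * same / len(paired)


print(percent_identity("MKT-LLV", "MKSALLV"))  # 5 of 6 ungapped columns are identical
```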
follow-up: 3 comment:3 by , 4 years ago
Another idea is to add an option to show the Matchmaker pairwise sequence alignments. They give much more useful information about which parts are more or less similar in sequence than a single percent identity number.
follow-up: 4 comment:4 by , 4 years ago
Yep, I definitely plan to add that option; I forgot to write it down in the ticket. Thanks. I don't think it can default to true, because I expect this will often be used for complexes with many chains, and that would open too many alignment panels.
comment:5 by , 4 years ago
7) Allow fetching N top hits from sequence search. Elaine suggests this could be useful for looking at alternative loops.
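A sketch of items 1 and 7 together: filter sequence-search hits by a minimum sequence identity, then keep the N best. The `Hit` record, its fields, and `top_hits` are hypothetical; the search service's output format is not described in this ticket, and ranking by a single `score` field is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical sequence-search hit record (field names are illustrative)."""
    uniprot_id: str
    percent_identity: float
    score: float


def top_hits(hits, n=1, min_identity=0.0):
    """Return up to n hits with identity >= min_identity, best score first."""
    kept = [h for h in hits if h.percent_identity >= min_identity]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:n]
```

With `n=1` this reduces to the current single best hit, so the same code path could serve both the default fetch and the "alternative loops" use case.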
comment:6 by , 4 years ago
8) Add an option to show Matchmaker pairwise alignments.
9) Add percent sequence identity, computed by Matchmaker, to the AlphaFold chain table.
comment:7 by , 4 years ago
Tristan tried partitioning the AlphaFold model residues using the pairwise CA error estimate file available from the AlphaFold database (ticket #4966). It would be cool if those pieces could be moved rigidly to fit an experimental structure. Even if the linkers were physically unreasonable, this would show whether the AlphaFold domain folds match the experimental structure.
10) Align AlphaFold domains identified by pairwise CA error matrix to experimental structure.
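Once a domain has been identified, fitting it rigidly to the experimental structure is a standard least-squares superposition. The sketch below uses the Kabsch algorithm on corresponding CA coordinates. It assumes the domain partition and the residue correspondence are already known, and it is not ChimeraX's own alignment code; `kabsch_fit` is a hypothetical helper. Each domain gets its own transform, which is why the linkers between domains may end up physically unreasonable.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares rigid fit: find R, t so that R @ mobile[i] + t ~ target[i].

    mobile, target: (N, 3) arrays of corresponding coordinates, e.g. CA atoms
    of one AlphaFold domain and the matching experimental residues.
    Returns (R, t, rmsd) with rmsd computed after applying the fit.
    """
    mobile = np.asarray(mobile, float)
    target = np.asarray(target, float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    P = mobile - mobile_center
    Q = target - target_center
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    fitted = mobile @ R.T + t
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return R, t, rmsd
```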
comment:8 by , 4 years ago
11) Add a color key option to show the confidence coloring scale. Not clear how to space the colors in the key (equally or proportional to confidence value).
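The two spacing choices for the key can be compared with a small helper. The breakpoints 0, 50, 70, 90, 100 are an assumption based on the usual AlphaFold pLDDT confidence bands (boundaries at 50, 70 and 90); `key_positions` is hypothetical and assumes the values are sorted ascending.

```python
def key_positions(values, proportional=True):
    """Fractional positions (0 to 1) along a color key for sorted palette values.

    proportional=True places each color according to its confidence value;
    proportional=False spaces the colors equally regardless of value.
    """
    if len(values) < 2:
        raise ValueError("need at least two palette values")
    if proportional:
        lo, hi = min(values), max(values)
        if hi == lo:
            raise ValueError("palette values must not all be equal")
        return [(v - lo) / (hi - lo) for v in values]
    n = len(values) - 1
    return [i / n for i in range(len(values))]


plddt_breaks = [0, 50, 70, 90, 100]
print(key_positions(plddt_breaks, proportional=True))   # [0.0, 0.5, 0.7, 0.9, 1.0]
print(key_positions(plddt_breaks, proportional=False))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Proportional spacing gives the very-low-confidence range (below 50) half the key, while equal spacing gives each band the same width.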
comment:9 by , 4 years ago
12) When chain table rows are mostly identical, show the repeated values only once. Eric suggests: "Another issue is that the table frequently splits identical chains into distinct rows because of slight difference in RMSD or 'Seen' (e.g. 1www has only two distinct types of chains, but 'alphafold match #1' produces a 4-row table). The table could be smarter and row-span identical UniProt Names/IDs into single cells. If you row-spanned Length, then perhaps clicking on that could select all chains of that type."
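A sketch of the row-spanning idea: rows are grouped by the columns that should be merged, and each merged cell is written once with an HTML `rowspan`. The dict-per-row input, the column names, and `chain_table_html` are hypothetical, not the actual chain table code. Rows are sorted by the merged columns first, so chain order in the table may change.

```python
from html import escape
from itertools import groupby


def chain_table_html(rows, merge_columns):
    """Hypothetical helper: HTML table merging equal values in merge_columns.

    rows: list of dicts with the same keys, one dict per chain.
    merge_columns: columns whose equal values share one cell via rowspan;
    all other columns (e.g. chain ID, RMSD) are shown on every row.
    """
    if not rows:
        return "<table></table>"
    columns = list(rows[0].keys())

    def group_key(row):
        return tuple(row[c] for c in merge_columns)

    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{escape(str(c))}</th>" for c in columns) + "</tr>")
    # groupby only merges adjacent rows, so sort by the merged columns first.
    for _, group in groupby(sorted(rows, key=group_key), key=group_key):
        group = list(group)
        for i, row in enumerate(group):
            cells = []
            for c in columns:
                value = escape(str(row[c]))
                if c in merge_columns:
                    if i == 0:  # merged cell is written once and spans the group
                        cells.append(f'<td rowspan="{len(group)}">{value}</td>')
                else:
                    cells.append(f"<td>{value}</td>")
            lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```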
comment:11 by , 4 years ago
14) Make alphafold match use the compute_ss = false option with matchmaker to speed up aligning structures to large models many-fold. See comment 7 of #5370 for timings: match takes 9 minutes, but that drops to 2 minutes if DSSP is not run on the whole structure for every aligned chain.
I added a sequence search web service using BLAT on plato, enabled with the open command "search true" option. It is off by default. I want to improve it and then make it on by default. We might want a mode that tries the UniProt IDs in the file first and, if those don't exist or have no AlphaFold model, falls back to search. Currently, if search is used, the file UniProt IDs are ignored. That is also a reasonable mode.
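The fallback mode described above could look roughly like this. The function and its three inputs are hypothetical stand-ins: `file_ids` for the UniProt IDs read from the structure file, `has_model` for a check against the AlphaFold database, and `search` for the sequence search service.

```python
def choose_uniprot_ids(chain_sequences, file_ids, has_model, search):
    """Hypothetical helper: prefer file UniProt IDs, fall back to sequence search.

    chain_sequences: dict chain ID -> sequence string.
    file_ids: dict chain ID -> UniProt ID from the structure file (may lack chains).
    has_model: callable(uniprot_id) -> bool, True if an AlphaFold model exists.
    search: callable(sequence) -> UniProt ID or None (stand-in for the web service).
    Returns dict chain ID -> (uniprot_id or None, source), where source is
    "file", "search", or None when neither route finds a model.
    """
    result = {}
    for chain_id, sequence in chain_sequences.items():
        uid = file_ids.get(chain_id)
        if uid is not None and has_model(uid):
            result[chain_id] = (uid, "file")
            continue
        hit = search(sequence)  # file ID missing or has no AlphaFold model
        if hit is not None and has_model(hit):
            result[chain_id] = (hit, "search")
        else:
            result[chain_id] = (None, None)
    return result
```

Recording the source per chain would let the chain table report which chains were matched by file ID and which by search.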