#16088 closed enhancement (fixed)
Fetch model archive structures and plot PAE values
| Reported by: | Owned by: | Tom Goddard | |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Input/Output | Version: | |
| Keywords: | Cc: | Elaine Meng, Eric Pettersen | |
| Blocked By: | Blocking: | ||
| Notify when closed: | Platform: | all | |
| Project: | ChimeraX |
Description
Gerardo contributed code, derived from some simpler ChimeraX Recipes code, to fetch Model Archive structures and plot PAE files. It would be nice to add this to the ChimeraX distribution.
From: Gerardo Tauriello
Subject: [chimerax-users] Re: [EXTERN] Re: Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
Date: October 8, 2024 at 3:45:16 AM PDT
To: Tom Goddard
Cc: "chimerax-users@…" <chimerax-users@…>
Dear Tom,
First of all: thank you very much for that recipe. Seems to work great.
As I was motivated to learn how to script ChimeraX, I extended the recipe (new file attached) to better support data as stored in ModelArchive (MA). It now includes the following:
- Preference for pairwise metric of type PAE (instead of blindly picking the first one) and some info from the metadata on the chosen metric
- Handling of PAE files as accompanying data. It can handle additional files stored locally in the same folder as the main ModelCIF file or files stored remotely and it can handle files stored within ZIP files (as used in all cases in MA)
- An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944"
One observation when looking at sparsely populated PAE matrices as in ma-bak-cepc-0944: the mouse-over text which displays the score does not match the display. In ma-bak-cepc-0944 only inter-chain scores are available between chains A and B and the display shows that on the top right, while the mouse-over shows values in the bottom-left.
I have no reason to believe that this is specific to ModelCIF and so I think that all PAE matrices suffer from the same display issue. This is barely noticeable as PAE matrices are almost symmetric but very much visible in this case.
As for handling other pairwise scores beyond AlphaFold PAE, I tried to see how this looks in ma-bak-cepc-0944, which has an additional score of type contact probability (value in [0,1]). If I load it with "modelcif pae #2 metric_id 4 default_score 0" it works OK, and the only thing preventing me from using it is that the display is very PAE-centric in its titles and color scale. Just being able to set a color scale that depends on the range of data in the matrix would already make this a useful workaround... 😉
Regards,
Gerardo
Attachments (1)
Change History (7)
by , 13 months ago
| Attachment: | modelcif_pae.py added |
|---|
comment:1 by , 13 months ago
| Cc: | added |
|---|---|
| Resolution: | → fixed |
| Status: | assigned → closed |
Done.
I added the code to ChimeraX, changing it in a few small ways. Fetching Model Archive files now uses the standard ChimeraX command syntax:
open ma-bak-cepc-0944 from modelarchive
The open command has one option, "pae true|false" (default false). If "pae true", it tries to plot PAE values for the structure, fetching them if necessary.
The code also defines a separate command to fetch the PAE values which provides more control of what is plotted:
modelcif pae #1 metric_id 4 default_score 0 palette rainbow range 0,1
Here #1 is the already opened Model Archive structure. metric_id is optional and specifies which pairwise residue score to plot; if it is not specified, the command looks for a score called "PAE" and, if none exists, uses the first metric_id in the file. The pairwise scores in a ModelCIF file do not have to include a score for every residue pair. The default_score value sets the score used for residue pairs that have none (default 100). The scores are shown with the AlphaFold PAE plot, and an alternate palette and range can be specified for that plot's colors and score range. One additional option to the modelcif pae command is jsonOutputPath. If this path is not specified, the score values from the ModelCIF file are written to a temporary .json file and opened with the "alphafold pae" command. If the path is specified, the scores are written to that file using the standard AlphaFold PAE file format.
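The default_score and JSON output behavior described above can be illustrated with a minimal Python sketch. This is not the code that was added to ChimeraX. The function names `dense_scores` and `write_pae_json` are hypothetical, and the JSON layout (a one-element list with `predicted_aligned_error` and `max_predicted_aligned_error` keys) is an assumption based on the PAE files distributed by the AlphaFold Database.

```python
import json

def dense_scores(num_residues, pair_scores, default_score=100.0):
    """Build a full N x N matrix from sparse {(i, j): score} entries.

    Indices are 0-based here (an assumption for this sketch). Residue pairs
    with no score in the file receive default_score.
    """
    matrix = [[default_score] * num_residues for _ in range(num_residues)]
    for (i, j), score in pair_scores.items():
        matrix[i][j] = score
    return matrix

def write_pae_json(matrix, path):
    """Write the matrix in an AlphaFold-Database-style PAE JSON layout (assumed)."""
    max_score = max(max(row) for row in matrix)
    data = [{
        "predicted_aligned_error": matrix,
        "max_predicted_aligned_error": max_score,
    }]
    with open(path, "w") as f:
        json.dump(data, f)
```

With a contact probability metric, passing `default_score=0` (as in the command above) would mark unscored pairs as having no predicted contact, rather than showing them with the PAE-oriented default of 100.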
comment:2 by , 13 months ago
Another change I made is that the associated files fetched when opening PAE are saved in the ~/Downloads/ChimeraX/ModelArchive directory. The original code used temporary files for those, so they had to be fetched every time they were used. With my change, which avoids refetching, the ModelArchive directory contents look like the following, with zip files extracted.
/Users/goddard/Downloads/ChimeraX/ModelArchive:
total used in directory 9264 available 239.1 GiB
drwxr-xr-x@  8 goddard staff     256 Oct  8 17:38 .
drwxr-xr-x@ 19 goddard staff     608 Oct  8 12:22 ..
-rw-r--r--   1 goddard staff  521379 Mar  4  2024 ma-bak-cepc-0944.cif
drwxr-xr-x   3 goddard staff      96 Oct  8 17:29 ma-bak-cepc-0944_assoc
-rw-r--r--   1 goddard staff  686931 Mar  4  2024 ma-bak-cepc-0944_assoc.zip
-rw-r--r--@  1 goddard staff  384641 Aug 23 05:37 ma-dm-hisrep-003.cif
drwxr-xr-x   3 goddard staff      96 Oct  8 17:38 ma-dm-hisrep-003_assoc
-rw-r--r--   1 goddard staff 3112903 Aug 23 05:37 ma-dm-hisrep-003_assoc.zip
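The caching idea in this comment (fetch once, keep the file, extract the zip next to it) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ChimeraX implementation. The helper names `fetch_if_missing` and `extract_assoc_zip` are hypothetical, and the download step is passed in as a callable so that no Model Archive URL has to be assumed.

```python
import zipfile
from pathlib import Path

def fetch_if_missing(path, download):
    """Return path, calling download() to obtain the bytes only if the file is absent.

    `download` is a caller-supplied function returning bytes (hypothetical;
    how the file is actually fetched is outside this sketch).
    """
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download())
    return path

def extract_assoc_zip(zip_path):
    """Extract <name>_assoc.zip into a sibling <name>_assoc directory, once."""
    zip_path = Path(zip_path)
    out_dir = zip_path.with_suffix("")  # drops the trailing ".zip"
    if not out_dir.is_dir():
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
    return out_dir
```

Called twice for the same path, `fetch_if_missing` downloads only the first time, which matches the listing above where both the `_assoc.zip` file and the extracted `_assoc` directory remain in the download directory.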
comment:3 by , 13 months ago
Elaine could you document this and credit Gerardo Tauriello for contributing the code?
comment:4 by , 13 months ago
Thanks for the info - will work on this soon and make sure to include the credit. Any plans to put ModelArchive (+PAE) in the Fetch by ID dialog? I didn't see it in the daily build.
Elaine
comment:5 by , 13 months ago
Hi Elaine,
I see Model Archive in Fetch by Id. It must be added automatically since I did nothing to add it. You probably won't see it in the current daily build because all the daily builds failed last night.
Tom
comment:6 by , 13 months ago
The metric_id option is an integer, and the user isn't likely to know the right value to use. So I replaced the metric_id option with metric_name (e.g. "PAE" or "contact probability") from the ma_qa_metric table:
https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/ma_qa_metric.html
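A minimal Python sketch of selecting a pairwise metric by name, with the fallback order described earlier in this ticket (prefer PAE, otherwise the first pairwise metric), might look like the following. The function `choose_pairwise_metric` is hypothetical, and the rows are assumed to be dictionaries with `id`, `name`, `type` and `mode` keys taken from the ma_qa_metric table. Matching on either name or type is also an assumption. The actual ChimeraX code may differ.

```python
def choose_pairwise_metric(metrics, metric_name=None):
    """Pick one local-pairwise metric row from ma_qa_metric-style dicts.

    If metric_name is given, it must match a row's name or type
    (case-insensitive). Otherwise prefer "PAE", then fall back to the
    first pairwise metric. Returns None if there are no pairwise metrics.
    """
    def matches(row, label):
        label = label.lower()
        return row["name"].lower() == label or row["type"].lower() == label

    pairwise = [m for m in metrics if m["mode"] == "local-pairwise"]
    if metric_name is not None:
        for m in pairwise:
            if matches(m, metric_name):
                return m
        raise ValueError(f"No pairwise metric named {metric_name!r}")
    for m in pairwise:
        if matches(m, "PAE"):
            return m
    return pairwise[0] if pairwise else None
```

Selecting by name means the user never needs to know the integer id, which is the motivation given for replacing metric_id.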
Contributed code from Gerardo Tauriello