Opened 13 months ago

Closed 13 months ago

Last modified 13 months ago

#16088 closed enhancement (fixed)

Fetch model archive structures and plot PAE values

Reported by: gerardo.tauriello@… Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Input/Output Version:
Keywords: Cc: Elaine Meng, Eric Pettersen
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

Gerardo contributed code to fetch Model Archive structures and plot PAE values, derived from some simpler ChimeraX Recipes code. It would be nice to add this to the ChimeraX distribution.

From: Gerardo Tauriello
Subject: [chimerax-users] Re: [EXTERN] Re: Support for _ma_qa_metric_local_pairwise to display PAE matrices stored in ModelArchive
Date: October 8, 2024 at 3:45:16 AM PDT
To: Tom Goddard
Cc: "chimerax-users@…" <chimerax-users@…>

Dear Tom,

First of all: thank you very much for that recipe. Seems to work great.

As I was motivated to learn how to script ChimeraX, I extended the recipe (new file attached) to better support data as stored in ModelArchive (MA). It now includes the following:
- Preference for a pairwise metric of type PAE (instead of blindly picking the first one), plus some info from the metadata on the chosen metric.
- Handling of PAE files as accompanying data. It can handle additional files stored locally in the same folder as the main ModelCIF file or stored remotely, and it can handle files stored within ZIP files (as used in all cases in MA).
- An additional "modelarchive open" command to fetch files directly from MA given an MA ID and add PAE if available. Example usages for the two cases mentioned in my original post: "modelarchive open ma-dm-hisrep-003" and "modelarchive open ma-bak-cepc-0944".
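The ZIP handling mentioned in the list above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the attached modelcif_pae.py; the helper names and the member name "pae_scores.cif" are hypothetical.

```python
import io
import zipfile


def read_from_zip(zip_bytes, member_name):
    """Return the bytes of one member of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member_name not in archive.namelist():
            raise FileNotFoundError(f"{member_name} not found in archive")
        return archive.read(member_name)


def make_demo_zip():
    """Build a tiny in-memory archive standing in for an associated-files ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member name and content, for illustration only.
        archive.writestr("pae_scores.cif", "data_demo\n")
    return buffer.getvalue()
```

Because the archive is read from bytes, the same helper would work whether the ZIP came from a local folder or from a download.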

One observation when looking at sparsely populated PAE matrices as in ma-bak-cepc-0944: the mouse-over text which displays the score does not match the display. In ma-bak-cepc-0944 only inter-chain scores are available between chains A and B and the display shows that on the top right, while the mouse-over shows values in the bottom-left.
I have no reason to believe that this is specific to ModelCIF and so I think that all PAE matrices suffer from the same display issue. This is barely noticeable as PAE matrices are almost symmetric but very much visible in this case.

In terms of handling other pairwise scores beyond AlphaFold PAE, I tried to see how this looks in ma-bak-cepc-0944, which has an additional score of type contact probability (value in [0,1]). If I load it with "modelcif pae #2 metric_id 4 default_score 0" it works OK, and the only thing preventing me from using it is that the display is very PAE-centric in its titles and color scale. Just being able to set a color scale that depends on the range of data in the matrix would already make this a useful workaround... 😉
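A color scale that follows the data range could be derived along these lines. This is only a sketch of the idea, assuming the sparse scores are available as a dict keyed by residue index pairs; the function names are hypothetical and nothing here is ChimeraX API.

```python
def score_range(scores):
    """Return (low, high) over the pairwise scores that are actually present.

    `scores` maps (residue_i, residue_j) index pairs to values. Missing pairs
    are simply absent, so a default fill value cannot distort the range.
    """
    values = list(scores.values())
    if not values:
        raise ValueError("no pairwise scores given")
    return min(values), max(values)


def normalized(value, low, high):
    """Map a score onto [0, 1] within the range, for indexing a color palette."""
    if high == low:
        return 0.0  # degenerate range: every score gets the same color
    return (value - low) / (high - low)
```

For a contact probability metric this would typically give a range close to 0,1, while a PAE metric would keep a range in its usual units.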

Regards,
Gerardo

Attachments (1)

modelcif_pae.py (10.7 KB) - added by Tom Goddard 13 months ago.
Contributed code from Gerardo Tauriello


Change History (7)

by Tom Goddard, 13 months ago

Attachment: modelcif_pae.py added

Contributed code from Gerardo Tauriello

comment:1 by Tom Goddard, 13 months ago

Cc: Elaine Meng, Eric Pettersen added
Resolution: fixed
Status: assigned → closed

Done.

I added the code to ChimeraX, changing it in a few small ways. I made fetching the Model Archive files use the standard ChimeraX command syntax:

open ma-bak-cepc-0944 from modelarchive

There is one option to the open command, "pae true|false", default false. If "pae true" is given, it tries to plot PAE values for the structure, fetching them if necessary.

The code also defines a separate command to fetch the PAE values which provides more control of what is plotted:

modelcif pae #1 metric_id 4 default_score 0 palette rainbow range 0,1

Here #1 is the already opened Model Archive structure. metric_id is optional and specifies which pairwise residue score is plotted; if it is not specified, the command looks for a score called "PAE" and, if that does not exist, uses the first metric_id in the file. The pairwise scores in a ModelCIF file do not have to include a score for every residue pair. The default_score value sets the score used for residue pairs that have none (default 100). The scores are shown with the AlphaFold PAE plot, and an alternate palette and range can be specified for the colors and score range of that plot. One additional option to the modelcif pae command is jsonOutputPath. If this path is not specified, the score values from the ModelCIF file are written to a temporary .json file, which is opened with the "alphafold pae" command. If the path is specified, the scores are written to that file using the standard AlphaFold PAE file format.
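The default_score fill and the JSON output described above can be sketched as follows. This is not the code that was added to ChimeraX. The function names are hypothetical, and the JSON layout (a one-element list holding "predicted_aligned_error" and "max_predicted_aligned_error") is an assumption based on the files served by the AlphaFold Database.

```python
import json


def dense_matrix(pair_scores, num_residues, default_score=100.0):
    """Expand sparse pairwise scores into a full square matrix.

    Residue pairs without a score receive default_score.
    """
    matrix = [[default_score] * num_residues for _ in range(num_residues)]
    for (i, j), value in pair_scores.items():
        matrix[i][j] = value
    return matrix


def pae_json_text(matrix):
    """Serialize the matrix in the PAE JSON layout assumed in this sketch."""
    record = {
        "predicted_aligned_error": matrix,
        "max_predicted_aligned_error": max(max(row) for row in matrix),
    }
    return json.dumps([record])
```

Writing the returned text to a temporary file or to a user-chosen path would then correspond to the two jsonOutputPath cases.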

comment:2 by Tom Goddard, 13 months ago

Another change I made is that the associated files fetched when opening PAE are saved in the ~/Downloads/ChimeraX/ModelArchive directory. The original code used temporary files for those, so they had to be fetched every time they were used. With my change, which avoids refetching, the ModelArchive directory contents look like the following, with zip files extracted.

  /Users/goddard/Downloads/ChimeraX/ModelArchive:
  total used in directory 9264 available 239.1 GiB
  drwxr-xr-x@  8 goddard  staff      256 Oct  8 17:38 .
  drwxr-xr-x@ 19 goddard  staff      608 Oct  8 12:22 ..
  -rw-r--r--   1 goddard  staff   521379 Mar  4  2024 ma-bak-cepc-0944.cif
  drwxr-xr-x   3 goddard  staff       96 Oct  8 17:29 ma-bak-cepc-0944_assoc
  -rw-r--r--   1 goddard  staff   686931 Mar  4  2024 ma-bak-cepc-0944_assoc.zip
  -rw-r--r--@  1 goddard  staff   384641 Aug 23 05:37 ma-dm-hisrep-003.cif
  drwxr-xr-x   3 goddard  staff       96 Oct  8 17:38 ma-dm-hisrep-003_assoc
  -rw-r--r--   1 goddard  staff  3112903 Aug 23 05:37 ma-dm-hisrep-003_assoc.zip
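The idea of fetching each file only once can be sketched as below. This is a simplified illustration rather than the ChimeraX fetch code: cached_path and the fetch callback are hypothetical, and the demo uses a temporary directory and a fake download so that it runs without network access.

```python
import os
import tempfile


def cached_path(cache_dir, file_name, fetch):
    """Return the path of file_name in cache_dir, calling fetch(path) only if absent."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, file_name)
    if not os.path.exists(path):
        fetch(path)  # the caller supplies the actual download
    return path


def demo_fetch_count():
    """Request the same file twice and report how many fetches happened."""
    calls = []

    def fake_fetch(path):
        # Stand-in for a real download: record the call and create the file.
        calls.append(path)
        with open(path, "w") as f:
            f.write("data_demo\n")

    with tempfile.TemporaryDirectory() as cache_dir:
        cached_path(cache_dir, "ma-demo.cif", fake_fetch)
        cached_path(cache_dir, "ma-demo.cif", fake_fetch)
    return len(calls)
```

The second request finds the file already on disk, so only one fetch is made.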

comment:3 by Tom Goddard, 13 months ago

Elaine could you document this and credit Gerardo Tauriello for contributing the code?

comment:4 by Elaine Meng, 13 months ago

Thanks for the info - will work on this soon and make sure to include the credit. Any plans to put ModelArchive (+PAE) in the Fetch by ID dialog? I didn't see it in the daily build.

Elaine


comment:5 by goddard@…, 13 months ago

Hi Elaine,

  I see Model Archive in Fetch by Id.  It must be added automatically since I did nothing to add it.  You probably won't see it in the current daily build because all the daily builds failed last night.

	Tom

comment:6 by Tom Goddard, 13 months ago

The metric_id option is an integer, and the user isn't likely to know the right value to use. So I replaced the metric_id option with metric_name (e.g. "PAE" or "contact probability"), taken from the ma_qa_metric table:

https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/ma_qa_metric.html
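Resolving a metric name to its integer id could work roughly as follows. This is a sketch, assuming the ma_qa_metric rows have already been read into (id, name) pairs. The function name, the case-insensitive match, and the example rows are assumptions; the fallback to "PAE" and then to the first metric follows the behavior described in comment:1.

```python
def resolve_metric_id(metrics, metric_name=None):
    """Pick a metric id from a list of (id, name) rows.

    An explicit metric_name must match a row (case-insensitively, which is an
    assumption of this sketch). Without one, a row named "PAE" is preferred
    and the first row is the fallback.
    """
    if not metrics:
        raise ValueError("no pairwise metrics available")
    by_name = {name.lower(): metric_id for metric_id, name in metrics}
    if metric_name is not None:
        key = metric_name.lower()
        if key not in by_name:
            known = ", ".join(name for _, name in metrics)
            raise KeyError(f"unknown metric {metric_name!r}; available: {known}")
        return by_name[key]
    return by_name.get("pae", metrics[0][0])
```

Listing the available names in the error message gives the user the values to choose from, which is the point of moving from ids to names.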
