Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#4966 closed enhancement (fixed)

Alphafold: fetch the predicted aligned error file

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: normal Milestone:
Component: Input/Output Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.17
ChimeraX Version: 1.3.dev202107240347 (2021-07-24 03:47:26 UTC)
Description
Really loving the new alphafold plugin! Just wondering... would you be willing to add a small convenience function to fetch the predicted aligned error file (has the format AF-{UniProt ID}-F1-predicted_aligned_error_v1.json) via the same API? This probably won't be useful for visualisation, but provides really useful information for weighting distance restraints (it's a 2D matrix giving the predicted error in CA-CA distance for every pair of residues in the model). 

OpenGL version: 3.3.0 NVIDIA 465.19.01
OpenGL renderer: NVIDIA TITAN Xp/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Dell Inc.
Model: Precision T5600
OS: CentOS Linux 7 Core
Architecture: 64bit ELF
Virtual Machine: none
CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Cache Size: 20480 KB
Memory:
	              total        used        free      shared  buff/cache   available
	Mem:            62G         23G         22G        320M         16G         38G
	Swap:          4.9G          0B        4.9G

Graphics:
	03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)	
	Subsystem: NVIDIA Corporation Device [10de:11df]	
	Kernel driver in use: nvidia
Locale: ('en_GB', 'UTF-8')
PyQt5 5.15.2, Qt 5.15.2
Installed Packages:
    alabaster: 0.7.12
    appdirs: 1.4.4
    Babel: 2.9.1
    backcall: 0.2.0
    blockdiag: 2.0.1
    certifi: 2021.5.30
    cftime: 1.5.0
    chardet: 4.0.0
    ChimeraX-AddCharge: 1.1.4
    ChimeraX-AddH: 2.1.10
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.1
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.27.1
    ChimeraX-AtomicLibrary: 4.0
    ChimeraX-AtomSearch: 2.0
    ChimeraX-AtomSearchLibrary: 1.0
    ChimeraX-AxesPlanes: 2.0
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 1.1.1
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.5.2
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.1
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.1
    ChimeraX-Clipper: 0.17.0
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.3.2
    ChimeraX-CommandLine: 1.1.4
    ChimeraX-ConnectStructure: 2.0
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.3.dev202107240347
    ChimeraX-CoreFormats: 1.0
    ChimeraX-coulombic: 1.3
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2
    ChimeraX-Dicom: 1.0
    ChimeraX-DistMonitor: 1.1.4
    ChimeraX-DistUI: 1.0
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.1
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1
    ChimeraX-Help: 1.1
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.1
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ISOLDE: 1.3.dev32
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1
    ChimeraX-LinuxSupport: 1.0
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.4
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 1.2.1
    ChimeraX-MDcrds: 2.4
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.3
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.0.2
    ChimeraX-ModelPanel: 1.1
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.6.2
    ChimeraX-PDB: 2.4.4
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.1
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-Phenix: 0.3
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.0.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-Sample: 0.1
    ChimeraX-SaveCommand: 1.4.1
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.4.1
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.6
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-Struts: 1.0
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1
    ChimeraX-ToolshedUtils: 1.2
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.10.1
    ChimeraX-uniprot: 2.1
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.0.1
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-Voyager: 0.1
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    comtypes: 1.1.10
    cxservices: 1.0
    cycler: 0.10.0
    Cython: 0.29.23
    decorator: 4.4.2
    distlib: 0.3.1
    distro: 1.5.0
    docutils: 0.17.1
    filelock: 3.0.12
    funcparserlib: 0.3.6
    grako: 3.16.5
    h5py: 3.3.0
    html2text: 2020.1.16
    idna: 2.10
    ihm: 0.20
    imagecodecs: 2021.4.28
    imagesize: 1.2.0
    ipykernel: 5.5.5
    ipython: 7.23.1
    ipython-genutils: 0.2.0
    jedi: 0.18.0
    Jinja2: 2.11.3
    jupyter-client: 6.1.12
    jupyter-core: 4.7.1
    kiwisolver: 1.3.1
    line-profiler: 3.3.0
    lxml: 4.6.3
    lz4: 3.1.3
    MarkupSafe: 1.1.1
    matplotlib: 3.4.2
    matplotlib-inline: 0.1.2
    msgpack: 1.0.2
    netCDF4: 1.5.6
    networkx: 2.5.1
    numexpr: 2.7.3
    numpy: 1.21.0
    numpydoc: 1.1.0
    OpenMM: 7.6.0
    openvr: 1.16.801
    packaging: 21.0
    ParmEd: 3.2.0
    parso: 0.8.2
    pexpect: 4.8.0
    pickleshare: 0.7.5
    Pillow: 8.2.0
    pip: 21.1.1
    pkginfo: 1.7.0
    prompt-toolkit: 3.0.19
    psutil: 5.8.0
    ptyprocess: 0.7.0
    pycollada: 0.7.1
    pydicom: 2.1.2
    Pygments: 2.9.0
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 2.4.7
    PyQt5-commercial: 5.15.2
    PyQt5-sip: 12.8.1
    PyQtWebEngine-commercial: 5.15.2
    python-dateutil: 2.8.1
    pytz: 2021.1
    pyzmq: 22.1.0
    qtconsole: 5.1.0
    QtPy: 1.9.0
    RandomWords: 0.3.0
    requests: 2.25.1
    scipy: 1.6.3
    setuptools: 57.0.0
    sfftk-rw: 0.7.0.post1
    six: 1.16.0
    snowballstemmer: 2.1.0
    sortedcontainers: 2.4.0
    Sphinx: 4.0.1
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 2.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-jurko: 0.6
    tables: 3.6.1
    tifffile: 2021.4.8
    tinyarray: 1.2.3
    tornado: 6.1
    traitlets: 5.0.5
    urllib3: 1.26.6
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.36.2
    wheel-filename: 1.3.0

Attachments (3)

insr_partitions.jpg (163.1 KB ) - added by Tristan Croll 4 years ago.
Added by email2trac
heatmap.png (1.9 MB ) - added by Tom Goddard 4 years ago.
q5vsl9_domains.png (236.0 KB ) - added by Tom Goddard 4 years ago.
Domain coloring for AlphaFold database entry Q5VSL9 showing one core domain, and 3 disordered regions.

Change History (28)

comment:1 by pett, 4 years ago

Component: Unassigned → Input/Output
Owner: set to Tom Goddard
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: ChimeraX bug report submission → Alphafold: fetch the predicted aligned error file
Type: defect → enhancement

comment:2 by Tom Goddard, 4 years ago

Wondering how you want to use this. Shouldn't the code that wants to use it, for example ISOLDE when generating restraints, fetch the file itself? The UniProt IDs can be obtained from the new chimerax.atomic uniprot_ids() method, which is what the alphafold fetch uses to read UniProt IDs from mmCIF and PDB file headers. And the chimerax.core.fetch retrieve_url() routine can fetch the file, or, if you want it cached locally, the fetch_file() routine.
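As a rough illustration of the fetch described here, the PAE file name can be built from a UniProt ID using the pattern quoted in the ticket description. This is only a sketch: the base URL, the function names, and the use of the standard library's urllib instead of the ChimeraX fetch routines are assumptions made for this example, not something stated in the ticket.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the AlphaFold database file server (not given in the ticket).
ALPHAFOLD_FILES_URL = "https://alphafold.ebi.ac.uk/files"


def pae_file_name(uniprot_id, version=1):
    """Build the PAE file name using the pattern from the ticket description."""
    return f"AF-{uniprot_id.upper()}-F1-predicted_aligned_error_v{version}.json"


def pae_url(uniprot_id, version=1):
    """Full URL of the PAE file under the assumed base URL."""
    return f"{ALPHAFOLD_FILES_URL}/{pae_file_name(uniprot_id, version)}"


def fetch_pae_json(uniprot_id, version=1, timeout=30):
    """Download and parse the PAE JSON. Needs network access; no caching."""
    with urlopen(pae_url(uniprot_id, version), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access.
print(pae_url("P06213"))
```

Inside ChimeraX, the fetch_file() routine mentioned above would presumably replace fetch_pae_json() so that the download is cached locally.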

in reply to:  3 ; comment:3 by Tristan Croll, 4 years ago

Fair enough points. I was thinking that having it in the AlphaFold plugin would minimise code duplication since the framework is already there - but the way you describe it here I suppose there's not much code to duplicate!

Apart from my main aim of using it to better weight distance restraints, another use I can think of would be partitioning the prediction into distinct domains (that you could then overlay separately onto the relevant parts of the reference model). But that would be a bit of work, and probably quite hard to make fast.

in reply to:  4 ; comment:4 by Tristan Croll, 4 years ago

Basic working implementation (fetching the PAE matrix and using it to partition a model into domains) at https://github.com/tristanic/isolde/tree/master/isolde/src/reference_model/alphafold. Requires NetworkX to be updated to 2.6.2 (which adds a "resolution" keyword to the greedy_modularity_communities() method controlling the stringency of the clustering). Attached is an example for the insulin receptor (P06213) using the default arguments. Actually pretty quick, all told (a few seconds).
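To make the idea of partitioning by PAE concrete, here is a deliberately simplified sketch. It is not the linked ISOLDE code: instead of NetworkX's greedy_modularity_communities() it swaps in plain connected components, which is cruder (a single confident residue pair is enough to merge two blocks). The function name, the toy matrix, and the cutoff value are all hypothetical.

```python
def pae_components(pae, pae_cutoff=5.0):
    """Group residue indices into connected components of the graph whose
    edges join residue pairs with PAE below pae_cutoff in both directions.

    Simplified stand-in for community clustering, for illustration only.
    """
    n = len(pae)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # PAE matrices are generally not symmetric, so use the worse direction.
            if max(pae[i][j], pae[j][i]) < pae_cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest group first.
    return sorted(groups.values(), key=len, reverse=True)


# Toy 5-residue PAE matrix: residues 0-2 form one block, residues 3-4 another.
toy_pae = [
    [0, 1, 2, 20, 20],
    [1, 0, 1, 20, 20],
    [2, 1, 0, 20, 20],
    [20, 20, 20, 0, 1],
    [20, 20, 20, 1, 0],
]
print(pae_components(toy_pae))
```

A modularity-based method with a resolution parameter, as used in the linked implementation, gives finer control over how aggressively blocks are split than this all-or-nothing cutoff.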

insr_partitions.jpg

by Tristan Croll, 4 years ago

Attachment: insr_partitions.jpg added

Added by email2trac

in reply to:  6 comment:5 by pett, 4 years ago

Nice!

comment:6 by Tom Goddard, 4 years ago

Resolution: fixed
Status: assigned → closed

Cool. Seems to pick up several fragments (e.g. loops) that also are not well connected and so become separate domains. Some kind of domain identification would be nice to have in ChimeraX.

I think we may have an alphafold command with various subcommands to, for instance, fetch a model, run a structure prediction, or color in various ways. This domain identification could be added to that. How will these domains be used? We probably need something beyond coloring, like naming them. And a user will probably want to fix the mistakes, joining the fragments to the domain they belong to.

A related ticket on improvements to fetching AlphaFold models is #4969.

in reply to:  8 comment:7 by Tristan Croll, 4 years ago

This isn’t true “domain identification” as such - it knows nothing about covalent connectivity, for a start. Rather, it’s picking out the blocks that AlphaFold2 predicts will be approximately rigid. I can think of a few applications for this - for some it would make sense to roll in the loops, but for others they’re best left out:

- molecular replacement relies heavily on finding the biggest fragment(s) that will overlay rigidly with the structure in the crystal itself (which of course you don’t know ahead of time in most cases). Current methods tend to rely on manual identification of domains, then pruning by B-factor and truncating non-identical sidechains. Where there are multiple existing homologues it’s possible to do better by pruning to the parts that rigidly overlay each other, but there’s still room for improvement. Looks like the AF2 models could dramatically simplify that - decompose into domains based on predicted aligned error, then further prune based on predicted LDDT.

- in a similar way, this could help people docking into cryo-EM density. Dock the individual rigid components then build in the connections - or perhaps even better, use the docking results to set targets for ISOLDE to flexibly fit the full chain.

- for your existing “fetch AF2 models and overlay on the experimental structure” tool, this could let you identify big conformational differences and provide separate overlays for the different components. For example in 6o85, chains A and B each have a C-terminal domain connected by a long flexible linker - the AF2 model aligns to the N-terminus leaving the C-terminus in completely the wrong place.
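The "decompose by PAE, then prune by pLDDT" step from the molecular replacement point above could look roughly like the following. It is a minimal sketch: the function names, the pLDDT threshold of 70, and the minimum fragment size are hypothetical values chosen for illustration, and the domains are assumed to be lists of residue indices.

```python
def prune_domain(domain, plddt, min_plddt=70.0):
    """Keep only the residues of one domain whose pLDDT meets the threshold."""
    return [index for index in domain if plddt[index] >= min_plddt]


def search_fragments(domains, plddt, min_plddt=70.0, min_size=3):
    """Prune every domain by pLDDT and drop fragments that end up too small."""
    fragments = [prune_domain(domain, plddt, min_plddt) for domain in domains]
    return [fragment for fragment in fragments if len(fragment) >= min_size]


# Toy data: two domains over seven residues with made-up pLDDT values.
toy_domains = [[0, 1, 2, 3], [4, 5, 6]]
toy_plddt = [90, 85, 40, 95, 60, 50, 92]
print(search_fragments(toy_domains, toy_plddt, min_size=2))
```

Here residue 2 is pruned from the first domain, and the second domain is dropped because only one of its residues passes the threshold.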

in reply to:  9 ; comment:8 by Tristan Croll, 4 years ago

Case in point: in the image I attached of the insulin receptor model, that enormous dark cyan domain on the upper left is actually the kinase domain (and the bright-cyan helix on lower left is the transmembrane domain) - since AF2 knows nothing about membranes it's done some pretty weird gymnastics with these.

comment:9 by Tom Goddard, 4 years ago

Yes I see it could be useful for fitting rigid pieces or x-ray phasing searches.

comment:10 by Tristan Croll, 4 years ago

Today I tried out using python-igraph in place of NetworkX for the community clustering. Gives very similar results but is much, much faster - roughly 40X overall, if the PAE matrix has already been imported to a numpy array and adjust_weights_for_distance=False (this was an option I wanted to try out, but doesn't seem to add much in general). Can get an extra ~30% speedup if pae_cutoff is reduced from 5 to 3. Overall runtime comes down from 6-10 seconds to ~150-250ms for my two test cases (847 and ~2450 residues). Fast enough to potentially become an interactive tool with sliders for the settings.

comment:11 by Tristan Croll, 4 years ago

Would probably need python-igraph to be bundled within ChimeraX, though. For some reason, despite being distributed as Linux, Windows and MacOS wheels it still does some compilation on pip install (and fails in my ~vanilla CentOS 7 environment because it requires cmake 3).

comment:12 by Tristan Croll, 4 years ago

Made some more intelligent use of Numpy for the CA-CA distance matrix, bringing the impact of adjust_weights_for_distance=True down from ~2s to ~70ms.

comment:13 by Tom Goddard, 4 years ago

That is cool that igraph works well. But it is a nightmare that they don't allow you to install without build tools. Are you sure that is right? I have never seen a case where a binary wheel needs build tools like cmake. Maybe igraph has a dependency that is only available as a source distribution and needs cmake. I think we don't have cmake in our build environment and have rejected other packages that would need it to build. You know my position from OpenMM: I am against including packages where the developers of those packages don't have the time to make an easy-to-use installation. We can't afford to do the work for them to make their packages usable.

comment:14 by Tristan Croll, 4 years ago

Hmm... my bad. The pip install failure was on my system python 3.7 - turns out the wheel file required a newer version of pip - but rather than tell me that, it was just falling back to grabbing the source tarball. The binary wheel installs just fine in ChimeraX.

comment:15 by Tom Goddard, 4 years ago

Great! So you can add the igraph dependency to your bundle. When we need graph manipulation for other tools we will definitely look at it as an alternative to networkx.

in reply to:  17 comment:16 by Tristan Croll, 4 years ago

The funny thing is that I don't actually have an immediate use for this domain parsing in ISOLDE itself (although it might come in handy later as part of a toolkit for building up models into maps from scratch). But Airlie McCoy and Claudia Nebot in the lab here are very keen to set up a ChimeraX front-end to Phaser TNG (the new, heavily rewritten and re-thought version of Phaser: https://journals.iucr.org/d/issues/2021/01/00/ba5309/index.html). Still very early days on that one, but this will have clear applications there as a tool to generate suitable molecular replacement search models from a predicted structure.


comment:17 by Tom Goddard, 4 years ago

Ok. For cryoEM maps it could be useful to move domains to match the map to get an initial model. How to select the domains becomes a basic issue.

comment:18 by Tristan Croll, 4 years ago

The way I'm doing it now is to define a new per-residue integer property (currently Residue.isolde_domain) and give all residues in a given cluster the same isolde_domain value. Makes a good handle for selecting the domains later. The general idea I'm thinking of would be to split the domains to be used for rigid docking into separate models, dock them, and then use the docked positions as targets to steer the (restrained with local distance and torsion restraints) complete chain in a simulation, allowing the connecting regions to adjust to suit.
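The per-residue labelling described here can be sketched with plain Python lists. In ChimeraX the label would live on each residue (the comment names Residue.isolde_domain); the functions below are hypothetical stand-ins that only show the bookkeeping, with clusters given as lists of residue indices.

```python
def domain_labels(clusters, n_residues, unassigned=-1):
    """Give every residue the integer ID of the cluster that contains it."""
    labels = [unassigned] * n_residues
    for domain_id, members in enumerate(clusters):
        for index in members:
            labels[index] = domain_id
    return labels


def select_domain(labels, domain_id):
    """Return the indices of all residues carrying the given domain ID."""
    return [index for index, label in enumerate(labels) if label == domain_id]


# Toy example: two clusters over six residues; residue 5 belongs to neither.
labels = domain_labels([[0, 1, 2], [3, 4]], n_residues=6)
print(labels)
print(select_domain(labels, 1))
```

Selecting by label in this way is what makes it straightforward to split each domain into its own model for rigid docking later.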

Fast global docking into cryo-EM maps (and improved docking into difficult cryo-EM density) is one of the goals of the new Phaser, by the way.

comment:19 by Tom Goddard, 4 years ago

That sounds like a reasonable plan. I was thinking of using the split command to make separate domain models, then doing rigid fitting with the fitmap command, then copying the coordinates from the domains back to the unsplit structure using the mcopy command (which needs to be ported from Chimera to ChimeraX). This would just be done by the user by hand.

in reply to:  21 comment:20 by Tristan Croll, 4 years ago

I wonder if the ChimeraX model morphing code could be used to help define a more meaningful steering trajectory than simple position restraints, for the cases where domains have to rotate substantially?

comment:21 by Tom Goddard, 4 years ago

Morphing is just driving torsion angles. Possibly you could use the rigid rotation computed by the align command and apply a torque to the whole domain to rotate it. But that may not be necessary, and before working on it I would first want to see a case with a large rotation where position restraints alone have trouble.

comment:22 by Tom Goddard, 4 years ago

I added the ability to show AlphaFold predicted aligned error as a heat map, and to color residues by domains determined by Tristan's networkx cluster code. The heat map display allows dragging a box around any region on the heat map to color the corresponding residues. This is copied from the AlphaFold database web pages, including the green color scheme for the heat map. To fetch the PAE matrix data from the AlphaFold database there is a new "pae" option to the alphafold fetch command, for example

alphafold fetch q5vsl9 pae true

If instead you want to open the PAE JSON or pickle file from an AlphaFold run you can use the new command "alphafold pae", for instance,

alphafold pae result_model_1_multimer.pkl model #2

Here you specify which atomic structure model to associate with the PAE data. If only one atomic structure is open you can omit the model option.

Both these commands display the heat map panel which has buttons "Color PAE Domains" and "Color pLDDT". The first computes the domains and colors each a different (random) color. The second gives the standard AlphaFold blue-red per-residue pLDDT score coloring using the bfactor field of the structure.

I've attached an image of the new heat map panel.

by Tom Goddard, 4 years ago

Attachment: heatmap.png added

by Tom Goddard, 4 years ago

Attachment: q5vsl9_domains.png added

Domain coloring for AlphaFold database entry Q5VSL9 showing one core domain, and 3 disordered regions.

comment:23 by Tom Goddard, 4 years ago

I hope to show this PAE heatmap display in an AlphaFold SBGrid webinar talk I am giving on Tuesday March 29, 2022.

comment:24 by Tristan Croll, 4 years ago

This is most excellent, and will be really useful! I'm particularly thankful that you've added the ability to load the PAE matrix from a user-generated prediction. A couple of suggestions:

  • for the heatmap display I think it would be a nice touch to add a slider to adjust the maximum of the colormap range - while the PAE values go up to 32-ish, for many purposes the most interesting PAE values are below around 4 Å.
  • for the domain parsing, it would be a good idea to expose the resolution argument of the community clustering method in some way (perhaps as a combo box with a step size of 0.5 or so). It controls how aggressive the algorithm is when defining individual clusters, and the optimum does tend to vary a bit from case to case (and application to application) - I haven't found a really good one-size-fits-all, but useful values are generally in the range of 0.5-5.
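The effect of an adjustable colormap maximum can be sketched as a clamped linear mapping. This is an illustration only: the function names and the two endpoint colors (dark green for low error, white for high) are assumptions, not the colors or code ChimeraX uses.

```python
def pae_to_fraction(pae_value, pae_max=32.0, pae_min=0.0):
    """Map a PAE value to 0..1, clamping values outside [pae_min, pae_max]."""
    if pae_max <= pae_min:
        raise ValueError("pae_max must be greater than pae_min")
    clamped = min(max(pae_value, pae_min), pae_max)
    return (clamped - pae_min) / (pae_max - pae_min)


def pae_color(pae_value, pae_max=32.0, low=(0, 100, 0), high=(255, 255, 255)):
    """Linearly blend between two assumed RGB endpoint colors."""
    fraction = pae_to_fraction(pae_value, pae_max)
    return tuple(round(a + fraction * (b - a)) for a, b in zip(low, high))


# With the full range, a PAE of 2 is barely distinguishable from 0;
# lowering the maximum to 4 spreads the low values across the colormap.
print(pae_to_fraction(2.0), pae_to_fraction(2.0, pae_max=4.0))
```

Lowering pae_max trades away contrast among high-error pairs, which all saturate to the same color, in exchange for more contrast among the confident ones.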

comment:25 by Tom Goddard, 4 years ago

Thanks for the suggestions. I've added options to the "alphafold pae" command to set the range of PAE values for the colormap and also for the colors to use (range and palette options). And I added command options to do the domain coloring (option "color_domains") and specify graph resolution (option "clustering", default 0.5) and cutoff (option "connect_max_pae", default 5). I'll add options to the gui when it becomes clearer what options are most commonly useful. So far I did not find using more limited pae color range useful.
