#4966 closed enhancement (fixed)
Alphafold: fetch the predicted aligned error file
Reported by: | Tristan Croll | Owned by: | Tom Goddard |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
The following bug report has been submitted:
Platform: Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.17
ChimeraX Version: 1.3.dev202107240347 (2021-07-24 03:47:26 UTC)
Description
Really loving the new alphafold plugin! Just wondering... would you be willing to add a small convenience function to fetch the predicted aligned error file (has the format AF-{UniProt ID}-F1-predicted_aligned_error_v1.json) via the same API? This probably won't be useful for visualisation, but provides really useful information for weighting distance restraints (it's a 2D matrix giving the predicted error in CA-CA distance for every pair of residues in the model).
OpenGL version: 3.3.0 NVIDIA 465.19.01
OpenGL renderer: NVIDIA TITAN Xp/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Dell Inc.
Model: Precision T5600
OS: CentOS Linux 7 Core
Architecture: 64bit ELF
Virtual Machine: none
CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Cache Size: 20480 KB
Memory:
      total used free shared buff/cache available
Mem:  62G   23G  22G  320M   16G        38G
Swap: 4.9G  0B   4.9G
Graphics: 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1) Subsystem: NVIDIA Corporation Device [10de:11df] Kernel driver in use: nvidia
Locale: ('en_GB', 'UTF-8')
PyQt5 5.15.2, Qt 5.15.2
Installed Packages: alabaster: 0.7.12 appdirs: 1.4.4 Babel: 2.9.1 backcall: 0.2.0 blockdiag: 2.0.1 certifi: 2021.5.30 cftime: 1.5.0 chardet: 4.0.0 ChimeraX-AddCharge: 1.1.4 ChimeraX-AddH: 2.1.10 ChimeraX-AlignmentAlgorithms: 2.0 ChimeraX-AlignmentHdrs: 3.2 ChimeraX-AlignmentMatrices: 2.0 ChimeraX-Alignments: 2.1 ChimeraX-AlphaFold: 1.0 ChimeraX-AltlocExplorer: 1.0 ChimeraX-AmberInfo: 1.0 ChimeraX-Arrays: 1.0 ChimeraX-Atomic: 1.27.1 ChimeraX-AtomicLibrary: 4.0 ChimeraX-AtomSearch: 2.0 ChimeraX-AtomSearchLibrary: 1.0 ChimeraX-AxesPlanes: 2.0 ChimeraX-BasicActions: 1.1 ChimeraX-BILD: 1.0 ChimeraX-BlastProtein: 1.1.1 ChimeraX-BondRot: 2.0 ChimeraX-BugReporter: 1.0 ChimeraX-BuildStructure: 2.5.2
ChimeraX-Bumps: 1.0 ChimeraX-BundleBuilder: 1.1 ChimeraX-ButtonPanel: 1.0 ChimeraX-CageBuilder: 1.0 ChimeraX-CellPack: 1.0 ChimeraX-Centroids: 1.1 ChimeraX-ChemGroup: 2.0 ChimeraX-Clashes: 2.1 ChimeraX-Clipper: 0.17.0 ChimeraX-ColorActions: 1.0 ChimeraX-ColorGlobe: 1.0 ChimeraX-ColorKey: 1.3.2 ChimeraX-CommandLine: 1.1.4 ChimeraX-ConnectStructure: 2.0 ChimeraX-Contacts: 1.0 ChimeraX-Core: 1.3.dev202107240347 ChimeraX-CoreFormats: 1.0 ChimeraX-coulombic: 1.3 ChimeraX-Crosslinks: 1.0 ChimeraX-Crystal: 1.0 ChimeraX-CrystalContacts: 1.0 ChimeraX-DataFormats: 1.2 ChimeraX-Dicom: 1.0 ChimeraX-DistMonitor: 1.1.4 ChimeraX-DistUI: 1.0 ChimeraX-Dssp: 2.0 ChimeraX-EMDB-SFF: 1.0 ChimeraX-ExperimentalCommands: 1.0 ChimeraX-FileHistory: 1.0 ChimeraX-FunctionKey: 1.0 ChimeraX-Geometry: 1.1 ChimeraX-gltf: 1.0 ChimeraX-Graphics: 1.1 ChimeraX-Hbonds: 2.1 ChimeraX-Help: 1.1 ChimeraX-HKCage: 1.3 ChimeraX-IHM: 1.1 ChimeraX-ImageFormats: 1.1 ChimeraX-IMOD: 1.0 ChimeraX-IO: 1.0.1 ChimeraX-ISOLDE: 1.3.dev32 ChimeraX-ItemsInspection: 1.0 ChimeraX-Label: 1.1 ChimeraX-LinuxSupport: 1.0 ChimeraX-ListInfo: 1.1.1 ChimeraX-Log: 1.1.4 ChimeraX-LookingGlass: 1.1 ChimeraX-Maestro: 1.8.1 ChimeraX-Map: 1.1 ChimeraX-MapData: 2.0 ChimeraX-MapEraser: 1.0 ChimeraX-MapFilter: 2.0 ChimeraX-MapFit: 2.0 ChimeraX-MapSeries: 2.1 ChimeraX-Markers: 1.0 ChimeraX-Mask: 1.0 ChimeraX-MatchMaker: 1.2.1 ChimeraX-MDcrds: 2.4 ChimeraX-MedicalToolbar: 1.0.1 ChimeraX-Meeting: 1.0 ChimeraX-MLP: 1.1 ChimeraX-mmCIF: 2.3 ChimeraX-MMTF: 2.1 ChimeraX-Modeller: 1.0.2 ChimeraX-ModelPanel: 1.1 ChimeraX-ModelSeries: 1.0 ChimeraX-Mol2: 2.0 ChimeraX-Morph: 1.0 ChimeraX-MouseModes: 1.1 ChimeraX-Movie: 1.0 ChimeraX-Neuron: 1.0 ChimeraX-Nucleotides: 2.0.2 ChimeraX-OpenCommand: 1.6.2 ChimeraX-PDB: 2.4.4 ChimeraX-PDBBio: 1.0 ChimeraX-PDBLibrary: 1.0.1 ChimeraX-PDBMatrices: 1.0 ChimeraX-Phenix: 0.3 ChimeraX-PickBlobs: 1.0 ChimeraX-Positions: 1.0 ChimeraX-PresetMgr: 1.0.1 ChimeraX-PubChem: 2.1 ChimeraX-ReadPbonds: 1.0 ChimeraX-Registration: 
1.1 ChimeraX-RemoteControl: 1.0 ChimeraX-ResidueFit: 1.0 ChimeraX-RestServer: 1.1 ChimeraX-RNALayout: 1.0 ChimeraX-RotamerLibMgr: 2.0 ChimeraX-RotamerLibsDunbrack: 2.0 ChimeraX-RotamerLibsDynameomics: 2.0 ChimeraX-RotamerLibsRichardson: 2.0 ChimeraX-Sample: 0.1 ChimeraX-SaveCommand: 1.4.1 ChimeraX-SchemeMgr: 1.0 ChimeraX-SDF: 2.0 ChimeraX-Segger: 1.0 ChimeraX-Segment: 1.0 ChimeraX-SelInspector: 1.0 ChimeraX-SeqView: 2.4.1 ChimeraX-Shape: 1.0.1 ChimeraX-Shell: 1.0 ChimeraX-Shortcuts: 1.1 ChimeraX-ShowAttr: 1.0 ChimeraX-ShowSequences: 1.0 ChimeraX-SideView: 1.0 ChimeraX-Smiles: 2.1 ChimeraX-SmoothLines: 1.0 ChimeraX-SpaceNavigator: 1.0 ChimeraX-StdCommands: 1.6 ChimeraX-STL: 1.0 ChimeraX-Storm: 1.0 ChimeraX-Struts: 1.0 ChimeraX-Surface: 1.0 ChimeraX-SwapAA: 2.0 ChimeraX-SwapRes: 2.1 ChimeraX-TapeMeasure: 1.0 ChimeraX-Test: 1.0 ChimeraX-Toolbar: 1.1 ChimeraX-ToolshedUtils: 1.2 ChimeraX-Tug: 1.0 ChimeraX-UI: 1.10.1 ChimeraX-uniprot: 2.1 ChimeraX-UnitCell: 1.0 ChimeraX-ViewDockX: 1.0.1 ChimeraX-Vive: 1.1 ChimeraX-VolumeMenu: 1.0 ChimeraX-Voyager: 0.1 ChimeraX-VTK: 1.0 ChimeraX-WavefrontOBJ: 1.0 ChimeraX-WebCam: 1.0 ChimeraX-WebServices: 1.0 ChimeraX-Zone: 1.0 colorama: 0.4.4 comtypes: 1.1.10 cxservices: 1.0 cycler: 0.10.0 Cython: 0.29.23 decorator: 4.4.2 distlib: 0.3.1 distro: 1.5.0 docutils: 0.17.1 filelock: 3.0.12 funcparserlib: 0.3.6 grako: 3.16.5 h5py: 3.3.0 html2text: 2020.1.16 idna: 2.10 ihm: 0.20 imagecodecs: 2021.4.28 imagesize: 1.2.0 ipykernel: 5.5.5 ipython: 7.23.1 ipython-genutils: 0.2.0 jedi: 0.18.0 Jinja2: 2.11.3 jupyter-client: 6.1.12 jupyter-core: 4.7.1 kiwisolver: 1.3.1 line-profiler: 3.3.0 lxml: 4.6.3 lz4: 3.1.3 MarkupSafe: 1.1.1 matplotlib: 3.4.2 matplotlib-inline: 0.1.2 msgpack: 1.0.2 netCDF4: 1.5.6 networkx: 2.5.1 numexpr: 2.7.3 numpy: 1.21.0 numpydoc: 1.1.0 OpenMM: 7.6.0 openvr: 1.16.801 packaging: 21.0 ParmEd: 3.2.0 parso: 0.8.2 pexpect: 4.8.0 pickleshare: 0.7.5 Pillow: 8.2.0 pip: 21.1.1 pkginfo: 1.7.0 prompt-toolkit: 3.0.19 psutil: 5.8.0 
ptyprocess: 0.7.0 pycollada: 0.7.1 pydicom: 2.1.2 Pygments: 2.9.0 PyOpenGL: 3.1.5 PyOpenGL-accelerate: 3.1.5 pyparsing: 2.4.7 PyQt5-commercial: 5.15.2 PyQt5-sip: 12.8.1 PyQtWebEngine-commercial: 5.15.2 python-dateutil: 2.8.1 pytz: 2021.1 pyzmq: 22.1.0 qtconsole: 5.1.0 QtPy: 1.9.0 RandomWords: 0.3.0 requests: 2.25.1 scipy: 1.6.3 setuptools: 57.0.0 sfftk-rw: 0.7.0.post1 six: 1.16.0 snowballstemmer: 2.1.0 sortedcontainers: 2.4.0 Sphinx: 4.0.1 sphinxcontrib-applehelp: 1.0.2 sphinxcontrib-blockdiag: 2.0.0 sphinxcontrib-devhelp: 1.0.2 sphinxcontrib-htmlhelp: 2.0.0 sphinxcontrib-jsmath: 1.0.1 sphinxcontrib-qthelp: 1.0.3 sphinxcontrib-serializinghtml: 1.1.5 suds-jurko: 0.6 tables: 3.6.1 tifffile: 2021.4.8 tinyarray: 1.2.3 tornado: 6.1 traitlets: 5.0.5 urllib3: 1.26.6 wcwidth: 0.2.5 webcolors: 1.11.1 wheel: 0.36.2 wheel-filename: 1.3.0
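For readers who want to work with the file requested above, here is a minimal sketch of building the file name from the pattern quoted in the description and turning the JSON into a square matrix. The two JSON layouts handled here (parallel residue1/residue2/distance lists, or a nested predicted_aligned_error list) are assumptions about the database files, not something stated in this ticket, and the function names are hypothetical.

```python
import json


def pae_file_name(uniprot_id, version=1):
    """File name following the pattern quoted in the ticket description."""
    return f"AF-{uniprot_id}-F1-predicted_aligned_error_v{version}.json"


def pae_matrix_from_json(text):
    """Parse PAE JSON text into a list-of-lists square matrix.

    Assumed layouts (not confirmed by the ticket):
      * "long": parallel lists residue1, residue2, distance (1-based indices)
      * "square": a nested list under the key predicted_aligned_error
    """
    data = json.loads(text)
    entry = data[0] if isinstance(data, list) else data
    if "predicted_aligned_error" in entry:
        return [[float(v) for v in row] for row in entry["predicted_aligned_error"]]
    res1, res2, dist = entry["residue1"], entry["residue2"], entry["distance"]
    n = max(max(res1), max(res2))
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, d in zip(res1, res2, dist):
        matrix[i - 1][j - 1] = float(d)  # convert 1-based residue index to 0-based
    return matrix
```

The matrix is kept as plain nested lists so the sketch has no dependencies; converting it to a numpy array afterwards is a one-liner if needed.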
Attachments (3)
Change History (28)
comment:1 by , 4 years ago
Component: | Unassigned → Input/Output |
---|---|
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → Alphafold: fetch the predicted aligned error file |
Type: | defect → enhancement |
comment:2 by , 4 years ago
follow-up: 3 comment:3 by , 4 years ago
Fair enough points. I was thinking that having it in the AlphaFold plugin would minimise code duplication since the framework is already there - but the way you describe it here I suppose there's not much code to duplicate! Apart from my main aim of using it to better weight distance restraints, another use I can think of would be partitioning the prediction into distinct domains (that you could then overlay separately onto the relevant parts of the reference model). But that would be a bit of work, and probably quite hard to make fast.
follow-up: 4 comment:4 by , 4 years ago
Basic working implementation (fetching the PAE matrix and using it to partition a model into domains) at https://github.com/tristanic/isolde/tree/master/isolde/src/reference_model/alphafold. Requires NetworkX to be updated to 2.6.2 (adds a "resolution" keyword to the greedy_modularity_communities() method controlling the stringency of the clustering). Attached example for the insulin receptor (P06213) using the default arguments. Actually pretty quick, all told (a few seconds).
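The partitioning idea can be sketched roughly as follows. This is not the code in the linked repository, only an illustration under stated assumptions: residue pairs whose PAE is below a cutoff become graph edges weighted by inverse PAE, and NetworkX's greedy_modularity_communities() (with the "resolution" keyword mentioned above) groups them. The function name pae_domains, the symmetrisation by taking the larger of the two PAE values, the pae_power parameter and the 0.1 floor on the error are all hypothetical choices.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def pae_domains(pae, pae_cutoff=5.0, pae_power=1.0, resolution=1.0):
    """Cluster residue indices into approximately rigid groups from a PAE matrix.

    pae is a square list-of-lists (or array) of predicted aligned errors.
    """
    n = len(pae)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # PAE is not symmetric; use the more pessimistic value (assumption).
            err = max(pae[i][j], pae[j][i])
            if err < pae_cutoff:
                # Low error gives a strong edge; floor avoids division by zero.
                weight = 1.0 / max(err, 0.1) ** pae_power
                graph.add_edge(i, j, weight=weight)
    communities = greedy_modularity_communities(
        graph, weight="weight", resolution=resolution
    )
    return [sorted(c) for c in communities]
```

Raising resolution tends to give more, smaller clusters; lowering pae_cutoff removes edges and so reduces the work done by the clustering step.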
follow-up: 5 comment:6 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Cool. Seems to pick up several fragments (e.g. loops) that also are not well connected and so become separate domains. Some kind of domain identification would be nice to have in ChimeraX.
I think we may have an alphafold command with various subcommands to, for instance, fetch a model, run a structure prediction, or color in various ways. This domain identification could be added to that. How will these domains be used? We probably need something beyond coloring, like naming them. And a user will probably want to fix the mistakes, joining the fragments to the domain they belong to.
A related ticket on improvements to fetching AlphaFold models is #4969.
comment:7 by , 4 years ago
This isn’t true “domain identification” as such - it knows nothing about covalent connectivity, for a start. Rather, it’s picking out the blocks that AlphaFold2 predicts will be approximately rigid. I can think of a few applications for this - for some it would make sense to roll in the loops, but for others they’re best left out:
- molecular replacement relies heavily on finding the biggest fragment(s) that will overlay rigidly with the structure in the crystal itself (which of course you don’t know ahead of time in most cases). Current methods tend to rely on manual identification of domains, then pruning by B-factor and truncating non-identical sidechains. Where there are multiple existing homologues it’s possible to do better by pruning to the parts that rigidly overlay each other, but there’s still room for improvement. Looks like the AF2 models could dramatically simplify that - decompose into domains based on predicted aligned error, then further prune based on predicted LDDT.
- in a similar way, this could help people docking into cryo-EM density. Dock the individual rigid components then build in the connections - or perhaps even better, use the docking results to set targets for ISOLDE to flexibly fit the full chain.
- for your existing “fetch AF2 models and overlay on the experimental structure” tool, this could let you identify big conformational differences and provide separate overlays for the different components. For example in 6o85, chains A and B each have a C-terminal domain connected by a long flexible linker - the AF2 model aligns to the N-terminus leaving the C-terminus in completely the wrong place.
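The "decompose, then prune by predicted LDDT" step for search-model preparation could look something like this minimal sketch. The function name, the pLDDT threshold of 70 and the minimum fragment size are illustrative assumptions, not values proposed in this ticket.

```python
def prune_domains(domains, plddt, min_plddt=70.0, min_size=3):
    """Drop low-confidence residues from each domain, then drop tiny leftovers.

    domains: list of lists of residue indices (e.g. from PAE clustering)
    plddt:   mapping from residue index to its per-residue pLDDT score
    """
    pruned = []
    for domain in domains:
        kept = [res for res in domain if plddt[res] >= min_plddt]
        # Very small fragments are unlikely to be useful as rigid search models.
        if len(kept) >= min_size:
            pruned.append(kept)
    return pruned
```

Whether loops should be rolled into a domain or pruned away depends on the application, as the comment above notes, so both thresholds are best treated as per-case settings.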
follow-up: 7 comment:8 by , 4 years ago
Case in point: in the image I attached of the insulin receptor model, that enormous dark cyan domain on the upper left is actually the kinase domain (and the bright-cyan helix on lower left is the transmembrane domain) - since AF2 knows nothing about membranes it's done some pretty weird gymnastics with these.
follow-up: 8 comment:9 by , 4 years ago
Yes I see it could be useful for fitting rigid pieces or x-ray phasing searches.
comment:10 by , 4 years ago
Today I tried out using python-igraph in place of NetworkX for the community clustering. Gives very similar results but is much, much faster - roughly 40X overall, if the PAE matrix has already been imported to a numpy array and adjust_weights_for_distance=False (this was an option I wanted to try out, but doesn't seem to add much in general). Can get an extra ~30% speedup if pae_cutoff is reduced from 5 to 3. Overall runtime comes down from 6-10 seconds to ~150-250ms for my two test cases (847 and ~2450 residues). Fast enough to potentially become an interactive tool with sliders for the settings.
comment:11 by , 4 years ago
Would probably need python-igraph to be bundled within ChimeraX, though. For some reason, despite being distributed as Linux, Windows and MacOS wheels it still does some compilation on pip install (and fails in my ~vanilla CentOS 7 environment because it requires cmake 3).
comment:12 by , 4 years ago
Made some more intelligent use of Numpy for the CA-CA distance matrix, bringing the impact of adjust_weights_for_distance=True down from ~2s to ~70ms.
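The kind of vectorisation that usually gives this sort of speedup is computing all pairwise CA-CA distances with broadcasting rather than a Python double loop. The sketch below shows only that step; it is an assumption about the general approach, not the actual ISOLDE code, and how the distances then modify the edge weights is not described in this ticket.

```python
import numpy as np


def ca_distance_matrix(ca_coords):
    """Return the (n, n) matrix of pairwise distances for n CA coordinates.

    ca_coords: array-like of shape (n, 3).
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Broadcasting: (n, 1, 3) - (1, n, 3) -> (n, n, 3) difference vectors.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The intermediate (n, n, 3) array costs memory proportional to n squared, which is modest for chains of a few thousand residues like the test cases mentioned above.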
comment:13 by , 4 years ago
That is cool that igraph works well. But that is a nightmare that they don't allow you to install without build tools. Are you sure that is right? I have never seen a case where a binary wheel needs build tools like cmake. Maybe it is that igraph has a dependency that is only available as a source wheel and needs cmake. I think we don't have cmake in our build environment and have rejected other packages that would need it to build. You know my position from OpenMM -- I am against including packages where the developers of those packages don't have the time to make an easy to use installation. We can't afford to do the work for them to make their packages usable.
comment:14 by , 4 years ago
Hmm... my bad. The pip install failure was on my system python 3.7 - turns out the wheel file required a newer version of pip - but rather than tell me that, it was just falling back to grabbing the source tarball. The binary wheel installs just fine in ChimeraX.
comment:15 by , 4 years ago
Great! So you can add the igraph dependency to your bundle. When we need graph manipulation for other tools we will definitely look at it as an alternative to networkx.
comment:16 by , 4 years ago
The funny thing is that I don't actually have an immediate use for this domain parsing in ISOLDE itself (although it might come in handy later as part of a toolkit for building up models into maps from scratch). But Airlie McCoy and Claudia Nebot in the lab here are very keen to set up a ChimeraX front-end to Phaser TNG (the new, heavily rewritten and re-thought version of Phaser: https://journals.iucr.org/d/issues/2021/01/00/ba5309/index.html). Still very early days on that one, but this will have clear applications there as a tool to generate suitable molecular replacement search models from a predicted structure.
follow-up: 16 comment:17 by , 4 years ago
Ok. For cryoEM maps it could be useful to move domains to match the map to get an initial model. How to select the domains becomes a basic issue.
comment:18 by , 4 years ago
The way I'm doing it now is to define a new per-residue integer property (currently Residue.isolde_domain) and give all residues in a given cluster the same isolde_domain value. Makes a good handle for selecting the domains later. The general idea I'm thinking of would be to split the domains to be used for rigid docking into separate models, dock them, and then use the docked positions as targets to steer the (restrained with local distance and torsion restraints) complete chain in a simulation, allowing the connecting regions to adjust to suit.
Fast global docking into cryo-EM maps (and improved docking into difficult cryo-EM density) is one of the goals of the new Phaser, by the way.
comment:19 by , 4 years ago
That sounds like a reasonable plan. I was thinking of using the split command to make separate domain models, then doing rigid fitting with the fitmap command, then copying the coordinates from the domains back to the unsplit structure using the mcopy command (which needs to be ported from Chimera to ChimeraX). This would just be done by the user by hand.
comment:20 by , 4 years ago
I wonder if the ChimeraX model morphing code could be used to help define a more meaningful steering trajectory than simple position restraints, for the cases where domains have to rotate substantially?
follow-up: 20 comment:21 by , 4 years ago
Morphing is just driving torsion angles. Possibly you could use the rigid rotation computed by the align command and apply a torque to the whole domain to rotate it. But that may not be necessary, and before working on it I would first want to see a case with a large rotation where position restraints alone have trouble.
comment:22 by , 4 years ago
I added the ability to show AlphaFold predicted aligned error as a heat map, and to color residues by domains determined by Tristan's networkx cluster code. The heat map display allows dragging a box around any region on the heat map to color the corresponding residues. This is copied from the AlphaFold database web pages, including the green color scheme for the heat map. To fetch the PAE matrix data from the AlphaFold database there is a new "pae" option to the alphafold fetch command, for example
alphafold fetch q5vsl9 pae true
If instead you want to open the PAE JSON or pickle file from an AlphaFold run you can use the new command "alphafold pae", for instance,
alphafold pae result_model_1_multimer.pkl model #2
Here you specify which atomic structure model to associate with the PAE data. If only one atomic structure is open you can omit the model option.
Both these commands display the heat map panel which has buttons "Color PAE Domains" and "Color pLDDT". The first computes the domains and colors each a different (random) color. The second gives the standard AlphaFold blue-red per-residue pLDDT score coloring using the bfactor field of the structure.
I've attached an image of the new heat map panel.
by , 4 years ago
Attachment: | heatmap.png added |
---|
by , 4 years ago
Attachment: | q5vsl9_domains.png added |
---|
Domain coloring for AlphaFold database entry Q5VSL9 showing one core domain, and 3 disordered regions.
comment:23 by , 4 years ago
I hope to show this PAE heatmap display in an AlphaFold SBGrid webinar talk I am giving on Tuesday March 29, 2022.
comment:24 by , 4 years ago
This is most excellent, and will be really useful! I'm particularly thankful that you've added the ability to load the PAE matrix from a user-generated prediction. A couple of suggestions:
- for the heatmap display I think it would be a nice touch to add a slider to adjust the maximum of the colormap range - while the PAE values go up to 32-ish, for many purposes the most interesting PAE values are below around 4A.
- for the domain parsing, it would be a good idea to expose the resolution argument of the community clustering method in some way (perhaps as a combo box with a step size of 0.5 or so). It controls how aggressive the algorithm is when defining individual clusters, and the optimum does tend to vary a bit from case to case (and application to application) - I haven't found a really good one-size-fits-all, but useful values are generally in the range of 0.5-5.
comment:25 by , 4 years ago
Thanks for the suggestions. I've added options to the "alphafold pae" command to set the range of PAE values for the colormap and also the colors to use (options "range" and "palette"). I also added command options to do the domain coloring (option "color_domains") and to specify the graph resolution (option "clustering", default 0.5) and the cutoff (option "connect_max_pae", default 5). I'll add options to the GUI when it becomes clearer which options are most commonly useful. So far I have not found a more limited PAE color range useful.
Wondering how you want to use this. Shouldn't the code that wants to use it, for example in ISOLDE to generate restraints, fetch the file? The uniprot ids can be obtained from the new chimerax.atomic uniprot_ids() method, which is what the alphafold fetch uses to read uniprot ids from mmCIF and PDB file headers. And the chimerax.core.fetch retrieve_url() routine can fetch the file, or if you want it cached locally, the fetch_file() routine.
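Outside ChimeraX, the "fetch once, then reuse the local copy" behaviour described here can be approximated with the standard library. The sketch below is a stand-in for illustration only; it does not call chimerax.core.fetch, whose exact signatures are not given in this ticket, and fetch_cached and its opener parameter are hypothetical names.

```python
import os
import urllib.request


def fetch_cached(url, cache_dir, opener=urllib.request.urlopen):
    """Download url into cache_dir unless a copy is already there.

    Returns the path of the cached file. The opener argument exists so the
    network call can be swapped out (for example in tests).
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Use the last URL component as the local file name.
    path = os.path.join(cache_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(path):
        with opener(url) as response:
            data = response.read()
        with open(path, "wb") as handle:
            handle.write(data)
    return path
```

Inside ChimeraX itself the routines named in the comment above are the better choice, since they share the application's download cache and progress reporting.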