Opened 18 months ago
Last modified 18 months ago
#15100 assigned enhancement
diffplot: Cluster atomic models by projecting atom positions with UMAP
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Comparison | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Klim Verba, Katrina Black and Tom Goddard discussed on April 9, 2024 how to visualize differences among hundreds of B-Raf kinase structures from the PDB and AlphaFold.
Tom suggested using UMAP (we have R01 funding for exploring its uses): for each of the aligned structures, we feed in the C-alpha atom positions of residues near the binding site as a single vector (length 3 times the number of residues) and project the vectors to 2 dimensions. If distinct clusters are observed, we can try morphing between them to see more clearly how they differ.
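A minimal sketch of the vector construction described above, not the diffplot implementation. The function name `feature_vector`, the toy coordinates and the residue numbers are all hypothetical, and the structures are assumed to be already aligned. Each structure contributes one row of length 3 times the number of residues.

```python
# Hypothetical helper: flatten C-alpha coordinates into one vector per structure.
def feature_vector(ca_coords, residue_numbers):
    """Return [x1, y1, z1, x2, y2, z2, ...] for the requested residues.

    ca_coords maps residue number -> (x, y, z) of its C-alpha atom.
    """
    vec = []
    for r in residue_numbers:
        vec.extend(ca_coords[r])
    return vec

# Toy, made-up data standing in for aligned structures.
structures = {
    "model_a": {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0)},
    "model_b": {10: (0.1, 0.0, 0.0), 11: (3.8, 0.5, 0.0)},
}
residues = [10, 11]

# One row per structure, each of length 3 * len(residues).
names = sorted(structures)
matrix = [feature_vector(structures[name], residues) for name in names]
```

The resulting matrix is the kind of input a UMAP implementation expects; with the third-party umap-learn package, for example, it could be projected with `umap.UMAP(n_components=2).fit_transform(matrix)`, although real data would need many more structures than this toy example.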
Change History (2)
comment:1 by , 18 months ago
comment:2 by , 18 months ago
Klim suggested adding the "drop" option. The default is "drop structures" which means structures that don't have the specified residues get dropped. The "drop residues" option instead drops any of the residues that are not found in all the structures.
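The difference between the two drop modes can be sketched in a few lines. This is only an illustration of the behavior described above, not the diffplot code; the function `apply_drop`, its `mode` values and the toy data are hypothetical.

```python
# Hypothetical illustration of the two "drop" modes.
def apply_drop(structures, residues, mode="structures"):
    """Return (kept structure names, kept residues).

    structures maps a structure name -> set of residue numbers it contains.
    mode "structures": drop structures missing any specified residue.
    mode "residues": drop residues not found in every structure.
    """
    wanted = list(residues)
    if mode == "structures":
        kept = [name for name, present in structures.items()
                if all(r in present for r in wanted)]
        return kept, wanted
    if mode == "residues":
        common = [r for r in wanted
                  if all(r in present for present in structures.values())]
        return list(structures), common
    raise ValueError("mode must be 'structures' or 'residues'")

# Toy data: structure "b" is missing residue 2 (e.g. a disordered loop residue).
toy = {"a": {1, 2, 3}, "b": {1, 3}, "c": {1, 2, 3}}

by_structure = apply_drop(toy, [1, 2, 3], mode="structures")
by_residue = apply_drop(toy, [1, 2, 3], mode="residues")
```

With this toy input, "drop structures" keeps all three residues but loses structure "b", while "drop residues" keeps all three structures and trims the selection to residues 1 and 3.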
From: Verba, Kliment
Subject: Re: Molecular breathing 2.0
Date: May 2, 2024 at 1:26:05 PM PDT
To: Tom Goddard, Black, Katrina
Hi Tom,
I have been playing around with this tool, it's incredibly useful!
One thing that comes to mind: is there a way to have an option (like a checkbox) to select a mode where the residues that are missing from some structures get dropped from the selection (i.e., the residue selection is automatically trimmed to include all the structures), rather than dropping the structures where those residues are missing? Effectively, this would do automatically what you describe in your tutorial, where you redo the selection with a smaller set so as to include more structures. Thinking back to old film cameras, it's like having two modes: "structure priority" vs "residue priority".
This would allow much more flexibility in the regions that one selects for structural clustering, as right now one has to be very careful to make sure the structures one cares about have all the residues in the selection. As you know, often one or two residues in a loop will be missing, which may be completely irrelevant for structural analysis, but their absence will drop those structures from the analysis.
Thanks!
Klim
I implemented a command "diffplot" that does this. Here is the current syntax
An example showing how it is used is here