Opened 3 weeks ago
Last modified 3 weeks ago
#19165 assigned enhancement
Improving usability of deep mutational scan visualization tools
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Analysis | Version: | |
| Keywords: | | Cc: | Willow.Coyote-Maestas@… |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Willow and I discussed for an hour today a dozen aspects of how to visualize and explore deep mutational scan data, looking at his opioid receptor data with 12 scores per mutation (11 drugs plus surface expression).
We looked at the trace plots I tried in #19444 and at heatmaps, discussed DNA codons used in DMS experiments and possible effects of the codon choices, and Willow showed me the features of ChimeraX he uses most often.
We agreed that the most useful next step for the ChimeraX DMS tools would be to simplify some current capabilities that are heavily used. Here are the user interfaces Willow mentioned that would be valuable to simplify.
1) Render by Attribute to show residue scores. It involves choosing Residues to even see the DMS score names, then menu actions to choose the scores. Ideally one would just click on score names to quickly switch between, for instance, different opioid receptor drug scores. The ability to adjust the coloring ranges is valuable.
2) A few of the menu entries on the scatter plots are heavily used and most others are very rarely used. "Color mutations for R181" (2nd entry in the menu) is the most used, followed by "Select" (selects structure residue R181), and "Clear plot colors" followed by "Color synonymous mutations blue".
3) The command to define new scores, including filtering to a subset of residues, always requires referring to the documentation, e.g. 'mut def dox_only from dox ranges "(dox >= 0.4 or dox <= -0.5) and -0.3 <= mtx <= 0.3 and -0.8 <= sn38 <= 0.8"', and requires lots of careful typing, often with very long score names (e.g. effect_buprenorphine). The score is then used with Render by Attribute to color structures.
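To make the filtering in item 3 concrete, here is a minimal Python sketch of what the quoted ranges expression selects, assuming it is evaluated per mutation as an ordinary boolean condition. This is not the ChimeraX implementation; the mutation names, score values, and the helpers `in_ranges` and `define_subset_score` are hypothetical and only for illustration.

```python
# Hypothetical per-mutation scores; names and values are made up for illustration.
mutations = {
    "R181A": {"dox": 0.55, "mtx": 0.10, "sn38": -0.20},
    "R181K": {"dox": 0.45, "mtx": 0.50, "sn38": 0.00},
    "L42P": {"dox": -0.70, "mtx": -0.25, "sn38": 0.75},
    "G12D": {"dox": 0.10, "mtx": 0.00, "sn38": 0.10},
}


def in_ranges(s):
    """Same condition as the ranges expression quoted in item 3."""
    return ((s["dox"] >= 0.4 or s["dox"] <= -0.5)
            and -0.3 <= s["mtx"] <= 0.3
            and -0.8 <= s["sn38"] <= 0.8)


def define_subset_score(mutations, from_score, keep):
    """Return {mutation: from_score value} for the mutations passing keep()."""
    return {name: s[from_score] for name, s in mutations.items() if keep(s)}


# Assumed analogue of "mut def dox_only from dox ranges ...".
dox_only = define_subset_score(mutations, "dox", in_ranges)
print(dox_only)
```

A GUI for this could build the same condition from a few range sliders, one per score, instead of requiring the expression to be typed.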
We discussed making more tailored GUIs for these most frequently used tasks.
We also talked about heatmaps. They can show all residues on the horizontal axis and the 20 amino acid mutations on the vertical axis. Or they can show assay score names on the vertical axis, with each cell colored by the mean score over all mutations of that residue. Ordering the assay scores on the vertical axis using hierarchical clustering, so that the most similar heatmap rows are grouped together, is helpful. Clustering the residue columns can also be useful, to show residues that have a similar vector of scores across all drugs (assay scores). Heatmap display and interactive capabilities are lower priority than optimizing current capabilities.
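As a rough illustration of ordering heatmap rows by hierarchical clustering, here is a small pure-Python sketch using average-linkage agglomerative clustering with Euclidean distance. The assay names and mean scores are hypothetical, and the choice of linkage and distance is an assumption; a real implementation would more likely use a library such as SciPy's `scipy.cluster.hierarchy`.

```python
from math import dist

# Hypothetical mean score per residue (columns) for each assay (rows).
rows = {
    "drug_a": [0.9, 0.1, -0.5, 0.3],
    "drug_b": [-0.8, 0.0, 0.6, -0.2],
    "drug_c": [0.8, 0.2, -0.4, 0.2],
    "expression": [-0.5, 0.3, 0.3, -0.4],
}


def cluster_order(rows):
    """Order row names by average-linkage agglomerative clustering."""
    # Each cluster is a list of row names in display order.
    clusters = [[name] for name in rows]

    def linkage(a, b):
        # Average Euclidean distance over all pairs of members.
        return sum(dist(rows[x], rows[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


order = cluster_order(rows)
print(order)
```

The same function applied to the transposed table would order the residue columns instead of the assay rows.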
Change History (2)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
A special-purpose stripped down version of Render By Attribute could be the way to go, but one small improvement to the current tool that would help a bit is to make the kinds of attributes you looked at "sticky", so if the last time you used the tool you were looking at residue attributes, it would start the menu off set to residue attributes.
Here are Willow's notes about today's discussion:
From: "Coyote-Maestas, Willow"
Subject: Summary of meeting ideas from my perspective
Date: October 15, 2025 at 1:07:21 PM PDT
To: Tom Goddard
Hey Tom,
Lovely chatting as always. So fun! Here are the notes I wrote down. I didn't necessarily put priorities on here for better or worse. In my opinion, one thing that could be nice is an interactive workflow going from scatterplots to the new line plots you've made for looking at scores, then mapping those scores onto the structure, and perhaps going back and forth among those. At least I recall getting really hyped about that idea. Notes below.
Ideas from discussion:
Improving usability
(1) Scatterplot:
Making the more useful features of the right-click menu easier to access. To me, these are ‘color mutations for residue’, ‘color synonymous blue’ (these often get overwritten when coloring mutations for a residue, so I have to uncolor and recolor them), and ‘select’, which lets me identify where the residue is in the structure.
(2) Defining subsets of mutations based on multiple scores. This is the mutationscores define command. It’s pretty much the most frequently used command but determining subsets of mutations to calculate scores is extremely cumbersome. It involves two steps currently where you define the subset of mutations then calculate the score. For defining the subsets of mutations perhaps we can drag a box on the scatterplot. Having the written command could still be useful if one wanted to define sets across more than 2 phenotypes at once.
(3) Scoring: Currently this is done using the subsets of mutations or the total dataset for a given score by command line. Having a more interactive or straightforward way of determining scores would be useful.
(4) Coloring the structure. Currently we use the render attribute panel. This works great but is confusing for many people because the default is ‘atoms’ and not ‘residues’, so oftentimes they don’t know what is associated. Further, one challenge is that it’s not quick to swap mappings if we want to directly compare two different sets of scores. Perhaps a similar function with fewer features would be helpful.
(5) Normalizing across scores. Technically, this feature is in ChimeraX but I have no idea how to use it. From discussing with you I am also not sure it's the best implementation of this. Instead, perhaps fitting a loess curve to the entire data or just the synonymous variants, then calculating the minimal (Euclidean) distance from that line.
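To make the loess idea in (5) concrete, here is a rough pure-Python sketch: a local linear fit with tricube weights through the synonymous variants, sampled on a grid, followed by the smallest Euclidean distance from a mutation's pair of scores to the sampled curve. The function names, the `frac` and `samples` parameters, and the data are all hypothetical, and this is only one possible reading of the proposal, not an existing ChimeraX feature.

```python
from math import dist


def loess_point(x0, xs, ys, frac=0.75):
    """Locally weighted linear fit evaluated at x0 (tricube weights)."""
    k = min(len(xs), max(2, int(round(frac * len(xs)))))
    # Bandwidth: slightly more than the distance to the k-th nearest x value.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.001 or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def fit_curve(xs, ys, samples=201, frac=0.75):
    """Sample the loess curve on an even grid spanning the x values."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return [(g, loess_point(g, xs, ys, frac)) for g in grid]


def distance_from_curve(point, curve):
    """Smallest Euclidean distance from point to the sampled curve points."""
    return min(dist(point, c) for c in curve)


# Hypothetical synonymous variants: score in assay 1 (x) and assay 2 (y).
syn_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
syn_y = [-1.0, -0.5, 0.0, 0.5, 1.0]
curve = fit_curve(syn_x, syn_y)

# Normalized value for one hypothetical mutation with scores (0.2, 0.6).
d = distance_from_curve((0.2, 0.6), curve)
print(round(d, 3))
```

The distance here is unsigned; a signed version (above or below the curve) would need an extra convention that the notes do not specify.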
New functionalities:
Line/barplot: (1) Having the ability to directly map scores from line plot onto structures.
(2) Being able to normalize scores interactively in this plot
(3) Plotting scores for subsets of mutations vs all mutations.
(4) Zooming in and subsetting.
(5) Calculating alternative summary statistics such as mean, median, etc.
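For item (5), a minimal sketch of per-residue summary statistics, assuming mutation names of the form wild-type letter, position, mutant letter (e.g. R181A). The scores and the helper `per_residue` are hypothetical; any function that reduces a list of numbers to one value can be passed as the statistic.

```python
import re
from collections import defaultdict
from statistics import mean, median

# Hypothetical mutation scores keyed by mutation name.
scores = {"R181A": -0.9, "R181K": -0.1, "R181E": -0.5, "L42P": 0.4, "L42V": 0.0}


def per_residue(scores, stat):
    """Group mutation scores by residue position and apply stat to each group."""
    groups = defaultdict(list)
    for name, value in scores.items():
        position = int(re.fullmatch(r"[A-Z](\d+)[A-Z*]", name).group(1))
        groups[position].append(value)
    return {pos: stat(vals) for pos, vals in sorted(groups.items())}


means = per_residue(scores, mean)
medians = per_residue(scores, median)
print(means, medians)
```

Passing the subset of mutations from a defined score instead of all mutations would give the "subsets vs all mutations" comparison in item (3).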
Heatmap:
(1) Mapping DMS scores as normal.
(2) Calculating summary statistics.
(3) Hierarchical clustering then coloring residues based on hierarchical clusters across positions. This could be used for determining which sets of residues behave similarly.