Opened 4 years ago

Closed 4 years ago

#5228 closed defect (fixed)

Blastprotein: Visual fixes

Reported by: Zach Pearson Owned by: Zach Pearson
Priority: high Milestone:
Component: UI Version:
Keywords: Cc: Eric Pettersen, Elaine Meng
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description (last modified by Zach Pearson)

Paraphrased from Elaine's email:

1) Make buttons more like other dialog: i.e. get rid of "Blast" button and instead have Help, Apply (run calculation and keep dialog up), Reset (reset to initial default parameters), Close (dismiss), OK (run calculation and dismiss). See for example Hbonds (in menu under Tools... Structure Analysis).

(1a) Title bar should say "Blast Protein" instead of "Blastprotein"

(1b) Default matrix is wrong, should be BLOSUM62 not BLOSUM45

(2) title bar should say "Blast Protein Results" instead of "BlastProtein Results" (i.e. just insert a space).

(2b) Seems like the default name for the search is None, so one can end up with multiple results panels all with the same name. Is that a problem? (e.g. for sessions?) Maybe the default name should be automatically based on the query, e.g. set to the uniprot ID or structure-chain spec or first N chars of raw sequence, or sequential (bp1, bp1, bp3 ...)

(3) Initial sort order should be highest to lowest score. I was surprised to see it was not in this order. Do you need Score to be shown in the table to sort on it? If so, add it to the suggested default columns in #6 below, next to Evalue.

(4) There is some "processing hits 0/N" line at the bottom that seems to be taking space for no reason. I've never seen it say anything other than 0/N. What processing? maybe it's non-0 at some point but should be removed when it's no longer showing any progress.

(5a) More column name tweaks.
Replace all "Num" and "Nr" with "#" (or at least "Num Polymers" -> "Nr polymers" so they're consistent)
Chain Species -> Species

(5b)
See also #6, column width. Where the widest thing in the column is the header, maybe we can make it shorter, e.g.
Resolution -> Resln

See also below, if we get rid of some of the column choices, that allows further simplification.

(6) Default columns. We definitely don't want ALL of the checkboxes turned on by default. Also is it possible to control the left-to-right-order of columns? I recommend the following:
=-=-=-=-=-=
For AlphaFold, defaults (and left-to-right order):

Name
Evalue
[Score, if we need it for sorting]
Description

Often some of the column values (Title, Species, ...) are blank but can be discerned from the Description value. The contents of Chain Sequence ID are inconsistent and perhaps that column should be nuked (for AlphaFold searches only, I mean). See ticket #5216. Sometimes the same hit is listed multiple times, #5217.

=-=-=-=-=-=
For PDB/NR, defaults in the order:

Name
Evalue
[Score, if we need it for sorting]
Description
Resolution
Ligand Symbols

Several hits are missing Structure ID even though it is obvious from Name what they should be. But I favor nuking the Structure ID column anyway.

For PDB searches, Chain Sequence ID is a UniProt identifier so I suggest renaming that column UniProt. For AlphaFold searches, above and in #5216 I suggested nuking Chain Sequence ID... if that doesn't conflict with the PDB suggestion for this column.

In Chimera, we offered fewer columns than those I see the current ChimeraX dialog. I lean toward paring down the ChimeraX choices similarly, which would allow simplifying some headers. E.g.

  • Date – structure deposition date (thus omitting Revision Date and Publish Date)
  • Authors – structure authors (thus omitting Citation Authors)
  • PubMedPubMed identifier of literature reference, if any
  • UniProtUniProt identifier, if any, for hit chain (this is currently called Chain Sequence ID)

Others I consider nukable are Replaces, Status, and Structure ID (as mentioned above, is obvious from Name).

(7) Column widths. Initial width should be based on widest item, but needs to be capped to some maximum (although the user should be able to adjust manually to wider later) and the text should wrap instead of showing an ellipsis. Typically there very long values in several different columns (Title, Description, Species, Ligand Name, etc.) so the simplest would be a single max for all, or if it's not a nightmare, perhaps a somewhat larger cap for Title and Description than for the others. Eric can probably help with that.

(8) I'm thinking maybe the results list should be above the columns for all the checkboxes. The checkboxes section could be shown/hidden with a Columns button like in Chimera (Zach probably you should try the Chimera one). Also would be nice if somebody could save their checkbox configuration as a preference, another thing that Eric can probably help with.

Attachments (5)

nrdescription 2021-09-29.png (285.4 KB ) - added by Elaine Meng 4 years ago.
see comment 12
blast4hhb.png (203.9 KB ) - added by Elaine Meng 4 years ago.
blast results screenshot showing entities = polymers
Screen Shot 2021-10-06 at 6.49.56 PM.png (6.0 KB ) - added by Zach Pearson 4 years ago.
Column spaces
Screen Shot 2021-10-06 at 6.50.16 PM.png (5.7 KB ) - added by Zach Pearson 4 years ago.
No small empty column on the left
Screen Shot 2021-10-06 at 6.24.12 PM.png (44.0 KB ) - added by Zach Pearson 4 years ago.
Using row numbers as ranks. I'm not sure if the problem of ties resulting in misordering of the numbers would be better with a "Rank" column

Download all attachments as: .zip

Change History (40)

comment:1 by Zach Pearson, 4 years ago

(It's a checkbox)
1
1a
1b
2
2b (as part of #5267)
3
4
5a
5b
6
7
8

Version 7, edited 4 years ago by Zach Pearson (previous) (next) (diff)

comment:2 by Zach Pearson, 4 years ago

Description: modified (diff)

comment:3 by Zach Pearson, 4 years ago

4: Should either update the progress bar to move as data is added to the hit sequences -- where it can make the most impact right now -- or refactor it or remove it altogether.

Last edited 4 years ago by Zach Pearson (previous) (diff)

comment:4 by Zach Pearson, 4 years ago

Cc: Eric Pettersen added

In the notes of the initializer of ItemTable in https://github.com/RBVI/ChimeraX/blob/69bcb3ed8957507a8f1c7d74a46f328cc9e0d1e5/src/bundles/ui/src/widgets/item_table.py:

https://github.com/RBVI/ChimeraX/blob/69bcb3ed8957507a8f1c7d74a46f328cc9e0d1e5/src/bundles/ui/src/widgets/item_table.py#L159-L164

We promise consumers that they can specify a boolean for whether columns should be shown or hidden by default if they're not in a dictionary of default columns that should always be shown.

https://github.com/RBVI/ChimeraX/blob/69bcb3ed8957507a8f1c7d74a46f328cc9e0d1e5/src/bundles/ui/src/widgets/item_table.py#L174-L180

But later on in the code in ItemTable.add_column, it looks like we check the settings object we pass in for what to do instead of the dictionary we asked for.

https://github.com/RBVI/ChimeraX/blob/69bcb3ed8957507a8f1c7d74a46f328cc9e0d1e5/src/bundles/ui/src/widgets/item_table.py#L291-L295

Do you think it's safe to change it, Eric?

Last edited 4 years ago by Zach Pearson (previous) (diff)

comment:5 by Eric Pettersen, 4 years ago

Cc: Elaine Meng added

The "fallback default" is for columns dynamically added to a table, i.e. ones whose existence you can't know about beforehand. That isn't the case with the Blast Results table. Your proposed code change would disable the user's ability to save their own default set of columns (well, to have that saved default have any effect).

What is the problem you are trying to solve here?

comment:6 by Zach Pearson, 4 years ago

When I give the ItemTable a dict of default columns and set the fallback to false, it seems to want to display all the columns anyway.

comment:7 by Eric Pettersen, 4 years ago

The values in the dict should be True for columns you want displayed by default and False for columns you want hidden by default. 'fallback' shouldn't come into play at all since all your columns should be listed in that dict. Like I mentioned before, 'fallback' is for columns created on the fly where you cannot know ahead of time the columns name because it comes from runtime data or from the user, etc.

Assuming all your columns _are_ listed correctly in the dict, what happens when you click on the Default button? That is supposed to show the default set of columns.

in reply to:  8 ; comment:8 by Elaine Meng, 4 years ago

Some revisions to my earlier opinions:

(1) initial sort order should be smallest E-value to highest E-value.  I thought this was always the same as highest score to lowest score, so I'd suggested that as the initial order, but apparently it is not exactly the same order.

(2) if you do improve parsing to populate Species, etc. when the info is in the Description field but those columns (Species, etc.) are blank, then later (could be after release 1.3) we could revisit which should be the initial default columns.  My initial recommendations of initial default columns were under the assumption that Species, etc. might be blank for a significant proportion of hits, and those recommendations are good enough for 1.3.

(3) I thought of one case where the E-value rank (formerly ID column now nuked) might conceivably be useful:  when somebody wants the top N hits in an alignment, it's hard to figure out if you're at row N when trying to select that many hits.  I still lean toward not wasting the column space, but I'm open to going back to including it (but renamed something other than ID, say Rank) depending on any feedback from others.

in reply to:  7 comment:9 by Zach Pearson, 4 years ago

Replying to Eric Pettersen:

The values in the dict should be True for columns you want displayed by default and False for columns you want hidden by default. 'fallback' shouldn't come into play at all since all your columns should be listed in that dict. Like I mentioned before, 'fallback' is for columns created on the fly where you cannot know ahead of time the columns name because it comes from runtime data or from the user, etc.

Assuming all your columns _are_ listed correctly in the dict, what happens when you click on the Default button? That is supposed to show the default set of columns.

I have to click the button a couple of times, see #5307, but the non-default columns get hidden. It's just that when they're first displayed, they're _all_ displayed.

comment:10 by Zach Pearson, 4 years ago

I actually have to click Default, then All, then Default again.

comment:11 by Elaine Meng, 4 years ago

For AlphaFold, the set of columns looking very good in today's build. The only additional recommendations for AlphaFold columns are to add a Rank column, include it on the far left in the standard set of columns, and rename the Chain Species column to Species.

For PDB/NR/UnirefN I'll have to re-evaluate the set of columns after seeing what it looks like after the disappeared ones come back.

Other "visual" things I can see still need doing are limiting column width and wrapping text within, and debloating the blank space of the dialogs (both input and results) as suggested by Eric.

comment:12 by Elaine Meng, 4 years ago

Another recommendation is to parse the PDB/NR results Description field into Title and Species, similar to what you did with Alphafold results. For both structure hits and sequence-only hits, as far as I can tell, the Description is basically a title but with the species in square brackets at the end. See attachment "nrdescription 2021-09-29.png" ... some of them have "synthetic construct" in the square brackets but I think that would be OK to put in the Species field.

by Elaine Meng, 4 years ago

see comment 12

in reply to:  14 comment:13 by Elaine Meng, 4 years ago

Another suggestion is to change "Evalue" to "E-Value"

in reply to:  15 ; comment:14 by Elaine Meng, 4 years ago

The set of columns I had in mind are in this documentation preview (not yet committed/pushed):  https://www.cgl.ucsf.edu/home/meng/chimerax/vdocs/user/tools/blastprotein.html#results

comment:15 by Zach Pearson, 4 years ago

I spun off 5b and comment 12 into new tickets, and I think I've fixed most of the other things in this ticket. I think it's good enough to take the 1.3 tag off, but I will continue working on fixes.

comment:16 by Zach Pearson, 4 years ago

Milestone: 1.3

in reply to:  18 comment:17 by Elaine Meng, 4 years ago

Related to #5327 (parsing Description into Title and Species), I now recommend putting Title and Species columns in that order in the factory default columns ("Standard") for PDB/NR results,  like they already are for AlphaFold results.  Resolution etc. would still be to the right of those.

Stll need to add a Rank column (1,2, ... in initial sort order by most significant E-value) on the far left for all results (AlphaFold, PDB, or NR).

To nitpick, Score and Resolution could be narrower and Title wider when the table initially appears, if that's possible... but maybe you're still working on that.

in reply to:  19 ; comment:18 by Elaine Meng, 4 years ago

(Sorry for the zillions of messages, but can't tell if you were still planning to do something or whether it fell off the radar.)

For AlphaFold results, recommend nuking  "Chain Sequence ID" because it's both inconsistent and redundant, sometimes has a subset of the Name ( without the species suffix) and sometimes the same info as Gene of that same row.

For PDB/NR results, "Chain Sequence ID" should be retitled UniProt because it's always a uniprot ID, as far as I can tell.  Correct me if I'm wrong.

comment:19 by Zach Pearson, 4 years ago

No worries, it's not off the radar, it's just not (imo) blocking 1.3 anymore. I'll be pushing fixes right up until the freeze.

in reply to:  21 comment:20 by Elaine Meng, 4 years ago

I suggest changing the context menu wording.  We try to use "selected" only to mean selected in the 3D structure (e.g. with green outlines).  Sometimes I use "chosen" for rows highlighted in tables, or if it's something we want to keep short like a menu entry, sometimes omit either term and explain in documentation that it applies only to the chosen rows.

Current wording:

Load and Align Selections
Show Selections in Sequence Viewer

Suggested:

Load Structures
Show Sequence Alignment

Other possibilities, can use if you like better:

Load Structures of Chosen Hits
Show Sequences of Chosen Hits
Show MSA with Query


comment:21 by Zach Pearson, 4 years ago

Chain sequence ID and the word "Selections" are no more.

comment:22 by Elaine Meng, 4 years ago

I'll try to summarize the column suggestions that are not yet done. Sorry it got so confusing above!

(1) From comment 8, for any kind of search add a Rank column in the Standard set on the far left. That was originally an "ID" column that I said should be nuked. But now I think it is useful, sorry for changing my mind!

(2) The remainder apply to PDB (and NR searches since they generally have PDB hits):

remove Keywords, Replaces, Status, and Structure ID columns
rename Chain Sequence ID to UniProt ID (for PDB/NR only; for AlphaFold we got rid of Chain Sequence ID)
rename Deposition Date to Date (remove Revision Date and Publish Date)
rename Structure Authors to Authors (remove Citation Authors)
rename Exp Method to Method

by Elaine Meng, 4 years ago

Attachment: blast4hhb.png added

blast results screenshot showing entities = polymers

comment:23 by Elaine Meng, 4 years ago

Sorry another revision. Also the "# Entities" column should be removed:

After at least an hour of dorking around, I see that in blast results, "# Entities" is ALWAYS equal to "# Polymers" in contradiction to the discussion at the RCSB PDB of the meaning of "Entities" in this page:
<https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner%E2%80%99s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>

I blasted PDB with their exact example protein 4hhb, which they say has 5 entities: alpha and beta subunits (2 copies each), water, phosphate, and heme. You can see in the attached screenshot, however, that in the results where it finds itself, # Entities and # Polymers are both given as 2. I cannot find any examples where these columns differ for the same hit and thus "# Entities" is completely redundant. (and possibly wrong)

comment:24 by Eric Pettersen, 4 years ago

The other slight possibility is that the code is mapping the input data to the output columns incorrectly and the #polymers in the input is being mapped to both #entities and #polymers in the output. But if that is not the case then nuking the #Entities oolumn would seem to be the way to go.

comment:25 by Eric Pettersen, 4 years ago

Looking at the code, I can see that '# Entities' comes from fetching the entry's polymer_entity_count and is therefore naturally identical to '# Polymers'. Either '# Entities' needs to fetch a different attribute, or we nuke '# Entities'. I lean towards the latter since I think "entities" is a confusing term for most users and we already make ligand info available in other columns.

comment:26 by Eric Pettersen, 4 years ago

The alternative is fetching nonpolymer_entity_count, which should be things like saccharides, but you'd want a different column name.

in reply to:  29 comment:27 by Elaine Meng, 4 years ago

Agreed, just get rid of "# Entities"

in reply to:  22 comment:28 by Zach Pearson, 4 years ago

Replying to Elaine Meng:

I'll try to summarize the column suggestions that are not yet done. Sorry it got so confusing above!

(1) From comment 8, for any kind of search add a Rank column in the Standard set on the far left. That was originally an "ID" column that I said should be nuked. But now I think it is useful, sorry for changing my mind!

(2) The remainder apply to PDB (and NR searches since they generally have PDB hits):

remove Keywords, Replaces, Status, and Structure ID columns
rename Chain Sequence ID to UniProt ID (for PDB/NR only; for AlphaFold we got rid of Chain Sequence ID)
rename Deposition Date to Date (remove Revision Date and Publish Date)
rename Structure Authors to Authors (remove Citation Authors)
rename Exp Method to Method

Should be fixed by this commit. I accomplished (1) by enabling the row numbers for ItemTable in ui/widgets/item_table.py -- I just set the visibility of the numbers to False in other bundles that used the ItemTable class. As a two-for-one, this fix also fixed a bug that has long plagued ItemTables: the five pixel empty column to the left of the real data. I'll post a before and after screenshot.

Last edited 4 years ago by Zach Pearson (previous) (diff)

by Zach Pearson, 4 years ago

Column spaces

by Zach Pearson, 4 years ago

No small empty column on the left

by Zach Pearson, 4 years ago

Using row numbers as ranks. I'm not sure if the problem of ties resulting in misordering of the numbers would be better with a "Rank" column

in reply to:  34 ; comment:29 by Elaine Meng, 4 years ago

Good point.  Do you have any ideas for a name for this column?   I thought "ID" was weird, but if you two think it's OK I could live with it.   Maybe there is something better. 

in reply to:  35 comment:30 by Elaine Meng, 4 years ago

Eric suggests "#" or "Hit #" ... either of those is fine with me.  What do you think?

comment:31 by Zach Pearson, 4 years ago

Hit number seems all right; I'll make the change!

in reply to:  37 comment:32 by Elaine Meng, 4 years ago

Thanks, I see Hit #! 
Next suggestion is that it should be on the far left in the table and included in the Standard (factory default) set of columns.

in reply to:  38 comment:33 by Elaine Meng, 4 years ago

Oh, and start at 1, not zero.  It's also wider than need be but I assume that will be worked on along with other column widths eventually.

comment:34 by Zach Pearson, 4 years ago

I've implemented both of these; they should be in the daily build tomorrow!

comment:35 by Zach Pearson, 4 years ago

Resolution: fixed
Status: assignedclosed

Looks like the last thing to do is the column widths. I spun that off into another ticket. I think for my own mental organization purposes, we should open new tickets for further changes.

Note: See TracTickets for help on using tickets.