Opened 4 years ago

Closed 3 years ago

#5241 closed defect (fixed)

Blastprotein: Ambiguous behavior of sequence-only hits

Reported by: Zach Pearson
Owned by: Zach Pearson
Priority: moderate
Milestone: 1.4
Component: UI
Version:
Keywords:
Cc: Elaine Meng
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Some rows are sequence-only and do not have a corresponding PDB structure. Previously the blastprotein GUI could open the page for that entry in the NCBI protein database. Now the only options in the context menu are to load and align the hit or to show it in seqalign.

Moreover, what should happen when sequence-only hits are chosen alongside hits that can be loaded?

Attachments (2)

nrsearch.py (1.2 MB ) - added by Elaine Meng 4 years ago.
nrold.cxs (34.0 KB ) - added by Elaine Meng 4 years ago.
session from ChimeraX 1.2.5 with blast nr results including a couple of sequence-only hits


Change History (16)

comment:1 by Zach Pearson, 4 years ago

Cc: Elaine Meng added

When a single hit is selected, I think it makes sense to warn the user that there is no PDB if they try to load it.

When a hit with no PDB is part of a range of items to load, I think it makes sense to make a list of all the unloadable hits in the selection and warn the user that they won't be loaded.

Additionally, another context menu item could restore the loading functionality.

in reply to:  2 ; comment:2 by Elaine Meng, 4 years ago

Somebody may have chosen several rows of hits in NR search results, some of which correspond to PDB entries and some of which are sequence-only.  This makes sense especially if they're also showing the multiple sequence alignment for that set of hits.  I think it would be super-annoying if there were a warning for each chosen hit that has no structure; instead, if somebody then chooses to load structures, it should just open the structures (if any) and ignore the chosen rows that don't have corresponding structures.
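The "open what can be opened, silently skip the rest" behavior can be sketched in a few lines. This is only an illustration: the `Hit` class and its `pdb_id` field are hypothetical stand-ins for a row of the results table, not the actual blastprotein data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """Hypothetical stand-in for one row of the results table."""
    name: str
    pdb_id: Optional[str] = None  # None marks a sequence-only hit


def split_by_structure(hits: List[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Return (loadable, sequence_only); nothing is reported for the latter."""
    loadable = [h for h in hits if h.pdb_id]
    sequence_only = [h for h in hits if not h.pdb_id]
    return loadable, sequence_only


chosen = [
    Hit("1abc_A", pdb_id="1abc"),
    Hit("gb|AAC33186.1|"),
    Hit("2xyz_B", pdb_id="2xyz"),
]
loadable, skipped = split_by_structure(chosen)
```

Only `loadable` would be passed on to whatever opens structures; `skipped` is kept here merely so a caller could log it if that ever turned out to be useful.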

That is what Chimera does.  I attached a Chimera session with NR search results in case you want to try messing with that.  It's currently sorted by resolution, so you have to scroll down a bit to get to the sequence-only hits.  Chimera has buttons to show the alignment or load hits, rather than doing it via context menu.  You might consider having similar buttons in the ChimeraX Blast Results dialog to make loading multiples more discoverable.

Other than ignoring them, the other choice for sequence-only hits is to open their web pages. I'm thinking that double-clicking a sequence-only hit should do that. 

This session was made in a Chimera 1.16 daily build from 8/12 so you may need a daily build at least that new to open it, sorry for any inconvenience.

by Elaine Meng, 4 years ago

Attachment: nrsearch.py added

in reply to:  4 comment:3 by Elaine Meng, 4 years ago

The previous ChimeraX Blast Protein dialog opened the NCBI protein webpage when you clicked the Name of a sequence-only hit, so you may want to look at the older code as to how to do that.

comment:4 by Elaine Meng, 4 years ago

On the other hand, I can't get that to work! I'll attach a ChimeraX 1.2.5 session with blast results that has a couple of sequence-only hits... although I know it worked sometime in the past, clicking the sequence name link in 1.2.5 does not do anything. Cannot test Blast Protein in any earlier versions of ChimeraX any more because of RCSB URL changes.

So opening some NCBI protein web page upon double-click is a "would be nice" but prioritize as you like, nonurgent.

I see that Chimera grays out "Load Structure" when none of the chosen rows has a corresponding structure. Eric tells me that it should be possible to gray out either an explicit button or your context-menu entry when none of the chosen rows has a corresponding structure. When at least one chosen row has a structure, my vote is to simply open it and ignore any other non-applicable choices without sending warnings.
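A minimal sketch of the gray-out rule, assuming each chosen row exposes either a PDB identifier or `None`. The function name is hypothetical and not part of ChimeraX.

```python
from typing import Iterable, Optional


def load_action_enabled(pdb_ids: Iterable[Optional[str]]) -> bool:
    """True when at least one chosen row has a structure to load."""
    return any(pdb_ids)
```

If the button or context-menu entry is a Qt action, the result could be handed to `QAction.setEnabled` each time the selection changes.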

by Elaine Meng, 4 years ago

Attachment: nrold.cxs added

session from ChimeraX 1.2.5 with blast nr results including a couple of sequence-only hits

comment:5 by Zach Pearson, 4 years ago

I can certainly see what you meant by sequence-only hits not being shown. In the session you posted there are clearly entries such as gb|AAC33186.1| that need to be included and parsed.

I also can't get webpages working with your session, though. But I believe you when you say they used to be there, so I'll work on a solution to get them displayed.

comment:6 by Zach Pearson, 4 years ago

Perhaps for those it's appropriate to use the accession code instead of parsing the ID.

For example

  {
    "num": 17,
    "description": [
      {
        "id": "ref|NP_001104592.1|",
        "accession": "NP_001104592",
        "title": "proprotein convertase subtilisin/kexin type 9 precursor [Pan troglodytes]",
        "taxid": 9598
      },
      {
        "id": "sp|A8T644.1|",
        "accession": "A8T644",
        "title": "RecName: Full=Proprotein convertase subtilisin/kexin type 9; AltName: Full=Proprotein convertase 9; Short=PC9; AltName: Full=Subtilisin/kexin-like protease PC9; Flags: Precursor [Pan troglodytes]     ",
        "taxid": 9598
      },
      {
        "id": "gb|ABV59217.1|",
        "accession": "ABV59217",
        "title": "convertase subtilisin/kexin type 9 preproprotein [Pan troglodytes]",
        "taxid": 9598
      }
    ],
    "len": 692,
    "hsps": [
      {
        "num": 1,
        "bit_score": 192.586,
        "score": 488,
        "evalue": 1.97659e-55,
        "identity": 92,
        "positive": 92,
        "query_from": 1,
        "query_to": 92,
        "hit_from": 61,
        "hit_to": 152,
        "align_len": 92,
        "gaps": 0,
        "qseq": "TATFHRCAKDPWRLPGTYVVVLKEETHLSQSERTARRLQAQAARRGYLTKILHVFHGLLPGFLVKMSGDLLELALKLPHVDYIEEDSSVFAQ",
        "hseq": "TATFHRCAKDPWRLPGTYVVVLKEETHLSQSERTARRLQAQAARRGYLTKILHVFHGLLPGFLVKMSGDLLELALKLPHVDYIEEDSSVFAQ",
        "midline": "TATFHRCAKDPWRLPGTYVVVLKEETHLSQSERTARRLQAQAARRGYLTKILHVFHGLLPGFLVKMSGDLLELALKLPHVDYIEEDSSVFAQ"
      }
    ]
  },
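
A rough sketch of using the accession instead of parsing the ID might look like the following. It assumes a hit shaped like the JSON above and assumes the NCBI Protein page can be reached at `https://www.ncbi.nlm.nih.gov/protein/<accession>`; the function names are hypothetical and this is not the code from the eventual commit.

```python
from typing import Optional

# Assumed URL pattern for NCBI Protein entry pages.
NCBI_PROTEIN_URL = "https://www.ncbi.nlm.nih.gov/protein/{}"


def accession_for(hit: dict) -> Optional[str]:
    """Prefer the 'accession' field; fall back to splitting the 'id' field."""
    for desc in hit.get("description", []):
        if desc.get("accession"):
            return desc["accession"]
        # Fallback: "gb|ABV59217.1|" -> "ABV59217.1"
        parts = [p for p in desc.get("id", "").split("|") if p]
        if len(parts) >= 2:
            return parts[1]
    return None


def ncbi_url(hit: dict) -> Optional[str]:
    """Build the entry URL, or return None when no accession can be found."""
    acc = accession_for(hit)
    return NCBI_PROTEIN_URL.format(acc) if acc else None


hit = {
    "num": 17,
    "description": [
        {"id": "ref|NP_001104592.1|", "accession": "NP_001104592"},
    ],
}
url = ncbi_url(hit)
```

The fallback branch only matters for hits that lack an `accession` field; it keeps the version suffix because that is what the `id` string contains.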

in reply to:  9 comment:7 by Elaine Meng, 4 years ago

As you can tell, I haven't tested that aspect very much lately.  I think it used to work more uniformly but perhaps the sequence databases changed their URLs.  One that does still work in the nrblast-125.cxs session (in ChimeraX 1.2.5 and probably also 1.2) is hit #41, click the sequence name link and it opens a web page.  I can see that that does not work for many others, however!

Showing the web pages for sequence-only hits can be a useful feature but is not needed for the 1.3 release.  Getting the GUI improvements done, along with the other tickets considered top priority by Tom G, is more important for 1.3.

Elaine

comment:8 by Zach Pearson, 4 years ago

Milestone: 1.3 → 1.4

The scaffolding is there, but if we don't consider this important let's remilestone it to 1.4.

comment:9 by Zach Pearson, 4 years ago

Resolution: fixed
Status: assigned → closed

I added a context menu item, "Load {AlphaFold,NRDB,PDB,UniProt} webpage", that will open the relevant database's page on a user's selections. Code changes are in this commit.

in reply to:  12 comment:10 by Elaine Meng, 4 years ago

Minor: Currently the context menu choice is "Open UniRef webpage" for the uniref hits.  Should instead be "Open UniProt Webpage"  with UniProt instead of Uniref, and Webpage capitalized since the other entries have title capitalization.

in reply to:  14 ; comment:12 by Elaine Meng, 4 years ago

Now I'm thinking it would just be better to make the context menu entry the same for all databases, say "Open Database Webpage" or "Show Database Entry" or something else semi-generic like that, rather than trying to be fancy.  

One reason is that at least currently, for PDB sequences it just shows the NCBI Protein webpage (not the Protein Data Bank webpage), and I suspect it does the same thing for an NR search, although I still haven't been able to run an NR search to test that specifically.
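
One way to picture the semi-generic entry is a single label backed by a per-database URL lookup. The database keys and URL patterns below are assumptions for illustration and should be checked against each site; none of this is taken from the ChimeraX source.

```python
from typing import Optional

# Assumed URL patterns; verify against each database before relying on them.
URL_TEMPLATES = {
    "pdb": "https://www.rcsb.org/structure/{}",
    "uniprot": "https://www.uniprot.org/uniprotkb/{}/entry",
    "alphafold": "https://alphafold.ebi.ac.uk/entry/{}",
    "nr": "https://www.ncbi.nlm.nih.gov/protein/{}",
}

MENU_LABEL = "Open Database Webpage"  # one label for every database


def entry_url(database: str, identifier: str) -> Optional[str]:
    """Return the entry page for a hit, or None for an unknown database."""
    template = URL_TEMPLATES.get(database.lower())
    return template.format(identifier) if template else None
```

With a table like this, the menu text never has to change, and pointing PDB hits at the Protein Data Bank rather than NCBI becomes a one-line edit.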

comment:13 by Elaine Meng, 4 years ago

Resolution: fixed
Status: closed → reopened

comment:14 by Zach Pearson, 3 years ago

Resolution: fixed
Status: reopened → closed