Opened 3 years ago
Closed 2 years ago
#8285 closed enhancement (fixed)
Add option to uniprot open command to associate sequence with a specified structure
Reported by: | Tom Goddard | Owned by: | pett |
---|---|---|---|
Priority: | moderate | Milestone: | |
Component: | Sequence | Version: | |
Keywords: | | Cc: | Elaine Meng |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
I'd like to be able to associate a uniprot sequence opened with the open command with an already open structure. Currently
open 2rh1
open P07550 fromDatabase uniprot
does not associate the uniprot sequence with the structure. I encountered this when the uniprot sequence was opened from the chain table for 2rh1 in the Log, so I was a bit surprised that it did not associate the sequence. One way to handle this would be to add an open command option so the Log table could instead issue
open P07550 fromDatabase uniprot associate #1/A
or, for other structures where there may be multiple chains to associate,
open P07550 fromDatabase uniprot associate #2/A,B,C
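As a rough sketch of how the Log chain table could assemble such a command, the Python helper below formats the proposed syntax from an accession, a model specifier, and a list of chain IDs. The function name `build_open_command` and its parameters are hypothetical and are not taken from the ChimeraX code base; the `associate` keyword is the one proposed in this ticket, and the implemented option may differ.

```python
def build_open_command(accession, model_spec, chain_ids):
    """Format the 'open ... associate' command text proposed in this ticket.

    accession  -- UniProt accession, e.g. "P07550"
    model_spec -- model specifier, e.g. "#1"
    chain_ids  -- chain IDs to associate, e.g. ["A", "B", "C"]

    Hypothetical helper for illustration only; not ChimeraX API.
    """
    if not chain_ids:
        raise ValueError("at least one chain ID is required")
    # Chains are joined with commas after the model specifier: #2/A,B,C
    target = f"{model_spec}/{','.join(chain_ids)}"
    return f"open {accession} fromDatabase uniprot associate {target}"
```

Calling it with `"P07550"`, `"#2"`, and `["A", "B", "C"]` yields the second command shown above.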
Change History (8)
follow-up: 2 comment:2 by , 3 years ago
I thought they usually auto-associated. So this may be a rare case and have low priority. It would be difficult for the Log chain table to use the separate "seq assoc" command because it would not know what name to use for the sequence -- so the open command option to associate would work better.
follow-up: 3 comment:3 by , 3 years ago
I meant the user could use that command directly (not via Log link), but yeah, only advanced users would figure that out. Your changes for #8283 solve the problem much better, since the exact boundaries of the fusion still aren't that clear if you associate the structure with both (separate) uniprot sequences. By the way, the lysozyme sequence (p00720) associates automatically with 2rh1.
follow-up: 4 comment:4 by , 3 years ago
Yeah, a bit surprising that adrenergic receptor does not associate automatically when it is ~400 amino acids with all but 13 aligned I think. Actually I see a sequence mismatch in the N-terminus that does not have coordinates -- strange to me that it is not colored red like the other mismatches for the residues that have coordinates.
follow-up: 5 comment:5 by , 3 years ago
(A) If I understand correctly, something like 32 residues of the adrenergic receptor are missing. The length of the black box of "missing residues" on the p07550 sequence is misleading because more than 10 residues marked as mismatches after it are actually missing but ended up associated with that many residues of the lysozyme sequence. Another way of looking at it is that there are something like 150 residues in the structure (those from lysozyme) that don't match the sequence at all.
(B) The definition of mismatch is that there is a residue in the structure (e.g. that you can see in 3D) that is different from that in the sequence. It doesn't look at the SEQRES to figure out what is a mismatch.
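The definition in (B) can be illustrated with a small Python sketch. It assumes the sequence and structure are already aligned position by position, with `None` marking positions that have no coordinates; the function name `classify_positions` and this data layout are hypothetical and do not reflect how ChimeraX stores associations.

```python
def classify_positions(sequence, structure_residues):
    """Label each aligned position as 'match', 'mismatch', or 'missing'.

    sequence           -- one-letter codes of the reference sequence
    structure_residues -- same length; one-letter code of the residue that
                          has coordinates, or None if there is none

    Hypothetical illustration only; SEQRES is deliberately not consulted.
    """
    if len(sequence) != len(structure_residues):
        raise ValueError("inputs must be aligned to the same length")
    labels = []
    for seq_char, struct_char in zip(sequence, structure_residues):
        if struct_char is None:
            # No coordinates: missing, never a mismatch
            labels.append("missing")
        elif struct_char == seq_char:
            labels.append("match")
        else:
            # A residue visible in 3D that differs from the sequence
            labels.append("mismatch")
    return labels
```

Under this definition, a position without coordinates is never reported as a mismatch, whatever SEQRES says about it.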
follow-up: 6 comment:6 by , 3 years ago
Thanks, that clarifies why the sequence is not auto-associating. It does seem a bit weird, though, that the extra residues from lysozyme prevent the match when almost all of the receptor sequence matches. I might expect that the percent match based on the shorter sequence would be the criterion for auto-associating. But maybe there are reasons that is bad.
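To make the suggestion concrete, here is a minimal Python sketch comparing a match fraction normalized by the shorter length with one normalized by the longer length. It does not reproduce ChimeraX's actual association criterion, which this thread does not spell out; the function name `match_fractions` and its arguments are hypothetical.

```python
def match_fractions(num_matched, sequence_length, structure_length):
    """Return (fraction of shorter, fraction of longer) for a match count.

    num_matched      -- number of aligned positions that agree
    sequence_length  -- residues in the reference sequence
    structure_length -- residues in the structure chain

    Hypothetical illustration; not the criterion ChimeraX uses.
    """
    shorter = min(sequence_length, structure_length)
    longer = max(sequence_length, structure_length)
    if shorter <= 0:
        raise ValueError("lengths must be positive")
    if not 0 <= num_matched <= shorter:
        raise ValueError("num_matched must be between 0 and the shorter length")
    return num_matched / shorter, num_matched / longer
```

With made-up toy values of 90 matches, a 100-residue sequence, and a 150-residue chain, the two fractions differ substantially, which is the effect that extra fused residues would have on a criterion normalized by the longer length.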
comment:7 by , 3 years ago
Status: | assigned → accepted |
---|
comment:8 by , 2 years ago
Description: | modified (diff) |
---|---|
Resolution: | → fixed |
Status: | accepted → closed |
Implemented. Revised the chain table to use it.
Code change: https://github.com/RBVI/ChimeraX/commit/6781d7938aa27d4462bf0a0cbadc1512dc200d87