Opened 3 years ago

Closed 2 years ago

#8285 closed enhancement (fixed)

Add option to uniprot open command to associate sequence with a specified structure

Reported by: Tom Goddard
Owned by: pett
Priority: moderate
Milestone:
Component: Sequence
Version:
Keywords:
Cc: Elaine Meng
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description (last modified by pett)

I'd like to be able to associate a uniprot sequence opened with the open command with an already open structure. Currently

open 2rh1
open P07550 fromDatabase uniprot

does not associate the uniprot sequence with the structure. I ran into this when the uniprot sequence was opened from the chain table for 2rh1 in the Log, so I was a bit surprised that it did not associate the sequence. One way to handle this would be to add an open command option so the Log table could instead issue

open P07550 fromDatabase uniprot associate #1/A

or for other structures there may be multiple chains to associate

open P07550 fromDatabase uniprot associate #2/A,B,C
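
As a rough illustration of how a caller such as the Log chain table might assemble these commands, here is a minimal Python sketch. The helper name and its parameters are hypothetical, and the "associate" keyword simply follows the syntax proposed above; it is not taken from the ChimeraX source.

```python
def uniprot_open_command(accession, chains=(), model_id=None):
    """Build an 'open ... fromDatabase uniprot' command string.

    Hypothetical helper: the 'associate' option follows the syntax
    proposed in this ticket, not a confirmed implementation.
    """
    command = f"open {accession} fromDatabase uniprot"
    if chains:
        if model_id is None:
            raise ValueError("model_id is required when chains are given")
        # Chain spec such as #2/A,B,C: model number, slash, comma-separated chain IDs
        spec = f"#{model_id}/{','.join(chains)}"
        command += f" associate {spec}"
    return command


print(uniprot_open_command("P07550", ["A"], model_id=1))
print(uniprot_open_command("P07550", ["A", "B", "C"], model_id=2))
```

Without any chains the helper returns the plain open command, so the same code path would cover sequences that are left to associate automatically.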

Change History (8)

comment:1 by Elaine Meng, 3 years ago

Oops put my last comment in the wrong ticket.  Repeating here:

Just a note that they usually associate automatically, but weird cases
like an inserted fusion will prevent that.  One could always use an
additional "sequence associate" command, e.g.

open 2rh1
open uniprot:p07550
seq assoc /A

(although if there were multiple structures and/or multiple sequence
windows, it might be necessary to be more explicit:  sequence associate
#1/A P07550:1)

Examples that associate automatically are either chain of 1LQG:

open 1lqg
open uniprot:ung_ecoli
- or -
open uniprot:ungi_bppb2

LOL I tried 2gbp first but it gave some other error, just reported as a
bug.

comment:2 by Tom Goddard, 3 years ago

I thought they usually auto-associated. So this may be a rare case and have low priority. It would be difficult for the Log chain table to use the separate "seq assoc" command because it would not know what name to use for the sequence -- so the open command option to associate would work better.

Last edited 2 years ago by pett

comment:3 by Elaine Meng, 3 years ago

I meant the user could use that command directly (not via a Log link), but yeah, only advanced users would figure that out. Your changes for #8283 solve the problem much better, since the exact boundaries of the fusion still aren't that clear if you associate the structure with both (separate) uniprot sequences. By the way, the lysozyme sequence (p00720) associates automatically with 2rh1.

Last edited 2 years ago by pett

comment:4 by Tom Goddard, 3 years ago

Yeah, a bit surprising that the adrenergic receptor does not associate automatically when it is ~400 amino acids with all but 13 aligned, I think. Actually I see a sequence mismatch in the N-terminus that does not have coordinates. It seems strange to me that it is not colored red like the other mismatches for the residues that have coordinates.

Last edited 2 years ago by pett

comment:5 by Elaine Meng, 3 years ago

(A) If I understand correctly, something like 32 residues of the adrenergic receptor are missing. The length of the black box of "missing residues" on the p07550 sequence is misleading because the >10 residues after it that are labeled as mismatches are actually missing, but ended up associated with that many residues of the lysozyme sequence. Another way of looking at it is that there are something like 150 residues in the structure (those from lysozyme) that don't match the sequence at all.

(B) The definition of mismatch is that there is a residue in the structure (e.g. that you can see in 3D) that is different from that in the sequence. It doesn't look at the SEQRES to figure out what is a mismatch.

Last edited 2 years ago by pett
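
To make the definition in (B) concrete, here is a minimal Python sketch that labels each aligned position using only the residues that have coordinates. The function name and the input layout (one entry per sequence position, None where the structure has no residue) are assumptions for illustration and do not reflect how ChimeraX stores an association.

```python
def classify_positions(sequence, structure_residues):
    """Label each aligned position as 'match', 'mismatch', or 'missing'.

    sequence: one-letter codes of the reference sequence.
    structure_residues: same length; the one-letter code of the residue
        with coordinates at that position, or None if there is none.
    SEQRES is deliberately not consulted, per the definition in (B).
    """
    if len(sequence) != len(structure_residues):
        raise ValueError("inputs must describe the same aligned positions")
    labels = []
    for expected, observed in zip(sequence, structure_residues):
        if observed is None:
            labels.append("missing")   # no coordinates, so never a mismatch
        elif observed == expected:
            labels.append("match")
        else:
            labels.append("mismatch")  # a visible residue differs from the sequence
    return labels


print(classify_positions("MGQPG", [None, None, "Q", "A", "G"]))
# ['missing', 'missing', 'match', 'mismatch', 'match']
```

Under this definition a position without coordinates can only be "missing", which is consistent with the N-terminal difference not being colored as a mismatch.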

comment:6 by goddard@…, 3 years ago

Thanks, that clarifies why the sequence does not auto-associate. It does seem a bit weird though that the extra residues from lysozyme prevent the match when almost all of the receptor sequence matches. I might expect that the percent match based on the shorter sequence would be the criterion for auto-associating. But maybe there are reasons that is bad.

Last edited 2 years ago by pett
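
The suggestion in comment 6 can be sketched as a choice of denominator when scoring a candidate association. This is only an illustration of the idea: the function, its parameters, and the numbers in the example are hypothetical, and the thread does not describe the criterion ChimeraX actually uses.

```python
def match_fraction(matches, sequence_length, chain_length, basis="shorter"):
    """Fraction of matched residues relative to the shorter or longer partner.

    Hypothetical scoring helper, not ChimeraX's association criterion.
    """
    if basis not in ("shorter", "longer"):
        raise ValueError("basis must be 'shorter' or 'longer'")
    if basis == "shorter":
        denominator = min(sequence_length, chain_length)
    else:
        denominator = max(sequence_length, chain_length)
    if denominator <= 0:
        raise ValueError("lengths must be positive")
    return matches / denominator


# Illustrative numbers only: 90 matched residues, a 100-residue sequence,
# and a 150-residue chain that carries an unrelated inserted domain.
print(match_fraction(90, 100, 150, basis="shorter"))  # 0.9
print(match_fraction(90, 100, 150, basis="longer"))   # 0.6
```

Scoring against the shorter partner keeps an inserted domain from diluting the score, but it would also rate a short fragment that matches well as highly as a full-length match, which may be one of the reasons it could be a bad criterion.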

comment:7 by pett, 3 years ago

Status: assigned → accepted

comment:8 by pett, 2 years ago

Description: modified (diff)
Resolution: fixed
Status: accepted → closed

Implemented. Revised the chain table to use it.

Code change: https://github.com/RBVI/ChimeraX/commit/6781d7938aa27d4462bf0a0cbadc1512dc200d87
