Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#8283 closed enhancement (fixed)

RFE: improve chain info for fusion proteins (list multiple UniProt IDs)

Reported by: Elaine Meng Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Sequence Version:
Keywords: Cc: pett
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

It would really help to understand fusion/insertion proteins if the chain info table shown when you initially fetch/open a structure could give more than one UniProt name or ID per chain.

An example is 2rh1, a fusion protein of a GPCR and T4 lysozyme (often used to stabilize the crystal to allow structure determination). The chain info table only gives the UniProt name for the GPCR, not the lysozyme part.

What I'm usually trying to do is find the precise boundary(ies) between the two proteins. You can't really do it by opening the UniProt sequence and associating the protein, because it will usually associate a few extra residues on the ends but call them mismatches.

The RCSB page for this structure shows both of the UniProt identifiers, and both of them are in the mmCIF file.

Attachments (1)

chain_log.png (44.6 KB ) - added by Tom Goddard 3 years ago.
Chain table for 2rh1


Change History (5)

comment:1 by Elaine Meng, 3 years ago

Actually, what I would REALLY like even better (or in addition): if you show the sequence from the structure (command: seq chain /A), it would show the boundaries on that one sequence. Maybe this info is in the file? I see the following, and I think the lysozyme residues are numbers 1002-1161 in the structure. I'm not sure about the other numbers; maybe they are in the UniProt sequence numbering.

_struct_ref_seq.pdbx_auth_seq_align_end
1 1 2RH1 A 8 ? 237 ? P07550 1 ? 230 ? 1 230
2 2 2RH1 A 238 ? 397 ? P00720 2 ? 161 ? 1002 1161
3 3 2RH1 A 398 ? 500 ? P07550 263 ? 365 ? 263 365
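For anyone who wants to pull the boundaries out of rows like these by hand, here is a minimal Python sketch. It assumes the 15 columns follow the usual mmCIF _struct_ref_seq order, so that column 9 is the UniProt accession, columns 10 and 12 are the database range, and the last two columns are the author (PDB) numbering range. Only the last column name appears above, so the other positions are an assumption, and the names Segment, parse_segments and ranges_by_accession are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one aligned segment of a chain.
Segment = namedtuple("Segment", "chain accession db_beg db_end auth_beg auth_end")

# The three _struct_ref_seq rows quoted in this comment for 2rh1.
ROWS = """\
1 1 2RH1 A 8 ? 237 ? P07550 1 ? 230 ? 1 230
2 2 2RH1 A 238 ? 397 ? P00720 2 ? 161 ? 1002 1161
3 3 2RH1 A 398 ? 500 ? P07550 263 ? 365 ? 263 365
"""


def parse_segments(text):
    """Parse whitespace-separated rows, assuming the 15-column layout above."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 15:
            continue  # skip headers or anything that does not fit the assumed layout
        segments.append(
            Segment(
                chain=fields[3],
                accession=fields[8],
                db_beg=int(fields[9]),
                db_end=int(fields[11]),
                auth_beg=int(fields[13]),
                auth_end=int(fields[14]),
            )
        )
    return segments


def ranges_by_accession(segments):
    """Group the author-numbering ranges by UniProt accession."""
    ranges = {}
    for seg in segments:
        ranges.setdefault(seg.accession, []).append((seg.auth_beg, seg.auth_end))
    return ranges


if __name__ == "__main__":
    for accession, spans in ranges_by_accession(parse_segments(ROWS)).items():
        print(accession, spans)
```

Under that column assumption, the grouping puts 1002-1161 with P00720 (the lysozyme part) and the two flanking ranges with P07550.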

comment:2 by Tom Goddard, 3 years ago

Resolution: fixed
Status: assigned → closed

Ok, the Log chain table now shows all the uniprot identifiers for a fusion protein. I've attached an image showing the appearance.

Also, I made the uniprot column of the chain table show the sequence ranges (in PDB numbering) for each uniprot id. Each of those sequence ranges is a link that will select that range of residues. That selection behavior is quite similar to clicking on the chain identifier link in the first column of the table, only the range in the uniprot column may be shorter (e.g. for part of a fusion protein) or longer. It can be longer because clicking the chain link executes a select command for the residues with coordinates, while clicking the uniprot range link selects the full range of residue numbers specified in the file metadata, which will include N- and C-terminal residues for which there are no coordinates.
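As a rough illustration of what a range link amounts to, the sketch below builds one select command per residue range using the ordinary chain/residue-range specifier form (/A:1002-1161). This is not the code behind the Log table; the function name range_select_commands is hypothetical, and the ranges are the 2rh1 values quoted in comment 1.

```python
def range_select_commands(chain_id, ranges):
    """Return one 'select' command string per (first, last) residue range.

    Sketch only: assumes a chain/residue-range specifier such as /A:1002-1161.
    """
    commands = []
    for first, last in ranges:
        if first > last:
            raise ValueError(f"bad range {first}-{last}")
        commands.append(f"select /{chain_id}:{first}-{last}")
    return commands


if __name__ == "__main__":
    # Ranges for chain A of 2rh1, taken from the rows quoted in comment 1.
    for cmd in range_select_commands("A", [(1, 230), (1002, 1161), (263, 365)]):
        print(cmd)
```

A range given this way names residue numbers only, so numbers without coordinates in the structure simply select nothing.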

I'll make a separate ticket for showing sequence viewer improvements to show the fusion domains as regions (requested in comment 1), ticket #8284.


by Tom Goddard, 3 years ago

Attachment: chain_log.png added

Chain table for 2rh1

comment:3 by Tom Goddard, 3 years ago

Clicking the uniprot name ADRB2_HUMAN in the chain table for 2rh1 shows the uniprot sequence in a sequence viewer but does not associate it with the structure. I made ticket #8285 to add an open command option that associates the sequence with the structure.

in reply to:  5 comment:4 by Elaine Meng, 3 years ago

Just a note that they usually associate automatically, but weird cases like an inserted fusion will prevent that.  One could always use an additional "sequence associate" command, e.g.

open 2rh1
open uniprot:p07550
seq assoc /A

(although if there were multiple structures and/or multiple sequence windows, it might be necessary to be more explicit:  sequence associate #1/A P07550:1)

An example that works automatically is either chain in 1LQG:

open 1lqg
open uniprot:ung_ecoli
- or -
open uniprot:ungi_bppb2

LOL I tried 2gbp first but it gave some other error, just reported as a bug.