[Chimera-users] sequence and structure numbering divergence

Fri Apr 8 13:42:25 PDT 2016

Hi Michael,
Re the Sequence issue: the structure residue numbering and the sequence-window numbering can easily diverge (as you observed), because the sequence-window numbering is simply the position in the sequence or sequence alignment. In the alignment case, a given column can contain quite different residue numbers from multiple associated structures.  Another common reason is that the structure numbering doesn’t start with 1. Gap characters in an alignment also count toward the sequence numbering but have no counterpart in the structure.
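
As a rough illustration of those two effects (a start number other than 1, plus a gap), here is a small Python sketch. It is not how Chimera computes anything internally; the aligned sequence, the start number, and the function name are all made up for the example, and it assumes the structure residues are numbered consecutively.

```python
# Hypothetical example data, not taken from any real structure.
aligned = "MK-TAYIA"   # one row of an alignment; '-' is a gap character
first_resnum = 5       # assumed: structure numbering starts at 5, not 1


def position_to_resnum(aligned_seq, start):
    """Map each 1-based alignment position to a structure residue number.

    Gap positions map to None. Assumes the structure residues are
    numbered consecutively beginning at `start`.
    """
    mapping = {}
    resnum = start
    for col, char in enumerate(aligned_seq, start=1):
        if char == "-":
            mapping[col] = None      # the gap uses up a position, not a residue
        else:
            mapping[col] = resnum
            resnum += 1
    return mapping


mapping = position_to_resnum(aligned, first_resnum)
for col, resnum in mapping.items():
    print(col, aligned[col - 1], resnum)
```

Position 1 corresponds to residue 5 here, and after the gap the offset between the two numberings changes again, which is the same kind of divergence you would see in the sequence window.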

However, a difference in the numbering doesn’t interfere with correct sequence-structure association, as reflected in the structure residue information shown at the bottom of the sequence window upon mouseover.  In the sequence-window “Numberings” menu you can hide the numbering entirely or adjust the start number. Adjusting the start number is the only change available, so it won’t help if you have an internal missing segment.  There isn’t an option to use the residue numbers from the input coordinates file, but since the associations are correct, the mismatch is mainly cosmetic. Do you want the two kinds of numbering to match for convenience in interactive work, or to make a figure?

If your sequence information contained all the residues and the numbering of the structure residues starts with 1, the two types of numbering would match up.  The Sequence window would just show a red outline box (by default) around the residues that were missing from the coordinates, as is the case for deposited PDB entries.  It sounds like you just started the Sequence tool, which would try to extract the sequence from the structure input file. There are two general ways to get the full sequence information if it’s not in the coordinates:

(1) the PDB input includes the full sequence in a SEQRES section
(2) you can simply open the full sequence from a text file in a standard format such as FASTA, or fetch the sequence if you know its UniProt ID, and then associate your structure with that sequence.  Association may happen automatically, but if it doesn’t, you can force it with “Structure… Associations” in the sequence window menu.  In this case you don’t start the Sequence tool; you just use File… Open (to open a FASTA file) or File… Fetch by ID (to fetch the sequence from UniProt), and the sequence will automatically show up in a separate window.
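
To show why having the full sequence makes the two numberings agree, here is a small Python sketch of the chain-break case. The sequence and the modelled residue ranges are invented for illustration, and the code only mimics the bookkeeping, not Chimera's actual implementation.

```python
# Hypothetical 20-residue protein; residues 11-14 are assumed unmodelled.
full_sequence = "MKTAYIAKQRQISFVKSHFS"
modelled = list(range(1, 11)) + list(range(15, 21))  # residue numbers with coordinates
modelled_set = set(modelled)

# Case A: sequence taken from the coordinates only.
# Position is just the running index, so it drifts after the break.
coords_only = list(enumerate(modelled, start=1))          # (position, residue number)
mismatched = [(pos, resnum) for pos, resnum in coords_only if pos != resnum]

# Case B: full sequence available (e.g. from SEQRES or a FASTA file).
# Position equals residue number throughout; the gap shows up as missing residues.
missing = [pos for pos in range(1, len(full_sequence) + 1) if pos not in modelled_set]

print("first mismatch (position, residue):", mismatched[0])
print("residues without coordinates:", missing)
```

In case A, position 11 in the sequence is really residue 15, which matches what Mike describes. In case B the positions line up with the residue numbers, and the missing residues are the ones that would be marked as lacking coordinates.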

Sequence window menus, associations, etc. are described in more detail here:
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/framemav.html>

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.                       
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Apr 8, 2016, at 10:49 AM, Michael Blaisse <mblaisse at berkeley.edu> wrote:

> Hello,
> 
> I'm solving a crystal structure and as I'm preparing to submit the final structure to PDB, I used the PDB_Extract tool to convert my final refined .pdb to an mmCIF file as per the request of PDB. After I do this, however, I noticed that if I open the CIF file in Chimera, it does not have any connectivity between the residues. This does not appear to be a problem in PyMOL. 
> 
> Less importantly, I also had a question about the Sequence view in Chimera. I have a break in my chain due to unmodelled residues, and I noticed that if I float my cursor over a residue in the sequence, it shows the correct residue number at the bottom of the window, however the numbering at the side of the sequence view window is wrong after the break because it does not take the chain break into account.
> 
> I appreciate any advice/help people can offer.
> 
> Thanks,
> Mike