#2008 accepted defect

unknown residues numbered like gaps

Reported by:	olibclarke@…	Owned by:	Eric Pettersen
Priority:	normal	Milestone:
Component:	Sequence	Version:
Keywords:		Cc:	Elaine Meng
Blocked By:		Blocking:
Notify when closed:		Platform:	all
Project:	ChimeraX

Description

The following bug report has been submitted:
Platform:        Darwin-18.6.0-x86_64-i386-64bit
ChimeraX Version: 0.9 (2019-06-05)
Description
Hi,

Regarding the sequence viewer - would it be possible to represent missing (unmodelled) residues as dashes ("-") in the sequence viewer, when viewing the sequence of a single protein model? 

This would be useful for quickly seeing how much of the native sequence is not modeled, quickly identifying missing loops, and as a bonus the "alignment numbering" used in the sequence viewer would then match the real numbering of the coordinates file.

Cheers
Oli

Log:
UCSF ChimeraX version: 0.9 (2019-06-05)  
© 2016-2019 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open 1bl8 format mmCIF fromDatabase pdb

1bl8 title:  
Potassium channel (KCSA) from streptomyces lividans [more info...]  
  
Chain information for 1bl8 #1  
---  
Chain | Description  
A B C D | protein (potassium channel protein)  
  
  
Alignment identifier is 1  

> close #1

> open 5tal format mmCIF fromDatabase pdb

5tal title:  
Structure of rabbit RyR1 (Caffeine/ATP/Ca2+ dataset, class 1&2) [more info...]  
  
Chain information for 5tal #1  
---  
Chain | Description  
A F H J | Peptidyl-prolyl cis-trans isomerase FKBP1B  
B E G I | Ryanodine receptor 1  
  
Non-standard residues in 5tal #1  
---  
ATP — adenosine-5'-triphosphate  
CA — calcium ion  
CFF — caffeine (3,7-dihydro-1,3,7-trimethyl-1H-purine-2,6-dione)  
ZN — zinc ion  
  
  
Alignment identifier is 1  
Alignment identifier is 2  
Alignment identifier is 3  
Alignment identifier is 4  

OpenGL version: 4.1 ATI-2.9.26
OpenGL renderer: AMD Radeon Pro 580 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.

Attachments (1)

Screen Shot 2019-06-05 at 9.13.51 PM.png (1.2 MB) - added by olibclarke@… 7 years ago. Added by email2trac


Change History (8)

comment:1 by Eric Pettersen, 7 years ago

Cc:	Elaine Meng added
Component:	Unassigned → Sequence
Owner:	set to Eric Pettersen
Platform:	→ all
Project:	→ ChimeraX
Status:	new → accepted
Summary:	ChimeraX bug report submission → unknown residues numbered like gaps

The '?'s in the sequence for chains B/E/G in 5tal are not missing residues. Missing residues have a black outline box around them (of which there are many on that sequence). '?'s are for existing residues whose type is unknown, i.e. they have backbone (or CA) coordinates but no side chain atoms and their residue name is UNK. The bug is that for alignment numbering purposes '?' residues are being treated like gaps. I will fix that.

--Eric
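A minimal sketch of the intended numbering rule, for illustration only. It assumes the displayed sequence is a plain string in which '-' and '.' are the only gap characters and '?' stands for an existing UNK residue. The function name `alignment_numbers` is hypothetical and is not part of the ChimeraX API.

```python
from typing import List, Optional

# Assumed gap characters for this sketch; '?' is deliberately NOT included.
GAP_CHARS = {"-", "."}

def alignment_numbers(seq: str, start: int = 1) -> List[Optional[int]]:
    """Return the residue number for each column, or None for gap columns.

    '?' (an existing residue of unknown type, e.g. UNK) is counted like any
    other residue; only true gap characters are skipped.
    """
    numbers: List[Optional[int]] = []
    n = start
    for ch in seq:
        if ch in GAP_CHARS:
            numbers.append(None)  # gap column: no residue number consumed
        else:
            numbers.append(n)     # real residue, including '?'
            n += 1
    return numbers

print(alignment_numbers("AC??-G"))  # [1, 2, 3, 4, None, 5]
```

The buggy behavior described above corresponds to '?' being a member of the gap set, which would shift every number after the first UNK residue.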

in reply to: 2 ; follow-up: 2 comment:2 by olibclarke@…, 7 years ago

Hi Eric,

Thanks! Yes I understand that re the UNKs - this is a structure I built - but that’s not quite what I mean.

Many missing regions are not indicated in the sequence. E.g. see attached (another structure of the same protein with no UNKs). E4253 is indicated as being adjacent to F4540, when there are 200+ residues in between, as you can tell from the numbering. 

What I am suggesting is that even if the sequence is not present in the coordinate file header for these residues (which is I guess how the “boxed” missing regions are being extracted), their presence (inferred from the numbering) should be indicated with one dash per residue. 

This would make interpretation of the sequence view easier, and allow users to quickly see gaps in the sequence numbering deriving from regions that have not been built, missing loops etc.

Cheers
Oli
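The suggestion can be sketched as follows, assuming the modeled residues are available as (residue number, one-letter code) pairs in chain order and that numbering increases by exactly one between consecutive residues. As comment:4 explains, that assumption does not always hold, so the output is an estimate. The function name `fill_numbering_gaps` is hypothetical.

```python
from typing import List, Tuple

def fill_numbering_gaps(residues: List[Tuple[int, str]]) -> str:
    """Build a sequence string with one '-' per skipped residue number.

    Assumes sequential numbering, so a jump from n to m (m > n + 1) is
    read as m - n - 1 unmodeled residues. Steps where the number stays
    the same or decreases get no dashes.
    """
    out = []
    prev = None
    for num, code in residues:
        if prev is not None and num > prev + 1:
            out.append("-" * (num - prev - 1))  # estimated missing residues
        out.append(code)
        prev = num
    return "".join(out)

print(fill_numbering_gaps([(1, "M"), (2, "A"), (5, "K"), (6, "L")]))  # MA--KL
```

For the example in the attachment, E4253 followed directly by F4540 would be rendered with 4540 - 4253 - 1 = 286 dashes between them.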

in reply to: 3 ; follow-up: 3 comment:3 by olibclarke@…, 7 years ago

Sorry forgot the attachment - the gap is between AAQISE… and …FWGEL

Cheers
Oli

by olibclarke@…, 7 years ago

Attachment:	Screen Shot 2019-06-05 at 9.13.51 PM.png added

Added by email2trac

comment:4 by Eric Pettersen, 7 years ago

The problem is that PDB residue numbering is not guaranteed to be linearly increasing. While in probably 98% of cases you could get the number of missing residues by subtracting the sequence numbers, in cases where the numbering corresponds to a reference sequence with an insertion or deletion in the missing region, your estimate would be wrong. Also, circular permutations, with their ability to have the residue numbering actually drop or shoot up dramatically -- with no actual missing residues in between -- throw another monkey wrench into the problem.

While I am unwilling to present "guesstimated" data as gospel to the (frequently naive) user, what I am willing to do is to put a preference into the sequence viewer to depict sequences where the reference sequence info is missing in the manner you request. That way the user will be cognizant that what they are looking at is an estimate.

--Eric
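One way the proposed preference might behave is sketched below. This is an assumption-laden illustration, not the ChimeraX implementation: the function and the `show_estimated_gaps` flag are hypothetical, and the function is meant only for chains that lack reference sequence information. Residues are (number, insertion code, one-letter code) tuples, with '' meaning no insertion code. Following the caveats above, dashes are only added for plain forward jumps; steps involving insertion codes or a drop in numbering (as in a circular permutation) are left unestimated.

```python
from typing import List, Tuple

# (residue number, insertion code or '', one-letter code)
Residue = Tuple[int, str, str]

def estimated_gap_sequence(residues: List[Residue], show_estimated_gaps: bool) -> str:
    """Render a chain lacking reference sequence info, optionally with estimated gaps.

    With the (hypothetical) preference off, modeled residues are simply
    concatenated. With it on, a forward jump between two residues without
    insertion codes is shown as one '-' per skipped number; all other
    steps get no dashes because the count cannot be inferred from the
    numbers alone.
    """
    out = []
    prev = None
    for num, icode, code in residues:
        if show_estimated_gaps and prev is not None:
            prev_num, prev_icode = prev
            if not icode and not prev_icode and num > prev_num + 1:
                out.append("-" * (num - prev_num - 1))
        out.append(code)
        prev = (num, icode)
    return "".join(out)

chain = [(10, "", "A"), (12, "", "B"), (12, "A", "C"), (15, "", "D"), (1, "", "E")]
print(estimated_gap_sequence(chain, True))   # A-BCDE
print(estimated_gap_sequence(chain, False))  # ABCDE
```

Keeping the estimate behind an explicit preference matches the concern above: a user only sees inferred dashes after opting in.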

in reply to: 6 comment:5 by olibclarke@…, 7 years ago

That sounds like a reasonable solution, thanks Eric.

Cheers
Oli

in reply to: 7 ; follow-up: 5 comment:6 by olibclarke@…, 6 years ago

The other thought for this (rather than adding one dash per expected residue) would be to just add a single character or graphical mark (maybe an ellipsis, …) to indicate the presence of a discontinuity in sequence numbering. This would indicate to the user that there is a gap in sequence numbering, without making too many assumptions about how many residues are present in the gap.
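The single-marker alternative can be sketched like this, again assuming residues are (residue number, one-letter code) pairs in chain order. The function name `mark_discontinuities` and the choice of the Unicode ellipsis as the default marker are illustrative assumptions.

```python
from typing import List, Tuple

def mark_discontinuities(residues: List[Tuple[int, str]], marker: str = "\u2026") -> str:
    """Insert one marker wherever consecutive residue numbers do not differ by 1.

    No attempt is made to count missing residues; the marker only flags
    that numbering is discontinuous (a forward jump, or a drop such as a
    circular permutation).
    """
    out = []
    prev = None
    for num, code in residues:
        if prev is not None and num != prev + 1:
            out.append(marker)  # discontinuity, size deliberately not estimated
        out.append(code)
        prev = num
    return "".join(out)

print(mark_discontinuities([(1, "M"), (2, "A"), (5, "K"), (6, "L"), (1, "G")]))  # MA…KL…G
```

Because it never converts a numbering difference into a residue count, this variant sidesteps the insertion/deletion and circular-permutation problems raised in comment:4.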

follow-up: 6 comment:7 by Eric Pettersen, 5 years ago

The bug where '?' characters were being treated as gaps for sequence numbering has apparently been fixed by other improvements to the handling of UNK residues. The preference for showing gaps implied by residue numbering has still not been implemented.
