Opened 6 years ago
Last modified 5 years ago
#2008 accepted defect
unknown residues numbered like gaps
Reported by: | Owned by: | pett | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Sequence | Version: | |
Keywords: | Cc: | Elaine Meng | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
The following bug report has been submitted:
Platform: Darwin-18.6.0-x86_64-i386-64bit
ChimeraX Version: 0.9 (2019-06-05)
Description
Hi,
Regarding the sequence viewer - would it be possible to represent missing (unmodelled) residues as dashes ("-") in the sequence viewer, when viewing the sequence of a single protein model? This would be useful for quickly seeing how much of the native sequence is not modeled, quickly identifying missing loops, and as a bonus the "alignment numbering" used in the sequence viewer would then match the real numbering of the coordinates file.
Cheers
Oli

Log:
UCSF ChimeraX version: 0.9 (2019-06-05)
© 2016-2019 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX
> open 1bl8 format mmCIF fromDatabase pdb
1bl8 title: Potassium channel (KCSA) from streptomyces lividans [more info...]
Chain information for 1bl8 #1
---
Chain | Description
A B C D | protein (potassium channel protein)
Alignment identifier is 1
> close #1
> open 5tal format mmCIF fromDatabase pdb
5tal title: Structure of rabbit RyR1 (Caffeine/ATP/Ca2+ dataset, class 1&2) [more info...]
Chain information for 5tal #1
---
Chain | Description
A F H J | Peptidyl-prolyl cis-trans isomerase FKBP1B
B E G I | Ryanodine receptor 1
Non-standard residues in 5tal #1
---
ATP — adenosine-5'-triphosphate
CA — calcium ion
CFF — caffeine (3,7-dihydro-1,3,7-trimethyl-1H-purine-2,6-dione)
ZN — zinc ion
Alignment identifier is 1
Alignment identifier is 2
Alignment identifier is 3
Alignment identifier is 4

OpenGL version: 4.1 ATI-2.9.26
OpenGL renderer: AMD Radeon Pro 580 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.
Attachments (1)
Change History (8)
comment:1 by , 6 years ago
Cc: | added |
---|---|
Component: | Unassigned → Sequence |
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → accepted |
Summary: | ChimeraX bug report submission → unknown residues numbered like gaps |
follow-up: 2 comment:2 by , 6 years ago
Hi Eric, Thanks! Yes I understand that re the UNKs - this is a structure I built - but that’s not quite what I mean. Many missing regions are not indicated in the sequence. E.g. see attached (another structure of the same protein with no UNKs). E4253 is indicated as being adjacent to F4540, when there are 200+ residues in between, as you can tell from the numbering. What I am suggesting is that even if the sequence is not present in the coordinate file header for these residues (which is I guess how the “boxed” missing regions are being extracted), their presence (inferred from the numbering) should be indicated with one dash per residue. This would make interpretation of the sequence view easier, and allow users to quickly see gaps in the sequence numbering deriving from regions that have not been built, missing loops etc. Cheers Oli
follow-up: 3 comment:3 by , 6 years ago
comment:4 by , 6 years ago
The problem is that PDB residue numbering is not guaranteed to be linearly increasing. While in probably 98% of cases you could get the number of missing residues by subtracting the sequence numbers, in cases where the numbering corresponds to a reference sequence with an insertion or deletion in the missing region, your estimate would be wrong. Also, circular permutations, with their ability to have the residue numbering actually drop or shoot up dramatically -- with no actual missing residues in between -- throw another monkey wrench into the problem.
While I am unwilling to present "guesstimated" data as gospel to the (frequently naive) user, what I am willing to do is to put a preference into the sequence viewer to depict sequences where the reference sequence info is missing in the manner you request. That way the user will be cognizant that what they are looking at is an estimate.
--Eric
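To make the estimate discussed above concrete, here is a minimal sketch of inferring gaps from residue numbering. It is not ChimeraX code: the function name `gapped_sequence` and the `(number, letter)` input format are hypothetical and chosen only for illustration. It inserts one dash per residue implied by a forward jump in numbering and inserts nothing when the numbering drops, which is exactly where the estimate becomes unreliable (insertions or deletions relative to a reference sequence, circular permutations).

```python
from typing import List, Tuple


def gapped_sequence(residues: List[Tuple[int, str]], gap_char: str = "-") -> str:
    """Build a display string with one gap character per residue that the
    numbering implies is missing.

    `residues` is a hypothetical input format: (residue number, one-letter
    code) pairs in chain order. The gap length is only an estimate.
    """
    out = []
    prev_number = None
    for number, letter in residues:
        if prev_number is not None:
            # Residues implied between two consecutive modeled residues.
            missing = number - prev_number - 1
            if missing > 0:
                out.append(gap_char * missing)
            # A drop or repeat in numbering (missing <= 0) adds no gap:
            # nothing can be inferred from it.
        out.append(letter)
        prev_number = number
    return "".join(out)


# E4253 followed directly by F4540, as in the example from comment 2.
display = gapped_sequence([(4253, "E"), (4540, "F")])
print(len(display))  # two residues plus the inferred gap characters
```

For the E4253/F4540 pair the numbering implies 4540 - 4253 - 1 = 286 missing residues, consistent with the "200+" mentioned in comment 2, but that figure is only as trustworthy as the assumption that the numbering is contiguous.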
follow-up: 5 comment:6 by , 6 years ago
The other thought for this (rather than adding one dash per expected residue) would be to just add a single character or graphical mark (maybe an ellipsis? …) to indicate the presence of a discontinuity in sequence numbering. This would indicate to the user that there is a gap in sequence numbering, without making too many assumptions about how many residues are present in the gap.
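A rough sketch of this single-marker idea, in plain Python rather than anything from ChimeraX. The name `mark_discontinuities` and the `(number, letter)` input are hypothetical. It assumes integer residue numbers without insertion codes, and it flags any break in numbering, forward or backward, without guessing how many residues lie in between.

```python
from typing import List, Tuple


def mark_discontinuities(residues: List[Tuple[int, str]], marker: str = "…") -> str:
    """Insert one marker wherever consecutive residues are not numbered
    consecutively.

    `residues` is a hypothetical input format: (residue number, one-letter
    code) pairs in chain order. Insertion codes are not handled here.
    """
    out = []
    prev_number = None
    for number, letter in residues:
        # Any step other than +1 counts as a discontinuity.
        if prev_number is not None and number != prev_number + 1:
            out.append(marker)
        out.append(letter)
        prev_number = number
    return "".join(out)


print(mark_discontinuities([(4253, "E"), (4540, "F")]))
```

Because the marker has fixed width, the display makes no claim about gap length, so it stays valid for the insertion, deletion, and circular-permutation cases raised in comment 4.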
follow-up: 6 comment:7 by , 5 years ago
The bug where '?' characters were being treated as gaps for sequence numbering has apparently been fixed by other improvements to the handling of UNK residues. The preference for showing gaps implied by residue numbering has still not been implemented.
The '?'s in the sequence for chains B/E/G in 5tal are not missing residues. Missing residues have a black outline box around them (of which there are many on that sequence). '?'s are for existing residues whose type is unknown, i.e. they have backbone (or CA) coordinates but no side chain atoms and their residue name is UNK. The bug is that for alignment numbering purposes '?' residues are being treated like gaps. I will fix that.
--Eric
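The numbering distinction described above can be illustrated with a small sketch. This is not the ChimeraX implementation: `ungapped_positions` and `GAP_CHARS` are hypothetical names, and the choice of `-` and `.` as the gap characters is an assumption. The point is only that a '?' stands for a residue that exists, so it should advance the residue count, while a true gap should not.

```python
from typing import List, Optional

# Assumed gap characters for this sketch; '?' is deliberately not among them.
GAP_CHARS = frozenset("-.")


def ungapped_positions(aligned: str) -> List[Optional[int]]:
    """Return the 1-based residue index for each column of an aligned
    sequence string, or None for gap columns.

    A '?' (existing residue of unknown type, e.g. UNK) is counted like any
    other residue.
    """
    positions: List[Optional[int]] = []
    count = 0
    for ch in aligned:
        if ch in GAP_CHARS:
            positions.append(None)
        else:
            count += 1
            positions.append(count)
    return positions


print(ungapped_positions("A?-C"))
```

Treating '?' as a gap would leave the final residue in this toy string at index 2 instead of 3, which is the kind of numbering shift the bug produced.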