Opened 6 years ago

Closed 6 years ago

#1931 closed defect (fixed)

Atom spec parsing consumes torsion command angle argument

Reported by: goddard@… Owned by: Conrad Huang
Priority: normal Milestone: 1.0
Component: Command Line Version:
Keywords: Cc: pett, Elaine Meng
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Darwin-18.5.0-x86_64-i386-64bit
ChimeraX Version: 0.9 (2019-05-09)
Description
open 5c1m
torsion /A:84@CD1,CG,CB,CA -109
Must specify exactly 4 atoms for 'torsion' command; you specified 3 
torsion /A:84@CD1,CG,CB,CA -109.0

Atom spec parsing gobbles the angle argument in the torsion command if the angle is a negative integer and results in an error where only 3 atoms of the 4 specified in the torsion command are recognized.  Works correctly if the angle has a decimal point.

Atom spec parsing should not consume the extra argument.




Log:
UCSF ChimeraX version: 0.9 (2019-05-09)  
© 2016-2019 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open 5c1m format mmCIF fromDatabase pdb

Summary of feedback from opening 5c1m fetched from pdb  
---  
warning | Cannot find consistent set of bond orders for ring system containing
atom CAV in residue 4VO /A:401  
  
5c1m title:  
Crystal structure of active mu-opioid receptor bound to the agonist BU72 [more
info...]  
  
Chain information for 5c1m #1  
---  
Chain | Description  
A | Mu-type opioid receptor  
B | Nanobody 39  
  
Non-standard residues in 5c1m #1  
---  
4VO —
(2S,3S,3aR,5aR,6R,11bR,11cS)-3a-methoxy-3,14-dimethyl-2-phenyl-2,3,3a,6,7,11c-hexahydro-1H-6,11b-(epiminoethano)-3,5a-methanonaphtho[2,1-g]indol-10-ol
(BU72)  
CLR — cholesterol  
OLC — (2R)-2,3-dihydroxypropyl (9Z)-octadec-9-enoate (1-Oleoyl-R-glycerol)  
P6G — hexaethylene glycol (polyethylene glycol PEG400)  
PO4 — phosphate ion  
  
  

> torsion /A:84@CD1,CG,CB,CA -109

Must specify exactly 4 atoms for 'torsion' command; you specified 3  

> torsion /A:84@CD1,CG,CB,CA -109.0




OpenGL version: 4.1 ATI-2.8.38
OpenGL renderer: AMD Radeon Pro 580 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.

Change History (15)

comment:1 by Tom Goddard, 6 years ago

Component: UnassignedCommand Line
Owner: set to Conrad Huang
Platform: all
Project: ChimeraX
Status: newassigned
Summary: ChimeraX bug report submissionAtom spec parsing consumes torsion command angle argument

comment:2 by Conrad Huang, 6 years ago

There's no way to parse that correctly without specifying in the command description that "109" cannot be an atom name. Putting in heuristics to guess that 109 is not an atom name seems wrong. Most unambiguous solution is to require a keyword before the angle parameter.

comment:3 by Tom Goddard, 6 years ago

Cc: pett added

I don't grasp how the "-109" is being interpreted. It does not seem to be taken as part of the atom name since "109" without the minus sign works.

torsion /A:84@CD1,CG,CB,CA 109
-> works

torsion /A:84@CD1,CG,CB,CA -109
-> fails

select /A:84@CD1,CG,CB,CA -109
-> selects first 3 atoms but not the CA atom.

select /A:84@CD1,CG,CB,CA 109
-> Expected a keyword with 109 higlighted in red.

select -109
-> Expected an objects specifier or a keyword with -109 highlighted in red

And because there is a space and no quotation marks the -109 should have to be an independent atom spec.

comment:4 by Tom Goddard, 6 years ago

This seems like a possible failure of the range handling (e.g :15-354) use of "-".

comment:5 by pett, 6 years ago

It's a "double failure" in that AFAIK we don't allow ranges of atom names in ChimeraX

comment:6 by Conrad Huang, 6 years ago

Actually, we allow ranges at all levels syntactically, and then the code for that level (model, residue, atom) decides what to do about the ranges. Right now, at the atom level, they are treated as an alphanumeric range, e.g., CA-CD typically includes CA, CB, CD but not CE or CG.

The generic nature of the parsing explains why "torsion /A:84@CD1,CG,CB,CA -109" is treated as one atomspec (CA-109 is the last atom range), while "torsion /A:84@CD1,CG,CB,CA 109" is treated as an atomspec followed by something else.

If you want to eliminate atom ranges, we would need to change a lot of code. And unlike "color", I don't think of "torsion" as a verb. So "torsion /A:84@CD1,CG,CB,CA set -109" isn't that bad to me.

comment:7 by pett, 6 years ago

Cc: Elaine Meng added

I don't what the rules are for allowing spaces within atom specs, but allowing them around the '-' in a range is just problematic. A very common name for a chloride ion is "@CL-"; if the text that follows that atom name gets glommed into a mostly useless atom name range, that is not good.

The simplest fix is that the character that follows a space has to be one of the level indictors (#/:@) to be considered part of the preceding atom spec. That would have the downside of not allowing spaces after commas, but I could live with that.

My alternative solution would be to not allow spaces around the '-' of a range, and also throwing an error if an atom-name range is detected.

in reply to:  8 ; comment:8 by Elaine Meng, 6 years ago

Many people are conditioned by normal English to add spaces after commas, but not hyphens.  Since space after comma is such a frequent atomspec error (I remember explaining it repeatedly, as it’s practically finger memory for many people), I’d vote for your second suggestion of not allowing spaces around ‘-‘ of ranges.

comment:9 by Conrad Huang, 6 years ago

Right now, commas may be followed by spaces. I think we fixed that very early on after the decision that "whitespace should not be significant" won. Are we proposing that "whitespace should not be allowed around hyphen"?

A completely separate issue is whether atom name ranges should be allowed. If not, we can treat "CL-" as an atom name. "torsion /A:84@CD1,CG,CB,CA -109" would also work. The down side is that it involves a significant implementation change to the atomspec grammar. If that's what is needed, so be it. But it won't get done soon.

comment:10 by pett, 6 years ago

If the easiest thing is to disallow spaces around hyphens to avoid mis-parsing negative numbers and atom names that end in '-', then that's what I'd run with. If you want to be super fancy, you could make the rule be: don't allow space on only one side of a hyphen (which would allow "57-59" and " 57 - 59" but not "57 -59"), but that would be up to you.

comment:11 by Conrad Huang, 6 years ago

Are you proposing that hyphens for all levels (atom, residue, model) be treated this way? All levels share the same code right now, so customizing a single level is problematic.

comment:12 by pett, 6 years ago

Yes, absolutely, all levels. No way we want to document that "only for _atom_ ranges are you disallowed from having spaces adjacent to the dash". Not to mention that this exact same problem can come into play at the other levels too.

comment:13 by Conrad Huang, 6 years ago

Milestone: 1.0
Status: assignedaccepted

6fa355afa disallows spaces around hyphens in atomspecs.

Will still need to disallow atom name ranges.

in reply to:  14 ; comment:14 by Elaine Meng, 6 years ago

Documentation only describes the use of ranges for model numbers, residue numbers, and chains (with alphabetical ordering for the latter).  So we never “said” atom ranges could be used.  For atom names, there isn’t a sensible ordering anyway, since one would really want them in greek letter order: alpha, beta, gamma, delta (CA,CB,CG*,CD* …) and I highly doubt that is implemented.  I’ll add the information that hyphens can’t have spaces next to them.

comment:15 by Conrad Huang, 6 years ago

Resolution: fixed
Status: acceptedclosed

Fixed in ad8ad0bf6.

Atom name ranges are no longer allowed.

Note: See TracTickets for help on using tickets.