
Version 1 (modified by Conrad Huang, 11 years ago)

--

Proposed Atom Specifier Syntax

The Chimera 1 atom specifier syntax is the starting point. The options below propose changes to address specific limitations of the Chimera 1 atomspec syntax.

Allowing Spaces in Atom Specifiers

The motivation is to simplify parsing command lines with atom specifiers by making each individual atomspec delimited by whitespace. The downside is that users can no longer use spaces to separate different parts of a single atom specifier.

  • I do not think there is a way to do this if we insist on being able to type named selection names with no modification. For one thing, there may be spaces in the name. For another, separating part of an atomspec that ends with an atom name from the next part that is a named selection would require severe contortions.

The alternative is to allow spaces but require the entire atomspec to be quoted when they are present. The motivation is that most atom specifiers do not (or need not) contain spaces, and would not require quoting. More complex atomspecs may require embedded spaces, but quoting would be a minor additional requirement. Parsing would still be fairly simple because the atomspec would still be treated as a single token. The downside is that quoting is occasionally required.
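
As a rough illustration of the quoting alternative, the sketch below uses Python's standard shlex module to split a command line so that a quoted atomspec stays a single token. The function name split_command and the sample commands are assumptions made for this example, not part of any Chimera code.

```python
import shlex

def split_command(line):
    """Split a command line on whitespace, keeping quoted text as one token.

    Hypothetical helper: shlex.split removes the quotes and returns the
    quoted atomspec, embedded spaces included, as a single list element.
    """
    return shlex.split(line)

# An atomspec without spaces needs no quoting.
plain = split_command("display #1:23@CA")

# An atomspec with embedded spaces must be quoted to remain one token.
quoted = split_command('display "#1:23 za<12" red')

print(plain)
print(quoted)
```

One caveat with this particular tokenizer: shlex treats '\' as an escape character in its default mode, which would matter if '\' were chosen as an atomspec level character.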

Add a "Chain" Level to Atom Specifiers

The motivation is that chain is an often-used grouping and should be treated on the same footing as models, residues and atoms. For example, using '/' as the chain level character, "#1/A:23" refers to "model 1 chain A residue 23" and is equivalent to the Chimera 1 "#1:23.A". This improves the old "#1:.A" by making it "#1/A". The old "#1:23", which selected "residue 23" from all chains, is still "#1:23" in the new syntax (omitting an atomspec level means use all entries from that level).
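
The mapping between the old and proposed forms can be sketched as a small rewrite function. This is only an illustration that assumes '/' is the chain character and handles the simple "model, residue, chain" shapes quoted above. The name to_chain_syntax and the regular expression are hypothetical.

```python
import re

# Matches the simple Chimera 1 shapes "#model:residue.chain" and ":.chain".
# Assumption: model is "#<digits>", residue is a number, name or range.
_OLD_CHAIN = re.compile(r"^(?P<model>#\d+)?:(?P<res>[\w-]*)\.(?P<chain>\w+)$")

def to_chain_syntax(spec):
    """Rewrite ':res.chain' as '/chain:res'; return other specs unchanged."""
    m = _OLD_CHAIN.match(spec)
    if m is None:
        return spec  # no chain part, so the old and new forms coincide
    out = (m.group("model") or "") + "/" + m.group("chain")
    if m.group("res"):
        out += ":" + m.group("res")
    return out

for old in ("#1:23.A", "#1:.A", "#1:23"):
    print(old, "->", to_chain_syntax(old))
```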

The problem is that '/' is already used in Chimera 1 as the attribute-test operator. '/' is good as a chain specifier character because it is a straight line and is an unshifted character on the keyboard (does not require pressing the Shift key to type). Other possibilities are '\' (unused, unshifted), '|' (already used as union operator, sometimes broken line and shifted), '_' (unused, shifted) and '!' (already used as negation operator, broken line and shifted).

If we select a character that is already used in Chimera 1, we would need to replace it with some other character. The more we do so, the less transferable the old Chimera 1 conventions become. Last week, I mentioned using:

'/' for chains (Chimera 1 attribute-test operator),
'?' for attribute-test operator (Chimera 1 single-character wildcard),
'*' for multi-character wildcard (Chimera 1 whole-name wildcard), and
'=' for single-character wildcard (Chimera 1 multi-character wildcard).

Realistically, '\' is probably the best character to use because it lets us keep the Chimera 1 characters for all existing operators. Its one drawback is that it is not near the home keys for touch typists.

Add Parentheses to Atom Specifiers

The motivation is to use parentheses for grouping sub-parts of atomspecs to avoid having to repeat sections. In Chimera 1, "#0:12#1:14@CA,CB" selects "all atoms in model 0 residue 12 and atoms CA and CB in model 1 residue 14". To select only CA and CB from model 0 residue 12, we would have to use "#0:12@CA,CB#1:14@CA,CB". Parentheses would enable using "(#0:12#1:14)@CA,CB".

The problem with adding parentheses is that it introduces potential ambiguities. For example, what should be the meaning of "(#0@CA:12)@N"? Are the complications (both code for handling them and documentation for explaining them) worth the benefits?
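
To make the trade-off concrete, here is a minimal sketch of how a single parenthesized group followed by an atom-level suffix could be expanded back into the repeated Chimera 1 form. It deliberately rejects the ambiguous case where the group already contains an atom level, rather than guessing a meaning. The function name expand_group and the rejection rule are assumptions for illustration only.

```python
import re

# One unnested group followed by an atom-level suffix, e.g. "(...)@CA,CB".
_GROUP = re.compile(r"^\((?P<inner>[^()]*)\)(?P<suffix>@.+)$")

def expand_group(spec):
    """Distribute the '@...' suffix over each '#model...' part of the group."""
    m = _GROUP.match(spec)
    if m is None:
        return spec  # no parenthesized group to expand
    inner, suffix = m.group("inner"), m.group("suffix")
    parts = re.findall(r"#[^#]+", inner)
    if "".join(parts) != inner:
        raise ValueError("group must consist of '#model...' parts: %r" % inner)
    for part in parts:
        if "@" in part:
            # e.g. "(#0@CA:12)@N": the group already names atoms.
            raise ValueError("ambiguous: %r already has an atom level" % part)
    return "".join(part + suffix for part in parts)

print(expand_group("(#0:12#1:14)@CA,CB"))
```

Even this small case needs two explicit error paths, which gives a sense of the code and documentation cost that full parenthesis support would carry.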

Change Zone Operator Specification

Chimera 1 zone specification looks like "atomspec za<12" or "atomspec zr > 5". "zr" and "za" are keywords that were selected to make the zone operator look "nice" (similar to a comparison test), with the 'z' denoting a zone operation. With the proposed addition of the chain level, and with the increased number of models used in a session where users may want to "zone by model", I suggest we change the zone operator to be "level-character comparison-operator". For example, "models within 5A" would be "atomspec #<5"; "residues 10A or farther" would be "atomspec :>= 10". The space between the level character and comparison operator is optional.
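
The proposed "level-character comparison-operator" form could be recognized with a single pattern, as sketched below. Several details are assumptions rather than part of the proposal: '/' stands in for the undecided chain character, the operator set is limited to <, <=, > and >=, whitespace is required between the atomspec and the level character, and the name parse_zone is hypothetical.

```python
import re

# Assumed level characters; '/' for chain is not settled in the proposal.
LEVELS = {"#": "model", "/": "chain", ":": "residue", "@": "atom"}

_ZONE = re.compile(
    r"^(?P<spec>.+?)\s+(?P<level>[#/:@])\s*"
    r"(?P<op><=|>=|<|>)\s*(?P<dist>\d+(?:\.\d+)?)$"
)

def parse_zone(text):
    """Return (atomspec, level name, operator, distance), or None."""
    m = _ZONE.match(text)
    if m is None:
        return None
    return (m.group("spec"), LEVELS[m.group("level")],
            m.group("op"), float(m.group("dist")))

print(parse_zone("atomspec #<5"))
print(parse_zone("atomspec :>= 10"))
```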

Named Selection vs Aliases

In Chimera 1, there are many ways to specify atom selections on the command line. Two very similar ones are aliases and named selections (with which I group predefined selections from the Selection menu). In practice, many aliases function as named OSL selections. I propose that we separate the aliasing mechanism from the named selection mechanism so that there is no inline expansion for atomspecs. That is, when the command looks like "display pocket", "pocket" is like "backbone" or "O" from the Selection menu rather than some arbitrary string defined in an alias command. We can add a new command namesel for creating named selections. The benefits of this approach are that:

  • there can be error checking in the namesel command that is not possible with alias,
  • there is little functional difference for the user other than having to distinguish between aliases and named selections (not necessarily a bad thing), and
  • parsing atom specifiers is simpler.
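
A minimal sketch of the separation described above: named selections live in their own registry, are checked when they are defined, and are looked up as whole tokens rather than expanded inline. Everything here is hypothetical, including the registry, the lookup function, the placeholder check and the example atomspec. Only the command name namesel comes from the proposal.

```python
# Hypothetical registry of named selections, kept separate from aliases.
_named_selections = {}

def _default_check(atomspec):
    # Placeholder: a real check would run the full atomspec parser.
    return bool(atomspec.strip())

def namesel(name, atomspec, check=_default_check):
    """Register a named selection, rejecting bad input at definition time."""
    if not name.strip():
        raise ValueError("selection name must not be empty")
    if not check(atomspec):
        raise ValueError("invalid atom specifier: %r" % atomspec)
    _named_selections[name] = atomspec

def lookup(token):
    """Return the atomspec for a whole-token name, else None (no expansion)."""
    return _named_selections.get(token)

namesel("pocket", "#1:23,45,67")
print(lookup("pocket"))
```

Because lookup only matches complete tokens, a string such as "pocket@CA" is not treated as a textual expansion of "pocket", which is the behavioral difference from an alias.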