Opened 9 years ago

Closed 4 years ago

#480 closed enhancement (fixed)

maybe add "entity" (same molecular entity) operator to atomspecs

Reported by: Elaine Meng
Owned by: pett
Priority: major
Milestone:
Component: Command Line
Version:
Keywords:
Cc: Tom Goddard, pett, Greg Couch
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

A sequence option was recently added to the "select" command and it is very useful. E.g.

select #1 seq /k

...selects all the chains in #1 that are the same biopolymer molecular entity as chain K (which could be in another model or the same model, as I did not specify model after the sequence keyword).

After I tried it a few times I kept thinking how useful it would be as a general atomspec operator, e.g.

color seq /k purple

...colors all chains with the same sequence as K

Change History (19)

comment:1 by Elaine Meng, 9 years ago

however, in #501 I suggested maybe "entity" would be a better keyword than "sequence"

comment:2 by pett, 9 years ago

Component: Unassigned → Command Line

comment:3 by Elaine Meng, 5 years ago

Owner: changed from Conrad Huang to pett
Status: new → assigned
Summary: maybe add "sequence" (same molecular entity) operator to atomspecs → maybe add "entity" (same molecular entity) operator to atomspecs

comment:4 by Elaine Meng, 5 years ago

related to #1027

comment:5 by pett, 5 years ago

Is this different in any significant way from #1027? Can we coalesce these two tickets?

comment:6 by Greg Couch, 5 years ago

Yes, they are essentially the same. Pick your duplicate.

comment:7 by pett, 5 years ago

This would certainly be nice to have, but it looks like it will be a bear to implement. There is no support for multi-word selectors much less for selectors that take an argument. Modifying the atom specifier syntax itself to support it will be quite the feat, since it isn't a "level" (atom/residue/chain/model) which everything else in the syntax is based around.

I'm open to ideas for any way to get this to happen without a Herculean level of effort...

comment:8 by pett, 5 years ago

I think I could implement it as a "same_as" or "entity" property of Chain, e.g.:

same_as=/k

Is that too awful?

--Eric

comment:9 by pett, 5 years ago

Apparently two forward slashes is italics in Wiki formatting. The above was supposed to be:

//same_as=/k

comment:10 by pett, 5 years ago

Elaine and I like the term "entity" and therefore propose: name this attribute "entity"; add a "byentity" synonym for "bypolymer" to the 'color' command; and remove the "polymer" option from the 'select' command (duplicative with this attribute).

comment:11 by pett, 4 years ago

I've implemented the mildly hacky entity attribute. It's slightly uglier than my original proposal because you can't use atom-spec special characters directly in an attribute test; you have to quote them. So my original suggestion would instead be:

//entity="/k"

(ignore the leading exclamation point if you're getting this by mail). You can use any spec inside the quotes -- it doesn't have to be a single chain, so it can match (or not match) a set of chains. I'm going to push it out so that people can play around with it and provide feedback.
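To make the matching rule concrete, here is a minimal Python sketch of the idea as described in this comment: the quoted spec picks one or more reference chains, and the test matches every chain whose full sequence equals that of any reference chain. This is not ChimeraX code; the function name, the dictionary representation of chains, and the sequences are all hypothetical.

```python
def chains_with_same_sequence(chains, reference_ids):
    """Return the IDs of chains whose full sequence equals that of any reference chain.

    chains: hypothetical mapping of chain ID -> one-letter sequence string.
    reference_ids: chain IDs picked out by the quoted spec (may be several).
    """
    # Collect the sequences of the reference chains that actually exist.
    reference_seqs = {chains[cid] for cid in reference_ids if cid in chains}
    # A chain matches only if its whole sequence is identical to a reference sequence.
    return sorted(cid for cid, seq in chains.items() if seq in reference_seqs)


# Made-up example model: A and K share a sequence, as do B and L.
example_chains = {
    "A": "MKTAYIAK",
    "B": "GSHMLEDP",
    "K": "MKTAYIAK",
    "L": "GSHMLEDP",
    "M": "QQQ",
}

print(chains_with_same_sequence(example_chains, ["K"]))       # ['A', 'K']
print(chains_with_same_sequence(example_chains, ["K", "L"]))  # ['A', 'B', 'K', 'L']
```

Note that the reference chain itself is always part of the result, and that a spec naming several chains matches the union of their sequence groups.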

comment:12 by Greg Couch, 4 years ago

How does that work when "/k" is composed of multiple (mmCIF) entities? Would you have to do "/k & protein" or "/k & ligand"?

in reply to: 12 ; comment:13 by pett, 4 years ago



Polymeric entities: chains, not ligands.

comment:14 by Greg Couch, 4 years ago

It would be really great if there were a variation that matched mmCIF entities.

comment:15 by pett, 4 years ago

I guess I don't understand this comment. This attribute tests whether full sequences match. Don't ligand entities simply have names? Wouldn't you use that?

comment:16 by Greg Couch, 4 years ago

The point is that mmCIF files have entities, so it would be nice if the concepts matched. Being able to use the mmCIF entity names would be a bonus.

In mmCIF v5, chains could be composed of branched, polymer, and non-polymer entities, all at the same time. It all depends on what the author has picked to use the same chain id.

comment:17 by pett, 4 years ago

I think you're going down a rabbit hole here. The point of this attribute is to test whether the full primary sequences of chains match. Post-translational modifications like glycosylations, phosphorylations, or covalently bound ligands don't matter for this purpose. Is the problem that the attribute is called "entity"? Would it be better if it were named "sameseq" (or something)?

Last edited 4 years ago by pett

in reply to: 17 ; comment:18 by Elaine Meng, 4 years ago

I want a word consistent with the coloring by polymer sequence, which is currently "color bypolymer," although I'd originally suggested it should be "color byentity."  Given Greg's aversion to "entity," I vote for either:

(1) polymer (goes with current coloring terminology "color bypolymer")
- or -
(2) sameseq (and we add "color bysameseq" synonym to the existing "color bypolymer").   I also thought of "seqmatch" and "color byseqmatch" but that doesn't seem any better.

Each is yucky/imperfect in its own way, but the ability to specify this concept is useful, and it has to be named something.  Unless anybody else has come up with a brilliant word that solves all our problems.

comment:19 by pett, 4 years ago

Resolution: fixed
Status: assignedclosed

After mulling things over and discussing it with Elaine, we elected to call the attribute "identity". So the new 'color' keyword is "byidentity" (a synonym for "bypolymer"), and I have removed the "polymer" keyword from the 'select' command.
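As a rough illustration of what coloring by identity means conceptually, the following Python sketch groups chains by identical full sequence and gives each group one color. It is only a sketch under stated assumptions: the function name, the chain dictionary, and the palette are hypothetical, and how ChimeraX actually picks colors is not described in this ticket.

```python
from collections import defaultdict
from itertools import cycle


def color_by_identity(chains, palette):
    """Assign one color per group of chains that share an identical sequence.

    chains: hypothetical mapping of chain ID -> one-letter sequence string.
    palette: list of color names, reused cyclically if there are more groups.
    """
    # Group chain IDs by their full sequence (groups keep first-seen order).
    groups = defaultdict(list)
    for chain_id, seq in chains.items():
        groups[seq].append(chain_id)

    # Give every chain in a group the same color.
    colors = {}
    for color, members in zip(cycle(palette), groups.values()):
        for chain_id in members:
            colors[chain_id] = color
    return colors


# Made-up example: A and C have the same sequence, B differs.
example = {"A": "MKT", "B": "GSH", "C": "MKT"}
print(color_by_identity(example, ["purple", "orange"]))
# {'A': 'purple', 'C': 'purple', 'B': 'orange'}
```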
