[Chimera-users] Auto-associate define attribute/render by attribute

Healey, Joseph J.Healey.1 at warwick.ac.uk
Tue Dec 3 01:15:31 PST 2019


Hi Eric,

That looks pretty similar to what I had come up with, yeah (here I’m assuming an exact sequence-to-attribute correspondence over the full length of the chain). This is approximately the approach I was using:

from chimera import openModels, Molecule
from chimera import runCommand as rc

# assume list_of_vals holds <= len(seq) values read in from a file
rc("split")

for m in openModels.list(modelTypes=[Molecule]):
    # insert logic for pairing sequences/input files to chains by sequence matching
    # some heuristic like Levenshtein distance will probably suffice for now, or I can
    # replace this with pairwise alignment from BioPython or something.
    for r, val in zip(m.residues, list_of_vals):
        setattr(r, 'chouFasmanImmuno', val)

rc("rangecol chouFasmanImmuno ….")
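
The pairing step left as a comment above could be sketched roughly as follows. This is only an illustration in plain Python: `levenshtein` and `best_match` are hypothetical helper names (not part of Chimera), and the chain sequences are assumed to have already been extracted as plain strings.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def best_match(chain_seq, candidates):
    """Return the candidate sequence closest to chain_seq by edit distance."""
    return min(candidates, key=lambda s: levenshtein(chain_seq, s))
```

For example, `best_match("ADTYPLHE", ["MKVLA", "ADTYPLHQ"])` picks `"ADTYPLHQ"`, which differs by a single substitution. A distance cutoff would probably be needed in practice so that unrelated chains are not paired with anything.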


On your notes:


  1. The various trys/ifs are a good idea so I’ll incorporate those, thanks!
  2. Thanks for the tip on the capital letters – this was going to be my next question as I couldn’t get the RBA dialogue to show the new attributes so this probably explains it. I’m not fussy about the case-sensitivity of the attributes so I’ll just stick to camel/lowercase.
     *   On a related note, is there a need to programmatically ‘refresh’ rangecol/RBA after defining the attribute, as there is with the dialogue box, or should it be happy once the attr is set?
  3. This was exactly my plan! 😉 the residue attribute numbers for various different immunogenicity algorithms are coming from a separate suite of tools, so a pandas dataframe will probably be the input (perhaps as a csv). Initially my thinking was to use the Define Attribute format and encode the sequence in a #comment, but actually this approach frees me entirely from needing to follow that format which I think is better. My python is pretty good these days so it’s only when I run up against the inner workings of chimera that I need to consult you guys really!
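
As a rough sketch of the csv input mentioned in point 3, the sequence/value pairs could be read with the standard library alone. The file layout here is purely an assumption for illustration: a `sequence` column and a `values` column holding space-separated numbers, one per residue. `read_seq_vals` is a hypothetical helper name.

```python
import csv
import io


def read_seq_vals(handle):
    """Read (sequence, values) pairs from CSV text with 'sequence' and 'values' columns.

    The 'values' field is assumed to hold space-separated numbers, one per residue.
    """
    seq_vals = []
    for row in csv.DictReader(handle):
        vals = tuple(float(v) for v in row["values"].split())
        if len(vals) != len(row["sequence"]):
            raise ValueError("expected one value per residue for %s" % row["sequence"])
        seq_vals.append((row["sequence"], vals))
    return seq_vals


# In-memory stand-in for a real file, so the sketch runs as-is.
example = io.StringIO(u"sequence,values\nADTY,1.97 2.14 0.88 1.22\n")
seq_vals = read_seq_vals(example)
```

The result has the same shape as the `seq_vals` list in Eric's script quoted below, so it could be dropped in without changing the matching loop.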

This definitely feels like a simpler approach than messing with DefineAttribute or MAV, so I think it’s the way to go. Thanks again for the help.

Joe



Dr. Joseph Healey Ph.D. M.Sc. B.Sc. (Hons) MRSB
Research Fellow
Warwick Medical School
University of Warwick
Coventry
CV47AL
Mob: +44 (0) 7536 042620  | Twitter: @JRJHealey<https://twitter.com/JRJHealey>  |  Website<http://www2.warwick.ac.uk/fac/sci/moac/people/students/2013/joseph_healey>
Email: J.Healey.1 at warwick.ac.uk | ORCID: orcid.org/0000-0002-9569-6738


From: Eric Pettersen <pett at cgl.ucsf.edu>
Reply to: "chimera-users at cgl.ucsf.edu BB" <chimera-users at cgl.ucsf.edu>
Date: Monday, 2 December 2019 at 22:40
To: "Healey, Joseph" <J.Healey.1 at warwick.ac.uk>
Cc: "chimera-users at cgl.ucsf.edu BB" <chimera-users at cgl.ucsf.edu>
Subject: Re: [Chimera-users] Auto-associate define attribute/render by attribute

As long as we are talking about exact (sub)sequence matches, I think it is reasonably simple to do in Python:

# pairs of sequences and corresponding values you want to assign…
seq_vals = [
    ("ADTY…whatever…PLHE", (1.97, 2.14, … same len as sequence…, 0.88, 1.22)),
    ("…next sequence…", (…next set of values…))
]
from chimera import openModels, Molecule
for m in openModels.list(modelTypes=[Molecule]):
    for seq in m.sequences():
        for target_seq, vals in seq_vals:
            try:
                i = str(seq).index(target_seq)
            except ValueError:
                continue
            for offset, val in enumerate(vals):
                r = seq.residues[i+offset]
                if r:
                    r.chouFasmanImmuno = val
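
To check the index/offset logic of the script above without starting Chimera, the same idea can be sketched with plain Python stand-ins. This is an illustration only: `assign_by_subsequence` is a hypothetical helper, the residue list is an ordinary list parallel to the sequence string, and `None` marks a sequence position with no modelled residue.

```python
def assign_by_subsequence(full_seq, residues, target_seq, vals):
    """Map vals onto positions where target_seq occurs in full_seq.

    residues is a list parallel to full_seq; None means missing structure.
    Returns {position: value} for the positions that have a residue.
    """
    try:
        start = full_seq.index(target_seq)
    except ValueError:
        return {}  # no exact match in this chain
    assigned = {}
    for offset, val in enumerate(vals):
        if residues[start + offset] is not None:
            assigned[start + offset] = val
    return assigned


# Position 3 has no residue, so its value is skipped.
residues = ["r0", "r1", "r2", None, "r4", "r5", "r6", "r7"]
result = assign_by_subsequence("MKADTYPL", residues, "ADTY", (1.0, 2.0, 3.0, 4.0))
```

Here `result` holds values for positions 2, 4 and 5 only; the value for the missing residue at position 3 is dropped, mirroring the “if r:” test.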

A few notes:

1)  If there is missing structure, there may not be a residue that corresponds to a sequence position, which is why there is an “if r:” test in the script.  The full sequence is known from the SEQRES records in the PDB file.

2)  Avoid assigning an attribute name starting with an upper-case letter.  Symbolic constants start with upper-case letters and the render-by-attr code will screen out such attributes from its interface.  There is a way around this by registering the attribute with the SimpleSession module (which will also get the attribute to save in sessions).  I can provide details if needed.

3) You may want to read the sequences and values from a file rather than have them directly in the script.  You might also want to set multiple attributes, in which case you would include the attribute name along with the sequence and values, and change the corresponding ‘for’ loop and change ‘r.attrName = val’ to setattr(r, attrName, val).  I don’t know how good your Python juju is, but I can provide guidance as needed.
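
A minimal sketch of the multiple-attribute variant from note 3, combined with the lower-case naming advice from note 2. `FakeResidue` is a stand-in class so the sketch runs outside Chimera, and `set_attrs` and the attribute name `otherScore` are hypothetical.

```python
class FakeResidue(object):
    """Stand-in for a Chimera residue; real residues come from the open models."""
    pass


def set_attrs(residues, attr_vals):
    """Set several per-residue attributes.

    attr_vals is a list of (attr_name, values) pairs; values runs parallel
    to residues, and None entries in residues (missing structure) are skipped.
    """
    for attr_name, vals in attr_vals:
        # Names starting with an upper-case letter are screened out by render-by-attribute.
        if attr_name[:1].isupper():
            raise ValueError("attribute name starts with upper case: %s" % attr_name)
        for r, val in zip(residues, vals):
            if r is not None:
                setattr(r, attr_name, val)


residues = [FakeResidue(), None, FakeResidue()]
set_attrs(residues, [("chouFasmanImmuno", (1.97, 2.14, 0.88)),
                     ("otherScore", (0.1, 0.2, 0.3))])
```

Rejecting upper-case names up front is one possible design choice; registering the attribute through SimpleSession, as mentioned in note 2, would be the alternative if such names are needed.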

—Eric

On Nov 29, 2019, at 8:22 AM, Healey, Joseph <J.Healey.1 at warwick.ac.uk> wrote:

Hi Eric,

I’ve been giving this some more thought. Is there perhaps a way to brute force this?

My thinking at the moment is, given that I’ll know the sequence, positions, and scores I want to use, all I really need to do is somehow set the attributes of the relevant object, perhaps circumventing MAV altogether? My main question is then, how are attributes for the models currently stored? Does `mavAttributeName` become an actual attribute of the `molecule.residue[n]`?

Broadly I was thinking something like this might work based on your code suggestion below:


*   Split the open model to get individual models for the chains.

*   Iterate the list of models, and test whether their .sequence() matches the sequence I expect (perhaps via alignment if not perfect matches). I could read the attribute file in alongside this in base python, which perhaps means I don’t need to use the defined format too, and therefore can encode the seq/score/position all in one go.

*   If it does, assign the relevant attribute name/score to the residue/model (depending on how exactly the attributes are stored at the moment). As simple as Molecule.Residue.mavAttributeName? Though of course there wouldn’t necessarily need to be a “mav” in there.

    *   If not, move on to the next model.

*   If I can ‘force’ the attributes onto the residues etc., I can then trigger the rendering without needing to use MAV?

Presumably there’s a way to ‘inject’ the namespace with an object equivalent to that which MAV/DefineAttribute would achieve, and I can do the association ‘manually’?

Many thanks,

Joe





From: Eric Pettersen <pett at cgl.ucsf.edu>
Reply to: "chimera-users at cgl.ucsf.edu BB" <chimera-users at cgl.ucsf.edu>
Date: Wednesday, 27 November 2019 at 23:56
To: "chimera-users at cgl.ucsf.edu BB" <chimera-users at cgl.ucsf.edu>
Cc: "Healey, Joseph" <J.Healey.1 at warwick.ac.uk>
Subject: Re: [Chimera-users] Auto-associate define attribute/render by attribute

On Nov 27, 2019, at 1:07 PM, Elaine Meng <meng at cgl.ucsf.edu> wrote:

Python is beyond my skill set, though… somebody else would have to advise on that.

The first thing I should say is that Multalign Viewer is only available as a graphical tool, so you will only be able to script this for a Chimera that is running its graphical interface, not in any headless “batch” mode.  Okay, with that out of the way, the first job is to get a Python instance of Multalign Viewer showing the sequence you want with the chains you want associated.  Unlike ChimeraX, which associates on a per-chain basis, Chimera associates on a per-structure basis, so you will have to first split your structure apart by chains (with the “split” command).  If you have the pertinent sequence in a file, you could do this:

from MultAlignViewer.MAViewer import MAViewer
mav = MAViewer("full-path-to-sequence-file")

If you don’t have a file, but you know you want to use the sequence of chain C, you could do this:

from chimera import openModels, Molecule
for m in openModels.list(modelTypes=[Molecule]):
    try:
        seq = m.sequence('C')
    except KeyError:
        continue
    from MultAlignViewer.MAViewer import MAViewer
    mav = MAViewer([seq])

You then add your custom header file with:

mav.readHeaderFile("full-path-to-header-file")

After that, the residues will have the necessary attribute (prefixed with “mav”, so if the header attribute is ChouFasmanImmuno, the residue attribute is mavChouFasmanImmuno).

—Eric



Eric Pettersen
UCSF Computer Graphics Lab



