[Chimera-users] Programmatic trimming of PDB files based on alignment with reference structure

Zachariah Schuurs zachariah.schuurs at hdr.qut.edu.au
Mon Nov 18 20:27:54 PST 2019


Hi Eric,

Thanks so much for the help, it has achieved exactly what I desired. I will certainly be using this new knowledge of the API in the future. You guys have such great support.

Cheers

From: Eric Pettersen <pett at cgl.ucsf.edu>
Sent: Saturday, 16 November 2019 5:10 AM
To: Zachariah Schuurs <zachariah.schuurs at hdr.qut.edu.au>
Cc: chimera-users at cgl.ucsf.edu BB <chimera-users at cgl.ucsf.edu>
Subject: Re: [Chimera-users] Programmatic trimming of PDB files based on alignment with reference structure

Hi Zachariah,
            So you would call match() twice, once to get the full correspondence between reference and match domains (i.e. turning iteration off so that no residues get dropped) and a second time to get the best superposition.  This assumes that you actually care about the superposition and not just the correspondences.  If you don’t care about the superposition, you can skip the second match().
            So, to turn iteration off, make the same call you were making except with iterate=None.  The returned value will be a list of tuples, each tuple containing 4 values:  the moving atoms, the reference atoms, the RMSD for the actually matched atoms after iteration, the RMSD across all the initial atoms.  Since you are matching only one chain, the list will contain only one tuple, so:

            m = match(CP_SPECIFIC_SPECIFIC, [(c1, c2)], defaults[MATRIX], "nw",  defaults[GAP_OPEN], defaults[GAP_EXTEND], iterate=None)
            match_atoms, ref_atoms, iteration_rmsd, full_rmsd = m

You could then select the residues of the matched domain with:

            from chimera.selection import setCurrent
            setCurrent([a.residue for a in match_atoms])

Then you can do whatever you want with the selected atoms/residues, e.g. save them to a PDB file with the ‘write’ command.

—Eric


On Nov 14, 2019, at 8:17 PM, Zachariah Schuurs <zachariah.schuurs at hdr.qut.edu.au<mailto:zachariah.schuurs at hdr.qut.edu.au>> wrote:

Hi Eric,

Thanks for the detailed reply. The script used some information from a previous question on the mailing list for MatchMaker, and therefore it uses MatchMaker.match() to match the reference file to the protein files in question. It is as follows:

                s1, s2 = openModels.list(modelTypes=[Molecule])
              c1, c2 = s1.sequence('A'), s2.sequence(chain)

m = match(CP_SPECIFIC_SPECIFIC, [(c1, c2)], defaults[MATRIX], "nw",  defaults[GAP_OPEN], defaults[GAP_EXTEND], iterate=defaults[ITER_CUTOFF])
               RMSD = (m[-1][-1])

Some further guidance on accessing the matched atom objects and thereby the matched domain would be greatly appreciated.  Playing around a bit with the code I see the reference atoms, corresponding atoms and the RMSD values. Would it just be a matter of accessing the corresponding matched atoms and outputting a PDB file of those? Or would it still need to be trimmed?

Thanks for the help!


From: Eric Pettersen <pett at cgl.ucsf.edu<mailto:pett at cgl.ucsf.edu>>
Sent: Friday, 15 November 2019 11:26 AM
To: Zachariah Schuurs <zachariah.schuurs at hdr.qut.edu.au<mailto:zachariah.schuurs at hdr.qut.edu.au>>
Cc: chimera-users at cgl.ucsf.edu<mailto:chimera-users at cgl.ucsf.edu>
Subject: Re: [Chimera-users] Programmatic trimming of PDB files based on alignment with reference structure

Hi Zachariah,
            I can see three possible approaches to extracting the information you need:

1) Call the underlying routine that MatchMaker uses.  I’m guessing your script is doing something like runCommand(“matchmaker #0 #1”) to do the matching, and that call returns nothing.  The pertinent call is MatchMaker.match(…).  It returns the reference atoms, corresponding matched atoms, and raw and pruned RMSDs.  I’d say you’d have to be pretty Python savvy to use this approach because coming up with the arguments to the MatchMaker.match() call and processing the return values will require a good working knowledge of Python and how Chimera’s Python API works.  Anyway, if you took this approach you would first call MatchMaker.match() with no iteration and the returned matched atoms would indicate the domain on the matched structure.  If you decide to take this approach I could offer further guidance.

2) Extract the information from the Multalign Viewer tool.  Probably the most practical approach, even if it is somewhat tricky.  Since the Multalign Viewer tool is only available if the Chimera interface is shown, you won’t be able to run the script without the interface (i.e. with the --nogui flag).  Anyway, once you run the matchmaker command to generate the MAV instance, use thefindMAV() function shown in this previous chimera-users message: http://plato.cgl.ucsf.edu/pipermail/chimera-users/2014-June/010016.html to the MAV Python instance.  Assuming the domain on the reference is selected (and only that is selected), you can get a list of those residues with:

            from chimera.selection import currentResidues
            ref_domain = currentResidues()

and assuming the MAV instance is in the variable mav, you can get a list of the corresponding matched residues with:

            match_domain =[]
            ref_seq, match_seq = mav.seqs
            ref_map = ref_seq.matchMaps.values()[0]
            match_map = match_seq.matchMaps.values()[0]
            for ref_r in ref_domain:
                        pos = ref_map[ref_r]
                        try:
                                    match_domain.append(match_map[pos])
                        except KeyError:
                                    pass

then you can select those residues with:

            from chimera.selection import setCurrent
            setCurrent(match_domain)

Now the matched domain (only) should be selected and you can go on to do whatever you would do with that.  You should probably also call mav.Quit() to close the MAV instance.

3) Use ChimeraX.  In ChimeraX, the chimera.runCommand(…) equivalent (chimerax.core.commands.run(…)) does return the match values, such as the atom correspondences, so as you can imagine things would be simpler.  I don’t know if ChimeraX does everything else you intend to do or not, but it certainly does the things you mentioned.  It is also faster than Chimera, and can run the script in nogui mode, if either of those are relevant.  I can provide further guidance here also if you choose this route.

—Eric


                Eric Pettersen
                UCSF Computer Graphics Lab






On Nov 13, 2019, at 7:56 PM, Zachariah Schuurs <zachariah.schuurs at hdr.qut.edu.au<mailto:zachariah.schuurs at hdr.qut.edu.au>> wrote:

Dear Chimera Team,

I have a reference protein domain and then a series of PDB files which all contain the same domain. I have programmatically been able to script an alignment of the reference to the same domain in the other files whereby he domain in the proteins of interest are isolated using the zone select method. This however leaves me with fragments and often misses some of the domain features of the interest proteins that occur outside of the specified zone.

I am wondering if it is possible to write a python script that selects the area on the protein of interest by selecting the residues in the sequence alignment after MatchMaker is conducted. For instance – protein #0 is my reference protein, which is 150 residues long. My protein of interest contains the same domain as the reference, but is 500 residues long. Upon conducting MatchMaker and getting an output sequence alignment, the 150 residues of #0 (the reference) are selected, along with the aligned residues in the protein of interest. It is possible to select the residues in the sequence alignment window (as in the picture) but I wish to be able to script it. The selected 150 residues (or so) in the protein of interest are then saved as their own trimmed file. I have been able to write a python script for completing the MatchMaker etc, but I am not sure how to achieve the selection of residues in #1 based of the alignment with #0, or if it is even possible.

Thanks

Zachariah Schuurs  | PhD Candidate
CARP – Cancer & Ageing Research Program
Institute of Health and Biomedical Innovation | Queensland University of Technology
Level 6 | Translational Research Institute | 37 Kent St | Woolloongabba QLD 4102
T: +61 7 3443 7296   | E: zachariah.schuurs at hdr.qut.edu.au<mailto:zachariah.schuurs at hdr.qut.edu.au> | Web: www.carp.org.au<http://www.carp.org.au/>
<image002.jpg>






I acknowledge the Turrbal and Yugara, as the First Nations owners of the lands where QUT now stands. I pay respect to their Elders, lores, customs and creation spirits. I recognise that these lands have always been places of teaching, research and learning.

This email and its attachments (if any) contain confidential information intended for use by the addressee and may be privileged. We do not waive any confidentiality, privilege or copyright associated with the email or the attachments. If you are not the intended addressee, you must not use, transmit, disclose or copy the email or any attachments. If you receive this email by mistake, please notify the sender immediately and delete the original email.


<image004.png>
_______________________________________________
Chimera-users mailing list: Chimera-users at cgl.ucsf.edu<mailto:Chimera-users at cgl.ucsf.edu>
Manage subscription: http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users

_______________________________________________
Chimera-users mailing list: Chimera-users at cgl.ucsf.edu<mailto:Chimera-users at cgl.ucsf.edu>
Manage subscription: http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://plato.cgl.ucsf.edu/pipermail/chimera-users/attachments/20191119/de9bb8f2/attachment-0001.html>


More information about the Chimera-users mailing list