[Chimera-users] Programmatic trimming of PDB files based on alignment with reference structure

Thu Nov 14 17:25:39 PST 2019

Hi Zachariah,
	I can see three possible approaches to extracting the information you need:

1) Call the underlying routine that MatchMaker uses.  I’m guessing your script does the matching with something like runCommand("matchmaker #0 #1"), and that call returns nothing.  The pertinent call is MatchMaker.match(…), which returns the reference atoms, the corresponding matched atoms, and the raw and pruned RMSDs.  You’d have to be pretty Python savvy to use this approach, because working out the arguments to MatchMaker.match() and processing its return values requires a good working knowledge of Python and of Chimera’s Python API.  If you took this approach, you would call MatchMaker.match() with no iteration, and the returned matched atoms would indicate the domain on the matched structure.  If you decide to go this way, I can offer further guidance.
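
As a rough illustration of the last step only: once you have a list of matched atoms, the domain is the set of residues those atoms belong to (Chimera atoms carry a "residue" attribute).  The sketch below is plain Python with hypothetical stand-in Atom and Residue types and made-up residue numbers, so it runs without Chimera; it is not the MatchMaker.match() return format.

```python
from collections import namedtuple

# Hypothetical stand-ins for Chimera's atom and residue objects.
Residue = namedtuple("Residue", ["chain", "number"])
Atom = namedtuple("Atom", ["name", "residue"])


def residues_of(atoms):
    """Return the distinct residues of the given atoms, in first-seen order."""
    seen = set()
    ordered = []
    for atom in atoms:
        if atom.residue not in seen:
            seen.add(atom.residue)
            ordered.append(atom.residue)
    return ordered


# Made-up matched atoms; residue 202 contributes two atoms.
matched_atoms = [
    Atom("CA", Residue("A", 201)),
    Atom("CA", Residue("A", 202)),
    Atom("CB", Residue("A", 202)),
    Atom("CA", Residue("A", 204)),
]
domain = residues_of(matched_atoms)
```

With real Chimera atoms the same loop would apply, and the resulting residue list could be passed to setCurrent() as in approach 2.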

2) Extract the information from the Multalign Viewer (MAV) tool.  This is probably the most practical approach, even if it is somewhat tricky.  Since Multalign Viewer is only available when the Chimera interface is shown, you won’t be able to run the script without the interface (i.e. with the --nogui flag).  Once you run the matchmaker command to generate the MAV instance, use the findMAV() function shown in this previous chimera-users message to get the MAV Python instance: http://plato.cgl.ucsf.edu/pipermail/chimera-users/2014-June/010016.html  Assuming the domain on the reference is selected (and only that is selected), you can get a list of those residues with:

	from chimera.selection import currentResidues
	ref_domain = currentResidues()

and assuming the MAV instance is in the variable mav, you can get a list of the corresponding matched residues with:

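	match_domain = []
	ref_seq, match_seq = mav.seqs
	# each match map holds residue -> ungapped sequence index and index -> residue
	ref_map = ref_seq.matchMaps.values()[0]
	match_map = match_seq.matchMaps.values()[0]
	for ref_r in ref_domain:
		try:
			# go through the alignment column, since the sequences contain gaps
			column = ref_seq.ungapped2gapped(ref_map[ref_r])
			match_domain.append(match_map[match_seq.gapped2ungapped(column)])
		except KeyError:
			# residue not in the alignment, or aligned to a gap
			pass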

then you can select those residues with:

	from chimera.selection import setCurrent
	setCurrent(match_domain)

Now the matched domain (only) should be selected and you can go on to do whatever you would do with that.  You should probably also call mav.Quit() to close the MAV instance.
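
When mapping a residue from one aligned sequence to the other, the position has to pass through the alignment column, because the two sequences generally have gaps in different places.  The sketch below shows only that index arithmetic in plain Python, on made-up gapped strings.  The helper names are hypothetical stand-ins rather than Chimera's API, and no Chimera objects are involved.

```python
def ungapped_to_gapped(gapped_seq, index, gap_chars="-."):
    """Return the alignment column of the index-th non-gap character."""
    count = -1
    for column, char in enumerate(gapped_seq):
        if char not in gap_chars:
            count += 1
            if count == index:
                return column
    raise IndexError("ungapped index out of range")


def gapped_to_ungapped(gapped_seq, column, gap_chars="-."):
    """Return the ungapped index at an alignment column, or None for a gap."""
    if gapped_seq[column] in gap_chars:
        return None
    return sum(1 for char in gapped_seq[:column] if char not in gap_chars)


def map_positions(ref_gapped, match_gapped, ref_indices):
    """Map ungapped reference indices to ungapped match indices, skipping gaps."""
    mapped = []
    for index in ref_indices:
        column = ungapped_to_gapped(ref_gapped, index)
        match_index = gapped_to_ungapped(match_gapped, column)
        if match_index is not None:
            mapped.append(match_index)
    return mapped


# Made-up alignment: a short reference padded with gaps against a longer match.
ref_gapped = "--ACD-EF--"
match_gapped = "MKACDGE-LL"
result = map_positions(ref_gapped, match_gapped, range(5))
```

Here the last reference position is dropped because it falls opposite a gap in the match, which mirrors the KeyError case in the loop above.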

3) Use ChimeraX.  In ChimeraX, the equivalent of chimera.runCommand(…), chimerax.core.commands.run(…), does return the match values, such as the atom correspondences, so as you can imagine things would be simpler.  I don’t know whether ChimeraX does everything else you intend to do, but it certainly does the things you mentioned.  It is also faster than Chimera and can run the script in nogui mode, if either of those is relevant.  I can also provide further guidance here if you choose this route.

—Eric

	Eric Pettersen
	UCSF Computer Graphics Lab

> On Nov 13, 2019, at 7:56 PM, Zachariah Schuurs <zachariah.schuurs at hdr.qut.edu.au> wrote:
> 
> Dear Chimera Team,
>  
> I have a reference protein domain and a series of PDB files that all contain the same domain. I have been able to script an alignment of the reference to the same domain in the other files, whereby the domain in each protein of interest is isolated using the zone select method. This, however, leaves me with fragments and often misses features of the domain in the proteins of interest that lie outside the specified zone.
>  
> I am wondering if it is possible to write a Python script that selects the area on the protein of interest by selecting the residues in the sequence alignment after MatchMaker is run. For instance – protein #0 is my reference protein, which is 150 residues long. My protein of interest contains the same domain as the reference, but is 500 residues long. After running MatchMaker and getting an output sequence alignment, the 150 residues of #0 (the reference) are selected, along with the aligned residues in the protein of interest. It is possible to select the residues in the sequence alignment window (as in the picture), but I wish to be able to script it. The selected 150 residues (or so) in the protein of interest are then saved as their own trimmed file. I have been able to write a Python script for running MatchMaker etc., but I am not sure how to achieve the selection of residues in #1 based on the alignment with #0, or if it is even possible.
>  
> Thanks
>  
> Zachariah Schuurs  | PhD Candidate 
> CARP – Cancer & Ageing Research Program
> Institute of Health and Biomedical Innovation | Queensland University of Technology
> Level 6 | Translational Research Institute | 37 Kent St | Woolloongabba QLD 4102
> T: +61 7 3443 7296 | E: zachariah.schuurs at hdr.qut.edu.au | Web: www.carp.org.au
>  
> I acknowledge the Turrbal and Yugara, as the First Nations owners of the lands where QUT now stands. I pay respect to their Elders, lores, customs and creation spirits. I recognise that these lands have always been places of teaching, research and learning.
>  
> _______________________________________________
> Chimera-users mailing list: Chimera-users at cgl.ucsf.edu
> Manage subscription: http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users