[Chimera-users] Text statistics from Matchmaker

Mon Aug 4 15:38:59 PDT 2008

Hi Jim,
Thanks for your kind words - we're glad MatchMaker has been useful!

Now to your questions: I am not sure exactly what you want, so I will
start with a couple of issues to consider.

One issue:
When you say RMSD, do you mean the RMSD of the overall pairwise fit  
consisting of many CA-CA pairs, or do you want a measure of the  
structural variability within each individual column of the  
alignment?  The former information is sent to the Reply Log, while the  
latter is available as the RMSD header, which can be shown as a  
histogram above the sequences in the alignment.   In the most recent  
versions of Chimera (newer than the July 9 production release), the  
RMSD header is automatically shown in alignments from MatchMaker; in  
the production release you can just turn it on using the Headers menu  
in the sequence alignment window.  In the pairwise case, this header  
simply shows the CA-CA distance between the two residues associated  
with a column in the alignment.
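
To make the distinction between the two measures concrete, here is a
minimal plain-Python sketch (not Chimera code). It assumes you already
have CA coordinates for the paired residues after superposition; the
function names and the coordinates are made up for illustration.

```python
import math

def ca_distances(ref_coords, match_coords):
    """Per-column CA-CA distances for already-superimposed residue pairs."""
    return [math.dist(a, b) for a, b in zip(ref_coords, match_coords)]

def overall_rmsd(distances):
    """Single RMSD value over all of the supplied pair distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Hypothetical CA coordinates (angstroms) for three aligned columns
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
match = [(0.0, 0.0, 1.0), (3.8, 2.0, 0.0), (7.6, 0.0, 2.0)]

per_column = ca_distances(ref, match)   # one value per alignment column
fit_rmsd = overall_rmsd(per_column)     # one value for the whole fit
```

The per-column list corresponds to what the pairwise RMSD header shows,
while the single value corresponds to the kind of number reported for
the overall fit. Note that when fit iteration is on, the reported fit
RMSD covers only the pairs that remain after pruning, not every column.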

You can save header values to a file.  However, simply using the
values within Chimera can be more powerful.  Because header values are
assigned as attributes of the associated residues, you can select
the residues with mavRMSD attribute values above or below some number
(see Select... By Attribute Value in the main Chimera menu) and apply
various actions to them, including writing them out to a list.  You
can use "Structure... Match" in the sequence alignment menu to
superimpose the structures with or without fit iteration (matching
without iteration uses all columns, regardless of any cutoff).  You can
save the alignment itself (File... Save As in the alignment window
menu) as well as a text file listing which structure residue is
associated with each position in the alignment (File... Save
Association Info).
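
If you would rather post-process values outside Chimera, the same
"select above a cutoff and write a list" idea can be sketched in plain
Python. This is only an illustration: the residue labels, the values,
and the dictionary layout are hypothetical, and it does not read any
particular Chimera file format.

```python
def residues_above(values, cutoff):
    """Return (residue, value) pairs whose value exceeds the cutoff.

    Residues with no value (None) are skipped; input order is kept.
    """
    return [(res, v) for res, v in values.items()
            if v is not None and v > cutoff]

def format_lines(pairs):
    """One tab-separated line per residue: label, then value."""
    return ["%s\t%.2f" % (res, v) for res, v in pairs]

# Hypothetical per-residue values; None marks a residue without a value
rmsd_by_residue = {
    "ALA 10.A": 0.4,
    "GLY 11.A": 2.7,
    "SER 12.A": None,
    "LEU 13.A": 1.9,
}

variable = residues_above(rmsd_by_residue, 1.5)
report = "\n".join(format_lines(variable))
```

The resulting text can be written to a file with the usual
open(...).write(report) if a saved list is what you need.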

A second issue:
Depending on how easy the sequences are to align (their level of
percent identity), you may want to go through a cycle of using
Match->Align to generate a second sequence alignment from the
MatchMaker superposition instead of using the initial sequence
alignment from MatchMaker.  To get that second alignment after using
MatchMaker, you would start Match->Align (under Tools... Structure
Comparison), set a distance cutoff, and press OK.

Here is a little more detail on that issue:
"The primary purpose of MatchMaker is to superimpose related  
structures; the sequence alignments can be considered a by-product.  
Successful superposition only requires a partially correct sequence  
alignment, as incorrect portions tend to be omitted during fit  
iteration. If the sequences are easily alignable (significantly  
similar), the MatchMaker sequence alignment is likely to be correct  
from beginning to end, but when they are more distantly related, parts  
of the sequence alignment may be incorrect even when the resulting  
iterated match looks very good. If the goal is to obtain not just a  
structural superposition but also an alignment of dissimilar  
sequences, Match -> Align is recommended for generating a structurally  
verified sequence alignment after the structures have been  
superimposed. Furthermore, matching the structures using this  
structurally verified sequence alignment will provide better RMSD and  
number-of-pairs values for describing structural similarity when the  
sequences are divergent (because more columns are aligned correctly)  
while having little effect on the superposition."

It sounds like your sequences are so similar that this issue may not
come into play.  However, a further advantage of adding a Match->Align
step is that it can make a multiple sequence alignment (beyond
pairwise) given multiple superimposed structures.  Unfortunately, 100
structures at a time would cause a combinatorial explosion.  I have
used it to make a sequence alignment of <10 superimposed structures.

If these explanations and tips do not give what you are looking for,
or interactive use is impractical given the number of runs you wish to
perform, the process should be scriptable in Python, but we would need
to know the series of operations you had in mind and precisely which
observables you wanted to write.

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.                          meng at cgl.ucsf.edu
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
                      http://www.cgl.ucsf.edu/home/meng/index.html

On Aug 4, 2008, at 12:32 PM, nettles wrote:

> I have found MatchMaker in Chimera an incredibly useful alignment  
> tool. Thank you to the UCSF team!
>
> Now, I want to do a multiple alignment of many structures (~100)  
> off a single reference structure, and output a text record of the  
> residue pairings involved in the individual alignments along with  
> their RMSD.
>
> In this particular case, I'll be looking at only small sequence  
> variations of the same protein (with different drugs and cofactors).  
> Accordingly, I would also like to output the RMSD of residue pairs  
> beyond the alignment cut for relatively easy identification of  
> regions having maximal variations.
>
> Advice is appreciated,
> With regard,
> Jim
> ________________________________________________________
>
> James Nettles, Ph. D.  Assistant Professor
> Department of Pediatrics, Emory University School of Medicine
> ________________________________________________________
>