[Chimera-users] Analysis of the multiple alignments + structures

James Starlight jmsstarlight at gmail.com
Tue Sep 2 02:12:34 PDT 2014


Hi Elaine,


Thanks a lot for such detailed suggestions!

In my case the situation might be much simpler, because I'd like to draw
conclusions about structure-function relationships in a set of olfactory
receptor proteins which
-) are all rhodopsin-like (class A) GPCRs sharing 40-70% sequence identity
-) have the same functionally relevant motifs
-) interact with only one type of transducer, Golf
-) !! but !! can interact with a wide range of odorants (ligands).

Given that the TM bundle of these proteins is highly conserved compared to
their extracellular part (especially the loops), I should focus on the
extracellular, less conserved part, which is mainly involved in ligand
binding (because, first of all, I'd like to predict mutations that affect
ligand binding).

BTW, given the very high level of sequence variability in GPCRs (less than
40% sequence identity within the rhodopsin-like subfamily) but the high
conservation of the structure of all these proteins (RMSD < 4 Å), can one
conclude that these proteins are highly tolerant of mutations in terms of
their effect on function (the phenomenon known as buffering in
evolutionary biology)?
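
To make the comparison between the conserved TM bundle and the more
variable extracellular part concrete, a per-column score can be computed
directly from the alignment. The sketch below is only an illustration: it
uses the plain fraction of the most common residue per column, which is
not AL2CO or any other scoring used by Chimera, and the function names and
the column indices in the usage note are hypothetical.

```python
from collections import Counter


def column_conservation(aligned_seqs, gap_chars="-."):
    """For each alignment column, return the fraction of non-gap residues
    that match the most common residue in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in zip(*aligned_seqs):
        residues = [c.upper() for c in col if c not in gap_chars]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def mean_conservation(scores, columns):
    """Average the per-column scores over a chosen set of column indices."""
    picked = [scores[i] for i in columns]
    return sum(picked) / len(picked)
```

Given lists of alignment columns for the TM helices and for the
extracellular loops (which would have to come from the structure or an
annotation), comparing mean_conservation(scores, tm_columns) with
mean_conservation(scores, loop_columns) gives a rough number for how much
more variable the extracellular part is.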

A technical question: is it possible in Chimera (using Multalign Viewer,
for instance) to reduce the number of sequences used in the conservation
diagrams and shown in the multiple alignment (for instance, to hide
sequences from the full set that will not be used)? Previously I had to
make new alignment.fasta files with different numbers of sequences and
load them into Chimera each time I needed to compare a different number
of sequences. What I'd like instead is to do this within one open Chimera
session, loading the maximal set of sequences and hiding the unused ones.
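
For reference, the workaround described above (writing a reduced alignment
file outside Chimera) could be scripted roughly as follows. This is a
minimal sketch in plain Python and does not use any Chimera API; the
function names, and the file and sequence names in the usage note, are
hypothetical. It assumes an aligned FASTA file in which all sequences have
the same length.

```python
def read_fasta(path):
    """Parse a FASTA file into an ordered list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subset_alignment(records, keep_names, drop_gap_columns=True):
    """Keep only records whose name (first word of the header) is listed
    in keep_names; optionally drop columns left with gaps only."""
    keep = set(keep_names)
    kept = [(h, s) for h, s in records if (h.split() or [""])[0] in keep]
    if drop_gap_columns and kept:
        cols = [i for i in range(len(kept[0][1]))
                if any(s[i] not in "-." for _, s in kept)]
        kept = [(h, "".join(s[i] for i in cols)) for h, s in kept]
    return kept


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

Typical use would be something like
write_fasta(subset_alignment(read_fasta("alignment.fasta"), ["OR1", "OR2"]), "subset.fasta").
Note that dropping gap-only columns changes the column numbering relative
to the full alignment, so pass drop_gap_columns=False if the positions
need to stay comparable between subsets.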



Thanks a lot for your help,

James


2014-09-02 0:07 GMT+04:00 Elaine Meng <meng at cgl.ucsf.edu>:

> Hi James,
> That’s a really broad question… usually that’s why people are interested
> in showing conservation on some structure: to highlight the important
> residues, and in combination with structural analysis and/or mutagenesis,
> to suggest their roles in structure and function.  I don’t have specific
> papers in mind, but there should be no shortage if you try a few keyword
> searches.
>
> There is a bit more discussion in my page on “sources of sequence
> alignments” as to choosing the set of sequences, or source (web database or
> web server) of a set of sequences for calculating conservation.
> <http://www.cgl.ucsf.edu/home/meng/sources.html>
> (That page is also linked to the “mapping sequence conservation” tutorial.)
>
> However, your question is really about making judgement calls as they
> pertain to your specific research project, which may be beyond the scope of
> this Chimera list, and further, I don’t claim to be an authority on the
> subject.  Logic would dictate that if you are looking for residues
> important in function X, then you would want to include only sequences of
> proteins that perform function X, but if you are looking for patterns of
> conservation among related functions X,Y,Z in some set of homologous
> proteins, you would use a broader set.  Maybe it is known what percent
> IDs give the proper boundaries in your situation, but more often it is not
> known.  By boundaries I mean how much variability can be in your set
> without including some proteins that don’t have the function of interest.
>
> The percent ID filtering you mention is a little bit different - that
> filtering would remove sequences from the set that are similar to another
> within the specified cutoff, but could be considered separately from the
> issue of how broadly variable the set is in the first place (for example,
> only class A GPCRs, only amine neurotransmitter-binding GPCRs, or only
> Gs-stimulating GPCRs).
> In other words, a data set could be big for at least two different
> reasons: (1) it’s broad, (2) it’s redundant. For redundancy-filtering, I’d
> personally only filter down to get an alignment that is not too big for
> Chimera, and err on the side of including lots of sequences as long as the
> overall range of the set is appropriate for the function of interest.
> There are sequence-weighting options in the AL2CO conservation scoring in
> Chimera that may help to mitigate any under- and over-representation of
> subsets of the sequences.
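
The percent-ID redundancy filtering discussed above can be sketched as a
simple greedy pass over an existing alignment. This is only an
illustration under stated assumptions, not what any particular server or
Chimera does: identity is counted over columns where both sequences have a
residue (other definitions use a different denominator), the 90% default
cutoff is an arbitrary placeholder, and the function names are
hypothetical.

```python
def percent_identity(a, b, gap_chars="-."):
    """Percent identity between two aligned sequences, counted over the
    columns where both sequences have a residue (no gap in either)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_redundant(records, max_identity=90.0):
    """Greedy filter over (name, aligned_sequence) pairs: keep a sequence
    only if it is at most max_identity percent identical to every
    sequence already kept. The result depends on the input order."""
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, kept_seq) <= max_identity
               for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

Such a filter only removes near-duplicates. It does nothing about how
broad the set is in the first place, which still has to be decided from
what is known about the function of interest.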
>
> Disclaimer:  this is just my opinion as someone who has dabbled in the
> area, and others with more experience may have divergent views or better
> ideas of how to attack the problem!
>
> I hope this helps,
> Elaine
> -----
> Elaine C. Meng, Ph.D.
> UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
> Department of Pharmaceutical Chemistry
> University of California, San Francisco
>
> On Sep 1, 2014, at 2:55 AM, James Starlight <jmsstarlight at gmail.com>
> wrote:
>
> > Elaine,
> >
> > I have not fully finished your tutorial, but I have one question: have
> > you seen any interesting papers covering my problem, namely predicting
> > the functional properties of particular amino acids (motifs) based on
> > analysis of the 3D structure of the protein of interest together with
> > analysis of the sequences of closely related homologues?
> >
> > In particular, I'm interested in how many sequences I should include in
> > the MSA and what threshold for sequence identity (against my target
> > protein) should be chosen. E.g., in the case of G-protein coupled
> > receptors (which have low sequence similarity but high structural
> > conservation), I obtained two different pictures of the conserved
> > amino acid motifs when I used i) only several templates with low
> > sequence identity (40%) vs. ii) a lot of sequences with higher identity
> > (up to 60%). In the latter case I obtained much higher conservation in
> > the motifs seen based on the analysis of SS (which is trivial!), whereas
> > in case i) there were only several highly conserved motifs. Does this
> > mean that analysis of BIG datasets with higher sequence identity
> > produces greater uncertainty in the final results about which conserved
> > elements are *really* functionally important?
> >
> > James
> >
>
>