[Chimera-users] Restricting "select by conservation" to a subset of sequences in a multiple alignment
Elaine Meng
meng at cgl.ucsf.edu
Wed Feb 18 09:52:14 PST 2015
Hi Oliver,
The sum-of-pairs option does represent amino acid similarity, using the values in an amino acid similarity matrix of your choice. It's literally the sum of the matrix values over all pairs of residues in the column. All AL2CO conservation values are then normalized so that positive always means more conserved and the value is in standard deviations from the mean (a Z-score), regardless of which option is used. There is a full explanation in the Chimera docs and in the AL2CO paper.
<http://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/multalignviewer/multalignviewer.html#mavpref-analysis>
AL2CO: calculation of positional conservation in a protein sequence alignment. Pei J, Grishin NV. Bioinformatics. 2001 Aug;17(8):700-12.
<http://www.ncbi.nlm.nih.gov/pubmed/11524371>
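To make the two steps above concrete (a sum-of-pairs score per column, then conversion to Z-scores across columns), here is a minimal sketch. It is not Chimera's or AL2CO's code: the toy similarity scores stand in for a real matrix such as BLOSUM62, gaps are simply skipped, the population standard deviation is an assumed choice, and the real AL2CO calculation has further options (such as sequence weighting) that this sketch omits.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy similarity scores for illustration only (NOT a real matrix).
# Identical residues score 4; unlisted pairs of different residues score -1.
TOY_SIMILARITY = {
    frozenset("IL"): 2, frozenset("IV"): 3, frozenset("LV"): 1,
    frozenset("DE"): 2, frozenset("KR"): 2,
}

def pair_score(a: str, b: str) -> int:
    """Look up the toy similarity value for one residue pair."""
    if a == b:
        return 4
    return TOY_SIMILARITY.get(frozenset((a, b)), -1)

def sum_of_pairs(column: str) -> int:
    """Sum the similarity values over every unordered pair of non-gap residues."""
    residues = [r for r in column if r != "-"]
    return sum(pair_score(a, b) for a, b in combinations(residues, 2))

def z_scores(values: list[int]) -> list[float]:
    """Express each value in standard deviations from the mean of all columns."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def column_conservation(alignment: list[str]) -> tuple[list[int], list[float]]:
    """Return raw sum-of-pairs scores and their Z-scores, one per column."""
    columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
    raw = [sum_of_pairs(col) for col in columns]
    return raw, z_scores(raw)

raw, z = column_conservation(["ILDK", "VLEK", "LLDR"])
print(raw)  # [6, 12, 8, 8]
```

The fully identical column (all L) gets the highest raw score and the only positive Z-score, which matches the convention that positive means more conserved. Because the values are Z-scores rather than fractions, they are not confined to the range 0 to 1.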
Best,
Elaine
----------
Elaine C. Meng, Ph.D.
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco
On Feb 18, 2015, at 9:43 AM, Oliver Clarke <olibclarke at gmail.com> wrote:
> Hi Elaine,
>
> Many thanks for the detailed explanation - it certainly looks like there are some useful features here that I didn’t know about, and I will definitely take advantage of them in future.
>
> With regard to the conservation similarity features, I will read further, but I don’t quite know how to interpret the mavConservation attribute when conservation is calculated using the AL2CO method. It doesn't seem to be a simple "percentage of sequences that are identical at this position" like it is otherwise; for example, it does not range between 0 and 1.
>
> What I would like, in addition to the present options, is to have the option to take into account similarity, not just identity, when calculating mavConservation (or perhaps to have a separate attribute, mavSimilar), so that rather than mavConservation being the percentage of sequences that are identical at a given position, it would instead be the percentage of residues that are similar at a given position.
>
> Cheers,
> Oliver.
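The difference between the identity-style value and the similarity-based value Oliver proposes can be sketched as follows. This is a hypothetical illustration only: mavSimilar was just a suggestion and is not an existing Chimera attribute, the similarity groups below are invented for the example rather than taken from a real substitution matrix, and counting gaps against conservation is an assumed convention.

```python
from collections import Counter

# Hypothetical similarity groups, chosen for illustration only.
SIMILAR_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("ST"), set("FYW")]

def are_similar(a: str, b: str) -> bool:
    """Two residues are 'similar' if identical or in the same toy group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

def identity_fraction(column: str) -> float:
    """Fraction of sequences carrying the most common residue (gaps count against)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    _, count = Counter(residues).most_common(1)[0]
    return count / len(column)

def similarity_fraction(column: str) -> float:
    """Fraction of sequences whose residue is similar to a reference residue,
    using whichever residue in the column gives the largest count."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    best = max(sum(are_similar(ref, r) for r in residues) for ref in set(residues))
    return best / len(column)

for col in ["IVLI", "DE-K"]:
    print(col, identity_fraction(col), similarity_fraction(col))
```

Both values stay between 0 and 1, like the identity-based mavConservation. A column of mixed aliphatic residues such as "IVLI" scores only 0.5 on identity but 1.0 on similarity under these toy groups.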