[Chimera-users] MD analysis : get clusters

Francois Berenger berenger at riken.jp
Mon Nov 5 18:23:18 PST 2012


On 11/06/2012 09:50 AM, Eric Pettersen wrote:
> On Nov 4, 2012, at 7:26 AM, Benjamin SCHWARZ wrote:
>
>> Thanks Elaine,
>>
>>    According to the paper the clustering is performed with average
>> linkage, with some tricky method to determine the number of clusters.
>>
>>
>> A few suggestions for the enhancement of the clustering functionality
>> in Chimera :
>>  - Allow the user to save the cluster information, for instance by
>> saving clustered frames in separate files, or by saving each frame
>> index together with its cluster index in a comma-separated file.
>
> Hi Ben,
> Tonight's daily build will have a "Save" button on the clustering
> dialog.  The saved file will have one cluster per line with the
> representative frame number listed first, followed by the other frame
> numbers of that cluster.
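
For anyone wanting to post-process such a file, here is a minimal Python sketch. Only the layout (one cluster per line, representative frame first) comes from the description above. The separator (whitespace or commas), the function names `parse_clusters` and `frame_to_cluster`, and the sample text are assumptions for illustration.

```python
def parse_clusters(text):
    """Parse cluster text: one cluster per line, representative frame first.

    Assumption: frame numbers are separated by whitespace and/or commas.
    """
    clusters = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        frames = [int(f) for f in fields]
        # "members" includes the representative itself
        clusters.append({"representative": frames[0], "members": frames})
    return clusters


def frame_to_cluster(clusters):
    """Map each frame number to the index of the cluster it belongs to."""
    return {frame: idx
            for idx, cluster in enumerate(clusters)
            for frame in cluster["members"]}


# Hypothetical file content: two clusters with representatives 12 and 40.
sample = "12 3 4 15\n40 38 41\n"
clusters = parse_clusters(sample)
lookup = frame_to_cluster(clusters)
```

The `lookup` dictionary gives the "frame index plus cluster index" table requested earlier in the thread, and could be written out as a comma-separated file.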
>
>>  - Since the clustering scheme is a linkage, it would be good to show
>> the dendrogram and let the user play with it to determine their own
>> cutoff. The most expensive part is the computation of the pairwise
>> distance matrix; once that is done, it might be possible to let the
>> user interactively choose a cutoff distance, or a desired number of
>> clusters, and see what happens.
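
The idea of reusing the distance matrix can be illustrated with a small sketch: build the average-linkage merge history once, then re-cut it at any cutoff without touching the distances again. This is a naive pure-Python illustration (cubic or worse in the number of items), not Chimera's code. The names `average_linkage` and `cut` and the toy points are hypothetical.

```python
def average_linkage(dist):
    """Naive average-linkage clustering on a full symmetric distance matrix.

    Returns the merge history as (members_a, members_b, height) tuples,
    in the order the merges happen.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average of all pairwise distances between the two clusters
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


def cut(merges, n_items, cutoff):
    """Replay only the merges with height <= cutoff; return the clusters."""
    label = {i: i for i in range(n_items)}  # item -> cluster id
    for left, right, height in merges:
        if height > cutoff:
            break  # average-linkage merge heights never decrease
        target = label[left[0]]
        for i in right:
            label[i] = target
    groups = {}
    for i in range(n_items):
        groups.setdefault(label[i], []).append(i)
    return sorted(groups.values())


# Toy data: four points on a line; the distance is the absolute difference.
points = [0, 1, 5, 6]
dist = [[abs(p - q) for q in points] for p in points]
merges = average_linkage(dist)       # expensive step, done once
tight = cut(merges, len(points), 2.0)    # cheap, repeatable
loose = cut(merges, len(points), 10.0)   # cheap, repeatable
```

Each call to `cut` only replays the stored merge list, so an interactive cutoff slider would not need to recompute any distances.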

Hi Ben, ;)

There are even ways to accelerate the initialisation of this distance 
matrix, if the distance being used is a metric.

http://bioinformatics.oxfordjournals.org/content/27/7/939

But the applicability of this kind of method depends on the
clustering algorithm being used.
Some algorithms would allow the matrix to be initialised "lazily"; some
would not.
My intuition is that, in hierarchical clustering, complete
and single linkage can be accelerated by a quite similar technique.
Average linkage cannot.
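
To make "lazy" concrete, here is a generic sketch of how a metric helps: with distances to one pivot item known, the triangle inequality bounds every other distance, so some "is d(i, j) within the cutoff?" questions can be answered without computing d(i, j). This is only an illustration of the principle, not the method of the paper linked above and not Chimera's implementation. The class `LazyMetric`, its methods, and the toy data are hypothetical.

```python
class LazyMetric:
    """Answer threshold queries, computing a distance only when needed."""

    def __init__(self, items, metric, pivot=0):
        self.items = items
        self.metric = metric   # must be a true metric (triangle inequality)
        self.calls = 0         # number of real distance computations
        self.cache = {}
        # Distances from the pivot to every item: n computations up front.
        self.to_pivot = [self._compute(pivot, j) for j in range(len(items))]

    def _compute(self, i, j):
        key = (min(i, j), max(i, j))
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.metric(self.items[i], self.items[j])
        return self.cache[key]

    def within(self, i, j, cutoff):
        """Return True if d(i, j) <= cutoff."""
        # Triangle inequality: |d(p,i) - d(p,j)| <= d(i,j) <= d(p,i) + d(p,j)
        lower = abs(self.to_pivot[i] - self.to_pivot[j])
        upper = self.to_pivot[i] + self.to_pivot[j]
        if lower > cutoff:
            return False       # decided without computing d(i, j)
        if upper <= cutoff:
            return True        # decided without computing d(i, j)
        return self._compute(i, j) <= cutoff


# Toy data: points on a line, with absolute difference as the metric.
lazy = LazyMetric([0.0, 0.5, 10.0, 10.5], lambda a, b: abs(a - b))
calls_after_init = lazy.calls
far_apart = lazy.within(1, 2, 2.0)    # settled by the lower bound
calls_after_far = lazy.calls
close = lazy.within(2, 3, 2.0)        # bounds inconclusive: real computation
calls_after_close = lazy.calls
```

How many computations are actually saved depends on the data, the choice of pivot, and the cutoff.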

I played quite a lot with some software for this;
usually, beyond about 20k PDB files, a desktop workstation does not
have enough memory to hold the distance matrix.
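
A back-of-the-envelope check of that memory limit, as a short Python sketch. It assumes 8-byte floating-point values and no storage overhead, which real software may well exceed. The function name `matrix_bytes` is hypothetical.

```python
def matrix_bytes(n, bytes_per_value=8, condensed=True):
    """Bytes needed for the pairwise distances of n structures.

    condensed=True stores only the n*(n-1)/2 distinct pairs;
    condensed=False stores the full n*n square matrix.
    """
    pairs = n * (n - 1) // 2
    values = pairs if condensed else n * n
    return values * bytes_per_value


for n in (1000, 20000, 100000):
    half = matrix_bytes(n) / 1e9
    full = matrix_bytes(n, condensed=False) / 1e9
    print(f"{n:>7} structures: {half:8.2f} GB condensed, {full:8.2f} GB full")
```

Under these assumptions, 20,000 structures need about 1.6 GB for the condensed triangle and 3.2 GB for the full square matrix, and the requirement grows quadratically from there.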

Regards,
Francois.

> That would be nice.  I'll open an enhancement-request ticket in our bug
> database with you on the recipient list, so you'll be notified if/when
> we get to it.
>
> --Eric
>
>                          Eric Pettersen
>                          UCSF Computer Graphics Lab
> http://www.cgl.ucsf.edu
>
>>
>>    --Ben
>>
>>> Hi Ben,
>>> I can answer parts of the question. The clustering is a
>>> reimplementation of what is described in this paper:
>>> An automated approach for clustering an ensemble of NMR-derived
>>> protein structures into conformationally related subfamilies. Kelley
>>> LA, Gardner SP, Sutcliffe MJ. Protein Eng. 1996 Nov;9(11):1063-5.
>>> <http://www.ncbi.nlm.nih.gov/pubmed/8961360>
>>>
>>> This reference is given in the Ensemble Cluster docs:
>>> <http://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/ensemblecluster/ensemblecluster.html>
>>> (and the MD Movie clustering docs link to this page; from MD Movie
>>> tool, click Help button and go to the clustering section of the
>>> resulting page)
>>>
>>> Clustering is not available as a Chimera command.
>>>
>>> Chimera capabilities can be accessed with python scripts, but I will
>>> have to leave any details on that, as well as how to save results
>>> (other than saving your Chimera session), for the others to provide.
>>>
>>> I hope this helps,
>>> Elaine
>>> -----
>>> Elaine C. Meng, Ph.D.
>>> UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
>>> Department of Pharmaceutical Chemistry
>>> University of California, San Francisco
>>>
>>> On Nov 2, 2012, at 7:46 AM, Benjamin SCHWARZ wrote:
>>>
>>>> Hi list,
>>>>
>>>> I clustered a bunch of structures using the MD clustering tool, I am
>>>> very happy with the result but I can't find how to download/save the
>>>> result.
>>>>
>>>> Is there a way for instance to copy-paste the columns indicating the
>>>> number of models as well as the index of the representative for each
>>>> cluster; alternatively, is it possible to sort frames by cluster?
>>>>
>>>> If there are command shortcuts to manipulate the clustering, I'd be
>>>> happy to get them, or to get a link to it in the manual. Chimera is
>>>> really a great tool, but the learning curve appears really steep ;)
>>>>
>>>> Ultimately, I wonder about the type of clustering used here. I
>>>> suspect a complete linkage but I couldn't find the confirmation in
>>>> the manual. Any info on that ?
>>>>
>>>>  Thanks a lot for your extremely good work, and the fast answers on
>>>> this mailing list
>>>>
>>>>    --Ben
>>>>
>>>
>>
>> ---
>>  Benjamin SCHWARZ
>>  Email : schwarz at igbmc.fr <mailto:schwarz at igbmc.fr>
>>  Voice : +33 (0)3 68 85 47 30
>>  FAX : +33 (0)3 68 85 47 18
>>
>> Biocomputing group -- Integrated Structural Biology -- IGBMC
>> 1 rue Laurent Fries, BP 10142
>> F - 67404 Illkirch CEDEX
>> FRANCE
>>
>>
>> _______________________________________________
>> Chimera-users mailing list
>> Chimera-users at cgl.ucsf.edu <mailto:Chimera-users at cgl.ucsf.edu>
>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
>



