[Chimera-users] automating ensemble clustering in chimera

Wed Oct 15 14:17:08 PDT 2008

Hello Conrad,
Thank you so much for your email and your code. It worked well for 20
structures, but when I tested it on 400 structures the machine crashed
and rebooted automatically.
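
For a sense of scale: the clustering function quoted below performs up
to one superposition per pair of models, so the work grows quadratically
with the size of the ensemble. The short sketch below only does that
arithmetic. `pair_count` is a name made up for this illustration, and
the numbers do not by themselves explain the crash.

```python
def pair_count(n):
    """Number of unordered pairs among n models: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Compare the two ensemble sizes mentioned in this message.
for n in (20, 400):
    print("%d models -> %d pairwise comparisons" % (n, pair_count(n)))
```
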
I was actually looking for a way to import the Ensemble Cluster
definition into a Python script and then run the script from the Chimera
GUI. That approach has been working well for MatchMaker:

----------------------------------------------------------------------
#!/usr/local/MGLTools/i86Linux2/bin/python

import os
import glob

import chimera
from chimera import runCommand
import Midas
from MatchMaker import match, CP_BEST, defaultMatrix, \
                       defaultAlignAlgorithm, defaultGapOpen, \
                       defaultGapExtend, defaultIterateCutoff

temp1 = 'template1.pdb'
temp2 = 'template2.pdb'
curdir = os.getcwd()

pdblist = glob.glob('*.T*pdb')

for x in pdblist:
   mol1 = chimera.openModels.open(temp1, type='PDB')[0]
   mol2 = chimera.openModels.open(temp2, type='PDB')[0]
   mol3 = chimera.openModels.open(x, type='PDB')[0]
   mol4 = chimera.openModels.open(x, type='PDB')[0]

   match(CP_BEST, (mol1, [mol3]), defaultMatrix,
           defaultAlignAlgorithm, defaultGapOpen, defaultGapExtend,
           iterate=defaultIterateCutoff, showAlignment=True)[0]

   match(CP_BEST, (mol2, [mol4]), defaultMatrix,
           defaultAlignAlgorithm, defaultGapOpen, defaultGapExtend,
           iterate=defaultIterateCutoff, showAlignment=True)[0]

   Midas.close(0)
   Midas.close(1)
----------------------------------------------------------------------

 Thanks,
 Katryna

> On Wed, Oct 15, 2008 at 3:00 PM, Conrad Huang <conrad at cgl.ucsf.edu> wrote:
>> Unfortunately, the nmr clustering code is closely tied to its graphical user
>> interface, so there are no simple calls into Chimera for scripting.  However,
>> the function that does all the work is relatively simple, so I've abstracted
>> it into its own function which takes a list of models (all with exactly the
>> same set of atoms) and returns a list of pairs. The first element of the
>> pair is a cluster representative model; the second element is a list of all
>> models in the cluster, including the representative.  I've included the code
>> as an attachment.  There is a small example at the end of the code on how to
>> use the "clusterModels" function.  Please let me know if this is what you were
>> looking for.  Thanks.
>>
>> Conrad
>>
>> Katryna Cisek wrote:
>>>
>>> Dear Chimera Users,
>>>
>>> How can I automate/import the Ensemble Cluster (Structure Comparison,
>>> MD/Ensemble Analysis; cluster members of a conformational ensemble,
>>> determine cluster representatives) module into a python script?
>>>
>>> Thanks,
>>> Katryna
>>> _______________________________________________
>>> Chimera-users mailing list
>>> Chimera-users at cgl.ucsf.edu
>>> http://www.cgl.ucsf.edu/mailman/listinfo/chimera-users
>>
>>
>> def clusterModels(modelList):
>>        from EnsembleMatch.distmat import DistanceMatrix
>>        from chimera import match
>>        # First we need to construct a full distance matrix, including
>>        # possibly identical elements
>>        fulldm = DistanceMatrix(len(modelList))
>>        sameAs = {}
>>        atoms = [ match._coordArray(m.sortedAtoms()) for m in modelList ]
>>        for i in range(len(modelList)):
>>                mi = modelList[i]
>>                if mi in sameAs:
>>                        continue
>>                ai = atoms[i]
>>                for j in range(i + 1, len(modelList)):
>>                        aj = atoms[j]
>>                        m = match.Match(ai, aj)
>>                        if m.rms <= 0:
>>                                # Detect identical elements and track them
>>                                mj = modelList[j]
>>                                sameAs[mj] = mi
>>                        fulldm.set(i, j, m.rms)
>>        # Now we reduce the distance matrix by removing identical elements
>>        if not sameAs:
>>                # No identical elements, just use the whole thing
>>                dm = fulldm
>>                models = modelList
>>        else:
>>                models = []
>>                indexMap = []
>>                for i, mi in enumerate(modelList):
>>                        if mi in sameAs:
>>                                # Skip identical element
>>                                continue
>>                        models.append(mi)
>>                        indexMap.append(i)
>>                dm = DistanceMatrix(len(models))
>>                for i in range(len(models)):
>>                        im = indexMap[i]
>>                        for j in range(i + 1, len(models)):
>>                                jm = indexMap[j]
>>                                dm.set(i, j, fulldm.get(im, jm))
>>        # Next we run the clustering algorithm
>>        from EnsembleMatch.nmrclust import NMRClust
>>        nmrc = NMRClust(dm)
>>        # Next reformat the results into something useful.
>>        # "clusterInfo" is a list whose elements are 2-tuples of
>>        # (cluster representative model, list of models in cluster)
>>        clusterInfo = []
>>        for cid, c in enumerate(nmrc.clusters):
>>                members = c.members()
>>                mList = []
>>                for member in members:
>>                        m = models[member]
>>                        m.clusterId = cid
>>                        m.clusterRep = 0
>>                        mList.append(m)
>>                rep = nmrc.representative(c)
>>                m = models[rep]
>>                m.clusterRep = len(members)
>>                clusterInfo.append((m, mList))
>>        for mj, mi in sameAs.iteritems():
>>                mj.clusterId = mi.clusterId
>>                mj.clusterRep = 0
>>                rep, members = clusterInfo[mi.clusterId]
>>                rep.clusterRep += 1
>>                members.append(mj)
>>        return clusterInfo
>>
>> # Here's an example of how you can use the function above.
>> # Run this script with:
>> #       chimera --nogui --silent cluster.py
>> # where cluster.py is the name of this file
>> if __name__ == "chimeraOpenSandbox":
>>        import chimera
>>        # Use following line to fetch from RCSB
>>        models = chimera.openModels.open("1n9u", type="PDBID")
>>        # Use following line for your own data file
>>        # models = chimera.openModels.open("1N9U.pdb", type="PDB")
>>        ci = clusterModels(models)
>>        print len(models), "models ->", len(ci), "clusters"
>>        for rep, members in ci:
>>                print rep.oslIdent(), "(%d members)" % len(members)
>>                for m in members:
>>                        print '\t', m.oslIdent()
>>
>>
>
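
The return value described in the quoted reply, a list of
(representative, members) pairs, can be post-processed with plain
Python. The sketch below is only an illustration under stated
assumptions: `summarize_clusters` is a hypothetical helper, and strings
stand in for the Chimera model objects that clusterModels would return.

```python
def summarize_clusters(cluster_info):
    """Return one (representative, size) tuple per cluster, largest first."""
    summary = [(rep, len(members)) for rep, members in cluster_info]
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary


# Stand-in data: strings play the role of Chimera model objects.
cluster_info = [
    ("#0.1", ["#0.1", "#0.4"]),
    ("#0.2", ["#0.2", "#0.3", "#0.5"]),
]

for rep, size in summarize_clusters(cluster_info):
    print("%s represents %d models" % (rep, size))
```

Sorting by cluster size puts the most populated cluster first, which is
often the one of interest when picking a representative structure.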