[chimera-dev] Help with "looping through PDB IDs" script

Navya Shilpa Josyula njosyu2 at uic.edu
Wed Apr 16 21:43:15 PDT 2014


Hi Eric and Elaine,

Thank you so much for your valuable suggestions and time. I am trying out
the scripts as per your advice. I have a slight change here. Instead of
looking at the PDB IDs, I am looking at the ".pdb1" files. These are the
biological units of the PDB files.

So I have a folder with about 400 ".pdb1" files. For each structure, I am
trying to write atomic areaSAS and atomic Bfactor values. So far I have
updated my script as attached in this mail based on your inputs. I am able
to get these values for the ".pdb1" file it is reading in the end, but not
for all the proteins. Also, is there a way it writes out the PDB ID in the
1st column along with the corresponding values in the next columns?

I am aiming to get all these values into a single file, which leads me to
another issue of file size. So is there a way python can be connected to
SQL Server and to write these values for each ".pdb1" structure directly
into the SQL database?

Kindly advise on this. I really appreciate your help with regards to my
issues.

Thanks in advance,
Navya









On Wed, Apr 16, 2014 at 7:41 PM, Eric Pettersen <pett at cgl.ucsf.edu> wrote:

> Hi Navya,
> This later mail of yours provides some additional details that help.  So
> for one thing you might also want to look at the Programmer's Examples (
> Examples<http://www.cgl.ucsf.edu/chimera/docs/ProgrammersGuide/Examples/index.html>),
> particularly the first one ("Chimera's Object Model").  At any rate, the
> obvious problem with the script you have so far is that you haven't defined
> the 'residues' variable.  Let's say you are trying to work with residues
> 50.A, 55.A, and 70.A.  Here's a little code snippet that will get those
> residues into the 'residues' variable once you've opened your structure:
>
> rc("sel :50.a:55.a:70.a")
> from chimera.selection import currentResidues
> residues = currentResidues()
>
> Now, I don't know where you're storing your lists of residues.  Are they
> in the same file as the IDs?  A separate file?  I don't know if you need
> more help with that or not.
>
>  So once you have the residues, then printing the atomic surface areas
> and bfactors is:
>
> for r in residues:
> for a in r.atoms:
>  print>>outf, a, a.areaSAS, a.bfactor
>
> Getting the CASTp information is another whole can of worms however, since
> Chimera stores that info in the CASTp dialog rather than with the atoms and
> residues.  Nonetheless, once you've got the rest of your script working you
> could try to do that part.  In would involve copying the processCastpID
> function from CASTp/__init__.py (in your Chimera installation's 'share'
> folder) and deleting the last two lines and instead return the cavity list.
>  Each CastpCavity instance has 'mouthInfo' and 'pocketInfo' attributes
> which are dictionaries that have an 'atoms' key.  The value for the 'atoms'
> key is a chimera.selection.ItemizedSelection instance.  You can use the
> contains() method of those ItemizedSelections to see if a particular atom
> is in the ItemizedSelection.  The pocketInfo dictionary also has 'SA
> volume' and 'MS volume' keys you can use to get the desired volume info.
>
> --Eric
>
>                         Eric Pettersen
>                         UCSF Computer Graphics Lab
>                         http://www.cgl.ucsf.edu
>
> On Apr 16, 2014, at 12:44 PM, Navya Shilpa Josyula <njosyu2 at uic.edu>
> wrote:
>
> Hi,
>
> I need a help in chimera scripting. I have text file with list of PDB IDs
> and corresponding residue list. I am trying to write a chimera command
> which will scan this file, open each PDB ID and select the corresponding
> residues listed in the file. After selecting, it should write out the
> values of atomic areaSAS, atomic Bfactor and which atom belong to CASTp
> identified pockets (with pocket volume). I know I am asking a lot here but
> I am new to chimera scripting and I searched the users mailing list and
> found the "looping" script,
>
> http://www.cgl.ucsf.edu/pipermail/chimera-users/2012-February/007281.html
>
> but it is not working for me. Attached is the script I have till now.
>
> Please help me on this.
>
> Thank you in advance,
> Navya
> <test.py>
>
> import os
> from chimera import runCommand as rc # use 'rc' as shorthand for runCommand
> from chimera import replyobj # for emitting status messages
>
> # change to folder with data files
> os.chdir("C:/Users/Navya Shilpa/Desktop/thesis/All_Proteins/nonred")
>
> # open file of PDB IDs
> f = open("C:/Users/Navya
> Shilpa/Desktop/thesis/All_Proteins/nonred/nr-list.txt", 'r')
>
> # loop through the IDs, opening, processing, and closing each in turn
> for line in f:
> pdbID = line.strip()
> replyobj.status("Processing " + pdbID) # show what PDB we're working on
>  rc("open " + pdbID)
> rc("surf") # surface receptor
> outf = open(pdbID, "w")
>  for r in residues:
> print>>outf, r, r.areaSAS
> outf.close()
>  rc("close all")
> # uncommenting the line below will cause Chimera to exit when the script
> is done
> #rc("stop now")
> # note that indentation is significant in Python; the fact that
> # the above command is exdented means that it is executed after
> # the loop completes, whereas the indented commands that
> # preceded it are executed as part of the loop.
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://plato.cgl.ucsf.edu/pipermail/chimera-dev/attachments/20140416/86f9ccaa/attachment.html>
-------------- next part --------------
import os
from chimera import openModels, Molecule
from chimera import runCommand as rc # use 'rc' as shorthand for runCommand
from chimera import replyobj # for emitting status messages
from chimera import selection

# change to folder with data files
os.chdir("C:/Users/Navya Shilpa/Desktop/thesis/All_Proteins/nr-bilogicalunits/working")

# gather the names of .pdb files in the folder
file_names = [fn for fn in os.listdir(".") if fn.endswith(".pdb1")]

# loop through the files, opening, processing, and closing each in turn
for fn in file_names:
        replyobj.status("Processing " + fn) 
        rc("open " + fn)
        rc("delete solvent")
        rc("select")
        rc("surf") # surface receptor
        outfn = open("testing.csv", "w")        
        residues = selection.currentResidues()
        for m in openModels.list(modelTypes=[Molecule]):
            for r in m.residues:
                for a in r.atoms:
                        try:
                                sas=a.areaSAS
                                bf=a.bfactor
                        except AttributeError:
                                continue        
                        print>>outfn, a, sas, bf        
        outfn.close()           
        rc("close all")


More information about the Chimera-dev mailing list