[chimera-dev] Help with "looping through PDB IDs" script

Navya Shilpa Josyula njosyu2 at uic.edu
Thu Apr 17 14:11:13 PDT 2014


Hi,

Sorry to bother you again. With your help, I was able to get areaSAS and
Bfactor values for the atoms in each .pdb1 file showing corresponding PDB
IDs in the 1st column. I really thank you for this!

Now I am trying to write CASTp information for each of my proteins into a
separate file. As you suggested in earlier email, the processCastpID
function is in the gui.py file but not in __init__.py file. Hope I am not
missing anything here. As per my understanding, this function is fetching
the 4 castp files of which I would require only ".poc" and ".pocInfo"
files. From these two files I want to write the data of only atoms list,
pocID and MS_Volume data into a single file for all 400 proteins in my
dataset. Is there a link or any script available for such requirement?

Again, as mentioned in my last email, since my output files will be huge in
size, will I be able to write my files directly to a database table in SQL
server?

Kindly help me with this.

Thanks and regards,
Navya


On Wed, Apr 16, 2014 at 11:43 PM, Navya Shilpa Josyula <njosyu2 at uic.edu>wrote:

> Hi Eric and Elaine,
>
> Thank you so much for your valuable suggestions and time. I am trying out
> the scripts as per your advice. I have a slight change here. Instead of
> looking at the PDB IDs, I am looking at the ".pdb1" files. These are the
> biological units of the PDB files.
>
> So I have a folder with about 400 ".pdb1" files. For each structure, I am
> trying to write atomic areaSAS and atomic Bfactor values. So far I have
> updated my script as attached in this mail based on your inputs. I am able
> to get these values for the ".pdb1" file it is reading in the end, but not
> for all the proteins. Also, is there a way it writes out the PDB ID in the
> 1st column along with the corresponding values in the next columns?
>
> I am aiming to get all these values into a single file, which leads me to
> another issue of file size. So is there a way python can be connected to
> SQL Server and to write these values for each ".pdb1" structure directly
> into the SQL database?
>
> Kindly advise on this. I really appreciate your help with regards to my
> issues.
>
> Thanks in advance,
> Navya
>
>
>
>
>
>
>
>
>
> On Wed, Apr 16, 2014 at 7:41 PM, Eric Pettersen <pett at cgl.ucsf.edu> wrote:
>
>> Hi Navya,
>> This later mail of yours provides some additional details that help.  So
>> for one thing you might also want to look at the Programmer's Examples (
>> Examples<http://www.cgl.ucsf.edu/chimera/docs/ProgrammersGuide/Examples/index.html>),
>> particularly the first one ("Chimera's Object Model").  At any rate, the
>> obvious problem with the script you have so far is that you haven't defined
>> the 'residues' variable.  Let's say you are trying to work with residues
>> 50.A, 55.A, and 70.A.  Here's a little code snippet that will get those
>> residues into the 'residues' variable once you've opened your structure:
>>
>> rc("sel :50.a:55.a:70.a")
>> from chimera.selection import currentResidues
>> residues = currentResidues()
>>
>> Now, I don't know where you're storing your lists of residues.  Are they
>> in the same file as the IDs?  A separate file?  I don't know if you need
>> more help with that or not.
>>
>>  So once you have the residues, then printing the atomic surface areas
>> and bfactors is:
>>
>> for r in residues:
>> for a in r.atoms:
>>  print>>outf, a, a.areaSAS, a.bfactor
>>
>> Getting the CASTp information is another whole can of worms however,
>> since Chimera stores that info in the CASTp dialog rather than with the
>> atoms and residues.  Nonetheless, once you've got the rest of your script
>> working you could try to do that part.  In would involve copying the
>> processCastpID function from CASTp/__init__.py (in your Chimera
>> installation's 'share' folder) and deleting the last two lines and instead
>> return the cavity list.  Each CastpCavity instance has 'mouthInfo' and
>> 'pocketInfo' attributes which are dictionaries that have an 'atoms' key.
>>  The value for the 'atoms' key is a chimera.selection.ItemizedSelection
>> instance.  You can use the contains() method of those ItemizedSelections to
>> see if a particular atom is in the ItemizedSelection.  The pocketInfo
>> dictionary also has 'SA volume' and 'MS volume' keys you can use to get the
>> desired volume info.
>>
>> --Eric
>>
>>                         Eric Pettersen
>>                         UCSF Computer Graphics Lab
>>                         http://www.cgl.ucsf.edu
>>
>> On Apr 16, 2014, at 12:44 PM, Navya Shilpa Josyula <njosyu2 at uic.edu>
>> wrote:
>>
>> Hi,
>>
>> I need a help in chimera scripting. I have text file with list of PDB IDs
>> and corresponding residue list. I am trying to write a chimera command
>> which will scan this file, open each PDB ID and select the corresponding
>> residues listed in the file. After selecting, it should write out the
>> values of atomic areaSAS, atomic Bfactor and which atom belong to CASTp
>> identified pockets (with pocket volume). I know I am asking a lot here but
>> I am new to chimera scripting and I searched the users mailing list and
>> found the "looping" script,
>>
>> http://www.cgl.ucsf.edu/pipermail/chimera-users/2012-February/007281.html
>>
>> but it is not working for me. Attached is the script I have till now.
>>
>> Please help me on this.
>>
>> Thank you in advance,
>> Navya
>> <test.py>
>>
>> import os
>> from chimera import runCommand as rc # use 'rc' as shorthand for
>> runCommand
>> from chimera import replyobj # for emitting status messages
>>
>> # change to folder with data files
>> os.chdir("C:/Users/Navya Shilpa/Desktop/thesis/All_Proteins/nonred")
>>
>> # open file of PDB IDs
>> f = open("C:/Users/Navya
>> Shilpa/Desktop/thesis/All_Proteins/nonred/nr-list.txt", 'r')
>>
>> # loop through the IDs, opening, processing, and closing each in turn
>> for line in f:
>> pdbID = line.strip()
>> replyobj.status("Processing " + pdbID) # show what PDB we're working on
>>  rc("open " + pdbID)
>> rc("surf") # surface receptor
>> outf = open(pdbID, "w")
>>  for r in residues:
>> print>>outf, r, r.areaSAS
>> outf.close()
>>  rc("close all")
>> # uncommenting the line below will cause Chimera to exit when the script
>> is done
>> #rc("stop now")
>> # note that indentation is significant in Python; the fact that
>> # the above command is exdented means that it is executed after
>> # the loop completes, whereas the indented commands that
>> # preceded it are executed as part of the loop.
>>
>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://plato.cgl.ucsf.edu/pipermail/chimera-dev/attachments/20140417/0e6ce510/attachment.html>


More information about the Chimera-dev mailing list