[chimera-dev] PDB file parser failing to identify chain IDs

Mon Jul 20 06:30:41 PDT 2009

Hi,

I'm working with PDB files, opening them with:

chimera.openModels.open(fileHandler, 'PDB').

Later on, I use the chain ID of each residue in the molecule (accessed via
residue.id.chainId). The problem is that the PDB format does not require the
chain ID to be defined. When it is missing, all residues get the chain ID ' ',
which makes it rather annoying to tell which chain each residue belongs to.

Now, since raising problems alone is never really appreciated ;), my
colleague Aurelien Grosdidier and I tried to come up with some ways to
split the residues into chains when loading a model from a PDB file that
does not contain any chain IDs.

1) Split on TER. Chains are usually separated by a TER line, which could
thus be used to decide when to start a new chain. However, like the
chain ID, the TER line is optional in the PDB format...
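A minimal sketch of option 1, assuming we read the raw PDB lines ourselves rather than going through chimera.openModels.open. The function name split_on_ter is hypothetical. It only looks at the record name in columns 1-6 and starts a new group after every TER line:

```python
from typing import List


def split_on_ter(pdb_lines: List[str]) -> List[List[str]]:
    """Group ATOM/HETATM lines into chains, closing the current chain at each TER line.

    Hypothetical helper: it ignores every other record type, so a file
    without any TER line simply comes back as a single chain.
    """
    chains: List[List[str]] = []
    current: List[str] = []
    for line in pdb_lines:
        record = line[:6].strip()  # record name occupies columns 1-6
        if record in ("ATOM", "HETATM"):
            current.append(line)
        elif record == "TER" and current:
            chains.append(current)
            current = []
    if current:  # atoms after the last TER (or in a file with no TER at all)
        chains.append(current)
    return chains
```

Waters or ligands listed after the last TER end up in their own group, which may or may not be what one wants.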

2) For each pair of consecutive residues, compute the atom-to-atom distance
matrix. Since the typical length of a covalent bond is known, we can decide
whether the two residues are plausibly bonded, and thus belong to the same
chain. The computation can be cut short as soon as one bonded atom pair is
found, which happens whenever two residues belong to the same chain, by far
the most frequent case.
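A rough sketch of option 2, assuming each residue is given as a list of heavy-atom (x, y, z) coordinates in file order. The 2.0 Å cutoff is an assumption chosen to sit above a peptide C-N bond (about 1.33 Å) and a nucleic-acid O3'-P bond (about 1.6 Å). The names residues_linked and split_by_consecutive_links are hypothetical:

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumed cutoff in angstroms for "these two atoms are covalently bonded".
BOND_CUTOFF = 2.0


def residues_linked(res_a: Sequence[Coord], res_b: Sequence[Coord],
                    cutoff: float = BOND_CUTOFF) -> bool:
    """Return True if any atom of res_a lies within `cutoff` of any atom of res_b.

    Returns at the first close pair found, so the full distance matrix is
    only computed when the residues are NOT linked.
    """
    cutoff_sq = cutoff * cutoff
    for a in res_a:
        for b in res_b:
            if sum((p - q) ** 2 for p, q in zip(a, b)) <= cutoff_sq:
                return True
    return False


def split_by_consecutive_links(residues: List[Sequence[Coord]],
                               cutoff: float = BOND_CUTOFF) -> List[List[int]]:
    """Walk residues in file order; start a new chain when residue i is not linked to i-1.

    Returns lists of residue indices, one list per inferred chain.
    """
    if not residues:
        return []
    chains = [[0]]
    for i in range(1, len(residues)):
        if residues_linked(residues[i - 1], residues[i], cutoff):
            chains[-1].append(i)
        else:
            chains.append([i])
    return chains
```

Since the linking atoms (C of residue i, N of residue i+1) are usually near the end and start of their residues, iterating res_a in reverse would typically hit the bonded pair even sooner.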

3) Compute the atom-to-atom distance matrix for the whole model and use it to
determine which atoms are connected, i.e. belong to the same chain. This is
computationally more expensive than 2), but something similar must already
be done somewhere in Chimera to compute the bonds between the atoms that are
rendered graphically. Is there any way we can access this information? I
could not find one.
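A minimal sketch of option 3 outside Chimera, assuming a flat list of atom coordinates and the same assumed 2.0 Å bond cutoff. Atoms closer than the cutoff are treated as bonded, and a union-find structure merges them into connected groups. The name connected_atom_groups is hypothetical, and this is not how Chimera itself builds its bonds:

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def connected_atom_groups(coords: Sequence[Coord],
                          cutoff: float = 2.0) -> List[List[int]]:
    """Return connected components (lists of atom indices) of the distance graph.

    Two atoms are connected if they are within `cutoff` angstroms
    (an assumed covalent-bond threshold). Naive O(n^2) pair loop.
    """
    parent = list(range(len(coords)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    cutoff_sq = cutoff * cutoff
    for i, j in combinations(range(len(coords)), 2):
        d2 = sum((p - q) ** 2 for p, q in zip(coords[i], coords[j]))
        if d2 <= cutoff_sq:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Ligands and waters come out as their own groups. With explicit hydrogens, a flat 2.0 Å cutoff can also merge chains through hydrogen bonds, so an element-aware cutoff (e.g. based on covalent radii) would be safer, and a spatial grid would avoid the all-pairs loop on large models.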

Any comment on those solutions? Any other solutions? Any help will be
appreciated!

Thanks,

Sebastien