[chimera-dev] PDB opening and coloring optimizations

Thu Aug 7 11:24:41 PDT 2003

Here are some tests I did to investigate speeding up opening large PDB
models and coloring large molecules.

       Tom

Optimizing PDB model opening and coloring
-----------------------------------------

Opening PDB file optimization
-----------------------------

  I tested making Chimera open PDB files and display the molecule
without creating Python atoms, bonds and residues.  The molecule
displayed about 4 times faster:

      Open 1jj2 (~100,000 atoms) current chimera:	26 seconds
      Open without Python atoms, bonds, residues:	6.5 seconds
      Open in PyMol:					3 seconds

These times are on Linux, P4 2.2 GHz, 1 GB of memory.  The Python
atoms, bonds and residues are created when they are needed, for
example when the atoms are colored.  To avoid a freeze at that point,
an idle procedure could create the Python atoms, bonds and residues
in the background.
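A minimal sketch of the lazy scheme follows.  It assumes plain Python
classes stand in for the real objects: CoreAtom plays the C++ atom,
PyAtom the Python wrapper, and LazyMolecule, idle_wrapper_task() and
batch_size are hypothetical names, not Chimera's API.  Wrappers are
made only on first access, and a generator lets an idle callback (for
example one scheduled with Tkinter's after_idle()) build them one
batch at a time.

```python
class CoreAtom:
    """Stand-in for a C++ atom record; it owns at most one Python wrapper."""
    __slots__ = ("name", "element", "wrapper")

    def __init__(self, name, element):
        self.name = name
        self.element = element
        self.wrapper = None  # created lazily


class PyAtom:
    """Stand-in for the Python-level atom object."""

    def __init__(self, core):
        self.core = core
        self.color = None


class LazyMolecule:
    """Hypothetical molecule that defers wrapper creation until needed."""

    def __init__(self, core_atoms):
        self._core = core_atoms
        self.wrappers_made = 0

    def _wrap(self, core):
        # The C++ side keeps its wrapper, so each atom is wrapped only once.
        if core.wrapper is None:
            core.wrapper = PyAtom(core)
            self.wrappers_made += 1
        return core.wrapper

    @property
    def atoms(self):
        # First access (e.g. coloring) creates any wrappers still missing.
        return [self._wrap(core) for core in self._core]

    def idle_wrapper_task(self, batch_size=1000):
        """Generator: each next() wraps one batch, so an idle callback can
        spread the work out instead of freezing on first access."""
        for start in range(0, len(self._core), batch_size):
            for core in self._core[start:start + batch_size]:
                self._wrap(core)
            yield self.wrappers_made


# Opening the file creates core atoms only; no wrappers exist yet.
mol = LazyMolecule([CoreAtom("CA", "C") for _ in range(2500)])
task = mol.idle_wrapper_task(batch_size=1000)
next(task)  # one idle call wraps the first batch
```

If something touches mol.atoms before the idle task finishes, the
remaining wrappers are made on the spot and the idle task then finds
nothing left to do.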

  This optimization requires the following code changes.

  1) The B-factor, occupancy and serial number from the PDB file are
saved as Python attributes.  These would need to be C++ attributes.

  2) WrapPy fetches all attributes and keeps references to them, e.g.
molecule.atoms, molecule.bonds, ....  I commented out the lines that
would cause Python atoms, bonds and residues to be made.  WrapPy holds
these references to make sure objects aren't deleted prematurely.  For
atoms, bonds and residues this is not a problem, because the C++
object keeps a reference to the Python object, so the Python object
cannot be deleted until molecule.destroy() is called.

  3) chimera/__init__.py replaces bonds to metals with pseudo-bonds.
A C++ routine to find metal atoms with bonds is needed to avoid
creating all the Python atoms and bonds.

  4) Atom, bond and residue "created" triggers don't fire when the
molecule is opened with this optimization.  Some Chimera features rely
on the triggers firing immediately; for example, the surface
categorizer wants to recategorize atoms whenever bonds are created or
deleted.  Like the metal-complex code, this would have to be done in
C++ to avoid creating all Python atoms and bonds immediately when a
file is opened.  Also, the bond triggers should probably indicate when
the bond is created, not just when the Python bond is created, so
triggers should fire on C++ bond creation.  However, the TrackChanges
code currently keeps a set of Python objects rather than C++ objects,
so firing triggers on C++ bond creation would cause all Python bonds
to be created on file open.
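For change 3), the search itself is a single pass over the bond list.
Here is a sketch in Python over flat arrays that stand in for the C++
data (an atomic number per atom, an atom-index pair per bond).  The
METALS set is an illustrative subset chosen for this example, not the
list chimera/__init__.py uses, and metal_bonds() is a hypothetical
name.  A C++ version would run the same loop and never touch a Python
atom or bond.

```python
# Illustrative subset of metals by atomic number: Na Mg K Ca Mn Fe Co Ni Cu Zn.
# This is an assumption for the example, not Chimera's actual metal list.
METALS = frozenset({11, 12, 19, 20, 25, 26, 27, 28, 29, 30})


def metal_bonds(elements, bonds):
    """Find bonds to metals without building per-atom objects.

    elements: atomic number of each atom, indexed by atom.
    bonds: (i, j) atom-index pairs.
    Returns (indices of bonds touching a metal, sorted bonded metal atoms).
    """
    bond_ids = []
    metal_atoms = set()
    for k, (i, j) in enumerate(bonds):
        touches_metal = False
        for atom in (i, j):
            if elements[atom] in METALS:
                metal_atoms.add(atom)
                touches_metal = True
        if touches_metal:
            bond_ids.append(k)
    return bond_ids, sorted(metal_atoms)


# C, O, Zn, N: the zinc is bonded to both the O and the N.
print(metal_bonds([6, 8, 30, 7], [(0, 1), (1, 2), (2, 3), (0, 3)]))
# prints ([1, 2], [2])
```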

  Changes 1), 2) and 3) are small and easy.  Change 4) is a more
serious problem that needs more consideration.  Perhaps the solution
for 4) is to keep the triggers and TrackChanges as they are, but make
surface categorization an idle task.  The categorizer creates category
names for menu entries, and these could be added to the menus whenever
the computation finishes.
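Making categorization an idle task could look like the generator
below.  It processes one chunk of bonds per idle call and hands its
category names to a callback only when the whole computation is done.
This is a hypothetical sketch: grouping atoms into bonded components
with union-find is a stand-in for whatever rules the real surface
categorizer applies, and categorize_in_chunks(), on_done and chunk
are invented names.

```python
def categorize_in_chunks(n_atoms, bonds, on_done, chunk=500):
    """Generator: group atoms into bonded components, one chunk per step.

    on_done receives the list of category names once every bond has been
    processed, e.g. to add the menu entries at that point.
    """
    parent = list(range(n_atoms))  # union-find forest over atom indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for start in range(0, len(bonds), chunk):
        for i, j in bonds[start:start + chunk]:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
        yield  # return control to the event loop between chunks

    n_components = len({find(a) for a in range(n_atoms)})
    on_done(["component %d" % k for k in range(n_components)])


def run_to_completion(task):
    # An idle handler would instead call next(task) once per idle period.
    for _ in task:
        pass


names = []
run_to_completion(
    categorize_in_chunks(5, [(0, 1), (1, 2), (3, 4)], names.extend, chunk=1))
```

Because on_done runs only after the last chunk, nothing is added to
the menus while the computation is still partial.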

Memory use for PDB models
-------------------------

  About 2/3 of the memory used is C++ atoms/bonds and 1/3 is Python
atoms/bonds.

                                                VSZ (MB)         RSS (MB)
     Chimera start-up                           188              23
     1jj2 without Python atoms/bonds/residues   360 (grew 172)   167 (grew 144)
     1jj2 with Python atoms/bonds/residues      437 (grew 77)    243 (grew 76)

These Red Hat 8.0 memory numbers, from the Unix ps command, suggest
that most of the memory is in the C++ data structures rather than in
the Python atoms/bonds.
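The 2/3 versus 1/3 split comes from the RSS growth in the table above.
Written out as a few lines of Python:

```python
# RSS values in MB, from the ps measurements in the table.
startup, without_py, with_py = 23, 167, 243

cpp_growth = without_py - startup   # C++ atoms/bonds/residues: 144 MB
py_growth = with_py - without_py    # Python atoms/bonds/residues: 76 MB
total = cpp_growth + py_growth      # 220 MB

cpp_share = cpp_growth / total
py_share = py_growth / total
print("C++ %d MB (%.0f%%), Python %d MB (%.0f%%)"
      % (cpp_growth, 100 * cpp_share, py_growth, 100 * py_share))
# prints: C++ 144 MB (65%), Python 76 MB (35%)
```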

Atom coloring optimization
--------------------------

  Here's a summary of some tests I did coloring all atoms and bonds of
2btv (~50,000 atoms) a single color.  More details are in a chimera-dev
mailing list posting from June 2003 with the subject "coloring speed".

        Color atoms/bonds using Actions menu:   1.70 sec
        Breakdown:
          Building display list:                0.88 sec
          Python color setting:                 0.66 sec
          Clearing track changes:               0.11 sec

I tested alternative versions of the Python color-setting code:

Minimal for loop

   for a in m.atoms: a.color = color
   for b in m.bonds: b.color = color

took 0.52 seconds.  The Actions menu code is a little slower because
it does the color assignment through a function call.  Using setattr()
or map() instead was also slower than the minimal loop.
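The comparison above can be reproduced in miniature with the standard
timeit module.  This is a sketch under a clear assumption: Atom is a
plain Python class standing in for a Chimera atom, so it measures
interpreter overhead only, not WrapPy's attribute-setting path, and
the absolute times will not match the numbers above.  set_color() is
a hypothetical helper playing the role of the Actions menu's function
call.

```python
import timeit


class Atom:
    """Plain stand-in for a wrapped atom."""
    __slots__ = ("color",)

    def __init__(self):
        self.color = None


atoms = [Atom() for _ in range(50000)]
color = (1.0, 0.0, 0.0, 1.0)


def set_color(a, c):
    a.color = c


def direct():
    for a in atoms:
        a.color = color


def via_function():
    for a in atoms:
        set_color(a, color)


def via_setattr():
    for a in atoms:
        setattr(a, "color", color)


def via_map():
    # list() forces the lazy map iterator in Python 3.
    list(map(set_color, atoms, [color] * len(atoms)))


timings = {}
for fn in (direct, via_function, via_setattr, via_map):
    # Best of 3 single runs, to reduce noise from other processes.
    timings[fn.__name__] = min(timeit.repeat(fn, number=1, repeat=3))
    print("%-13s %.4f sec" % (fn.__name__, timings[fn.__name__]))
```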

Using C++ set_atom_color() and set_bond_color() functions

   for a in m.atoms: set_atom_color(a, color)
   for b in m.bonds: set_bond_color(b, color)

took 0.27 seconds, roughly half the time of the minimal loop.  So
attribute setting done by WrapPy may not be as fast as it could be.

Using C++ routines

   color_atoms(atoms, color)
   color_bonds(bonds, color)

where atoms and bonds are lists, took 0.10 seconds.  I believe this
time is all spent adding atoms/bonds to the TrackChanges STL map keyed
by PyObject*.
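The batch routines help because the loop, the color assignment and
the change bookkeeping all happen inside one call.  Below is a Python
model of that shape, with a hypothetical ChangeTracker standing in for
TrackChanges.  Like the map keyed by PyObject*, it keys entries by
object identity, so coloring the same atom twice records it once.
ChangeTracker, batch_color() and the extra tracker argument are
illustrative names, not Chimera's; batch_color() only mirrors the
shape of color_atoms(atoms, color).

```python
class ChangeTracker:
    """Hypothetical stand-in for TrackChanges: modified objects keyed by id()."""

    def __init__(self):
        # id(obj) -> (obj, reasons); holding obj keeps its id() valid.
        self._modified = {}

    def add_modified(self, obj, reason):
        _, reasons = self._modified.setdefault(id(obj), (obj, set()))
        reasons.add(reason)

    def clear(self):
        """Return the modified objects and reset, like clearing track changes."""
        changed = [obj for obj, _ in self._modified.values()]
        self._modified.clear()
        return changed


class Atom:
    __slots__ = ("color",)

    def __init__(self):
        self.color = None


def batch_color(atoms, color, tracker):
    """Batch coloring: assign and record every change in a single pass."""
    for atom in atoms:
        atom.color = color
        tracker.add_modified(atom, "color changed")


tracker = ChangeTracker()
atoms = [Atom() for _ in range(3)]
batch_color(atoms, "red", tracker)
batch_color(atoms, "blue", tracker)  # same atoms again: no new entries
```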

  I did a similar test for the Python part of "color by element".

	Coloring atoms by element type:		3.5 sec
	Breakdown:
	  Build display list and TrackChanges:	1    sec
	  Python code to set colors:		2.44 sec

Here are times for various versions of the Python color setting code:

     Actions menu color by element:		2.44 sec
     Optimized python coloring:			1.11 sec
     C++ routine:				 .12 sec

The optimized Python uses a Python dictionary mapping element numbers
to Chimera color objects.  The current Actions menu version is slower
because it maps element numbers to color names and then fetches the
color object by name for each atom.  The C++ routine
color_atoms_by_element(atom_list) is a simple translation of the
Python version.

If the display list and track changes overhead were reduced by a
factor of 10 (which I believe is not too hard), moving some Python
loops to C++ would significantly improve response time.  Besides
coloring, this could help with atom/bond show/hide, representation
changes, and large selections such as whole chains or all atoms.