[Chimera-users] Mol2 Trajectory Reader?

Tue Dec 21 18:18:51 PST 2004

On Dec 20, 2004, at 1:49 PM, S Joshua Swamidass wrote:

> How hard would it be to write an extension to chimera that read a
> multimolecule mol2 file into chimera as a trajectory? That would be
> really useful for me. Please let me know if you can do it, or give me
> some advice on how to proceed.

Hi Joshua,
	I guess I'm curious as to what package puts out trajectories in Mol2 
format...
	The difficulty of adding a format is directly proportional to how fast 
you need to have the trajectory read.  Reading the entire trajectory at 
startup in interpreted Python is not too difficult to code.  Using a 
C/C++ Python module to read the trajectory and/or reading the frames on 
demand both increase the coding effort.
	If you provided me with examples of Mol2-format trajectories, I could 
probably have support for them done in about a month or possibly less, 
given the various other demands on my time.  Performance would be 
similar to the multi-MODEL PDB trajectory case, since the entire 
trajectory would be read in on startup, but using the C++ layer.
	I've appended an outline of how to add a new trajectory format, in 
case you want to give it a stab yourself...

                         Eric Pettersen
                         UCSF Computer Graphics Lab
                         pett at cgl.ucsf.edu
                         http://www.cgl.ucsf.edu

Adding a New Trajectory Format
----------------------------------------------

Movie actually uses the Trajectory module (chimera/share/Trajectory) to 
read the various formats.  Trajectory has a subdirectory named 
"formats" that in turn has subdirectories for each supported format 
(the subdirectories are Python modules).  By convention the module name 
for each format is the name of the format with the initial letter of 
each word capitalized and all other letters lowercase (e.g. the MMTK 
module's name is Mmtk).

A format's module is typically structured so that the code that 
interfaces with Trajectory's generic format handling is in the 
__init__.py file, and the code specific to supporting reading the 
format's files is in another file -- usually named after the
format itself (e.g. Gromos.py).

       __init__.py:

The __init__.py file needs to support the following things:

1)  If the name of the format as displayed to the user is different 
from the module name (which, due to capitalization, it usually is) then 
there has to be a global variable named "formatName" that is 
initialized to the display name of the format.

2)  A class named ParamGUI needs to be defined that handles presenting 
the file-loading interface for that format to the user.  It must have 
two methods:

   2.1)  __init__, which receives a Tkinter.Frame instance argument.  
The __init__ method should populate the frame with widgets for 
gathering the input information for the format from the user.

   2.2)  loadEnsemble, which takes as arguments a starting frame number, 
ending frame number, and callback function.  loadEnsemble needs to 
compose a list of the values the user entered into the widgets created 
in the __init__ method, and then call this module's global loadEnsemble 
function (see below) with that list as the first argument and the 
start/end frame numbers and callback as the remaining three arguments.

3)  A global loadEnsemble function that generates an ensemble instance 
(discussed later).  This function is called not only by the 
ParamGUI.loadEnsemble method, but also when the user uses a "metafile" 
to specify the input parameters.  This function takes four arguments:  
a format-specific list of input parameters, a starting frame number, an 
ending frame number, and a callback function that starts the Movie 
interface.  This function should call the Movie-interface callback with 
the generated ensemble as its argument.  It should also remember the 
provided format-specific values as preferred defaults for future uses 
of the format.

The code for a format's __init__.py file is very similar from format to 
format.  The easiest way to write your own is to grab another format's 
__init__.py file and modify it.  The __init__.py file for the Gromos 
format is a good example since it uses multiple input files and has a 
non-file parameter as well, so it covers pretty much all the bases you 
might need.

       The format-specific .py file:

This file defines an "ensemble" class that gets instantiated from 
__init__.py's loadEnsemble function.  The ensemble class needs to 
support the following methods:

1)  An __init__ method that takes the format's input parameters and 
start/end frames as arguments.  The __init__ method may read input 
files or do whatever else is necessary to support the other instance 
methods (e.g. call into a C/C++ module to read the files, as the Amber 
format does).

2)  A GetDict method that takes a string argument.  The string 
specifies what data should be returned.  The possible string values 
are:

   2.1)  atomnames -- return a list of the atom names;  a residue's 
atoms must be consecutive
   2.2)  elements -- return a list of the atom elements.  These should 
be instances of chimera.Element (which can be initialized with a string 
(e.g. "Fe") or a number).  Trajectory's determineElementFromMass 
function may be useful here if the format doesn't specify the atomic 
number directly or it can't be easily determined from the atom name.
   2.3)  resnames -- return a list of the residue names
   2.4)  bonds -- return a list of "bonds":  two-tuples of indices into 
the atomnames list
   2.5)  ipres -- return a list of the index of the first atom of each 
residue (indices into atomnames, but unlike the previous indices these 
are 1-based, so the first element of ipres will always be 1)

3)  A __getitem__ method taking a frame-number argument (starting with 
1):  return a list of 3-tuples corresponding to the xyz coordinates of 
the atoms in that frame (same order as atomnames).  The coordinates 
should be in angstroms.

4)  A __len__ method that returns the total number of frames in the 
trajectory (not just the number of frames between the user-specified 
start/end frames).
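As a rough illustration of the four methods above for the multi-molecule Mol2 case, here is a minimal sketch. It assumes each @<TRIPOS>MOLECULE record is one frame with the same atoms in the same order, takes atom names, residues and bonds from the first molecule only, derives the element from the part of the Tripos atom type before the "." (so types such as Du or LP are not handled), and reads every frame at startup. The class name Mol2Ensemble is hypothetical, and outside Chimera the elements come back as plain strings rather than chimera.Element instances.

```python
try:
    from chimera import Element as makeElement   # inside Chimera
except ImportError:
    makeElement = str   # outside Chimera: keep plain element symbols

TRIPOS = "@<TRIPOS>"


class Mol2Ensemble(object):
    """Hypothetical ensemble: one frame per @<TRIPOS>MOLECULE record."""

    def __init__(self, inputs, startFrame, endFrame):
        self.startFrame, self.endFrame = startFrame, endFrame
        self._frames = []   # one list of (x, y, z) tuples per molecule
        self._atoms = []    # (name, element, resId, resName), first molecule
        self._bonds = []    # 0-based atom index pairs, first molecule
        idIndex = {}        # Mol2 atom_id -> 0-based index, first molecule
        section = None
        with open(inputs[0]) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                if line.startswith(TRIPOS):
                    section = line[len(TRIPOS):]
                    if section == "MOLECULE":
                        self._frames.append([])
                    continue
                fields = line.split()
                first = len(self._frames) == 1
                if section == "ATOM":
                    x, y, z = map(float, fields[2:5])
                    self._frames[-1].append((x, y, z))
                    if first:
                        idIndex[fields[0]] = len(self._atoms)
                        element = fields[5].split(".")[0]   # "C.ar" -> "C"
                        resId = fields[6] if len(fields) > 6 else "1"
                        resName = fields[7] if len(fields) > 7 else "UNK"
                        self._atoms.append((fields[1], element, resId, resName))
                elif section == "BOND" and first:
                    self._bonds.append((idIndex[fields[1]], idIndex[fields[2]]))
        for i, coords in enumerate(self._frames):
            if len(coords) != len(self._atoms):
                raise ValueError("frame %d has %d atoms, expected %d"
                                 % (i + 1, len(coords), len(self._atoms)))

    def GetDict(self, key):
        if key == "atomnames":
            return [a[0] for a in self._atoms]
        if key == "elements":
            return [makeElement(a[1]) for a in self._atoms]
        if key in ("resnames", "ipres"):
            resnames, ipres, prevId = [], [], None
            for i, (_, _, resId, resName) in enumerate(self._atoms):
                if resId != prevId:       # assumes residues are consecutive
                    resnames.append(resName)
                    ipres.append(i + 1)   # ipres is 1-based
                    prevId = resId
            return resnames if key == "resnames" else ipres
        if key == "bonds":
            return list(self._bonds)
        raise ValueError("unknown GetDict key: %r" % key)

    def __getitem__(self, frame):
        if not 1 <= frame <= len(self._frames):  # frames are numbered from 1
            raise IndexError(frame)
        return self._frames[frame - 1]

    def __len__(self):
        return len(self._frames)   # every frame in the file
```

Within Chimera, the format's global loadEnsemble function would build Mol2Ensemble(inputs, startFrame, endFrame) and pass the result to the Movie callback. Keeping every frame in memory corresponds to the read-everything-at-startup approach; an on-demand reader would instead record where each MOLECULE record starts in the file and parse it inside __getitem__.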