[Chimera-users] Mol2 Trajectory Reader?
Eric Pettersen
pett at cgl.ucsf.edu
Tue Dec 21 18:18:51 PST 2004
On Dec 20, 2004, at 1:49 PM, S Joshua Swamidass wrote:
> How hard would it be to write an extension to chimera that read a
> multimolecule mol2 file into chimera as a trajectory? That would be
> really useful for me. Please let me know if you can do it, or give me some
> advice on how to proceed.
Hi Joshua,
I guess I'm curious as to what package puts out trajectories in Mol2
format...
The difficulty of adding a format is directly proportional to how fast
you need to have the trajectory read. Reading the entire trajectory at
startup in interpreted Python is not too difficult to code. Using a
C/C++ Python module to read the trajectory and/or reading the frames on
demand both increase the coding effort.
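For illustration, here is a rough pure-Python sketch of what "reading the frames on demand" can look like for a multi-molecule Mol2 file. It assumes each frame is a separate @<TRIPOS>MOLECULE record listing the same atoms in the same order. The class name LazyMol2Frames and the sample data are hypothetical, not part of Chimera. The file is scanned once to record where each record starts, and only the requested frame's ATOM block is parsed.

```python
import io


class LazyMol2Frames:
    """Hypothetical on-demand reader: index record offsets once, parse frames lazily."""

    def __init__(self, stream):
        # stream must be a seekable text stream, e.g. open(path) or io.StringIO(text)
        self.stream = stream
        self.offsets = []
        while True:
            pos = stream.tell()
            line = stream.readline()
            if not line:
                break
            if line.startswith("@<TRIPOS>MOLECULE"):
                self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, frame):
        # Frames are numbered from 1, matching the Trajectory convention
        if not 1 <= frame <= len(self.offsets):
            raise IndexError(frame)
        self.stream.seek(self.offsets[frame - 1])
        self.stream.readline()  # skip this record's @<TRIPOS>MOLECULE line
        coords, inAtoms = [], False
        for line in iter(self.stream.readline, ""):
            if line.startswith("@<TRIPOS>MOLECULE"):
                break  # start of the next frame
            if line.startswith("@<TRIPOS>"):
                inAtoms = line.startswith("@<TRIPOS>ATOM")
            elif inAtoms and line.strip():
                # ATOM fields: id, name, x, y, z, type, ...
                coords.append(tuple(float(v) for v in line.split()[2:5]))
        return coords


SAMPLE = """@<TRIPOS>MOLECULE
frame 1
2 0 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 O1 0.0 0.0 0.0 O.3 1 HOH1
2 H1 0.96 0.0 0.0 H 1 HOH1
@<TRIPOS>MOLECULE
frame 2
2 0 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 O1 0.1 0.0 0.0 O.3 1 HOH1
2 H1 1.06 0.0 0.0 H 1 HOH1
"""

frames = LazyMol2Frames(io.StringIO(SAMPLE))
```

A real on-demand reader would also have to keep the file open for the lifetime of the trajectory and would probably cache recently used frames; that bookkeeping is part of the extra effort compared with reading everything at startup.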
If you provided me with examples of Mol2-format trajectories, I could
probably have support for them done in about a month or possibly less,
given the various other demands on my time. Performance would be
similar to the multi-MODEL PDB trajectory case, since the entire
trajectory would be read in on startup, but using the C++ layer.
I've appended an outline of how to add a new trajectory format, in
case you want to give it a stab yourself...
Eric Pettersen
UCSF Computer Graphics Lab
pett at cgl.ucsf.edu
http://www.cgl.ucsf.edu
Adding a New Trajectory Format
----------------------------------------------
Movie actually uses the Trajectory module (chimera/share/Trajectory) to
read the various formats. Trajectory has a subdirectory named
"formats" that in turn has subdirectories for each supported format
(the subdirectories are Python modules). By convention the module name
for each format is the name of the format with the initial letter of
each word capitalized and all other letters lowercase (e.g. the MMTK
module's name is Mmtk).
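The naming convention can be expressed as a small helper. This is only an illustration: the function names are hypothetical, and joining multi-word format names without a separator is an assumption, since the outline does not say how the words are combined.

```python
def formatModuleName(displayName):
    """Hypothetical helper: module name for a format, per the naming convention.

    Each word gets an initial capital and the rest lowercase; multi-word
    names are joined with no separator (an assumption).
    """
    return "".join(word.capitalize() for word in displayName.split())


def needsFormatNameGlobal(displayName):
    # __init__.py must define formatName when the display name differs
    # from the module name (see the __init__.py requirements below)
    return formatModuleName(displayName) != displayName


# e.g. the MMTK format lives in chimera/share/Trajectory/formats/Mmtk/
```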
A format's module is typically structured so that the code that
interfaces with Trajectory's generic format handling is in the
__init__.py file, and the code that actually reads the format's files
is in another file -- usually named after the
format itself (e.g. Gromos.py).
__init__.py:
The __init__.py file needs to support the following things:
1) If the name of the format as displayed to the user is different
from the module name (which, due to capitalization, it usually is) then
there has to be a global variable named "formatName" that is
initialized to the display name of the format.
2) A class named ParamGUI needs to be defined that handles presenting
the file-loading interface for that format to the user. It must have
two methods:
2.1) __init__, which receives a Tkinter.Frame instance argument.
The __init__ method should populate the frame with widgets for
gathering the input information for the format from the user.
2.2) loadEnsemble, which takes as arguments a starting frame number,
ending frame number, and callback function. loadEnsemble needs to
compose a list of the arguments that were provided by the user to the
widgets defined in the __init__ function, and then call this module's
global loadEnsemble function (see below) with that list as the first
argument and the start/end frame numbers and callback as the remaining
three arguments.
3) A global loadEnsemble function that generates an ensemble instance
(discussed later). This function is called not only by the
ParamGUI.loadEnsemble method but also when the user uses a "metafile"
to specify the input parameters. This function takes four arguments:
a format-specific list of input parameters, a starting frame number,
ending frame number, and callback function to start the Movie
interface. This function should call the Movie-interface callback with
the generated ensemble as an argument. This function should also
remember the provided format-specific values as preferred defaults for
future uses of the format.
The code for a format's __init__.py file is very similar from format to
format. The easiest way to write your own is to grab another format's
__init__.py file and modify it. The __init__.py file for the Gromos
format is a good example since it uses multiple input files and has
a non-file parameter as well, so it pretty much covers all the bases in
what you might need.
The format-specific .py file:
This file defines an "ensemble" class that gets instantiated from
__init__.py's loadEnsemble function. The ensemble class needs to
support the following methods:
1) An __init__ method that takes the format's input parameters and
start/end frames as arguments. The __init__ method may read input
files or do whatever is necessary to support the other instance methods
(e.g. call into a C/C++ module to read the files -- the Amber format
does this).
2) A GetDict method that takes a string argument. The string
specifies what data should be returned. The possible string values
are:
2.1) atomnames -- return a list of the atom names; a residue's
atoms must be consecutive
2.2) elements -- return a list of the atom elements. These should
be instances of chimera.Element, which can be initialized with a string
(e.g. "Fe") or a number. Trajectory's determineElementFromMass
function may be useful here if the format doesn't specify the atomic
number directly or it can't be easily determined from the atom name.
2.3) resnames -- return a list of the residue names
2.4) bonds -- return a list of "bonds": two-tuples of indices into
the atomnames list
2.5) ipres -- a list of the first atom of each residue (indices
into atomnames, but unlike previous indices these are 1-based, so the
first element of ipres will always be 1)
3) A __getitem__ method taking a frame-number argument (starting with
1): return a list of 3-tuples corresponding to the xyz coordinates of
the atoms in that frame (same order as atomnames). The coordinates
should be in angstroms.
4) A __len__ method that returns the total number of frames in the
trajectory (not just the number of frames between the user-specified
start/end frames).
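As an illustration of the ensemble interface, here is a minimal pure-Python sketch for multi-molecule Mol2 files that reads the whole file at startup, the simplest of the options mentioned at the top of this message. Its assumptions do not come from Chimera: every @<TRIPOS>MOLECULE record is one frame with the same atoms in the same order; topology (names, elements, bonds, residues) is taken from the first record; elements come from the Sybyl atom-type prefix (e.g. "C.3" gives "C"); residue names are the substructure names on the ATOM lines with trailing digits removed; and makeElement is a hypothetical hook that would be chimera.Element inside Chimera.

```python
import io

# Hypothetical hook: inside Chimera this would be chimera.Element;
# plain symbol strings keep the sketch runnable on its own.
makeElement = str


def _sections(lines):
    """Map each @<TRIPOS> section in one record to its split data lines."""
    sections, current = {}, None
    for line in lines:
        if line.startswith("@<TRIPOS>"):
            current = line.strip()[len("@<TRIPOS>"):]
            sections[current] = []
        elif current and line.strip() and not line.startswith("#"):
            sections[current].append(line.split())
    return sections


class Mol2Ensemble:
    """Sketch of an ensemble: one frame per @<TRIPOS>MOLECULE record."""

    def __init__(self, source, startFrame, endFrame):
        # source: a file path or an open text stream; every frame is read now
        self.startFrame, self.endFrame = startFrame, endFrame
        if hasattr(source, "read"):
            text = source.read()
        else:
            with open(source) as f:
                text = f.read()
        records = []
        for line in text.splitlines():
            if line.startswith("@<TRIPOS>MOLECULE"):
                records.append([])
            if records:
                records[-1].append(line)
        if not records:
            raise ValueError("no @<TRIPOS>MOLECULE records found")
        parsed = [_sections(rec) for rec in records]
        atoms = parsed[0].get("ATOM", [])
        # ATOM fields: id, name, x, y, z, type[, subst_id[, subst_name ...]]
        self._frames = [[tuple(float(v) for v in a[2:5]) for a in p.get("ATOM", [])]
                        for p in parsed]
        if any(len(frame) != len(atoms) for frame in self._frames):
            raise ValueError("every frame must contain the same atoms")
        index = {a[0]: i for i, a in enumerate(atoms)}  # Mol2 atom id -> 0-based
        self._data = {
            "atomnames": [a[1] for a in atoms],
            "elements": [makeElement(a[5].split(".")[0]) for a in atoms],
            "bonds": [(index[b[1]], index[b[2]]) for b in parsed[0].get("BOND", [])],
            "resnames": [], "ipres": [],
        }
        lastSubst = None
        for i, a in enumerate(atoms):
            subst = a[6] if len(a) > 6 else "1"
            if subst != lastSubst:  # this atom starts a new residue
                name = a[7] if len(a) > 7 else "UNK"
                self._data["resnames"].append(name.rstrip("0123456789") or name)
                self._data["ipres"].append(i + 1)  # ipres is 1-based
                lastSubst = subst

    def GetDict(self, key):
        # key: "atomnames", "elements", "resnames", "bonds" or "ipres"
        return self._data[key]

    def __getitem__(self, frame):
        # Frame numbers start at 1; Mol2 coordinates are already in angstroms
        if frame < 1:
            raise IndexError(frame)
        return self._frames[frame - 1]

    def __len__(self):
        # All frames in the file, not only startFrame..endFrame
        return len(self._frames)


SAMPLE = """@<TRIPOS>MOLECULE
water
2 1 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 OW 0.00 0.00 0.00 O.3 1 HOH1
2 HW 0.96 0.00 0.00 H 1 HOH1
@<TRIPOS>BOND
1 1 2 1
@<TRIPOS>MOLECULE
water
2 1 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 OW 0.10 0.00 0.00 O.3 1 HOH1
2 HW 1.06 0.00 0.00 H 1 HOH1
"""

ens = Mol2Ensemble(io.StringIO(SAMPLE), 1, 2)
```

If startup time turns out to matter for large files, moving the parsing into a C/C++ module, as the Amber format does, would be the natural next step; the methods the ensemble exposes would stay the same.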