Version 8 (modified by goddard, 17 years ago)

Architecture and Implementation Issues

  1. Use numpy for vector, point, transform, bounding box, coordinate set data types. Our current home-brewed Vector, Point, and Xform classes are a significant obstacle to implementing geometric calculations. The Chimera volume data code is littered with conversions back and forth between these undernourished data types and the much more usable numpy arrays. TG
  2. Use SWIG to make Python wrappers for C++ code. Our current home-brewed WrapPy seems to require maintenance for every new Python version. WrapPy does not support numpy array arguments, which are used throughout the volume data C++ code. SWIG is well maintained and is the industry standard, so it does not make sense for us to duplicate that work. TG
  3. Don't auto-generate molecular data classes with OTF. Our current auto-generated molecule classes introduce unnecessary difficulty in maintaining and enhancing those data structures. Those classes should be directly hand-written. The OTF appears to be historical baggage. TG
  4. Many GUI features have no command equivalent. This impairs movie making and scripting to process large numbers of models. We may want to establish a protocol that no new feature goes into a production release without a command interface. TG
  5. Separate the _chimera C++ library into a molecule library and a 3D viewer library (camera, viewer, side view, lights, OpenGL management, colors). More modularity will improve code quality and speed up the code modification/testing cycle. TG
  6. Code in Python unless C++ is needed for speed. I think most Chimera C++ code (almost all of it) would be easier to enhance, maintain, and debug if it were instead written in Python. The only code that belongs in C++ is code that requires high speed (and can't be done adequately in numpy) or that is called by other C++ code. The (trivial) examples that brought this to mind are eigenMatrix(), RMSD_fillMatrix(), and RMSD_matrix() in _chimera. The entire OpenGL viewer/camera framework currently in _chimera could probably be in Python. TG
  7. Molecular surface use is limited. Each atom can only have one molecular surface associated with it. For instance, making a surface for chain A, one for chain B, and one for the atoms of chains A and B together to examine buried surface area is not possible. Each atom can belong to just one "surface category" and each molecular surface is created from a category. It is also cumbersome to redefine the surface categories from the default ones (main, ligand, solvent). The constrained possibilities for molecular surfaces complicate simple problems like looking at electrostatic potential on the interface between two chains. Allowing multiple molecular surfaces would seem to be conceptually simple and more useful. The current implementation, with surface color, opacity, and display attributes attached to atoms instead of the surface, would need to be changed. Also, the current practice of using the same model id number for molecules and their surfaces would make referring to different surfaces in commands problematic. TG
  8. Many operations impose arbitrary requirements on how atoms are grouped. For example, a sequence-based structure alignment can be done on two separate "molecules", but not on two separate "chains" without first partitioning the chains into separate molecule objects. Fitting two molecules into a density map with distinct rigid motions can be done, but two chains of a single molecule cannot be moved independently by the fit-in-map tool. In other cases two chains would be needed instead of two molecules. For instance, a single molecular surface can be computed bounding two chains, but not two molecules. These use restrictions are caused by implementation details, such as molecules being positioned by transformation matrices while chains are positioned by individual atom coordinates, and also by graphical user interface design, such as a menu that only allows choosing chains. TG
  9. Molecule data structures only know whether chains are polymers or non-polymers by looking at bond connectivity. So an alpha-carbon-only structure has to have bonds between the alpha carbons in order to be recognized as a polymer and have a ribbon drawn. It would be better if the data structures knew which chains were linear polymers without relying on bonds. TG
  10. Aligning new models to the model with lowest id number is extremely confusing when it produces the wrong alignment. This confusion could be avoided by distinguishing global rotations/translations where all models are moved together from relative motions where some models are fixed. The implementation would keep a camera view transform that would include all global motions, while individual model transforms would only be used for relative motions. The global frame would then be defined by the camera view transform. The user will find it easier to grasp that moving some models with others fixed moves those in the global coordinate frame, while the fixed ones keep their original positions in the global frame. TG
  11. The Chimera developers depend on outside users to guide improvements in usability and utility. This is very inefficient compared to having the developers directly use the program themselves on real applications. A combination of outside and internal feedback is needed, but there is inadequate internal feedback. Demonstrations and actual use by Elaine Meng within the lab generate roughly as much feedback as all outside users combined. Likewise, my use of volume / molecular assembly capabilities accounts for about half of the feedback driving those developments. We would benefit significantly by arranging that all developers are real users of the program. TG
  12. The selectable object system is based on a 3-level hierarchy (graphs, subgraphs, vertices/edges) that maps to molecules, residues, atoms/bonds. This system struggles to handle 2-level hierarchies of surfaces and surface pieces, and fails to handle hierarchies of more than 3 levels, which are needed for, for example, quaternary structure and chains. TG
  13. The command language for naming data objects is limited to molecules, residues, and atoms. Referencing surfaces or defined pieces of surfaces is problematic or not possible. Even molecules may have identical names and id numbers, so there is no unique identifier to distinguish them in commands. Also, weird assumptions are made about the meaning of "sub-id" numbers. For instance, the "findclash" command thinks of these as members of an NMR ensemble and does not detect clashes with those by default. The sym command uses sub-id numbers for symmetry copies, where detecting clashes is of interest. Unique names for data objects are needed. TG
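
Items 1 and 10 above can be illustrated together. The following is a minimal sketch, not Chimera code. It assumes transforms are stored as 3x4 numpy arrays (rotation in the first three columns, translation in the last), and the helper names rotation_z, compose, and apply_transform are hypothetical. It shows coordinate sets transformed as plain numpy arrays, and a single camera view transform carrying the global motion while per-model transforms carry only relative motion.

```python
import numpy as np

def rotation_z(angle_deg, shift=(0.0, 0.0, 0.0)):
    """Hypothetical helper: 3x4 transform that rotates about z, then translates."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    tf = np.zeros((3, 4))
    tf[:, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    tf[:, 3] = shift
    return tf

def compose(outer, inner):
    """Return the 3x4 transform equivalent to applying inner, then outer."""
    result = np.empty((3, 4))
    result[:, :3] = outer[:, :3] @ inner[:, :3]
    result[:, 3] = outer[:, :3] @ inner[:, 3] + outer[:, 3]
    return result

def apply_transform(tf, xyz):
    """Apply a 3x4 transform to an (N, 3) coordinate array in one call."""
    return xyz @ tf[:, :3].T + tf[:, 3]

# The same coordinate set is used for two models, for brevity.
coords = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])

# Per-model transforms hold only relative motion (item 10).
model_a = rotation_z(0.0)                          # fixed model: identity
model_b = rotation_z(90.0, shift=(0.0, 0.0, 5.0))  # moved relative to model_a

# A global rotation changes only the camera view transform.
camera_view = rotation_z(180.0)

# Positions in the global frame depend only on the model transforms.
global_a = apply_transform(model_a, coords)
global_b = apply_transform(model_b, coords)

# Positions as seen by the camera combine both transforms.
view_b = apply_transform(compose(camera_view, model_b), coords)
```

In this sketch, rotating the whole scene never touches model_a or model_b, so the fixed model keeps its original global position. Whether a 3x4 array is the right representation for Chimera is a separate design question.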