Changes between Initial Version and Version 1 of Chimera2/ArrayVsObject


Timestamp: Feb 27, 2012, 11:34:33 AM
Author: Conrad Huang

== Investigate Instancing and Parallel Arrays Complexity ==

Two important features being considered for the Chimera 2 architecture are support for instancing and the use of parallel arrays for storage.

Instancing allows "basic" data to be shared among multiple instances of objects (e.g., multimers built from the same monomer coordinates).  The problem with instancing is finding the right split between "component" (shared) data and "instance" (unshared) data.  For example, where are atom colors stored for a multimer?  The answer determines whether each multimer may be colored with its own set of atomic colors.  (I use atomic rather than molecular color as the example because there are many more atomic colors than molecular colors.)  Instance attributes must also support adding arbitrary data (e.g., via defattr).  Instancing further complicates the interface, both the programming interface for developers and the user interface for end users.  The API is the simpler problem, since programmers (especially those familiar with object-oriented programming) are probably used to thinking about instances versus copies.  Explaining to users why some (instance) attributes may be changed with impunity while others (component attributes) have much wider effects may be more than we want to attempt.
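The component/instance split can be made concrete with a small sketch. The names below (`Component`, `Instance`, `define`, `set`, `get`, `column`) are hypothetical, not an existing Chimera API. The sketch assumes one particular answer to the color question: per-atom attributes live in shared component columns, and each instance stores only a sparse table of its own overrides, so new attributes can be added at run time in the spirit of defattr. Per-instance placement transforms are omitted.

```python
class Component:
    """Shared ("component") data: one copy no matter how many instances use it."""

    def __init__(self, coords):
        self.coords = list(coords)  # per-atom (x, y, z); shared geometry
        self.columns = {}           # other per-atom attributes, name -> list

    def __len__(self):
        return len(self.coords)

    def define(self, name, default):
        """Add a per-atom attribute (defattr-like) with a shared default value."""
        self.columns[name] = [default] * len(self)


class Instance:
    """One copy of a component (e.g. one chain of a multimer).

    Reads fall through to the component unless this instance has overridden
    the value, so unchanged atoms cost no extra memory per instance.
    """

    def __init__(self, component):
        self.component = component
        self.overrides = {}  # name -> {atom index: value}, sparse

    def set(self, name, index, value):
        if name not in self.component.columns:
            raise KeyError(f"attribute {name!r} is not defined")
        if not 0 <= index < len(self.component):
            raise IndexError(index)
        self.overrides.setdefault(name, {})[index] = value

    def get(self, name, index):
        per_atom = self.overrides.get(name, {})
        if index in per_atom:
            return per_atom[index]
        return self.component.columns[name][index]

    def column(self, name):
        """Full per-atom list for this instance: component values plus overrides."""
        values = list(self.component.columns[name])
        for index, value in self.overrides.get(name, {}).items():
            values[index] = value
        return values
```

For a dimer built from one `monomer`, `chain_a.set("color", 1, "red")` changes only chain A, whereas assigning `monomer.columns["color"][0] = "blue"` changes every chain that has not overridden atom 0. That is exactly the distinction between instance and component attributes that may confuse end users.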

Using parallel arrays for in-memory storage addresses two problems.  First, many of our operations (e.g., match and align) require collecting data from multiple items into parallel arrays so that the data can be processed efficiently by "hardened" packages such as NumPy.  Second, array storage, particularly of basic types such as 3-tuples of real numbers for coordinates, generally has a smaller memory footprint than storing the same data in individual objects.  The trade-off is that programmers find objects with attributes simpler to work with than objects that hold references and indices into arrays.  The extra references and indices may also offset some of the memory savings from using arrays in the first place.
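The trade-off can be seen in a minimal sketch comparing the two layouts for spheres. It uses the standard-library `array` module as a dependency-free stand-in for NumPy; `Sphere`, `SphereArrays`, `SphereRef` and `gather_centers` are hypothetical names. Object storage needs a Python-level loop to pack data for bulk processing, whereas parallel arrays are already packed; in exchange, object-style access to an array row needs a handle that carries a store reference and an index.

```python
from array import array


class Sphere:
    """Object storage: each sphere owns its attributes."""
    __slots__ = ("center", "radius")

    def __init__(self, center, radius):
        self.center = center  # (x, y, z) tuple
        self.radius = radius


def gather_centers(spheres):
    """Objects must be walked one by one to build a packed array for bulk work."""
    packed = array("d")
    for s in spheres:
        packed.extend(s.center)
    return packed


class SphereArrays:
    """Parallel-array storage: sphere i is centers[3*i:3*i+3] and radii[i]."""

    def __init__(self):
        self.centers = array("d")
        self.radii = array("d")

    def __len__(self):
        return len(self.radii)

    def add(self, center, radius):
        self.centers.extend(center)
        self.radii.append(radius)
        return SphereRef(self, len(self.radii) - 1)


class SphereRef:
    """Object-style handle onto one row of a SphereArrays store.

    The store reference and index it carries are the "extra references and
    indices" that eat into the memory savings of the arrays.
    """
    __slots__ = ("store", "index")

    def __init__(self, store, index):
        self.store = store
        self.index = index

    @property
    def center(self):
        i = 3 * self.index
        return tuple(self.store.centers[i:i + 3])

    @property
    def radius(self):
        return self.store.radii[self.index]

    @radius.setter
    def radius(self, value):
        self.store.radii[self.index] = value
```

Because `array.array` supports the buffer protocol, something like `numpy.frombuffer(store.centers).reshape(-1, 3)` should expose the packed centers to NumPy without copying, while the object layout must go through `gather_centers` first.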

We want to obtain quantitative and qualitative measures of the complexity and efficiency of using instancing and parallel arrays in a Python application.  To do so, we propose to build programs for manipulating graphical objects such as spheres, cylinders, triangular meshes and groups, where:
 - "component" attributes of the graphical objects are their geometric properties (e.g., coordinates and radii), and
 - "instance" attributes are other types of information (e.g., color and rendering style).
If any of the programs we build turns out to be useful, it could serve as the basis for further development, such as displaying the graphical objects in different front ends.
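One possible data model for the proposed test programs is sketched below, assuming geometry-only component classes and a thin instance wrapper holding color and rendering style; groups nest instances and other groups. All class names and the `style` values are hypothetical, and per-instance placement transforms are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Vec3 = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


# Component types: geometry only (coordinates and radii).
@dataclass
class Spheres:
    centers: List[Vec3]
    radii: List[float]


@dataclass
class Cylinders:
    ends: List[Tuple[Vec3, Vec3]]
    radii: List[float]


@dataclass
class TriangleMesh:
    vertices: List[Vec3]
    triangles: List[Tuple[int, int, int]]  # indices into vertices


Component = Union[Spheres, Cylinders, TriangleMesh]


# Instance: one use of a component plus non-geometric attributes.
@dataclass
class Instance:
    component: Component
    color: RGBA = (1.0, 1.0, 1.0, 1.0)
    style: str = "solid"  # hypothetical rendering-style name


@dataclass
class Group:
    children: List[Union["Group", Instance]] = field(default_factory=list)


def iter_instances(node):
    """Depth-first walk yielding every Instance under node."""
    if isinstance(node, Instance):
        yield node
    else:
        for child in node.children:
            yield from iter_instances(child)


def primitive_count(component):
    if isinstance(component, TriangleMesh):
        return len(component.triangles)
    return len(component.radii)


def drawn_primitives(scene):
    """Primitives a renderer would draw: shared geometry counts once per instance."""
    return sum(primitive_count(inst.component) for inst in iter_instances(scene))
```

With one two-sphere component used by two instances and a one-triangle mesh used by a third, `drawn_primitives` reports 5 primitives even though only 2 spheres and 1 triangle are stored.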

We propose to:

 1. Build two programs that handle only component data, one using parallel arrays and the other not.  We can then measure the memory footprint of both programs as well as the speed of accessing and updating data (particularly for large data sets).  This will provide an upper bound on the benefits of using parallel arrays.

 2. Design APIs for adding instance attributes on top of these programs.  We will document these APIs and discuss them with members of the Chimera team.

 3. Implement each approved API and again measure memory usage and speed.
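For steps 1 and 3, a minimal measurement harness might look like the following sketch. It assumes `tracemalloc` for net memory (bytes still allocated after building the data) and `timeit` for update speed, and uses `array.array` in place of NumPy; the sizes and function names are hypothetical. In pure Python, reading and writing `array.array` elements one at a time boxes each value as a float object, so the loop timing here need not favor arrays; the speed advantage sought is expected from passing whole arrays to vectorized code such as NumPy.

```python
import random
import timeit
import tracemalloc
from array import array


class Sphere:
    """Ordinary object storage with per-instance attributes (no __slots__)."""

    def __init__(self, x, y, z, r):
        self.x, self.y, self.z, self.r = x, y, z, r


def build_objects(n, rng):
    return [Sphere(rng.random(), rng.random(), rng.random(), rng.random())
            for _ in range(n)]


def build_arrays(n, rng):
    # Parallel arrays: 3 doubles of coordinates plus 1 double of radius per sphere.
    coords = array("d", (rng.random() for _ in range(3 * n)))
    radii = array("d", (rng.random() for _ in range(n)))
    return coords, radii


def traced_bytes(builder, n, seed=0):
    """Return (net bytes still allocated after building n spheres, the data)."""
    rng = random.Random(seed)
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        data = builder(n, rng)
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before, data


def time_update(n=100_000, repeat=3):
    """Best-of-`repeat` seconds to scale every radius once, per storage scheme."""
    objs = build_objects(n, random.Random(1))
    _, radii = build_arrays(n, random.Random(1))

    def scale_objects():
        for s in objs:
            s.r *= 1.001

    def scale_arrays():
        for i in range(len(radii)):
            radii[i] *= 1.001

    t_obj = min(timeit.repeat(scale_objects, number=1, repeat=repeat))
    t_arr = min(timeit.repeat(scale_arrays, number=1, repeat=repeat))
    return t_obj, t_arr


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):  # hypothetical data-set sizes
        obj_bytes, _ = traced_bytes(build_objects, n)
        arr_bytes, _ = traced_bytes(build_arrays, n)
        t_obj, t_arr = time_update(n)
        print(f"n={n}: objects {obj_bytes / n:.0f} B/sphere, {t_obj:.4f} s; "
              f"arrays {arr_bytes / n:.0f} B/sphere, {t_arr:.4f} s")
```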

The test data for measurement will be randomly generated at several different sizes, with varying compositions of graphical object types.  For component attributes, we will use coordinates and radii only.  For instance attributes, we will use colors.
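The test data could be produced by a seeded generator so that every program and every API variant is measured on identical inputs. In this sketch, `make_test_data`, the default mix, the coordinate extent and the radius ranges are all assumptions. Each primitive carries component data (coordinates, plus a radius for spheres and cylinders) and one instance attribute (an RGBA color).

```python
import random

# Hypothetical default composition: equal shares of each primitive type.
DEFAULT_MIX = {"spheres": 1 / 3, "cylinders": 1 / 3, "triangles": 1 / 3}


def _point(rng, extent):
    return (rng.uniform(0.0, extent), rng.uniform(0.0, extent),
            rng.uniform(0.0, extent))


def _color(rng):
    return (rng.random(), rng.random(), rng.random(), 1.0)  # opaque RGBA


def make_test_data(n_items, mix=DEFAULT_MIX, seed=0, extent=100.0):
    """Return n_items random primitives split according to mix (fractions by type).

    Component data: coordinates (and radii for spheres and cylinders).
    Instance data: one RGBA color per primitive.
    """
    data = {"spheres": [], "cylinders": [], "triangles": []}
    unknown = set(mix) - set(data)
    if unknown:
        raise ValueError(f"unknown primitive types: {sorted(unknown)}")
    rng = random.Random(seed)  # seeded, so every run sees the same data
    kinds = list(mix)
    counts = {kind: int(n_items * mix[kind]) for kind in kinds}
    counts[kinds[-1]] += n_items - sum(counts.values())  # absorb rounding
    for _ in range(counts.get("spheres", 0)):
        data["spheres"].append((_point(rng, extent), rng.uniform(0.5, 2.0), _color(rng)))
    for _ in range(counts.get("cylinders", 0)):
        ends = (_point(rng, extent), _point(rng, extent))
        data["cylinders"].append((ends, rng.uniform(0.2, 1.0), _color(rng)))
    for _ in range(counts.get("triangles", 0)):
        corners = (_point(rng, extent), _point(rng, extent), _point(rng, extent))
        data["triangles"].append((corners, _color(rng)))
    return data
```

Calling it as `make_test_data(n, {"spheres": 0.6, "cylinders": 0.3, "triangles": 0.1}, seed=n)` for several values of `n` gives reproducible data sets of different sizes and compositions.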

The results will be reported in four weeks.