Chimera2 Sessions
Background
From the user's perspective, Chimera2 is an application with a suite of tools. In any given session, there may be some user data (e.g., molecules and volumes) and some active tools (e.g., Reply Log and Command Line). One required feature of Chimera2 is the ability to save the current state of a session and then resume the session in the same state at a future time. This sounds simple enough, but some details need to be explicitly enumerated, as not everyone will agree on what constitutes "the state of the session".
From the developer's side, we can distinguish two types of data. Some data are the same for all sessions; for example, start-up preferences and the list of installed tools. Some data differ between sessions; for example, the list of _active_ tools and the currently opened data files. Let's call the former "initial data" and the latter "session data". The Chimera2 "state of the session" consists only of "session data". This means that if the user (a) saves a session, and then (b) changes a start-up preference (e.g., background color) that was not explicitly overridden in the session (e.g., by changing the background color), then (c) on resumption, the session will use the new background color. Clearly, this may be unexpected for the user. We will need to define which data normally treated as "initial data" should always be saved as "session data" so that these surprises are minimized. (The answer is not "all of them" because the list of installed tools may change, and additions and deletions cannot be ignored.)
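The precedence described above can be sketched as a two-level lookup. This is only an illustration, not Chimera2 code: the names `initial_data`, `session_data`, and `background_color` are hypothetical, and Python's `collections.ChainMap` merely stands in for whatever lookup mechanism Chimera2 ends up using.

```python
from collections import ChainMap

# Hypothetical "initial data": shared by all sessions (start-up preferences).
initial_data = {"background_color": "black"}

# Hypothetical "session data": only values explicitly set during the session.
session_data = {}

# (a) The session is saved; only session data is written out.
saved_session = dict(session_data)

# (b) Later, the user changes the start-up preference.
initial_data["background_color"] = "white"

# (c) On resumption, lookups consult session data first and fall back to
# initial data. The saved session has no override, so the new preference wins.
resumed = ChainMap(saved_session, initial_data)
color_without_override = resumed["background_color"]

# Had the session explicitly set the color, the saved value would win instead.
resumed_with_override = ChainMap({"background_color": "gray"}, initial_data)
color_with_override = resumed_with_override["background_color"]
```

Under these assumptions, a preference that should never surprise the user would have to be copied into the session data at save time, which is the decision the paragraph above leaves open.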
Chimera2 data can be considered as a three-layer organization: tools, managers, and raw data. The bottom layer is the raw data, such as atomic coordinates and transformation matrices; raw data are typically read from a data source (e.g., data file or network connection) or defined by the user (e.g., orientation or location in space). These data are often used by more than one tool (particularly if we consider the core as a tool). To facilitate sharing, we introduce the manager layer. The managers provide two functions: (a) defining APIs for accessing data, and (b) providing control points for saving and restoring state. Each manager controls a non-overlapping chunk of raw data. The top layer is the tools. Each tool maintains its own private state and accesses and manipulates shared data through manager APIs. Both the manager and tool layers must implement an API for saving and restoring snapshots and user session files. (Snapshots are states saved in memory or temporary files, while user session files are saved for the user in (somewhat) readable form; snapshots may need to be stored in a more compact form to minimize resource usage.)
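As a rough illustration of the three layers, the sketch below uses hypothetical names throughout (`TransformManager`, `ScaleTool`, `take_snapshot`, `restore_snapshot`); none of them are actual Chimera2 APIs. The manager owns one chunk of raw data and exposes it through accessor methods, the tool keeps private state and touches shared data only through the manager, and both expose the same in-memory snapshot interface.

```python
class TransformManager:
    """Hypothetical manager: controls one chunk of raw data (a 2x2 matrix)."""

    def __init__(self):
        self._matrix = [[1.0, 0.0], [0.0, 1.0]]  # raw data layer

    # (a) API for accessing the shared data
    def get_matrix(self):
        return [row[:] for row in self._matrix]

    def set_matrix(self, matrix):
        self._matrix = [list(row) for row in matrix]

    # (b) Control point for saving and restoring state
    def take_snapshot(self):
        return {"matrix": self.get_matrix()}

    def restore_snapshot(self, snapshot):
        self.set_matrix(snapshot["matrix"])


class ScaleTool:
    """Hypothetical tool: private state plus shared data reached via the manager."""

    def __init__(self, manager):
        self._manager = manager
        self.factor = 2.0  # private tool state

    def apply(self):
        # Shared data is read and written only through the manager API.
        matrix = self._manager.get_matrix()
        scaled = [[value * self.factor for value in row] for row in matrix]
        self._manager.set_matrix(scaled)

    def take_snapshot(self):
        return {"factor": self.factor}

    def restore_snapshot(self, snapshot):
        self.factor = snapshot["factor"]


manager = TransformManager()
tool = ScaleTool(manager)

# Take an in-memory snapshot of both layers, then change everything.
snapshots = [(p, p.take_snapshot()) for p in (manager, tool)]
tool.factor = 3.0
tool.apply()

# Restoring the snapshots returns manager and tool to the saved state.
for participant, snapshot in snapshots:
    participant.restore_snapshot(snapshot)
```

Because the tool never holds the matrix directly, only the manager needs to know how to save it, which is the point of giving each manager a non-overlapping chunk of raw data.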
Steps in Saving a Session
The general idea is that each manager and tool is saved in several phases:
- The first phase (a) identifies data controlled by the manager or tool, and (b) prepares for returning identifiers for data references used by other managers and tools (for saving in user session files and recovering data references during restoration). The first phase explicitly ignores references to data not under manager or tool control.
- The second phase converts external data references into identifiers and associates the identifiers with internal data.
- The third phase combines the data from the first two phases into a single session-friendly object. This object is "the state" for the manager or tool.
- If the state needs to be written out into a user session file, then a fourth phase serializes "the state" into a form suitable for file storage.
The actual steps for saving a session are:
- Loop over all managers and call "save_phase1".
- Each manager should identify all the raw data under its control that need to be saved.
- Each manager should prepare to return session-friendly references to manager-controlled data. For example, the atomic structure manager should construct whatever data structures are needed so that it can return simple identifiers (like integers or strings) for data that other managers may reference (e.g., a collection of atoms). Note that a manager does not need to explicitly assign identifiers to all the data it controls; it just needs to be ready to quickly return such identifiers when queried. It will also need to keep track of the reference identifiers given out because it must be able to convert the identifiers back to data references during session restore.
- References to data controlled by other managers are ignored in this pass.
- Loop over all managers and call "save_phase2".
- Each manager should identify all raw data controlled by another manager and query for identifiers for those data.
- During phase 2, each manager that is queried for identifiers needs to keep track of any data structures needed for converting identifiers back to data references. These data structures will need to be saved as part of the state.
- Repeat the "save_phase1" loop for tools.
- Like managers, tools will have some data they control (private tool state).
- Repeat the "save_phase2" loop for tools.
- Like managers, tools will have references to shared data (identifiers obtained by querying managers).
- Loop over all managers and tools and call "save_phase3".
- Each manager or tool should convert all data it controls, all data reference identifiers and all auxiliary data for back-converting identifiers to data references into a compact form and return the compacted data as "the state".
- The compacted data will also need to include the manager or tool that created the data and a version number. To facilitate restoration, this must be done the same way for all managers and tools. (The method can change between user session file versions, but must be the same for any given version.)
- The compacted data may be used as part of snapshots.
- Loop over all managers and tools and call "save_serialize".
- The return value from "save_phase3" is the single argument.
- The manager or tool converts the compacted data from "save_phase3" into a serialized form suitable for saving in a user session file and returns the serialized data.
- Write out the user session file.
- Write out the session header, including the version number and magic number. This step may evolve over time, which is why a version number is written into the user session file. For example, we might add a checksum or encrypt the file. The global version number determines how to read the rest of the file on restore. Note that each manager and tool has a separate version number that determines how to interpret its serialized data.
- Write out all serialized manager data. Each chunk of data should be marked so that it can be read in independently and processed during restoration. The order should not matter because we will restore in phases.
- Write out all tool data. It probably should not matter whether tool and manager data are interspersed in the user session file, but it seems neater to segregate them.
- Write out session epilog if necessary.
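The save steps above can be sketched as a small driver. Everything here is an assumption made for illustration: the classes `Atom`, `AtomManager`, and `BondManager`, the method `identifier_for`, the magic string, and the one-JSON-chunk-per-line file layout are hypothetical, and only the phase method names come from the text. The tool loops are included but run over an empty list to keep the sketch short.

```python
import json

MAGIC = "HYPOTHETICAL-MAGIC"  # placeholder, not a real Chimera2 magic number


class Atom:
    def __init__(self, element):
        self.element = element


class AtomManager:
    name, version = "atom_manager", 1

    def __init__(self, atoms):
        self.atoms = atoms

    def save_phase1(self):
        # Identify controlled data and get ready to hand out identifiers.
        self._index_of = {id(atom): i for i, atom in enumerate(self.atoms)}
        self._handed_out = set()

    def identifier_for(self, atom):
        # Queried by other managers and tools during their phase 2.
        identifier = self._index_of[id(atom)]
        self._handed_out.add(identifier)
        return identifier

    def save_phase2(self):
        pass  # no references to data controlled by other managers

    def save_phase3(self):
        data = {"elements": [a.element for a in self.atoms],
                "referenced": sorted(self._handed_out)}
        return (self.name, self.version, data)

    def save_serialize(self, state):
        return json.dumps(state)


class BondManager:
    name, version = "bond_manager", 1

    def __init__(self, pairs, atom_manager):
        self.pairs = pairs
        self._atom_manager = atom_manager

    def save_phase1(self):
        # External atom references are only marked for later processing.
        self._pending = list(self.pairs)

    def save_phase2(self):
        lookup = self._atom_manager.identifier_for
        self._pair_ids = [[lookup(a), lookup(b)] for a, b in self._pending]

    def save_phase3(self):
        return (self.name, self.version, {"pairs": self._pair_ids})

    def save_serialize(self, state):
        return json.dumps(state)


def save_session(managers, tools):
    for m in managers:
        m.save_phase1()
    for m in managers:
        m.save_phase2()
    for t in tools:
        t.save_phase1()
    for t in tools:
        t.save_phase2()
    participants = list(managers) + list(tools)
    states = [p.save_phase3() for p in participants]
    chunks = [p.save_serialize(s) for p, s in zip(participants, states)]
    header = json.dumps({"magic": MAGIC, "version": 1})
    return "\n".join([header] + chunks)


atoms = [Atom("Zn"), Atom("N"), Atom("O")]
atom_manager = AtomManager(atoms)
bond_manager = BondManager([(atoms[0], atoms[2])], atom_manager)

# The bond manager is listed first on purpose: every manager finishes
# phase 1 before any manager starts phase 2, so the order does not matter.
session_text = save_session([bond_manager, atom_manager], [])
```

Running all of phase 1 before any of phase 2 is what lets a manager query another manager for identifiers without caring which of them was visited first.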
Steps in Restoring a Session
- Read session header.
- Make sure magic number matches.
- Identify the session version number. This determines how we read the rest of the file (e.g., verifying a checksum or decrypting). See the session header step under "Steps in Saving a Session" above.
- Read serialized data chunks.
- Identify the data creator and version. See the description of "save_phase3" under "Steps in Saving a Session" above.
- Call "restore_phase1" of creator and pass in serialized data and version number.
- Each creator should parse the serialized data and restore all data under its control. Identifiers to data from another manager should be cached for processing in a subsequent phase.
- Keep track of the managers and tools created.
- For all created managers and tools, call "restore_phase2".
- Each manager or tool will convert identifiers back into data references by querying the controlling manager.
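A minimal sketch of the two restore phases follows. The classes, the `REGISTRY` lookup table, the magic string, and the one-JSON-chunk-per-line layout are all hypothetical. As a simplification, the driver decodes each chunk itself and passes the decoded payload to "restore_phase1", whereas the text above has each creator parse its own serialized data.

```python
import json

MAGIC = "HYPOTHETICAL-MAGIC"  # placeholder, not a real Chimera2 magic number


class AtomManager:
    def __init__(self):
        self.atoms = []

    def restore_phase1(self, data, version):
        # Restore all data under this manager's control.
        self.atoms = list(data["elements"])

    def restore_phase2(self, created):
        pass  # no external references to resolve

    def atom_for(self, identifier):
        return self.atoms[identifier]


class BondManager:
    def __init__(self):
        self.pairs = []
        self._pending = []

    def restore_phase1(self, data, version):
        # Identifiers for atoms are cached; the atoms may not exist yet.
        self._pending = data["pairs"]

    def restore_phase2(self, created):
        atom_manager = created["atom_manager"]
        self.pairs = [(atom_manager.atom_for(a), atom_manager.atom_for(b))
                      for a, b in self._pending]
        self._pending = []


# Hypothetical mapping from creator name to the class that restores it.
REGISTRY = {"atom_manager": AtomManager, "bond_manager": BondManager}


def restore_session(text):
    lines = text.splitlines()
    header = json.loads(lines[0])
    if header["magic"] != MAGIC:
        raise ValueError("not a session file")
    if header["version"] != 1:
        raise ValueError("unsupported session version")
    created = {}  # keeps track of the managers and tools created
    for line in lines[1:]:
        creator, version, data = json.loads(line)
        participant = REGISTRY[creator]()
        participant.restore_phase1(data, version)
        created[creator] = participant
    for participant in created.values():
        participant.restore_phase2(created)
    return created


# The bond manager chunk comes first even though it refers to atoms,
# which works because references are resolved only in phase 2.
session_text = "\n".join([
    json.dumps({"magic": MAGIC, "version": 1}),
    json.dumps(["bond_manager", 1, {"pairs": [[0, 2]]}]),
    json.dumps(["atom_manager", 1, {"elements": ["Zn", "N", "O"]}]),
])
restored = restore_session(session_text)
```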
Example of Session Saving and Restoring
The steps required in saving and restoring user session files are best illustrated by an example. For this example, we will assume the following:
- There is one model open and it has four metal coordination bonds.
- In the tool layer, there are two tools: the Chimera2 core and a distance measurement tool. Two distances are being measured.
- In the manager layer, there are two managers: the atomic structure manager and the pseudobond manager.
- The atomic structure manager keeps track of molecules, residues, atoms, transformation matrices, etc. The pseudobond manager keeps track of a collection of atom pairs.
Save Session
- Loop over all managers and call "save_phase1".
- Pseudobond manager "save_phase1" is called.
- Pseudobond manager records that there are two pseudobond groups: the metal coordination group and the distance measurement group. There are 4 bonds in the metal coordination group, or 8 atoms. There are 2 distances, or 4 atoms. The two groups are marked for saving; the 8+4 atoms are marked for later processing.
- The two groups already have unique identifiers, so we do not need to do anything more.
- Atomic structure manager "save_phase1" is called.
- Atomic structure manager records that one molecule needs to be saved. There are no external data references.
- A model map is prepared, with the key being the model id and the value being the model itself. The model map will be saved as part of the state. Empty atom, residue and chain maps are made. They are filled in when a query for an identifier for an instance of one of those types is made. These maps will also be saved as part of the state.
- Loop over all managers and call "save_phase2".
- Pseudobond manager "save_phase2" is called.
- Pseudobond manager queries the atomic structure manager (which controls atoms and atom collections) for identifiers for the 8+4 atoms in the two pseudobond groups. The atom identifiers are saved with the corresponding pseudobond groups.
- Atomic structure manager "save_phase2" is called.
- Nothing happens. There are no external references in atomic structure manager data.
- Repeat the "save_phase1" loop for tools.
- Core "save_phase1" is called.
- Not sure what the "private state" for core is yet XXX.
- Distance measurement tool "save_phase1" is called.
- Information about each distance measurement is saved (e.g., whether the distance is displayed graphically). If there are atom references in the tool private state, they are marked for later processing.
- Repeat the "save_phase2" loop for tools.
- Core "save_phase2" is called.
- Not sure what the "private state" for core is yet XXX.
- Distance measurement tool "save_phase2" is called.
- If there are atom references in the tool private state, they are converted to identifiers by querying the atomic structure manager. The identifiers are stored with the distance measurements.
- Loop over all managers and tools and call "save_phase3".
- Pseudobond manager "save_phase3" is called.
- The data is stored as a list of pseudobond groups. Each pseudobond group is stored as a tuple with the group name and a list of atom identifiers. No compaction is done.
- The "state" for the pseudobond manager is a 3-tuple of (pseudobond manager identifier, current pbm version, and data from previous step).
- Atomic structure manager "save_phase3" is called.
- The atomic data is converted to a list of molecules. The molecule map is converted to a dictionary. The atom map is the only other non-empty map, because the pseudobond manager and distance measurement tool queried for atom identifiers during their "save_phase2" calls. The atom map is converted to a dictionary.
- The "state" for the atomic structure manager is a 3-tuple of (atomic structure manager identifier, current asm version, and data and maps from previous step).
- Core "save_phase3" is called.
- Not sure what conversion is needed yet XXX.
- Distance measurement tool "save_phase3" is called.
- The distance measurement data is stored as a list. Each distance measurement is stored as a tuple.
- The "state" for the distance measurement tool is a 3-tuple of (distance measurement tool identifier, current dmt version, and tuple from previous step).
- Loop over all managers and tools and call "save_serialize".
- Pseudobond manager "save_serialize" is called.
- The state returned by pseudobond manager "save_phase3" is converted to JSON.
- Atomic structure manager "save_serialize" is called.
- The state returned by atomic structure manager "save_phase3" is converted to JSON.
- Core "save_serialize" is called.
- The state returned by core "save_phase3" is converted to JSON.
- Distance measurement tool "save_serialize" is called.
- The state returned by distance measurement tool "save_phase3" is converted to JSON.
- Write out the user session file.
- Write header.
- Write pseudobond manager serialized data.
- Write atomic structure manager serialized data.
- Write core serialized data.
- Write distance measurement tool serialized data.
- Loop over all managers and tools and call "save_finish".
- Pseudobond manager "save_finish" is called.
- Nothing happens.
- Atomic structure manager "save_finish" is called.
- Molecule, atom, residue, etc. maps are deleted.
- Core "save_finish" is called.
- Not sure what needs to happen here XXX.
- Distance measurement tool "save_finish" is called.
- Nothing happens.
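The pseudobond manager's part of this example can be traced with concrete data. The sketch is hypothetical throughout: `IdentifierMap`, `Atom`, the atom names, and the use of small integers as identifiers are assumptions, and the twelve atoms are assumed to be distinct. It shows the atom map starting empty and being filled only when identifiers are requested, followed by the 3-tuple state and its JSON form.

```python
import json


class IdentifierMap:
    """Hands out small integer identifiers on demand and remembers them."""

    def __init__(self):
        self._id_by_key = {}      # id(object) -> identifier
        self._object_by_id = {}   # identifier -> object, for back-conversion

    def identifier_for(self, obj):
        key = id(obj)
        if key not in self._id_by_key:
            identifier = len(self._id_by_key)
            self._id_by_key[key] = identifier
            self._object_by_id[identifier] = obj
        return self._id_by_key[key]

    def __len__(self):
        return len(self._object_by_id)


class Atom:
    def __init__(self, name):
        self.name = name


# One model; only the 12 atoms involved in pseudobonds are modeled here.
atoms = [Atom(f"A{i}") for i in range(12)]

# Phase 1, pseudobond manager: two groups, atom references left pending.
groups = {
    "metal coordination": [(atoms[i], atoms[i + 1]) for i in range(0, 8, 2)],
    "distance measurement": [(atoms[8], atoms[9]), (atoms[10], atoms[11])],
}

# Phase 1, atomic structure manager: the atom map starts out empty.
atom_map = IdentifierMap()
empty_size = len(atom_map)

# Phase 2, pseudobond manager: references become identifiers on demand.
group_data = [
    (name, [atom_map.identifier_for(atom) for pair in pairs for atom in pair])
    for name, pairs in groups.items()
]

# Phase 3: the state is (creator identifier, version, data).
PBM_VERSION = 1  # hypothetical version number
state = ("pseudobond manager", PBM_VERSION, group_data)

# Serialization: JSON turns the tuples into lists.
serialized = json.dumps(state)
```

Each group is stored with a flat list of atom identifiers, so consecutive identifiers form one pseudobond; a list of pairs would work equally well under these assumptions.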
Restore Session
1. Read session header.
1a. Assume that magic number and version match.
2. Read serialized data chunks.
2a. Read pseudobond manager data and call "restore_phase1".
2a1. Two pseudobond groups are created but atom collections are not initialized yet. The atom identifiers are saved for later processing.
2b. Read atomic structure manager data and call "restore_phase1".
2b1. One molecule is created. Is the metal coordination pseudobond group created yet XXX?
2c. Read core data and call "restore_phase1".
2d. Read distance measurement tool data and call "restore_phase1".
2d1. Two distance measurements are created but atom collections (if used) are not created yet.
3. For all created managers and tools, call "restore_phase2".
3a. Call pseudobond manager "restore_phase2".
3a1. Atom collections are initialized from atom identifiers by querying atomic structure manager.
3b. Call atomic structure manager "restore_phase2".
3b1. Nothing happens because no external references are present.
3c. Call core "restore_phase2".
3c1. Not sure what happens here XXX.
3d. Call distance measurement tool "restore_phase2".
3d1. Atom collections (if used) are initialized from atom identifiers by querying atomic structure manager.