Version 8 (modified by Greg Couch, 10 years ago)

--

Chimera2 Sessions -- Take 2

Background

From the user's perspective, Chimera2 is an application with a suite of tools. In any given session, there may be some user data (e.g., molecules and volumes) and some active tool GUIs (e.g., Reply Log and Command Line). One required feature of Chimera2 is the ability to save the current state of the session and then resume the session in the same state at a future time. This sounds simple enough, but some details need to be explicitly enumerated, as not everyone will agree on what constitutes "the state of the session".

From the developer's side, we can distinguish two types of data. Some data are the same for all sessions; for example, start-up preferences and the list of installed tools. Some data differ between sessions; for example, the list of active tool GUIs and the currently opened data files. Let's call the former "initial data" and the latter "session data". Chimera2's "state of the session" consists only of "session data". This means that if the user (a) saves a session, and then (b) changes a start-up preference (e.g., background color) that was not explicitly overridden in the session (e.g., by changing the background color), then (c) on resumption, the session will use the new background color. Clearly, this may be unexpected for the user. We will need to define which data normally treated as "initial data" should always be saved as "session data" so that these surprises are minimized. (The answer is not "all of them" because the list of installed tools may change, and additions and deletions cannot be ignored.)

(Chimera2 data can be categorized as raw data, GUI data, and organizational data. Expound.)

What is a Tool?

The word tool is used in two different ways in Chimera2:

  1. A chunk of functionality (code) from the tool shed
  2. GUIs that the user can start via a menu

In this document, tool, by itself, refers to the code.

Data Types

All session data is either simple (int, float, string, list, dict, etc.) or an instance of a class provided by a tool.

Protocols

For discussion purposes, any method we choose for session files will need to follow these protocols (the solution will have more details):

Saving protocol:

  1. visit all session state and record:
    • what tool is responsible for the state
    • what data it refers to
      • only referred-to data needs to be named
  2. save the list of used tools
    • need tool's state version, not tool version
  3. serialize state
    • keep track of named data that was saved
  4. confirm that all referred to data was saved
    • if not, then the session is incomplete/corrupt

Restoring protocol:

  1. Read the list of used tools
    • if installed tools are insufficient, then give user option to cancel or load tools
  2. deserialize state inside block/unblock of triggers

The Problem

The problem is how to serialize and deserialize the session state. Ideally, when restoring a session, all data that is referenced by some other data has already been restored. Therefore all of the dependencies need to be known by the saving code, and there can be no circular dependencies. This avoids two-phase initialization, which is especially important for C++ objects.

Where is Session State?

All session state is accessible via a Session object. The session object has a registry of which of its attributes contain savable state and conform to the State API. Attributes may contain nested data, and that data can use the same State API.
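
The registry described above could be sketched roughly as follows. This is only an illustration: the names `State`, `take_snapshot`, `register_state_attribute`, and `UserColors` are assumptions made for this example, not a confirmed Chimera2 API.

```python
from abc import ABC, abstractmethod


class State(ABC):
    """Hypothetical State API: anything savable can produce a snapshot."""

    @abstractmethod
    def take_snapshot(self):
        """Return simple data (ints, lists, dicts, ...) or other State objects."""


class UserColors(State):
    """Example session attribute holding user-defined colors."""

    def __init__(self):
        self.colors = {}

    def take_snapshot(self):
        return dict(self.colors)


class Session:
    """Holds all session state and records which attributes are savable."""

    def __init__(self):
        self._state_attributes = []  # names of attributes with savable state

    def register_state_attribute(self, name, value):
        # Only objects that conform to the (assumed) State API may be registered.
        if not isinstance(value, State):
            raise TypeError(f"{name!r} does not conform to the State API")
        setattr(self, name, value)
        self._state_attributes.append(name)

    def state_attributes(self):
        """Return (name, object) pairs for every registered attribute."""
        return [(name, getattr(self, name)) for name in self._state_attributes]


session = Session()
session.register_state_attribute("user_colors", UserColors())
session.user_colors.colors["salmon"] = (250, 128, 114)
```

Saving code would then start from `session.state_attributes()` and never need to know about attributes that hold no session state.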

Examples of State

Session attributes:

  • open models
  • open tool GUIs
  • running tasks
  • scenes
  • user colors
  • user colormaps
  • view (camera, lighting, clipping, etc.)
  • selections (TODO)
  • tool non-GUI data

Nested data:

  • Tool GUI data
  • molecules: atoms/bonds/residues & graphical state
  • pseudobonds
  • surfaces
  • generic 3D graphics (e.g., STL)
  • atom/bond annotations

The non-GUI state of a tool should be kept in a session attribute or model rather than an instance of the tool's GUI class.

Issues

  • Can we provide a simple state API?
    • just hand off object
  • How to give dependencies for ordering session data?
    • naive data with back references are circular
    • may need to give before and after dependencies
    • how to handle dependencies between models?
  • What is the granularity?
  • Can we guarantee non-circular references?

Possible Solution

Two competing solutions:

  1. allow circular references via two-phase initialization
    • like Python's pickle
  2. Specify dependencies and save/restore in the right order

Example

  • molecular data needs to be saved early
  • molecular data is in a model
  • session gets a model from its list of models
  • a tool may provide a new model that depends on molecular data and tool non-GUI state
  • so some models will need to be saved before others
  • some tool non-GUI state may need to be saved before the associated model
  • does that mean that tool non-GUI state is saved first?
    • note that the tool GUI state is saved separately

How the data is organized in the session does not match the order in which the data needs to be saved.

The Solution

To allow the data to be serialized in session order instead of dependency order, allow for circular dependencies. That mandates using two-phase initialization when deserializing a session.

We really, really don't want to use two-phase initialization for C++ objects. So don't. As long as the C++ objects are restored all together (i.e., atomic structures and pseudobonds), we can avoid two-phase initialization for them. If the C++ objects refer to Python objects, that part would need to follow a two-phase protocol.
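
For the Python side, a two-phase restore might look like the following minimal sketch. The `Node` class, its `restore_state` method, and the layout of `saved` are all hypothetical and exist only to show the idea: first create every object unpopulated, then fill in state once every reference can be resolved.

```python
class Node:
    """Hypothetical Python-side object that may refer back to its parent."""

    def __init__(self):
        # Phase 1: construct with no references resolved.
        self.label = None
        self.parent = None
        self.children = []

    def restore_state(self, data, objects):
        # Phase 2: every named object already exists, so references
        # (including circular ones) can be resolved by name.
        self.label = data["label"]
        self.parent = objects.get(data["parent"])  # None when there is no parent
        self.children = [objects[name] for name in data["children"]]


# Saved data in arbitrary (session) order; "a" and "b" refer to each other.
saved = {
    "b": {"label": "child", "parent": "a", "children": []},
    "a": {"label": "root", "parent": None, "children": ["b"]},
}

# Phase 1: create every object, unpopulated.
objects = {name: Node() for name in saved}

# Phase 2: fill in state now that all references can be resolved.
for name, data in saved.items():
    objects[name].restore_state(data, objects)
```

Note that "b" is restored before "a" even though it refers to "a"; that is exactly what the second phase makes possible.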

On the Python side, it is possible to provide an API that takes simple data and implements all of the individual save and restore steps.

Avoiding Two-phase Initialization

If a directed graph of the data dependencies has no cycles, then it can be serialized in an order such that when it is deserialized, all of the referenced data will exist before it is needed. That requirement, we assert, holds for anything in the Chimera2 core. And we can detect when it is violated, and report to the user what the offending tool or tools are.
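
As an illustration of that ordering, here is a small sketch using the standard library's `graphlib` module (Python 3.9 or later). The function name `restore_order` and the example object names are made up for this sketch; mapping the objects on a cycle back to their tools is left out.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(dependencies):
    """Return names ordered so each item follows everything it refers to.

    dependencies maps a name to the set of names it refers to.
    Raises ValueError naming the objects on a cycle, if there is one.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        cycle = err.args[1]  # list of nodes forming the detected cycle
        raise ValueError("circular dependency: " + " -> ".join(cycle)) from err


# Hypothetical session contents: each entry lists what it refers to.
deps = {
    "molecule": set(),
    "pseudobonds": {"molecule"},
    "surface": {"molecule"},
    "distance_tool_state": {"pseudobonds"},
}
order = restore_order(deps)
```

Serializing in `order` means that, on restore, "molecule" is always rebuilt before anything that refers to it.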

Saving Protocol

  1. Discovery
    • All state is reachable from the session object
    • The initial set of objects to save in the session is the session's registered state attributes
    • Objects to save are examined for references to non-simple objects, and those objects are added to the set of savable objects, until all objects are examined
      • These are the objects that need to be named symbolically
      • Simple objects are ints, lists, dicts, etc.
    • While examining savable objects:
      • dependency graph is constructed
      • the set of needed tools is constructed
    • Detect "all" cycles, so user can take corrective action if there are cycles
  2. Serializing
    • The list of needed tools and their session state versions
    • The data in sorted order

Restoring Protocol

  1. Get list of needed tools
    • If installed tools are insufficient, allow user to install them
      • If they are not installed, don't restore data from those tools, since restoring it might fail
  2. Clear main session
  3. Restore into main session while blocking triggers

Tool Classes

  • All classes exported by tools need to have a 'tool_info' attribute, a ToolInfo instance so the session code knows which tool is needed to reconstruct it
  • ToolInfo needs the State version in addition to the tool version
  • When registering a tool's commands, it needs the ToolInfo instance (just like starting a tool's GUI) to be able to set the 'tool_info' attribute if there is no GUI
  • The tool's module needs a 'get_class' function to return the class associated with a name
    • for security, and to allow renaming
  • Tool classes have an alternate initializer if attributes are not simple (like pickle)
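
To make these requirements concrete, here is a hedged sketch of what a tool module might contain. The fields of `ToolInfo`, the `DistanceMonitor` class, and the `take_snapshot` and `restore_snapshot` method names are assumptions; the text above only fixes the `tool_info` attribute and the `get_class` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolInfo:
    """Assumed shape: the text only says it carries both versions."""

    name: str
    tool_version: str
    state_version: int


class DistanceMonitor:
    """Hypothetical class exported by a tool."""

    tool_info = None  # set when the tool's GUI starts or commands are registered

    def __init__(self, atom_names):
        self.atom_names = list(atom_names)

    def take_snapshot(self):
        return {"atom_names": self.atom_names}

    @classmethod
    def restore_snapshot(cls, data):
        # Alternate initializer used when restoring from a session.
        return cls(data["atom_names"])


# Only names listed here can be reconstructed from a session file. The
# indirection also lets a class be renamed without breaking old sessions.
_CLASSES = {
    "DistanceMonitor": DistanceMonitor,
    "DistMonitor": DistanceMonitor,  # hypothetical older name
}


def get_class(name):
    """Return the class registered under name, or None if it is unknown."""
    return _CLASSES.get(name)


DistanceMonitor.tool_info = ToolInfo("distance", "1.2", state_version=1)
```

Because the session code only ever asks `get_class` for a name, a session file cannot cause arbitrary classes to be instantiated, which is the main security concern with unrestricted pickle-style loading.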