
Version 1 (modified by goddard, 15 years ago)

--

Plans for Electron Microscopy and Molecular Assemblies

Tom Goddard
December 3, 2010.

Possible projects related to electron microscopy (EM) and molecular assemblies for the next RBVI NCRR 5-year grant proposal, to be submitted in May 2011.

Will talk about 10 project ideas that fall in 2 broad categories:

  • Dissemination: Communicating Analysis Results and Analysis Methods
  • Technology: Advances in Visualization and Analysis Methods

Conflicting goals: Impact, Fun, Fundable

  • Dissemination projects will have higher impact than technology projects (short term and long term). In the electron microscopy / molecular assembly research community, technology is abundant; know-how and data are scarce.
  • It is more fun to work on technology projects.
  • Grant reviewers are more impressed by technology projects.

List of Project Ideas

Dissemination: Communicating Analysis Results and Analysis Methods

  • Enable researchers to publish their models, analysis and data on the web such that other researchers can build on their work (computationally useful data). Map symmetry, segmentations, geometric models (chromatin), operational file formats, EMDB/ViPERdb/PDB/CCDB collaboration, web 2.0, lab notebook, 3d browser display (WebGL).
  • Extend the community of Chimera developers. Programmer documentation, simpler APIs, training through collaborations and workshops.
  • Video documentation for Chimera usage, task oriented. Currently even advanced Chimera users know only a fraction of its capabilities.

Technology: Advances in Visualization and Analysis Methods

  • High performance computing: e.g. multi-threading, instancing, large atomic models, gpu computing, hdf5 files.
  • Continuous and direct mouse interaction (mouse modes): e.g. oblique or spherical map slicing, molecule normal modes, tiling volume slices, movement with clash detection.
  • Comparing large numbers of objects: e.g. conformations from BLAST pdb, interfaces between virus proteins, bacteria in termite gut, enzyme binding sites (SFLD), alternative fits of molecules in maps.
  • Coarse grain models.
  • Animation.
  • SAXS suite of tools.
  • High resolution (3-4 Angstrom) EM model building.

Web presentation of models, analysis and data

Enable users to easily create web pages showing computer readable 3d data, analysis, and models along with interactive 3d renderings. Establish and document operational file formats (hdf5) to represent volume symmetry, segmentations, coarse grain models, ..., that can be adopted by public databases EMDB, ViPERdb, PDB, CCDB.

Opportunity:

EM and molecular assembly data and analysis are 95% lost: results survive only as literature publications (pictures and words). Computer readable results are not available except by personal request to the lab. Few EM maps, molecular models, symmetry parameters, sequence alignments, SAXS profiles, lists of interacting residues, ... are put into public archives. This stymies the whole research community's effort to build computational understanding of molecular machines, cells, and microbial communities. The build-up of computational knowledge from past decades of work is small at EM resolutions compared to what has been achieved at finer levels: proteins and sequences (PDB and seq databases).

Possible Products:

  • Multimedia notebook. Make Chimera able to export directly to HTML at user request. Can append to an html project work log scene images, spin animation, 3d model (WebGL), links to Chimera session, data files (PDB, map, sequence alignment, list of interface residues), measured values (distances, angles, RMSD, surface area, symmetry parameters...), user text notes. Output could be private record of data analysis for researcher, would aid creating journal manuscript, or can be made public -- web 2.0 style decentralized data publication.
  • File formats for public databases. EMDB/ViPERdb/PDB/CCDB can't produce the software that makes data easily accessible. This makes it hard for them to establish useful file formats. Analysis software should define the file formats. Our strong connections with the database sites (EMDB/ViPERdb/PDB/CCDB) can be used to promote adoption of our useful file formats for data and analysis results. Will require documenting and more careful design of map, segmentation, marker set file formats. An example where software has defined the format of analysis results is standard xray data quality report (reflections, completeness, R-factor).
  • Data exchange between software. Web data publication is most useful if many software packages can read the published files. So Chimera formats (hdf5 maps, segmentations, xml markers, symmetry operators, measurements) will have most value if other software developers add support to read those. We know many of the software developers in EM / molecular assemblies (EMAN/IMOD/Situs/Spider/BSOFT) and can collaborate to add support for our publication formats in those packages.
  • Coarse grain model schema and formats. There are no standards for representing coarse grain models, e.g. nuclear pore architecture (Frank Alber), chromatin folding (Davide Bau). Geometric models using spheres, ellipsoids, tubes, and hierarchy for different levels of detail. Andrej Sali wants to solve this. Need a publishable format. Chimera marker set XML can be enhanced. Computable coarse grain models shared on the web would be innovative.
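
A minimal sketch of the multimedia notebook idea, assuming a plain HTML file that grows by one section per logged step. The function name append_log_entry, its parameters, and the HTML layout are hypothetical illustrations, not an existing Chimera interface; images, WebGL models, and session files would be written separately and referenced through the links argument.

```python
import html
from datetime import datetime
from pathlib import Path


def append_log_entry(log_path, title, note, measurements=None, links=None, when=None):
    """Append one entry to an HTML work log, creating the file if needed.

    measurements: optional dict of name -> value (distances, RMSD, ...).
    links: optional dict of label -> URL or relative path (session, data files).
    All names here are hypothetical; this is not a Chimera API.
    """
    log_path = Path(log_path)
    when = when or datetime.now()

    parts = [
        "<section>",
        f"<h2>{html.escape(title)}</h2>",
        f"<p>{when:%Y-%m-%d %H:%M}</p>",
        f"<p>{html.escape(note)}</p>",
    ]
    if measurements:
        rows = "".join(
            f"<tr><td>{html.escape(name)}</td><td>{html.escape(str(value))}</td></tr>"
            for name, value in measurements.items()
        )
        parts.append(f"<table>{rows}</table>")
    if links:
        items = "".join(
            f'<li><a href="{html.escape(url, quote=True)}">{html.escape(label)}</a></li>'
            for label, url in links.items()
        )
        parts.append(f"<ul>{items}</ul>")
    parts.append("</section>")

    # Write a short header only once; later calls just append sections.
    if not log_path.exists():
        log_path.write_text(
            '<!DOCTYPE html>\n<meta charset="utf-8">\n<title>Work log</title>\n',
            encoding="utf-8",
        )
    with log_path.open("a", encoding="utf-8") as f:
        f.write("\n".join(parts) + "\n")
```

Appending rather than regenerating keeps the log usable as a private record during analysis; the same file could later be made public unchanged.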

Chimera Programmer Community

Extend the community of Chimera developers. Create programmer documentation, simpler APIs, training through collaborations and workshops.

Opportunity:

Chimera contains many libraries to analyze volume data, molecules and assemblies. This is the powerful toolkit I use day-to-day to quickly build new analysis capabilities for collaborators. Ability to write a page or two of Python code greatly extends the analysis capabilities of Chimera. Programming by users can multiply the value of our core Chimera libraries many-fold and extend their lifetime, and avoid others reimplementing the same capabilities (e.g. Gorgon, V3D, UROX).

Possible Products:

  • Documented well-designed programming interfaces to Chimera modules.
  • SciPy module for volume data.
  • Programming workshops and collaborator programming training.

Video Documentation

Screen-capture videos showing how to do common analysis tasks with Chimera. Currently even advanced users know only a fraction of Chimera's capabilities.

Opportunity:

Most people who have used Chimera hundreds of times know only 1/4 of the capabilities they could productively use. I see this several times per week. (Yesterday's example Jiang Zhu, NIH, modeling proteins, uses Chimera for volumes, Grasp2 for multiple sequence alignments.) Video how-to documentation can greatly reduce the barrier to learning advanced Chimera techniques. Easy to follow, no missing steps, shows both how and what can be done.

Possible Products:

  • Video demonstrations showing perhaps 100 common Chimera work-flows. My current 12 videos range from 5 to 8 minutes and each includes basic, intermediate, and advanced techniques.
  • Document common analysis protocols. Guide to what types of analysis are sensible with given data types. Graduate student researchers often struggle with this.

High performance computing

High performance computing: e.g. multi-threading, instancing, large atomic models, gpu computing, hdf5 files.

Opportunity:

The most widely cited Chimera volume capability is fitting a molecule in a density map. Dozens of programs do this. Our unique advantage is the fit is done in one second. This allows trying many possibilities. One of the most common reasons verbally given for using Chimera for volume display is "It loads my very large map, and other programs choke". Analysis algorithm literature focuses almost entirely on quality of results, not speed. But in practice, so much goes wrong in analysis that speed to allow many alternate analysis attempts proves more important to whether high quality results are achieved. Where long-running calculations fail to find the right answer, many refined quick analysis tries with human inspection can often produce the right answer.

Possible Products:

  • Multi-threaded molecule in map fitting, allows global rotational search.
  • Graphical copies ("instancing") for fast and memory efficient display of multimeric assemblies and symmetric density maps.
  • Memory efficient molecules. This is our current bottleneck for analysis of molecular assemblies (2 Kbytes/atom).
  • OpenGL programs for 10x faster molecule display. Allows for example animating motions when examining molecule fitting alternatives (needs 30 frames per second, only possible on small systems now).
  • HDF5 map files containing subsampled copies of maps.
  • Virus map symmetry compression. 60x savings on 1 Gbyte maps.
  • Fourier coefficient map representation for fast interactive changes in displayed resolution.
  • Fast transparent surface rendering.
  • Fast morph calculation, for animating model comparisons without user separately computing morphs.
  • OpenCL GPU computing.
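
A rough sketch of the subsampled-copies idea, assuming a map held as nested Python lists and subsampling by taking every second voxel along each axis. The function names are hypothetical, and writing the levels into an HDF5 file is not shown. It illustrates that the extra copies are cheap: each level holds about 1/8 the voxels of the one before, so all subsampled copies together add at most about 1/7 to the full map size.

```python
def shape(grid):
    """Return (nx, ny, nz) for a map stored as nested lists grid[i][j][k]."""
    return (len(grid), len(grid[0]), len(grid[0][0]))


def subsample(grid, step=2):
    """Keep every step-th voxel along each axis (no averaging in this sketch)."""
    return [[row[::step] for row in plane[::step]] for plane in grid[::step]]


def build_pyramid(grid, min_size=2):
    """Return [full map, 1/2 sampled, 1/4 sampled, ...].

    Stops before any axis of the next level would drop below min_size voxels.
    """
    levels = [grid]
    while min(shape(levels[-1])) // 2 >= min_size:
        levels.append(subsample(levels[-1]))
    return levels


def voxel_count(grid):
    nx, ny, nz = shape(grid)
    return nx * ny * nz


# Small synthetic map standing in for a large EM map.
full = [[[float(i + j + k) for k in range(8)] for j in range(8)] for i in range(8)]
levels = build_pyramid(full)

# Fraction of additional storage needed for all subsampled copies.
extra = sum(voxel_count(g) for g in levels[1:]) / voxel_count(full)
```

A viewer could then open the coarsest level first for an immediate display and read finer levels only for the region being examined.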

Interactive mouse modes

Continuous and direct mouse interaction (mouse modes).

Opportunity:

Continuous hand/eye interaction using mouse dragging is highly valuable in analysis. Most obvious example is rotating a model using a mouse drag. The advantage of 30 frames/sec continuous hand control becomes very apparent when compared to only being able to change view direction with a typed command (as in some older software). Translating, zooming, volume contour level adjustment, rotamer bond rotation, clip plane positioning, hand fitting, volume cropping, molecular dynamics playback, volume morphing are all powerful data exploration methods in Chimera. Many more are not available in Chimera.

Possible Products:

  • Mouse mode interface (like paint program), that allows user to quickly learn about all continuous interaction modes.
  • Allow all continuous interaction using drags in graphics window instead of slider in a different window. Eliminates cumbersome window switches.
  • Context sensitive mouse dragging -- where you click in graphics window matters.
  • Segmenting volume data using drags to continuously grow regions.
  • Mouse wheel flipping volume planes.
  • Volume smoothing with degree of smoothing continuous control.
  • Spherical virus slices with continuous radius control.
  • Oblique slices of volumes with 3d volume shown for context.
  • Volume slice tiling (like Mac Expose for tiling windows).
  • Continuous atom zone distance variation.
  • Steric repulsion when hand fitting.
  • Real-time molecular dynamics on small regions during hand motion of ligand, or bond rotation.
  • Molecule normal mode animation with hand amplitude control.
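
To make the spherical slice item concrete, here is a small sketch that samples a map on a sphere of a given radius using trilinear interpolation. It assumes the map is a nested list indexed grid[i][j][k] in grid units; the function names and the latitude/longitude sampling are illustrative choices, not Chimera code. A mouse drag would simply call spherical_slice again with a new radius.

```python
import math


def trilinear(grid, x, y, z):
    """Trilinearly interpolate grid[i][j][k] at a fractional grid position."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Clamp the position into the grid, then pick the lower corner of the cell.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)
    i, j, k = min(int(x), nx - 2), min(int(y), ny - 2), min(int(z), nz - 2)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def spherical_slice(grid, center, radius, n_lat=8, n_lon=16):
    """Sample the map on a sphere; returns n_lat rows of n_lon values."""
    cx, cy, cz = center
    rows = []
    for a in range(n_lat):
        theta = math.pi * (a + 0.5) / n_lat      # polar angle, poles excluded
        row = []
        for b in range(n_lon):
            phi = 2.0 * math.pi * b / n_lon      # azimuthal angle
            x = cx + radius * math.sin(theta) * math.cos(phi)
            y = cy + radius * math.sin(theta) * math.sin(phi)
            z = cz + radius * math.cos(theta)
            row.append(trilinear(grid, x, y, z))
        rows.append(row)
    return rows
```

The returned rows could be used to color a sphere surface; for interactive radius changes the sampling would need to run at display frame rates, which this pure Python version would not reach on real maps.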

Many Model Comparative Analysis

Tools to compare large numbers of objects: e.g. conformations from BLAST pdb, interfaces between virus proteins, bacteria in termite gut, enzyme binding sites (SFLD), alternative fits of molecules in maps.

Opportunity:

Researchers often compare dozens of homologous structures, alternate map fits, segmented volume regions, binding interfaces. I commonly see Chimera users with 10 - 30 open models. As biology research accumulates more models, analysis of many models becomes as important as the one-at-a-time analysis that Chimera, designed in a more data-poor era, focuses on. Working with many models becomes too time consuming and tedious to be feasible without multi-model analysis tools. The Chimera View Dock tool is a successful example of multi-model analysis.

Possible Products:

  • List all unique binding interfaces in molecular assemblies, e.g. virus capsid. Animate motion to any interface with only contact residues shown.
  • Allow inspection of dozens of molecule in map fit alternatives, for example, animating from one to next with each scroll wheel click. Simultaneous text display of goodness-of-fit values.
  • Align and average together similar marked objects in EM tomography. This is becoming a prominent technique called subtomogram averaging.
  • Provide morphing capability between large sets of related structures ordered to show principal differences.
  • Show all BLAST PDB models, using the scroll wheel to show each aligned to a reference, with background prefetch.

Coarse grain models

Opportunity:

No one has established even a simple common representation for models at lower than atomic resolution. A simple framework supporting geometric models: spheres, ellipsoids, tubes, connections, coloring, hierarchy, and exchange file format would allow sharing models of very interesting biology. For example, Davide Bau visited this week and showed a chromatin model in which 50 Kbases of DNA adopt unique shapes during transcription and when inactive (Nature Struct Biol, out next week). He used Chimera volume tracer. Sali lab IMP collaboration. Auer lab cellular structures collaboration. This may be a subproject of web data publication.

Possible Products:

  • Make a simple geometric model exchange format that Chimera, IMP, and possibly a few other pertinent programs will read and write.
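
One way such an exchange format might look, as a sketch only: a JSON document holding a hierarchy of groups, each with spheres and tubes connecting them. The schema, the field names, and the "coarse-grain-sketch" format tag are all hypothetical; ellipsoids and per-level detail are left out to keep it short, and this is not the Chimera marker set XML.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sphere:
    center: tuple            # (x, y, z) in Angstroms
    radius: float
    color: str = "#cccccc"


@dataclass
class Tube:
    start: int               # index of a sphere in the same group
    end: int
    radius: float


@dataclass
class Group:
    name: str
    spheres: list = field(default_factory=list)
    tubes: list = field(default_factory=list)
    children: list = field(default_factory=list)   # nested Groups (hierarchy)


def model_to_json(root):
    """Serialize a Group hierarchy to a JSON string (hypothetical schema)."""
    doc = {"format": "coarse-grain-sketch", "version": 1, "root": asdict(root)}
    return json.dumps(doc, indent=2)


def _group_from_dict(d):
    return Group(
        name=d["name"],
        spheres=[Sphere(tuple(s["center"]), s["radius"], s["color"]) for s in d["spheres"]],
        tubes=[Tube(t["start"], t["end"], t["radius"]) for t in d["tubes"]],
        children=[_group_from_dict(c) for c in d["children"]],
    )


def model_from_json(text):
    """Rebuild the Group hierarchy from a JSON string written by model_to_json."""
    return _group_from_dict(json.loads(text)["root"])


# Example: two beads joined by a tube, inside a top-level assembly group.
fiber = Group(
    name="fiber",
    spheres=[Sphere((0.0, 0.0, 0.0), 15.0), Sphere((30.0, 0.0, 0.0), 15.0, "#ff0000")],
    tubes=[Tube(0, 1, 5.0)],
)
assembly = Group(name="assembly", children=[fiber])
text = model_to_json(assembly)
```

A text format like this is easy for other programs to read and write, which matters more here than compactness; the same structure could equally be expressed in XML.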

Stereo Animation

Opportunity:

3d consumer computer displays and televisions appear to just be taking off. ESPN 3D offers 3d sports broadcasts. Stereo animation may be attractive for web data publication.

Possible Products:

  • Export video in stereo formats, YouTube, BluRay. Study if any web browser sequential stereo technology exists.
  • Provide continuous depth of field control in keyframe animation.

SAXS suite of tools

Opportunity:

SAXS data and model visualization I think is an uncolonized niche. Standard molecular viewers are used with little specialized support. Would be possible to make Chimera the standard SAXS visualization software (similar to our current monopoly on single-particle EM volume display). Don't currently have experimental collaborators (Sali lab methods devel), but have talked with Alex Shkumatov in Dmitri Svergun lab -- world leader in SAXS computational analysis.

Possible Products:

  • Display volume envelopes derived from SAXS profiles using third party computation tools (Svergun lab).
  • Handle conformational ensemble fitting to SAXS profiles (Nick Ulyanov collaboration).
  • Optimize speed of existing SAXS profile calculation (OpenCL GPU computing?) for interactive exploration of SAXS fits.
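
For reference, a SAXS profile can be computed from atom coordinates with the Debye formula, I(q) = sum over atom pairs of f_i f_j sin(q r_ij) / (q r_ij). The sketch below is a direct O(N^2) evaluation assuming constant, q-independent form factors and no solvent correction; it is not necessarily how the existing Chimera profile calculation works, but it shows why the cost grows quickly with atom count and why a GPU version is attractive.

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity at momentum transfer q from the Debye formula.

    coords: list of (x, y, z) positions; form_factors: one constant per atom
    (a simplifying assumption; real form factors depend on q).
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        total += form_factors[i] ** 2                 # i == j terms
        for j in range(i + 1, n):
            x = q * math.dist(coords[i], coords[j])
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += 2.0 * form_factors[i] * form_factors[j] * sinc
    return total


def saxs_profile(coords, form_factors, q_values):
    """Intensity at each q value; cost is O(N^2) per q."""
    return [debye_intensity(coords, form_factors, q) for q in q_values]
```

Each pair term is independent, so the double loop maps naturally onto data-parallel hardware.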

High resolution EM modeling

High resolution (3-4 Angstrom) EM model building.

Opportunity:

Single particle EM maps in the 3 to 4 Angstrom resolution range for viruses are becoming common. This is another opportunity to monopolize software for an emerging subfield. I formerly thought existing low-resolution xray model building tools would be used for this data. It appears a new generation of model building software is needed. Matt Baker, in collaboration with the U. Washington computer science dept, has been developing Gorgon visualization and model building from scratch for 2 years. It might be disruptive to compete with that project. Also it is a very hard problem, perhaps requiring more resources than we can give it. Gorgon is unlikely to succeed for lack of man-power.

Possible Products:

  • Build protein backbones in 3-4 Angstrom maps. Place side chains where possible.
  • Possibly some existing xray automated model building system could be integrated into Chimera. Needs research.
  • Methods to represent quality of fit. EM model validation is a very active research problem.

Do these ideas require replacing Chimera?

I favor large-scale incremental software changes, instead of starting over, but we have never made such changes (can't shake OTF, wrappy, Tk, fixed function OpenGL, extend atom specs to surfaces, memory efficient molecules, abstract models are molecules). Our practised accretion method with no major changes offers good stability for outside developers but we don't have many of those for other reasons.

Developing a next-generation Chimera 2 while maintaining Chimera 1 is the pattern we followed with the MidasPlus to Chimera transition, but it required more than 5 years to get the next-generation code into initial distribution. Chimera 2 could leverage much of the existing volume C++ code.
