Changes between Initial Version and Version 1 of Grant/emplan


Timestamp:
Mar 8, 2011, 1:50:41 PM (15 years ago)
Author:
goddard
Comment:

--

  • Grant/emplan

    v1 v1  
     1{{{
     2#!html
     3
     4<h1>
     5Plans for Electron Microscopy and Molecular Assemblies
     6</h1>
     7
     8<p>
     9Tom Goddard<br>
     10December 3, 2010.
     11</p>
     12
     13<p>
     14Possible projects related to electron microscopy (EM) and molecular assemblies
     15for RBVI next NCRR 5-year grant proposal to be submitted in May 2011.
     16</p>
     17
     18<p>
     19Will talk about 10 project ideas that fall in 2 broad categories:
     20</p>
     21
     22<ul>
     23<li>Dissemination: Communicating Analysis Results and Analysis Methods
     24<li>Technology: Advances in Visualization and Analysis Methods
     25</ul>
     26
     27<h2>
     28Conflicting goals: Impact, Fun, Fundable
     29</h2>
     30
     31<ul>
     32<li>Dissemination projects will have higher impact than technology projects
     33(short term and long term).
     34In the electron microscopy / molecular assembly research community,
     35technology is abundant; know-how and data are scarce.
     36<li>It is more fun to work on technology projects.
     37<li>Grant reviewers are more impressed by technology projects.
     38</ul>
     39
     40<h2>
     41List of Project Ideas
     42</h2>
     43
     44<p><b>
     45Dissemination: Communicating Analysis Results and Analysis Methods
     46</b></p>
     47
     48<ul>
     49<li>Enable researchers to publish their models, analysis and data on the web
     50such that other researchers can build on their work (computationally useful data).
     51Map symmetry, segmentations, geometric models (chromatin), operational file
     52formats, EMDB/ViPERdb/PDB/CCDB collaboration, web 2.0, lab notebook, 3d browser
     53display (WebGL).
     54<li>Extend the community of Chimera developers. Programmer documentation,
     55simpler APIs, training through collaborations and workshops.
     56<li>Video documentation for Chimera usage, task oriented. Currently even
     57advanced Chimera users know little.
     58</ul>
     59
     60<p><b>
     61Technology: Advances in Visualization and Analysis Methods
     62</b></p>
     63
     64<ul>
     65<li>High performance computing: e.g. multi-threading, instancing,
     66large atomic models, gpu computing, hdf5 files.
     67<li>Continuous and direct mouse interaction (mouse modes):
     68e.g. oblique or spherical map slicing, molecule normal modes,
     69tiling volume slices, movement with clash detection.
     70<li>Comparing large numbers of objects: e.g. conformations from BLAST pdb,
     71interfaces between virus proteins, bacteria in termite gut, enzyme binding sites
     72(SFLD), alternative fits of molecules in maps.
     73</ul>
     74
     75<ul>
     76<li>Coarse grain models.
     77<li>Animation.
     78<li>SAXS suite of tools.
     79<li>High resolution (3-4 Angstrom) EM model building.
     80</ul>
     81
     82<h2>Web presentation of models, analysis and data</h2>
     83
     84<p>
     85Enable users to easily create web pages showing computer readable 3d data
     86and analysis and models and interactive 3d renderings.
     87Establish and document operational file formats
     88(hdf5) to represent volume symmetry, segmentations, coarse grain models, ...,
     89that can be adopted by public databases EMDB, ViPERdb, PDB, CCDB.
     90</p>
     91
     92<p><b>Opportunity</b>:</p>
     93
     94<p>
     95EM and molecular assembly data and analysis are
     9695% lost -- only literature publication (pictures and words) of
     97results.  Computer readable results are not available except by
     98personal request to the lab.  Few EM maps, molecular models, symmetry
     99parameters, sequence alignments, SAXS profiles, lists of interacting
     100residues, ... are put into public archives.  This stymies the whole
     101research community effort to build computational understanding of
     102molecular machines, cells, microbial communities.  The build-up of
     103computational knowledge from past decades of work is small at EM
     104resolutions compared to what has been achieved at finer levels:
     105proteins and sequences (PDB and seq databases).
     106</p>
     107
     108<p><b>Possible Products</b>:</p>
     109
     110<ul>
     111<li><b>Multimedia notebook.</b>  Make Chimera able to export directly
     112to HTML at user request.  Can append to an html project work log scene
     113images, spin animation, 3d model (WebGL), links to Chimera session,
     114data files (PDB, map, sequence alignment, list of interface residues),
     115measured values (distances, angles, RMSD, surface area, symmetry
     116parameters...), user text notes.  Output could be private record of
     117data analysis for researcher, would aid creating journal manuscript,
     118or can be made public -- web 2.0 style decentralized data publication.
     119
     120<li><b>File formats for public databases.</b> EMDB/ViPERdb/PDB/CCDB
     121can't produce the software that makes data easily accessible. This
     122makes it hard for them to establish useful file formats. Analysis
     123software should define the file formats. Our strong connections with
     124the database sites (EMDB/ViPERdb/PDB/CCDB) can be used to promote
     125adoption of our useful file formats for data and analysis results.
     126Will require documenting and more careful design of map, segmentation,
     127marker set file formats.  An example where software has defined the
     128format of analysis results is standard xray data quality report
     129(reflections, completeness, R-factor).
     130
     131<li><b>Data exchange between software.</b>
     132Web data publication is most useful if many software packages can read
     133the published files.  So Chimera formats (hdf5 maps, segmentations,
     134xml markers, symmetry operators, measurements) will have most value if
     135other software developers add support to read those.  We know many of
     136the software developers in EM / molecular assemblies
     137(EMAN/IMOD/Situs/Spider/BSOFT) and can collaborate to add support for our
     138publication formats in those packages.
     139
     140<li><b>Coarse grain model schema and formats.</b>  There are no
     141standards for representing coarse grain models, e.g. nuclear pore
     142architecture (Frank Alber), chromatin folding (Davide Bau).
     143Geometric models using spheres, ellipsoids, tubes, and hierarchy for
     144different levels of detail.  Andrej Sali wants to solve this.  Need a
     145publishable format.  Chimera marker set XML can be enhanced.
     146Computable coarse grain models shared on the web would be innovative.
     147
     148</ul>
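<p>
The multimedia notebook idea above can be illustrated with a minimal sketch.
It assumes a hypothetical helper, <code>append_log_entry</code>, that appends
one titled entry (measured values, links to data files, a text note) to an
HTML work log. The function name, its parameters, and the HTML layout are
illustrative assumptions, not an existing Chimera interface.
</p>

```python
import html
from pathlib import Path


def append_log_entry(log_path, title, measurements, links=(), note=""):
    """Append one entry to an HTML work log (hypothetical layout).

    measurements: dict mapping a name to a value, e.g. {"RMSD (A)": 1.8}
    links: iterable of (label, relative_path) pairs for data or session files
    note: free text written by the user
    """
    path = Path(log_path)
    # Write the page header only once, when the log is first created.
    new_file = not path.exists() or path.stat().st_size == 0

    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>\n"
        % (html.escape(str(name)), html.escape(str(value)))
        for name, value in measurements.items()
    )
    link_items = "".join(
        '<li><a href="%s">%s</a></li>\n'
        % (html.escape(target, quote=True), html.escape(label))
        for label, target in links
    )
    entry = (
        "<h2>%s</h2>\n" % html.escape(title)
        + "<p>%s</p>\n" % html.escape(note)
        + "<table>\n%s</table>\n" % rows
        + "<ul>\n%s</ul>\n" % link_items
    )

    with path.open("a", encoding="utf-8") as f:
        if new_file:
            f.write("<h1>Project work log</h1>\n")
        f.write(entry)
    return entry
```

<p>
A call such as <code>append_log_entry("log.html", "Fit of subunit A",
{"RMSD (A)": 1.8}, links=[("map", "density.mrc")])</code> would add one
section to the log. The file names here are placeholders. Appending plain
fragments keeps each export independent, at the cost of never writing
closing document tags.
</p>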
     149
     150<h2>Chimera Programmer Community</h2>
     151
     152<p>
     153Extend the community of Chimera developers.
     154Create programmer documentation,
     155simpler APIs, training through collaborations and workshops.
     156</p>
     157
     158<p><b>Opportunity</b>:</p>
     159<p>
     160Chimera contains many libraries to analyze volume data, molecules and
     161assemblies. This is the powerful toolkit I use day-to-day to quickly build
     162new analysis capabilities for collaborators.  Ability to write a page or
     163two of Python code greatly extends the analysis capabilities of Chimera.
     164Programming by users can multiply the value of our core Chimera
     165libraries many-fold and extend their lifetime, and avoid others reimplementing
     166the same capabilities (e.g. Gorgon, V3D, UROX).
     167</p>
     168
     169<p><b>Possible Products</b>:</p>
     170
     171<ul>
     172<li>Documented well-designed programming interfaces to Chimera modules.
     173<li>SciPy module for volume data.
     174<li>Programming workshops and collaborator programming training.
     175</ul>
     176
     177<h2>Video Documentation</h2>
     178
     179<p>
     180Screen-capture videos showing how to do common analysis tasks with Chimera.
     181Currently even advanced users know little about Chimera.
     182</p>
     183
     184<p><b>Opportunity</b>:</p>
     185<p>Most people who have used Chimera hundreds of
     186times know only 1/4 of the capabilities they could productively use. I
     187see this several times per week.  (Yesterday's example Jiang Zhu, NIH,
     188modeling proteins, uses Chimera for volumes, Grasp2 for multiple
     189sequence alignments.)  Video how-to documentation can greatly reduce
     190the barrier to learning advanced Chimera techniques. Easy to follow,
     191no missing steps, shows both how and what can be done.
     192</p>
     193
     194<p><b>Possible Products</b>:</p>
     195<ul>
     196<li>Video demonstrations showing perhaps 100 common Chimera work-flows.
     197My current 12 videos range from 5 to 8 minutes and each includes basic,
     198intermediate, and advanced techniques.
     199<li>Document common analysis protocols.  Guide to what types of analysis are
     200sensible with given data types.  Graduate student researchers often struggle
     201with this.
     202</ul>
     203
     204
     205<h2>High performance computing</h2>
     206
     207<p>
     208High performance computing: e.g. multi-threading, instancing,
     209large atomic models, gpu computing, hdf5 files.
     210</p>
     211
     212<p><b>Opportunity</b>:</p>
     213<p>
     214The most widely cited Chimera volume capability is fitting a molecule in
     215a density map.  Dozens of programs do this.  Our unique advantage is the fit
     216is done in one second.  This allows trying many possibilities.  One of the
     217most common reasons verbally given for using Chimera for volume display is
     218"It loads my very large map, and other programs choke".  Analysis algorithm
     219literature focuses almost entirely on quality of results, not speed.  But
     220in practice, so much goes wrong in analysis that speed to allow many
     221alternate analysis attempts proves more important to whether high quality
     222results are achieved.  Where long-running calculations fail to find
     223the right answer, many refined quick analysis tries with human inspection
     224can often produce the right answer.
     225</p>
     226
     227<p><b>Possible Products</b>:</p>
     228<ul>
     229<li>Multi-threaded molecule in map fitting, allows global rotational search.
     230<li>Graphical copies ("instancing") for fast and memory efficient display
     231of multimeric assemblies and symmetric density maps.
     232<li>Memory efficient molecules.  This is our current bottleneck for analysis
     233of molecular assemblies (2 Kbytes/atom).
     234<li>OpenGL programs for 10x faster molecule display.  Allows for example
     235animating motions when examining molecule fitting alternatives (needs 30 frames
     236per second, only possible on small systems now).
     237<li>HDF5 map files containing subsampled copies of maps.
     238<li>Virus map symmetry compression.  60x savings on 1 Gbyte maps.
     239<li>Fourier coefficient map representation for fast interactive changes
     240in displayed resolution.
     241<li>Fast transparent surface rendering.
     242<li>Fast morph calculation, for animating model comparisons without user
     243separately computing morphs.
     244<li>OpenCL GPU computing.
     245</ul>
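<p>
The 60x figure for virus map symmetry compression follows from the 60
rotations of icosahedral symmetry: only one asymmetric unit needs to be
stored. The sketch below is not Chimera code. It assumes a standard
icosahedron orientation (a 5-fold axis along (0, 1, phi) and a 3-fold axis
along (1, 1, 1)), builds the rotation group from those two generators, and
estimates the storage for a 1 Gbyte map. Real savings would depend on how
the asymmetric unit is sampled on the grid.
</p>

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def rotation(axis, angle):
    """Rotation matrix (Rodrigues formula) for `angle` radians about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    c, s = math.cos(angle), math.sin(angle)
    t = 1 - c
    return (
        (t * x * x + c, t * x * y - s * z, t * x * z + s * y),
        (t * x * y + s * z, t * y * y + c, t * y * z - s * x),
        (t * x * z - s * y, t * y * z + s * x, t * z * z + c),
    )


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def key(m):
    """Hashable fingerprint of a matrix, rounded to absorb float error."""
    return tuple(round(v, 6) for row in m for v in row)


def icosahedral_operators():
    """Generate all rotations reachable from a 5-fold and a 3-fold generator."""
    five_fold = rotation((0.0, 1.0, PHI), 2 * math.pi / 5)
    three_fold = rotation((1.0, 1.0, 1.0), 2 * math.pi / 3)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    found = {key(identity): identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for m in frontier:
            for g in (five_fold, three_fold):
                p = matmul(g, m)
                k = key(p)
                if k not in found:
                    found[k] = p
                    next_frontier.append(p)
        frontier = next_frontier
    return list(found.values())


ops = icosahedral_operators()
full_map_bytes = 1e9  # the 1 Gbyte map from the list above
print(len(ops))                         # 60
print(full_map_bytes / len(ops) / 1e6)  # about 16.7 (Mbytes per asymmetric unit)
```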
     246
     247<h2>Interactive mouse modes</h2>
     248
     249<p>
     250Continuous and direct mouse interaction (mouse modes).
     251</p>
     252
     253<p><b>Opportunity</b>:</p>
     254<p>
     255Continuous hand/eye interaction using mouse dragging is highly
     256valuable in analysis.  Most obvious example is rotating a model using
     257a mouse drag.  The advantage of 30 frame/sec continuous hand control
     258becomes very apparent when compared to only being able to change view
     259direction with a typed command (as in some older software).
     260Translating, zooming, volume contour level adjustment, rotamer bond
     261rotation, clip plane positioning, hand fitting, volume cropping,
     262molecular dynamics playback, volume morphing are all powerful data
     263exploration methods in Chimera.  Many more are not available in
     264Chimera.
     265</p>
     266
     267<p><b>Possible Products</b>:</p>
     268<ul>
     269<li>Mouse mode interface (like paint program), that allows user to
     270quickly learn about all continuous interaction modes.
     271<li>Allow all continuous interaction using drags in graphics window
     272instead of slider in a different window.  Eliminates cumbersome window switches.
     273<li>Context sensitive mouse dragging -- where you click in graphics window matters.
     274<li>Segmenting volume data using drags to continuously grow regions.
     275<li>Mouse wheel flipping volume planes.
     276<li>Volume smoothing with degree of smoothing continuous control.
     277<li>Spherical virus slices with continuous radius control.
     278<li>Oblique slices of volumes with 3d volume shown for context.
     279<li>Volume slice tiling (like Mac Expose for tiling windows).
     280<li>Continuous atom zone distance variation.
     281<li>Steric repulsion when hand fitting.
     282<li>Real-time molecular dynamics on small regions during hand motion
     283of ligand, or bond rotation.
     284<li>Molecule normal mode animation with hand amplitude control.
     285</ul>
     286
     287<h2>Many Model Comparative Analysis</h2>
     288
     289<p>
     290Tools to compare large numbers of objects: e.g. conformations from BLAST pdb,
     291interfaces between virus proteins, bacteria in termite gut,
     292enzyme binding sites (SFLD), alternative fits of molecules in maps.
     293</p>
     294
     295<p><b>Opportunity</b>:</p>
     296<p>
     297Researchers often compare dozens of homologous structures,
     298alternate map fits, segmented volume regions, binding interfaces.
     299I commonly see Chimera users with 10 - 30 open models.
     300As biology research accumulates more models, analysis of many
     301models becomes as important as the one-at-a-time analysis that
     302Chimera, designed in a more data poor era, focuses on.  Working
     303with many models becomes too time consuming and tedious to be feasible
     304without multi-model analysis tools.  The Chimera View Dock tool is
     305a successful example of multi-model analysis.
     306</p>
     307
     308<p><b>Possible Products</b>:</p>
     309<ul>
     310<li>List all unique binding interfaces in molecular assemblies, e.g.
     311virus capsid.  Animate motion to any interface with only contact residues
     312shown.
     313<li>Allow inspection of dozens of molecule in map fit alternatives,
     314for example, animating from one to next with each scroll wheel click.
     315Simultaneous text display of goodness-of-fit values.
     316<li>Align and average together similar marked objects in EM tomography.
     317This is becoming a prominent technique called subtomogram averaging.
     318<li>Provide morphing capability between large sets of related structures
     319ordered to show principal differences.
     320<li>Show all BLAST PDB models with scroll-wheel to show each aligned to
     321reference.  And with background prefetch.
     322</ul>
     323
     324<h2>Coarse grain models</h2>
     325
     326<p><b>Opportunity</b>:</p>
     327<p>
     328No one has established even a simple common representation for models
     329at lower than atomic resolution.  A simple framework supporting
     330geometric models: spheres, ellipsoids, tubes, connections, coloring,
     331hierarchy, and exchange file format would allow sharing models of very
     332interesting biology.  For example, Davide Bau visited this week and
     333showed chromatin model, 50 Kbases of DNA adopts unique shapes during
     334transcription and when inactive (Nature Struct Biol, out next week).
     335He used Chimera volume tracer.
     336Sali lab IMP collaboration.  Auer lab cellular structures collaboration.
     337This may be a subproject of web data publication.
     338</p>
     339
     340<p><b>Possible Products</b>:</p>
     341<ul>
     342<li>Make a simple geometric model exchange format that Chimera, IMP, and
     343possibly a few other pertinent programs will read and write.
     344</ul>
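<p>
As a rough illustration of what a simple geometric exchange format might
look like, the sketch below writes and reads a small hierarchical model with
spheres, an ellipsoid, and a tube. Every element and attribute name
(<code>coarse_model</code>, <code>group</code>, <code>sphere</code>,
<code>ellipsoid</code>, <code>tube</code>) is a hypothetical choice made for
this example. It is not the Chimera marker set schema or an IMP format.
</p>

```python
import xml.etree.ElementTree as ET

SHAPE_TAGS = ("sphere", "ellipsoid", "tube")  # hypothetical shape elements


def build_example_model():
    """Build a two-level model: an assembly containing one nested group."""
    model = ET.Element("coarse_model", name="example assembly", units="nm")
    subunit = ET.SubElement(model, "group", name="subunit A")
    ET.SubElement(subunit, "sphere", id="1", x="0", y="0", z="0",
                  radius="2.5", color="#cc3333")
    ET.SubElement(subunit, "sphere", id="2", x="6", y="0", z="0",
                  radius="2.0", color="#cc3333")
    # A tube connects two shapes by id, like a link between markers.
    ET.SubElement(subunit, "tube", id1="1", id2="2", radius="0.5")
    # A nested group gives a finer level of detail inside the subunit.
    domain = ET.SubElement(subunit, "group", name="domain A1")
    ET.SubElement(domain, "ellipsoid", id="3", x="3", y="2", z="0",
                  rx="1.5", ry="1.0", rz="0.8", color="#3366cc")
    return model


def to_xml(model):
    """Serialize the model to an XML string."""
    return ET.tostring(model, encoding="unicode")


def count_shapes(xml_text):
    """Parse the XML and count shapes at every level of the hierarchy."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag in SHAPE_TAGS:
            counts[element.tag] = counts.get(element.tag, 0) + 1
    return counts


xml_text = to_xml(build_example_model())
print(count_shapes(xml_text))  # {'sphere': 2, 'tube': 1, 'ellipsoid': 1}
```

<p>
Nesting <code>group</code> elements is one way to express the hierarchy
mentioned above. A flat list with parent ids would be an alternative and may
be easier for other programs to read.
</p>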
     345
     346<h2>Stereo Animation</h2>
     347
     348<p><b>Opportunity</b>:</p>
     349<p>3d consumer computer displays and televisions appear to just be taking
     350off.  ESPN 3D offers 3d sports broadcasts. Stereo animation may be attractive
     351for web data publication.
     352</p>
     353
     354<p><b>Possible Products</b>:</p>
     355<ul>
     356<li>Export video in stereo formats, YouTube, BluRay.  Study if any web
     357browser sequential stereo technology exists.
     358<li>Provide continuous depth of field control in keyframe animation.
     359</ul>
     360
     361<h2>SAXS suite of tools</h2>
     362
     363<p><b>Opportunity</b>:</p>
     364<p>
     365SAXS data and model visualization I think is an uncolonized niche.
     366Standard molecular viewers used with little specialized support.
     367Would be possible to make Chimera the standard SAXS visualization software
     368(similar to our current monopoly on single-particle EM volume display).
     369Don't currently have experimental collaborators (Sali lab methods devel), but
     370have talked with Alex Shkumatov in Dmitri Svergun lab -- world leader in SAXS
     371computational analysis.
     372</p>
     373
     374<p><b>Possible Products</b>:</p>
     375<ul>
     376<li>Display volume envelopes derived from SAXS profiles using third party
     377computation tools (Svergun lab).
     378<li>Handle conformational ensemble fitting to SAXS profiles (Nick Ulyanov
     379collaboration).
     380<li>Optimize speed of existing SAXS profile calculation (OpenCL GPU computing?)
     381for interactive exploration of SAXS fits.
     382</ul>
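<p>
To show why SAXS profile calculation is a candidate for speed optimization,
here is a minimal sketch of the Debye formula, a standard way to compute a
scattering profile from atom positions. This document does not describe how
the existing Chimera calculation works, so this is a generic illustration.
It assumes constant, q-independent form factors and ignores solvent. The
double loop over atom pairs is the O(N<sup>2</sup>) cost that GPU computing
would target.
</p>

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity I(q) from the Debye formula.

    I(q) = sum_i sum_j f_i * f_j * sin(q * r_ij) / (q * r_ij)

    coords: list of (x, y, z) positions
    form_factors: one constant scattering factor per atom (a simplification)
    q: magnitude of the scattering vector, in inverse units of the coordinates
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        # Self term: sin(x)/x tends to 1 as r_ii = 0.
        total += form_factors[i] ** 2
        for j in range(i + 1, n):
            x = q * math.dist(coords[i], coords[j])
            sinc = 1.0 if x == 0 else math.sin(x) / x
            # Each unordered pair appears twice in the full double sum.
            total += 2 * form_factors[i] * form_factors[j] * sinc
    return total


def debye_profile(coords, form_factors, q_values):
    """Intensity at each q value, giving a profile to compare with data."""
    return [debye_intensity(coords, form_factors, q) for q in q_values]
```

<p>
For two unit scatterers a distance d apart, the formula reduces to
I(q) = 2 + 2 sin(qd)/(qd), which is a convenient check on any faster
implementation.
</p>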
     383
     384<h2>High resolution EM modeling</h2>
     385
     386<p>
     387High resolution (3-4 Angstrom) EM model building.
     388</p>
     389
     390<p><b>Opportunity</b>:</p>
     391<p>
     392Single particle EM maps in the 3 to 4 Angstrom resolution range for viruses
     393are becoming common.  This is another opportunity to monopolize software for
     394an emerging subfield.  I formerly thought existing low-resolution xray model
     395building tools would be used for this data.  It appears a new generation of
     396model building software is needed.  Matt Baker in collaboration with U.
     397Washington computer science dept has been developing Gorgon visualization and
     398model building from scratch for 2 years.  It might be disruptive to compete
     399with that project.  Also it is a very hard problem, perhaps requiring
     400more resources than we can give it.  Gorgon is unlikely to succeed for lack
     401of man-power.
     402</p>
     403
     404<p><b>Possible Products</b>:</p>
     405<ul>
     406<li>Build protein backbones in 3-4 Angstrom maps.  Place side chains where
     407possible.
     408<li>Possibly some existing xray automated model building system could be
     409integrated into Chimera.  Needs research.
     410<li>Methods to represent quality of fit.  EM model validation is a very
     411active research problem.
     412</ul>
     413
     414<h2>Do these ideas require replacing Chimera?</h2>
     415
     416<p>
     417I favor large-scale incremental software changes, instead of starting over,
     418but we have never made such changes (can't shake OTF, wrappy,
     419Tk, fixed function OpenGL, extend atom specs to surfaces, memory efficient
     420molecules, abstract models are molecules).  Our practised accretion method
     421with no major changes offers good stability for outside developers
     422but we don't have many of those for other reasons.
     423</p>
     424
     425<p>
     426Developing a next-generation Chimera 2 while maintaining Chimera 1 is the
     427pattern we followed with the MidasPlus to Chimera transition, but it required
     428more than 5 years to get the next-generation code into initial distribution.
     429Chimera 2 could leverage much of the existing volume C++ code.
     430</p>
     431
     432}}}