| | 1 | {{{ |
| | 2 | #!html |
| | 3 | |
| | 4 | <h1> |
| | 5 | Plans for Electron Microscopy and Molecular Assemblies |
| | 6 | </h1> |
| | 7 | |
| | 8 | <p> |
| | 9 | Tom Goddard<br> |
| | 10 | December 3, 2010. |
| | 11 | </p> |
| | 12 | |
| | 13 | <p> |
| | 14 | Possible projects related to electron microscopy (EM) and molecular assemblies |
| | 15 | for RBVI's next NCRR 5-year grant proposal, to be submitted in May 2011. |
| | 16 | </p> |
| | 17 | |
| | 18 | <p> |
| | 19 | Will talk about 10 project ideas that fall in 2 broad categories: |
| | 20 | </p> |
| | 21 | |
| | 22 | <ul> |
| | 23 | <li>Dissemination: Communicating Analysis Results and Analysis Methods |
| | 24 | <li>Technology: Advances in Visualization and Analysis Methods |
| | 25 | </ul> |
| | 26 | |
| | 27 | <h2> |
| | 28 | Conflicting goals: Impact, Fun, Fundable |
| | 29 | </h2> |
| | 30 | |
| | 31 | <ul> |
| | 32 | <li>Dissemination projects will have higher impact than technology projects |
| | 33 | (short term and long term). |
| | 34 | In the electron microscopy / molecular assembly research community, |
| | 35 | technology is abundant; know-how and data are scarce. |
| | 36 | <li>It is more fun to work on technology projects. |
| | 37 | <li>Grant reviewers are more impressed by technology projects. |
| | 38 | </ul> |
| | 39 | |
| | 40 | <h2> |
| | 41 | List of Project Ideas |
| | 42 | </h2> |
| | 43 | |
| | 44 | <p><b> |
| | 45 | Dissemination: Communicating Analysis Results and Analysis Methods |
| | 46 | </b></p> |
| | 47 | |
| | 48 | <ul> |
| | 49 | <li>Enable researchers to publish their models, analysis and data on the web |
| | 50 | such that other researchers can build on their work (computationally useful data). |
| | 51 | Map symmetry, segmentations, geometric models (chromatin), operational file |
| | 52 | formats, EMDB/ViPERdb/PDB/CCDB collaboration, web 2.0, lab notebook, 3d browser |
| | 53 | display (WebGL). |
| | 54 | <li>Extend the community of Chimera developers. Programmer documentation, |
| | 55 | simpler APIs, training through collaborations and workshops. |
| | 56 | <li>Video documentation for Chimera usage, task oriented. Currently even |
| | 57 | advanced Chimera users know little. |
| | 58 | </ul> |
| | 59 | |
| | 60 | <p><b> |
| | 61 | Technology: Advances in Visualization and Analysis Methods |
| | 62 | </b></p> |
| | 63 | |
| | 64 | <ul> |
| | 65 | <li>High performance computing: e.g. multi-threading, instancing, |
| | 66 | large atomic models, gpu computing, hdf5 files. |
| | 67 | <li>Continuous and direct mouse interaction (mouse modes): |
| | 68 | e.g. oblique or spherical map slicing, molecule normal modes, |
| | 69 | tiling volume slices, movement with clash detection. |
| | 70 | <li>Comparing large numbers of objects: e.g. conformations from BLAST pdb, |
| | 71 | interfaces between virus proteins, bacteria in termite gut, enzyme binding sites |
| | 72 | (SFLD), alternative fits of molecules in maps. |
| | 73 | </ul> |
| | 74 | |
| | 75 | <ul> |
| | 76 | <li>Coarse grain models. |
| | 77 | <li>Animation. |
| | 78 | <li>SAXS suite of tools. |
| | 79 | <li>High resolution (3-4 Angstrom) EM model building. |
| | 80 | </ul> |
| | 81 | |
| | 82 | <h2>Web presentation of models, analysis and data</h2> |
| | 83 | |
| | 84 | <p> |
| | 85 | Enable users to easily create web pages showing computer readable 3d data |
| | 86 | and analysis and models and interactive 3d renderings. |
| | 87 | Establish and document operational file formats |
| | 88 | (hdf5) to represent volume symmetry, segmentations, coarse grain models, ..., |
| | 89 | that can be adopted by public databases EMDB, ViPERdb, PDB, CCDB. |
| | 90 | </p> |
| | 91 | |
| | 92 | <p><b>Opportunity</b>:</p> |
| | 93 | |
| | 94 | <p> |
| | 95 | EM and molecular assemblies data and analysis are |
| | 96 | 95% lost -- only literature publication (pictures and words) of |
| | 97 | results. Computer readable results are not available except by |
| | 98 | personal request to the lab. Few EM maps, molecular models, symmetry |
| | 99 | parameters, sequence alignments, SAXS profiles, lists of interacting |
| | 100 | residues, ... are put into public archives. This stymies the whole |
| | 101 | research community effort to build computational understanding of |
| | 102 | molecular machines, cells, microbial communities. The build-up of |
| | 103 | computational knowledge from past decades of work is small at EM |
| | 104 | resolutions compared to what has been achieved at finer levels: |
| | 105 | proteins and sequences (PDB and seq databases). |
| | 106 | </p> |
| | 107 | |
| | 108 | <p><b>Possible Products</b>:</p> |
| | 109 | |
| | 110 | <ul> |
| | 111 | <li><b>Multimedia notebook.</b> Make Chimera able to export directly |
| | 112 | to HTML at user request. It could append to an HTML project work log: scene |
| | 113 | images, spin animations, 3d models (WebGL), links to Chimera sessions, |
| | 114 | data files (PDB, map, sequence alignment, list of interface residues), |
| | 115 | measured values (distances, angles, RMSD, surface area, symmetry |
| | 116 | parameters...), and user text notes. Output could be a private record of |
| | 117 | data analysis for the researcher, would aid creating a journal manuscript, |
| | 118 | or could be made public -- web 2.0 style decentralized data publication. |
| | 119 | |
| | 120 | <li><b>File formats for public databases.</b> EMDB/ViPERdb/PDB/CCDB |
| | 121 | can't produce the software that makes data easily accessible. This |
| | 122 | makes it hard for them to establish useful file formats. Analysis |
| | 123 | software should define the file formats. Our strong connections with |
| | 124 | the database sites (EMDB/ViPERdb/PDB/CCDB) can be used to promote |
| | 125 | adoption of our useful file formats for data and analysis results. |
| | 126 | Will require documenting and more careful design of map, segmentation, |
| | 127 | marker set file formats. An example where software has defined the |
| | 128 | format of analysis results is standard xray data quality report |
| | 129 | (reflections, completeness, R-factor). |
| | 130 | |
| | 131 | <li><b>Data exchange between software.</b> |
| | 132 | Web data publication is most useful if many software packages can read |
| | 133 | the published files. So Chimera formats (hdf5 maps, segmentations, |
| | 134 | xml markers, symmetry operators, measurements) will have most value if |
| | 135 | other software developers add support to read those. We know many of |
| | 136 | the software developers in EM / molecular assemblies |
| | 137 | (EMAN/IMOD/Situs/Spider/BSOFT) and can collaborate to add support for our |
| | 138 | publication formats in those packages. |
| | 139 | |
| | 140 | <li><b>Coarse grain model schema and formats.</b> There are no |
| | 141 | standards for representing coarse grain models, e.g. nuclear pore |
| | 142 | architecture (Frank Alber), chromatin folding (Davide Bau). |
| | 143 | Geometric models using spheres, ellipsoids, tubes, and hierarchy for |
| | 144 | different levels of detail. Andrej Sali wants to solve this. Need a |
| | 145 | publishable format. Chimera marker set XML can be enhanced. |
| | 146 | Computable coarse grain models shared on the web would be innovative. |
| | 147 | |
| | 148 | </ul> |
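<p>
As a rough illustration of the multimedia notebook idea, the sketch below
formats one work-log entry as an HTML fragment and appends it to a log file.
It is not existing Chimera code: the function names
(<code>format_entry</code>, <code>append_to_log</code>), the entry layout and
the file names are all hypothetical, chosen only to show how little machinery
an append-only HTML log needs.
</p>

```python
import html


def format_entry(title, note, measurements, image_file=None, data_files=()):
    """Return one work-log entry as an HTML fragment (hypothetical layout)."""
    parts = ['<div class="entry">', "<h3>%s</h3>" % html.escape(title)]
    if image_file:
        # Scene image saved alongside the log file.
        parts.append('<img src="%s" alt="scene image">' % html.escape(image_file))
    # Free-form user text note; escaped so it cannot break the page markup.
    parts.append("<p>%s</p>" % html.escape(note))
    if measurements:
        # Measured values (distances, RMSD, ...) as a two-column table.
        parts.append("<table>")
        for name, value in measurements.items():
            parts.append("<tr><td>%s</td><td>%s</td></tr>"
                         % (html.escape(name), html.escape(str(value))))
        parts.append("</table>")
    if data_files:
        # Links to the computer readable data behind the entry.
        parts.append("<ul>")
        for name in data_files:
            escaped = html.escape(name)
            parts.append('<li><a href="%s">%s</a></li>' % (escaped, escaped))
        parts.append("</ul>")
    parts.append("</div>")
    return "\n".join(parts)


def append_to_log(log_path, entry_html):
    """Append an entry to the log file, creating the file if needed."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry_html + "\n")
```

<p>
A call such as <code>append_to_log("worklog.html", format_entry("Fit 1",
"Best of 20 fits", {"RMSD": 1.5}, data_files=["model.pdb"]))</code> would add
one entry. Appending bare fragments relies on browsers tolerating a page
without an enclosing document skeleton; a real implementation would probably
maintain a proper page template.
</p>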
| | 149 | |
| | 150 | <h2>Chimera Programmer Community</h2> |
| | 151 | |
| | 152 | <p> |
| | 153 | Extend the community of Chimera developers. |
| | 154 | Create programmer documentation, |
| | 155 | simpler APIs, training through collaborations and workshops. |
| | 156 | </p> |
| | 157 | |
| | 158 | <p><b>Opportunity</b>:</p> |
| | 159 | <p> |
| | 160 | Chimera contains many libraries to analyze volume data, molecules and |
| | 161 | assemblies. This is the powerful toolkit I use day-to-day to quickly build |
| | 162 | new analysis capabilities for collaborators. Ability to write a page or |
| | 163 | two of Python code greatly extends the analysis capabilities of Chimera. |
| | 164 | Programming by users can multiply the value of our core Chimera |
| | 165 | libraries many-fold and extend their lifetime, and avoid others reimplementing |
| | 166 | the same capabilities (e.g. Gorgon, V3D, UROX). |
| | 167 | </p> |
| | 168 | |
| | 169 | <p><b>Possible Products</b>:</p> |
| | 170 | |
| | 171 | <ul> |
| | 172 | <li>Documented well-designed programming interfaces to Chimera modules. |
| | 173 | <li>SciPy module for volume data. |
| | 174 | <li>Programming workshops and collaborator programming training. |
| | 175 | </ul> |
| | 176 | |
| | 177 | <h2>Video Documentation</h2> |
| | 178 | |
| | 179 | <p> |
| | 180 | Screen-capture videos showing how to do common analysis tasks with Chimera. |
| | 181 | Currently even advanced users know little about Chimera. |
| | 182 | </p> |
| | 183 | |
| | 184 | <p><b>Opportunity</b>:</p> |
| | 185 | <p>Most people who have used Chimera hundreds of |
| | 186 | times know only 1/4 of the capabilities they could productively use. I |
| | 187 | see this several times per week. (Yesterday's example: Jiang Zhu, NIH, |
| | 188 | modeling proteins, uses Chimera for volumes, Grasp2 for multiple |
| | 189 | sequence alignments.) Video how-to documentation can greatly reduce |
| | 190 | the barrier to learning advanced Chimera techniques. Easy to follow, |
| | 191 | no missing steps, shows both how and what can be done. |
| | 192 | </p> |
| | 193 | |
| | 194 | <p><b>Possible Products</b>:</p> |
| | 195 | <ul> |
| | 196 | <li>Video demonstrations showing perhaps 100 common Chimera work-flows. |
| | 197 | My current 12 videos range from 5 to 8 minutes and each includes basic, |
| | 198 | intermediate, and advanced techniques. |
| | 199 | <li>Document common analysis protocols. Guide to what types of analysis are |
| | 200 | sensible with given data types. Graduate student researchers often struggle |
| | 201 | with this. |
| | 202 | </ul> |
| | 203 | |
| | 204 | |
| | 205 | <h2>High performance computing</h2> |
| | 206 | |
| | 207 | <p> |
| | 208 | High performance computing: e.g. multi-threading, instancing, |
| | 209 | large atomic models, gpu computing, hdf5 files. |
| | 210 | </p> |
| | 211 | |
| | 212 | <p><b>Opportunity</b>:</p> |
| | 213 | <p> |
| | 214 | The most widely cited Chimera volume capability is fitting a molecule in |
| | 215 | a density map. Dozens of programs do this. Our unique advantage is the fit |
| | 216 | is done in one second. This allows trying many possibilities. One of the |
| | 217 | most common reasons verbally given for using Chimera for volume display is |
| | 218 | "It loads my very large map, and other programs choke". Analysis algorithm |
| | 219 | literature focuses almost entirely on quality of results, not speed. But |
| | 220 | in practice, so much goes wrong in analysis that speed to allow many |
| | 221 | alternate analysis attempts proves more important to whether high quality |
| | 222 | results are achieved. Where long-running calculations fail to find |
| | 223 | the right answer, many refined quick analysis tries with human inspection |
| | 224 | can often produce the right answer. |
| | 225 | </p> |
| | 226 | |
| | 227 | <p><b>Possible Products</b>:</p> |
| | 228 | <ul> |
| | 229 | <li>Multi-threaded molecule in map fitting, allows global rotational search. |
| | 230 | <li>Graphical copies ("instancing") for fast and memory efficient display |
| | 231 | of multimeric assemblies and symmetric density maps. |
| | 232 | <li>Memory efficient molecules. This is our current bottleneck for analysis |
| | 233 | of molecular assemblies (2 Kbytes/atom). |
| | 234 | <li>OpenGL programs for 10x faster molecule display. Allows for example |
| | 235 | animating motions when examining molecule fitting alternatives (needs 30 frames |
| | 236 | per second, only possible on small systems now). |
| | 237 | <li>HDF5 map files containing subsampled copies of maps. |
| | 238 | <li>Virus map symmetry compression. 60x savings on 1 Gbyte maps. |
| | 239 | <li>Fourier coefficient map representation for fast interactive changes |
| | 240 | in displayed resolution. |
| | 241 | <li>Fast transparent surface rendering. |
| | 242 | <li>Fast morph calculation, for animating model comparisons without user |
| | 243 | separately computing morphs. |
| | 244 | <li>OpenCL GPU computing. |
| | 245 | </ul> |
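<p>
The size figures in the list above can be checked with a few lines of
arithmetic. The sketch below is a back-of-envelope estimate only. It assumes
an icosahedral virus (60 rotational symmetry operators, hence the 60x figure),
ignores any padding needed around the asymmetric unit, and assumes each
subsampled map copy halves the grid along every axis; the source does not
specify the subsampling factor. Function names are illustrative.
</p>

```python
ICOSAHEDRAL_OPERATORS = 60  # order of the icosahedral rotation group


def asymmetric_unit_bytes(map_bytes, operators=ICOSAHEDRAL_OPERATORS):
    """Ideal size if only one asymmetric unit of a symmetric map is stored."""
    return map_bytes / operators


def molecule_memory_bytes(atom_count, bytes_per_atom=2 * 1024):
    """Memory for an atomic model at the quoted 2 Kbytes per atom."""
    return atom_count * bytes_per_atom


def subsample_overhead(levels):
    """Extra storage, as a fraction of the full map, for `levels` copies,
    each subsampled by 2 along every axis (assumed factor), so each copy
    is 1/8 the size of the previous one."""
    return sum(1 / 8 ** k for k in range(1, levels + 1))


if __name__ == "__main__":
    gbyte = 2 ** 30
    mbyte = 2 ** 20
    print("1 Gbyte map, asymmetric unit only: %.1f Mbytes"
          % (asymmetric_unit_bytes(gbyte) / mbyte))
    print("1 million atoms: %.2f Gbytes"
          % (molecule_memory_bytes(1000000) / gbyte))
    print("3 subsampled copies add %.1f%% to file size"
          % (100 * subsample_overhead(3)))
```

<p>
Under these assumptions a 1 Gbyte icosahedral map shrinks to roughly 17
Mbytes, a million-atom assembly needs about 2 Gbytes for the molecule alone,
and storing subsampled copies adds well under 15% to the map file.
</p>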
| | 246 | |
| | 247 | <h2>Interactive mouse modes</h2> |
| | 248 | |
| | 249 | <p> |
| | 250 | Continuous and direct mouse interaction (mouse modes). |
| | 251 | </p> |
| | 252 | |
| | 253 | <p><b>Opportunity</b>:</p> |
| | 254 | <p> |
| | 255 | Continuous hand/eye interaction using mouse dragging is highly |
| | 256 | valuable in analysis. Most obvious example is rotating a model using |
| | 257 | a mouse drag. The advantage of 30 frame/sec continuous hand control |
| | 258 | becomes very apparent when compared to only being able to change view |
| | 259 | direction with a typed command (as in some older software). |
| | 260 | Translating, zooming, volume contour level adjustment, rotamer bond |
| | 261 | rotation, clip plane positioning, hand fitting, volume cropping, |
| | 262 | molecular dynamics playback, volume morphing are all powerful data |
| | 263 | exploration methods in Chimera. Many more are not available in |
| | 264 | Chimera. |
| | 265 | </p> |
| | 266 | |
| | 267 | <p><b>Possible Products</b>:</p> |
| | 268 | <ul> |
| | 269 | <li>Mouse mode interface (like paint program), that allows user to |
| | 270 | quickly learn about all continuous interaction modes. |
| | 271 | <li>Allow all continuous interaction using drags in graphics window |
| | 272 | instead of slider in a different window. Eliminates cumbersome window switches. |
| | 273 | <li>Context sensitive mouse dragging -- where you click in graphics window matters. |
| | 274 | <li>Segmenting volume data using drags to continuously grow regions. |
| | 275 | <li>Mouse wheel flipping volume planes. |
| | 276 | <li>Volume smoothing with degree of smoothing continuous control. |
| | 277 | <li>Spherical virus slices with continuous radius control. |
| | 278 | <li>Oblique slices of volumes with 3d volume shown for context. |
| | 279 | <li>Volume slice tiling (like Mac Expose for tiling windows). |
| | 280 | <li>Continuous atom zone distance variation. |
| | 281 | <li>Steric repulsion when hand fitting. |
| | 282 | <li>Real-time molecular dynamics on small regions during hand motion |
| | 283 | of ligand, or bond rotation. |
| | 284 | <li>Molecule normal mode animation with hand amplitude control. |
| | 285 | </ul> |
| | 286 | |
| | 287 | <h2>Many Model Comparative Analysis</h2> |
| | 288 | |
| | 289 | <p> |
| | 290 | Tools to compare large numbers of objects: e.g. conformations from BLAST pdb, |
| | 291 | interfaces between virus proteins, bacteria in termite gut, |
| | 292 | enzyme binding sites (SFLD), alternative fits of molecules in maps. |
| | 293 | </p> |
| | 294 | |
| | 295 | <p><b>Opportunity</b>:</p> |
| | 296 | <p> |
| | 297 | Researchers often compare dozens of homologous structures, |
| | 298 | alternate map fits, segmented volume regions, binding interfaces. |
| | 299 | I commonly see Chimera users with 10 - 30 open models. |
| | 300 | As biology research accumulates more models, analysis of many |
| | 301 | models becomes as important as the one-at-a-time analysis that |
| | 302 | Chimera, designed in a more data poor era, focuses on. Working |
| | 303 | with many models becomes too time consuming and tedious to be feasible |
| | 304 | without multi-model analysis tools. The Chimera View Dock tool is |
| | 305 | a successful example of multi-model analysis. |
| | 306 | </p> |
| | 307 | |
| | 308 | <p><b>Possible Products</b>:</p> |
| | 309 | <ul> |
| | 310 | <li>List all unique binding interfaces in molecular assemblies, e.g. |
| | 311 | virus capsid. Animate motion to any interface with only contact residues |
| | 312 | shown. |
| | 313 | <li>Allow inspection of dozens of molecule in map fit alternatives, |
| | 314 | for example, animating from one to next with each scroll wheel click. |
| | 315 | Simultaneous text display of goodness-of-fit values. |
| | 316 | <li>Align and average together similar marked objects in EM tomography. |
| | 317 | This is becoming a prominent technique called subtomogram averaging. |
| | 318 | <li>Provide morphing capability between large sets of related structures |
| | 319 | ordered to show principal differences. |
| | 320 | <li>Show all BLAST PDB models with scroll-wheel to show each aligned to |
| | 321 | reference. And with background prefetch. |
| | 322 | </ul> |
| | 323 | |
| | 324 | <h2>Coarse grain models</h2> |
| | 325 | |
| | 326 | <p><b>Opportunity</b>:</p> |
| | 327 | <p> |
| | 328 | No one has established even a simple common representation for models |
| | 329 | at lower than atomic resolution. A simple framework supporting |
| | 330 | geometric models: spheres, ellipsoids, tubes, connections, coloring, |
| | 331 | hierarchy, and exchange file format would allow sharing models of very |
| | 332 | interesting biology. For example, Davide Bau visited this week and |
| | 333 | showed chromatin model, 50 Kbases of DNA adopts unique shapes during |
| | 334 | transcription and when inactive (Nature Struct Biol, out next week). |
| | 335 | He used Chimera volume tracer. |
| | 336 | Sali lab IMP collaboration. Auer lab cellular structures collaboration. |
| | 337 | This may be a subproject of web data publication. |
| | 338 | </p> |
| | 339 | |
| | 340 | <p><b>Possible Products</b>:</p> |
| | 341 | <ul> |
| | 342 | <li>Make a simple geometric model exchange format that Chimera, IMP, and |
| | 343 | possibly a few other pertinent programs will read and write. |
| | 344 | </ul> |
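<p>
To make the exchange format idea concrete, here is a minimal sketch of what a
hierarchical geometric model file could look like, written and read with
Python's standard XML library. It is loosely inspired by the flat
marker-and-link layout of Chimera marker set XML, but every element and
attribute name here (<code>coarse_model</code>, <code>group</code>,
<code>sphere</code>, <code>tube</code>, <code>level</code>) is hypothetical
and is not an existing Chimera or IMP format. Ellipsoids are omitted for
brevity, and the coordinates are made-up example values.
</p>

```python
import xml.etree.ElementTree as ET


def add_group(parent, name, level):
    """Add a hierarchy node; `level` is the level of detail (0 = coarsest)."""
    return ET.SubElement(parent, "group", name=name, level=str(level))


def add_sphere(parent, ident, center, radius, color="#808080"):
    """Add a sphere with an id that tubes can refer to."""
    x, y, z = center
    return ET.SubElement(parent, "sphere", id=str(ident), x=str(x), y=str(y),
                         z=str(z), radius=str(radius), color=color)


def add_tube(parent, id1, id2, radius, color="#808080"):
    """Add a tube connecting two spheres, referenced by id."""
    return ET.SubElement(parent, "tube", id1=str(id1), id2=str(id2),
                         radius=str(radius), color=color)


def build_example():
    """Build a two-level toy model: one coarse sphere refined into two
    connected spheres at the next level of detail."""
    model = ET.Element("coarse_model", name="example fiber")
    coarse = add_group(model, "whole fiber", level=0)
    add_sphere(coarse, 1, (0.0, 0.0, 0.0), 150.0)
    fine = add_group(coarse, "fiber segments", level=1)
    add_sphere(fine, 2, (-50.0, 0.0, 0.0), 55.0)
    add_sphere(fine, 3, (50.0, 0.0, 0.0), 55.0)
    add_tube(fine, 2, 3, 10.0)
    return model


def to_xml(model):
    """Serialize the model to an XML string."""
    return ET.tostring(model, encoding="unicode")


def read_spheres(xml_text):
    """Return (id, x, y, z, radius) for every sphere, in document order."""
    root = ET.fromstring(xml_text)
    return [(s.get("id"), float(s.get("x")), float(s.get("y")),
             float(s.get("z")), float(s.get("radius")))
            for s in root.iter("sphere")]
```

<p>
Any program with an XML parser could read such a file back, for example
<code>read_spheres(to_xml(build_example()))</code>. Nesting groups is one
simple way to express levels of detail; whether hierarchy should be nested or
referenced by id is exactly the kind of design question a shared format would
need to settle.
</p>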
| | 345 | |
| | 346 | <h2>Stereo Animation</h2> |
| | 347 | |
| | 348 | <p><b>Opportunity</b>:</p> |
| | 349 | <p>3d consumer computer displays and televisions appear to just be taking |
| | 350 | off. ESPN 3D offers 3d sports broadcasts. Stereo animation may be attractive |
| | 351 | for web data publication. |
| | 352 | </p> |
| | 353 | |
| | 354 | <p><b>Possible Products</b>:</p> |
| | 355 | <ul> |
| | 356 | <li>Export video in stereo formats, YouTube, BluRay. Study if any web |
| | 357 | browser sequential stereo technology exists. |
| | 358 | <li>Provide continuous depth of field control in keyframe animation. |
| | 359 | </ul> |
| | 360 | |
| | 361 | <h2>SAXS suite of tools</h2> |
| | 362 | |
| | 363 | <p><b>Opportunity</b>:</p> |
| | 364 | <p> |
| | 365 | SAXS data and model visualization I think is an uncolonized niche. |
| | 366 | Standard molecular viewers used with little specialized support. |
| | 367 | Would be possible to make Chimera the standard SAXS visualization software |
| | 368 | (similar to our current monopoly on single-particle EM volume display). |
| | 369 | Don't currently have experimental collaborators (Sali lab methods devel), but |
| | 370 | have talked with Alex Shkumatov in Dmitri Svergun lab -- world leader in SAXS |
| | 371 | computational analysis. |
| | 372 | </p> |
| | 373 | |
| | 374 | <p><b>Possible Products</b>:</p> |
| | 375 | <ul> |
| | 376 | <li>Display volume envelopes derived from SAXS profiles using third party |
| | 377 | computation tools (Svergun lab). |
| | 378 | <li>Handle conformational ensemble fitting to SAXS profiles (Nick Ulyanov |
| | 379 | collaboration). |
| | 380 | <li>Optimize speed of existing SAXS profile calculation (OpenCL GPU computing?) |
| | 381 | for interactive exploration of SAXS fits. |
| | 382 | </ul> |
| | 383 | |
| | 384 | <h2>High resolution EM modeling</h2> |
| | 385 | |
| | 386 | <p> |
| | 387 | High resolution (3-4 Angstrom) EM model building. |
| | 388 | </p> |
| | 389 | |
| | 390 | <p><b>Opportunity</b>:</p> |
| | 391 | <p> |
| | 392 | Single particle EM maps in the 3 to 4 Angstrom resolution range for viruses |
| | 393 | are becoming common. This is another opportunity to monopolize software for |
| | 394 | an emerging subfield. I formerly thought existing low-resolution xray model |
| | 395 | building tools would be used for this data. It appears a new generation of |
| | 396 | model building software is needed. Matt Baker, in collaboration with the U. |
| | 397 | Washington computer science dept, has been developing Gorgon visualization and |
| | 398 | model building from scratch for 2 years. It might be disruptive to compete |
| | 399 | with that project. Also it is a very hard problem, perhaps requiring |
| | 400 | more resources than we can give it. Gorgon is unlikely to succeed for lack |
| | 401 | of man-power. |
| | 402 | </p> |
| | 403 | |
| | 404 | <p><b>Possible Products</b>:</p> |
| | 405 | <ul> |
| | 406 | <li>Build protein backbones in 3-4 Angstrom maps. Place side chains where |
| | 407 | possible. |
| | 408 | <li>Possibly some existing xray automated model building system could be |
| | 409 | integrated into Chimera. Needs research. |
| | 410 | <li>Methods to represent quality of fit. EM model validation is a very |
| | 411 | active research problem. |
| | 412 | </ul> |
| | 413 | |
| | 414 | <h2>Do these ideas require replacing Chimera?</h2> |
| | 415 | |
| | 416 | <p> |
| | 417 | I favor large-scale incremental software changes, instead of starting over, |
| | 418 | but we have never made such changes (can't shake OTF, wrappy, |
| | 419 | Tk, fixed function OpenGL, extend atom specs to surfaces, memory efficient |
| | 420 | molecules, abstract models are molecules). Our practised accretion method |
| | 421 | with no major changes offers good stability for outside developers |
| | 422 | but we don't have many of those for other reasons. |
| | 423 | </p> |
| | 424 | |
| | 425 | <p> |
| | 426 | Developing a next-generation Chimera 2 while maintaining Chimera 1 is the |
| | 427 | pattern we followed with the MidasPlus to Chimera transition, but it required |
| | 428 | more than 5 years to get the next-generation code into initial distribution. |
| | 429 | Chimera 2 could leverage much of the existing volume C++ code. |
| | 430 | </p> |
| | 431 | |
| | 432 | }}} |