Changes between Initial Version and Version 1 of ChIMP/RBVISupplemental


Timestamp:
Oct 15, 2009, 4:57:25 PM (17 years ago)
Author:
Scooter Morris
  • ChIMP/RBVISupplemental

    v1 v1  
     1= RBVI Supplemental Grant =
     2
     3== Specific AIM 1 ==
     4''Development of a general web services infrastructure for communicating between Chimera and arbitrary back-end services''
     5[[BR]]
     6'''Leaders: Conrad Huang, Ben Webb'''
     7
     8Web services provide a convenient, platform independent way for clients to obtain services from a remote computer.
     9For example, while UCSF Chimera is not distributed with large databases, users can fetch a large variety of data via the web,
     10e.g. from the Protein Data Bank (PDB), the Computed Atlas of Surface Topography of proteins (CASTp), the Electron Microscopy
     11Data Bank (EMDB), etc.  There are other useful capabilities that have not been included with Chimera because they require either
     12computational or data resources that are not suitable for desktop computing, and they are not accessed via web services because
     13long response times to requests do not fit well with the interactive nature of Chimera.  To address this limitation, we propose to
     14deploy service-oriented architecture (SOA) software that supports long-running jobs on the RBVI web servers and implement code
     15infrastructure in Chimera to communicate with servers.
     16
     17A strong candidate for the RBVI SOA software is the Opal Toolkit from the National Biomedical Computation Resource (NBCR).
     18Opal "is a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling,
     19standards-based Grid security and data management in an easy-to-use and configurable manner."[http://nbcr.net/software/opal/]
     20We will deploy the Opal Toolkit or software with similar capabilities on the RBVI web servers to create a web services platform.
     21On the client side, we will first implement a simple interface to access each available service directly; that is, Chimera will “know”
     22about which RBVI web services are available and what parameters are expected for each.  This will enable us to provide additional
     23functionality in Chimera very quickly.  In the second infrastructure phase, we will implement a more powerful interface that queries
     24the web server for descriptions of all available services.  For example, the Opal Toolkit provides a very flexible description
     25(metadata) of each service, including expected parameters and their data types, invocation command syntax, and generated output.
     26We will implement a Chimera extension that will generate a graphical user interface from web service metadata, and also map service
     27output to data structures to make them accessible to users via the standard Chimera interface.
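
The second infrastructure phase can be illustrated with a minimal sketch. It assumes a hypothetical metadata layout (a dictionary with `name` and `params` entries); the field names, flags, and widget labels are illustrative assumptions and not the actual Opal metadata schema. The sketch maps each declared parameter type to a kind of GUI control and turns user-entered values into an argument list.

```python
# Hypothetical service description; the field names are illustrative
# assumptions, not the actual Opal metadata schema.
SERVICE = {
    "name": "example_service",
    "params": [
        {"name": "input", "type": "file", "flag": "-i", "required": True},
        {"name": "cutoff", "type": "float", "flag": "-c", "default": 4.0},
        {"name": "verbose", "type": "bool", "flag": "-v", "default": False},
    ],
}

# Map each declared parameter type to the kind of widget a GUI would show.
WIDGET_FOR_TYPE = {
    "file": "file chooser",
    "float": "numeric entry",
    "int": "numeric entry",
    "string": "text entry",
    "bool": "checkbox",
}


def describe_form(service):
    """Return (label, widget kind, default) for every declared parameter."""
    return [
        (p["name"], WIDGET_FOR_TYPE.get(p["type"], "text entry"), p.get("default"))
        for p in service["params"]
    ]


def build_arguments(service, values):
    """Turn user-entered values into an argument list for the service."""
    args = []
    for p in service["params"]:
        value = values.get(p["name"], p.get("default"))
        if value is None:
            if p.get("required"):
                raise ValueError("missing required parameter: " + p["name"])
            continue
        if p["type"] == "bool":
            # Boolean parameters become bare flags when set.
            if value:
                args.append(p["flag"])
        else:
            args.extend([p["flag"], str(value)])
    return args
```

Because the form is derived from the description rather than hard-coded, a new service on the server would need no client-side changes under this design.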
     28
     29== Specific AIM 2 ==
     30''Enhancement of UCSF Chimera to provide an interface to the IMP algorithms for fitting multiple subunits into CryoEM Maps''
     31[[BR]]
     32'''Leaders: Tom Goddard, Keren Lasker'''
     33
     34Structural description of macromolecular assemblies is essential for a mechanistic understanding of the cell. The scope of the problem is revealed by protein interaction studies: the yeast cell contains approximately 800 distinct core complexes of 4.9 proteins on average, most of which have not yet been structurally characterized. The human proteome is likely to have an order of magnitude more distinct assemblies than the yeast cell. Therefore, there are thousands of biologically relevant assemblies whose structures still need to be determined.
     35
     36Cryo-electron microscopy (cryoEM) techniques have become a standard tool for studying the structure of large macromolecular assemblies. The increasing numbers of atomic and cryoEM data sets have stimulated the development of integrative modeling approaches that strive to model an assembly by fitting atomic structures of assembly proteins into a cryoEM density map of the whole assembly. When the structure of a homologous assembly (template) is available, the placements of the proteins can be computed by fitting the template into the target assembly density, superposing the target protein models on the corresponding template proteins, and refining the model. However, when only a cryoEM map and protein structures are available, a general method for solving the configuration problem is required.
     37
     38Proteomics data can be used as an additional source of low-resolution information about interactions and spatial proximities between groups of proteins when modeling the structures of large assemblies {NPC, 26S}. Such data can be obtained from different types of experimental methods, such as yeast two-hybrid, chemical cross-linking, various forms of mass spectrometry, in vitro pull-downs, and co-purifications. Information about the presence of interactions or spatial proximities among a group of proteins can be translated into a conditional restraint composed of ambiguous upper-bound restraints between pairs of proteins {Alber, review}. Therefore, even when the details of the interfaces between interacting proteins are not available, the distances between the protein centroids can still be restrained.
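
A minimal sketch of how such a centroid restraint could be scored follows. The harmonic form of the penalty and the rule of taking the best-satisfied candidate pair are illustrative assumptions, not a description of how IMP implements conditional restraints; all function names are hypothetical.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def upper_bound_score(p, q, max_dist, k=1.0):
    """Harmonic upper bound: zero within max_dist, quadratic penalty beyond."""
    excess = math.dist(p, q) - max_dist
    return k * excess * excess if excess > 0 else 0.0


def ambiguous_score(centroids, candidate_pairs, max_dist):
    """Score an ambiguous restraint as its best-satisfied candidate pair."""
    return min(
        upper_bound_score(centroids[a], centroids[b], max_dist)
        for a, b in candidate_pairs
    )
```

With this form, a pull-down that implies "A touches B or C" is satisfied as soon as either centroid pair falls within the distance bound.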
     39
     40As the number of assemblies characterized by both cryoEM and proteomics is rapidly increasing, modeling approaches that combine these two types of information are required. Moreover, visualization of the corresponding data and structures is also needed.
     41
     42MultiFit {MultiFit ref} is a new method in IMP for determining the configuration of multiple high-resolution protein structures based on the quality-of-fit of each protein into the density map, the protrusion of each protein from the map envelope, and the shape complementarity between pairs of proteins. The combination of these terms reduces the ambiguity of the final solution, compared to using any individual term on its own. Proteomics data can be included in MultiFit both as terms in the scoring function and to target the sampling.
     43
     44We suggest using Chimera to visualize the process of determining the structures of large assemblies from cryoEM and proteomics data with IMP, and to visualize the resulting structures. We propose to handle five common scenarios in this combined use of Chimera and IMP. In all scenarios, a density map is displayed in Chimera, and MultiFit is called from Chimera to calculate an anchor point graph, which is then displayed on the screen. The anchor graph approximates the positions of the proteins in the assembly and the interactions between them. The nodes of the graph are the centroids of L approximately equally sized regions of density voxels. When L equals the number of proteins in the assembly and the proteins are of similar sizes, the centroids of the regions approximately correspond to the centroids of the proteins in the assembly. The edges of the graph connect nodes of neighboring regions and approximate the assembly interaction map. Such a graph can help the user place proteins in the map.
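
The anchor graph idea can be sketched as follows. This sketch assumes plain k-means clustering of voxel grid indices, which does not guarantee equally sized regions and is not necessarily the segmentation method MultiFit uses; the function name and the face-adjacency rule for edges are likewise assumptions made for illustration.

```python
def anchor_graph(voxels, n_regions, n_iter=20):
    """Cluster above-threshold voxels into regions; return (centroids, edges).

    voxels: iterable of integer (x, y, z) grid indices.
    """
    voxels = sorted(voxels)
    # Deterministic seeding: evenly spaced voxels from the sorted list.
    step = len(voxels) / n_regions
    centers = []
    for i in range(n_regions):
        seed = voxels[int(i * step)]
        centers.append(tuple(float(c) for c in seed))
    label = {}
    for _ in range(n_iter):
        # Assignment step: each voxel joins its nearest center.
        for v in voxels:
            label[v] = min(
                range(n_regions),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(v, centers[r])),
            )
        # Update step: move each center to the centroid of its region.
        for r in range(n_regions):
            members = [v for v in voxels if label[v] == r]
            if members:
                centers[r] = tuple(
                    sum(m[i] for m in members) / len(members) for i in range(3)
                )
    # Two regions share an edge if any of their voxels are face neighbors.
    edges = set()
    for (x, y, z) in voxels:
        for n in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if n in label and label[n] != label[(x, y, z)]:
                edges.add(tuple(sorted((label[(x, y, z)], label[n]))))
    return centers, sorted(edges)
```

The returned centroids are the candidate anchor points for placing proteins, and the edge list is the approximate interaction map between regions.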
     45
     46Proposed interaction:
     47 1. The user manually positions and orients the proteins in the density and MultiFit is called for local refinement.
     48 2. The user positions proteins on anchor points but is not certain about the orientations. A search of all possible orientations is performed using MultiFit.
     49 3. Given an assembly model in the density, the user positions and orients the proteins and views how the values of individual restraints of the scoring function change. The scoring function is the sum of the MultiFit terms and restraints derived from proteomics data.
     50 4. The user does not know much about the assembly configuration, but might be able to provide some interaction data coming from proteomics experiments. MultiFit runs in the background, and the results are visualized in Chimera. The user can then further refine the model by applying scenarios 1-3.
     51 5. A more challenging scenario would be to account for missing structures. The user can draw spheres or ellipsoids of different sizes, which can then be treated as low-resolution structures in the calculation of the assembly structures.
     52
     53
     54
     55[wiki:ChIMP/MultiFit MultiFit & Chimera for fitting EM maps] - Illustration of MultiFit use (Tom G).
     56
     57[attachment:chimera_em_proteomics_modeling.docx MultiFit / Chimera proposal] - Keren Lasker's write-up.
     58
     59== Specific AIM 3 ==
     60''Enhancement of UCSF Chimera to provide an interface to the IMP algorithms for doing structural refinement and fitting to SAXS data''
     61[[BR]]
     62'''Leaders: Tom Goddard, Dina Schneidman'''
     63
     64A Small Angle X‐Ray Scattering (SAXS) measurement determines rotationally
     65averaged scattering intensity of a molecule as a function of spatial frequency (SAXS
     66profile), typically at 1‐ to 3‐nm resolution. The experiment is simple and typically
     67takes several hours. In recent years, SAXS experiments have become increasingly popular,
     68and computational tools for data interpretation are required.
     69
     70A SAXS profile can be transformed into an electron pair distance distribution function,
     71P(r), which is a histogram of all pairwise distances, r, of the electrons in the sample.
     72Due to the rotational averaging, the information content of a SAXS profile is lower
     73compared to an X‐ray crystallographic diffraction pattern or even a density map
     74from cryo‐EM. However, a SAXS profile can be very useful in modeling assembly
     75configuration when the structures of the individual component proteins are known.
     76Given an assembly model, we can compute a SAXS profile from it and
     77compare it to the experimental one. We can further use the SAXS profile for scoring and
     78refinement of alternative assembly models.
     79Currently, there is no graphical interface that ties the 3D structure to its
     80corresponding 2D SAXS profile: the structures and profiles are viewed separately in
     81molecular viewers and plotting tools (Excel, Matlab, gnuplot, etc.).
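
The pair distance distribution described above can be approximated from a structure with a short sketch. It assumes every atom carries equal weight, whereas a real P(r) would weight each pair by electron counts or form factors; the function name and the bin layout are illustrative choices.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width=1.0):
    """Histogram of all pairwise distances; bin i covers [i*w, (i+1)*w)."""
    counts = {}
    for p, q in combinations(coords, 2):
        b = int(math.dist(p, q) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_bins)]
```

A histogram of this kind is the sort of 2D data that could be plotted next to the 3D structure.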
     82
     83There are several ways for combining 3D structure representation and SAXS profiles
     84(starting with the easiest and going to the fancy ones):
     85
     86 1. Loading a structure and a profile simultaneously and '''displaying''' them
     87 '''together'''. This will simply eliminate the need to use a plotting tool in
     88 addition to a 3D viewer. This part does not require IMP. Since Chimera already
     89 supports 2D data display, it should be relatively simple.
     90 2. '''Computing''' and displaying the '''SAXS profile''' for the displayed structure or parts
     91 of it. '''Fitting''' the computational SAXS profile to the experimental one and
     92 displaying them together. The SAXS module of IMP can perform profile
     93 computation and profile fitting. The computation of a profile takes less than a
     94 second for an average-size protein (500‐600 residues), but scales
     95 quadratically and can take much longer for large assemblies (80 seconds for the
     96 GroEL/GroES complex). Therefore, for large structures it may be a better
     97 solution to run profile computation in the background.
     98 3. Support '''structure modification''' (change of torsion angles, movement of
     99 molecules). The SAXS profile has to be re‐computed after each modification
     100 and its 2D display has to be updated. This is a nice extension to 2 and should
     101 be simple once 2 is implemented. It is possible to speed up the calculation of
     102 the profile here, since only pair‐wise atomic distances that have been
     103 changed should be recomputed.
     104 4. '''Refinement of the assembly model''' to better fit the experimental SAXS
     105 profile. This part involves an optimization algorithm that modifies and fits the
     106 structure to the SAXS profile. The simple option is to run it in the background and
     107 display the final result. To make it fancier, it is possible to display intermediate
     108 structures and their SAXS profiles.
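
Profile computation and fitting (items 2 to 4) can be sketched under simplifying assumptions. The sketch uses the standard Debye formula with a single constant form factor and no solvent or hydration-layer terms, and fits by a least-squares scale factor; it is not the IMP SAXS module's implementation, and the function names are hypothetical. The nested loop over atom pairs is where the quadratic scaling comes from.

```python
import math


def debye_profile(coords, q_values, form_factor=1.0):
    """I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij), with constant f."""
    n = len(coords)
    # All pairwise distances: this is the step that scales quadratically.
    dists = [
        math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    f2 = form_factor * form_factor
    profile = []
    for q in q_values:
        total = n * f2  # self terms (r = 0, where sin(x)/x = 1)
        for r in dists:
            x = q * r
            total += 2.0 * f2 * (math.sin(x) / x if x > 1e-12 else 1.0)
        profile.append(total)
    return profile


def fit_profile(i_exp, sigma, i_calc):
    """Least-squares scale factor c and chi for c * i_calc versus i_exp."""
    num = sum(e * m / (s * s) for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / (s * s) for m, s in zip(i_calc, sigma))
    c = num / den
    chi2 = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return c, math.sqrt(chi2 / len(i_exp))
```

After a structure modification, only the entries of `dists` that involve moved atoms would need to be recomputed, which is the speed-up mentioned in item 3.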
     109
     110[attachment:"Chimera – SAXS support.pdf:wiki:ChIMP" SAXS Chimera interface] - Write-up by Dina Schneidman.
     111
     112== Specific AIM 4 ==
     113''Enhancement of UCSF Chimera to interface to MODELLER and MODWEB, including loop modeling''
     114[[BR]]
     115'''Leaders: Eric Pettersen, Ben Webb'''
     116
     117A researcher interested in a structural analysis of a protein (such as a docking study) will frequently find that while the sequence of the protein is known, the 3D structure is not.  This is largely due to the much higher effort involved in a structure determination experiment relative to sequence determination and also sometimes due to certain types of proteins (e.g. membrane proteins) not being amenable to the most common structure-determination methods (e.g. X-ray crystallography).
     118
     119A frequent approach for a researcher with such a protein is to employ a homology-modeling program to generate a structure based on a protein with a similar sequence and known structure.  Using such a program can be daunting for a researcher without a strong bioinformatics background.
     120
     121The goal of this proposal is to make the homology-modeling process more approachable for typical researchers.  Specifically, we intend to take our widely-used homology-modeling program, Modeller, and make it available as a web service (as per AIM 1).  Then we would enhance UCSF Chimera so that a user providing a sequence and a homologous structure could use the web service to produce a homology model.  Since the homology model computation can take some time, it will use Chimera's new task-management capabilities to allow the user to quit Chimera if desired and access the completed task in a later invocation of Chimera.  The model can be saved using Chimera's normal structure-saving capabilities.
     122
     123There are also usage scenarios where the researcher only needs a sequence to generate the homology model.  For instance, using the StructureViz [Morris] Cytoscape extension (which interfaces with Chimera), a researcher could indicate a protein (node) of interest and ask for a homology model.  StructureViz would identify nodes in the same cluster with associated structures and use the most similar one as the homology-model template structure.  StructureViz would then remote-control Chimera's interface to the Modeller web service to create the homology model and display it in Chimera.
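
The template-selection step can be illustrated with a rough sketch. It assumes that "most similar" is judged by a crude string similarity from Python's standard `difflib` module, standing in for a real sequence alignment score; the function name and the candidate list layout are hypothetical and do not reflect StructureViz internals.

```python
from difflib import SequenceMatcher


def pick_template(target_seq, candidates):
    """Return the (name, sequence) candidate most similar to target_seq.

    candidates: list of (structure name, sequence) pairs for cluster nodes
    that have associated structures. Returns None if the list is empty.
    """
    def similarity(item):
        # autojunk=False keeps SequenceMatcher from discarding frequent residues.
        return SequenceMatcher(None, target_seq, item[1], autojunk=False).ratio()

    return max(candidates, key=similarity, default=None)
```

In practice the similarity function would be replaced by an alignment-based measure such as percent identity, with the rest of the selection logic unchanged.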
     124
     125== Specific AIM 5 ==
     126''Enhancement of UCSF Chimera to significantly improve animation support''
     127[[BR]]
     128'''Leaders: Conrad Huang, Tom Goddard'''
     129
     130Communicating Research using Animations and Physical Models
     131
     132This project will develop software for creating animations and will explore
     133the use of physical models to improve the communication of scientific
     134hypotheses and results between researchers.  The new animation tools will
     135allow scientists to illustrate their discoveries of dynamic molecular
     136and cellular processes in publications and presentations aimed at other
     137researchers.  The second aspect of this project will use 3-dimensional
     138printing technology to make multi-piece plastic models of molecular assemblies
     139to facilitate discussion of hypotheses and discoveries regarding the
     140architecture and function of the assemblies among groups of 2 or 3 researchers.
     141
     142The purpose of animation capabilities we will develop within the UCSF Chimera
     143molecular visualization package is to allow researchers without special training
     144to compose molecular movies of modest complexity.  Most current animations
     145created by researchers simply spin a 3-dimensional scene around
     146360 degrees.  We intend to extend the common repertoire to include animations
     147of ligand binding, conformational change of proteins, assembly pathways of
     148multi-protein complexes, and functional motions and transformations of
     149molecular machines such as viruses, proteasomes, ribosomes, replicases, etc.
     150Skills to create such animations are currently limited to a small number of
     151research labs with hundreds of hours invested in learning animation software
     152primarily used for video game development and cinematic applications.
     153The transition from print journals to online publication has created the
     154opportunity for much wider use of animation to supplement static images
     155and complex text descriptions of dynamic molecular and cellular events.
     156
     157There are four aspects of our proposed animation tools: 1) command scripts,
     1582) graphical timeline editor, 3) example animations, and 4) new motion
     159capabilities.  The first two tasks provide mechanisms for composing an
     160animation.  The most basic method is to create in a text editor a sequence
     161of commands that perform the desired molecular motions, color highlighting,
     162changes in viewpoint, titling, etc.  Only about half of the ~40 needed
     163commands currently exist in the Chimera command language.  We will add the
     164missing commands that provide functions currently only available through
     165user interface dialogs.  While a command script allows flexibility and
     166simple editing, it requires extensive knowledge of command syntax.  We
     167will implement a timeline editor that provides the power of command scripts
     168using graphical dialogs, menus, buttons, and parameter entry fields.
     169This interface for composing animations requires less knowledge to use since
     170all the options are presented through controls on-screen.  To allow
     171researchers to quickly learn these animation tools we will provide ten
     172example animations including command scripts, graphical timeline session
     173files, molecular data files, and resulting movies.  The examples will be
     174available on the web and cover all of the available capabilities.  These
     175can be used as templates with one's own data, cutting, pasting and deleting
     176segments as appropriate.  The most innovative component of this project
     177will add commands that provide new modes of illustrating the function of
     178molecular systems, two examples being "rigging" and exploded views.  Rigging
     179involves defining allowed hinge and glide motions for pieces of a molecular
     180assembly to turn a rigid model into an articulated one.  For example, a
     181ribosome model would allow a ratchet rotation between large and small subunits
     182and allow tRNA molecules to advance from the A (aminoacyl) to the P (peptidyl)
     183to the E (exit) binding sites.  Molecular assemblies are usually densely
     184packed and an "exploded view" mode would move the constituent molecules
     185away from one another in a minimal way so internal components are visible.
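
One simple way to compute an exploded view is sketched below. It assumes uniform radial scaling of component centroids about the assembly center, which is easy to compute but is not the "minimal" displacement described above; the function name and the `factor` parameter are hypothetical.

```python
def exploded_offsets(centroids, factor=1.5):
    """Translation for each component that pushes it away from the center.

    centroids: dict mapping component name to its (x, y, z) centroid.
    factor: 1.0 leaves the assembly unchanged; larger values spread it out.
    """
    n = len(centroids)
    center = tuple(sum(c[i] for c in centroids.values()) / n for i in range(3))
    return {
        name: tuple((factor - 1.0) * (c[i] - center[i]) for i in range(3))
        for name, c in centroids.items()
    }
```

Animating `factor` from 1.0 upward over successive frames would open the assembly gradually, and each component moves rigidly because only a translation is applied.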
     186
     187While animations are effective for communicating research via talks and
     188publications, physical models can be valuable in one-on-one and
     189small group discussions of the architecture and mechanisms of molecular
     190machinery.  For example, a hand-size plastic model of 3 to 10 colored
     191pieces joined by small magnets can be manipulated to explore the variety
     192of possible assemblies of the 26S proteasome, a machine that degrades
     193unneeded proteins and whose architecture is currently only partially understood.
     194Physical models can be a more powerful communication device than computer
     195graphics for many subtle reasons: use of hands to point and illustrate motions,
     196easy exchange of who holds the model, facile rearrangement of pieces, and
     197the ability of participants to face each other.  We propose to purchase a
     1983-dimensional printer that prints plastic models and develop techniques
     199to design multi-piece magnet-assembled models in our Chimera software.
     200Initial tests have been done with virus and nuclear pore models using an
     201existing 3-d printer on our campus.  Print times of 10 hours or more
     202and contention with other labs make a dedicated printer for molecular
     203physical models important to furthering this work.