RBVI Supplemental Grant
Specific AIM 1
Development of a general web services infrastructure for communicating between Chimera and arbitrary back-end services
Leaders: Conrad Huang, Ben Webb
Web services provide a convenient, platform-independent way for clients to obtain services from a remote computer. For example, while UCSF Chimera is not distributed with large databases, users can fetch a large variety of data via the web, e.g., from the Protein Data Bank (PDB), the Computed Atlas of Surface Topography of proteins (CASTp), and the Electron Microscopy Data Bank (EMDB). Other useful capabilities have not been included with Chimera because they require computational or data resources that are not suitable for desktop computing, and they are not accessed via web services because long response times to requests do not fit well with the interactive nature of Chimera. To address this limitation, we propose to deploy service-oriented architecture (SOA) software that supports long-running jobs on the RBVI web servers and to implement code infrastructure in Chimera to communicate with these servers.
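One way to reconcile long-running jobs with an interactive client is to submit a job, return immediately, and check its status from a periodic timer instead of blocking. The sketch below illustrates that pattern only. `FakeJobService` is a hypothetical in-memory stand-in for a remote server, and `poll_once` is an illustrative name; neither reflects the actual Chimera or server-side code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeJobService:
    """Hypothetical in-memory stand-in for a remote job service."""
    polls_until_done: int = 3
    _polls: Dict[str, int] = field(default_factory=dict)

    def submit(self, service: str, params: dict) -> str:
        # A real server would queue the job; here we only hand out an ID.
        job_id = f"job-{len(self._polls) + 1}"
        self._polls[job_id] = 0
        return job_id

    def status(self, job_id: str) -> str:
        # Pretend the job finishes after a fixed number of status checks.
        self._polls[job_id] += 1
        done = self._polls[job_id] >= self.polls_until_done
        return "done" if done else "running"

    def result(self, job_id: str) -> str:
        return f"output of {job_id}"


def poll_once(service: FakeJobService, job_id: str,
              on_done: Callable[[str], None]) -> bool:
    """Check the job a single time without blocking.

    Calls on_done(result) and returns True once the job has finished.
    A GUI client could call this from a timer so the interface stays
    responsive between checks.
    """
    if service.status(job_id) == "done":
        on_done(service.result(job_id))
        return True
    return False
```

In an interactive client, `poll_once` would be scheduled repeatedly by the toolkit's timer until it returns True.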
A strong candidate for the RBVI SOA software is the Opal Toolkit from the National Biomedical Computation Resource (NBCR). Opal "is a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based Grid security and data management in an easy-to-use and configurable manner" (http://nbcr.net/software/opal/). We will deploy the Opal Toolkit or software with similar capabilities on the RBVI web servers to create a web services platform. On the client side, we will first implement a simple interface to access each available service directly; that is, Chimera will “know” which RBVI web services are available and what parameters are expected for each. This will enable us to provide additional functionality in Chimera very quickly. In the second infrastructure phase, we will implement a more powerful interface that queries the web server for descriptions of all available services. For example, the Opal Toolkit provides a very flexible description (metadata) of each service, including expected parameters and their data types, invocation command syntax, and generated output. We will implement a Chimera extension that will generate a graphical user interface from web service metadata and map service output to data structures, making the output accessible to users via the standard Chimera interface.
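The second-phase idea, building an interface from a service description rather than hard-coding it, can be sketched as follows. The metadata layout in `EXAMPLE_METADATA` is a hypothetical format invented for illustration and is not the Opal schema, and a command-line parser from Python's standard `argparse` module stands in for the graphical widgets a Chimera extension would create.

```python
import argparse

# Hypothetical service description; the real Opal metadata format differs.
EXAMPLE_METADATA = {
    "name": "example_fit_service",
    "description": "Illustrative service description",
    "parameters": [
        {"name": "map", "type": "file", "required": True,
         "help": "density map file"},
        {"name": "resolution", "type": "float", "default": 20.0,
         "help": "map resolution"},
        {"name": "copies", "type": "int", "default": 1,
         "help": "number of subunit copies"},
    ],
}

# Map metadata type names to Python converters (an assumed vocabulary).
TYPE_MAP = {"int": int, "float": float, "string": str, "file": str}


def build_parser(metadata):
    """Create one input field per parameter listed in the metadata."""
    parser = argparse.ArgumentParser(
        prog=metadata["name"], description=metadata.get("description"))
    for param in metadata["parameters"]:
        parser.add_argument(
            "--" + param["name"],
            type=TYPE_MAP[param["type"]],
            required=param.get("required", False),
            default=param.get("default"),
            help=param.get("help"),
        )
    return parser
```

Because the parser is derived entirely from the description, adding a service on the server would not require a new client release, which is the main attraction of the metadata-driven phase.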
Specific AIM 2
Enhancement of UCSF Chimera to provide an interface to the IMP algorithms for fitting multiple subunits into CryoEM Maps
Leaders: Tom Goddard, Keren Lasker
Structural description of macromolecular assemblies is essential for a mechanistic understanding of the cell. The scope of the problem is revealed by protein interaction studies: the yeast cell contains approximately 800 distinct core complexes of 4.9 proteins on average, most of which have not yet been structurally characterized. The human proteome is likely to have an order of magnitude more distinct assemblies than the yeast cell. Therefore, there are thousands of biologically relevant assemblies whose structures still need to be determined.
Cryo-electron microscopy (cryoEM) techniques have become a standard tool for studying the structure of large macromolecular assemblies. The increasing numbers of atomic and cryoEM data sets have stimulated the development of integrative modeling approaches that strive to model an assembly by fitting atomic structures of assembly proteins into a cryoEM density map of the whole assembly. When the structure of a homologous assembly (template) is available, the placements of the proteins can be computed by fitting the template into the target assembly density, superposing the target protein models on the corresponding template proteins, and refining the model. However, when only a cryoEM map and protein structures are available, a general method for solving the configuration problem is required.
Proteomics data can be used as an additional source of low-resolution information about interactions and spatial proximities between groups of proteins when modeling structures of large assemblies {NPC, 26S}. These data can be obtained from different types of experimental methods, such as yeast two-hybrid, chemical cross-linking, various forms of mass spectrometry, in vitro pull-downs, and co-purifications. Information about the presence of interactions or spatial proximities within a group of proteins can be translated into a conditional restraint composed of ambiguous upper-bound restraints between pairs of proteins {Alber, review}. Therefore, even when the details of the interfaces between interacting proteins are not available, the distances between the protein centroids can still be restrained.
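To make the idea of an ambiguous upper-bound restraint concrete, here is a minimal sketch under stated assumptions: a harmonic penalty applied only beyond the distance bound, and an ambiguous restraint scored as the minimum over the candidate protein pairs, so that it is satisfied if any one pair is close enough. The function names and the quadratic form are illustrative choices and are not taken from IMP.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def upper_bound_score(a, b, max_dist, k=1.0):
    """Zero while the two centroids are within max_dist, quadratic beyond."""
    d = math.dist(a, b)
    return 0.0 if d <= max_dist else k * (d - max_dist) ** 2


def ambiguous_score(centroids, candidate_pairs, max_dist):
    """Score an ambiguous restraint over several possible protein pairs.

    The data say that some pair in the group is in proximity but not
    which one, so the best-satisfied pair determines the score.
    """
    return min(
        upper_bound_score(centroids[i], centroids[j], max_dist)
        for i, j in candidate_pairs
    )
```

For example, with centroids A at (0, 0, 0), B at (10, 0, 0), and C at (50, 0, 0) and a bound of 30, the pair A-B scores zero, while an ambiguous restraint over the pairs A-C and B-C is scored by the closer pair B-C.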
As the number of assemblies characterized by both cryoEM and proteomics is rapidly increasing, modeling approaches that combine these two types of information are required. Moreover, visualization of the corresponding data and structures is also needed.
MultiFit {MultiFit ref} is a new method in IMP for determining the configuration of multiple high-resolution protein structures based on the quality-of-fit of each protein into the density map, the protrusion of each protein from the map envelope, and the shape complementarity between pairs of proteins. The combination of these terms reduces the ambiguity of the final solution, compared to using any individual term on its own. Proteomics data can be included in MultiFit both as terms in the scoring function and to target the sampling.
We suggest using Chimera to visualize the process of determining the structures of large assemblies from cryoEM and proteomics data with IMP, and to visualize the resulting structures. We propose to handle five common scenarios in this combined use of Chimera and IMP. In all scenarios, a density map is displayed in Chimera, and MultiFit is called from Chimera to calculate an anchor point graph, which is then displayed on the screen. The anchor graph approximates the positions of the proteins in the assembly and the interactions between them. The nodes of the graph are the centroids of L approximately equally sized regions of density voxels. When L equals the number of proteins in the assembly, and the proteins are of similar sizes, the centroids of the regions approximately correspond to the centroids of the proteins in the assembly. The edges of the graph connect the nodes of neighboring regions and thus approximate the assembly interaction map. Such a graph can help the user place proteins in the map.
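The anchor graph construction can be illustrated with a small sketch. It uses plain k-means clustering of voxel coordinates as a stand-in for whatever segmentation MultiFit actually performs; k-means does not guarantee equally sized regions, and the deterministic seeding is an assumption made to keep the example reproducible. Two regions are joined by an edge when any of their voxels are face-adjacent on the grid.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anchor_graph(voxels, n_anchors, max_iter=100):
    """Return (anchor points, edges) for integer voxel coordinates.

    voxels: iterable of (i, j, k) grid indices above the density threshold.
    """
    voxels = sorted(voxels)
    n = len(voxels)
    # Deterministic seeding: evenly spaced voxels from the sorted list.
    centers = [tuple(float(c) for c in voxels[i * n // n_anchors])
               for i in range(n_anchors)]
    labels = None
    for _ in range(max_iter):
        # Assign every voxel to its nearest center.
        new_labels = [
            min(range(n_anchors), key=lambda c: sq_dist(v, centers[c]))
            for v in voxels
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the centroid of its region.
        for c in range(n_anchors):
            members = [v for v, lab in zip(voxels, labels) if lab == c]
            if members:
                centers[c] = tuple(
                    sum(m[i] for m in members) / len(members)
                    for i in range(3))
    # Connect regions that contain face-adjacent voxels.
    region = dict(zip(voxels, labels))
    edges = set()
    for (x, y, z), lab in region.items():
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            other = region.get((x + dx, y + dy, z + dz))
            if other is not None and other != lab:
                edges.add((min(lab, other), max(lab, other)))
    return centers, sorted(edges)
```

For a straight bar of eight voxels split into two regions, the anchors fall at the middle of each half and a single edge connects them.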
Proposed interaction:
- The user manually positions and orients the proteins in the density and MultiFit is called for local refinement.
- The user positions proteins on anchor points but is not certain about the orientations. A search of all possible orientations is performed using MultiFit.
- Given an assembly model in the density, the user positions and orients the proteins and views how the values of individual restraints of the scoring function change. The scoring function is the sum of the MultiFit terms and the restraints derived from proteomics data.
- The user does not know much about the assembly configuration, but might be able to provide some interaction data coming from proteomics experiments. MultiFit runs in the background, and the results are visualized in Chimera. The user can then further refine the model by applying scenarios 1-3.
- A more challenging scenario would be to account for missing structures. The user can draw spheres or ellipsoids of different sizes, which can then be treated as low-resolution structures in the calculation of the assembly structures.
MultiFit & Chimera for fitting EM maps - Illustration of multifit use (Tom G).
MultiFit / Chimera proposal - Keren Lasker's write-up.
Specific AIM 3
Enhancement of UCSF Chimera to provide an interface to the IMP algorithms for doing structural refinement and fitting to SAXS data
Leaders: Tom Goddard, Dina Schneidman
A Small Angle X-Ray Scattering (SAXS) measurement determines the rotationally averaged scattering intensity of a molecule as a function of spatial frequency (the SAXS profile), typically at 1- to 3-nm resolution. The experiment is simple and typically takes several hours. In recent years, SAXS experiments have become increasingly popular, and computational tools for data interpretation are required.
A SAXS profile can be transformed into an electron pair distance distribution function, P(r), which is a histogram of all pairwise distances, r, of the electrons in the sample. Due to the rotational averaging, the information content of a SAXS profile is lower than that of an X-ray crystallographic diffraction pattern or even a density map from cryoEM. However, a SAXS profile can be very useful in modeling the configuration of an assembly when the structures of the individual component proteins are known. Given an assembly model, we can compute a theoretical SAXS profile and compare it to the experimental one. We can further use the SAXS profile for scoring and refinement of alternative assembly models. Currently, there is no graphical interface that ties the 3D structure to its corresponding 2D SAXS profile: the structures and profiles are viewed separately in molecular viewers and plotting tools (Excel, Matlab, gnuplot, etc.).
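As a rough illustration of computing a profile from a model and comparing it with an experimental one, the sketch below uses the Debye formula, I(q) = sum over atom pairs of f_i f_j sin(q r_ij) / (q r_ij), with constant form factors, followed by a least-squares scale factor and a chi value. These are simplifying assumptions for illustration: real form factors depend on q, and the code is not the IMP SAXS module.

```python
import math


def debye_profile(coords, form_factors, q_values):
    """Rotationally averaged intensity from atomic coordinates.

    The double loop over atom pairs is what makes the cost quadratic
    in the number of atoms.
    """
    profile = []
    for q in q_values:
        total = 0.0
        for i, (ri, fi) in enumerate(zip(coords, form_factors)):
            total += fi * fi  # self term, where sin(x)/x tends to 1
            for rj, fj in zip(coords[i + 1:], form_factors[i + 1:]):
                x = q * math.dist(ri, rj)
                sinc = 1.0 if x == 0.0 else math.sin(x) / x
                total += 2.0 * fi * fj * sinc  # pairs (i, j) and (j, i)
        profile.append(total)
    return profile


def fit_chi(experimental, errors, computed):
    """Scale the computed profile onto the experimental one.

    Returns (scale, chi), where scale minimizes the error-weighted
    squared difference between the two profiles.
    """
    num = sum(e * m / s ** 2
              for e, m, s in zip(experimental, computed, errors))
    den = sum(m * m / s ** 2 for m, s in zip(computed, errors))
    scale = num / den
    chi2 = sum(((e - scale * m) / s) ** 2
               for e, m, s in zip(experimental, computed, errors))
    return scale, math.sqrt(chi2 / len(experimental))
```

A viewer could plot the scaled computed profile over the experimental points and report chi as a single goodness-of-fit number next to the 3D structure.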
There are several ways of combining 3D structure representation and SAXS profiles (listed from the simplest to the most elaborate):
- Loading a structure and a profile simultaneously and displaying them together. This simply eliminates the need to use a plotting tool in addition to a 3D viewer. This part does not require IMP. Since Chimera already supports 2D data display, it should be relatively simple.
- Computing and displaying the SAXS profile for the displayed structure or parts of it, then fitting the computed SAXS profile to the experimental one and displaying them together. The SAXS module of IMP can perform profile computation and profile fitting. The computation of a profile takes less than a second for an average-size protein (500-600 residues), but it scales quadratically and can take much longer for large assemblies (80 seconds for the GroEL/GroES complex). Therefore, for large structures it may be better to run the profile computation in the background.
- Supporting structure modification (change of torsion angles, movement of molecules). The SAXS profile has to be recomputed after each modification and its 2D display has to be updated. This is a natural extension of item 2 and should be simple once item 2 is implemented. It is possible to speed up the calculation of the profile here, since only the pairwise atomic distances that have changed need to be recomputed.
- Refining the assembly model to better fit the experimental SAXS profile. This part involves an optimization algorithm that modifies the structure and fits it to the SAXS profile. The simple option is to run it in the background and display the final result. A more elaborate option is to display intermediate structures and their SAXS profiles.
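The speed-up mentioned for structure modification can be sketched on the pair distance histogram that underlies P(r): when only some atoms move, only the pairs involving those atoms need to be re-binned. The function names and the fixed bin width are illustrative assumptions, not the IMP implementation.

```python
import math
from collections import Counter


def bin_index(a, b, bin_width):
    """Histogram bin of the distance between two points."""
    return int(math.dist(a, b) / bin_width)


def pair_histogram(coords, bin_width):
    """Full histogram over all atom pairs (quadratic cost)."""
    hist = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            hist[bin_index(coords[i], coords[j], bin_width)] += 1
    return hist


def update_histogram(hist, old_coords, new_coords, moved, bin_width):
    """Update hist in place after the atoms with indices in `moved` change.

    Only pairs with at least one moved atom are visited, so the cost is
    proportional to len(moved) * len(coords) rather than len(coords) ** 2.
    """
    moved = set(moved)
    for i in moved:
        for j in range(len(old_coords)):
            if j == i or (j in moved and j < i):
                continue  # visit each moved-moved pair only once
            hist[bin_index(old_coords[i], old_coords[j], bin_width)] -= 1
            hist[bin_index(new_coords[i], new_coords[j], bin_width)] += 1
    # Drop emptied bins so the result compares equal to a fresh histogram.
    for key in [k for k, count in hist.items() if count == 0]:
        del hist[key]
    return hist
```

When a single domain of a large assembly is moved interactively, this keeps the update cost tied to the size of the moved part rather than to the whole assembly.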
SAXS Chimera interface - Write-up by Dina Schneidman.
Specific AIM 4
Enhancement of UCSF Chimera to interface to MODELLER and MODWEB, including loop modeling
Leaders: Eric Pettersen, Ben Webb
A researcher interested in a structural analysis of a protein (such as a docking study) will frequently find that while the sequence of the protein is known, the 3D structure is not. This is largely due to the much higher effort involved in a structure determination experiment relative to sequence determination and also sometimes due to certain types of proteins (e.g. membrane proteins) not being amenable to the most common structure-determination methods (e.g. X-ray crystallography).
A frequent approach for a researcher with such a protein is to employ a homology-modeling program to generate a structure based on a protein with a similar sequence and known structure. Using such a program can be daunting for a researcher without a strong bioinformatics background.
The goal of this proposal is to make the homology-modeling process more approachable for typical researchers. Specifically, we intend to take our widely used homology-modeling program, Modeller, and make it available as a web service (as per AIM 1). We would then enhance UCSF Chimera so that a user providing a sequence and a homologous structure could use the web service to produce a homology model. Since the homology model computation can take some time, it will use Chimera's new task-management capabilities to allow the user to quit Chimera if desired and access the completed task in a later invocation of Chimera. The model can be saved using Chimera's normal structure-saving capabilities.
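Letting a user quit and pick up a finished job later implies that the client records its pending jobs somewhere that outlives the session. A minimal sketch of that bookkeeping, assuming a small JSON file and hypothetical function names (this is not Chimera's task-management API):

```python
import json
from pathlib import Path


def load_pending(path):
    """Return the recorded jobs, or an empty dict if none were saved."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_pending(path, job_id, service, note=""):
    """Record a submitted job so that a later session can find it."""
    records = load_pending(path)
    records[job_id] = {"service": service, "note": note}
    Path(path).write_text(json.dumps(records, indent=2))


def forget(path, job_id):
    """Remove a job once its results have been retrieved."""
    records = load_pending(path)
    removed = records.pop(job_id, None)
    Path(path).write_text(json.dumps(records, indent=2))
    return removed
```

On startup, the client would call `load_pending`, ask the server for the status of each recorded job, and offer any completed models to the user.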
There are also usage scenarios where the researcher only needs a sequence to generate the homology model. For instance, using the StructureViz [Morris] Cytoscape extension (which interfaces with Chimera), a researcher could indicate a protein (node) of interest and ask for a homology model. StructureViz would identify nodes in the same cluster with associated structures and use the most similar one as the homology-model template structure. StructureViz would then remote-control Chimera's interface to the Modeller web service to create the homology model and display it in Chimera.
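The template choice in this scenario amounts to ranking the candidate nodes by sequence similarity to the target. The sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` as a crude similarity measure; that is an assumption made to keep the example self-contained, and a real implementation would more likely use a proper sequence alignment score.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def pick_template(target_seq, candidates):
    """Choose the most similar candidate as the modeling template.

    candidates: dict mapping a node name to the sequence of a protein
    in the same cluster that has an associated structure.
    Returns the chosen node name, or None when there are no candidates.
    """
    if not candidates:
        return None
    return max(candidates,
               key=lambda name: similarity(target_seq, candidates[name]))
```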
Specific AIM 5
Enhancement of UCSF Chimera to significantly improve animation support
Leaders: Conrad Huang, Tom Goddard
Communicating Research using Animations and Physical Models
This project will develop software for creating animations and will explore the use of physical models to improve the communication of scientific hypotheses and results between researchers. The new animation tools will allow scientists to illustrate their discoveries of dynamic molecular and cellular processes in publications and presentations aimed at other researchers. The second aspect of this project will use 3-dimensional printing technology to make multi-piece plastic models of molecular assemblies to facilitate discussion of hypotheses and discoveries regarding the architecture and function of the assemblies among groups of 2 or 3 researchers.
The purpose of the animation capabilities we will develop within the UCSF Chimera molecular visualization package is to allow researchers without special training to compose molecular movies of modest complexity. Most current animations created by researchers simply spin a 3-dimensional scene around 360 degrees. We intend to extend the common repertoire to include animations of ligand binding, conformational change of proteins, assembly pathways of multi-protein complexes, and functional motions and transformations of molecular machines such as viruses, proteasomes, ribosomes, and replicases. Skills to create such animations are currently limited to a small number of research labs with hundreds of hours invested in learning animation software primarily used for video game development and cinematic applications. The transition from print journals to online publication has created the opportunity for much wider use of animation to supplement static images and complex text descriptions of dynamic molecular and cellular events.
There are four aspects of our proposed animation tools: 1) command scripts, 2) a graphical timeline editor, 3) example animations, and 4) new motion capabilities. The first two provide mechanisms for composing an animation. The most basic method is to create in a text editor a sequence of commands that perform the desired molecular motions, color highlighting, changes in viewpoint, titling, etc. Only about half of the ~40 needed commands currently exist in the Chimera command language. We will add the missing commands, which provide functions currently available only through user interface dialogs. While a command script allows flexibility and simple editing, it requires extensive knowledge of command syntax. We will implement a timeline editor that provides the power of command scripts using graphical dialogs, menus, buttons, and parameter entry fields. This interface for composing animations requires less knowledge to use, since all the options are presented through on-screen controls. To allow researchers to learn these animation tools quickly, we will provide ten example animations including command scripts, graphical timeline session files, molecular data files, and resulting movies. The examples will be available on the web and will cover all of the available capabilities. They can be used as templates with one's own data, cutting, pasting, and deleting segments as appropriate. The most innovative component of this project will add commands that provide new modes of illustrating the function of molecular systems, two examples being "rigging" and exploded views. Rigging involves defining allowed hinge and glide motions for pieces of a molecular assembly to turn a rigid model into an articulated one. For example, a ribosome model would allow a ratchet rotation between the large and small subunits and allow tRNA molecules to advance from the A (aminoacyl) to the P (peptidyl) to the E (exit) binding sites.
Molecular assemblies are usually densely packed and an "exploded view" mode would move the constituent molecules away from one another in a minimal way so internal components are visible.
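A simple way to prototype an exploded view is to translate each component along the line from the assembly center to the component's own centroid. The sketch below does this with a uniform scale factor; it is an illustrative simplification that does not attempt to find the minimal displacements described above, and the function names are hypothetical.

```python
def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def exploded_offsets(components, factor):
    """Translation vector for each component of an assembly.

    components: dict mapping a component name to its atom coordinates.
    factor: 0 leaves the assembly intact; larger values push every
    component farther from the assembly center along its own direction.
    """
    all_atoms = [atom for atoms in components.values() for atom in atoms]
    origin = centroid(all_atoms)
    offsets = {}
    for name, atoms in components.items():
        center = centroid(atoms)
        offsets[name] = tuple(
            factor * (center[i] - origin[i]) for i in range(3))
    return offsets
```

Animating the factor from 0 up to a chosen maximum over successive frames would give a smooth opening of the assembly, and reversing it would reassemble the complex.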
While animations are effective for communicating research via talks and publications, physical models can be valuable in one-on-one and small group discussions of the architecture and mechanisms of molecular machinery. For example, a hand-size plastic model of 3 to 10 colored pieces joined by small magnets can be manipulated to explore the variety of possible assemblies of the 26S proteasome, a machine that degrades unneeded proteins and whose architecture is currently only partially understood. Physical models can be a more powerful communication device than computer graphics for many subtle reasons: use of hands to point and illustrate motions, easy exchange of who holds the model, facile rearrangement of pieces, and the ability of participants to face each other. We propose to purchase a 3-dimensional printer that prints plastic models and develop techniques to design multi-piece magnet-assembled models in our Chimera software. Initial tests have been done with virus and nuclear pore models using an existing 3-d printer on our campus. Print times of 10 hours or more and contention with other labs make a dedicated printer for molecular physical models important to furthering this work.