Tool: Foldseek (Similar Structures)

The Foldseek tool (also called Similar Structures) searches the PDB or AlphaFold Database for structures similar to a protein chain already open in ChimeraX. It facilitates exploring up to hundreds of single-chain protein structures by efficiently showing them in 3D as backbone traces, potentially with ligands, and in 2D as sequence alignments or reduced-dimensionality scatter plots based on backbone conformation.

The tool can be started from the Structure Analysis section of the Tools menu and manipulated like other panels. It is also implemented as the commands foldseek, similarstructures, and sequence search. See also: AlphaFold, ESMFold, Blast Protein

Search Setup
Similar Structures List
Options
Sequence Plot and Residue Attributes
Traces
Cluster Plot
Ligands
References

Search Setup

The query should be chosen from the pulldown menu of protein chains in structures already open in ChimeraX. Choices of database to search:

Choices of search method:

Clicking Search sends the input parameters and structure to the web service. When results are returned, a table of similar structures is shown in the tool window.

Similar Structures List

Searching with the Foldseek (Similar Structures) tool or the commands foldseek, sequence search, and similarstructures blast shows a table or list of hits in the tool window. Because this list is relatively large, the ChimeraX graphics and/or overall window may be resized; to avoid this, the tool can be undocked from the main window beforehand. See also the Tool windows start undocked setting in the Window preferences.

Columns in the Similar Structures table:

The % Close and % Cover values are only filled in automatically by the Foldseek search method, which uses and returns 3D coordinates. For the other search methods (which are based on sequence only), these columns can be filled in by using similarstructures fetchcoords to get α-carbon coordinates for the corresponding structures.

One or more hits can be chosen (highlighted) in the table by clicking and dragging with the left mouse button; Ctrl-click (or command-click if using a Mac) toggles whether a row is chosen.

Buttons across the bottom of the dialog:

The current contents of the tool are saved in ChimeraX sessions. In addition, search results are saved in ~/Downloads/ChimeraX under subdirectories Foldseek, MMseqs2, or BLAST, with filenames based on the query name, the database searched, and the search method, ending with the suffix .sms. These similar structures files use a JSON file format specific to ChimeraX and are listed in the File History for easy access. Simply opening an .sms file loads the set of results into the Similar Structures interface.
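Because the saved results are described as JSON, a file can presumably be inspected outside ChimeraX with any JSON reader. The following is a minimal sketch under that assumption; the function name summarize_sms is hypothetical, and no key names are assumed because the file schema is not documented here.

```python
import json

def summarize_sms(path):
    """Return {top-level key: type name of its value} for a JSON file.

    Assumes the file is plain JSON, as the text above states; nothing is
    assumed about which keys a similar structures file actually contains.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # Top level is not an object: report only its type.
        return {"<top level>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

Calling summarize_sms on the path of a saved results file lists its top-level entries without loading the results into ChimeraX.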

Doing another search or opening a file of previously saved results replaces the contents of the Similar Structures table, since (currently) the tool only allows showing one set of results at a time. Sets of results are assigned names such as fs1, fs2, mm1, mm2, bl1, and bl2 that can be used in analysis commands even if the corresponding results are not shown. However, the only way to get a set of results that is open but not shown in the table is to use the showTable false option of the search command. The names of currently open sets can be listed with the command similarstructures list.

Options

Clicking Options shows/hides the following settings:

Sequence Plot and Residue Attributes

Clicking the Sequences button displays a high-level (without amino acid codes) plot of the sequence alignment of all of the hits to the query. The plot gives an overview of which parts of the query sequence are matched by the hits, and the depth of coverage.

Each row of the image is one sequence, so 200 hits would produce an image 200 pixels tall. The columns of the image correspond to the residues of the query structure. Initially, pixels in the plot are colored as follows:

Different coloring can be applied with the similarstructures sequences command.

Hovering the mouse over the sequence plot shows pop-up labels to indicate the underlying row (hit structure) and column (query residue number). Left- or right-clicking the plot raises a context menu, in which some entries reflect the row or column position of the click:

The query residue attributes coverage, conservation, and lddt are assigned when the sequence plot is shown. Different coloring schemes to show these attributes can be applied to the query structure with Render by Attribute or the command color byattribute.
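The row-and-column layout of the sequence plot can be illustrated with a small sketch: one row per hit, one column per query residue, and a per-column count of how many hits align there. This is only an illustration of the layout and of "depth of coverage"; the function names are hypothetical, and the coverage attribute assigned by the tool may be defined differently (for example, as a fraction rather than a count).

```python
def alignment_grid(query_length, hit_columns):
    """Build one row per hit; True marks query positions the hit aligns to.

    hit_columns: list of iterables of 0-based query positions, one per hit.
    """
    grid = []
    for columns in hit_columns:
        row = [False] * query_length
        for c in columns:
            row[c] = True
        grid.append(row)
    return grid

def coverage_depth(grid):
    """Count, for each query position, how many hits align to it.

    Returns an empty list if there are no hits.
    """
    return [sum(column) for column in zip(*grid)]

# Two hits against a 5-residue query: one covers positions 0-2, the other 2-4.
grid = alignment_grid(5, [range(0, 3), range(2, 5)])
print(coverage_depth(grid))  # [1, 1, 2, 1, 1]
```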

Traces

Clicking the Traces button displays hit structures as “licorice” (spaghetti-like) ribbons superimposed on the query, for either all hits or just the chosen ones, as per the options. These traces are meant to give an overview of the variability of a large number of structures and their coverage of the query, and soft lighting is recommended to better reveal their shapes.

Only backbone α-carbons are included in the condensed structural information returned by a Foldseek search, not secondary structure information, so the ribbons do not vary in style to show helix and strand. MMseqs2 and Blast search results do not automatically include α-carbon coordinates, but clicking the Traces button will raise a dialog asking the user whether to fetch them, since it may take several minutes to do so.

All of the hit structure α-carbons are loaded as a single atomic model, one chain per structure, with chain ID set to the database ID of the structure. The residue types of the hit are retained, but the residues are renumbered according to the paired residues of the query structure.

The traces are initially displayed as follows:

  1. the ribbon is broken into segments where two consecutive aligned α-carbons are >5 Å apart
  2. ribbons are shown for ≥5 contiguous α-carbons within a segment and within 4 Å of the corresponding query α-carbons
  3. ribbons are shown for entire segments in which every α-carbon is within 10 Å of its counterpart
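The display rules above can be sketched in Python as follows. This is one possible reading, not the ChimeraX implementation: it assumes a residue is shown if either rule 2 or rule 3 applies, treats the distance cutoffs as inclusive, and takes the hit and query α-carbon coordinates as two equal-length lists in alignment order. All function and parameter names are hypothetical.

```python
from math import dist

def split_segments(hit_ca, break_dist=5.0):
    """Rule 1: split indices into segments, breaking wherever two
    consecutive aligned alpha-carbons are more than break_dist apart."""
    segments, current = [], []
    for i, xyz in enumerate(hit_ca):
        if current and dist(hit_ca[current[-1]], xyz) > break_dist:
            segments.append(current)
            current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments

def shown_residues(hit_ca, query_ca, break_dist=5.0,
                   close_dist=4.0, min_run=5, max_dist=10.0):
    """Return the set of aligned-residue indices that would be drawn."""
    shown = set()
    for seg in split_segments(hit_ca, break_dist):
        d = [dist(hit_ca[i], query_ca[i]) for i in seg]
        # Rule 3 (assumed reading): show the whole segment if every
        # alpha-carbon is within max_dist of its query counterpart.
        if all(x <= max_dist for x in d):
            shown.update(seg)
            continue
        # Rule 2 (assumed reading): show runs of at least min_run
        # contiguous alpha-carbons that are within close_dist.
        run = []
        for i, x in zip(seg, d):
            if x <= close_dist:
                run.append(i)
            else:
                if len(run) >= min_run:
                    shown.update(run)
                run = []
        if len(run) >= min_run:
            shown.update(run)
    return shown
```

The defaults mirror the 5 Å, 4 Å, 5-residue, and 10 Å values listed above.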

Different parameters can be applied with the similarstructures traces command.

Ctrl-double-clicking a trace shows a selection context menu for the corresponding hit, with entries including:

The trace ribbons can be shown/hidden or colored selectively with the menu above and the cluster plot context menu.

Cluster Plot

Clicking the Clusters button displays a scatter plot of the hits clustered by backbone conformation, for either all hits or just the chosen ones, as per the options. Each structure is represented by a circle labeled with its name. Clicking the button generates the plot as follows:

  1. the five residues in the query most conserved in the sequence alignment of hits are identified and their α-carbons used as the reference atoms
  2. for each hit, the α-carbon (x,y,z) coordinates of the corresponding five residues are concatenated to give a vector of length 15; hit structures without a residue in any of the five alignment columns are omitted
  3. the vector is projected to a point in two dimensions with UMAP (Uniform Manifold Approximation and Projection)
  4. the points in 2D are clustered by distance, and the clusters are assigned random colors
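Step 2 above can be sketched as follows: for each hit, the coordinates of the α-carbons aligned to the reference residues are concatenated, and hits missing any reference residue are dropped. The data layout (a dictionary from query residue number to coordinates for each hit) and the function name are assumptions made for illustration; the UMAP projection and the clustering of steps 3 and 4 are not shown.

```python
def conformation_vectors(hits, reference_residues):
    """Build one flat coordinate vector per hit.

    hits: dict mapping hit name -> dict mapping query residue number
          -> (x, y, z) of the hit alpha-carbon aligned to that residue.
    reference_residues: query residue numbers used as reference atoms.
    Hits lacking a residue in any reference column are omitted.
    """
    vectors = {}
    for name, ca_by_residue in hits.items():
        if any(r not in ca_by_residue for r in reference_residues):
            continue
        vector = []
        for r in reference_residues:
            vector.extend(ca_by_residue[r])
        vectors[name] = vector
    return vectors
```

With five reference residues each vector has length 15, matching the description above; the resulting vectors could then be passed to a UMAP implementation for projection to 2D.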

Different parameters such as a different number of reference residues can be specified with the similarstructures cluster command.

The plot can be zoomed by scrolling and translated with the middle mouse button or trackpad equivalent. Clicking the plot raises a context menu. Menu items acting on traces will generate them as needed (if not already present) as described above, and those referring to a specific hit only appear when the click is on a circle:

Ligands

Clicking the Ligands button copies the ligands, ions, and solvent molecules (nonpolymer residues) from the hits onto corresponding locations on the query structure, for either all hits or just the chosen ones, as per the options. A dialog will appear to ask the user whether the structures should be fetched, since it may take several minutes to do so.

Each ligand (ion, solvent) residue is evaluated for mapping onto the query structure, as follows:

  1. protein residues within 5 Å of the ligand are identified
  2. if at least half of those nearby protein residues are paired with query residues, the α-carbons of those pairs are fitted
  3. if the resulting RMSD is ≤3 Å, the ligand is copied to the corresponding position relative to the query structure

Different parameters can be applied with the similarstructures ligands command.
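The mapping criteria listed above can be sketched in Python. This is an illustration under stated assumptions, not the tool's code: "within 5 Å of the ligand" is taken to mean any protein atom within 5 Å of any ligand atom, the residue pairing is supplied as a dictionary from hit residue to query residue, and the α-carbon superposition that yields the RMSD is not shown. All names are hypothetical.

```python
from math import dist

def residues_near_ligand(ligand_atoms, protein_atoms, cutoff=5.0):
    """Step 1: ids of protein residues with any atom within cutoff
    of any ligand atom.

    ligand_atoms: list of (x, y, z).
    protein_atoms: dict mapping residue id -> list of (x, y, z).
    """
    return [res for res, atoms in protein_atoms.items()
            if any(dist(a, l) <= cutoff for a in atoms for l in ligand_atoms)]

def pairs_to_fit(nearby, pairing, min_fraction=0.5):
    """Step 2: return (hit residue, query residue) pairs to superimpose,
    or None if fewer than min_fraction of the nearby residues are paired."""
    pairs = [(res, pairing[res]) for res in nearby if res in pairing]
    if not nearby or len(pairs) < min_fraction * len(nearby):
        return None
    return pairs

def accept_fit(rmsd, max_rmsd=3.0):
    """Step 3: copy the ligand only if the fit RMSD is small enough."""
    return rmsd <= max_rmsd
```

The three defaults correspond to the 5 Å, one-half, and 3 Å values given in the list.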

The number of residues copied and their residue types are reported in the Log. Often thousands of water molecules, ions, and crystallization adjuvants are found; these can be hidden to give a better view of more interesting ligands, for example with the commands:

hide solvent
hide ions
hide :SO4

By default, the copied ligand, ion, and solvent residues are loaded as a single atomic model, in which the chain ID of a residue is generated from the PDB ID and chain ID of its source structure (e.g., 2cml_B). Pausing the cursor over a residue in the graphics window shows its name and chain ID in a pop-up balloon.

See also: AlphaFill

References

Foldseek. The Foldseek method is described in:

Fast and accurate protein structure search with Foldseek. van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M. Nat Biotechnol. 2024 Feb;42(2):243-246.

Many-against-Many sequence searching (MMseqs2). The MMseqs2 method is described in:

MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Steinegger M, Söding J. Nat Biotechnol. 2017 Nov;35(11):1026-1028.

MMseqs2 desktop and local web server app for fast, interactive sequence searches. Mirdita M, Steinegger M, Söding J. Bioinformatics. 2019 Aug 15;35(16):2856-2858.

Local distance difference test (LDDT). The LDDT score is described in:

lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Mariani V, Biasini M, Barbato A, Schwede T. Bioinformatics. 2013 Nov 1;29(21):2722-2728.

UCSF Resource for Biocomputing, Visualization, and Informatics / November 2024