Command: alphafold

AlphaFold is an artificial intelligence method for predicting the atomic structures of biomolecules and their complexes. The method is described in:

Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.
Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021.

The alphafold command:

- finds and retrieves existing models from the AlphaFold Database:

  AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Varadi M, Anyango S, Deshpande M, et al. Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444.

  The database contains models for protein sequences in UniProt (single chains only, not complexes); see the version option.
- runs new AlphaFold predictions on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2:

  ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.
- plots PAE values for AlphaFold structures and shows them with colored pseudobonds

AlphaFold-predicted structures vary in confidence levels (see coloring) and should be interpreted with caution. The alphafold command is also implemented as the tools AlphaFold and AlphaFold Error Plot. Several ChimeraX presentations and videos show modeling with AlphaFold and related analyses. See also: boltz, esmfold, blastprotein, modeller, swapaa, batch prediction commands

Getting Models from the AlphaFold Database

Usage: alphafold fetch uniprot-id [ alignTo  chain-spec [ trim  true | false ]] [ colorConfidence  true | false ] [ ignoreCache  true | false ] [ pae  true | false ] [ version  1 | 2 | 3 | 4 ]
Usage: alphafold match sequence [ search  true | false ] [ trim  true | false ] [ colorConfidence  true | false ] [ ignoreCache  true | false ] [ pae  true | false ]
Usage: alphafold search sequence [ matrix  similarity-matrix ] [ cutoff  evalue ] [ maxSequences  M ] [ version  1 | 2 | 3 | 4 ]

The alphafold fetch command retrieves the model (if available) for a specific UniProt name or accession number. It is equivalent to using the open command to fetch from alphafold. Examples:
alphafold fetch p29474
– OR – (equivalent)
open p29474 from alphafold
The alphafold match command retrieves models for sequences the same as or similar to those of experimentally determined protein structures already open in ChimeraX, or other sequences independent of structure. Giving the model number of an atomic structure already open in ChimeraX specifies all of its protein chains. Examples with sequence given as a chain-spec:
alphafold match #1
alphafold match #3/B,D trim false
Alternatively, the sequence can be given as any of the following:
- the sequence-spec of a sequence in the Sequence Viewer, in the form: alignment-ID:sequence-ID (details...)
- a UniProt name or accession number
- plain text pasted directly into the command line
For a specified structure chain, a model is obtained for its exact UniProt entry if available, otherwise the single top hit identified by K-mer search of the AlphaFold Database (details...). For each model with a corresponding structure chain from the alphafold match command or the alignTo option of alphafold fetch:
1. the chain ID of the predicted structure is made the same as the corresponding chain of the existing model
2. the predicted structure is superimposed onto the existing chain using matchmaker, and the following are reported in a table in the Log:
  - Chain – chain ID
  - UniProt Name and UniProt Id (accession number)
  - RMSD – Cα root-mean-square deviation between the predicted and experimental structures, over all residues of the latter
  - Length – number of residues in the predicted structure
  - Seen – number of residues with atomic coordinates in the experimental structure
  - % Id – percent identity in the sequence alignment generated by matchmaker for superposition; the number of positions with identical residues divided by the length of the shorter sequence
3. the following attributes are assigned to the residues of the predicted structure:
  - c_alpha_distance – Cα distance between corresponding positions of the predicted and existing chains after their superposition (step 2 above)
  - missing_structure – positions missing from the coordinates of the existing chain
  - same_sequence – positions with the same residue type as the existing chain
  These attributes can be used for coloring and other purposes.
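
The % Id and RMSD values reported in the table can be illustrated with a minimal Python sketch. It assumes the pairwise alignment is already available as two equal-length gapped strings and that the paired C-alpha coordinates have already been superimposed; the function names are hypothetical, and this is not the ChimeraX implementation.

```python
import math

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical aligned positions divided by the length of the shorter
    (ungapped) sequence, expressed as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != gap)
    shorter = min(len(aligned_a.replace(gap, "")),
                  len(aligned_b.replace(gap, "")))
    return 100.0 * identical / shorter if shorter else 0.0

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired C-alpha coordinates,
    given as (x, y, z) tuples that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

For example, percent_identity("ACDE-", "ACDFG") counts three identical positions over a shorter sequence of four residues.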
The alphafold search command uses a BLAST web service hosted by the UCSF RBVI to search the AlphaFold Database. It differs from alphafold match in that it uses BLAST instead of fast (but low-sensitivity) K-mer searching, accepts only a single chain or sequence as input, and returns a list of hits for the user to inspect, rather than fetching the single top hit per chain automatically. The query sequence can be given as any of the following:
- a chain-spec corresponding to a single chain in an atomic structure open in ChimeraX
- the sequence-spec of a sequence in the Sequence Viewer
- a UniProt name or accession number
- plain text pasted directly into the command line
The matrix option indicates which amino acid similarity-matrix to use for scoring the hits (uppercase or lowercase can be used): BLOSUM45, BLOSUM50, BLOSUM62 (default), BLOSUM80, BLOSUM90, PAM30, PAM70, PAM250, or IDENTITY. The cutoff evalue is the maximum or least significant expectation value needed to qualify as a hit (default 1e-3). Results can also be limited with the maxSequences option (default 100); this is the maximum number of unique sequences to return.
When results are returned, the hits are listed in a Blast Protein window. Double-clicking a hit uses alphafold fetch to retrieve the model, or multiple chosen hits can be retrieved at once by using the results panel context menu or Load Structures button (details...).

Options

alignTo chain-spec
Superimpose the predicted structure from alphafold fetch onto a single chain in an already-open structure, and make its chain ID the same as that chain's. See also the trim option.

colorConfidence true | false
Whether to color the predicted structures by the pLDDT confidence measure in the B-factor field (default true):

- 100 to 90 – high accuracy expected
- 90 to 70 – backbone expected to be modeled well
- 70 to 50 – low confidence, caution
- 50 to 0 – should not be interpreted, may be disordered

In other words, the coloring is the same as that from the command:

color bfactor palette alphafold

The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:

key red:low orange: yellow: cornflowerblue: blue:high [other-key-options]
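
The confidence bands above can be expressed as a small Python lookup, shown here only as a sketch. The function name is hypothetical, and assigning the boundary values 90, 70, and 50 to the higher band is an assumption, since the ranges as listed share their endpoints.

```python
def plddt_category(plddt):
    """Map a pLDDT value (0-100) to the interpretation listed for its band.

    Assumption: boundary values (90, 70, 50) belong to the higher band.
    """
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt >= 90:
        return "high accuracy expected"
    if plddt >= 70:
        return "backbone expected to be modeled well"
    if plddt >= 50:
        return "low confidence, caution"
    return "should not be interpreted, may be disordered"
```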

directory results-folder
For alphafold predict, specify a results-folder pathname (name and location) or the word browse to specify it interactively in a file browser window. The directory or folder does not need to exist already, as it will be created by the command. The pathname can include [N] to indicate substitution with the smallest positive integer that makes a new directory (default ~/Downloads/ChimeraX/AlphaFold/prediction_[N]). If the specified pathname does not include [N] but a directory of that name and location already exists and contains a results.zip file, _[N] will be appended automatically to avoid overwriting the existing directory.
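
The [N] substitution rule can be sketched in Python as follows. This is an illustration of the behavior described above, not the ChimeraX code; the function name and the injectable exists parameter (used so the logic can be tried without touching the file system) are assumptions.

```python
import os

def resolve_results_dir(pattern, exists=os.path.exists):
    """Return the directory pathname that would be used for a prediction.

    [N] is replaced by the smallest positive integer that yields a path
    which does not exist yet. A pattern without [N] is used as given,
    unless that directory already holds a results.zip file, in which
    case _[N] is appended first.
    """
    path = os.path.expanduser(pattern)
    if "[N]" not in path:
        if exists(os.path.join(path, "results.zip")):
            path += "_[N]"
        else:
            return path
    n = 1
    while exists(path.replace("[N]", str(n))):
        n += 1
    return path.replace("[N]", str(n))
```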

ignoreCache true | false
The fetched data files (models and their PAE values) are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache or ignoreCache is set to true, the file will be fetched and cached.

search true | false
When fetching models with alphafold match, whether to search the database for the most similar sequence if the UniProt accession number for a chain is not provided in the experimental structure's input file, or is provided but not found in the AlphaFold Database (true, default). A fast but low-sensitivity (requiring high % identity) K-mer search of the database is performed. The closest sequence match for which a model is available will be retrieved. With search false, only the experimental structure's input file will be used as a potential source of UniProt accession numbers. When present, these are given in DBREF records in PDB format and in struct_ref and struct_ref_seq tables in mmCIF.

minimize true | false
This option allows skipping energy-minimization of the result from alphafold predict, for faster job completion and/or to avoid failures that may occur during minimization.

templates true | false
Whether a calculation run with alphafold predict should use known structures from the PDB as templates. AlphaFold can use up to four structures as templates; when this option is true, ColabFold will search the PDB sequences for similarity to the target and report in the Colab log which entries (if any) are used as templates.

trim true | false
Whether to trim a predicted protein structure to the same residue range as the corresponding experimental structure given with the alphafold match command or the alignTo option of alphafold fetch. With trim true (default):

Predictions with UniProt identifier determined by alphafold match from the experimental structure's input file will be trimmed to the same residue ranges as used in the experiment. These ranges are given in DBREF records in PDB format and in struct_ref and struct_ref_seq tables in mmCIF.
Predictions retrieved with alphafold fetch or found by alphafold match searching for similar sequences in the AlphaFold Database will be trimmed to start and end with the first and last aligned positions in the sequence alignment calculated by matchmaker as part of the superposition step.

Using trim false indicates retaining the full-length models for the UniProt sequences, which could be longer.
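
For the second case (trimming to the first and last aligned positions), the residue range to keep can be derived from a gapped pairwise alignment. The following Python sketch assumes the alignment is given as two equal-length strings with "-" for gaps; the function name and the 0-based index convention are hypothetical.

```python
def trim_range(aligned_pred, aligned_exp, gap="-"):
    """Return (first, last) 0-based inclusive indices into the ungapped
    predicted sequence, spanning the first through last alignment columns
    where both sequences have a residue."""
    pred_index = -1          # index of the current residue in the prediction
    first = last = None
    for p, e in zip(aligned_pred, aligned_exp):
        if p != gap:
            pred_index += 1
        if p != gap and e != gap:
            if first is None:
                first = pred_index
            last = pred_index
    if first is None:
        raise ValueError("the alignment has no aligned positions")
    return first, last
```

With the predicted sequence MKTAYIAKQR aligned to an experimental fragment as --TAYIA---, the range returned is (2, 6), so the two N-terminal and three C-terminal residues of the prediction would be removed.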

version 1 | 2 | 3 | 4
The AlphaFold Database contains single-chain models for protein sequences in UniProt. This option specifies which version of the database to use with alphafold fetch, alphafold pae, or alphafold search (as well as blastprotein with database alphafold):

version 1 (Jul 2021): ~360,000 sequences, reference proteomes of 21 species including Homo sapiens; used by ChimeraX 1.3, which does not have a version option
version 2 (Dec 2021 and Jan 2022 releases combined): ~1 million sequences, v1 + most of the manually curated entries (SwissProt) + sequences relevant to neglected tropical disease or antimicrobial resistance; default in ChimeraX 1.4
version 3 (Jul 2022): >200 million sequences
version 4 (Nov 2022): bugfix of version 3, updating the coordinates of ~4% of the entries; default in ChimeraX 1.5 and later

The alphafold match command always uses version 4 and does not have this option.

Running an AlphaFold Prediction

The alphafold predict command runs a calculation on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2. Users should cite:

ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.

For monomer prediction:

Usage: alphafold predict sequence [ minimize true | false ] [ templates true | false ] [ directory results-folder ]

The protein sequence to predict can be given as any of the following:

1. a chain-spec corresponding to a single chain in an atomic structure open in ChimeraX
2. the sequence-spec of a sequence in the Sequence Viewer, in the form: alignment-ID:sequence-ID (details...)
3. a UniProt name or accession number
4. plain text pasted directly into the command line

These methods specify the entire sequence only, although a truncated version could be pasted.

For multimer (protein complex) prediction:

Usage: alphafold predict chain-spec [ minimize true | false ] [ templates true | false ] [ directory results-folder ]
– or –
Usage: alphafold predict sequence1,sequence2[,sequence3...,sequenceN] [ minimize true | false ] [ templates true | false ] [ directory results-folder ]

The sequences of two or more protein chains can be specified either collectively as a chain-spec (for atomic-structure chains already open in ChimeraX), or individually within a comma-separated list using any combination of specifier types #2-4 listed above. The comma-separated list should not contain any spaces. If the same protein chain occurs multiple times in the complex, its sequence should be repeated that number of times. For example, to predict a homodimer, the same sequence (or its specifier) would need to be given twice. Prediction may only be feasible for smaller complexes (details...).
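
As an illustration of the comma-separated form, the following Python sketch assembles the command text from plain-text sequences and copy numbers. The helper name is hypothetical; it only formats a string following the usage shown above and does not run anything in ChimeraX.

```python
def predict_command(chains):
    """Build an 'alphafold predict' command from (sequence, copies) pairs.

    Whitespace inside each sequence is removed because the comma-separated
    list must not contain spaces, and each sequence is repeated once per
    copy of that chain in the complex.
    """
    parts = []
    for sequence, copies in chains:
        cleaned = "".join(sequence.split())
        parts.extend([cleaned] * copies)
    return "alphafold predict " + ",".join(parts)
```

A homodimer is requested by giving a copy number of 2, which repeats the sequence twice in the list.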

A warning will appear saying that this Colab notebook is from github (was not authored by Google), with a button to click to run anyway. Users will need to have a Google account and to sign into it via a browser. Once that is done, the sign-in may be remembered depending on the user's browser settings; it is not kept in the ChimeraX preferences. See the example video for an explanation of the images/plots from ColabFold that appear in the Colab log and where to find downloaded files.

The model will be opened automatically and colored by confidence value. The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis (details...).

Caveats

Results may be lost if the local computer goes to sleep. Google intends Colab to be for interactive use. Even if the Colab job completes, the results may fail to download to the local computer if it has gone to sleep. It is recommended to turn off the option to enter sleep mode (meant to conserve power after some amount of idle time) before running a prediction.
The process includes installing various software packages on a virtual machine, searching sequence databases, generating a multiple sequence alignment, predicting atomic coordinates, and optionally, energy-minimizing the best structure. In addition, predicting a multimer (complex) structure may take longer than predicting the structure of a monomer with the same total number of residues. The free version of Colab limits jobs to 12 hours and may terminate them at shorter times at Google's discretion (see the FAQ). Those who want to run longer and/or more frequent calculations may need to sign up for one of the paid Colab plans.
Each chain must contain at least 16 residues. Shorter sequences are not accepted because they cannot be used to generate a reliable multiple sequence alignment.
Total sequence length cannot be very large. AlphaFold runs out of graphics memory for long sequences (~1200 amino acids on old Google Colab GPUs with 16 GB memory). Multimer predictions face the same limit on the total number of residues, so only smaller complexes can be predicted. As mentioned above, paid Colab plans provide more computational resources than the free plan. Structures with up to 3000 amino acids can be predicted using an Nvidia A100 GPU on Google Colab, costing about $1.50 for a 2000-residue prediction (May 2023); this video explains how.
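
The two sequence-length caveats can be checked before submitting a job. The Python sketch below is only an illustration: the function name is hypothetical, and the default total-length limit of 1200 is the approximate figure quoted above for older 16 GB GPUs, not a hard limit enforced by the software.

```python
MIN_CHAIN_LENGTH = 16  # minimum residues per chain, as stated above

def check_prediction_input(sequences, approx_max_total=1200):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for i, seq in enumerate(sequences, start=1):
        if len(seq) < MIN_CHAIN_LENGTH:
            problems.append(
                f"chain {i} has {len(seq)} residues; "
                f"at least {MIN_CHAIN_LENGTH} are required")
    total = sum(len(seq) for seq in sequences)
    if total > approx_max_total:
        problems.append(
            f"total length {total} exceeds the approximate limit "
            f"of {approx_max_total} residues")
    return problems
```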

AlphaFold Predicted Aligned Error (PAE)

Besides the per-residue pLDDT confidence measure, AlphaFold gives for each pair of structural entities (X,Y) the expected position error at entity X if the predicted and true structures were aligned on Y. Structural entities include standard biopolymer residues as well as the individual atoms of other types of residues: ligands, ions, glycans, and post-translationally modified residues. Only AlphaFold 3 (not earlier versions) generates predictions that include these other types of residues. The “predicted aligned error” or PAE values can be shown as a 2D plot using the command alphafold pae (details below), the AlphaFold Error Plot tool, alphafold fetch or alphafold match with the option pae true, or the open command. PAE and other pairwise metrics associated with ModelArchive entries can also be plotted (see modelcif pae). See also the AlphaFold Error Estimates example and video.

Usage: alphafold pae [ model-spec ] [ uniprotId uniprot | file filename ] [ ignoreCache true | false ] [ palette palette ] [ range low,high | full ] [ plot true | false ] [ colorDomains true | false ] [ minSize M ] [ connectMaxPae N ] [ cluster resolution ] [ dividerLines true | false ] [ version 1 | 2 | 3 | 4 ]

With alphafold pae, the matrix of PAE values can be:

- fetched from the AlphaFold Database with the uniprotId option, where uniprot is the UniProt name or accession number of an entry in the database
- read from a json, pkl, npy (from Chai-1), or npz (from Boltz-1) file specified with the file option, where filename is generally a pathname to a local file, either absolute or relative to the current working directory as reported by pwd. Substituting the word browse for filename brings up a file browser window for choosing the name and location interactively.

Alternatively, if the model structure was opened using alphafold fetch or alphafold match with the option pae true, the PAE matrix is already present and neither of the above is required.

The corresponding AlphaFold structure (already open) can be given as a model-spec to associate it with the plot. This association allows coloring by domain as described below, and for selections on the plot to highlight the corresponding parts of the structure.

By default, the PAE plot is drawn when domain coloring is not done (plot is default true when colorDomains is false) and vice versa.

Setting colorDomains to true clusters the entities into coherent domains (sets with relatively low pairwise PAE values) and uses randomly chosen colors to distinguish these domains in the structure. The entities are assigned an integer domain identifier (starting with 1) as an attribute named pae_domain that can be used to specify them in commands (for example, to recolor or select specific domains). Entities not grouped into any domain are assigned a pae_domain value of None. The clustering uses the NetworkX greedy_modularity_communities algorithm with parameters:

minSize (default 10) – minimum number of entities allowed in a domain
connectMaxPae (default 5.0 Å) – the maximum PAE value allowed between entities for them to be clustered into the same domain. Larger values give larger domains and generally increase the time to compute the clustering, which is ~5 seconds for 1000 residues when the default of 5.0 is used.
cluster (default 0.5, typical range 0.5–5.0) – graph resolution; larger values give smaller domains
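
The roles of connectMaxPae and minSize can be illustrated with a simplified Python sketch. Note that it deliberately substitutes plain connected components for the greedy modularity clustering named above, so the cluster resolution parameter has no counterpart here. Connecting two entities only when the PAE is within the cutoff in both directions is also an assumption, as is the function name.

```python
def pae_domains(pae, connect_max_pae=5.0, min_size=10):
    """Group entities into domains from a square PAE matrix (list of lists).

    Simplified stand-in: connected components of the graph whose edges join
    entity pairs with PAE <= connect_max_pae in both directions. Returns a
    list of domain identifiers (1, 2, ...) or None for ungrouped entities.
    """
    n = len(pae)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if max(pae[i][j], pae[j][i]) <= connect_max_pae:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    domains = [None] * n
    next_id = 1
    # Number the domains from largest to smallest; skip those below min_size.
    for members in sorted(groups.values(), key=len, reverse=True):
        if len(members) >= min_size:
            for i in members:
                domains[i] = next_id
            next_id += 1
    return domains
```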

The dividerLines option (default true) indicates whether, for multimer predictions, to draw lines on the plot demarcating the end of one chain and the start of another. The lines may obscure a few chain-terminal residues in the plot, and dividerLines false can be used if this is problematic. For predictions that include nonstandard residues and/or covalent modifications, divider lines also segregate the entire set of such entities from the biopolymer chain(s).

The default palette for coloring the PAE plot is pae. Another palette with a value range suitable for PAE plots is paegreen.

Although these palettes include value-color pairs, it may be helpful to give a value range if a colors-only palette is used instead (default 0,30). A range can also be used to override the values in a value-color palette, instead spacing the colors evenly across the specified range.
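
Spacing colors evenly across a range amounts to simple linear interpolation of the values. A minimal Python sketch, with a hypothetical function name and the default range of 0,30 taken from the text above:

```python
def spread_colors(colors, low=0.0, high=30.0):
    """Pair each color with a value, spacing the colors evenly from low to high."""
    if len(colors) < 2:
        raise ValueError("at least two colors are needed to span a range")
    step = (high - low) / (len(colors) - 1)
    return [(low + i * step, color) for i, color in enumerate(colors)]
```

With three colors and the default range, the middle color is assigned the value 15.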

The plot window has a context menu with options for recoloring the plot and associated structure, hiding/showing the divider lines, and saving the plot as an image, as well as buttons for recoloring the structure:

Color PAE Domains applies coloring by PAE cluster as described above.
Color pLDDT returns the structure to the default confidence coloring.

The Color Key graphical interface or a command can be used to draw (in the main graphics window) a color key for the PAE plot. For example, to make a color key that matches the pae or paegreen scheme, respectively:

key pae :0 : : :15 : : :30 showTool true
key paegreen :0 : : :15 : : :30 showTool true

A title for the color key (e.g., “Predicted Aligned Error (Å)”) would need to be created separately with 2dlabels.

Pseudobonds Colored by PAE

AlphaFold “predicted aligned error” (PAE) values give for each pair of structural entities (X,Y) the expected position error at entity X if the predicted and true structures were aligned on Y. Structural entities include standard biopolymer residues as well as the individual atoms of other types of residues: ligands, ions, glycans, and post-translationally modified residues. Only AlphaFold 3 (not earlier versions) generates predictions that include these other types of residues. PAE values can be shown with colored pseudobonds in the predicted structure:

Usage: alphafold contacts atom-spec1 [ toAtoms atom-spec2 ] [ flip true | false ] [ distance d ] [ maxPae max-error ] [ palette palette ] [ range low,high | full ] [ radius r ] [ dashes N ] [ name model-name ] [ replace true | false ] [ outputFile pae-file ]

See the AlphaFold Contacts example and video. See also: size, style, contacts, crosslinks, rename

A PAE plot containing the specified parts must already be shown. The PAE matrix is not symmetrical. The first specification atom-spec1 gives the aligned entities, whereas toAtoms atom-spec2 gives those whose error values are reported, except that using flip true swaps the meaning of atom-spec1 and atom-spec2. If one set of entities is higher-confidence (higher in pLDDT, lower in PAE) than the other, it is usually best to specify them as the aligned set so that the coloring will show the error values of the lower-confidence set.

Omitting the toAtoms option defines atom-spec2 as everything covered by the PAE plot except for what is included in atom-spec1; however, if toAtoms is omitted and atom-spec1 includes everything in the plot, atom-spec2 will also be defined as everything in the plot.

The distance option limits the number of pseudobonds by only drawing them between pairs of entities with any distance ≤ d Å (default 3.0). For standard (biopolymer) residues, these pseudobonds are drawn to the principal atom (CA or P) regardless of which atoms were within the distance cutoff. The maxPae option can be used to further limit the pseudobonds to only those representing PAE values ≤ max-error (no default value; if the option is omitted, there is no restriction by PAE).
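
The combined effect of the distance and maxPae options is a two-stage filter, sketched below in Python. The input format (precomputed index pairs with their closest distance) and the matrix indexing convention pae[aligned][scored] are assumptions made for illustration; the function name is hypothetical.

```python
def pae_contacts(pairs, pae, max_distance=3.0, max_pae=None):
    """Select entity pairs for which a pseudobond would be drawn.

    pairs: iterable of (aligned_index, scored_index, distance) tuples.
    pae:   square matrix, assumed to be indexed as pae[aligned][scored].
    Returns (aligned_index, scored_index, pae_value) for pairs within the
    distance cutoff and, if max_pae is given, within the PAE cutoff.
    """
    kept = []
    for aligned, scored, distance in pairs:
        if distance > max_distance:
            continue
        error = pae[aligned][scored]
        if max_pae is not None and error > max_pae:
            continue
        kept.append((aligned, scored, error))
    return kept
```

Because the matrix is not symmetrical, swapping the two indices of a pair (as flip true does for the two specifications) can change the value reported for it.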

The default palette for coloring the pseudobonds by PAE value is paecontacts.

Although this palette includes value-color pairs, it may be helpful to give a value range if a colors-only palette is used instead (default 0,30). A range can also be used to override the values in a value-color palette, instead spacing the colors evenly across the specified range.

The pseudobond stick radius (default 0.2 Å) and number of dashes (default 1, meaning a solid stick) can also be specified.

The name option allows specifying the pseudobond model-name (default PAE Contacts). If a model by that name already exists, any pre-existing pseudobonds will be removed from that model and replaced by the new ones (replace true, default) unless replace false is used.

The outputFile option allows saving a list of the entity pairs (those meeting the distance criterion) and their PAE values to a plain text file. The pae-file argument is the output file pathname, enclosed in quotation marks if it includes spaces, or the word browse to specify it interactively in a file browser window.

Examples:

alphafold contacts #1
alphafold contacts /A to /B distance 8
alphafold contacts sel palette blue:red range 1,5

The following would select all pseudobonds and label them with the residue names and numbers of the entities that they connect:

sel pbonds
label sel pseudobonds text "{0.atoms[0].residue.name} {0.atoms[0].residue.number} to {0.atoms[1].residue.name} {0.atoms[1].residue.number}"

Batch Prediction Commands

The following commands facilitate batch predictions by advanced users who can run ColabFold directly (outside of ChimeraX) from the Linux shell. ColabFold runs on Linux computers with Nvidia graphics. See also: alphafold predict, predicting dimers with AlphaFold

alphafold monomers – estimate ColabFold runtime for monomer predictions given a set of sequences (does not run the predictions) or display the results of previous predictions
alphafold dimers – estimate ColabFold runtime for dimer predictions given a set of sequences (does not run the predictions)
alphafold interfaces – examine the PAE scores of previous dimer predictions to determine which pairs are actually predicted to bind

• alphafold monomers ( sequence-list | open results-folder ) [ maxLength L ] [ recycles N ] [ models NN-list ] [ outputFasta filename ] [ outputJson filename ] [ outputYaml true | false ] [ msaDirectory directory ] [ listSequences true | false ]

Given a sequence-list, alphafold monomers estimates the time it would take ColabFold to run monomer predictions for the sequences on an Nvidia 3090 GPU. It also reports in the Log the colabfold_batch command that would be used to run the predictions from the Linux shell, but it does not actually run the predictions. Alternatively, if the monomer predictions have been run already, alphafold monomers can be used with the open option to display the results in ChimeraX.
The sequence-list can be given as either of the following:

a file in FASTA format, specified by pathname or the word browse to specify it interactively in a file browser window
a chain-spec for one or more chains in atomic structures open in ChimeraX
Only protein sequences are considered, and duplicates are eliminated. The maxLength option excludes sequences with more than L residues (no default, i.e. unlimited length). The recycles option controls the number of iterations for each prediction (default 3), with smaller values giving shorter run times. The models option controls which of the five neural networks (called “models” in AlphaFold) to use, where NN-list is a comma-separated list of integers in the range 1-5 (default 1,2,3,4,5). Normally all five are used, producing five predicted structures for each sequence, but using fewer neural networks will decrease the run time. The sequences for which predictions would be run (filtered by uniqueness and maximum length, if any) can be written to a new FASTA file using the outputFasta option as a convenience for running ColabFold, and/or to a JSON file using the outputJson option as a convenience for running AlphaFold 3 (see AlphaFold 3 JSON specification). Substituting the word browse for filename brings up a file browser window for choosing the name and location interactively.
The outputYaml option (default false) specifies whether to write yaml input files for Boltz. If true, a separate yaml file is created for each monomer with name based on the monomer sequence name. If msaDirectory is specified, the yaml output will specify a .csv multiple sequence alignment (MSA) file for each monomer in that directory. The directory and MSA files need not exist and can be created later by MSA computation software before the Boltz predictions are run. If msaDirectory is not specified, no MSA files are specified in the Boltz yaml input.
The listSequences option (default false) specifies whether to log descriptions of all of the monomers.
Alternatively, if the monomer predictions have been run already, using the open option to give a results-folder (specified by pathname or the word browse to specify it interactively in a file browser window) opens the resulting structures and displays them in a convenient fashion: tiled, labeled, and colored by pLDDT.
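
The filtering by uniqueness and maximum length, and the form of the models list, can be sketched in Python as follows. The function names are hypothetical, the input is assumed to contain protein sequences only, and keeping the first occurrence of a duplicated sequence is an assumption.

```python
def select_sequences(named_sequences, max_length=None):
    """Keep the first occurrence of each distinct sequence, dropping any
    longer than max_length (None means unlimited length).

    named_sequences: list of (name, sequence) tuples, protein only.
    """
    seen = set()
    kept = []
    for name, sequence in named_sequences:
        if sequence in seen:
            continue
        if max_length is not None and len(sequence) > max_length:
            continue
        seen.add(sequence)
        kept.append((name, sequence))
    return kept

def parse_models(nn_list="1,2,3,4,5"):
    """Parse a comma-separated list of neural network numbers in the range 1-5."""
    numbers = [int(item) for item in nn_list.split(",")]
    if any(n < 1 or n > 5 for n in numbers):
        raise ValueError("network numbers must be in the range 1-5")
    return numbers
```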

• alphafold dimers sequence-list [ maxSpacing K ] [ withSequences sequence-list2 ] [ homodimers true | false ] [ maxLength L ] [ recycles N ] [ models NN-list ] [ outputFasta filename ] [ outputJson filename ] [ outputYaml true | false ] [ msaDirectory directory ] [ listSequences true | false ]

Given a sequence-list, alphafold dimers estimates the time it would take ColabFold to run dimer predictions on an Nvidia 3090 GPU for all pairs of the sequences. The command also reports in the Log the colabfold_batch command that would be used to run the predictions from the Linux shell, but it does not actually run the predictions. The maxSpacing option can be used to restrict the pairs to those no more than K sequences apart in the input list (for example, K = 1 indicates that only sequences immediately adjacent in the list can be paired). Typically this is used with a list that is in genome order, to pair only sequences close together in the genome; default is no restriction on spacing. If withSequences is used to give a second list of sequences, only pairs with one sequence in the first list and the other in the second list will be considered (cannot be used together with maxSpacing).
Each sequence list can be given as either of the following:

a file in FASTA format, specified by pathname or the word browse to specify it interactively in a file browser window
a chain-spec for one or more chains in atomic structures open in ChimeraX
Only protein sequences are considered, and duplicates are eliminated, except that the homodimers option indicates whether to predict structures for pairs of chains with identical sequences (default true). The sequence pairs for which predictions would be run can be written to a new FASTA file using the outputFasta option (in the arrangement expected by ColabFold: a separate entry for each dimer, in which the two sequences are separated by a colon), and/or to a JSON file using the outputJson option as a convenience for running AlphaFold 3 (see AlphaFold 3 JSON specification). Substituting the word browse for filename brings up a file browser window for choosing the name and location interactively.
The outputYaml option (default false) specifies whether to write yaml input files for Boltz. If true, a separate yaml file is created for each dimer with name based on the dimer sequence names. If msaDirectory is specified, the yaml output will specify multiple sequence alignment (MSA) files in that directory with file names of the form {seq1-name}_{seq2-name}_0.csv and {seq1-name}_{seq2-name}_1.csv. The directory and MSA files need not exist and can be created later by MSA computation software before the Boltz predictions are run. If msaDirectory is not specified, no MSA files are specified in the Boltz yaml input.
The maxLength, recycles, and models options are the same as described for alphafold monomers, except that maxLength applies to the total combined length of the two sequences. The listSequences option (default false) specifies whether to log descriptions of all of the dimers.
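
How the maxSpacing, withSequences, and homodimers options shape the set of pairs can be sketched in Python. This illustrates the rules described above rather than the ChimeraX code; the function names are hypothetical, and the FASTA header format (the two names joined by a period) is an assumption.

```python
def dimer_pairs(seqs, with_seqs=None, max_spacing=None, homodimers=True):
    """Enumerate sequence pairs for dimer prediction.

    seqs, with_seqs: lists of (name, sequence) tuples.
    """
    if with_seqs is not None and max_spacing is not None:
        raise ValueError("withSequences cannot be combined with maxSpacing")
    if with_seqs is not None:
        # One member from each list.
        candidates = [(a, b) for a in seqs for b in with_seqs]
    else:
        # All pairs within one list, optionally limited by list spacing.
        candidates = [(seqs[i], seqs[j])
                      for i in range(len(seqs))
                      for j in range(i, len(seqs))
                      if max_spacing is None or j - i <= max_spacing]
    seen = set()
    pairs = []
    for a, b in candidates:
        if a[1] == b[1] and not homodimers:
            continue
        key = frozenset((a[1], b[1]))  # unordered pair of sequences
        if key in seen:
            continue
        seen.add(key)
        pairs.append((a, b))
    return pairs

def dimer_fasta(pairs):
    """Write one FASTA entry per dimer, the two sequences joined by a colon."""
    return "".join(f">{a[0]}.{b[0]}\n{a[1]}:{b[1]}\n" for a, b in pairs)
```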

• alphafold interfaces dimer-results-folder [ distance d ] [ maxPae max-error ] [ minConfPairs N ] [ open true | false ] [ resultsFile csv-output-file ] [ shortNames true | false ]

alphafold interfaces only works for biopolymer chain dimer predictions, primarily protein-protein; it will not work on structures that include nonstandard residues such as ligands or covalent modifications such as glycosylations because those have per-atom PAE values. We may fix this limitation in the future if Google allows AlphaFold 3 to be used for large numbers of predictions.

When AlphaFold (ColabFold) is given two sequences to predict as a complex, it will always place the two chains near each other even if there are no favorable binding interactions. The AlphaFold predicted aligned error (PAE) score indicates confidence in the predicted residue-residue interactions, and thus whether or not the two chains are really predicted to bind one another.
The alphafold interfaces command considers all PDB files (.pdb suffix) in the dimer-results-folder directory, which can be specified by its pathname or the word browse to specify it interactively in a file browser window. Each PDB file should contain a ColabFold dimer prediction and should be named according to the ColabFold convention, which includes the sequence names and the rank and network used for the prediction. The directory will also include a PAE file (suffix .json) for each PDB file. For each predicted dimer:

the PDB file is opened and pairs of residues across the chain-chain interface with any atoms ≤ distance d apart (default 4.0 Å) are considered
of these, residue pairs with PAE ≤ maxPae max-error (default 5.0 Å) are counted. Both of the PAE values for a given pair of residues are considered (the PAE matrix is not symmetrical, meaning that res1-res2 and res2-res1 values are different), and the lower one is compared to the maxPae setting.
if at least minConfPairs N (default 10) of the interface residue pairs meet both the distance and PAE criteria, the dimer is included in the list of possible binders
the structure is closed
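
The counting criteria in the steps above can be sketched in Python. The sketch assumes the interface residue pairs within the distance cutoff have already been found and are given as index pairs into a square PAE matrix; the function names are hypothetical.

```python
def count_confident_pairs(contacts, pae, max_pae=5.0):
    """Count interface residue pairs whose lower PAE value is <= max_pae.

    contacts: iterable of (i, j) residue index pairs across the interface,
              already filtered by the distance cutoff.
    pae:      square, non-symmetrical PAE matrix (list of lists).
    """
    return sum(1 for i, j in contacts
               if min(pae[i][j], pae[j][i]) <= max_pae)

def is_possible_binder(contacts, pae, max_pae=5.0, min_conf_pairs=10):
    """True if enough interface pairs meet both the distance and PAE criteria."""
    return count_confident_pairs(contacts, pae, max_pae) >= min_conf_pairs
```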
Processing all of the dimer predictions can take a few minutes, and progress is reported in the ChimeraX status line. After the calculation completes, a table listing the possible binders is written to the Log, including the sequence names, number of high-confidence residue pairs, and how many residues in each chain are part of a high-confidence pair. The open option allows opening the highest-ranked model structure for each possible-binder prediction (default false).
Results are also automatically written to a CSV file (filename specified with the resultsFile option, default interfaces.csv). Contents include the numbers of high-confidence residue pairs found for each PDB file and their residue numbers. If the alphafold interfaces command is run a second time, it will look for this file and use it instead of reevaluating the predictions. To rerun the evaluation with different criteria, the interfaces.csv file should be deleted or renamed beforehand.
By default, ColabFold generates five structures for each pair of sequences. The table has one row for each pair of sequences for which at least one “possible binder” structure was found. The Models column lists how many of the five model structures met the binding criteria. In each row of the table, the sequence names are a link to open the highest-ranked model structure for that sequence pair. The Open best link below the table opens the highest-ranked structure for every listed sequence pair (same as the open command option). The links below the table to Hide and Show disordered loops allow hiding or showing the ribbon for residues with pLDDT ≤ 50.
The sequence names are parsed from the PDB filenames, in which they are separated by a “.” character and terminated by “_unrelaxed.” The sequence names may be long if the ColabFold input used long names. The names can be shortened in the possible-binders table using the shortNames option (default true). That option considers words in the sequence names separated by underscore characters (“_”) and looks for a unique word in each sequence that it will use as the short name. If no unique word is found, it will use the full name. It is very helpful to make the FASTA file descriptions in the input to ColabFold as concise as possible, since they will be used as the sequence names in the output. Shorter names make it easier to navigate the hundreds of files that ColabFold produces.
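
The name-shortening rule can be sketched in Python as follows. Treating a word as unique when it occurs in only one of the sequence names, and choosing the first such word, are assumptions about details not stated above; the function name is hypothetical.

```python
from collections import Counter

def short_names(full_names):
    """Map each full sequence name to a short name.

    Names are split into words at underscores; the first word found in no
    other name is used, and the full name is kept if there is none.
    """
    word_lists = [name.split("_") for name in full_names]
    # Count how many names contain each word (once per name).
    counts = Counter(word for words in word_lists for word in set(words))
    result = {}
    for name, words in zip(full_names, word_lists):
        unique = [word for word in words if word and counts[word] == 1]
        result[name] = unique[0] if unique else name
    return result
```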

UCSF Resource for Biocomputing, Visualization, and Informatics / September 2025