pdb: Read and write PDB files

Directly calling the routines below would typically be about your third choice for reading/writing PDB files. The first choice would be to execute the equivalent ChimeraX command directly with code like:

from chimerax.core.commands import run
opened_models = run(session, "open /path/to/file.pdb")

or:

from chimerax.core.commands import run
run(session, "save /path/to/file.pdb #1")

This approach has the advantage of not needing to find equivalent API calls to every command you want to execute, or figuring out what arguments are needed for those calls. This approach is discussed in the Pro Tip section near the top of the Developer Tutorial.

The second approach would be to use the open- or save-command managers. Those managers know which bundles provide support for opening/saving various file formats, and provide a generic interface for opening/saving files, e.g.:

models, status_message = session.open_command.open_data("/path/to/file.pdb", [other Python format-specific keywords])

or:

session.save_command.save_data("/path/to/file.pdb", models=[model1], [other Python format-specific keywords])

This second approach has the disadvantage that the values for the keywords may not be obvious in some cases (i.e. you would have to look at the underlying API). Also, the models returned by open_data() have not been added to the session. Details like this are discussed in the Python Functions implementing User Commands documentation, under open and save.
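Because open_data() leaves the returned models out of the session, a caller normally adds them as a follow-up step. The following is a minimal sketch of that step, not a definitive recipe: open_and_add is a hypothetical helper name, and the sketch assumes that session.models.add() is the right call for registering models with the session.

```python
def open_and_add(session, path, **format_kw):
    """Open a file through the open-command manager, then add the models.

    Hypothetical helper for illustration; not part of ChimeraX.
    """
    # open_data() returns the models and a status message, but the models
    # have not been added to the session at this point.
    models, status_message = session.open_command.open_data(path, **format_kw)
    # Assumption: session.models.add() registers the models with the session.
    session.models.add(models)
    return models, status_message
```

Inside ChimeraX this would be called as open_and_add(session, "/path/to/file.pdb"), with any format-specific keywords passed through unchanged.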

And finally, the third approach would be to call the PDB-reading or PDB-saving API directly. Something like:

from chimerax.pdb import open_pdb
models, status_message = open_pdb(session, "/path/to/file.pdb")

or:

from chimerax.pdb import save_pdb
save_pdb(session, "/path/to/file.pdb", models=[model1])

The only advantage of this third approach is in the rare case where you need to use an esoteric Python-only keyword argument that isn’t supported in the equivalent open/save command. For instance, save_pdb() has a polymeric_res_names argument for when you need to output residues in ATOM records that would otherwise be output as HETATM records (this capability is used by the modeller bundle).

pdb: PDB format support

Read Protein Data Bank (PDB) files.

open_pdb(session, stream, file_name=None, *, auto_style=True, coordsets=False, atomic=True, max_models=None, log_info=True, combine_sym_atoms=True, segid_chains=False, slider=True, missing_coordsets='renumber')

Experimental API. Read PDB data from a file or stream and return a list of models and status information.

stream is either a string with a file system path to a PDB file, or an open input stream to PDB data.

file_name is the name to give to the resulting model(s). Typically only needed if the input is an anonymous stream or the input file name wouldn’t be a good model name.

auto_style is passed through to the Structure or AtomicStructure constructor.

coordsets controls whether a multi-MODEL PDB is opened as a list of structures or as a single structure with multiple coordinate sets.

atomic controls whether AtomicStructure or Structure is used as the class for the structure. The latter should be used for PDB files that don’t actually contain atomic data per se, like SAX “PDB” files or coarse-grain models.

max_models limits the number of models this routine can return.

log_info is passed through to the Structure or AtomicStructure constructor.

combine_sym_atoms controls whether otherwise identical atoms with no bonds that are also very close together in space should be combined into a single atom.

segid_chains controls whether the chain ID should come from the normal chain ID columns or from the “segment ID” columns.
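To make the two column sources concrete, here is a small standalone sketch. It is not the ChimeraX parser. It assumes the conventional fixed-column PDB layout, with the chain ID in column 22 and the legacy segment ID in columns 73-76, and chain_id_from_record is a hypothetical name.

```python
def chain_id_from_record(line, segid_chains=False):
    """Pick the chain ID from an ATOM/HETATM record (illustrative only)."""
    if segid_chains:
        # Assumed layout: segment ID occupies columns 73-76 (0-based 72:76).
        return line[72:76].strip()
    # Assumed layout: chain ID occupies column 22 (0-based index 21).
    return line[21].strip()


# Build an 80-column record with only the fields of interest filled in,
# so the column positions are explicit rather than hand-counted.
record = list(" " * 80)
record[0:4] = "ATOM"
record[21] = "A"
record[72:76] = "PROT"
line = "".join(record)

normal_chain = chain_id_from_record(line)                     # from column 22
segid_chain = chain_id_from_record(line, segid_chains=True)   # from columns 73-76
```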

slider controls whether a slider tool is shown when a multi-model PDB file is opened as a trajectory.

missing_coordsets is for the rare case where MODELs are being collated into a trajectory and the MODEL numbers are not consecutive. The possible values are ‘fill’ (fill in the missing with copies of the preceding coord set), ‘ignore’ (don’t fill in; use MODEL number as is for coordset ID), and ‘renumber’ (don’t fill in and use the next available coordset ID).
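The three policies can be illustrated with a standalone sketch that maps MODEL numbers to coordset IDs. This is not ChimeraX's implementation: assign_coordset_ids is a hypothetical function, MODEL numbers are assumed to be ascending, and for 'renumber' the sketch assumes numbering starts at the first MODEL number seen.

```python
def assign_coordset_ids(model_numbers, missing_coordsets="renumber"):
    """Return (coordset_id, source_model_number) pairs for each coordset.

    Illustrative only; assumes model_numbers is in ascending order.
    """
    assigned = []
    next_id = None
    for number in model_numbers:
        if missing_coordsets == "ignore":
            # Use the MODEL number as the coordset ID, leaving gaps.
            assigned.append((number, number))
        elif missing_coordsets == "renumber":
            # Use the next available ID (assumed to start at the first number).
            next_id = number if next_id is None else next_id + 1
            assigned.append((next_id, number))
        elif missing_coordsets == "fill":
            # Fill any gap with copies of the preceding coordinate set.
            if assigned:
                prev_id, prev_source = assigned[-1]
                for gap_id in range(prev_id + 1, number):
                    assigned.append((gap_id, prev_source))
            assigned.append((number, number))
        else:
            raise ValueError(f"unknown policy: {missing_coordsets!r}")
    return assigned
```

For MODEL numbers 1, 2 and 5, 'ignore' keeps IDs 1, 2, 5; 'renumber' yields IDs 1, 2, 3; and 'fill' yields IDs 1 through 5, with 3 and 4 being copies of coordset 2.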

save_pdb(session, output, *, models=None, selected_only=False, displayed_only=False, all_coordsets=False, pqr=False, rel_model=None, serial_numbering='h36', polymeric_res_names=None)

Experimental API. Write PDB data to a file.

output is a file system path to a writable location. It can contain the strings “[NAME]” and/or “[ID]”, which will be replaced with the model’s name/id, respectively.
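The placeholder substitution can be pictured with a short sketch. It is only an illustration of the described behavior: expand_output_path is a hypothetical function, and the model name and ID string used in the usage line are made-up values.

```python
def expand_output_path(template, model_name, model_id):
    """Replace the [NAME] and [ID] placeholders in an output path.

    Illustrative only; the real substitution happens inside save_pdb().
    """
    return template.replace("[NAME]", model_name).replace("[ID]", model_id)


# Hypothetical model name "1abc" and ID string "1.2".
example_path = expand_output_path("/path/to/[NAME]_[ID].pdb", "1abc", "1.2")
```

Using a placeholder in the path is what allows several models to be written to distinct files from a single call.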

models is a list of models to output. If not specified, all structure models will be output.

selected_only controls whether only currently selected atoms should be output.

displayed_only controls whether only currently displayed atoms should be output.

all_coordsets controls whether or not, for a multi-coordset model, all coordsets should be written out (using MODEL records) or just the current coordinate set.

pqr controls whether ATOM/HETATM records will be written out in non-standard PQR format or not.

rel_model, if given, is a model that the output coordinates should be written “relative to”, i.e. the inverse of rel_model’s current transformation is applied to the atomic coordinates before they are output.
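As a rough picture of what writing coordinates relative to another model means, the sketch below applies the inverse of a rigid transformation to a point. It is a simplification under stated assumptions: a plain 3x3 rotation matrix and a translation vector stand in for the model's transformation, the point is assumed to already be in scene coordinates, and apply_inverse is a hypothetical function.

```python
def apply_inverse(rotation, translation, point):
    """Undo the rigid transform x -> R x + t for a single point.

    Illustrative only; assumes `rotation` is an orthonormal 3x3 matrix,
    so its inverse is its transpose.
    """
    # Remove the translation first.
    shifted = [p - t for p, t in zip(point, translation)]
    # Multiply by the transpose of R (entry [j][i] of R^T is R[i][j]).
    return [sum(rotation[i][j] * shifted[i] for i in range(3)) for j in range(3)]


# Stand-in transform: 90-degree rotation about z, then a shift of (1, 2, 3).
rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
translation = [1, 2, 3]
relative_point = apply_inverse(rotation, translation, [1, 3, 3])
```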

serial_numbering controls how serial numbers are output when they would exceed PDB column limits. “h36” means to use Hybrid-36 numbering. “amber” means to steal a column from the “ATOM “ record and not correct them in other types of records (e.g. CONECT).
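For readers unfamiliar with Hybrid-36, the sketch below shows how a serial number that no longer fits in a decimal field is encoded. It is a standalone illustration of the general scheme, not the code ChimeraX uses; hybrid36_encode is a hypothetical name, and the sketch handles only non-negative values.

```python
import string

UPPER_DIGITS = string.digits + string.ascii_uppercase
LOWER_DIGITS = string.digits + string.ascii_lowercase


def _encode_base36(value, digits):
    """Encode a non-negative integer in base 36 using the given digit set."""
    chars = []
    while value:
        value, remainder = divmod(value, 36)
        chars.append(digits[remainder])
    return "".join(reversed(chars)) or digits[0]


def hybrid36_encode(width, value):
    """Encode `value` into a field of `width` characters (illustrative)."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative values")
    if value < 10 ** width:
        # Values that fit are written as ordinary right-justified decimals.
        return str(value).rjust(width)
    value -= 10 ** width
    block = 26 * 36 ** (width - 1)   # size of each letter-led range
    offset = 10 * 36 ** (width - 1)  # makes the leading digit a letter
    if value < block:
        return _encode_base36(value + offset, UPPER_DIGITS)
    value -= block
    if value < block:
        return _encode_base36(value + offset, LOWER_DIGITS)
    raise ValueError("value out of range for Hybrid-36")
```

With the five-column atom serial field, 99999 is still written in decimal and 100000 becomes A0000, so existing files with small serial numbers are unaffected.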

polymeric_res_names is a list of residue names that should be considered “standard” as far as the output of ATOM vs. HETATM goes. If not specified, the residue names that the RCSB considers standard will be used.
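The effect of polymeric_res_names on the record type can be sketched as follows. This is a simplified stand-in rather than the logic in save_pdb(): record_type is a hypothetical function, and ILLUSTRATIVE_STANDARD is a deliberately tiny placeholder set, not the list the RCSB actually uses.

```python
# Placeholder subset for illustration only; NOT the RCSB's standard list.
ILLUSTRATIVE_STANDARD = {"ALA", "GLY", "SER", "DA", "DC"}


def record_type(res_name, polymeric_res_names=None):
    """Choose ATOM or HETATM for a residue name (illustrative only)."""
    if polymeric_res_names is None:
        standard = ILLUSTRATIVE_STANDARD
    else:
        # A caller-supplied list takes the place of the default set.
        standard = set(polymeric_res_names)
    return "ATOM" if res_name in standard else "HETATM"


default_choice = record_type("MSE")                    # not in the placeholder set
custom_choice = record_type("MSE", ["ALA", "MSE"])     # explicitly listed
```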