
Version 5 (modified by Greg Couch, 8 years ago)

--

Writing mmCIF Files

The ultimate goal is to write out mmCIF files containing all of the information ChimeraX uses. That is not currently possible for two reasons:

  • Information needed to reconstruct some tables is discarded after being used (e.g., the sheet information for strands)
  • Whether or not the structure has been edited incompatibly is unknown, so associated metadata may or may not still be valid

Milestones

The ChimeraX Fast mmCIF Guidelines give ChimeraX's requirements for reading.

v0
Add support for metadata that we extend when writing files (software, citation, citation_author)
v1
Add the entity and chem_comp tables, and the audit_conform table with our annotation that all keywords are in lowercase. Reuse entity descriptions from metadata if possible.
v2
The minimal set of tables needed to properly reconstruct the polymeric connectivity consists of the entity_poly_seq and atom_site tables. atom_site_anisotrop is easy, so write it as well.
v3
Add secondary structure annotations. For now, output the helices and put each strand in its own sheet. Fix this once sheet information becomes available.
v4
Add symmetry metadata if saved (pdbx_struct_assembly, pdbx_struct_assembly_gen, pdbx_struct_oper_list)
v5
Initial version of the struct_conn table: just disulfide and inter-residue metal coordination bonds
v6
Add hydrogen bonds -- maybe limit to inter-strand h-bonds?
v7
Add bonds missing from residue templates
v8
Make sure intra-residue metal coordination bonds are handled correctly.
v9
Add support for writing the atom_site and atom_site_anisotrop tables with fixed width columns and add that annotation to the audit_conform table.
v10
Embed residue templates (chem_comp_atom and chem_comp_bond tables), as PDBe does in its "updated" mmCIF files. Residues with metal coordination bonds in them have "ambiguous" connectivity.
v11
Review other metadata that is kept and try to incorporate it.
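
As an illustration of the fixed width idea in v9, here is a minimal Python sketch, not the ChimeraX implementation. The helper name format_fixed_width_loop is hypothetical, and it assumes every value is a plain token that needs no CIF quoting. Each column is padded to the width of its longest value, so every data row has the same length.

```python
def format_fixed_width_loop(category, fields, rows):
    """Return an mmCIF loop whose data rows all have the same length.

    Hypothetical helper for illustration only. Values are assumed to be
    simple tokens (no whitespace or quoting needed).
    """
    cells = [[str(value) for value in row] for row in rows]
    for row in cells:
        if len(row) != len(fields):
            raise ValueError("row length does not match number of fields")
    # Each column is as wide as its longest value.
    widths = [
        max((len(row[i]) for row in cells), default=1)
        for i in range(len(fields))
    ]
    lines = ["loop_"]
    lines.extend(f"_{category}.{name}" for name in fields)
    for row in cells:
        # Pad every value, including the last, so all rows are equal length.
        lines.append(" ".join(v.ljust(w) for v, w in zip(row, widths)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fixed_width_loop(
        "atom_site",
        ["id", "label_atom_id", "label_comp_id"],
        [[1, "N", "MET"], [10, "CA", "MET"]],
    ), end="")
```

Equal-length rows are what would let a reader seek directly to the n-th row; how that property is announced in audit_conform is left to the guidelines.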

Secondary Structure Notes

The secondary structure annotations, specifically the sheet annotations, are difficult because ChimeraX neither keeps the original sheet information nor computes it. Functionally, the secondary structure information that is currently needed can be computed with the internal DSSP code, but those annotations don't always match the author's. Ideally, the sheet information would be both preserved and computed, so the mmCIF file could contain it. (The sheet information would eventually be useful for a fancy cartoon representation too.)
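
The interim "each strand in a separate sheet" approach from the milestones could look roughly like the Python sketch below. The Strand class, the sheet naming scheme, and the function name are assumptions made for illustration; only the struct_sheet_range field names come from the mmCIF dictionary.

```python
from dataclasses import dataclass

# Field order for the rows produced below (struct_sheet_range category).
SHEET_RANGE_FIELDS = (
    "sheet_id", "id",
    "beg_label_comp_id", "beg_label_asym_id", "beg_label_seq_id",
    "end_label_comp_id", "end_label_asym_id", "end_label_seq_id",
)


@dataclass
class Strand:
    """Hypothetical description of one strand, e.g. from a DSSP-like pass."""
    asym_id: str
    beg_comp_id: str
    beg_seq_id: int
    end_comp_id: str
    end_seq_id: int


def sheet_range_rows(strands):
    """Give every strand its own single-strand sheet."""
    rows = []
    for n, s in enumerate(strands, start=1):
        sheet_id = f"S{n}"  # naming scheme is an assumption
        rows.append((
            sheet_id, 1,  # each sheet holds exactly one strand, numbered 1
            s.beg_comp_id, s.asym_id, s.beg_seq_id,
            s.end_comp_id, s.asym_id, s.end_seq_id,
        ))
    return rows


if __name__ == "__main__":
    strands = [
        Strand("A", "LYS", 5, "THR", 11),
        Strand("A", "VAL", 20, "ILE", 25),
    ]
    for row in sheet_range_rows(strands):
        print(*row)
```

This loses the real sheet topology, which is why it is only a stopgap until the sheet information is preserved or computed.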

Other Metadata

ChimeraX keeps and uses other metadata that is only valid if the structure is not edited incompatibly. For example, adding or removing hydrogens would be okay, adding a missing residue should be okay, and deleting some residues may or may not be okay. What to do about this is not yet decided; the safest option is to skip writing the metadata. An initial conservative approach would be to modify the code to mark when the structure has been edited for any reason, and only write the metadata if it hasn't been edited. If we could handle this issue, our original goal of writing what we read could be met.
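
A minimal Python sketch of that conservative approach follows. StructureRecord, mark_edited, and metadata_to_write are hypothetical names, not ChimeraX APIs; the sketch assumes every editing operation calls mark_edited.

```python
class StructureRecord:
    """Hypothetical stand-in for a structure and the metadata read with it."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)  # table name -> rows as read
        self.edited = False

    def mark_edited(self):
        # Assumed to be called by any operation that changes the structure.
        self.edited = True


def metadata_to_write(structure):
    """Return the metadata tables that are still safe to write out."""
    if structure.edited:
        # Conservative: any edit invalidates all retained metadata.
        return {}
    return dict(structure.metadata)


if __name__ == "__main__":
    s = StructureRecord({"pdbx_struct_assembly": [("1", "author_defined_assembly")]})
    print(sorted(metadata_to_write(s)))
    s.mark_edited()
    print(sorted(metadata_to_write(s)))
```

A single flag treats harmless edits, such as adding hydrogens, the same as destructive ones; distinguishing them would require recording what kind of edit occurred.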
