#284 closed defect (fixed)

handle mmCIF with missing sequence information

Reported by:	Greg Couch	Owned by:	Greg Couch
Priority:	blocker	Milestone:	Alpha Release
Component:	Input/Output	Version:
Keywords:		Cc:	Tom Goddard, olibclarke@…
Blocked By:		Blocking:
Notify when closed:		Platform:	all
Project:	ChimeraX

Description

The mmCIF file in #278 is missing the entity_poly_seq table, so the code that looks up residue ranges for the secondary structure fails.

The workaround is to construct the equivalent (but incomplete) information from the atom_site table.

Attachments (7)

copy_1.cif (3.4 MB ) - added by Tom Goddard 10 years ago.: mmCIF written by Phenix, from Oliver Clarke
1a0m_trimmed.cif (26.5 KB ) - added by Tom Goddard 10 years ago.: mmCIF from PDB with only _entity_poly_seq and _atom_site tables
1a0m_atoms.cif (26.2 KB ) - added by Greg Couch 10 years ago.: 1a0m with just atoms
copy_1_ccp4.cif (3.2 MB ) - added by Tom Goddard 10 years ago.: mmCIF written by CCP4 shows several long bonds in ChimeraX.
1bl8.pdb (266.1 KB ) - added by Tom Goddard 10 years ago.
1bl8_refine_001.cif (391.2 KB ) - added by Tom Goddard 10 years ago.: 1bl8.pdb refined with phenix.refine with cif output, no sequence table
mmtbx.cif (390.8 KB ) - added by Tom Goddard 10 years ago.: Ran mmtbx.prepare_pdb_deposition on 1bl8_refine_001.cif to add sequence table.

Change History (21)

comment:1 by Tom Goddard, 10 years ago

Greg says this also prevents forming inter-residue bonds between amino acids.

comment:2 by Greg Couch, 10 years ago

Added code to try to reconstruct entity_poly_seq table by treating each chain as a separate entity. That improves the reading of this file, but does not fix everything. Another wrong aspect of the file is that there are literally four copies of chain B -- so four indistinguishable sets of helix and sheet data, four copies of each atom that are labelled the same, etc. Just a bad file.
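For illustration only, the reconstruction described above might look roughly like the following. This is a minimal sketch, not the ChimeraX code: the function name `fake_entity_poly_seq` and the dict-per-row input are hypothetical, and it assumes the atom_site rows carry label_asym_id, label_seq_id and label_comp_id. Residues with no atoms in the file cannot be recovered, which is why the result is incomplete.

```python
def fake_entity_poly_seq(atom_rows):
    """Build (entity_id, seq_num, comp_id) rows from atom_site rows.

    atom_rows: iterable of dicts with the keys "label_asym_id",
    "label_seq_id" and "label_comp_id" (hypothetical input shape).
    Each chain is treated as its own entity.
    """
    chains = {}  # chain id -> {sequence number: residue name}
    for row in atom_rows:
        seq_id = row["label_seq_id"]
        if seq_id in (".", "?"):
            continue  # no sequence position, e.g. waters and ligands
        chains.setdefault(row["label_asym_id"], {})[int(seq_id)] = row["label_comp_id"]

    table = []
    # Number the entities in the order their chains first appear.
    for entity_id, residues in enumerate(chains.values(), start=1):
        for num in sorted(residues):
            table.append((str(entity_id), num, residues[num]))
    return table
```

Because every chain becomes a separate entity, identical chains are not merged into one entity the way a deposited PDB file would describe them.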

comment:3 by Greg Couch, 10 years ago

Resolution:	→ wontfix
Status:	new → closed

So this bug report was good in that support for mmCIF files without an entity_poly_seq table was added, but we're not going to add workarounds for all of the problems in the file.

comment:4 by goddard@…, 10 years ago

Could you summarize the problems you found in this mmCIF file?  I can look into getting the Coot developer to fix the issues if I know what they are.

comment:5 by Tom Goddard, 10 years ago

Cc:	olibclarke@… added
Resolution:	wontfix
Status:	closed → reopened

I've attached another mmCIF file copy_1.cif from Oliver Clarke, this one written by Phenix, that opens in ChimeraX with no inter-residue bonds. The file does not have an entity_poly_seq table. In fact all entity_id values in the _atom_site table are given as "?" (missing).

Testing current ChimeraX with a standard PDB mmCIF (1a0m), with and without the entity_poly_seq table, showed that it does not connect the residues without the table. I've attached the example 1A0M.cif file with all tables deleted except _atom_site and _entity_poly_seq. This file opens correctly, but if the _entity_poly_seq table is deleted, inter-residue connectivity is lost. The same result is obtained in Chimera 1 with its mmCIF reader: inter-residue connectivity requires the _entity_poly_seq table.

The entity_poly_seq table is not required and we have examples that show Phenix and Coot do not write it. So it will probably be important to be able to connect polymers without having the table.

by Tom Goddard, 10 years ago

Attachment:	copy_1.cif added

mmCIF written by Phenix, from Oliver Clarke

by Tom Goddard, 10 years ago

Attachment:	1a0m_trimmed.cif added

mmCIF from PDB with only _entity_poly_seq and _atom_site tables

by Greg Couch, 10 years ago

Attachment:	1a0m_atoms.cif added

1a0m with just atoms

comment:6 by Greg Couch, 10 years ago

So 1a0m with just atoms works in today's daily build. I added code to fake an entity_poly_seq table if there is none, but it fails for the copy_1.cif file because of a bug.

comment:7 by Greg Couch, 10 years ago

The problems with the mmCIF file from #278 stem from the fact that the atom site table is essentially PDB format ATOM/HETATM/TER records and that data isn't legal PDB data either.

The problems include atom names with spaces in them, TER records (only ATOM/HETATM allowed), duplicate atom_site.id's (should be unique), chains with different sequences using the same entity id, nonunique chain identifiers (which screws up the helix and sheet information with duplicate rows), and probably more errors.

Think of the mmCIF file as a normalized database. The label_xxx fields are used as keys in the defining tables and are used for referencing in other tables. This can be confusing if you're used to PDB format files. For example, the atom_site.label_seq_id uniquely identifies the residue within an entity, and the atom_site.pdbx_PDB_ins_code is actually paired with the atom_site.auth_seq_id. The auth_xxx fields are what the author named them and might not be unique.
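To make the key idea concrete, here is a small sketch of two of the checks implied above. The helper names and the dict-per-row input are hypothetical, not part of any reader, and using label_asym_id to scope a residue to one chain instance is my assumption for the example.

```python
from collections import Counter


def duplicate_atom_ids(atom_rows):
    """Return the atom_site.id values that occur more than once."""
    counts = Counter(row["id"] for row in atom_rows)
    return sorted(atom_id for atom_id, n in counts.items() if n > 1)


def label_key(row):
    """Database-style residue key built from the label_xxx fields."""
    return (row["label_asym_id"], row["label_seq_id"])


def auth_key(row):
    """Author-style residue key; the insertion code pairs with auth_seq_id."""
    return (row["auth_asym_id"], row["auth_seq_id"], row["pdbx_PDB_ins_code"])
```

A file where `duplicate_atom_ids` returns anything, or where two different residues share a `label_key`, has the kind of problem listed above.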

HTH,

Greg

comment:8 by Tom Goddard, 10 years ago

Oliver Clarke provided copy_1_ccp4.cif written by CCP4. It does not have the _entity_poly_seq table, but current ChimeraX connects the residues anyway. However, it produces about a dozen very long bonds where residues are missing. These long bonds should be missing-segment pseudobonds. The residue number jumps by 2 or more across these missing residues.
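A rough sketch of the numbering test suggested by that observation, assuming the residue numbers of one chain are available as a sorted list of integers. The function name and the "bond" / "missing_segment" labels are made up for the example and do not come from ChimeraX.

```python
def classify_links(seq_ids):
    """Label each consecutive residue pair in one chain.

    seq_ids: residue numbers in chain order (hypothetical input).
    A jump of 2 or more in numbering is treated as a missing segment
    rather than a real bond.
    """
    links = []
    for prev, cur in zip(seq_ids, seq_ids[1:]):
        kind = "missing_segment" if cur - prev >= 2 else "bond"
        links.append((prev, cur, kind))
    return links
```

For example, residues numbered 1, 2, 3, 7, 8 would get a missing segment between 3 and 7 and ordinary bonds everywhere else.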

by Tom Goddard, 10 years ago

Attachment:	copy_1_ccp4.cif added

mmCIF written by CCP4 shows several long bonds in ChimeraX.

comment:9 by Tom Goddard, 10 years ago

I asked Oliver to try running the Phenix mmtbx.prepare_pdb_deposition program to produce an mmCIF file.

http://www.phenix-online.org/documentation/overviews/xray-structure-deposition.html

That program adds the entity_poly_seq table and asks the user to provide the sequence. I've attached the files Oliver used in this test, starting with 1bl8.pdb, then producing 1bl8_refine_001.cif using phenix.refine which has no sequence table, then running the deposition program to make mmtbx.cif.

Current ChimeraX can connect residues even with the 1bl8_refine_001.cif with no sequence table because of changes Greg made to handle this case. This structure has no missing segments which would produce wrong long bonds.

by Tom Goddard, 10 years ago

Attachment:	1bl8.pdb added

by Tom Goddard, 10 years ago

Attachment:	1bl8_refine_001.cif added

1bl8.pdb refined with phenix.refine with cif output, no sequence table

by Tom Goddard, 10 years ago

Attachment:	mmtbx.cif added

Ran mmtbx.prepare_pdb_deposition on 1bl8_refine_001.cif to add sequence table.

comment:10 by Greg Couch, 10 years ago

mmtbx.cif includes data sections with the residue templates after the data section with the structure. This means that ChimeraX will not know the template for a new ligand when it tries to connect the structure.

comment:11 by Scooter Morris, 9 years ago

Milestone:	→ Alpha Release

comment:12 by Greg Couch, 9 years ago

Resolution:	→ fixed
Status:	reopened → closed

All of the example cif files now open. Files without the entity_poly_seq table now implicitly get connectivity between residues of the same type (peptide, nucleotide), and really long bonds are turned into missing structure bonds.
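As an illustration of the long-bond rule only: the cutoff actually used is not stated in this ticket, so `MAX_BOND_LENGTH` below is a hypothetical value and `link_kind` is a made-up helper, not the reader's code.

```python
import math

# Hypothetical cutoff in angstroms; the real threshold is not given here.
MAX_BOND_LENGTH = 2.5


def link_kind(xyz_a, xyz_b, cutoff=MAX_BOND_LENGTH):
    """Decide how to link two backbone atoms of adjacent residues.

    Returns "bond" when the atoms are within the cutoff, otherwise
    "missing_structure" to mark a gap instead of drawing a long bond.
    """
    distance = math.dist(xyz_a, xyz_b)
    return "bond" if distance <= cutoff else "missing_structure"
```

A distance test like this also catches gaps that residue numbering alone would miss.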

The trailing data blocks with residue templates are ignored. But if the templates are added to the first data block with a chem_comp table to list residue types and the chem_comp_bond table has all of the residues' connectivity, then those residue templates will be used -- this is the method that the European PDB uses in their "updated" mmCIF files. So it would be ideal if Phenix adopted this alternative way of giving the templates.

comment:13 by Greg Couch, 9 years ago

Component:	Input/Output → mmCIF Reader/Writer

comment:14 by Eric Pettersen, 7 years ago

Component:	mmCIF Reader/Writer → Input/Output

reducing # of categories, so lumping into I/O category
