#284 closed defect (fixed)
handle mmCIF with missing sequence information
Reported by: | Greg Couch | Owned by: | Greg Couch |
---|---|---|---|
Priority: | blocker | Milestone: | Alpha Release |
Component: | Input/Output | Version: | |
Keywords: | | Cc: | Tom Goddard, olibclarke@… |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
mmCIF file in #278 is missing the entity_poly_seq table, so the code to look up residue ranges for the secondary structure fails.
The workaround is to construct the equivalent (but incomplete) information from the atom_site table.
Attachments (7)
Change History (21)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Added code to try to reconstruct entity_poly_seq table by treating each chain as a separate entity. That improves the reading of this file, but does not fix everything. Another wrong aspect of the file is that there are literally four copies of chain B -- so four indistinguishable sets of helix and sheet data, four copies of each atom that are labelled the same, etc. Just a bad file.
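For readers following along, here is a rough sketch of the idea of rebuilding sequence information from atom_site with one entity per chain. It is not the ChimeraX code: the function name, the tuple layout of the rows, and the numbering of the fake entity ids are all assumptions made for illustration.

```python
def fake_entity_poly_seq(atom_site_rows):
    """Build an entity_poly_seq-like table from atom_site rows.

    Each row is a (label_asym_id, label_seq_id, label_comp_id) tuple of
    strings (a hypothetical in-memory form). Every chain is treated as
    its own entity, so the result is incomplete: residues that have no
    atoms in the file are simply absent.
    """
    chains = {}
    for asym_id, seq_id, comp_id in atom_site_rows:
        if not seq_id.isdigit():
            continue  # skip "." or "?" (e.g. waters and other non-polymers)
        residues = chains.setdefault(asym_id, {})
        # Keep the first residue name seen for each sequence position.
        residues.setdefault(int(seq_id), comp_id)

    table = []
    # One made-up entity id per chain, in order of first appearance.
    for entity_id, residues in enumerate(chains.values(), start=1):
        for seq_id in sorted(residues):
            table.append((str(entity_id), seq_id, residues[seq_id]))
    return table
```

Note that a file with several chains sharing one identifier, like the four copies of chain B here, would have them merged into a single chain by this approach.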
comment:3 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
So this bug report was good in that support for mmCIF files without an entity_poly_seq table was added, but we're not going to add workarounds for all of the problems in the file.
comment:4 by , 10 years ago
Could you summarize the problems you found in this mmCIF file? I can look into getting the Coot developer to fix the issues if I know what they are.
comment:5 by , 10 years ago
Cc: | added |
---|---|
Resolution: | wontfix |
Status: | closed → reopened |
I've attached another mmCIF file copy_1.cif from Oliver Clarke, this one written by Phenix, that opens in ChimeraX with no inter-residue bonds. The file does not have an entity_poly_seq table. In fact all entity_id values in the _atom_site table are given as "?" (missing).
Testing a standard PDB mmCIF (1a0m) in the current ChimeraX, with and without the entity_poly_seq table, showed that it does not connect the residues without the table. I've attached the example 1A0M.cif file with all tables deleted except _atom_site and _entity_poly_seq. This file opens correctly, but if the _entity_poly_seq table is deleted, inter-residue connectivity is lost. The same result is obtained in Chimera 1 with its mmCIF reader: inter-residue connectivity requires the _entity_poly_seq table.
The entity_poly_seq table is not required and we have examples that show Phenix and Coot do not write it. So it will probably be important to be able to connect polymers without having the table.
by , 10 years ago
Attachment: | 1a0m_trimmed.cif added |
---|
mmCIF from PDB with only _entity_poly_seq and _atom_site tables
comment:6 by , 10 years ago
So 1a0m with just atoms works in today's daily build. I added code to fake an entity_poly_seq table if there is none, but it fails for the copy_1.cif file because of a bug.
comment:7 by , 10 years ago
The problems with the mmCIF file from #278 stem from the fact that the atom site table is essentially PDB format ATOM/HETATM/TER records and that data isn't legal PDB data either.
The problems include atom names with spaces in them, TER records (only ATOM/HETATM allowed), duplicate atom_site.id's (should be unique), chains with different sequences using the same entity id, nonunique chain identifiers (which screws up the helix and sheet information with duplicate rows), and probably more errors.
Think of the mmCIF file as a normalized database. The label_xxx fields are used as keys in the defining tables and for referencing in other tables. This can be confusing if you're used to PDB format files. For example, atom_site.label_seq_id uniquely identifies the residue within an entity, atom_site.pdbx_PDB_ins_code is actually paired with atom_site.auth_seq_id, and the auth_xxx fields are whatever the author named them and might not be unique.
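To make the label/auth distinction concrete, here is a small sketch that groups atom_site rows by their author-assigned residue key and reports any that correspond to more than one label key. The dict-per-row layout and the function name are assumptions for illustration, not a ChimeraX data structure.

```python
from collections import defaultdict

def ambiguous_auth_keys(atom_site_rows):
    """Find author-assigned residue keys that map to several label keys.

    Each row is a dict of atom_site columns (hypothetical in-memory form).
    """
    seen = defaultdict(set)
    for row in atom_site_rows:
        # Key used for cross-table references.
        label_key = (row["label_asym_id"], row["label_seq_id"])
        # Author naming; the insertion code belongs with auth_seq_id.
        auth_key = (row["auth_asym_id"], row["auth_seq_id"],
                    row["pdbx_PDB_ins_code"])
        seen[auth_key].add(label_key)
    # Keep only the author keys that are not unique.
    return {auth: sorted(labels) for auth, labels in seen.items()
            if len(labels) > 1}
```

An empty result means the auth_xxx naming happens to be unique in that file; a non-empty one lists the residues that can only be told apart by their label_xxx values.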
HTH,
Greg
comment:8 by , 10 years ago
Oliver Clarke provided copy_1_ccp4.cif written by CCP4. It does not have the _entity_poly_seq table but the current ChimeraX is connecting the residues. But it produces about a dozen very long bonds where there are missing residues. These long bonds should be missing segment pseudobonds. The residue number jumps by 2 or more across these missing residues.
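As an illustration of the behaviour wanted here, the following sketch splits the links along one chain into ordinary bonds and missing-segment links based only on the residue numbering. The function name and the list-of-integers input are assumptions; the "jump of 2 or more" rule is taken from the observation above.

```python
def split_links(seq_ids):
    """Classify links between consecutive residues of one chain.

    seq_ids is the ordered list of residue numbers present in the file.
    Returns (bonds, missing_segments), each a list of (prev, next) pairs.
    """
    bonds, missing_segments = [], []
    for prev, nxt in zip(seq_ids, seq_ids[1:]):
        if nxt - prev >= 2:
            # Residues are absent in between: not a real covalent bond.
            missing_segments.append((prev, nxt))
        else:
            bonds.append((prev, nxt))
    return bonds, missing_segments
```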
by , 10 years ago
Attachment: | copy_1_ccp4.cif added |
---|
mmCIF written by CCP4 shows several long bonds in ChimeraX.
comment:9 by , 10 years ago
I asked Oliver to try running the Phenix mmtbx.prepare_pdb_deposition program to produce an mmCIF file.
http://www.phenix-online.org/documentation/overviews/xray-structure-deposition.html
That program adds the entity_poly_seq table and asks the user to provide the sequence. I've attached the files Oliver used in this test, starting with 1bl8.pdb, then producing 1bl8_refine_001.cif using phenix.refine which has no sequence table, then running the deposition program to make mmtbx.cif.
Current ChimeraX can connect residues even with the 1bl8_refine_001.cif with no sequence table because of changes Greg made to handle this case. This structure has no missing segments which would produce wrong long bonds.
by , 10 years ago
by , 10 years ago
Attachment: | 1bl8_refine_001.cif added |
---|
1bl8.pdb refined with phenix.refine with cif output, no sequence table
by , 10 years ago
Ran mmtbx.prepare_pdb_deposition on 1bl8_refine_001.cif to add sequence table.
comment:10 by , 10 years ago
mmtbx.cif includes data sections with the residue templates after the data section with the structure. This means that ChimeraX will not know the template for a new ligand when it tries to connect the structure.
comment:11 by , 9 years ago
Milestone: | → Alpha Release |
---|
comment:12 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
All of the example cif files now open. Files without the entity_poly_seq table now implicitly create connectivity between residues of the same type (peptide, nucleotide), and really long bonds are turned into missing structure bonds.
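A minimal sketch of that rule, assuming a simple distance cutoff: consecutive residues of the same polymer type are linked, and links longer than the cutoff are recorded as missing structure instead of bonds. The cutoff value, the function name, and the residue dict layout are all hypothetical; the actual ChimeraX criteria are not stated in this ticket.

```python
import math

# Hypothetical cutoff in angstroms; not the value ChimeraX uses.
MAX_BOND_LENGTH = 3.0

def connect_consecutive(residues, max_length=MAX_BOND_LENGTH):
    """Link each residue to the next one of the same polymer type.

    Each residue is a dict with 'kind' ("peptide" or "nucleotide"),
    'tail' (xyz of the atom that bonds forward, e.g. C or O3') and
    'head' (xyz of the atom that bonds backward, e.g. N or P).
    Returns (bonds, missing) as lists of residue index pairs.
    """
    bonds, missing = [], []
    for i in range(len(residues) - 1):
        a, b = residues[i], residues[i + 1]
        if a["kind"] != b["kind"]:
            continue  # different polymer types are never linked
        if math.dist(a["tail"], b["head"]) <= max_length:
            bonds.append((i, i + 1))
        else:
            missing.append((i, i + 1))  # too long to be a real bond
    return bonds, missing
```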
The trailing data blocks with residue templates are ignored. But if the templates are added to the first data block with a chem_comp table to list residue types and the chem_comp_bond table has all of the residues' connectivity, then those residue templates will be used -- this is the method that the European PDB uses in their "updated" mmCIF files. So it would be ideal if Phenix adopted this alternative way of giving the templates.
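For anyone wanting to check where a file keeps its templates, here is a naive sketch that reports which data blocks mention a given category such as chem_comp_bond. It is a plain line scan written for illustration, not a real CIF parser: it ignores multi-line semicolon-delimited text fields and save frames, and the function name is made up.

```python
def blocks_with_category(cif_text, category):
    """Return the names of data blocks that mention the given category."""
    found = []
    current = None
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("data_"):
            current = stripped[5:]  # text after "data_" is the block name
        elif current is not None and stripped.startswith("_" + category + "."):
            if current not in found:
                found.append(current)
    return found
```

If chem_comp_bond shows up only in a block other than the first one, the templates are in the trailing-block layout that is ignored.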
comment:13 by , 9 years ago
Component: | Input/Output → mmCIF Reader/Writer |
---|
comment:14 by , 7 years ago
Component: | mmCIF Reader/Writer → Input/Output |
---|
reducing # of categories, so lumping into I/O category
Greg says this also prevents forming inter-residue bonds between amino acids.