Opened 8 years ago
Closed 7 years ago
#752 closed defect (fixed)
CIF/PDB I/O problems
Reported by: | Tristan Croll | Owned by: | Greg Couch |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | Cc: | pett | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
I'm looking at a fairly large structure (4ljz) which has zinc ions in two chains (D and J). If I save this from ChimeraX as a mmCIF file and re-open it, chains D and J appear to be treated as ligands in their entirety - they're shown initially as sticks with CPK colouring rather than the VDW/colour by chain of the rest of the structure, and are not included in ribbon calculation.
If I try to open this mmCIF in Coot, it seems to interpret it as an empty structure. Phenix handles it without trouble and returns a PDB file. Loading that into ChimeraX reveals another piece of weirdness: if the PDB contains LINK records (between the zinc ions and surrounding cysteines) then everything is fine. If the LINK records aren't present, then for some reason all breaks in the peptide backbone become ligated with long bonds.
I can provide files if you like, but they're ~10MB apiece.
Attachments (1)
Change History (12)
comment:1 by , 8 years ago
Component: | Unassigned → Input/Output |
---|---|
Owner: | changed from | to
Priority: | blocker → major |
comment:2 by , 8 years ago
I don't observe the problems reported with chains D and J of 4LJZ in the mmCIF file written by ChimeraX (today's build). The ChimeraX mmCIF does not show stick and CPK coloring for D and J in my test -- it shows the expected spheres with chain coloring. The mmCIF file I wrote is attached as cx_4ljz.cif, produced with the commands
open 4ljz
save ~/Desktop/cx_4ljz.cif
When displaying ribbons only, a problem with chains D and J is apparent -- the secondary structure (helices and sheets) is shown as strand (thin tube) even though the mmCIF has secondary structure table entries for chains D and J. Removing the atom_site table lines for the 3 ions (ZN and MG) of chain D results in correct ribbon display. The ChimeraX mmCIF does not include the ATOM / HETATM field (atom_site.group_PDB), but inserting that as the first table field with ions identified as HETATM did not fix the ribbons.
I'm not sure what is causing the ribbon to not show helix / sheet segments correctly. If the code has no way of knowing the 3 ions in chain D are not part of the polymer, this could certainly mess things up. This circumstance never happens in an mmCIF from the RCSB because ions are always placed in separate chains, not together with polymer chains. Our written mmCIF uses the author chain ids, which combine polymer and ions in the same chain. This may never have been tested in ChimeraX.
Another possible cause of the problem is that the ions mess up the amino acid sequence determination of chain D. The mmCIF written by ChimeraX does not include separate primary sequence tables.
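The situation described above (ions sharing an author chain id with polymer residues) can be checked for with a short script. The following is a minimal sketch only: the helper name, the input layout, and the list of ion residue names are assumptions made for illustration and are not taken from ChimeraX code.

```python
from collections import defaultdict

# Assumed, illustrative set of ion residue names; not ChimeraX's actual list.
ION_NAMES = {"ZN", "MG", "CA", "NA", "K", "FE"}


def chains_mixing_polymer_and_ions(atom_rows):
    """Return sorted chain ids that contain both ion and non-ion residues.

    atom_rows is an iterable of (chain_id, residue_name) pairs, for example
    values read from the auth_asym_id and label_comp_id columns of an
    atom_site table (hypothetical input layout).
    """
    kinds = defaultdict(set)
    for chain_id, res_name in atom_rows:
        kind = "ion" if res_name.upper() in ION_NAMES else "other"
        kinds[chain_id].add(kind)
    # A chain is "mixed" only if it holds both kinds of residue.
    return sorted(c for c, k in kinds.items() if k == {"ion", "other"})


# Toy rows loosely modelled on the report: ions inside chains D and J.
rows = [
    ("A", "ALA"), ("A", "GLY"),
    ("D", "CYS"), ("D", "ZN"), ("D", "MG"),
    ("J", "HIS"), ("J", "ZN"),
    ("X", "ZN"),  # ion alone in its own chain, as in RCSB files
]
mixed = chains_mixing_polymer_and_ions(rows)
print(mixed)
```

Chains flagged this way are the ones where a reader that assumes one polymer per chain might misclassify residues.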
comment:3 by , 8 years ago
Owner: | changed from | to
---|
Reassigning back to Eric. Although I wrote the current hack mmCIF writer, I think either Greg or Eric should be in charge of developing an mmCIF writer. Greg is the most sensible choice since he wrote the mmCIF reader, but he may have limited time. Eric is also very knowledgeable about the intricacies of PDB format which carry over to mmCIF (for instance, how mixing of polymers and hetatms in the same chain should be handled). I'll add a Chimera meeting agenda item to decide who is in charge of developing the mmCIF writer.
by , 8 years ago
Attachment: | cx_4ljz.cif added |
---|
comment:4 by , 8 years ago
You may want to have a look at this: https://gemmi.readthedocs.io/en/latest/about.html. It’s intended to become the standard library for CCP4, and includes a fully-compliant C++ mmCIF implementation.
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
follow-up: 4 comment:5 by , 8 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Reassigning to Greg, since I believe he got tasked with re-implementing the mmCIF writer.
comment:6 by , 8 years ago
Status: | assigned → feedback |
---|
New mmCIF writing code works much better. Would like to hear about any problems with Coot or Phenix. In particular, I'm curious whether the strand locations are understood or not, since the technically correct approach of putting the strands in the struct_conf table might be unexpected by other programs.
comment:7 by , 8 years ago
Is this in the builds yet? I downloaded today's daily build for Linux (see below), and if I do:
open 5u8q
save 5u8q.cif #1
open 5u8q.cif
... then I get the same issues as before - i.e. protein chains that contain ligands are treated as ligands in their entirety.
built: 2018-03-07 02:58:12 PST
committed: 2018-03-08 19:21:29 PST
size: 322,439,581 bytes
md5: 4c161e41f8c7f3ff4685d451519fbda6
follow-up: 7 comment:8 by , 8 years ago
None of the builds worked last night, so no, this isn't in a build yet...
--Eric
comment:9 by , 8 years ago
OK, looks good. Saving 5u8q.pdb as a .cif and reloading it works fine now. Opening the .cif file in Coot works without trouble. The only (quite minor in the scheme of things) issue is that if I pass the .cif back through phenix.cif_as_pdb, the SHEET records are lost (which doesn't happen with the "official" 5u8q.cif).
follow-up: 9 comment:10 by , 8 years ago
ChimeraX doesn't keep track of the sheet information, only where the strands are. So it was using the legacy part of the mmCIF specification to specify the strands in the same way as the helices in the struct_conf table. However, to see if it would help, I've modified how the strand information is written out to use the struct_sheet_range table instead, but with an "unknown" sheet identifier. Please let me know if that helps or hinders phenix.
-- Greg
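To illustrate what strand ranges in a struct_sheet_range table with an unknown sheet identifier might look like, here is a minimal sketch. It is not the ChimeraX writer: the function name and input tuples are hypothetical, and the use of the CIF unknown-value marker "?" for the sheet id is an assumption about what "unknown" means here. The item names are from the standard mmCIF dictionary.

```python
# Column names from the mmCIF struct_sheet_range category.
FIELDS = [
    "sheet_id", "id",
    "beg_label_comp_id", "beg_label_asym_id", "beg_label_seq_id",
    "end_label_comp_id", "end_label_asym_id", "end_label_seq_id",
]


def format_sheet_ranges(strands, sheet_id="?"):
    """Format strands as a struct_sheet_range loop (hypothetical helper).

    strands is a list of (chain, beg_comp, beg_seq, end_comp, end_seq).
    sheet_id defaults to "?", the CIF marker for an unknown value
    (an assumption; the actual writer may use something else).
    """
    lines = ["loop_"]
    lines += [f"_struct_sheet_range.{name}" for name in FIELDS]
    for i, (chain, bcomp, bseq, ecomp, eseq) in enumerate(strands, start=1):
        lines.append(
            f"{sheet_id} {i} {bcomp} {chain} {bseq} {ecomp} {chain} {eseq}"
        )
    lines.append("#")
    return "\n".join(lines)


# Two made-up strands in chain A, for illustration only.
text = format_sheet_ranges([("A", "VAL", 10, "ILE", 15),
                            ("A", "THR", 22, "LEU", 27)])
print(text)
```

Whether a downstream program accepts rows with an unknown sheet id is exactly the open question raised in this comment.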
comment:11 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | feedback → closed |
I'm going to consider this "fixed" for now. The sheets are still a problem that will need to be addressed.
The first thing to fix would be the mmCIF output, so assigning to Tom...