Opened 8 years ago

Closed 7 years ago

#752 closed defect (fixed)

CIF/PDB I/O problems

Reported by: Tristan Croll
Owned by: Greg Couch
Priority: major
Milestone:
Component: Input/Output
Version:
Keywords:
Cc: pett
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

I'm looking at a fairly large structure (4ljz) which has zinc ions in two chains (D and J). If I save this from ChimeraX as a mmCIF file and re-open it, chains D and J appear to be treated as ligands in their entirety - they're shown initially as sticks with CPK colouring rather than the VDW/colour by chain of the rest of the structure, and are not included in ribbon calculation.

If I try to open this mmCIF in Coot, it seems to interpret it as an empty structure. Phenix handles it without trouble and returns a PDB file. Loading that into ChimeraX reveals another piece of weirdness: if the PDB contains LINK records (between the zinc ions and surrounding cysteines) then everything is fine. If the LINK records aren't present, then for some reason all breaks in the peptide backbone become ligated with long bonds.

I can provide files if you like, but they're ~10MB apiece.

Attachments (1)

cx_4ljz.cif (3.3 MB) - added by Tom Goddard 8 years ago.

Change History (12)

comment:1 by pett, 8 years ago

Component: Unassigned → Input/Output
Owner: changed from pett to Tom Goddard
Priority: blocker → major

The first thing to fix would be the mmCIF output, so assigning to Tom...

comment:2 by Tom Goddard, 8 years ago

I don't observe the problems reported with chains D and J of 4LJZ in the mmCIF file written by ChimeraX (today's build). The ChimeraX mmCIF does not show stick and CPK coloring for D and J in my test -- it shows the expected spheres with chain coloring. My written mmCIF file, attached as cx_4ljz.cif, was produced with the commands

open 4ljz
save ~/Desktop/cx_4ljz.cif

When displaying only ribbons, a problem with chains D and J is apparent -- the secondary structure elements (helices and sheets) are shown as strand (thin tube) even though the mmCIF has secondary structure table entries for chains D and J. Removing the atom_site table lines for the 3 ions (ZN and MG) of chain D results in correct ribbon display. The ChimeraX mmCIF does not include the ATOM / HETATM field (atom_site.group_PDB), but inserting that as the first table field with ions identified as HETATM did not fix the ribbons.

I'm not sure what causes the ribbon to not show helix / sheet segments correctly. If the code has no way of knowing that the 3 ions in chain D are not part of the polymer, this could certainly mess things up. This circumstance never happens in an mmCIF from the RCSB because ions are always placed in separate chains, not together with polymer chains. Our written mmCIF uses the author chain ids, which combine polymer and ions in the same chain. This may never have been tested in ChimeraX.

Another possible cause of the problem is that the ions mess up the amino acid sequence determination of chain D. The mmCIF written by ChimeraX does not include separate primary sequence tables.
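
The mixed-chain situation described in this comment can be checked for with a short script. The following is only a minimal sketch, not ChimeraX code: the function name `mixed_chains` and its input format (pairs of chain id and component id, standing in for the atom_site columns) are assumptions made for illustration, and only the 20 standard amino acids are treated as polymer.

```python
from collections import defaultdict

# Standard amino acid component ids. Anything else is treated as non-polymer
# here, which is a simplification: nucleotides and modified residues are ignored.
STANDARD_AMINO_ACIDS = frozenset("""
ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE
LEU LYS MET PHE PRO SER THR TRP TYR VAL
""".split())


def mixed_chains(residues):
    """Return {chain_id: sorted non-polymer comp ids} for chains that hold
    both amino acid residues and other components (ions, ligands, water).

    `residues` is an iterable of (chain_id, comp_id) pairs, a hypothetical
    stand-in for the author chain id and component id columns of atom_site.
    """
    polymer_chains = set()
    hetero = defaultdict(set)
    for chain_id, comp_id in residues:
        comp_id = comp_id.upper()
        if comp_id in STANDARD_AMINO_ACIDS:
            polymer_chains.add(chain_id)
        else:
            hetero[chain_id].add(comp_id)
    return {c: sorted(ids) for c, ids in hetero.items() if c in polymer_chains}


# Toy data: chain D mixes protein residues with ZN and MG, as in the ticket.
example = [("D", "CYS"), ("D", "GLY"), ("D", "ZN"), ("D", "MG"),
           ("A", "ALA"), ("Z", "HOH")]
print(mixed_chains(example))
```

A chain reported by this check is one where a reader has to separate polymer from non-polymer residues itself, since the chain id alone no longer does so.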

comment:3 by Tom Goddard, 8 years ago

Owner: changed from Tom Goddard to pett

Reassigning back to Eric. Although I wrote the current hack mmCIF writer, I think either Greg or Eric should be in charge of developing an mmCIF writer. Greg is the most sensible choice since he wrote the mmCIF reader, but he may have limited time. Eric is also very knowledgeable about the intricacies of PDB format which carry over to mmCIF (for instance, how mixing of polymers and hetatms in the same chain should be handled). I'll add a Chimera meeting agenda item to decide who is in charge of developing the mmCIF writer.

by Tom Goddard, 8 years ago

Attachment: cx_4ljz.cif added

in reply to:  5 comment:4 by tic20@…, 8 years ago

You may want to have a look at this: https://gemmi.readthedocs.io/en/latest/about.html. It’s intended to become the standard library for CCP4, and includes a fully-compliant C++ mmCIF implementation.

Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY

comment:5 by pett, 8 years ago

Cc: pett added
Owner: changed from pett to Greg Couch

Reassigning to Greg, since I believe he got tasked with re-implementing the mmCIF writer.

comment:6 by Greg Couch, 8 years ago

Status: assigned → feedback

New mmCIF writing code works much better. Would like to hear about any problems with Coot or Phenix. In particular, I'm curious whether the strand locations are understood or not, since the technically correct use of putting the strands in the struct_conf table might be unexpected by other programs.

in reply to:  8 comment:7 by tic20@…, 8 years ago

Is this in the builds yet? I downloaded today's daily build for Linux 
(see below), and if I do:

open 5u8q
save 5u8q.cif #1
open 5u8q.cif

... then I get the same issues as before - i.e. protein chains that 
contain ligands are treated as ligands in their entirety.


built: 2018-03-07 02:58:12 PST
committed: 2018-03-08 19:21:29 PST
size: 322,439,581 bytes
md5: 4c161e41f8c7f3ff4685d451519fbda6

comment:8 by pett, 8 years ago

None of the builds worked last night, so no, this isn't in a build yet...

--Eric

in reply to:  10 comment:9 by tic20@…, 8 years ago

OK, looks good. Saving 5u8q.pdb as a .cif and reloading it works fine 
now. Opening the .cif file in Coot works without trouble. The only 
(quite minor in the scheme of things) issue is that if I pass the .cif 
back through phenix.cif_as_pdb, the SHEET records are lost (which 
doesn't happen with the "official" 5u8q.cif).
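
A quick way to confirm whether SHEET records survive such a round trip is to count them in the converted PDB file. This is a minimal sketch under stated assumptions: the function name is hypothetical, and it relies only on the fact that PDB record names occupy the first six columns of each line.

```python
from collections import Counter


def count_secondary_structure_records(pdb_text):
    """Count HELIX and SHEET records in PDB-format text.

    PDB record names occupy columns 1-6, so the first six characters of
    each line are compared after stripping the padding.
    """
    counts = Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("HELIX", "SHEET"):
            counts[record] += 1
    return counts


# Truncated toy lines for illustration only, not complete PDB records.
sample = "\n".join([
    "HELIX    1   1 ...",
    "HELIX    2   2 ...",
    "SHEET    1   A ...",
    "ATOM      1  N  ...",
])
counts = count_secondary_structure_records(sample)
print(counts["HELIX"], counts["SHEET"])
```

Running the same count on the output for the ChimeraX-written file and for the "official" file would show whether only the SHEET records are dropped.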

comment:10 by Greg Couch, 8 years ago

ChimeraX doesn't keep track of the sheet information, only where the strands are. So it was using the legacy part of the mmCIF specification to specify the strands in the same way as the helices in the struct_conf table. However, to see if it would help, I've modified how the strand information is written out to use the struct_sheet_range table instead, but with an "unknown" sheet identifier. Please let me know if that helps or hinders phenix.
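
For readers unfamiliar with the table, here is a rough sketch of what strands written as a struct_sheet_range loop could look like. It is not the ChimeraX writer: the function name and tuple layout are hypothetical, only a subset of the category's items is emitted, and using "?" (the mmCIF marker for an unknown value) as the sheet identifier is an assumption about what "unknown" means here.

```python
def sheet_range_loop(strands, sheet_id="?"):
    """Format strands as a _struct_sheet_range loop.

    `strands` is a list of (asym_id, beg_comp, beg_seq, end_comp, end_seq)
    tuples; each strand is assumed to lie within a single chain.
    `sheet_id` defaults to "?", the mmCIF unknown-value marker (an assumption,
    not necessarily what ChimeraX writes).
    """
    lines = [
        "loop_",
        "_struct_sheet_range.sheet_id",
        "_struct_sheet_range.id",
        "_struct_sheet_range.beg_label_comp_id",
        "_struct_sheet_range.beg_label_asym_id",
        "_struct_sheet_range.beg_label_seq_id",
        "_struct_sheet_range.end_label_comp_id",
        "_struct_sheet_range.end_label_asym_id",
        "_struct_sheet_range.end_label_seq_id",
    ]
    for n, (asym, beg_comp, beg_seq, end_comp, end_seq) in enumerate(strands, 1):
        lines.append(
            f"{sheet_id} {n} {beg_comp} {asym} {beg_seq} {end_comp} {asym} {end_seq}"
        )
    return "\n".join(lines) + "\n"


# Toy strands for illustration; residue names and numbers are made up.
text = sheet_range_loop([("D", "VAL", 10, "LEU", 15), ("D", "ILE", 22, "THR", 27)])
print(text)
```

Because every strand carries the same unknown sheet identifier, a reader that groups strands into sheets by sheet_id gets no grouping information from such a table, which matches the limitation described above.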

-- Greg

comment:11 by Greg Couch, 7 years ago

Resolution: fixed
Status: feedback → closed

I'm going to consider this "fixed" for now. The sheets are still a problem that will need to be addressed.
