Opened 5 years ago

Closed 3 years ago

#4479 closed enhancement (fixed)

Read small-molecule CIF files

Reported by: Tom Goddard Owned by: Greg Couch
Priority: low Milestone:
Component: Input/Output Version:
Keywords: Cc: Eric Pettersen, Elaine Meng, kristen.browne@…
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

ChimeraX currently cannot read small-molecule CIF files. For the attached file it gives the warning

Skipping atom_site category: Missing column 'label_asym_id' near line 191

Although these files are not a high priority, Chimera could read them and some users rely on that (e.g. Chimera ticket #18141).

Attachments (11)

2-compound.cif (201.5 KB ) - added by Tom Goddard 5 years ago.
fecl2_0 (1).cif (2.8 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
phosphorene_0.cif (1.3 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
ms4689_0.cif (23.5 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
4323526.cif (36.6 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
is2642.cif (15.7 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
M48L96.cif (154.2 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
10078.cif (22.4 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
fjw1901.cif (253.4 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
mo_12hp_kp1_0m_0.cif (566.1 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac
298447.cif (9.5 KB ) - added by kristen.browne@… 3 years ago.
Added by email2trac


Change History (29)

by Tom Goddard, 5 years ago

Attachment: 2-compound.cif added

comment:1 by Eric Pettersen, 3 years ago

Cc: kristen.browne@… added

Also used in NIH pipeline (see #8025 for example), so I don't know if priority should stay "low" or not.

comment:2 by Greg Couch, 3 years ago

Added small molecule CIF support to daily build.

Open with:

  • "open 9253.cif format corecif"
  • "open 1000001 from cod" to open from the Crystallography Open Database
  • "open 1000001 from pcod" to open from the Predicted Crystallography Open Database

Still to do:

  • figure out why "open 9253.cif format smallcif" doesn't work
  • add symmetry support, e.g., COD 2102215
  • when opening file as mmCIF fails, try coreCIF
  • would like "open formats" to list cod database with format corecif not cod
  • COD instead of cod for database?
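The "when opening file as mmCIF fails, try coreCIF" item above could look roughly like the following. This is only a sketch of the fallback idea: `ParseError`, `read_mmcif`, `read_corecif` and `open_with_fallback` are hypothetical stand-ins invented for illustration, not ChimeraX functions, and the substring checks are placeholders for real parsing.

```python
class ParseError(Exception):
    """Raised by a reader that cannot interpret the text."""


def read_mmcif(text):
    # Hypothetical stand-in for an mmCIF reader: expects v2-style names.
    if "_atom_site." not in text:
        raise ParseError("no atom_site category found")
    return "model read as mmCIF"


def read_corecif(text):
    # Hypothetical stand-in for a coreCIF reader: expects v1-style names.
    if "_atom_site_" not in text:
        raise ParseError("no atom_site data names found")
    return "model read as coreCIF"


def open_with_fallback(text, readers):
    """Try each (format_name, reader) pair in order; return the first success."""
    failures = []
    for format_name, reader in readers:
        try:
            return format_name, reader(text)
        except ParseError as exc:
            failures.append(f"{format_name}: {exc}")
    raise ParseError("all readers failed: " + "; ".join(failures))


READERS = [("mmcif", read_mmcif), ("corecif", read_corecif)]
small_molecule_text = "loop_\n_atom_site_label\n_atom_site_fract_x\nC1 0.25\n"
format_name, model = open_with_fallback(small_molecule_text, READERS)
print(format_name, model)
```

Keeping the readers in an ordered list means the mmCIF reader is still tried first, so existing behavior for mmCIF files would not change under this scheme.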

comment:3 by Eric Pettersen, 3 years ago

smallcif not working seems like my bug. Let me look into it. The Clustal ALN sequence format has the same problem -- the second nickname doesn't work.

comment:5 by Greg Couch, 3 years ago

Also need to handle alternate locations. For example, COD 4128365, which is the largest entry in the COD at 60M, has alternate locations.

comment:6 by Tom Goddard, 3 years ago

On Nov 15, 2022, at 8:56 PM, Greg Couch wrote:

Hi Tom,

Are you aware of Spglib?  "Spglib is a library for finding and handling crystal symmetries written in C."  And there is a Python interface.  https://spglib.github.io/spglib/index.html, https://pypi.org/project/spglib/.  Looks interesting.

Once I understand it a bit more, I'd like to talk to you about symmetry in small molecule CIF files.  Not sure if the above library will help or not at this point.

   -- Greg
From: Tom Goddard 
Subject: Re: spglib
Date: November 16, 2022 at 10:28:40 AM PST
To: Greg Couch
Cc: Eric Pettersen 

Hi Greg,

 I didn't know of that space group library.  We have Python code that figures out the symmetries and skewed coordinates.  But the small molecule crystals are a nightmare for other reasons.  I tried back in Chimera days to handle them better but gave up.  Here are two of the devilish issues.  The asymmetric unit can contain a fraction of an atom: 1/2, 1/3, 1/4 or 1/6.  When you apply symmetry you get two or more copies of the atom right on top of each other.  Usually this is apparent in the CIF file because the occupancy of the atom is 0.5 or 0.333 or 0.25.  Of course some filter could look for atoms on top of each other and remove them.  But then the real killer is that the bonds in the crystal are not specified in the file.  I think our bond finding code does a very bad job on most small molecule crystals, and finding the correct bonds is hard.  The bonds may even be from one asymmetric unit to another.

 I think handling small molecule crystals well is a lot of work.

	Tom
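The filter mentioned above ("look for atoms on top of each other and remove them") might be sketched as follows. This is an illustration only, not ChimeraX code: `remove_coincident_atoms`, `mirror_x`, the tolerance value and the toy coordinates are all assumptions made for the example, and a mirror plane at x = 0 stands in for a real space-group operation.

```python
import math


def remove_coincident_atoms(atoms, tolerance=0.01):
    """Keep the first atom at each position; drop later atoms of the same
    element that lie within `tolerance` (same units as the coordinates)."""
    kept = []
    for element, xyz in atoms:
        overlaps = any(
            element == kept_element and math.dist(xyz, kept_xyz) <= tolerance
            for kept_element, kept_xyz in kept
        )
        if not overlaps:
            kept.append((element, xyz))
    return kept


def mirror_x(xyz):
    # Toy symmetry operation: reflection through the plane x = 0.
    x, y, z = xyz
    return (-x, y, z)


# Fe sits on the mirror plane (a "half atom"); Cl does not.
asymmetric_unit = [("Fe", (0.0, 1.0, 2.0)), ("Cl", (1.2, 1.0, 2.0))]
expanded = asymmetric_unit + [(el, mirror_x(xyz)) for el, xyz in asymmetric_unit]
unique_atoms = remove_coincident_atoms(expanded)
print(len(expanded), "atoms after symmetry,", len(unique_atoms), "after filtering")
```

The pairwise comparison is quadratic in the number of atoms, so a large crystal would need a spatial grid or tree instead. The sketch also does nothing for the harder problem of the missing bonds.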

in reply to:  8 comment:7 by Elaine Meng, 3 years ago

I don't see any of this in the daily build: UCSF ChimeraX version: 1.6.dev202211150145 (2022-11-15).  Do I need to wait for the Nov 16 build?  It's a little confusing because the download page shows the date Nov 14 but the version information in what I just downloaded gives Nov 15.

command "open formats" does not list (small) CIF format nor the COD or PCOD databases

command "open 1000001 from cod" gives error message no such database


in reply to:  9 ; comment:8 by Greg Couch, 3 years ago

Evidently there is a coding error that the Windows compiler didn't catch.  So try again tomorrow.

     -- Greg


comment:9 by Greg Couch, 3 years ago

Ignoring alternate atom locations for now. It is too hard to figure out from the names which atoms correspond to which other atoms. It would probably take a graph traversal to get it right.

Also decided not to support symmetry for now.

comment:10 by Greg Couch, 3 years ago

Well, added alternate atom support that depends on the atoms in a disorder group being in the same order as in the first group of an assembly. This will not work all of the time, which is why we use name-based alternate atoms in mmCIF files. But the correlation of names in a CIF file appears to be ad hoc. Tested with COD 4128365, which has disjoint disorder assembly J groups.
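A rough sketch of the order-based matching described above, under the assumption that each atom carries the core CIF `_atom_site_disorder_assembly` and `_atom_site_disorder_group` values. The function name `pair_by_order` and the atom labels are made up for illustration, and this is not the ChimeraX implementation.

```python
from collections import defaultdict


def pair_by_order(atoms):
    """atoms: (label, assembly, group) tuples for disordered atoms, in file order.
    Returns tuples of labels treated as alternate locations of one site,
    matched purely by position within each disorder group."""
    by_assembly = defaultdict(lambda: defaultdict(list))
    for label, assembly, group in atoms:
        by_assembly[assembly][group].append(label)

    sites = []
    for assembly, by_group in by_assembly.items():
        group_lists = list(by_group.values())
        first = group_lists[0]
        if any(len(labels) != len(first) for labels in group_lists):
            raise ValueError(f"assembly {assembly!r}: groups differ in size")
        # The n-th atom of every group is assumed to be the same site.
        sites.extend(zip(*group_lists))
    return sites


ordered = [("C1A", "A", "1"), ("C2A", "A", "1"),
           ("C1B", "A", "2"), ("C2B", "A", "2")]
print(pair_by_order(ordered))

# Same atoms, but the second group is listed in a different order:
shuffled = [("C1A", "A", "1"), ("C2A", "A", "1"),
            ("C2B", "A", "2"), ("C1B", "A", "2")]
print(pair_by_order(shuffled))
```

The second call pairs C1A with C2B, which shows why this approach cannot work all of the time: nothing in the data checks that the file order is consistent between groups.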

Interesting bug, "sel /A:1" fails, while "sel /A" works. This breaks the alternate atom GUI.

comment:11 by Greg Couch, 3 years ago

The interesting bug was due to the residue insertion code being set to the null byte instead of a blank character in the C++ layer, so there was a (hidden) null byte in the name. Changed the C++ layer to default to a blank character.
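The effect of a hidden null byte can be reproduced in a few lines of Python. This only illustrates the failure mode and is not the C++ code in question; `residue_spec` is a hypothetical helper that builds a specifier string from a chain, a residue number and an insertion code.

```python
def residue_spec(chain, number, insertion_code):
    """Build a specifier such as '/A:1', dropping a blank insertion code."""
    # str.strip() removes whitespace only, so a NUL character survives it.
    return f"/{chain}:{number}{insertion_code.strip()}"


blank = residue_spec("A", 1, " ")
nul = residue_spec("A", 1, "\0")

print(repr(blank))      # the specifier the user typed
print(repr(nul))        # looks identical when printed without repr()
print(blank == nul)     # the comparison fails because of the hidden byte
```

Matching on the chain alone never looks at the insertion code, which is consistent with "sel /A" working while "sel /A:1" failed.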

comment:12 by Greg Couch, 3 years ago

Next bug, figure out why when 2-compound.cif is saved as an mmCIF file, it restores without the metal coordination bonds.

Last edited 3 years ago by Greg Couch

in reply to:  14 comment:13 by kristen.browne@…, 3 years ago

While it's not expected that all of these would be fixed, we've noticed most of our small molecule cifs are running great now!

I've attached those few that are still failing here for discussion in tomorrow's meeting so we know which to fix on our own.

Thanks!

K

-----Original Message-----
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu> 
Sent: Tuesday, November 29, 2022 10:34 PM
Cc: goddard@cgl.ucsf.edu; gregc@cgl.ucsf.edu; Browne, Kristen (NIH/NIAID) [C] <kristen.browne@nih.gov>; meng@cgl.ucsf.edu; pett@cgl.ucsf.edu
Subject: [EXTERNAL] Re: [ChimeraX] #4479: Read small-molecule CIF files


Comment (by Greg Couch):

 Next bug, figure out why when 2-compound.cif is saved in a session, it  restores without the metal coordination bonds.


fecl2_0 (1).cif

phosphorene_0.cif

ms4689_0.cif

4323526.cif

is2642.cif

M48L96.cif

10078.cif

fjw1901.cif

mo_12hp_kp1_0m_0.cif

298447.cif

by kristen.browne@…, 3 years ago

Attachment: fecl2_0 (1).cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: phosphorene_0.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: ms4689_0.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: 4323526.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: is2642.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: M48L96.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: 10078.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: fjw1901.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: mo_12hp_kp1_0m_0.cif added

Added by email2trac

by kristen.browne@…, 3 years ago

Attachment: 298447.cif added

Added by email2trac

comment:14 by Tom Goddard, 3 years ago

I get a C++ exception trying to open the attached 298447.cif as mmCIF on Mac ARM with the December 6 daily build.

open /Users/goddard/Downloads/298447.cif

mmCIF parsing error: unknown error (NSt3__117bad_function_callE): std::exception

But opening it with the small-molecule CIF format specified works correctly in the same daily build:

open /Users/goddard/Downloads/298447.cif format smallcif

298447.cif title:
C8 H16

As far as I know ChimeraX currently does not automatically detect the small molecule cif format, so adding the "format smallcif" option is necessary.

comment:15 by Greg Couch, 3 years ago

These are caused by a bug in recognizing CIF categories that I'm working on. It only affects DDL v1 CIF files, that is, ones that don't use a period to separate the category from the rest of the data name (v2 style). For example, atom_site_fract_x (v1) vs. atom_site.fract_x (v2).
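To make the v1/v2 distinction concrete, here is a minimal sketch of splitting a data name into category and item. It assumes a v1 name can only be split by comparing it against a list of known category names; `split_data_name` and the short `KNOWN_CATEGORIES` tuple are hypothetical and this is not the ChimeraX parser.

```python
KNOWN_CATEGORIES = ("atom_site", "atom_type", "cell")  # illustrative subset


def split_data_name(name, known_categories=KNOWN_CATEGORIES):
    """Return (category, item) for a v1- or v2-style CIF data name.
    Returns (None, name) when no category can be recognized."""
    name = name.lstrip("_")
    if "." in name:
        # v2 style: the period marks the boundary explicitly.
        category, item = name.split(".", 1)
        return category, item
    # v1 style: underscores are ambiguous, so match known prefixes,
    # longest first, to avoid splitting on a shorter look-alike.
    for category in sorted(known_categories, key=len, reverse=True):
        if name.startswith(category + "_"):
            return category, name[len(category) + 1:]
    return None, name


print(split_data_name("atom_site.fract_x"))   # v2
print(split_data_name("atom_site_fract_x"))   # v1
print(split_data_name("unknown_thing_x"))     # unrecognized category
```

Both forms yield the same pair for the example in the comment above, which is the behavior a reader needs if one code path is to handle files written in either style.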

comment:16 by Greg Couch, 3 years ago

All fixed except for 4323526.cif, which comes up empty unless "format corecif" is used when opening. Tracking it down.

comment:17 by Greg Couch, 3 years ago

Fixed detecting 4323526.cif as a small CIF file.

comment:18 by Greg Couch, 3 years ago

Resolution: fixed
Status: assigned → closed

Any further work on small molecule CIF files should be on a new ticket.
