Opened 5 years ago
Closed 3 years ago
#4479 closed enhancement (fixed)
Read small-molecule CIF files
| Reported by: | Tom Goddard | Owned by: | Greg Couch |
|---|---|---|---|
| Priority: | low | Milestone: | |
| Component: | Input/Output | Version: | |
| Keywords: | | Cc: | Eric Pettersen, Elaine Meng, kristen.browne@… |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
ChimeraX currently cannot read small-molecule CIF files. For the attached file it gives the warning
Skipping atom_site category: Missing column 'label_asym_id' near line 191
Although these are not a high priority, they were readable in Chimera and there are some users who use this (e.g. Chimera ticket #18141).
Attachments (11)
Change History (29)
by , 5 years ago
| Attachment: | 2-compound.cif added |
|---|
comment:1 by , 3 years ago
| Cc: | added |
|---|
comment:2 by , 3 years ago
Added small molecule CIF support to daily build.
Open with:
- "open 9253.cif format corecif"
- "open 1000001 from cod" to open from the Crystallography Open Database
- "open 1000001 from pcod" to open from the Predicted Crystallography Open Database
Still to do:
- figure out why "open 9253.cif format smallcif" doesn't work
- add symmetry support, e.g., COD 2102215
- when opening file as mmCIF fails, try coreCIF
- would like "open formats" to list cod database with format corecif not cod
- COD instead of cod for database?
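The "when opening file as mmCIF fails, try coreCIF" item above could be sketched roughly as follows. This is only an illustration of the fallback idea, not ChimeraX code: `open_cif` and the two `fake_*` readers are hypothetical stand-ins, and a real reader would return a structure rather than a list of atom names.

```python
def open_cif(path, readers):
    """Try each (format_name, reader) pair in order.

    Returns (format_name, atoms) for the first reader that yields atoms.
    A reader that raises, or that returns nothing, counts as a failure.
    """
    errors = []
    for name, reader in readers:
        try:
            atoms = reader(path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if atoms:
            return name, atoms
        errors.append(f"{name}: no atoms read")
    raise ValueError("; ".join(errors))


# Hypothetical readers, for illustration only.
def fake_mmcif_reader(path):
    return []  # mimics a file that comes up empty when read as mmCIF


def fake_corecif_reader(path):
    return ["C1", "C2"]


fmt, atoms = open_cif(
    "example.cif",
    [("mmcif", fake_mmcif_reader), ("corecif", fake_corecif_reader)],
)
print(fmt, atoms)
```

Treating an empty result as a failure matters here, since a mis-detected file may parse without error and simply produce no atoms.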
comment:3 by , 3 years ago
smallcif not working seems like my bug. Let me look into it. The Clustal ALN sequence format has the same problem -- the second nickname doesn't work.
comment:4 by , 3 years ago
Okay, the "smallcif" thing is fixed.
Fix: https://github.com/RBVI/ChimeraX/commit/86e371ef832cbaa673bb21f49b0a76d06f88bd51
comment:5 by , 3 years ago
Also need to handle alternate locations. For example, COD 4128365, which is the largest entry in the COD at 60M, has alternate locations.
comment:6 by , 3 years ago
On Nov 15, 2022, at 8:56 PM, Greg Couch wrote:
Hi Tom,
Are you aware of Spglib? "Spglib is a library for finding and handling crystal symmetries written in C." And there is a Python interface. https://spglib.github.io/spglib/index.html, https://pypi.org/project/spglib/. Looks interesting. Once I understand it a bit more, I'd like to talk to you about symmetry in small molecule CIF files. Not sure if the above library will help or not at this point.
-- Greg
From: Tom Goddard
Subject: Re: spglib
Date: November 16, 2022 at 10:28:40 AM PST
To: Greg Couch
Cc: Eric Pettersen
Hi Greg,
I didn't know of that space group library. We have Python code that figures out the symmetries and skewed coordinates. But the small molecule crystals are a nightmare for other reasons. I tried back in Chimera days to handle them better but gave up. Here are two of the devilish issues.
The asymmetric unit can have half an atom or 1/3 or 1/4 or 1/6. When you apply symmetry you get two copies of the atom right on top of each other. Usually this is apparent in the cif file because the occupancy of the atom is 0.5 or 0.333 or 0.25. Of course some filter could look for atoms on top of each other and remove them.
But then the real killer is the bonds in the crystal are not specified in the file. I think our bond finding code does a very bad job on most small molecule crystals and finding the correct bonds is hard. The bonds may even be from one asymmetric unit to another.
I think handling small molecule crystals well is a lot of work.
Tom
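For reference, the "filter for atoms on top of each other" idea could look something like the sketch below. It is an assumption-laden illustration, not existing ChimeraX or Chimera code: symmetry operations are plain Python callables on fractional coordinates, the tolerance is in fractional units rather than ångströms, and the example atoms and the twofold axis are made up.

```python
def wrap(frac):
    """Map fractional coordinates into the [0, 1) unit cell."""
    return tuple(x % 1.0 for x in frac)


def same_site(a, b, tol=1e-3):
    """True if two fractional positions coincide, allowing for cell wraparound."""
    return all(min(abs(x - y), 1.0 - abs(x - y)) < tol for x, y in zip(a, b))


def expand(atoms, sym_ops, tol=1e-3):
    """Apply every symmetry operation to every atom.

    A copy is dropped when it lands on an already generated atom of the
    same element, which is what happens for atoms on special positions.
    """
    out = []
    for element, frac in atoms:
        for op in sym_ops:
            pos = wrap(op(frac))
            if not any(e == element and same_site(pos, p, tol) for e, p in out):
                out.append((element, pos))
    return out


# Hypothetical example: identity plus a twofold axis along z through the origin.
identity = lambda p: p
twofold_z = lambda p: (-p[0], -p[1], p[2])

atoms = [
    ("Zn", (0.0, 0.0, 0.25)),  # sits on the axis, so would have occupancy 0.5
    ("O", (0.1, 0.2, 0.3)),    # general position
]
expanded = expand(atoms, [identity, twofold_z])
print(expanded)
```

The pairwise comparison is quadratic in the number of atoms, so a large entry would need a spatial grid or similar. This also does nothing for the harder problem of finding bonds.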
comment:7 by , 3 years ago
I don't see any of this in the daily build: UCSF ChimeraX version: 1.6.dev202211150145 (2022-11-15). Do I need to wait for the Nov 16 build? It's a little confusing because the download page shows the date Nov 14 but the version information in what I just downloaded gives Nov 15.
- command "open formats" does not list the (small) CIF format nor the COD or PCOD databases
- command "open 1000001 from cod" gives the error message "no such database"
follow-up: 7 comment:8 by , 3 years ago
Evidently there is a coding error that the Windows compiler didn't catch. So try again tomorrow.
-- Greg
On 11/16/22 11:12, Elaine Meng wrote:
follow-up: 8 comment:9 by , 3 years ago
Ignoring alternate atom locations for now. It is too hard to figure out which atoms correspond to which other atoms from the name. Would probably have to do a graph traversal to get it right.
Also decided not to support symmetry for now.
comment:10 by , 3 years ago
Well, added alternate atom support that depends on the atoms in a disorder group being in the same order as in the first group of an assembly. This will not work all of the time, which is why we use name-based alternate atoms in mmCIF files. But the correlation of names in a CIF file appears to be ad hoc. Tested with COD 4128365, which has disjoint disorder assembly J groups.
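To make the order-based matching concrete, here is a minimal sketch of the idea. It is not the code that went into ChimeraX; the function name, the tuple layout, and the atom labels are all made up for illustration. The assembly and group values stand for the core CIF `_atom_site_disorder_assembly` and `_atom_site_disorder_group` items.

```python
from collections import defaultdict


def pair_disorder_groups(atoms):
    """Pair alternate atoms by their position within each disorder group.

    atoms: list of (label, assembly, group) tuples in file order.
    Returns {assembly: [tuple of labels assumed to be alternates, ...]}.
    Assemblies whose groups differ in size are skipped, because matching
    by order is not meaningful for them.
    """
    by_group = defaultdict(lambda: defaultdict(list))
    for label, assembly, group in atoms:
        by_group[assembly][group].append(label)

    result = {}
    for assembly, groups in by_group.items():
        ordered = [groups[g] for g in sorted(groups)]
        if any(len(g) != len(ordered[0]) for g in ordered):
            continue
        result[assembly] = list(zip(*ordered))
    return result


# Hypothetical atoms: assembly A matches up, assembly B does not.
atoms = [
    ("C1A", "A", "1"), ("C2A", "A", "1"),
    ("C1B", "A", "2"), ("C2B", "A", "2"),
    ("O1", "B", "1"),
    ("O1'", "B", "2"), ("O2'", "B", "2"),
]
pairs = pair_disorder_groups(atoms)
print(pairs)
```

If the file lists the second group in a different order from the first, this silently pairs the wrong atoms, which is the limitation noted above.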
Interesting bug, "sel /A:1" fails, while "sel /A" works. This breaks the alternate atom GUI.
comment:11 by , 3 years ago
The interesting bug was due to the residue insertion code being set to the null byte instead of a blank character in the C++ layer, so there was a (hidden) null byte in the name. Changed the C++ layer to default to a blank character.
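A small Python illustration of why a hidden null byte breaks matching while looking fine when printed. This only mimics the effect; the actual fix was in the C++ layer, and both helper functions here are hypothetical.

```python
def normalize_insertion_code(code):
    """Treat an empty or NUL insertion code as a blank."""
    return " " if code in ("", "\0") else code


def residue_spec(number, insertion_code):
    """Build the text a residue would be matched against, e.g. '1' or '1A'."""
    # rstrip() removes the trailing blank but not a NUL character.
    return f"{number}{insertion_code}".rstrip()


broken = residue_spec(1, "\0")
fixed = residue_spec(1, normalize_insertion_code("\0"))

print(repr(broken), broken == "1")
print(repr(fixed), fixed == "1")
```

The NUL is not whitespace, so it survives stripping, and the comparison against "1" fails even though the two strings usually display identically.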
comment:12 by , 3 years ago
Next bug: figure out why, when 2-compound.cif is saved as an mmCIF file, it restores without the metal coordination bonds.
comment:13 by , 3 years ago
While it's not expected that all of these would be fixed, we've noticed most of our small molecule cifs are running great now!
I've attached those few that are still failing here for discussion in tomorrow's meeting so we know which to fix on our own.
Thanks!
K
-----Original Message-----
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: Tuesday, November 29, 2022 10:34 PM
Cc: goddard@cgl.ucsf.edu; gregc@cgl.ucsf.edu; Browne, Kristen (NIH/NIAID) [C] <kristen.browne@nih.gov>; meng@cgl.ucsf.edu; pett@cgl.ucsf.edu
Subject: [EXTERNAL] Re: [ChimeraX] #4479: Read small-molecule CIF files
Comment (by Greg Couch):
Next bug, figure out why when 2-compound.cif is saved in a session, it restores without the metal coordination bonds.
follow-up: 13 comment:14 by , 3 years ago
I get a C++ exception trying to open the attached 298447.cif as mmCIF on Mac ARM with the December 6 daily build.
open /Users/goddard/Downloads/298447.cif
mmCIF parsing error: unknown error (NSt3__117bad_function_callE): std::exception
But opening it with the small-molecule CIF format specified works correctly in the same daily build:
open /Users/goddard/Downloads/298447.cif format smallcif
298447.cif title: C8 H16
As far as I know, ChimeraX currently does not automatically detect the small-molecule CIF format, so adding the "format smallcif" option is necessary.
comment:15 by , 3 years ago
These are caused by a bug in recognizing CIF categories that I'm working on. It only affects DDL v1 CIF files, that is, ones that don't use a period to separate the category from the rest of the data name (v2 style). For example, atom_site_fract_x (v1) vs. atom_site.fract_x (v2).
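A rough sketch of the difference, to show why v1 names are harder: with no period, the category can only be found by matching against known category names, longest first. This is not the ChimeraX parser; `split_data_name` is a hypothetical helper and `KNOWN_CATEGORIES` is a small illustrative subset of core CIF categories.

```python
# Illustrative subset only; a real reader would take these from a dictionary.
KNOWN_CATEGORIES = ("atom_site", "cell", "geom", "geom_bond")


def split_data_name(name, categories=KNOWN_CATEGORIES):
    """Split a CIF data name into (category, item).

    DDL2-style names split at the first period. DDL1-style names are
    matched against known categories, trying the longest category first
    so that 'geom_bond' wins over 'geom'. Returns (None, name) if no
    category matches.
    """
    bare = name.lstrip("_").lower()  # data names are case-insensitive
    if "." in bare:
        category, item = bare.split(".", 1)
        return category, item
    for category in sorted(categories, key=len, reverse=True):
        if bare.startswith(category + "_"):
            return category, bare[len(category) + 1:]
    return None, bare


print(split_data_name("_atom_site.fract_x"))
print(split_data_name("_atom_site_fract_x"))
print(split_data_name("_geom_bond_distance"))
```

Both spellings of the fractional x coordinate end up in the same category, which is what a reader needs in order to treat v1 and v2 files alike.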
comment:16 by , 3 years ago
All fixed except for 4323526.cif, which comes up empty unless "format corecif" is used when opening. Tracking it down.
comment:18 by , 3 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Any further work on small molecule CIF files should be on a new ticket.
This is also used in the NIH pipeline (see #8025 for example), so I don't know if the priority should stay "low" or not.