#2184 closed defect (fixed)
Problematic mmCIF file
Reported by: | Tristan Croll | Owned by: | Greg Couch |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
Will attach the offending .cif file (it's an unpublished and likely high-profile model from a colleague who's given me permission to share it but asks that it remain confidential). Attempting to load it on ChimeraX start (i.e. with ChimeraX RCLH1LH2.cif) causes a segmentation fault with the traceback below. Loading it after ChimeraX starts doesn't crash, but a whole lot of residues are simply missing. The log reports:
skipping chem_comp category: Missing column 'type' near line 108
Unknown polymer entity '?' near line 160
Skipping residue with duplicate label_seq_id 3 in chain AQ
Skipping residue with duplicate label_seq_id 5 in chain AQ
Skipping residue with duplicate label_seq_id 7 in chain AQ
Skipping residue with duplicate label_seq_id 9 in chain AQ
Skipping residue with duplicate label_seq_id 11 in chain AQ
644 messages similar to the above omitted
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
skipping chem_comp category: Missing column 'type' near line 45377
skipping chem_comp category: Missing column 'type' near line 45404
skipping chem_comp category: Missing column 'type' near line 46309
skipping chem_comp category: Missing column 'type' near line 47201
skipping chem_comp category: Missing column 'type' near line 47228
Traceback:
#0 0x00007fffe2a649d0 in std::_Hash_bytes(void const*, unsigned long, unsigned long) () from /lib64/libstdc++.so.6
#1 0x00007fff3e4d456b in std::_Hash_impl::hash (__seed=3339675911, __clength=<optimized out>, __ptr=<optimized out>) at /usr/include/c++/4.9/bits/functional_hash.h:131
#2 std::hash<std::string>::operator() (this=<optimized out>, __s= "01;\005\000\000\000\000\000\240\063\a\000\000\000\000\220\031\371\004\000\000\000\000@1;\005\000\000\000\000ph\370\003", '\000' <repeats 12 times>, "\300\000\000\000\000\000\000\000A\000\000\000\000\000\000\000p\255\001\005\000\000\000\000\020\004F\005", '\000' <repeats 20 times>, "`\253%\006\000\000\000\000}\f\000\000\000\000\000\000@\000\000\000\000\000\000\000Q\000\000\000\000\000\000\000\200[3\005\000\000\000\000\262\000\000\000\000\000\000\000\263\000\000\000\000\000\000\000CD\000\000\000\000\000\000\350\346\065\005\000\000\000\000\270\344\065\005\000\000\000\000 \000\000\000\000\000\000\000\200\353\065\005\000\000\000\000V%\306\060\271\332\327\301"...) at /usr/include/c++/4.9/bits/basic_string.h:3084
#3 mmcif::ExtractMolecule::hash_ResidueKey::operator() (this=<optimized out>, k=...) at mmcif_cpp/mmcif.cpp:332
#4 std::__detail::_Hash_code_base<mmcif::ExtractMolecule::ResidueKey, std::pair<mmcif::ExtractMolecule::ResidueKey const, atomstruct::Residue*>, std::__detail::_Select1st, mmcif::ExtractMolecule::hash_ResidueKey, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_hash_code (this=<optimized out>, __k=...) at /usr/include/c++/4.9/bits/hashtable_policy.h:1261
#5 std::_Hashtable<mmcif::ExtractMolecule::ResidueKey, std::pair<mmcif::ExtractMolecule::ResidueKey const, atomstruct::Residue*>, std::allocator<std::pair<mmcif::ExtractMolecule::ResidueKey const, atomstruct::Residue*> >, std::__detail::_Select1st, std::equal_to<mmcif::ExtractMolecule::ResidueKey>, mmcif::ExtractMolecule::hash_ResidueKey, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find (__k=..., this=<optimized out>) at /usr/include/c++/4.9/bits/hashtable.h:1302
#6 std::unordered_map<mmcif::ExtractMolecule::ResidueKey, atomstruct::Residue*, mmcif::ExtractMolecule::hash_ResidueKey, std::equal_to<mmcif::ExtractMolecule::ResidueKey>, std::allocator<std::pair<mmcif::ExtractMolecule::ResidueKey const, atomstruct::Residue*> > >::find (__x=..., this=0x523b3d0) at /usr/include/c++/4.9/bits/unordered_map.h:574
#7 mmcif::ExtractMolecule::finished_parse (this=0x7fffffffb040) at mmcif_cpp/mmcif.cpp:813
#8 0x00007fff3e4f1d1c in readcif::CIFFile::internal_parse(bool) ()
Attachments (1)
Change History (9)
by , 6 years ago
Attachment: | RCLH1LH2.cif added |
---|
follow-up: 1 comment:2 by , 6 years ago
Status: | assigned → accepted |
---|
comment:3 by , 6 years ago
The segmentation fault should be fixed. Valgrind on Linux does not find any errors.
comment:4 by , 6 years ago
This is an awful mmCIF file. The minimal change to make ChimeraX work better would be to give each chain a separate entity_id. But ChimeraX would be much, much happier with the mmCIF files that Phenix generates for submission to the PDB. The entity_poly_seq table is especially useful.
FYI, the PDB provides tools for checking if mmCIF files are valid at https://sw-tools.rcsb.org/apps/MMCIF-DICT-SUITE/index.html. I took the liberty of checking the given mmCIF against the mmcif_std.dic from 22 March 2018 and there are 44775 errors -- mostly the bogus atom_site.label_entity_id value.
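For a quick look at how widespread the missing label_entity_id values are before running the full dictionary check, something like the following rough Python sketch can help. It is not the PDB validation suite and not ChimeraX code: the helper names atom_site_rows and count_missing_entity_ids are made up for illustration, and it assumes one atom_site row per line with no whitespace inside any value.

```python
from collections import Counter


def atom_site_rows(lines):
    """Yield each atom_site row as a dict of column name -> value.

    Assumption: one row per line and no whitespace inside values.
    """
    columns = []
    reading_rows = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("_atom_site."):
            if reading_rows:
                # A new atom_site loop starts; forget the old header.
                columns, reading_rows = [], False
            columns.append(line.split()[0].split(".", 1)[1])
        elif not columns:
            continue
        elif not line or line.startswith(("_", "#", "loop_", "data_")):
            # End of the atom_site table.
            columns, reading_rows = [], False
        else:
            reading_rows = True
            values = line.split()
            if len(values) == len(columns):
                yield dict(zip(columns, values))


def count_missing_entity_ids(lines):
    """Count atoms per label_asym_id whose label_entity_id is '?' or '.'."""
    counts = Counter()
    for row in atom_site_rows(lines):
        if row.get("label_entity_id", "?") in ("?", "."):
            counts[row.get("label_asym_id", "?")] += 1
    return counts


example = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
ATOM 1 N ALA A 1 1
ATOM 2 CA ALA A 1 1
ATOM 3 N GLY B ? 1
#
""".splitlines()

print(dict(count_missing_entity_ids(example)))  # {'B': 1}
```

On a real file, pass the open file object instead of the example lines. A proper CIF parser is the better choice for anything beyond a quick check.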
comment:5 by , 6 years ago
Wow - that’s interesting. I’ll raise this with the PHENIX team. I did try passing it through their deposition tool, but that didn’t fix the problem.
follow-up: 5 comment:6 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Improved the code that deals with missing entities. It now gives a warning when it needs to guess, and uses label_asym_id as the entity_id for uniqueness (it was using auth_asym_id, which is not necessarily unique). Now all 44696 atoms are displayed.
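To illustrate why label_asym_id is the safer fallback, here is a small Python sketch. The actual fix is in the C++ reader (mmcif_cpp/mmcif.cpp), so this is only an illustration of the idea: entity_key and group_chains are hypothetical helpers, and the two example rows are invented.

```python
from collections import defaultdict

MISSING = ("?", ".")


def entity_key(row):
    """Return (entity id, guessed) for an atom_site row (hypothetical helper).

    Falls back to label_asym_id when label_entity_id is missing.
    """
    entity_id = row.get("label_entity_id", "?")
    if entity_id not in MISSING:
        return entity_id, False
    return row["label_asym_id"], True


def group_chains(rows, chain_column):
    """Map each chain id in chain_column to the residue names it contains."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[chain_column]].add(row["label_comp_id"])
    return dict(groups)


# Invented rows: a polymer residue and a water that share one author chain id.
rows = [
    {"label_asym_id": "A", "auth_asym_id": "A",
     "label_comp_id": "ALA", "label_entity_id": "?"},
    {"label_asym_id": "B", "auth_asym_id": "A",
     "label_comp_id": "HOH", "label_entity_id": "?"},
]

by_auth = group_chains(rows, "auth_asym_id")    # one group mixing ALA and HOH
by_label = group_chains(rows, "label_asym_id")  # two separate groups

for row in rows:
    key, guessed = entity_key(row)
    if guessed:
        print(f"warning: guessing entity id {key!r} from label_asym_id")
```

Keying on auth_asym_id would merge the polymer and the water into one entity in this example, while label_asym_id keeps them apart.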
FYI, I consider all warnings that are given when opening an mmCIF file as mistakes that could be fixed with a better mmCIF file.
comment:7 by , 6 years ago
Thought of one more thing. It would help if the chem_comp_atom and chem_comp_bond tables were in the same data block as the molecular data instead of being in separate data blocks, just as the PDBe does with their "updated" mmCIF files. Right now, ChimeraX effectively ignores the data blocks without atomic data in them and guesses at the connectivity for unknown residue types.
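To see which data blocks in a file fall into that ignored group, a rough Python sketch along these lines could be used. It is an assumption-laden line scan, not how ChimeraX reads the file: categories_by_block and blocks_without_atoms are hypothetical names, and multi-line text fields are not handled.

```python
def categories_by_block(lines):
    """Map each data block name to the set of category names it contains.

    Assumption: data names appear at the start of a line, as in typical
    mmCIF output; multi-line (semicolon) text fields are not handled.
    """
    blocks = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.lower().startswith("data_"):
            current = line[len("data_"):]
            blocks[current] = set()
        elif current is not None and line.startswith("_"):
            name = line.split()[0]
            if "." in name:
                # "_atom_site.id" -> "atom_site"
                blocks[current].add(name.split(".", 1)[0][1:])
    return blocks


def blocks_without_atoms(lines):
    """List data blocks that have no atom_site category."""
    return [name for name, cats in categories_by_block(lines).items()
            if "atom_site" not in cats]


example = """\
data_model
loop_
_atom_site.id
_atom_site.label_comp_id
1 LIG
data_comp_LIG
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
LIG C1
""".splitlines()

print(blocks_without_atoms(example))  # ['comp_LIG']
```

Any chem_comp_atom or chem_comp_bond tables found only in the listed blocks are the ones that would need to move into the block holding the atom_site data.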
comment:8 by , 6 years ago
OK, thanks. I've passed these comments on to the Phenix team. On 2019-07-13 10:45, ChimeraX wrote: