Opened 5 years ago
Closed 3 years ago
#4342 closed defect (fixed)
mmCIF format-compliance issues
Reported by: | Tristan Croll | Owned by: | Greg Couch |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | Cc: | pett | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
The following bug report has been submitted:
Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
ChimeraX Version: 1.1 (2020-09-09 22:22:27 UTC)
Description
Note from Oleg on the Phenix team, relating to the attached file. The first point is not so important (although it might be worth logging a warning when writing an mmCIF file with chain IDs >4 characters?). The second is a compliance issue (still pretty minor in the grand scheme of things). The root cause is that I failed to re-order residues in the model after reassigning chain IDs (the original model deposited in the wwPDB had some sugar residues disconnected, causing the wwPDB's own scripts to assign them different chain IDs to the rest of their tree). I can do that, of course, but maybe this would be better handled in the writer?
First, wwPDB limits chain IDs to 4 characters in mmCIF (https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files): PDB format allows only single-character chain IDs, while PDBx/mmCIF can accommodate chain IDs of up to four characters. It is not clear from the document whether they mean label_asym_id or auth_asym_id. Neither has a limit in the formal mmCIF dictionary (e.g. https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_atom_site.auth_asym_id.html).
Another minor issue I found in your file (it probably does not affect Phenix, but is not format compliant): an _atom_site.label_asym_id cannot be reused, and there are rather strict rules about when it needs to change to the next one. Basically, if a chain is interrupted by another chain, the label_asym_id needs to change. See def get_label_asym_id() in cctbx_project/iotbx/pdb/hierarchy.py for how we do it. It is based on a rather long and hand-waving e-mail exchange with John Berrisford that I can dig up if you are interested. In your file, for example, the label_asym_id corresponding to Aglyc3 is the same in two separate places.
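The reuse rule described above (a label_asym_id must not reappear once another chain has interrupted it) can be checked with a short script. This is only a minimal sketch: it assumes the label_asym_id column has already been extracted from the atom_site loop as a plain list in file order, and the function name is hypothetical. It is not the cctbx get_label_asym_id() logic.

```python
from itertools import groupby


def reused_label_asym_ids(label_asym_ids):
    """Return IDs that occur in more than one contiguous run, in order of first reuse.

    label_asym_ids: per-atom (or per-residue) label_asym_id values in file order.
    """
    seen = set()
    reused = []
    # groupby collapses each contiguous run of identical IDs into one group
    for asym_id, _run in groupby(label_asym_ids):
        if asym_id in seen and asym_id not in reused:
            reused.append(asym_id)
        seen.add(asym_id)
    return reused


# Hypothetical column: "A" is interrupted by "B" and then reappears
column = ["A", "A", "B", "B", "A", "C"]
print(reused_label_asym_ids(column))  # ['A']
```

An empty result means every label_asym_id forms a single contiguous block.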
Log: > alias preview_toolshed toolshed url https://cxtoolshed- > preview.rbvi.ucsf.edu; toolshed reload available > alias production_toolshed toolshed url https://cxtoolshed.rbvi.ucsf.edu; > toolshed reload available > alias st isolde step $* > alias aw isolde add water $* > alias awsf isolde add water sim false > alias al isolde add ligand $* > alias so setattr sel atoms occupancy $* UCSF ChimeraX version: 1.1 (2020-09-09) © 2016-2020 Regents of the University of California. All rights reserved. How to cite UCSF ChimeraX OpenGL version: 3.3.0 NVIDIA 455.32.00 OpenGL renderer: TITAN Xp/PCIe/SSE2 OpenGL vendor: NVIDIA Corporation Manufacturer: Dell Inc. Model: Precision T5600 OS: CentOS Linux 7 Core Architecture: 64bit ELF CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz Cache Size: 20480 KB Memory: total used free shared buff/cache available Mem: 62G 10G 38G 228M 13G 51G Swap: 4.9G 0B 4.9G Graphics: 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1) Subsystem: NVIDIA Corporation Device [10de:11df] Kernel driver in use: nvidia PyQt version: 5.12.3 Compiled Qt version: 5.12.4 Runtime Qt version: 5.12.9 Installed Packages: alabaster: 0.7.12 appdirs: 1.4.4 Babel: 2.8.0 backcall: 0.2.0 blockdiag: 2.0.1 certifi: 2020.6.20 chardet: 3.0.4 ChimeraX-AddH: 2.1.3 ChimeraX-AlignmentAlgorithms: 2.0 ChimeraX-AlignmentHdrs: 3.2 ChimeraX-AlignmentMatrices: 2.0 ChimeraX-Alignments: 2.1 ChimeraX-Arrays: 1.0 ChimeraX-Async: 0.1 ChimeraX-Atomic: 1.6.1 ChimeraX-AtomSearch: 2.0 ChimeraX-AxesPlanes: 2.0 ChimeraX-BasicActions: 1.1 ChimeraX-BILD: 1.0 ChimeraX-BlastProtein: 1.0.1 ChimeraX-BondRot: 2.0 ChimeraX-BugReporter: 1.0 ChimeraX-BuildStructure: 2.0 ChimeraX-Bumps: 1.0 ChimeraX-BundleBuilder: 1.0 ChimeraX-ButtonPanel: 1.0 ChimeraX-CageBuilder: 1.0 ChimeraX-CellPack: 1.0 ChimeraX-Centroids: 1.1 ChimeraX-ChemGroup: 2.0 ChimeraX-Clashes: 2.0 ChimeraX-Clipper: 0.15.0 ChimeraX-ColorActions: 1.0 ChimeraX-ColorGlobe: 1.0 ChimeraX-CommandLine: 
1.1.3 ChimeraX-ConnectStructure: 2.0 ChimeraX-Contacts: 1.0 ChimeraX-Core: 1.1 ChimeraX-CoreFormats: 1.0 ChimeraX-coulombic: 1.0.1 ChimeraX-Crosslinks: 1.0 ChimeraX-Crystal: 1.0 ChimeraX-DataFormats: 1.0 ChimeraX-Dicom: 1.0 ChimeraX-DistMonitor: 1.1 ChimeraX-DistUI: 1.0 ChimeraX-Dssp: 2.0 ChimeraX-EMDB-SFF: 1.0 ChimeraX-ExperimentalCommands: 1.0 ChimeraX-FileHistory: 1.0 ChimeraX-FunctionKey: 1.0 ChimeraX-Geometry: 1.1 ChimeraX-gltf: 1.0 ChimeraX-Graphics: 1.0 ChimeraX-Hbonds: 2.0 ChimeraX-Help: 1.0 ChimeraX-HKCage: 1.3 ChimeraX-IHM: 1.0 ChimeraX-ImageFormats: 1.0 ChimeraX-IMOD: 1.0 ChimeraX-IO: 1.0 ChimeraX-ISOLDE: 1.1.0 ChimeraX-Label: 1.0 ChimeraX-LinuxSupport: 1.0 ChimeraX-ListInfo: 1.0 ChimeraX-Log: 1.1.1 ChimeraX-LookingGlass: 1.1 ChimeraX-Map: 1.0.1 ChimeraX-MapData: 2.0 ChimeraX-MapEraser: 1.0 ChimeraX-MapFilter: 2.0 ChimeraX-MapFit: 2.0 ChimeraX-MapSeries: 2.0 ChimeraX-Markers: 1.0 ChimeraX-Mask: 1.0 ChimeraX-MatchMaker: 1.1 ChimeraX-MDcrds: 2.0 ChimeraX-MedicalToolbar: 1.0.1 ChimeraX-Meeting: 1.0 ChimeraX-MLP: 1.0 ChimeraX-mmCIF: 2.2 ChimeraX-MMTF: 2.0 ChimeraX-Modeller: 1.0 ChimeraX-ModelPanel: 1.0 ChimeraX-ModelSeries: 1.0 ChimeraX-Mol2: 2.0 ChimeraX-Morph: 1.0 ChimeraX-MouseModes: 1.0 ChimeraX-Movie: 1.0 ChimeraX-Neuron: 1.0 ChimeraX-Nucleotides: 2.0 ChimeraX-OpenCommand: 1.2.1 ChimeraX-PDB: 2.1 ChimeraX-PDBBio: 1.0 ChimeraX-Phenix: 0.1 ChimeraX-PickBlobs: 1.0 ChimeraX-Positions: 1.0 ChimeraX-PresetMgr: 1.0 ChimeraX-PubChem: 2.0 ChimeraX-Read-Pbonds: 1.0 ChimeraX-Registration: 1.1 ChimeraX-RemoteControl: 1.0 ChimeraX-ResidueFit: 1.0 ChimeraX-RestServer: 1.0 ChimeraX-RNALayout: 1.0 ChimeraX-RotamerLibMgr: 2.0 ChimeraX-RotamerLibsDunbrack: 2.0 ChimeraX-RotamerLibsDynameomics: 2.0 ChimeraX-RotamerLibsRichardson: 2.0 ChimeraX-SaveCommand: 1.2 ChimeraX-SchemeMgr: 1.0 ChimeraX-SDF: 2.0 ChimeraX-Segger: 1.0 ChimeraX-Segment: 1.0 ChimeraX-SeqView: 2.2 ChimeraX-Shape: 1.0.1 ChimeraX-Shell: 1.0 ChimeraX-Shortcuts: 1.0 ChimeraX-ShowAttr: 1.0 
ChimeraX-ShowSequences: 1.0 ChimeraX-SideView: 1.0 ChimeraX-Smiles: 2.0 ChimeraX-SmoothLines: 1.0 ChimeraX-SpaceNavigator: 1.0 ChimeraX-StdCommands: 1.0.4 ChimeraX-STL: 1.0 ChimeraX-Storm: 1.0 ChimeraX-Struts: 1.0 ChimeraX-Surface: 1.0 ChimeraX-SwapAA: 2.0 ChimeraX-SwapRes: 2.0 ChimeraX-TapeMeasure: 1.0 ChimeraX-Test: 1.0 ChimeraX-Toolbar: 1.0 ChimeraX-ToolshedUtils: 1.0 ChimeraX-Tug: 1.0 ChimeraX-UI: 1.2.3 ChimeraX-uniprot: 2.0 ChimeraX-ViewDockX: 1.0 ChimeraX-Vive: 1.1 ChimeraX-VolumeMenu: 1.0 ChimeraX-VTK: 1.0 ChimeraX-WavefrontOBJ: 1.0 ChimeraX-WebCam: 1.0 ChimeraX-WebServices: 1.0 ChimeraX-Zone: 1.0 colorama: 0.4.3 comtypes: 1.1.7 cxservices: 1.0 cycler: 0.10.0 Cython: 0.29.20 decorator: 4.4.2 distlib: 0.3.1 distro: 1.5.0 docutils: 0.16 filelock: 3.0.12 funcparserlib: 0.3.6 grako: 3.16.5 graphviz: 0.14.1 h5py: 2.10.0 html2text: 2020.1.16 idna: 2.10 ihm: 0.16 imagecodecs: 2020.5.30 imagecodecs-lite: 2020.1.31 imagesize: 1.2.0 ipykernel: 5.3.0 ipython: 7.15.0 ipython-genutils: 0.2.0 jedi: 0.17.2 Jinja2: 2.11.2 jupyter-client: 6.1.3 jupyter-core: 4.6.3 kiwisolver: 1.2.0 line-profiler: 2.1.2 lxml: 4.5.1 MarkupSafe: 1.1.1 matplotlib: 3.2.1 msgpack: 1.0.0 netifaces: 0.10.9 networkx: 2.4 numexpr: 2.7.1 numpy: 1.18.5 numpydoc: 1.0.0 objgraph: 3.4.1 openvr: 1.12.501 packaging: 20.4 ParmEd: 3.2.0 parso: 0.7.1 pexpect: 4.8.0 pickleshare: 0.7.5 Pillow: 7.1.2 pip: 20.2.2 pkginfo: 1.5.0.1 prompt-toolkit: 3.0.7 psutil: 5.7.0 ptyprocess: 0.6.0 pycollada: 0.7.1 pydicom: 2.0.0 Pygments: 2.6.1 PyOpenGL: 3.1.5 PyOpenGL-accelerate: 3.1.5 pyparsing: 2.4.7 PyQt5-commercial: 5.12.3 PyQt5-sip: 4.19.19 PyQtWebEngine-commercial: 5.12.1 python-dateutil: 2.8.1 pytz: 2020.1 pyzmq: 19.0.2 qtconsole: 4.7.4 QtPy: 1.9.0 RandomWords: 0.3.0 requests: 2.24.0 scipy: 1.4.1 Send2Trash: 1.5.0 SEQCROW: 0.24.3 setuptools: 49.4.0 sfftk-rw: 0.6.6.dev0 six: 1.15.0 snowballstemmer: 2.0.0 sortedcontainers: 2.2.2 Sphinx: 3.1.1 sphinxcontrib-applehelp: 1.0.2 sphinxcontrib-blockdiag: 2.0.0 
sphinxcontrib-devhelp: 1.0.2 sphinxcontrib-htmlhelp: 1.0.3 sphinxcontrib-jsmath: 1.0.1 sphinxcontrib-qthelp: 1.0.3 sphinxcontrib-serializinghtml: 1.1.4 suds-jurko: 0.6 tables: 3.6.1 tifffile: 2020.6.3 tinyarray: 1.2.2 tornado: 6.0.4 traitlets: 5.0.4 urllib3: 1.25.10 versioneer: 0.18 wcwidth: 0.2.5 webcolors: 1.11.1 wheel: 0.34.2 File attachment: 6xra_rebuilt_glycan_chains_reassigned.cif
Attachments (4)
Change History (12)
by , 5 years ago
Attachment: | 6xra_rebuilt_glycan_chains_reassigned.cif added |
---|
comment:1 by , 5 years ago
Component: | Unassigned → Input/Output |
---|---|
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → mmCIF format-compliance issues |
comment:2 by , 5 years ago
I agree that a warning about chain identifiers of more than four characters is appropriate. ChimeraX doesn't care. And it is a bug if the label_asym_id's are reused.
Can you provide a session file that reproduces the error? The mmCIF writer generates its output based on ChimeraX's internal data structures. So it would be useful to be able to track where that is going wrong.
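As an illustration of the proposed warning, here is a minimal sketch of a chain-ID length check. It assumes the chain IDs are available as plain strings; the function names and the use of the standard logging module are assumptions for the example, not ChimeraX's writer or logging API.

```python
import logging

MAX_CHAIN_ID_LEN = 4  # limit quoted from the wwPDB deposition guidance


def overlong_chain_ids(chain_ids, max_len=MAX_CHAIN_ID_LEN):
    """Return the distinct chain IDs longer than max_len, sorted."""
    return sorted({cid for cid in chain_ids if len(cid) > max_len})


def warn_overlong_chain_ids(chain_ids, logger=None):
    """Log one warning listing all overlong chain IDs and return them."""
    logger = logger or logging.getLogger(__name__)
    too_long = overlong_chain_ids(chain_ids)
    if too_long:
        logger.warning(
            "chain IDs longer than %d characters: %s",
            MAX_CHAIN_ID_LEN,
            ", ".join(too_long),
        )
    return too_long


# "Aglyc3" is the six-character chain ID mentioned in the report
warn_overlong_chain_ids(["A", "Aglyc3", "B", "Aglyc3"])
```

Emitting a single warning per file, rather than one per residue, keeps the log readable for models with many glycan chains.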
comment:3 by , 5 years ago
Only for ChimeraX 1.1 at this stage (it's an ISOLDE session).
follow-up: 3 comment:4 by , 5 years ago
Cc: | added |
---|
comment:5 by , 5 years ago
... but in the meantime I can give you a rundown of the different things that I did with the model, in case it gives any clues:
* Some of the glycans were not linked to their parent ASN - added those bonds (probably not the cause, since this is a common problem)
* Some of the glycans had one terminal MAN unlinked - this had caused the wwPDB's new glycan remediation scheme to assign those MANs to different chain IDs. I linked them back to the parent glycan tree (without reassigning chain IDs at that stage)
* For all the glycans built past the first two NAGs, the beta-mannose (BMA) had been incorrectly modelled as MAN. I deleted the MAN (which caused the creation of a missing-structure pseudobond), added and linked in the correct BMA, and manually deleted the pseudobond.
* Ran a script to reassign chain IDs and renumber the glycans according to their final bonding (but this script did not reorder residues in memory - something I should probably change)
Hope this helps. I can try to replicate the scenario in a daily build without ISOLDE's involvement, but not until next week at the earliest. Have to wrap up some work to help a particularly highly-strung PhD student.
follow-up: 5 comment:6 by , 5 years ago
A ChimeraX 1.1 session using ISOLDE would work. It "should" restore in 1.2, even without ISOLDE. That will be another test.
comment:7 by , 5 years ago
An ISOLDE session saved from ChimeraX 1.1 won't open in the latest daily build - it just gives a message in the log saying I need to install ISOLDE in order to open it. But the good news is that the offending session didn't involve ISOLDE.
There's a separate bug which I've been holding back on reporting until I can (a) update to the daily builds and (b) make sure it's not something I'm doing wrong: in an ISOLDE session where I've been busily adding and removing ligands, if I attempt to change the chain_ids of all residues with polymer_type==Residue.PT_NONE, I still get the error message: "ValueError: Cannot set polymeric chain ID directly from Residue; must use Chain". But if I save the model, reopen it in a separate ChimeraX instance and then run the same script, it works correctly.
After some messing around with different approaches, I've found that if I save to mmCIF, load, run my reassignment script, then save back to mmCIF, the label_asym_ids appear strictly unique. But if the intermediate file is in PDB format, the final mmCIF exhibits the problem.
Attaching a daily-build session file, the input PDB, and the code I actually use to reassign the chain IDs. The latter is part of the ISOLDE tree so isn't implemented as an executable, but is standalone. You just need to run recluster_ligands(session, model).
follow-up: 7 comment:8 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
The molecular data in the session file was unexpected by the mmCIF writing code. The mmCIF writing code uses the cached Residue.mmcif_chain_id (label_asym_id) value so that mmCIF files are written with the same label_asym_id as was read in. When molecular editing is done, the default mmcif_chain_id value is the same as the author chain ID (auth_asym_id), and that might conflict. In this session there are different non-polymer HET residues that have the same mmcif_chain_id. That was not accounted for.
In the mmCIF output, the different HET residues are different entities and thus must have different label_asym_ids. Another case where different label_asym_ids are needed is when non-polymer residues are associated with a particular polymer chain: those residues should then have a separate label_asym_id per chain (one that is different from the chain's own label_asym_id).
Revised the mmCIF writing code to be less trusting of the mmcif_chain_id value, so that a legal mmCIF file is written.
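To make the two rules above concrete, here is a minimal sketch of one way to hand out label_asym_ids without trusting a cached value. It is not the revised ChimeraX writer: the tuple layout, the function names, and the choice to key non-polymer residues on (author chain, residue name) are all assumptions made for illustration.

```python
from itertools import count, product
from string import ascii_uppercase


def label_ids():
    """Yield A..Z, then AA, AB, ... as candidate label_asym_id values."""
    for width in count(1):
        for letters in product(ascii_uppercase, repeat=width):
            yield "".join(letters)


def assign_label_asym_ids(residues):
    """Assign a label_asym_id to each residue.

    residues: iterable of (auth_chain, residue_name, is_polymer) tuples
    (a hypothetical stand-in for the real residue objects).
    Polymer residues share one ID per author chain; non-polymer residues
    get one ID per (author chain, residue name), never the polymer's ID.
    """
    ids = label_ids()
    assigned = {}
    labels = []
    for auth_chain, name, is_polymer in residues:
        key = (auth_chain, None) if is_polymer else (auth_chain, name)
        if key not in assigned:
            assigned[key] = next(ids)
        labels.append(assigned[key])
    return labels


residues = [
    ("A", "ALA", True),
    ("A", "GLY", True),
    ("A", "HEM", False),
    ("A", "ZN", False),
    ("B", "SER", True),
    ("B", "HEM", False),
]
print(assign_label_asym_ids(residues))  # ['A', 'A', 'B', 'C', 'D', 'E']
```

Because the IDs are generated fresh, two different HET residue types can never collide, and the HEM on author chain B gets a different ID from the HEM on author chain A.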