#35 closed defect (duplicate)
mmCIF Reader Master Ticket
Reported by: | Scooter Morris | Owned by: | Greg Couch |
---|---|---|---|
Priority: | blocker | Milestone: | Initial Tools |
Component: | Input/Output | Version: | |
Keywords: | Cc: | chimera-programmers | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | chimera |
Description
Change History (9)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
A readcif library that parses CIF/mmCIF files has been written and put in the git repository. It is 1.2-1.5x slower than the hydra code, but can handle any legal CIF/mmCIF file.
The next step is to integrate mmCIF templates, which reuses the readcif code but parses different tables and connects the appropriate residues.
comment:3 by , 11 years ago
mmCIF templates are in. Checking whether the templates are up to date has been disabled for now for speed.
Added code to propagate bonds to secondary models of NMR ensembles.
Need to add given secondary structure assignments, but there's no ribbon representation to use that information yet.
mmCIF files can have hydrogen bonds; the plan is to save them so they don't have to be computed. Analysis tools may wish to recompute the hydrogen bonds to apply consistent criteria, much like running ksdssp for secondary structure.
The next step is to walk the PDB and mmCIF databases and see if the PDB reader and the mmCIF reader give the same results. Waiting for a blob interface to pseudobonds, so those can be compared too.
comment:5 by , 10 years ago
Support for generic mmCIF tables is in. The list of extra categories that are needed is passed in when opening the mmCIF file, and the data is returned via the AtomicStructure's metadata dictionary. Each mmCIF category is stored as two entries in the metadata dictionary: the category's name as a key returns the vector/list of column headers, and the category's name concatenated with ' data' returns a vector/list with all of the rows of data concatenated.
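The two-entry layout described above can be turned back into rows with a few lines of Python. This is only a sketch: the function name `category_rows` and the sample dictionary are hypothetical, and the exact spelling of the category keys (for example, with or without a leading underscore) is an assumption, not something confirmed by the reader code.

```python
def category_rows(metadata, category):
    """Rebuild the rows of one category from the flat metadata layout.

    Assumes metadata[category] is the list of column headers and
    metadata[category + ' data'] is every row's values concatenated.
    """
    columns = metadata[category]
    flat = metadata[category + ' data']
    width = len(columns)
    if width == 0 or len(flat) % width != 0:
        raise ValueError('data length is not a multiple of the column count')
    # Slice the flat list into chunks of `width` and pair each with the headers.
    return [dict(zip(columns, flat[i:i + width]))
            for i in range(0, len(flat), width)]


# Hypothetical metadata, shaped as the comment describes.
metadata = {
    'entity': ['id', 'type'],
    'entity data': ['1', 'polymer', '2', 'water'],
}
rows = category_rows(metadata, 'entity')
```

Keeping the data flat avoids allocating one container per row on the reader side; callers that want row-wise access can reshape it as shown.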
comment:6 by , 10 years ago
Request by Oliver Clarke <olibclarke@…> on 11/11/15 in the chimera-users mailing list:
Hi all,
As discussed on the ccp4bb (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccp4bb;2223469c.1511), the mmCIF format incorporates an atom_sites_footnote record (http://mmcif.wwpdb.org/dictionaries/mmcif_nmr-star.dic/Categories/atom_sites_footnote.html) which allows for the direct association of specific comments/annotations with particular atoms/residues.
As part of expanding mmCIF support in Chimera2, would it be possible to consider incorporating the capacity to read, write, and display these records as labels or notes? This could be a great way of incorporating built-in-annotations in a structure pertaining to specific features that could potentially be readable by other software such as Coot and pymol.
Cheers,
Oliver.
Analysis
Currently, the PDB only provides the free-form text for the footnotes and is missing the computer-readable information about which atoms and residues are associated with the footnote. The specification refers to a conformer_family_coord_set category that would hold the information, and those, in turn, refer to data in save frames. There are no examples of how that would work. And other than in mmCIF dictionaries, there are no save frames in mmCIF files.
So, right now, to show a footnote's associated residues, we would have to parse the free-form text and hope we got the right result.
comment:7 by , 10 years ago
More from Oliver Clarke:
Relating to this, I wonder whether it is possible to find a home for attributes calculated in Chimera 2 in one of the mmCIF fields?
I mean that's the advantage of mmCIF in theory right, that it is extensible and can support attributes beyond just position and B-factor?
This would make transferring custom defined or calculated attributes between different chimera sessions much easier I think, if they were inherent to the file containing the coordinates, and again would at least allow for the possibility of data exchange with other programs.
Oliver.
comment:8 by , 9 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
No longer using "master" tickets. Instead use individual tickets for specific features.
comment:9 by , 7 years ago
Component: | mmCIF Reader/Writer → Input/Output |
---|
reducing # of categories, so lumping into I/O category
Minimal mmCIF reading code was put in Hydra to determine the best speed and memory use that could be expected, achieving about 1 million atoms per second, with in-memory use about the same as the mmCIF file size. By far the most lines to be parsed are in the mmCIF _atom_site table, typically accounting for 90% or more of the lines in the file. That table is parsed in Hydra by C++ code hydra/_image3d/parsecif.cpp. Additional tables _entity, _struct_asym and _entity_poly_seq are read in Python code (hydra/file_io/mmcif.py) for getting chain sequences (including residues with no coordinate atom_site data).
My early tests of the RCSB cifparse-obj C++ library suggested it was about 15 times slower reading a large 2.4 million atom mmCIF file (HIV capsid 3j3q) than the Hydra hack code. Here are the slides for the talk I gave about mmCIF use at an RCSB workshop in Fall 2013:
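To illustrate why a reader specialized for the _atom_site table can be fast, here is a minimal Python sketch of that kind of shortcut. It is not the Hydra code: the function name `read_atom_site` is hypothetical, and it assumes one row per line with whitespace-separated values, no quoted values containing spaces, and no multi-line text fields, so it does not handle every legal CIF/mmCIF file.

```python
def read_atom_site(lines):
    """Collect column names and rows of the _atom_site loop.

    Assumption: each data row sits on one line and its values
    contain no embedded whitespace.
    """
    columns, rows = [], []
    state = 'skip'                      # 'skip' -> 'header' -> 'data'
    for line in lines:
        text = line.strip()
        if state == 'skip':
            if text.startswith('_atom_site.'):
                state = 'header'
                columns.append(text.split('.', 1)[1])
        elif state == 'header':
            if text.startswith('_atom_site.'):
                columns.append(text.split('.', 1)[1])
            else:
                state = 'data'          # first non-header line is a data row
        if state == 'data':
            # A blank line, comment, new tag, loop, or data block ends the table.
            if not text or text.startswith(('#', '_', 'loop_', 'data_')):
                break
            rows.append(text.split())
    return columns, rows


sample = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
ATOM 1 N 10.5
ATOM 2 CA 11.0
#
""".splitlines()
columns, rows = read_atom_site(sample)
```

A general CIF parser has to tokenize quoted strings and multi-line values everywhere, which is part of the cost a table-specific shortcut like this avoids.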
http://www.cgl.ucsf.edu/chimera/data/mmcif-oct2013/mmcif.html