#35 closed defect (duplicate)

mmCIF Reader Master Ticket

Reported by:	Scooter Morris	Owned by:	Greg Couch
Priority:	blocker	Milestone:	Initial Tools
Component:	Input/Output	Version:
Keywords:		Cc:	chimera-programmers
Blocked By:		Blocking:
Notify when closed:		Platform:	all
Project:	chimera

Description

Change History (9)

comment:1 by Tom Goddard, 11 years ago

Minimal mmCIF reading code was put in Hydra to determine the best speed and memory use that could be expected. It achieves about 1 million atoms per second, with in-memory use about the same as the mmCIF file size. By far the most lines to be parsed are in the mmCIF _atom_site table, which typically accounts for 90% or more of the lines in the file. That table is parsed in Hydra by C++ code (hydra/_image3d/parsecif.cpp). The additional tables _entity, _struct_asym and _entity_poly_seq are read in Python code (hydra/file_io/mmcif.py) to get chain sequences (including residues with no atom_site coordinate data).
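To illustrate why a special-purpose _atom_site reader can be fast, here is a minimal Python sketch of the idea. It is not the Hydra code (that is C++ in parsecif.cpp). The function name read_atom_site and the sample data are hypothetical, and the sketch assumes whitespace-separated values with no quoted strings containing spaces and no multi-line values, so it does not handle every legal mmCIF file.

```python
def read_atom_site(lines):
    """Return (column_names, rows) for the first _atom_site table found.

    Assumption: every data row is on one line, split by whitespace.
    """
    columns = []
    rows = []
    in_table = False
    for line in lines:
        line = line.strip()
        if line.startswith('_atom_site.'):
            # Column header, e.g. "_atom_site.Cartn_x" -> "Cartn_x"
            in_table = True
            columns.append(line.split()[0].split('.', 1)[1])
        elif not in_table:
            continue
        elif not line or line == '#' or line.startswith(('loop_', '_', 'data_')):
            # Anything that is not a data row ends the table.
            break
        else:
            fields = line.split()
            if len(fields) == len(columns):
                rows.append(fields)
    return columns, rows


# Hypothetical sample input, for illustration only.
sample = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N 10.0 20.0 30.0
ATOM 2 CA 11.5 20.5 30.5
#
""".splitlines()

columns, rows = read_atom_site(sample)
xyz = [
    tuple(float(row[columns.index(name)]) for name in ('Cartn_x', 'Cartn_y', 'Cartn_z'))
    for row in rows
]
```

Skipping general CIF tokenization (quoting, multi-line text fields) is what keeps such a reader simple, and it is also why it cannot replace a parser that must accept any legal file.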

My early tests of the RCSB cifparse-obj C++ library suggested it was about 15 times slower reading a large 2.4 million atom mmCIF file (HIV capsid 3j3q) than the Hydra hack code. Here are the slides for the talk I gave about mmCIF use at an RCSB workshop in Fall 2013:

http://www.cgl.ucsf.edu/chimera/data/mmcif-oct2013/mmcif.html

comment:2 by Greg Couch, 11 years ago

A readcif library that parses CIF/mmCIF files has been written and put in the git repository. It is 1.2-1.5x slower than the Hydra code, but can handle any legal CIF/mmCIF file.

The next step is to integrate mmCIF templates, which reuses the readcif code but parses different tables and connects the appropriate residues.

comment:3 by Greg Couch, 11 years ago

mmCIF templates are in. Checking whether the templates are up to date has been disabled for now, for speed.

Added code to propagate bonds to secondary models of NMR ensembles.

Need to add given secondary structure assignments, but there's no ribbon representation to use that information yet.

mmCIF files can have hydrogen bonds; the plan is to save them so they don't have to be computed. Analysis tools may wish to recompute the hydrogen bonds to apply consistent criteria, much like running ksdssp for secondary structure.

The next step is to walk the PDB and mmCIF databases and see if the PDB reader and the mmCIF reader give the same results. Waiting for a blob interface to pseudobonds, so those can be compared too.

comment:4 by Greg Couch, 10 years ago

Secondary structure assignment is in.

comment:5 by Greg Couch, 10 years ago

Support for generic mmCIF tables is in. The list of extra categories that are needed is passed in when opening the mmCIF file, and the data is returned via the AtomicStructure's metadata dictionary. Each mmCIF category takes up two entries in the metadata dictionary: the category's name as a key returns the vector/list of column headers, and the category's name concatenated with ' data' returns a vector/list with all of the rows of data concatenated.
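As a rough sketch of how a caller might unpack that layout, the flat data list can be cut into rows using the number of column headers. The helper name category_rows and the example dictionary are hypothetical; a plain dict stands in for the AtomicStructure metadata dictionary, and only the two-key layout comes from the description above.

```python
def category_rows(metadata, category):
    """Rebuild one dict per row from the flattened two-entry layout."""
    headers = metadata[category]          # column headers
    flat = metadata[category + ' data']   # all rows concatenated
    ncols = len(headers)
    if ncols == 0 or len(flat) % ncols != 0:
        raise ValueError('data length is not a multiple of the column count')
    return [
        dict(zip(headers, flat[start:start + ncols]))
        for start in range(0, len(flat), ncols)
    ]


# Hypothetical metadata contents, for illustration only.
example_metadata = {
    'entity': ['id', 'type'],
    'entity data': ['1', 'polymer', '2', 'water'],
}

rows = category_rows(example_metadata, 'entity')
```

Keeping the rows concatenated in a single list avoids building one container per row at read time; callers that want row-wise access pay that cost only for the categories they actually use.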

comment:6 by Greg Couch, 10 years ago

Request by Oliver Clarke <olibclarke@…> on 11/11/15 in the chimera-users mailing list:

Hi all,

As discussed on the ccp4bb (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccp4bb;2223469c.1511), the mmCIF format incorporates an atom_sites_footnote record (http://mmcif.wwpdb.org/dictionaries/mmcif_nmr-star.dic/Categories/atom_sites_footnote.html) which allows for the direct association of specific comments/annotations with particular atoms/residues.

As part of expanding mmCIF support in Chimera2, would it be possible to consider incorporating the capacity to read, write, and display these records as labels or notes? This could be a great way of incorporating built-in annotations in a structure pertaining to specific features that could potentially be readable by other software such as Coot and pymol.

Cheers,
Oliver.

Analysis

Currently, the PDB only provides the free-form text for the footnotes and is missing the computer-readable information about which atoms and residues are associated with each footnote. The specification refers to a conformer_family_coord_set category that would hold the information, which, in turn, refers to data in save frames. There are no examples of how that would work, and other than in mmCIF dictionaries, there are no save frames in mmCIF files.

So, right now, to show a footnote's associated residues, we would have to parse the free-form text and hope we got the right result.

comment:7 by Greg Couch, 10 years ago

comment:8 by Tom Goddard, 9 years ago

Resolution:	→ duplicate
Status:	new → closed

No longer using "master" tickets. Instead use individual tickets for specific features.

comment:9 by pett, 7 years ago

Component:	mmCIF Reader/Writer → Input/Output

reducing # of categories, so lumping into I/O category
