Opened 5 years ago
Closed 5 years ago
#3372 closed defect (fixed)
Writing mmCIF file takes 35 minutes
Reported by: | | Owned by: | Greg Couch |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Input/Output | Version: | |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
The following bug report has been submitted:
Platform: Darwin-19.4.0-x86_64-i386-64bit
ChimeraX Version: 1.1.dev202006052309 (2020-06-05 23:09:04 UTC)

Description

It took an extremely long time, 35 minutes to write a 600,000 atom mmCIF file (52 Mbytes). Something strange is going on since after 5 minutes I looked at the partially written file and all the atoms were written. The next 30 minutes just managed to write one "#" character.

I've attached the file. You won't be able to open it without another excruciatingly long wait unless you use the "autoStyle false" open command option because nucleotides will try to show on this 30000 nucleotide RNA strand.

Log:
UCSF ChimeraX version: 1.1.dev202006052309 (2020-06-05)
© 2016-2020 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX
> rna path ~/ucsf/presentations/sars-rna-may2020/stems loopPattern horseshoe
> pattern sphere randomBranchTilt 30
RNA sphere radius 428.7
> rna model ~/ucsf/presentations/sars-rna-may2020/sequence.fasta #1
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.

Chain information for RNA #2
Chain | Description |
---|---|
I | No description available |

> hide #1 models
> save /Users/goddard/Desktop/sarscov2_rna.cif models #2
Not saving entity_poly_seq for non-authoritative sequences
> select clear

OpenGL version: 4.1 ATI-3.8.24
OpenGL renderer: AMD Radeon Pro Vega 20 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.

Hardware:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro15,3
Processor Name: 8-Core Intel Core i9
Processor Speed: 2.4 GHz
Number of Processors: 1
Total Number of Cores: 8
L2 Cache (per Core): 256 KB
L3 Cache: 16 MB
Hyper-Threading Technology: Enabled
Memory: 32 GB
Boot ROM Version: 1037.100.362.0.0 (iBridge: 17.16.14281.0.0,0)

Software:
System Software Overview:
System Version: macOS 10.15.4 (19E287)
Kernel Version: Darwin 19.4.0
Time since boot: 27 days 21:18

Graphics/Displays:
Intel UHD Graphics 630:
Chipset Model: Intel UHD Graphics 630
Type: GPU
Bus: Built-In
VRAM (Dynamic, Max): 1536 MB
Vendor: Intel
Device ID: 0x3e9b
Revision ID: 0x0002
Automatic Graphics Switching: Supported
gMux Version: 5.0.0
Metal: Supported, feature set macOS GPUFamily2 v1

Radeon Pro Vega 20:
Chipset Model: Radeon Pro Vega 20
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 4 GB
Vendor: AMD (0x1002)
Device ID: 0x69af
Revision ID: 0x00c0
ROM Revision: 113-D2060I-087
VBIOS Version: 113-D20601MA0T-016
Option ROM Version: 113-D20601MA0T-016
EFI Driver Version: 01.01.087
Automatic Graphics Switching: Supported
gMux Version: 5.0.0
Metal: Supported, feature set macOS GPUFamily2 v1

Displays:
Color LCD:
Display Type: Built-In Retina LCD
Resolution: 2880 x 1800 Retina
Framebuffer Depth: 24-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Automatically Adjust Brightness: No
Connection Type: Internal

PyQt version: 5.12.3
Compiled Qt version: 5.12.4
Runtime Qt version: 5.12.8

File attachment: sarscov2_rna.cif
Attachments (1)
Change History (9)
by , 5 years ago
Attachment: | sarscov2_rna.cif added |
---|
comment:1 by , 5 years ago
Saving a PDB file of this same model took 12 seconds. Saving a session for this model took 1 second and opened in about 1 second.
follow-up: 1 comment:2 by , 5 years ago
Component: | Unassigned → Input/Output |
---|---|
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → Writing mmCIF file takes 35 minutes |
comment:3 by , 5 years ago
Status: | assigned → accepted |
---|
I suspect a lot of the time is in making the columns fixed width for faster reading. There should be an option to turn that off. Will investigate.
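For context, here is a minimal sketch of what fixed-width output involves. It assumes each row is a list of already-formatted strings; `format_rows` and its `fixed_width` parameter are hypothetical names for illustration, not the ChimeraX writer. The point is that column widths are only known after a pass over every value, so the table is traversed twice.

```python
def format_rows(rows, fixed_width=True):
    """Return one text line per row; pad columns when fixed_width is True.

    Hypothetical helper for illustration only.
    """
    if not fixed_width:
        # Single pass: join the values as they are.
        return [" ".join(row) for row in rows]
    # Extra pass over every value to find the widest entry in each column.
    widths = [max(len(value) for value in column) for column in zip(*rows)]
    return [
        " ".join(value.ljust(width) for value, width in zip(row, widths)).rstrip()
        for row in rows
    ]


rows = [
    ["ATOM", "1", "P", "G", "I", "1"],
    ["ATOM", "10", "OP1", "G", "I", "1"],
]
for line in format_rows(rows):
    print(line)
```

With `fixed_width=False` the width pass is skipped entirely, which is the kind of option suggested above.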
comment:4 by , 5 years ago
Since most of the time was _after_ the atoms were written (according to Tom), wouldn't that make fixed-width columns an unlikely suspect?
comment:5 by , 5 years ago
There's not much after writing the atom_site table. But that could be the slow part. There's scanning all of the bonds for disulfide and non-standard inter-residue linkages. And building the secondary structure tables. Scanning all of the bonds is looking at each bond individually. Is there a way to get all of the inter-residue bonds without instantiating them?
comment:6 by , 5 years ago
80% of the time spent writing out an mmCIF table goes to making sure each data value is formatted correctly (i.e., using the mmCIF.quote function). So that will be the next thing optimized.
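To illustrate why per-value formatting adds up, here is a rough sketch of the checks a CIF writer has to make on each value. `quote_value` is a hypothetical stand-in written for this note, not the actual mmCIF.quote function, and it follows a conservative reading of the CIF quoting rules. Every value in every column of a 600,000-row table would go through a function like this.

```python
# Conservative assumptions about which values cannot be written bare.
_RESERVED = ("data_", "loop_", "save_", "global_", "stop_")
_SPECIAL_START = "_#$'\"[];"


def _text_field(text):
    """Wrap text in a semicolon-delimited CIF text field."""
    if "\n;" in text:
        raise ValueError("a line starting with ';' cannot go in a text field")
    return "\n;" + text + "\n;\n"


def quote_value(value):
    """Return value as a CIF token (hypothetical sketch, not ChimeraX code)."""
    text = str(value)
    if text == "":
        return "?"  # choice made here: write empty values as 'unknown'
    if "\n" in text:
        return _text_field(text)
    needs_quotes = (
        text in (".", "?")  # would otherwise read as CIF placeholders
        or text[0] in _SPECIAL_START
        or text.lower().startswith(_RESERVED)
        or any(ch.isspace() for ch in text)
    )
    if not needs_quotes:
        return text
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    return _text_field(text)


print(quote_value("O5'"))
print(quote_value("No description available"))
```

Most atom_site values (numbers, element symbols, atom names such as `O5'`) pass through unchanged, so nearly all of the cost is in the checks themselves rather than in any quoting.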
comment:7 by , 5 years ago
Digging deeper, the slow part is where every bond is examined to see if it is a non-standard bond. Those are disulfide bonds, bonds in non-standard residues, bonds between chains, bonds between non-adjacent residues in a chain, and non-polymeric bonds. Currently a Python object is made for every atom and bond. Thinking out loud, perhaps a "generator" can be written in C++ that does the work.
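As a rough illustration of the generator idea, here is a Python sketch that works on flat, integer-indexed sequences instead of atom and bond objects. All names and the data layout are hypothetical, and only three of the categories above are covered (disulfides, bonds between chains, and bonds between non-adjacent residues). A C++ version would follow the same logic without creating Python objects per bond.

```python
def nonstandard_bonds(bonds, atom_name, atom_residue, residue_chain, residue_position):
    """Yield (i, j, reason) for inter-residue bonds that are not ordinary links.

    Hypothetical layout: bonds is a list of atom-index pairs, atom_name and
    atom_residue are indexed by atom, residue_chain and residue_position
    are indexed by residue.
    """
    for i, j in bonds:
        ri, rj = atom_residue[i], atom_residue[j]
        if ri == rj:
            continue  # intra-residue bonds are not examined in this sketch
        if atom_name[i] == "SG" and atom_name[j] == "SG":
            yield i, j, "disulfide"
        elif residue_chain[ri] != residue_chain[rj]:
            yield i, j, "between chains"
        elif abs(residue_position[ri] - residue_position[rj]) != 1:
            yield i, j, "non-adjacent residues"


# Tiny made-up example: five atoms, one per residue.
atom_name = ["O3'", "P", "SG", "SG", "N"]
atom_residue = [0, 1, 2, 3, 4]
residue_chain = ["A", "A", "A", "B", "A"]
residue_position = [1, 2, 5, 7, 9]
bonds = [(0, 1), (2, 3), (1, 4), (0, 3)]

found = list(nonstandard_bonds(bonds, atom_name, atom_residue,
                               residue_chain, residue_position))
for bond in found:
    print(bond)
```

The ordinary backbone link `(0, 1)` is skipped, so for a single 30,000-nucleotide strand a scan like this would yield very little even though it visits every bond.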
comment:8 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Implemented scanning for non-standard inter-residue bonds in C++ so Python objects don't need to be created for every atom, bond, residue, and chain. I was unwilling to wait 35 minutes to confirm that it was that bad before, but with the new code it took 4:05 minutes to write out the mmCIF file. Using "fixedWidth false" lowered it to 3:59. I believe my computer is slower, so it should be more than 8X faster on your computer.
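The timings above work out as follows. This is just the arithmetic on the numbers quoted in this ticket. The 35 minute figure was measured on a different machine, so the ratio is only an estimate.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes:seconds timing to seconds."""
    return minutes * 60 + seconds


before = to_seconds(35)             # reported time with the old code
after = to_seconds(4, 5)            # 4:05 with the C++ bond scan
after_no_fixed = to_seconds(3, 59)  # 3:59 with "fixedWidth false"

speedup = round(before / after, 2)
saved = after - after_no_fixed
saved_percent = round(100 * saved / after, 1)

print(speedup)        # ratio of old time to new time
print(saved)          # seconds saved by turning off fixed-width columns
print(saved_percent)  # the same saving as a percentage of the 4:05 run
```

The fixed-width option accounts for only a few seconds of the remaining time, which matches the earlier observation that fixed-width columns were an unlikely culprit.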
Also, as part of this, Cython was used to speed up the mmCIF code. In particular, using Cython to compile the CIFTable class sped things up.
To get things even faster, a larger rewrite of the mmCIF writing code into C++ would be needed.
Added by email2trac