Opened 5 years ago

Closed 5 years ago

#3372 closed defect (fixed)

Writing mmCIF file takes 35 minutes

Reported by: goddard@…
Owned by: Greg Couch
Priority: normal
Milestone:
Component: Input/Output
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Darwin-19.4.0-x86_64-i386-64bit
ChimeraX Version: 1.1.dev202006052309 (2020-06-05 23:09:04 UTC)
Description
It took an extremely long time, 35 minutes, to write a 600,000-atom mmCIF file (52 Mbytes).  Something strange is going on: after 5 minutes I looked at the partially written file and all the atoms had already been written.  The next 30 minutes managed to write just one "#" character.

I've attached the file.  You won't be able to open it without another excruciatingly long wait unless you use the "autoStyle false" open command option, because otherwise ChimeraX will try to show nucleotides on this 30,000-nucleotide RNA strand.

Log:
UCSF ChimeraX version: 1.1.dev202006052309 (2020-06-05)  
© 2016-2020 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> rna path ~/ucsf/presentations/sars-rna-may2020/stems loopPattern horseshoe
> pattern sphere randomBranchTilt 30

RNA sphere radius 428.7  

> rna model ~/ucsf/presentations/sars-rna-may2020/sequence.fasta #1

Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.  

Chain information for RNA #2  
---  
Chain | Description  
I | No description available  
  

> hide #1 models

> save /Users/goddard/Desktop/sarscov2_rna.cif models #2

Not saving entity_poly_seq for non-authoritative sequences  

> select clear




OpenGL version: 4.1 ATI-3.8.24
OpenGL renderer: AMD Radeon Pro Vega 20 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro15,3
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.4 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      Boot ROM Version: 1037.100.362.0.0 (iBridge: 17.16.14281.0.0,0)

Software:

    System Software Overview:

      System Version: macOS 10.15.4 (19E287)
      Kernel Version: Darwin 19.4.0
      Time since boot: 27 days 21:18

Graphics/Displays:

    Intel UHD Graphics 630:

      Chipset Model: Intel UHD Graphics 630
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel
      Device ID: 0x3e9b
      Revision ID: 0x0002
      Automatic Graphics Switching: Supported
      gMux Version: 5.0.0
      Metal: Supported, feature set macOS GPUFamily2 v1

    Radeon Pro Vega 20:

      Chipset Model: Radeon Pro Vega 20
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x8
      VRAM (Total): 4 GB
      Vendor: AMD (0x1002)
      Device ID: 0x69af
      Revision ID: 0x00c0
      ROM Revision: 113-D2060I-087
      VBIOS Version: 113-D20601MA0T-016
      Option ROM Version: 113-D20601MA0T-016
      EFI Driver Version: 01.01.087
      Automatic Graphics Switching: Supported
      gMux Version: 5.0.0
      Metal: Supported, feature set macOS GPUFamily2 v1
      Displays:
        Color LCD:
          Display Type: Built-In Retina LCD
          Resolution: 2880 x 1800 Retina
          Framebuffer Depth: 24-Bit Color (ARGB8888)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: No
          Connection Type: Internal

PyQt version: 5.12.3
Compiled Qt version: 5.12.4
Runtime Qt version: 5.12.8
File attachment: sarscov2_rna.cif


Attachments (1)

sarscov2_rna.cif (50.1 MB ) - added by goddard@… 5 years ago.
Added by email2trac

Change History (9)

by goddard@…, 5 years ago

Attachment: sarscov2_rna.cif added

Added by email2trac

in reply to:  2 comment:1 by goddard@…, 5 years ago

Saving a PDB file of this same model took 12 seconds.

Saving a session for this model took 1 second and opened in about 1 second.

comment:2 by Tom Goddard, 5 years ago

Component: Unassigned → Input/Output
Owner: set to Greg Couch
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: ChimeraX bug report submission → Writing mmCIF file takes 35 minutes

comment:3 by Greg Couch, 5 years ago

Status: assigned → accepted

I suspect a lot of the time is in making the columns fixed width for faster reading. There should be an option to turn that off. Will investigate.
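Editorial sketch of why fixed-width output could cost extra time: the writer has to look at every value once to find each column's width before it can emit any row. The function name `format_table` and its `fixed_width` parameter are hypothetical and are not ChimeraX's mmCIF code; this only illustrates the general technique.

```python
def format_table(rows, fixed_width=True):
    """Return the text lines of a whitespace-separated table.

    Hypothetical helper for illustration; not part of ChimeraX.
    """
    rows = [[str(v) for v in row] for row in rows]
    if not fixed_width:
        # Single pass: join the values exactly as they come.
        return [" ".join(row) for row in rows]
    # Extra pass over every value to find the widest entry per column.
    widths = [max(len(v) for v in column) for column in zip(*rows)]
    # Pad each value to its column width so columns line up.
    return [
        " ".join(v.ljust(w) for v, w in zip(row, widths)).rstrip()
        for row in rows
    ]


rows = [["ATOM", 1, "P", "G"], ["ATOM", 10, "OP1", "G"]]
aligned = format_table(rows)
compact = format_table(rows, fixed_width=False)
```

The aligned form trades a second pass and padding work at write time for rows of a predictable shape at read time.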

comment:4 by pett, 5 years ago

Since most of the time was _after_ the atoms were written (according to Tom), wouldn't that make fixed-width columns an unlikely suspect?

comment:5 by Greg Couch, 5 years ago

There's not much after writing the atom_site table. But that could be the slow part. There's scanning all of the bonds for disulfide and non-standard inter-residue linkages. And building the secondary structure tables. Scanning all of the bonds is looking at each bond individually. Is there a way to get all of the inter-residue bonds without instantiating them?
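Editorial sketch of the idea behind that question: if bonds are available as pairs of atom indices and each atom's residue is available as an integer index, the inter-residue bonds can be selected by comparing integers, with no per-bond or per-atom object created. The function and argument names here are hypothetical and do not correspond to a ChimeraX API.

```python
def inter_residue_bonds(bond_pairs, atom_residue):
    """Select bonds whose two atoms belong to different residues.

    bond_pairs   -- list of (i, j) atom-index pairs (hypothetical layout)
    atom_residue -- residue index for each atom, indexed by atom index
    """
    return [
        (i, j) for i, j in bond_pairs
        if atom_residue[i] != atom_residue[j]
    ]


# Five atoms: the first three in residue 0, the last two in residue 1.
atom_residue = [0, 0, 0, 1, 1]
bond_pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
linking = inter_residue_bonds(bond_pairs, atom_residue)
```

Only the bond between atoms 2 and 3 crosses a residue boundary, so it is the only one that would need closer inspection.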

comment:6 by Greg Couch, 5 years ago

80% of the time spent writing out an mmCIF table goes to making sure each data value is formatted correctly (i.e., using the mmCIF.quote function). So that will be the next thing optimized.
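Editorial sketch of what per-value quoting involves, to show why it adds up over millions of values. This is a simplified stand-in under assumed rules, not the actual mmCIF.quote implementation: it quotes values that contain whitespace, start with a character that is special in CIF, or collide with a reserved word, and it ignores finer points of the CIF grammar such as the special meaning of a bare "." or "?".

```python
_RESERVED_PREFIXES = ("data_", "save_")
_RESERVED_WORDS = {"loop_", "stop_", "global_"}
_SPECIAL_START = set("_#$'\"[];")


def quote(value):
    """Return value as a CIF token, quoting only when needed (simplified)."""
    s = str(value)
    if s == "":
        return "''"
    if "\n" in s:
        # Multi-line values go in a semicolon-delimited text field.
        return "\n;" + s + "\n;\n"
    lower = s.lower()
    needs_quote = (
        s[0] in _SPECIAL_START
        or any(c.isspace() for c in s)
        or lower in _RESERVED_WORDS
        or lower.startswith(_RESERVED_PREFIXES)
    )
    if not needs_quote:
        return s
    # Conservative choice: pick a delimiter that does not occur in the value.
    if "'" not in s:
        return "'" + s + "'"
    if '"' not in s:
        return '"' + s + '"'
    return "\n;" + s + "\n;\n"
```

Every value passes through several string scans even when the answer is "no quoting needed", which is the common case for atom_site data.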

comment:7 by Greg Couch, 5 years ago

Digging deeper, the slow part is where every bond is examined to see if it is a non-standard bond. Those are disulfide bonds, bonds in non-standard residues, bonds between chains, bonds between non-adjacent residues in a chain, and non-polymeric bonds. Currently a Python object is made for every atom and bond. Thinking out loud, perhaps a "generator" can be written in C++ that does the work.
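Editorial sketch of the bond categories listed in this comment, written as a Python generator over a toy data model. The `Atom` class, the `STANDARD_RESIDUES` subset, and the exact tests are assumptions made for illustration; the real check in ChimeraX may differ (for example, in how it treats adjacent residues joined by something other than the backbone bond).

```python
from dataclasses import dataclass

# Illustrative subset only; a real list would cover all standard residues.
STANDARD_RESIDUES = {"A", "C", "G", "U", "ALA", "CYS", "GLY"}


@dataclass(frozen=True)
class Atom:
    """Toy atom record (hypothetical, not a ChimeraX class)."""
    name: str
    element: str
    residue_name: str
    residue_number: int
    chain_id: str
    in_polymer: bool = True


def classify_bond(a, b):
    """Return the reason a bond is non-standard, or None if it is standard."""
    if not (a.in_polymer and b.in_polymer):
        return "non-polymeric"
    if (a.residue_name not in STANDARD_RESIDUES
            or b.residue_name not in STANDARD_RESIDUES):
        return "non-standard residue"
    different_residue = (
        (a.chain_id, a.residue_number) != (b.chain_id, b.residue_number)
    )
    if a.element == "S" and b.element == "S" and different_residue:
        return "disulfide"
    if a.chain_id != b.chain_id:
        return "inter-chain"
    if abs(a.residue_number - b.residue_number) > 1:
        return "non-adjacent residues"
    return None


def nonstandard_bonds(bonds):
    """Yield (atom, atom, reason) for each non-standard bond."""
    for a, b in bonds:
        reason = classify_bond(a, b)
        if reason is not None:
            yield a, b, reason


bonds = [
    (Atom("O3'", "O", "G", 1, "I"), Atom("P", "P", "A", 2, "I")),
    (Atom("SG", "S", "CYS", 10, "A"), Atom("SG", "S", "CYS", 50, "A")),
    (Atom("N1", "N", "G", 1, "I"), Atom("N3", "N", "C", 5, "I")),
    (Atom("N", "N", "GLY", 3, "A"), Atom("C", "C", "ALA", 4, "B")),
]
reasons = [reason for _, _, reason in nonstandard_bonds(bonds)]
```

In this form every bond still costs two Python objects and a function call, which is the overhead the comment proposes moving into C++.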

comment:8 by Greg Couch, 5 years ago

Resolution: fixed
Status: accepted → closed

Implemented scanning for non-standard inter-residue bonds in C++ so Python objects don't need to be created for every atom, bond, residue, and chain. I was unwilling to wait 35 minutes to confirm that it was that bad before, but with the new code it took 4:05 minutes to write out the mmCIF file. Using "fixedWidth false" lowered it to 3:59. I believe my computer is slower, so it should be more than 8X faster on your computer.
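Editorial check of the arithmetic behind the "more than 8X" estimate, using only the times quoted in this ticket. The two measurements come from different machines, so the ratio is a rough guide rather than a benchmark.

```python
before_s = 35 * 60               # 35 minutes reported for the original save
after_s = 4 * 60 + 5             # 4:05 with the C++ bond scan
after_no_fixed_s = 3 * 60 + 59   # 3:59 with "fixedWidth false"

speedup = before_s / after_s
fixed_width_saving = (after_s - after_no_fixed_s) / after_s

print(f"speedup: {speedup:.1f}x")
print(f"fixed-width saving: {fixed_width_saving:.1%}")
```

The ratio comes out a little above 8.5, and turning off fixed-width columns saves only about 2% of the remaining time.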

Also, as part of this, Cython was used to speed up the mmCIF code. In particular, using Cython to compile the CIFTable class sped things up.

To get things even faster, a larger rewrite of the mmCIF writing code into C++ would be needed.
