Opened 5 years ago

Closed 5 years ago

#3891 closed defect (fixed)

Session size 8x larger ChimeraX 1.0 vs 0.92

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone:
Component: Sessions
Version:
Keywords:
Cc: Greg Couch, Eric Pettersen
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Opening 3j3q and saving a session makes a 50 Mbyte session file in ChimeraX 0.92 but a 400 Mbyte file in ChimeraX 1.0. What happened?

Change History (11)

comment:1 by Tom Goddard, 5 years ago

Assigned to Eric since he might have some ideas of more stuff that went into the file (registered attributes?). But it might be a msgpack issue and Greg would be the one to look at that.

comment:2 by Tom Goddard, 5 years ago

For reference 3j3q.cif is about 240 Mbytes.

comment:3 by Eric Pettersen, 5 years ago

Cc: Eric Pettersen added
Owner: changed from Eric Pettersen to Tom Goddard

What happened is that you disabled the built-in compression that session files used to have.

in reply to:  4 ; comment:4 by goddard@…, 5 years ago

Oh yeah.  We could use some blazing fast BLOSC compression.

in reply to:  5 ; comment:5 by goddard@…, 5 years ago

Could use lz4 compression. The slow gzip compression I eliminated made saving and restoring many times slower than without compression.

comment:7 by Tom Goddard, 5 years ago

Here is a comparison of speed and file size for several compression methods on a session containing 3j3q, all run with the command-line tools lz4, zstd, zip, gzip, bzip2, and xz at default settings. Timings are from a 2019 MacBook Pro running macOS 10.15.7 with an SSD.

file            size    compress    decompress

test.cxs        399M
test.cxs.lz4    102M     0.485s       1.005s
test.cxs.zst     62M     0.841s       1.038s
test.cxs.zip     55M     5.879s       1.256s
test.cxs.gz      55M     5.442s       0.554s
test.cxs.bz2     43M    36.771s       6.126s
test.cxs.xz      28M    81.287s       2.705s
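
A comparison like the one in the table can also be run from Python rather than from the command line. The sketch below is only an illustration: it uses the standard-library gzip, bz2, and lzma modules (lz4 and zstd need third-party packages and are left out), the `benchmark` helper is a hypothetical name, and the sample data is a synthetic stand-in, not a real session file.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Return {name: (compressed_size, compress_seconds, decompress_seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        if restored != data:
            raise RuntimeError(f"{name} round trip was not lossless")
        results[name] = (len(packed), middle - start, end - middle)
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for session data (an assumption).
    sample = b"ATOM 1 N ALA A 1 11.104 6.134 -6.504\n" * 20000
    for name, (size, c_time, d_time) in benchmark(sample).items():
        print(f"{name:5s} {size:8d} bytes  {c_time:.3f}s  {d_time:.3f}s")
```

Results on synthetic data will not match the table; real session contents compress differently.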

For comparison, here are times for opening the mmCIF file and for saving and opening uncompressed and gzip-compressed sessions in the current ChimeraX daily build.

time open 3j3q
command time 7.745 seconds

time save test.cxs
command time 8.364 seconds

time save test_gzip.cxs uncompressed false
command time 22.51 seconds

close
time open test.cxs
command time 5.201 seconds

close
time open test_gzip.cxs
command time 9.215 seconds

Last edited 5 years ago by Tom Goddard

comment:8 by Tom Goddard, 5 years ago

It looks like zstd might be best, achieving a higher compression ratio than lz4. But the PyPI zstd package does not have a stream compression API like the Python gzip and lz4 APIs that ChimeraX is set up to work with.
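
For readers unfamiliar with the term, a "stream compression API" here means a file-like object that compresses data as it is written and decompresses as it is read, so the whole session never has to be held in memory as one buffer. The sketch below shows that style with the standard-library gzip module only. The function names are hypothetical and this is not ChimeraX's session code. The third-party lz4 package offers a comparable file-like interface through `lz4.frame.open`.

```python
import gzip
import io


def write_compressed_stream(chunks, raw_file):
    """Compress an iterable of byte chunks into raw_file as they arrive."""
    # GzipFile wraps an existing file object; closing it flushes the gzip
    # trailer but leaves raw_file open.
    with gzip.GzipFile(fileobj=raw_file, mode="wb") as stream:
        for chunk in chunks:
            stream.write(chunk)


def read_compressed_stream(raw_file, chunk_size=65536):
    """Yield decompressed chunks from raw_file without loading it whole."""
    with gzip.GzipFile(fileobj=raw_file, mode="rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_compressed_stream([b"header\n", b"atoms\n" * 1000], buffer)
    buffer.seek(0)
    total = sum(len(chunk) for chunk in read_compressed_stream(buffer))
    print(f"compressed {total} bytes into {len(buffer.getvalue())} bytes")
```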

comment:9 by Tom Goddard, 5 years ago

Notice that while the command-line gzip compression took 5.4 seconds, the gzip session compression in ChimeraX took 14 seconds (= 22.51 - 8.36). It is not clear why the Python gzip is slower. Possibly gzip uses multiple threads when compressing a file but only a single thread when compressing a stream. Or maybe the default compression level or thread count differs between the command-line gzip and ChimeraX's use of Python GzipFile(). At any rate, the command-line timings don't necessarily reflect the speed of ChimeraX doing the compression, so I will need to test the compression methods in ChimeraX on the stream.

Likewise opening the gzip compressed file took 9.2 seconds versus 5.2 uncompressed, 4 additional seconds, while the command-line gunzip only took 0.55 seconds to decompress the file.

Formerly ChimeraX used gzip compression on sessions by default and in this example the write time is 2.7 times longer and read time is 1.8 times longer than an uncompressed file.
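
One possible contributor to the gap, offered as a hypothesis and not a diagnosis: Python's `gzip.GzipFile` defaults to `compresslevel=9`, while the command-line gzip defaults to level 6. This ticket does not say which level ChimeraX passed, so the sketch below only shows how the level could be compared on a stream. The `timed_gzip` helper and the sample data are assumptions for illustration.

```python
import gzip
import io
import time


def timed_gzip(data: bytes, level: int):
    """Compress data through a GzipFile stream; return (compressed, seconds)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as stream:
        stream.write(data)
    elapsed = time.perf_counter() - start
    return buffer.getvalue(), elapsed


if __name__ == "__main__":
    # Synthetic stand-in for session data (an assumption, not a real session).
    sample = b"ATOM 1 N ALA A 1 11.104 6.134 -6.504\n" * 50000
    for level in (6, 9):
        packed, seconds = timed_gzip(sample, level)
        print(f"level {level}: {len(packed)} bytes in {seconds:.3f}s")
```

Timing the same comparison on a real session stream inside ChimeraX would be needed before drawing any conclusion.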

Last edited 5 years ago by Tom Goddard

comment:10 by Tom Goddard, 5 years ago

Timings in ChimeraX saving and opening sessions uncompressed, gzip compressed, and lz4 compressed (using the PyPI lz4 module).
Conclusion: lz4 save is about 1% longer and open is 7% faster than uncompressed. gzip save is 148% longer and open is 4% slower than uncompressed.
File sizes: 399M uncompressed, 102M lz4, 55M gzip.

lz4 takes about the same time to save as uncompressed, is slightly faster to read, and gives a file 4 times smaller.
gzip is 2.5 times slower to write, about as fast as uncompressed to read, and gives a file 7 times smaller.

Seems like enabling lz4 by default is the best choice. Users can specify gzip if smaller size is worth it in exchange for slower save.

open 3j3q
command time 7.823 seconds

# uncompressed
save test.cxs
command time 8.477 seconds

close

open test.cxs
command time 5.35 seconds

# restart

open 3j3q
command time 7.791 seconds

save test_lz4.cxs compress true
command time 8.585 seconds

close

open test_lz4.cxs
command time 4.988 seconds

# restart

open 3j3q
command time 7.883 seconds

save test_gz.cxs compress true compressionMethod gzip
command time 21.03 seconds

close

open test_gz.cxs
command time 5.563 seconds
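
The percentages and size ratios in the conclusion can be recomputed from the timings logged above. The sketch below copies the command times and file sizes from this comment. The helper names `percent_change` and `summarize` are hypothetical.

```python
# Timings in seconds and sizes in Mbytes copied from this comment.
timings = {
    "uncompressed": {"save": 8.477, "open": 5.35},
    "lz4": {"save": 8.585, "open": 4.988},
    "gzip": {"save": 21.03, "open": 5.563},
}
sizes_mb = {"uncompressed": 399, "lz4": 102, "gzip": 55}


def percent_change(value: float, baseline: float) -> float:
    """Signed percent difference of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


def summarize() -> dict:
    """Compare each compression method against the uncompressed session."""
    base = timings["uncompressed"]
    rows = {}
    for method in ("lz4", "gzip"):
        rows[method] = {
            "save_pct": percent_change(timings[method]["save"], base["save"]),
            "open_pct": percent_change(timings[method]["open"], base["open"]),
            "size_ratio": sizes_mb["uncompressed"] / sizes_mb[method],
        }
    return rows


if __name__ == "__main__":
    for method, row in summarize().items():
        print(
            f"{method}: save {row['save_pct']:+.1f}%, "
            f"open {row['open_pct']:+.1f}%, "
            f"{row['size_ratio']:.1f}x smaller"
        )
```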

comment:11 by Tom Goddard, 5 years ago

Resolution: fixed
Status: changed from assigned to closed

Fixed.

Made lz4 the default compression when saving a session. Formerly the default was no compression. Long ago the default was gzip compression.

I removed the undocumented "uncompress" option from the save command for sessions and added the "compress" option with values 'lz4', 'gzip', or 'none'.
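
As an illustration of how a three-valued "compress" option could map onto file objects, here is a minimal sketch. It is not the ChimeraX implementation: `open_session_file` is a hypothetical helper, and the lz4 branch assumes the third-party lz4 package from PyPI is installed.

```python
import gzip


def open_session_file(path: str, mode: str = "rb", compress: str = "lz4"):
    """Return a binary file object for the requested compression method."""
    if compress == "none":
        return open(path, mode)
    if compress == "gzip":
        return gzip.open(path, mode)
    if compress == "lz4":
        # Third-party dependency, imported lazily so the other two
        # methods work without it (assumes: pip install lz4).
        import lz4.frame
        return lz4.frame.open(path, mode)
    raise ValueError(f"unknown compress value: {compress!r}")
```

Usage would look like `with open_session_file("test.cxs", "wb", compress="gzip") as f: f.write(data)`, with the same `compress` value passed again when reading the file back.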
