Opened 5 years ago
Closed 5 years ago
#3891 closed defect (fixed)
Session size 8x larger ChimeraX 1.0 vs 0.92
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Sessions | Version: | |
| Keywords: | | Cc: | Greg Couch, Eric Pettersen |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Opening 3j3q and saving a session makes a 50 Mbyte session file in ChimeraX 0.92 but a 400 Mbyte file in ChimeraX 1.0. What happened?
Change History (11)
comment:1 by , 5 years ago
comment:3 by , 5 years ago
| Cc: | added |
|---|---|
| Owner: | changed from to |
What happened is that you disabled the built-in compression that session files used to have.
follow-up: 5 comment:5 by , 5 years ago
Could use lz4 compression. The slow gzip compression I eliminated made saving and restoring sessions many times slower than with no compression.
comment:6 by , 5 years ago
lz4 is good. There's a nice tabular comparison of different compression methods at https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1215332-canonical-s-snap-packaging-switching-to-lzo-compression-for-faster-startup-times?p=1215370#post1215370. gzip uses zlib.
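As an illustration of "gzip uses zlib": the gzip format wraps a header and CRC trailer around a raw deflate stream, the same algorithm zlib implements, so in Python the output of the standard `gzip` module can be decoded with `zlib.decompress` by passing `wbits=16 + zlib.MAX_WBITS`. A minimal standard-library sketch (the payload is arbitrary sample data):

```python
import gzip
import zlib

payload = b"session data " * 1000

# Compress with the gzip module: gzip header + deflate stream + CRC32/size trailer.
gz_bytes = gzip.compress(payload)

# zlib decodes the same bytes when told to expect a gzip wrapper;
# wbits = 16 + MAX_WBITS selects the gzip container format.
roundtrip = zlib.decompress(gz_bytes, 16 + zlib.MAX_WBITS)

assert roundtrip == payload
print(len(payload), "->", len(gz_bytes), "bytes")
```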
comment:7 by , 5 years ago
Here is a comparison of speed and file size for compression methods on a session containing 3j3q. All runs use the command-line tools lz4, zstd, zip, gzip, bzip2, and xz with default settings. Timings are on a 2019 MacBook Pro with an SSD, running macOS 10.15.7.
| File | Size | Compress | Decompress |
|---|---|---|---|
| test.cxs | 399M | | |
| test.cxs.lz4 | 102M | 0.485s | 1.005s |
| test.cxs.zst | 62M | 0.841s | 1.038s |
| test.cxs.zip | 55M | 5.879s | 1.256s |
| test.cxs.gz | 55M | 5.442s | 0.554s |
| test.cxs.bz2 | 43M | 36.771s | 6.126s |
| test.cxs.xz | 28M | 81.287s | 2.705s |
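Command-line numbers may not match what a Python program sees. A rough in-process counterpart, sketched with only standard-library codecs (lz4 and zstd need third-party packages, so they are omitted), times one-shot compress and decompress on a byte string. The sample data is an arbitrary stand-in for session bytes, and library defaults can differ from the command-line tools' defaults, so results will not line up exactly with the table.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Time one-shot compress/decompress of `data` with stdlib codecs.

    Returns {name: (compressed_size, compress_seconds, decompress_seconds)}.
    """
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # sanity check: lossless round trip
        results[name] = (len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    # Arbitrary stand-in for a session file; substitute the bytes of a real .cxs file.
    sample = bytes(range(256)) * 4096
    for name, (size, tc, td) in benchmark(sample).items():
        print(f"{name:5s} {size:10d} bytes  compress {tc:.3f}s  decompress {td:.3f}s")
```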
For comparison, here are the times in the current ChimeraX daily build for opening the mmCIF file and for saving and opening the session, both uncompressed and gzip-compressed.
```
time open 3j3q
command time 7.745 seconds
time save test.cxs
command time 8.364 seconds
time save test_gzip.cxs uncompressed false
command time 22.51 seconds
close
time open test.cxs
command time 5.201 seconds
close
time open test_gzip.cxs
command time 9.215 seconds
```
comment:8 by , 5 years ago
It looks like zstd might be best, achieving a higher compression ratio than lz4. However, the PyPI zstd package does not have a stream compression API like the Python gzip and lz4 APIs that ChimeraX is set up to work with.
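If a compressor only offers one-shot `compress(bytes) -> bytes` functions, one workaround is a small file-like adapter that buffers everything written and compresses once on `close()`. The cost is memory equal to the uncompressed session (about 400 MB for 3j3q). The sketch below is hypothetical (`OneShotWriter` is not a ChimeraX class) and uses `zlib.compress` as a stand-in for a one-shot zstd function, so it runs with the standard library alone.

```python
import io
import zlib
from typing import BinaryIO, Callable


class OneShotWriter(io.RawIOBase):
    """Hypothetical adapter giving a one-shot compressor a stream-like write() API.

    All written bytes are buffered in memory and compressed in a single call
    when close() is invoked; the result is written to the underlying file object,
    which the caller remains responsible for closing.
    """

    def __init__(self, fileobj: BinaryIO, compress: Callable[[bytes], bytes]):
        super().__init__()
        self._fileobj = fileobj
        self._compress = compress
        self._buffer = io.BytesIO()

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        # Just accumulate; nothing is compressed until close().
        return self._buffer.write(data)

    def close(self) -> None:
        if not self.closed:
            self._fileobj.write(self._compress(self._buffer.getvalue()))
            self._buffer.close()
        super().close()


if __name__ == "__main__":
    out = io.BytesIO()
    with OneShotWriter(out, zlib.compress) as w:
        w.write(b"session data " * 1000)
    print(len(out.getvalue()), "compressed bytes")
```

If true streaming is needed, the separate `zstandard` package on PyPI provides `ZstdCompressor().stream_writer()`, which might avoid the buffering cost and could be worth evaluating.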
comment:9 by , 5 years ago
Notice that while command-line gzip compression took 5.4 seconds, gzip session compression in ChimeraX took 14 seconds (= 22.51 - 8.36). It is not clear why the Python gzip is slower. Possibly gzip uses multiple threads when compressing a file but only a single thread when compressing a stream, or the default compression level or number of threads differs between command-line gzip and ChimeraX's use of Python GzipFile(). In any case, the command-line timings don't necessarily reflect how fast ChimeraX does the compression, so I will need to test the compression methods in ChimeraX on the stream.
Likewise, opening the gzip-compressed file took 9.2 seconds versus 5.2 seconds uncompressed (4 seconds more), while command-line gunzip took only 0.55 seconds to decompress the file.
ChimeraX formerly used gzip compression for sessions by default; in this example that makes writing 2.7 times slower and reading 1.8 times slower than for an uncompressed file.
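One plausible but untested explanation for part of the write gap: Python's `gzip.GzipFile` (and `gzip.open`) default to `compresslevel=9`, whereas command-line `gzip` defaults to level 6, which is usually faster for a modest size penalty. The sketch below streams the same data through `GzipFile` at both levels in chunks, roughly the way a session writer might; the chunk size and sample data are arbitrary assumptions, and `time_gzip_stream` is a hypothetical helper.

```python
import gzip
import io
import time


def time_gzip_stream(data: bytes, level: int, chunk: int = 64 * 1024):
    """Write `data` through gzip.GzipFile in chunks; return (compressed_size, seconds)."""
    out = io.BytesIO()
    t0 = time.perf_counter()
    # GzipFile does not close a caller-supplied fileobj, so `out` stays readable.
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=level) as gz:
        for i in range(0, len(data), chunk):
            gz.write(data[i:i + chunk])
    elapsed = time.perf_counter() - t0
    return len(out.getvalue()), elapsed


if __name__ == "__main__":
    # Arbitrary, moderately compressible sample; substitute real session bytes.
    sample = b"".join(b"atom %d 1.000 2.000 3.000\n" % i for i in range(200_000))
    for level in (6, 9):  # 6 = command-line gzip default, 9 = Python GzipFile default
        size, secs = time_gzip_stream(sample, level)
        print(f"level {level}: {size} bytes in {secs:.3f}s")
```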
comment:10 by , 5 years ago
Timings in ChimeraX for saving and opening sessions uncompressed, gzip-compressed, and lz4-compressed (using the PyPI lz4 module).
Conclusion: lz4 save takes 1% longer and open is 7% faster than uncompressed. gzip save takes 148% longer and open is 4% slower than uncompressed.
File sizes: 399M uncompressed, 102M lz4, 55M gzip.
lz4 takes about the same time to save as uncompressed, is faster to read, and gives a file 4 times smaller.
gzip is 2.5 times slower to write, about as fast as uncompressed to read, and gives a file 7 times smaller.
Enabling lz4 by default seems like the best choice. Users can specify gzip if the smaller file is worth the slower save.
```
open 3j3q
command time 7.823 seconds
# uncompressed
save test.cxs
command time 8.477 seconds
close
open test.cxs
command time 5.35 seconds
# restart
open 3j3q
command time 7.791 seconds
save test_lz4.cxs compress true
command time 8.585 seconds
close
open test_lz4.cxs
command time 4.988 seconds
# restart
open 3j3q
command time 7.883 seconds
save test_gz.cxs compress true compressionMethod gzip
command time 21.03 seconds
close
open test_gz.cxs
command time 5.563 seconds
```
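The percentages and size ratios quoted in this comment follow directly from the logged command times and reported file sizes. Recomputing them is a quick consistency check; by this arithmetic the lz4 save overhead comes to about 1.3%. A small sketch (`pct_change` is just an illustrative helper):

```python
# Command times (seconds) copied from the ChimeraX log in this comment.
uncompressed = {"save": 8.477, "open": 5.35}
lz4 = {"save": 8.585, "open": 4.988}
gz = {"save": 21.03, "open": 5.563}


def pct_change(new: float, base: float) -> float:
    """Percent change of `new` relative to `base` (positive means slower)."""
    return 100.0 * (new - base) / base


for name, t in (("lz4", lz4), ("gzip", gz)):
    for op in ("save", "open"):
        print(f"{name} {op}: {pct_change(t[op], uncompressed[op]):+.1f}%")

# File-size reduction factors from the reported sizes (MB).
sizes = {"uncompressed": 399, "lz4": 102, "gzip": 55}
for name in ("lz4", "gzip"):
    print(f"{name}: {sizes['uncompressed'] / sizes[name]:.1f}x smaller")
```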
comment:11 by , 5 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Fixed.
Made lz4 the default compression when saving a session. Formerly the default was no compression. Long ago the default was gzip compression.
I removed the undocumented "uncompressed" option from the save command for sessions and added a "compress" option with values 'lz4', 'gzip', or 'none'.
Assigned to Eric since he might have some ideas about what else went into the file (registered attributes?). But it might be a msgpack issue, in which case Greg would be the one to look at it.