Opened 8 years ago
Last modified 8 years ago
#811 assigned enhancement
Optimize cmap hdf5 format read speed for light microscopy data
Reported by: | Tom Goddard | Owned by: | Tom Goddard |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Volume Data | Version: | |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
The Chimera map format (*.cmap), which is stored as HDF5, seems to be slow at reading subsampled copies of images. For example, with the 2048 x 1889 x 1999 8-bit ratbrain.cmap, reading the step 2 subsampled copy (921 Mbytes) takes about 22 seconds, while reading the step 2 version in MRC format takes about 9 seconds. This may be because the cmap 3D arrays are fragmented across the disk: the code that writes them writes a slab of planes at a time for all resolutions, so the highest-resolution slab chunks in the HDF5 chunked array are probably interleaved on disk with the step 2 resolution chunks.
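The suspected fragmentation can be illustrated with a toy model. The sketch below is not the cmap writer; it only assumes that blocks land on disk in the order they are written, and it ignores block sizes. The function names and the `steps=(1, 2, 4)` resolution levels are hypothetical, chosen for illustration.

```python
def interleaved_layout(n_slabs, steps=(1, 2, 4)):
    """Write order when each slab is written for all resolutions in turn."""
    return [(step, slab) for slab in range(n_slabs) for step in steps]


def sequential_layout(n_slabs, steps=(1, 2, 4)):
    """Write order when each resolution is written completely before the next."""
    return [(step, slab) for step in steps for slab in range(n_slabs)]


def count_fragments(layout, step):
    """Count contiguous runs of blocks belonging to one resolution level."""
    runs = 0
    previous_was_step = False
    for block_step, _slab in layout:
        is_step = block_step == step
        if is_step and not previous_was_step:
            runs += 1  # a new run starts after blocks of another resolution
        previous_was_step = is_step
    return runs


if __name__ == "__main__":
    n_slabs = 8
    print("interleaved:", count_fragments(interleaved_layout(n_slabs), step=2))
    print("sequential: ", count_fragments(sequential_layout(n_slabs), step=2))
```

In this model the step 2 data ends up in one fragment per slab when writes are interleaved, and in a single contiguous run when each resolution is written to completion, which is consistent with the hypothesis above but does not prove it.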
It is important to get the fastest possible read speed for light microscopy data sets that are many Gbytes in size. Try writing each resolution completely, without interleaving. Try a larger chunk size (currently a tiny 64 Kbytes).
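To get a feel for what the chunk size change means, the sketch below picks a chunk shape that fits a byte budget and counts how many chunks the step 2 array would need. The ticket does not say what chunk shapes the cmap writer uses, so the halving heuristic and both function names are assumptions made for illustration only. With h5py, a shape like this could be passed as the `chunks` argument of `create_dataset`.

```python
from math import ceil, prod


def chunk_shape(shape, itemsize, target_bytes):
    """Halve the largest chunk dimension until the chunk fits in target_bytes.

    Hypothetical heuristic, not the one used by the cmap writer.
    """
    chunk = list(shape)
    while prod(chunk) * itemsize > target_bytes and max(chunk) > 1:
        i = chunk.index(max(chunk))
        chunk[i] = ceil(chunk[i] / 2)
    return tuple(chunk)


def chunk_count(shape, chunk):
    """Number of chunks needed to tile the array, counting partial edge chunks."""
    return prod(ceil(n / c) for n, c in zip(shape, chunk))


if __name__ == "__main__":
    shape = (999, 943, 1024)  # step 2 ratbrain data as (z, y, x), 8-bit values
    for label, target in (("64 Kbytes", 64 * 1024), ("4 Mbytes", 4 * 1024**2)):
        chunk = chunk_shape(shape, itemsize=1, target_bytes=target)
        print(label, chunk, chunk_count(shape, chunk), "chunks")
```

Under these assumptions the 4 Mbyte budget needs far fewer chunks than the 64 Kbyte budget, so a full read involves far fewer separate chunk lookups and reads.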
Timings for reading step 2 ratbrain data (1024 x 943 x 999, 921 Mbytes). Not including display time. Used "sudo purge" on Mac OS between tests to clear disk cache.
Removing status messages every 7% saved 0.3 seconds.
So writing the arrays sequentially saved about 7.5 seconds, and using 4 Mbyte chunks instead of 64 Kbyte chunks saved another 4.5 seconds, which brings cmap reading to about 5% slower than reading an MRC file.