Opened 8 years ago

Last modified 8 years ago

#811 assigned enhancement

Optimize cmap hdf5 format read speed for light microscopy data

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: major
Milestone:
Component: Volume Data
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Reading subsampled copies of images from the Chimera map format (*.cmap, HDF5) seems to be slow. For example, with the 2048 x 1889 x 1999 8-bit ratbrain.cmap, reading the step 2 subsampled copy (921 Mbytes) takes about 22 seconds, while reading the step 2 version in MRC format takes about 9 seconds. This may be because the cmap 3D arrays are fragmented on disk: the code that writes them writes a slab of planes at a time for all resolutions, so the highest-resolution chunks of the HDF5 chunked array are probably interleaved on disk with the step 2 chunks.
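The suspected fragmentation can be illustrated with a toy model. This is only a sketch: it assumes chunks land in the file in the order they are written and that every resolution has the same number of slabs, which is not how the real HDF5 allocator or the real subsample sizes behave. The function names `write_order` and `read_runs` are hypothetical, made up for this illustration.

```python
def write_order(n_slabs, steps, interleaved):
    """Return (step, slab) labels in the order the chunks reach the file.

    Toy assumption: file position equals write order.
    """
    if interleaved:
        # Slab by slab, emitting every resolution for each slab.
        return [(step, slab) for slab in range(n_slabs) for step in steps]
    # One resolution at a time, each written completely.
    return [(step, slab) for step in steps for slab in range(n_slabs)]


def read_runs(order, step):
    """Count contiguous runs of file positions holding chunks of one step."""
    positions = [i for i, (s, _) in enumerate(order) if s == step]
    runs = 1 if positions else 0
    for a, b in zip(positions, positions[1:]):
        if b != a + 1:
            runs += 1  # a gap means the reader has to skip other data
    return runs


steps = (1, 2, 4)
print(read_runs(write_order(8, steps, interleaved=True), step=2))   # 8
print(read_runs(write_order(8, steps, interleaved=False), step=2))  # 1
```

In this model the interleaved layout splits the step 2 data into one run per slab, while the sequential layout keeps it in a single contiguous run.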

It is important to get the fastest possible read speed for light microscopy data sets that are many Gbytes in size. Try writing each resolution completely, without interleaving. Try a larger chunk size (currently a tiny 64 Kbytes).
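A rough feel for why chunk size matters comes from counting chunks. The sketch below assumes the 1024 x 943 x 999 8-bit step 2 array from this ticket, takes "Kbyte" and "Mbyte" to mean 1024 and 1024 x 1024 bytes, and divides total bytes by chunk bytes. Real chunks are 3D boxes that can overhang the array edges, so these are lower bounds. The helper name `chunk_count` is hypothetical.

```python
import math


def chunk_count(shape, bytes_per_voxel, chunk_bytes):
    """Lower bound on the number of chunks needed to hold an array."""
    total = math.prod(shape) * bytes_per_voxel
    return -(-total // chunk_bytes)  # integer ceiling division


step2_shape = (1024, 943, 999)  # step 2 ratbrain data, 8-bit voxels

print(chunk_count(step2_shape, 1, 64 * 1024))        # 14720
print(chunk_count(step2_shape, 1, 4 * 1024 * 1024))  # 230
```

Under these assumptions a full read touches at least 14720 chunks at 64 Kbytes but only about 230 at 4 Mbytes, so per-chunk overhead is paid 64 times less often.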

Change History (1)

comment:1 by Tom Goddard, 8 years ago

Timings for reading the step 2 ratbrain data (1024 x 943 x 999, 921 Mbytes), not including display time. "sudo purge" was run on Mac OS between tests to clear the disk cache.

cmap file from Chimera: 21.8 sec
sequentially written subsample arrays, 64 Kbyte chunks: 14.2 sec
sequentially written subsample arrays with flush, 64 Kbyte chunks: 14.3 sec
single chunked array in cmap (no subsamples, no full res), 64 Kbyte chunks: 10.15 sec
sequentially written, 4 Mbyte chunks: 9.65 sec
MRC file (8-bit signed): 9.25 sec

Removing the status messages issued every 7% of progress saved 0.3 seconds.

So writing the arrays sequentially saved about 7.5 seconds, and using 4 Mbyte chunks instead of 64 Kbyte chunks saved another 4.5 seconds, which brings the read time to within about 5% of reading an MRC file.
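The savings quoted here follow directly from the measured timings. The short check below uses only the numbers in this comment; the dictionary keys and helper names are hypothetical labels chosen for the example.

```python
# Read times in seconds, copied from the timings in this comment.
timings = {
    "chimera_cmap": 21.8,    # cmap file written by Chimera
    "sequential_64k": 14.2,  # sequentially written, 64 Kbyte chunks
    "sequential_4m": 9.65,   # sequentially written, 4 Mbyte chunks
    "mrc": 9.25,             # MRC file
}


def saved(before, after):
    """Seconds saved going from one configuration to another."""
    return timings[before] - timings[after]


def percent_slower(name, baseline):
    """How much slower one configuration is than a baseline, in percent."""
    return 100.0 * (timings[name] / timings[baseline] - 1.0)


print(f"{saved('chimera_cmap', 'sequential_64k'):.2f}")   # 7.60
print(f"{saved('sequential_64k', 'sequential_4m'):.2f}")  # 4.55
print(f"{percent_slower('sequential_4m', 'mrc'):.1f}")    # 4.3
```

The results agree with the rounded figures above: about 7.5 seconds, about 4.5 seconds, and roughly 4 to 5% slower than MRC.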
