#825 closed defect (fixed)
Incorrect planes read from OME TIFF file
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Volume Data | Version: | |
| Keywords: | | Cc: | danielt@…, Graham.Johnson@… |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Reading an 8-channel OME TIFF file from the Allen Institute (AICS-12_269.ome.tif, dimensions 924 x 624 x 47), with initial display at step 2 and the first 3 channels shown, the last plane (46) of channels 2 and 3 contains the wrong data: it holds the pixel values from plane 44 of the preceding channel.
Debug output shows that the OME TIFF reader seeks to the correct plane, but Pillow 4.2.1 (from PyPI) gives the wrong data.
Changing the OME TIFF reading code so that it reopens the file instead of trying to seek backwards fixes the problem.
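The reopen-instead-of-seek-backwards work-around can be sketched roughly as below. This is a minimal illustration, not the actual ChimeraX code: the class name `ReopeningPlaneReader` and the `open_image` parameter are hypothetical, and the sketch assumes only Pillow's `Image.open()`, `seek()`, `copy()` and `close()`.

```python
class ReopeningPlaneReader:
    """Read 2D planes from a multi-image file, reopening the file
    whenever an earlier plane is requested instead of seeking backwards."""

    def __init__(self, path, open_image=None):
        if open_image is None:
            # Assumption: Pillow is installed; imported lazily so the class
            # can also be exercised with a stand-in opener.
            from PIL import Image
            open_image = Image.open
        self._path = path
        self._open = open_image
        self._image = None
        self._position = -1  # index of the plane last seeked to

    def plane(self, index):
        # Reopen if nothing is open yet or the request goes backwards.
        if self._image is None or index < self._position:
            if self._image is not None:
                self._image.close()
            self._image = self._open(self._path)
        self._image.seek(index)   # forward seeks only from here on
        self._position = index
        return self._image.copy()  # copy of the current frame
```

Every backward request costs a fresh open and a new pass over the plane headers, so this trades speed for avoiding backward seeks.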
Change History (5)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
I put in code to always reopen the TIFF file instead of seeking to an earlier 2D plane in the file. To get faster and less buggy TIFF reading, I should look at using libtiff directly in the future.
comment:3 by , 7 years ago
Bug still exists in PIL 5.4.1.
The work-around of closing and reopening causes a read of 3 channels to open the file and go through the headers of almost all the images 3 times. This is probably hurting performance greatly, possibly making reads 3x slower. Subsequent reads (e.g. step 1) will also repeat the 3 scans of all the plane headers in the TIFF file. Without this work-around the headers would be read just once by PIL TIFFImagePlugin.py.
comment:4 by , 7 years ago
Patched PIL to fix its caching of image offsets and submitted a bug report, test, and patch to PIL github.
Removed the work-around in ChimeraX that reopened the TIFF image when seeking backwards. Much faster read of multichannel data now.
It makes no sense that the wrong plane data occurs as the last plane of channels 2, 3, .... The reader first reads the z planes for channel 1, then jumps back and reads the z planes for channel 2, then jumps back and reads them for channel 3, and so on. The planes in the file for the 8 channels are interleaved. In the multi-image TIFF the first plane is z=0, ch=0, the second plane is z=0, ch=1, and the third plane is z=0, ch=2. The z planes of a given channel are 8 planes apart in the file.
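The interleaved layout and read order described above can be written as index arithmetic. This is a small sketch assuming 0-based z and channel numbers, as in the z=0, ch=0 example; the function names `plane_index` and `read_order` are hypothetical and do not come from the ChimeraX reader.

```python
def plane_index(z, channel, num_channels):
    """File position of a 2D plane when channels are interleaved:
    all channels for z=0 come first, then all channels for z=1, ..."""
    return z * num_channels + channel


def read_order(num_z, num_channels, channels, step=1):
    """File plane indices in the order a channel-by-channel reader
    visits them: every z plane of one channel, then the next channel."""
    order = []
    for c in channels:
        for z in range(0, num_z, step):
            order.append(plane_index(z, c, num_channels))
    return order
```

For 47 z planes, 8 channels and step 2, the first channel ends at file plane 368 and the second channel starts at file plane 1. That transition is the backward jump discussed in this ticket.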
The wrong plane data only comes after seeking forward through all of the channel 1 planes, then jumping backward, then reading forward for 46 planes; only then does the first wrong plane occur. This makes it seem unrelated to seeking backwards. So I don't trust that reopening the file instead of seeking backwards will avoid the Pillow bug in all cases.
The Pillow (Python Imaging Library) multi-image TIFF reader has many problems and seems very poorly implemented. Some of the problems we have encountered in the past are: 1) cannot read pixel types used in microscopy such as unsigned 16-bit int, 2) closes the file unexpectedly after the last plane is read, 3) extremely poor performance (20x slower) copying pixel data to numpy arrays if the most natural syntax is used, 4) does not cache positions of plane headers, so seeking to specific planes performs very poorly, 5) cannot handle multiple Description tags, keeping only the last (this loses the OME XML header data), 6) cannot read some image compression methods (e.g. deflate) unless libtiff is available.
This poor implementation in Pillow is compounded by the TIFF format being poorly suited to saving large image data, for many reasons: 1) the number of 2D images contained cannot be determined without seeking throughout the entire file (the plane headers are in a linked list), which makes opening data slow, 2) there is no support for organizing 2D planes into 3D images (OME adds an XML header in a TIFF tag to work around this), 3) 32-bit file offsets limit it to 4 Gbytes, with the usual work-around of splitting across multiple files (another option is BigTIFF, which is 64-bit but has almost no adoption), 4) there is no standard for saving subsampled planes for handling very large data at multiple resolutions.
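To illustrate point 1, here is a minimal sketch of counting the planes in a classic (32-bit) TIFF by walking the linked list of image file directories (IFDs). The function name `count_tiff_planes` is hypothetical. The sketch assumes a well-formed file, does not guard against cyclic offsets, and does not handle BigTIFF.

```python
import struct


def count_tiff_planes(f):
    """Count IFDs (one per 2D image) in a classic TIFF opened in binary mode.

    Each IFD stores the offset of the next one, so the count is only
    known after visiting every IFD in the file.
    """
    f.seek(0)
    header = f.read(8)
    if header[:2] == b"II":
        endian = "<"   # little-endian byte order
    elif header[:2] == b"MM":
        endian = ">"   # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")

    count = 0
    while offset != 0:                  # offset 0 marks the end of the list
        f.seek(offset)
        (num_entries,) = struct.unpack(endian + "H", f.read(2))
        f.seek(num_entries * 12, 1)     # skip the 12-byte tag entries
        (offset,) = struct.unpack(endian + "I", f.read(4))
        count += 1
    return count
```

Each loop iteration needs a seek and two small reads, so a file with thousands of planes needs thousands of seeks before its size is known. Caching the IFD offsets found during such a scan is what makes later seeks to specific planes cheap.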