Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#2937 closed defect (fixed)

Segmentation fault in Point_List::bounding_box() for large models (?)

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: major Milestone:
Component: Core Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

I've been seeing a quite intermittent segmentation fault lately when associating a model with a cryo-EM map in Clipper, but every time I attempted to capture it with gdb it would stubbornly refuse to manifest (until now). I suspect it's more common for larger models, but I've seen it happen once during ISOLDE's flexible-fitting tutorial with 6mhz (910 residues). Anyway, I was finally able to reproduce it (at least, I assume it's the same crash) with:

isolde start
open 6q3g
open 4459 from emdb
clipper assoc #2 to #1

Note that this is a staggeringly large model/map (1 billion voxels) so be patient.

Stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffec58aaca in Point_List::bounding_box() const [clone .part.9] () from /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
Missing separate debuginfos, use: debuginfo-install ucsf-chimerax-0.92-1.el7.x86_64
(gdb) bt
#0  0x00007fffec58aaca in Point_List::bounding_box() const [clone .part.9] () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#1  0x00007fffec58c9c6 in find_close_points_boxes(Point_List const&, Point_List const&, float, Index_Set&, Index_Set&, Nearest_Points*) ()
    at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#2  0x00007fffec58d3af in find_close_points_boxes(Point_List const&, Point_List const&, float, Index_Set&, Index_Set&, Nearest_Points*) ()
    at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#3  0x00007fffec58fd98 in find_close_points(Close_Points_Method, float const*, int, float const*, int, float, float*, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*) () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#4  0x00007fffec592cd7 in find_close_points () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#5  0x00007ffff7906223 in _PyMethodDef_RawFastCallKeywords (method=0x7fffec5aa960 <Geometry_Cpp::geometry_cpp_methods+224>, self=<optimized out>, args=0x7ffe640c4318, nargs=3, kwnames=<optimized out>)
    at Objects/call.c:694
#6  0x00007ffff79062a5 in _PyCFunction_FastCallKeywords (func=0x7fffe56dfc30, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:734
#7  0x00007ffff78df9ca in _PyEval_EvalFrameDefault (kwnames=0x0, oparg=3, pp_stack=<synthetic pointer>) at Python/ceval.c:4568
#8  0x00007ffff78df9ca in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3124
#9  0x00007ffff79f1f7e in _PyEval_EvalCodeWithName (_co=0x7fff94b4ced0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7fff94b41468, kwargs=0x7ffe6b746bf0, kwcount=1, kwstep=1, defs=0x7fff94b456a8, defcount=3, kwdefs=0x0, closure=0x0, name=0x7fff94b41470, qualname=0x7fff94b3fb20) at Python/ceval.c:3930
#10 0x00007ffff7905d05 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:433
#11 0x00007ffff78dfda1 in _PyEval_EvalFrameDefault (kwnames=0x7fff94b41450, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#12 0x00007ffff78dfda1 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3139
#13 0x00007ffff79f1f7e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fff94b4cc00, globals=globals@entry=0x7fff3b491f50, locals=locals@entry=0x0, args=args@entry=0x7fffffff93b0, argcount=argcount@entry=3, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x7fff94b45658, defcount=3, kwdefs=0x0, closure=0x7fffac127610, name=0x7ffff7f944f0, qualname=0x7fff94b3fa30) at Python/ceval.c:3930
#14 0x00007ffff7905aa5 in _PyFunction_FastCallDict (func=0x7fffac12ccb0, args=0x7fffffff93b0, nargs=3, kwargs=0x0) at Objects/call.c:376
#15 0x00007ffff7906e22 in _PyObject_Call_Prepend (callable=callable@entry=0x7fffac12ccb0, obj=obj@entry=0x7ffd40ef0050, args=args@entry=0x7ffe5d5cd2d0, kwargs=kwargs@entry=0x0) at Objects/call.c:908
#16 0x00007ffff796acd5 in slot_tp_init (self=0x7ffd40ef0050, args=0x7ffe5d5cd2d0, kwds=0x0) at Objects/typeobject.c:6636
...

Change History (9)

in reply to:  1 ; comment:1 by Tristan Croll, 6 years ago

I believe this would be where in my code `find_close_points()` is being 
called: 
https://github.com/tristanic/chimerax-clipper/blob/07b4e7de7203558f3e4c5903cf1dbbdafd1ebe24/src/util.py#L114.

On 2020-03-09 13:42, ChimeraX wrote:

comment:2 by Tristan Croll, 6 years ago

Infuriatingly difficult one to track down. I've repeated the first dozen or so steps from ISOLDE's bulk fitting tutorial close to a hundred times over and seen it crash exactly twice - and never when I was actually looking for it. The second was actually in the middle of a demonstration, which was a bit embarrassing... May have to try going through it with Valgrind.

comment:3 by pett, 6 years ago

Owner: changed from pett to Tom Goddard

comment:4 by Tom Goddard, 6 years ago

I reproduced this crash. It is almost surely because find_close_points() can at most handle 231 / 3 < 700 million points because it is using 32-bit signed integer indices into the xyz coordinate array. Since this example has 1 billion points it crashes.

The find close points code should either check the limit and raise an error if it is handed too many points instead of crashing, or probably should really use 64-bit integers so it can handle any size.

That said the clipper/util.py that is calling this

data = numpy.array(volume.data.matrix(), order='C')
close_i = find_close_points(model.atoms.scene_coords, grid_coords, mask_radius)[1]
close_vals = data.ravel()[close_i]

is staggeringly inefficient. It is finding the map values at the closest grid points by making a 12 Gbyte list of the grid points and then running an extremely slow calculation to find the nearest ones to the atoms. There is a volume interpolate_values() method that can find the values at the nearest grid points I'd guess about 1000x faster by simply finding where each atom sits in the grid and getting the value at that grid point.

in reply to:  5 ; comment:5 by Tristan Croll, 6 years ago

Hmm... so that means this is almost certainly different to the other 
intermittent crash I see. Bummer.

Point taken about inefficiency - but the aim of the method isn't just to 
get the gradients at the *closest* grid points, but at *all* grid points 
within a cutoff distance. This is to account for the fact that the model 
may still be quite bad, with many atoms outside of density - so simply 
finding the gradient at each atom could be quite misleading. That being 
said, you've made me realise that I've already tackled almost exactly 
this problem in optimising Clipper's map re-zoning, and in fact at the 
point this method is called I already have a binary mask covering the 
region of interest. Will rewrite to take advantage of that.

On 2020-03-10 05:31, ChimeraX wrote:

in reply to:  6 ; comment:6 by Tristan Croll, 6 years ago

OK, I've done some refactoring to guess_suitable_contour() to remove the 
dependence on find_close_points(). It's still not exactly *fast*, but 
about 3x faster than before. Seems fast enough, given that this method 
is generally only run once for each map. Yes, it would be *much* faster 
to just interpolate the gradients at each atomic position, but I want 
this method to give a robust result even when the starting model is 
*really* bad.

On 2020-03-10 09:36, ChimeraX wrote:

in reply to:  7 ; comment:7 by Tristan Croll, 6 years ago

FWIW, I've run ChimeraX.valgrind through all the steps involved in 
seeing the crash when it happens, and it gives an apparently clean bill 
of health. For that matter, I haven't been able to trigger a crash on my 
machine (other than the big-map case) for the last two days. Mysterious.

On 2020-03-10 09:36, ChimeraX wrote:

comment:8 by Tom Goddard, 6 years ago

Resolution: fixed
Status: assignedclosed

Fixed.

I made find_close_points() handle large map sizes using 64-bit indices instead of 32-bit.

Your example case takes 60 seconds to find 20 million grid points. I don't think this very slow method is giving a better result -- for a large structure like this 1 million atom example faster interpolation 1 million more or less random samples where the molecule is placed.

in reply to:  9 ; comment:9 by Tristan Croll, 6 years ago

OK, I went ahead and did some more optimisation - new code at 
https://github.com/tristanic/chimerax-clipper/blob/8e4f494eefd5d8b0f799ddbe9aa863cd96aa3694/src/util.py#L89. 
Basically, I now calculate a 3A radius mask around the model on a 1.5A 
grid, then use the interpolated map values where mask==1. Brings this 
scenario down to 3.7 seconds.

Turns out I was also using essentially the same logic to guess a 
suitable MDFF coupling constant for the map (just using gradients rather 
than absolute values). I've just reworked it to just use the atom 
coordinates, randomly perturbed by up to 0.25A along each axis. Might 
still switch to using the mask-based approach, but the result seems 
about the same as before for a few different scenarios.

Anyway, thanks for pointing this out. Never really came up before, 
because for "ordinary" models/maps the old approach was still fast 
enough to not generally be an issue.

On 2020-03-11 01:51, ChimeraX wrote:
Note: See TracTickets for help on using tickets.