#2937 closed defect (fixed)
Segmentation fault in Point_List::bounding_box() for large models (?)
| Reported by: | Tristan Croll | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Core | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
I've been seeing an intermittent segmentation fault lately when associating a model with a cryo-EM map in Clipper, but every time I attempted to capture it with gdb it would stubbornly refuse to manifest (until now). I suspect it's more common for larger models, but I've seen it happen once during ISOLDE's flexible-fitting tutorial with 6mhz (910 residues). Anyway, I was finally able to reproduce it (at least, I assume it's the same crash) with:
isolde start
open 6q3g
open 4459 from emdb
clipper assoc #2 to #1
Note that this is a staggeringly large model/map (1 billion voxels) so be patient.
Stack trace:
Program received signal SIGSEGV, Segmentation fault.
0x00007fffec58aaca in Point_List::bounding_box() const [clone .part.9] ()
   from /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
Missing separate debuginfos, use: debuginfo-install ucsf-chimerax-0.92-1.el7.x86_64
(gdb) bt
#0  0x00007fffec58aaca in Point_List::bounding_box() const [clone .part.9] () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#1  0x00007fffec58c9c6 in find_close_points_boxes(Point_List const&, Point_List const&, float, Index_Set&, Index_Set&, Nearest_Points*) () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#2  0x00007fffec58d3af in find_close_points_boxes(Point_List const&, Point_List const&, float, Index_Set&, Index_Set&, Nearest_Points*) () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#3  0x00007fffec58fd98 in find_close_points(Close_Points_Method, float const*, int, float const*, int, float, float*, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*) () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#4  0x00007fffec592cd7 in find_close_points () at /opt/UCSF/ChimeraX/lib/python3.7/site-packages/chimerax/core/geometry/_geometry.so
#5  0x00007ffff7906223 in _PyMethodDef_RawFastCallKeywords (method=0x7fffec5aa960 <Geometry_Cpp::geometry_cpp_methods+224>, self=<optimized out>, args=0x7ffe640c4318, nargs=3, kwnames=<optimized out>) at Objects/call.c:694
#6  0x00007ffff79062a5 in _PyCFunction_FastCallKeywords (func=0x7fffe56dfc30, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:734
#7  0x00007ffff78df9ca in _PyEval_EvalFrameDefault (kwnames=0x0, oparg=3, pp_stack=<synthetic pointer>) at Python/ceval.c:4568
#8  0x00007ffff78df9ca in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3124
#9  0x00007ffff79f1f7e in _PyEval_EvalCodeWithName (_co=0x7fff94b4ced0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7fff94b41468, kwargs=0x7ffe6b746bf0, kwcount=1, kwstep=1, defs=0x7fff94b456a8, defcount=3, kwdefs=0x0, closure=0x0, name=0x7fff94b41470, qualname=0x7fff94b3fb20) at Python/ceval.c:3930
#10 0x00007ffff7905d05 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:433
#11 0x00007ffff78dfda1 in _PyEval_EvalFrameDefault (kwnames=0x7fff94b41450, oparg=<optimized out>, pp_stack=<synthetic pointer>) at Python/ceval.c:4616
#12 0x00007ffff78dfda1 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3139
#13 0x00007ffff79f1f7e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fff94b4cc00, globals=globals@entry=0x7fff3b491f50, locals=locals@entry=0x0, args=args@entry=0x7fffffff93b0, argcount=argcount@entry=3, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x7fff94b45658, defcount=3, kwdefs=0x0, closure=0x7fffac127610, name=0x7ffff7f944f0, qualname=0x7fff94b3fa30) at Python/ceval.c:3930
#14 0x00007ffff7905aa5 in _PyFunction_FastCallDict (func=0x7fffac12ccb0, args=0x7fffffff93b0, nargs=3, kwargs=0x0) at Objects/call.c:376
#15 0x00007ffff7906e22 in _PyObject_Call_Prepend (callable=callable@entry=0x7fffac12ccb0, obj=obj@entry=0x7ffd40ef0050, args=args@entry=0x7ffe5d5cd2d0, kwargs=kwargs@entry=0x0) at Objects/call.c:908
#16 0x00007ffff796acd5 in slot_tp_init (self=0x7ffd40ef0050, args=0x7ffe5d5cd2d0, kwds=0x0) at Objects/typeobject.c:6636
...
Change History (9)
comment:2 by , 6 years ago
Infuriatingly difficult one to track down. I've repeated the first dozen or so steps from ISOLDE's bulk fitting tutorial close to a hundred times over and seen it crash exactly twice - and never when I was actually looking for it. The second was actually in the middle of a demonstration, which was a bit embarrassing... May have to try going through it with Valgrind.
comment:3 by , 6 years ago
Owner: | changed from | to
---|
comment:4 by , 6 years ago
I reproduced this crash. It is almost surely because find_close_points() can handle at most 2^31 / 3 (roughly 700 million) points, since it uses 32-bit signed integer indices into the flat xyz coordinate array. Since this example has 1 billion points, it crashes.
The find-close-points code should either check this limit and raise an error when handed too many points instead of crashing, or, better, use 64-bit integers so it can handle any size.
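The overflow guard can be sketched in plain Python (this is not the actual C++ `_geometry` code; the function name and signature are illustrative):

```python
import numpy as np

INT32_MAX = 2**31 - 1

def check_point_count(points: np.ndarray) -> None:
    """Refuse point arrays too large for 32-bit signed indexing.

    With 32-bit signed indices into a flat xyz float array, only
    INT32_MAX // 3 (~715 million) points are addressable; beyond that
    the index arithmetic overflows and reads out of bounds.
    """
    limit = INT32_MAX // 3
    if points.shape[0] > limit:
        raise ValueError(
            f"{points.shape[0]} points exceeds the 32-bit index limit "
            f"of {limit}; a 64-bit index build is needed")

# A normal-sized array passes the check:
check_point_count(np.zeros((1000, 3), dtype=np.float32))
# A billion-point array (here a zero-copy broadcast view, so nothing is
# actually allocated) raises ValueError instead of crashing:
big = np.broadcast_to(np.float32(0.0), (1_000_000_000, 3))
```

The 64-bit fix the comment suggests is strictly better, of course; the check only turns a crash into a clean error.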
That said, the code in clipper/util.py that is calling this:
data = numpy.array(volume.data.matrix(), order='C')
close_i = find_close_points(model.atoms.scene_coords, grid_coords, mask_radius)[1]
close_vals = data.ravel()[close_i]
is staggeringly inefficient. It finds the map values at the closest grid points by materialising a 12 Gbyte array of all the grid points and then running an extremely slow search for the ones nearest the atoms. There is a volume interpolate_values() method that can find the values at the nearest grid points, I'd guess about 1000x faster, by simply finding where each atom sits in the grid and reading the value at that grid point.
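The direct-lookup idea can be sketched in plain numpy (this is not ChimeraX's interpolate_values() itself; the helper name and the assumption of an axis-aligned grid described by an origin and isotropic step are illustrative):

```python
import numpy as np

def values_at_nearest_grid_points(data, atom_coords, origin, step):
    """Return the map value at the grid point nearest each atom.

    data        : 3-D map array indexed [k, j, i] (z, y, x)
    atom_coords : (n, 3) xyz coordinates in the grid's frame
    origin, step: grid origin and isotropic spacing in the same frame
    """
    ijk = np.rint((np.asarray(atom_coords) - origin) / step).astype(np.int64)
    # Clamp atoms lying outside the map onto its boundary.
    shape_xyz = np.array(data.shape[::-1])
    ijk = np.clip(ijk, 0, shape_xyz - 1)
    # One fancy-indexing pass instead of an O(atoms x grid) search.
    return data[ijk[:, 2], ijk[:, 1], ijk[:, 0]]

grid = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
vals = values_at_nearest_grid_points(grid, [[1.1, 1.9, 0.2]],
                                     origin=0.0, step=1.0)
# nearest grid point is (i, j, k) = (1, 2, 0), i.e. grid[0, 2, 1]
```

The cost is one arithmetic pass over the atoms, independent of the number of voxels, which is where the rough 1000x figure comes from.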
comment:5 by , 6 years ago
Hmm... so that means this is almost certainly different from the other intermittent crash I see. Bummer.

Point taken about the inefficiency - but the aim of the method isn't just to get the gradients at the *closest* grid points, but at *all* grid points within a cutoff distance. This is to account for the fact that the model may still be quite bad, with many atoms outside of density - so simply finding the gradient at each atom could be quite misleading. That said, you've made me realise that I've already tackled almost exactly this problem in optimising Clipper's map re-zoning, and in fact at the point this method is called I already have a binary mask covering the region of interest. I'll rewrite to take advantage of that.
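Where such a precomputed binary mask exists, the close-point search collapses to boolean indexing - a generic numpy sketch with stand-in arrays, not the actual Clipper data structures:

```python
import numpy as np

# Hypothetical stand-ins: a map array and a same-shaped binary mask
# that already covers the region of interest around the model.
rng = np.random.default_rng(0)
data = rng.random((4, 4, 4)).astype(np.float32)
mask = np.zeros(data.shape, dtype=bool)
mask[1:3, 1:3, 1:3] = True          # a 2x2x2 region of interest

# Selecting the values under the mask replaces the whole
# find_close_points() search over grid coordinates:
close_vals = data[mask]             # one value per masked voxel
```

Boolean indexing walks both arrays once in memory order, so the cost scales with the map size rather than atoms-times-voxels.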
comment:6 by , 6 years ago
OK, I've done some refactoring of guess_suitable_contour() to remove the dependence on find_close_points(). It's still not exactly *fast*, but it's about 3x faster than before. That seems fast enough, given that this method is generally only run once for each map. Yes, it would be *much* faster to just interpolate the gradients at each atomic position, but I want this method to give a robust result even when the starting model is *really* bad.
comment:7 by , 6 years ago
FWIW, I've run ChimeraX under Valgrind through all the steps involved in seeing the crash when it happens, and it gives an apparently clean bill of health. For that matter, I haven't been able to trigger a crash on my machine (other than the big-map case) for the last two days. Mysterious.
comment:8 by , 6 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Fixed.
I made find_close_points() handle large map sizes using 64-bit indices instead of 32-bit.
Your example case takes 60 seconds to find 20 million grid points. I don't think this very slow method gives a better result -- for a large structure like this 1-million-atom example, the faster interpolation already samples a million more-or-less random grid points where the molecule is placed.
comment:9 by , 6 years ago
OK, I went ahead and did some more optimisation - new code at https://github.com/tristanic/chimerax-clipper/blob/8e4f494eefd5d8b0f799ddbe9aa863cd96aa3694/src/util.py#L89. Basically, I now calculate a 3 Å-radius mask around the model on a 1.5 Å grid, then use the interpolated map values where mask==1. That brings this scenario down to 3.7 seconds.

It turns out I was also using essentially the same logic to guess a suitable MDFF coupling constant for the map (just using gradients rather than absolute values). I've reworked that to use the atom coordinates directly, randomly perturbed by up to 0.25 Å along each axis. I might still switch to the mask-based approach, but the result seems about the same as before across a few different scenarios. Anyway, thanks for pointing this out - it never really came up before, because for "ordinary" models/maps the old approach was still fast enough not to be an issue.
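The mask-then-select scheme described above can be sketched in plain numpy on an idealised isotropic grid (the function and its signature are illustrative, not the chimerax-clipper API at the link):

```python
import numpy as np

def map_values_near_model(data, atom_coords, origin, step, radius):
    """Mark every grid point within `radius` of any atom, then return
    the map values under that mask.

    data: 3-D map indexed [k, j, i]; origin and (isotropic) step place
    it in the atoms' frame, e.g. a 1.5 A grid with a 3 A radius. The
    mask is built by stamping a precomputed spherical footprint around
    each atom's nearest grid point.
    """
    r = int(np.ceil(radius / step))
    off = np.arange(-r, r + 1)
    dk, dj, di = np.meshgrid(off, off, off, indexing='ij')
    sphere = (di**2 + dj**2 + dk**2) * step**2 <= radius**2
    foot = np.stack(np.nonzero(sphere), axis=1) - r   # offsets, (k, j, i)
    centers = np.rint((np.asarray(atom_coords) - origin) / step).astype(np.int64)
    mask = np.zeros(data.shape, dtype=bool)
    shape = np.array(data.shape)
    for c in centers:
        pts = foot + c[::-1]                          # (k, j, i) order
        ok = np.all((pts >= 0) & (pts < shape), axis=1)
        p = pts[ok]
        mask[p[:, 0], p[:, 1], p[:, 2]] = True
    return data[mask]

# One atom at the centre of a 5^3 grid with radius 1 touches the centre
# voxel and its six face neighbours (7 values):
vals = map_values_near_model(np.arange(125.0).reshape(5, 5, 5),
                             [[2.0, 2.0, 2.0]],
                             origin=0.0, step=1.0, radius=1.0)
```

The footprint is computed once and reused for every atom, so the cost is roughly atoms times footprint size rather than atoms times total voxels.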