#2249 closed defect (can't reproduce)
Occasional segmentation fault in MatchMaker?
Reported by: | Tristan Croll | Owned by: | pett |
---|---|---|---|
Priority: | moderate | Milestone: | |
Component: | Structure Comparison | Version: | |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | Linux64 (X11) | |
Project: | ChimeraX |
Description
I'm running a script over a very large number of models (re-running an analysis on the CASP13 results in response to a reviewer), and getting an occasional, apparently random segmentation fault. I managed to catch it once with GDB (the top of the traceback is copied below)... it looks like something might be racing the garbage collector and occasionally losing? In any case, it doesn't seem to be related to any specific file.
Below is the method it appears to crash in. I only added the logging in an attempt to identify the "problem" file, but since adding it the script has run over a few hundred alignments without issue. It might be that the act of logging is slowing things down in just the right way, or it may be that the crash is just that rare (the same script ran over a few thousand models in a separate ChimeraX instance without a hitch).
```python
logfile = open('align.log', 'wt')

def align(session, target, model, cutoff_distance=3, logfile=logfile):
    logfile.write('Aligning {} to {}\n'.format(model.name, target.name))
    logfile.flush()
    from chimerax.match_maker.match import match, defaults
    result = match(session, defaults['chain_pairing'], (target, [model]),
                   defaults['matrix'], defaults['alignment_algorithm'],
                   defaults['gap_open'], defaults['gap_extend'],
                   cutoff_distance=cutoff_distance,
                   always_raise_errors=True)[0]
    # Returned arrays of aligned atoms are paired, but are in random order with
    # respect to the actual models. Re-sort them before returning.
    import numpy
    model_atoms = result[0]
    target_atoms = result[1]
    sort_order = numpy.argsort(target_atoms.residues.numbers)
    model_atoms = model_atoms[sort_order]
    target_atoms = target_atoms[sort_order]
    return (model_atoms, target_atoms, *result[2:])
```
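One thing I might try, on the hunch above that something is racing the garbage collector: force a collection at a known point between models and see whether the crash disappears or becomes reproducible. The sketch below is only that experiment, not the actual analysis loop; `align_all`, `templates` and the bookkeeping around them are placeholders.

```python
# Sketch of a GC experiment only -- not the real analysis loop. `templates`
# stands in for the list of already-trimmed template models.
import gc

def align_all(session, target, templates, cutoff_distance=3):
    results = {}
    for template in templates:
        results[template.name] = align(session, target, template,
                                       cutoff_distance=cutoff_distance)
        # Collect at a known-safe point between models; if the segfault is a
        # race with the cyclic garbage collector, this should either make it
        # go away or make it happen reliably right here.
        gc.collect()
    return results
```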
```
Program received signal SIGSEGV, Segmentation fault.
PyObject_CallFinalizer (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:288
288     Objects/object.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install ucsf-chimerax-daily-2019.07.13-1.el7.x86_64
(gdb) bt
#0  0x00007ffff794c49a in PyObject_CallFinalizer (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:288
#1  0x00007ffff794c4c6 in PyObject_CallFinalizerFromDealloc (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:303
#2  0x00007ffff795e8e1 in subtype_dealloc (self=0x7fff3e7d5b50) at Objects/typeobject.c:1207
#3  0x00007ffff7938c27 in free_keys_object (keys=0x7fff3e6555b0) at Objects/dictobject.c:559
#4  0x00007ffff79396b0 in dict_dealloc (mp=0x7fff3e620dc0) at Objects/dictobject.c:1913
#5  0x00007ffff79246f7 in func_dealloc (op=0x7fff3e676f80) at Objects/funcobject.c:537
#6  0x00007ffff7939687 in dict_dealloc (mp=0x7fff3e64d4b0) at Objects/dictobject.c:1905
#7  0x00007ffff795ec8d in subtype_dealloc (self=0x7fff3d6baed0) at Objects/typeobject.c:1263
#8  0x00007ffff7955114 in set_clear_internal (so=<optimized out>) at Objects/setobject.c:507
#9  0x00007ffff79551d9 in set_clear (so=<optimized out>) at Objects/setobject.c:1193
#10 0x00007ffff79071d0 in _PyMethodDef_RawFastCallKeywords (method=0x7ffff7d67e20 <set_methods+32>, self=self@entry=0x7fffb70bf6e0, args=args@entry=0x7fffe73f4f80, nargs=nargs@entry=0, kwnames=kwnames@entry=0x0) at Objects/call.c:633
#11 0x00007ffff7910319 in _PyMethodDescr_FastCallKeywords (descrobj=0x7ffff7fb0b90, args=args@entry=0x7fffe73f4f78, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/descrobject.c:288
#12 0x00007ffff78df94a in _PyEval_EvalFrameDefault (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>) at Python/ceval.c:4593
#13 0x00007ffff78df94a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#14 0x00007ffff78d67c0 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:283
#15 0x00007ffff7906d99 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
```
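If it happens again, it would also help to know which Python frames were live at the time, not just the C-level ones. A minimal addition I can make to the top of the script (this is just the standard-library faulthandler, nothing ChimeraX-specific; the log file name is arbitrary):

```python
# Dump the Python tracebacks of all threads to a file if the process receives
# a fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL). The file object
# must stay open for the lifetime of the process.
import faulthandler

crash_log = open('crash_traceback.log', 'w')
faulthandler.enable(file=crash_log, all_threads=True)
```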
Change History (6)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
I'm also seeing a slow but steady memory leak: memory usage has gone from ~700 MB at initial startup to ~8.6 GB after working through about 16k models. Given the complexity of the script I'm running, there are many possible culprits. Once I'm done with the CASP analysis I'll try running under Valgrind to see if it can pick anything up.
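Before reaching for Valgrind (which will mostly report C/C++-level allocations), a cheaper first pass might be the standard-library tracemalloc, run from the ChimeraX Python shell while the script works through a batch. The snapshot spacing and the 20-entry limit below are arbitrary choices on my part:

```python
# Sketch: compare heap snapshots taken some number of models apart to see
# which Python allocation sites are growing.
import tracemalloc

tracemalloc.start(25)                    # keep 25 frames per allocation
baseline = tracemalloc.take_snapshot()

# ... let the script run a few hundred alignments ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, 'lineno')[:20]:
    print(stat)
```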
comment:3 by , 6 years ago
Status: | assigned → accepted |
---|---|
Well, I have a vague idea where a memory leak could be (Chain/Sequence destruction), but I wouldn't expect it to be anywhere near as big as what you're seeing (~500K/model). I'll look into that anyway.
And yeah, matchmaker does a million things so it's going to be hard to make any progress with no reproducibility.
--Eric
comment:4 by , 6 years ago
I suspect the memory leak may be in Matplotlib. I'm creating (and saving to .png) one fairly complex figure per file, and from what I gather online Matplotlib is pretty bad about releasing memory unless the figure is closed in *exactly* the right way. But as I said, I'll check with Valgrind once the manuscript's safely resubmitted.
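For the record, the pattern I understand to be safe (based on the Matplotlib docs; I haven't yet verified it fixes anything here) is to hold an explicit Figure handle, save it, and then close that specific figure rather than relying on pyplot's implicit current-figure state. The function name and plot contents below are placeholders:

```python
import matplotlib.pyplot as plt

def save_plot(scores, png_path):
    # Create the figure explicitly so there is a handle to close afterwards.
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(scores)
    ax.set_xlabel('Residue number')
    ax.set_ylabel('Score')
    fig.savefig(png_path, dpi=150)
    # Closing this specific figure lets pyplot drop its internal reference,
    # which is the part that is easy to get wrong in a long batch run.
    plt.close(fig)
```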
comment:5 by , 6 years ago
Resolution: | → can't reproduce |
---|---|
Status: | accepted → closed |
Okay, I did fix _a_ memory leak having to do with chains/sequences, but the leak only occurred if you actually showed a sequence/alignment in a viewer, which I don't think is applicable to the scenario in this ticket.
Would need much more reproducibility to have any hope of fixing the crash.
comment:6 by , 6 years ago
Agreed regarding the crash. Regarding the memory leak, I suspect it might be my fault, but it will probably be awful to track down. Using ISOLDE in a continuous session for a couple of days on a ~16k residue model (4v8r), with live maps and validation and lots of stopping and starting of simulations, my memory usage steadily increased from ~4 GB to ~16 GB. It's not really big enough to be an urgent problem (it's *much* more modest when working with smaller models), but it's something I'll have to deal with at some point once some more pressing challenges are out of the way.
For what it's worth, here's the larger loop that `align()` was being called from when it crashed: `target` is a single-chain model, and `templates` is a list of models loaded from the PDB, already trimmed back to the chain named as the template. I suspect that this will be *really* hard to track down without a reliably reproducible test case - really just flagging it here so it's on the radar if similar issues crop up.