Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#2249 closed defect (can't reproduce)

Occasional segmentation fault in MatchMaker?

Reported by: Tristan Croll
Owned by: pett
Priority: moderate
Milestone:
Component: Structure Comparison
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: Linux64 (X11)
Project: ChimeraX

Description

I'm running a script over a very large number of models (re-running an analysis on the CASP13 results in response to a reviewer) and getting occasional, apparently random segmentation faults. I managed to catch one with GDB (top of the backtrace copied below)... it looks like something might be racing the garbage collector and occasionally losing? In any case, it doesn't seem to be related to any specific file.

Below is the method it appears to crash in. I just added the logging in an attempt to determine the "problem" file, but since adding it, the script has run over a few hundred alignments without issue. It might be that the act of logging slows things down in just the right way, or the crash may simply be that rare (the same script ran over a few thousand models in a separate ChimeraX instance without a hitch).

logfile = open('align.log', 'wt')

def align(session, target, model, cutoff_distance=3, logfile=logfile):
    logfile.write('Aligning {} to {}\n'.format(model.name, target.name))
    logfile.flush()
    from chimerax.match_maker.match import match, defaults
    result = match(session, defaults['chain_pairing'], (target, [model]),
        defaults['matrix'], defaults['alignment_algorithm'],
        defaults['gap_open'], defaults['gap_extend'],
        cutoff_distance=cutoff_distance, always_raise_errors=True)[0]

    # Returned arrays of aligned atoms are paired, but are in random order with
    # respect to the actual models. Re-sort them before returning
    import numpy
    model_atoms = result[0]
    target_atoms = result[1]
    sort_order = numpy.argsort(target_atoms.residues.numbers)
    model_atoms = model_atoms[sort_order]
    target_atoms = target_atoms[sort_order]
    return (model_atoms, target_atoms, *result[2:])
Program received signal SIGSEGV, Segmentation fault.
PyObject_CallFinalizer (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:288
288	Objects/object.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install ucsf-chimerax-daily-2019.07.13-1.el7.x86_64
(gdb) bt
#0  0x00007ffff794c49a in PyObject_CallFinalizer (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:288
#1  0x00007ffff794c4c6 in PyObject_CallFinalizerFromDealloc (self=self@entry=0x7fff3e7d5b50) at Objects/object.c:303
#2  0x00007ffff795e8e1 in subtype_dealloc (self=0x7fff3e7d5b50) at Objects/typeobject.c:1207
#3  0x00007ffff7938c27 in free_keys_object (keys=0x7fff3e6555b0) at Objects/dictobject.c:559
#4  0x00007ffff79396b0 in dict_dealloc (mp=0x7fff3e620dc0) at Objects/dictobject.c:1913
#5  0x00007ffff79246f7 in func_dealloc (op=0x7fff3e676f80) at Objects/funcobject.c:537
#6  0x00007ffff7939687 in dict_dealloc (mp=0x7fff3e64d4b0) at Objects/dictobject.c:1905
#7  0x00007ffff795ec8d in subtype_dealloc (self=0x7fff3d6baed0) at Objects/typeobject.c:1263
#8  0x00007ffff7955114 in set_clear_internal (so=<optimized out>) at Objects/setobject.c:507
#9  0x00007ffff79551d9 in set_clear (so=<optimized out>) at Objects/setobject.c:1193
#10 0x00007ffff79071d0 in _PyMethodDef_RawFastCallKeywords (method=0x7ffff7d67e20 <set_methods+32>, self=self@entry=0x7fffb70bf6e0, args=args@entry=0x7fffe73f4f80, nargs=nargs@entry=0, kwnames=kwnames@entry=0x0) at Objects/call.c:633
#11 0x00007ffff7910319 in _PyMethodDescr_FastCallKeywords (descrobj=0x7ffff7fb0b90, args=args@entry=0x7fffe73f4f78, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/descrobject.c:288
#12 0x00007ffff78df94a in _PyEval_EvalFrameDefault (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>) at Python/ceval.c:4593
#13 0x00007ffff78df94a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3110
#14 0x00007ffff78d67c0 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>)
    at Objects/call.c:283
#15 0x00007ffff7906d99 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Objects/call.c:415
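If finalizer timing really is the problem, one way to test the hypothesis (a debugging sketch I haven't yet tried against this crash; `run_without_gc` is my own helper name) is to disable the automatic collector around each call and collect only at a single, predictable point:

```python
import gc

def run_without_gc(fn, *args, **kwargs):
    """Call fn with automatic garbage collection disabled, collecting
    once afterwards. If the segfaults stop under this wrapper, object
    finalization racing the C++ layer is implicated; if they persist,
    GC timing is probably not the cause."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return fn(*args, **kwargs)
    finally:
        # Collect at one known-safe point instead of mid-call.
        gc.collect()
        if was_enabled:
            gc.enable()
```

Usage would be `result = run_without_gc(align, session, target, t, cutoff_distance=3)` in the loop below.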

Change History (6)

comment:1 by Tristan Croll, 6 years ago

For what it's worth, here's the larger loop that align() was being called from when it crashed. target is a single-chain model; templates is a list of models loaded from the PDB, already trimmed back to the chain named as the template. I suspect this will be *really* hard to track down without a reliably reproducible test case - I'm just flagging it here so it's on the radar if similar issues crop up.

def find_and_trim_matching_templates(session, target, templates, model_to_parents, cutoff_distance = 3,
        match_threshold = 0.4):
    from chimerax.core.commands import save as cxsave
    import os
    aligned_template_dirname = 'aligned_and_trimmed_templates'
    if not os.path.exists(aligned_template_dirname):
        os.mkdir(aligned_template_dirname)
    for t in reversed(templates):
        t.residues.ribbon_displays = False
        t.atoms[t.atoms.element_names == 'H'].delete()
        t.atoms.displays = True
        try:
            overall_result = align(session, target, t, cutoff_distance=None)
            best_result = align(session, target, t, cutoff_distance=cutoff_distance)
        except:
            templates.remove(t)
            try:
                print(t.id_string)
                session.models.close([t])
            except:
                print('Failed on template {} before alignment'.format(t))
                return t
            continue
        aligned_atoms = best_result[0]
        if len(aligned_atoms) < len(target.residues)*match_threshold:
            templates.remove(t)
            try:
                print('Closing template {} due to too few residues overlapping with target'.format(t.id_string))
                session.models.close([t])
                continue
            except:
                print('Failed on template {} after pruning'.format(t))
                return t
        # Delete everything in the template more than 20 residues outside the aligned region
        all_aligned_atoms = overall_result[0]
        aligned_residue_numbers = all_aligned_atoms.residues.numbers
        rmin = min(aligned_residue_numbers)-20
        rmax = max(aligned_residue_numbers)+20
        import numpy
        t_nums = t.residues.numbers
        delete_mask = numpy.logical_or(t_nums<rmin, t_nums>rmax)
        t.residues[delete_mask].delete()
        t.alignment_to_target = overall_result
        cxsave.save(session, os.path.join(aligned_template_dirname, t.name)+'.pdb', models=[t], rel_model=target)

comment:2 by Tristan Croll, 6 years ago

I'm also seeing a slow but steady memory leak - memory usage has gone from ~700 MB on initial startup to 8.6 GB after working through about 16k models. Considering the complexity of the script I'm running, there are many possible culprits. Once I'm done with the CASP analysis I'll try running with Valgrind to see if it can pick anything up.
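Before reaching for Valgrind, Python's built-in tracemalloc can often localize a Python-side leak with much less overhead. A minimal sketch (the snapshot names and the batch placeholder are mine):

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

# ... process a batch of models here ...

current = tracemalloc.take_snapshot()
# Top allocation sites by net growth since the baseline snapshot.
for stat in current.compare_to(baseline, 'lineno')[:10]:
    print(stat)
```

If the growth shows up here, the leak is in Python objects; if memory climbs while tracemalloc sees nothing, that points at the C++ layer and Valgrind is the right tool.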

comment:3 by pett, 6 years ago

Status: assigned → accepted

Well, I have a vague idea where a memory leak could be (Chain/Sequence destruction), but I wouldn't expect it to be anywhere near as big as what you're seeing (~500 KB/model). I'll look into that anyway.

And yeah, matchmaker does a million things, so it's going to be hard to make any progress with no reproducibility.

--Eric

in reply to: 4; comment:4 by tic20@…, 6 years ago

I suspect the memory leak may be in Matplotlib. I’m creating (and saving to .png) one fairly complex figure per file, and from what I gather online it’s pretty bad about releasing memory unless the figure is closed in *exactly* the right way. But as I said, will check with Valgrind once the manuscript’s safely resubmitted.
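For reference, the pattern that reliably releases figure memory in batch scripts (based on the Matplotlib documentation; the helper name and plotting details here are illustrative, not my actual figure code) is to close each figure explicitly through pyplot rather than letting it go out of scope:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for batch PNG output
import matplotlib.pyplot as plt

def save_figure(path, data):
    fig, ax = plt.subplots()
    ax.plot(data)
    fig.savefig(path)
    # plt.close(fig) removes the figure from pyplot's internal
    # registry; a bare `del fig` leaves that reference alive,
    # which is the usual cause of figure-related leaks.
    plt.close(fig)
```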

comment:5 by pett, 6 years ago

Resolution: can't reproduce
Status: accepted → closed

Okay, I did fix _a_ memory leak having to do with chains/sequences, but the leak only occurred if you actually showed a sequence/alignment in a viewer, which I don't think is applicable to the scenario in this ticket.

Would need much more reproducibility to have any hope of fixing the crash.

in reply to: 6; comment:6 by Tristan Croll, 6 years ago

Agreed regarding the crash.

Regarding the memory leak, I suspect it might be my fault - but it will probably be awful to track down. Using ISOLDE in a continuous session for a couple of days on a ~16k residue model (4v8r) with live maps and validation and lots of stopping and starting of simulations, my memory usage steadily increased from ~4 GB to ~16 GB. It's not really big enough to be an urgent problem (it's *much* more modest when working with smaller models), but something I'll have to deal with at some point once some more pressing challenges are out of the way.
