Opened 6 years ago

Closed 6 years ago

#2931 closed enhancement (fixed)

Built in method to profile trigger firings

Reported by: Tristan Croll Owned by: Eric Pettersen
Priority: moderate Milestone:
Component: Performance Version:
Keywords: Cc: chimera-programmers
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

I'm once again doing some profiling (trying to work out why adding an amino acid residue to a large model gets *really* slow when ISOLDE is active - ~6s to extend a chain by one residue in a megadalton complex). About 1 second of this is covered if I %prun the method itself (source code at https://github.com/tristanic/isolde/blob/af87760f3793fe9bb2e42ce0b73c8c91094ec9af/isolde/src/atomic/building/build_utils.py#L110), and the rest is presumably in methods firing in response to the changes trigger. To track down some similar performance issues in #2355 I hacked the below into triggerset.py... I'm wondering if it would be worth including something like this permanently - perhaps extended to allow accumulating a profile over the next n triggers, or even to profile all triggers for a set period of time? Seems it would be quite useful as the ChimeraX ecosystem becomes more complex.

class _Trigger:
    """Keep track of handlers to invoke when activated"""

    def __init__(self, name, usage_cb, default_one_time, remove_bad_handlers):
        self._name = name
        from .orderedset import OrderedSet
        self._handlers = OrderedSet()	# Fire handlers in order they were registered.
        self._pending_add = set()
        self._pending_del = set()
        self._locked = 0
        self._blocked = 0
        self._need_activate = set()
        self._need_activate_data = []
        self._usage_cb = usage_cb
        self._default_one_time = default_one_time
        self._remove_bad_handlers = remove_bad_handlers
        self._profile_next_run = False
        self._profile_params = None

    def profile_next_run(self, sort_by='time', num_entries=20):
        self._profile_next_run = True
        self._profile_params = {'sort_by': sort_by, 'num_entries': num_entries}

    def activate(self, data):
        if not self._profile_next_run:
            self._activate(data)
            return
        import cProfile, pstats, io
        params = self._profile_params
        pr = cProfile.Profile()
        pr.enable()
        self._activate(data)
        pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats(params['sort_by'])
        ps.print_stats(params['num_entries'])
        print('Profile for last activation of {} trigger:'.format(self._name))
        print(s.getvalue())
        self._profile_next_run = False


    def _activate(self, data):
        # original, unchanged activate method

Attachments (1)

triggerset.py.diff (2.4 KB ) - added by Tristan Croll 6 years ago.

Download all attachments as: .zip

Change History (8)

comment:1 by Eric Pettersen, 6 years ago

Component: UnassignedPerformance
Status: assignedaccepted

comment:2 by Eric Pettersen, 6 years ago

Cc: chimera-programmers added
Resolution: fixed
Status: acceptedclosed

This change seems useful. I have applied it and it will be in the daily build once the 0.92 release is out.

Just for reference, though the full-code listing you gave is useful, it would be good if you also supplied a diff file. That way I can see the exact differences without having to eyeball back and forth between original code and new code, and I can easily apply the changes with 'patch' instead of typing them by hand.

--Eric

comment:3 by Tristan Croll, 6 years ago

Normally I would, but this came up at about 4pm on a Friday and just made me remember that old snippet - but I didn't actually have it in place in the code. Thanks for adding it! I'll see about adding a few convenience functions to access it from the main TriggerSet API - as written I believe it's only accessible using hidden methods.

by Tristan Croll, 6 years ago

Attachment: triggerset.py.diff added

comment:4 by Tristan Croll, 6 years ago

Resolution: fixed
Status: closedreopened

Just attached a diff with an extra method exposing it to the TriggerSet API. Just call triggers.profile_trigger(name).

comment:5 by Tristan Croll, 6 years ago

... and after all that, I can't reproduce whatever it was that was making adding residues so slow on my colleague's machine. In a much larger system (~3 MDa) with a similar number of chains, in the 0.92 release build adding a residue to a chain takes about 1.3 seconds, almost all of which is in the method call itself (as long as the ribbon isn't drawn, the changes trigger only amounts to a couple dozen milliseconds). Will see if the delay persists when he updates his ChimeraX, and if so will profile that particular case.

comment:6 by Tristan Croll, 6 years ago

Here's the top few entries for a %prun% of the call to add_amino_acid_residue(), for what it's worth:

         106791 function calls (96161 primitive calls) in 1.763 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        8    0.663    0.083    0.684    0.086 molobject.py:1596(polymers)
        1    0.267    0.267    0.267    0.267 molarray.py:665(delete)
       43    0.228    0.005    0.229    0.005 molarray.py:598(num_selected)
       45    0.100    0.002    0.101    0.002 molc.py:202(set_prop)
        4    0.081    0.020    0.083    0.021 symmetry.py:100(whole_residue_sym_sphere)

Suggests at least one thing I could improve immediately: avoid adding the OXT, HXT and H2 atoms from the amino acid template in the first place, so I don't have to then delete them. I'm also curious about the 8 calls to polymers - I'm not calling this anywhere in this method directly, but is something I'm doing triggering extraneous calls? Will look into it.

comment:7 by Eric Pettersen, 6 years ago

Resolution: fixed
Status: reopenedclosed

Have added the code from your diff.

There are various C++ things that call the C++ polymers() as a side effect, but the calls you're seeing are going through molobject.py and are therefore being generated somewhere in the Python layer. You could put traceback.print_stack() in molobject.polymers() to see what's calling it.

Note: See TracTickets for help on using tickets.