Opened 8 years ago

Closed 7 years ago

#1024 closed enhancement (not a bug)

Add infrastructure such as locks to support C++ thread use

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Performance Version:
Keywords: Cc: pett, Conrad Huang, Greg Couch
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

At the moment this is mostly about planning ahead rather than something I plan to use heavily straight away, but I've been experimenting with how the performance of ISOLDE (particularly all the live validation) might be improved with C++ threading, since it's fairly clear that Python threading is never going to cut it for VR-level performance.

Basically, the general idea is to kick off one or more threads at the point where coordinates are updated from the simulation, and defer application of the results to a later frame. I came up with the attached method, which works nicely in my test case (pushing rotamer validation to a std::thread - see the sections labelled /*TESTING*/ in the attached rota.h/cpp for the internal logic). It takes handles to three functions (one to start the thread, one to check whether it's done, and one to finalise it and apply the results), plus arguments for the initialisation and finalisation functions. It calls the initialiser, then adds a callback to the 'new frame' trigger which runs the check function; when that returns true, the callback runs the finalisation and removes its own handler. With this I can have the HIV capsid rotating on screen with full lighting and run rotamer validation (which takes about half a second) without even a momentary glitch in framerate.
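
Not the attachment itself, but a minimal sketch of the pattern described above, assuming the usual ChimeraX 'new frame' trigger and DEREGISTER handler return value (delayed_reaction.py is the authoritative version):

# Illustrative sketch only - see the attached delayed_reaction.py for the
# real implementation. The trigger name and DEREGISTER constant follow the
# usual ChimeraX trigger conventions.
from chimerax.core.triggerset import DEREGISTER

def delayed_reaction(triggers, start_func, start_args,
                     ready_func, final_func, final_args):
    '''Start a task (e.g. a C++ std::thread), poll it once per graphics
    frame, and apply the results on the frame where it reports completion.'''
    start_func(*start_args)

    def _check_done(trigger_name, data):
        if ready_func():
            final_func(*final_args)
            return DEREGISTER   # finalised - remove this handler
        # not finished yet; keep polling on the next frame
    triggers.add_handler('new frame', _check_done)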

At present this approach would obviously have to be used with a lot of care. Since there are no mutexes on anything, there's nothing to stop atoms from being changed (or, more disastrously, deleted) while such a thread is running. That being said, I'm fairly confident I can use it safely in ISOLDE (in the specific context of running simulations) with no changes to the ChimeraX core. In that case I know that coordinates won't be updated more than once per 100 ms (at present, at least), each individual validation task is <5 ms, and I'll only ever be reading from ChimeraX objects, not writing to them.

Still, I'm wondering: is there any long-term plan to make the core ChimeraX objects thread-safe? It would be great to have a multiple-reader/single-writer mutex arrangement in place for peace of mind.

Attachments (3)

delayed_reaction.py (1.8 KB ) - added by Tristan Croll 8 years ago.
rota.h (4.3 KB ) - added by Tristan Croll 8 years ago.
rota.cpp (7.7 KB ) - added by Tristan Croll 8 years ago.


Change History (11)

by Tristan Croll, 8 years ago

Attachment: delayed_reaction.py added

by Tristan Croll, 8 years ago

Attachment: rota.h added

by Tristan Croll, 8 years ago

Attachment: rota.cpp added

comment:1 by Tom Goddard, 8 years ago

Cc: pett Conrad Huang Greg Couch added

I agree C++ threads can be useful. If we end up trying to provide good support for virtual reality headsets, then graphics rendering will probably need to move into a separate C++ thread so that it can maintain 90 frames/second while all the other ChimeraX code runs in the main thread and periodically passes drawing changes to the graphics thread.

I think even your rotamer thread needs some locking. Even if coordinates update only every 100 msec and the rotamer calculation takes 5 msec, there is still a 1 in 20 chance that the coordinates will be changed in the middle of rotamer validation -- probably not a crash, just some incorrect angles from combining old coordinates with new ones. An alternative would be to give the rotamer thread a copy of the coordinates.

I don't think it is reasonable to expect every operation in ChimeraX that changes atomic data to do locking. The code complexity is one problem. Another is that performance when no extra threads are active may be much worse. I don't have much experience with threads, but I think this is a common theme: introducing thread support reduces performance.

So I think you should add your own locking so that your OpenMM coordinate changes cooperate with your other threads that use coordinates, such as rotamer validation. This isn't 100% safe, since other ChimeraX code could still change the coordinates or delete the whole structure. You might also monitor for deletion so the rotamer thread can be stopped instead of seg faulting.
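
A hedged sketch of that cooperation, illustrated in Python for brevity (in practice the worker would be the C++ std::thread); validation_lock, cancel_event and run_rotamer_validation are illustrative names, not existing ChimeraX API:

import threading

validation_lock = threading.Lock()   # shared by coordinate updates and workers
cancel_event = threading.Event()     # set when the structure is deleted

def apply_new_coordinates(structure, new_coords):
    with validation_lock:            # writers take the lock...
        structure.atoms.coords = new_coords

def rotamer_worker(structure):
    if cancel_event.is_set():        # structure already gone - bail out
        return
    with validation_lock:            # ...and so do readers
        coords = structure.atoms.coords   # consistent snapshot
    run_rotamer_validation(coords)   # hypothetical validation routine

def on_structure_deleted(*args):
    cancel_event.set()               # lets a running or queued worker stop cleanly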

in reply to:  5 comment:2 by tic20@…, 8 years ago

The idea here is that the validation thread would be started on my ‘completed simulation step’ trigger (which is fired just after the coordinates are updated), so there would be the full 100ms to play with. I’ll eventually have on the order of half a dozen validation tasks (rota, Rama, chirals, nucleic acids, sugars, ...) and possibly some to do with updating restraint visualisations. The idea I had in mind was to launch each into its own thread on the trigger, then stagger their return over a few frames so that no one frame gets the full load. 
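
An illustrative sketch of that staggering (names and structure are mine, not existing code): all worker threads are launched on the trigger, but each 'new frame' callback finalises at most one completed task, so no single frame absorbs the whole cost.

from chimerax.core.triggerset import DEREGISTER

def stagger_finalisation(triggers, tasks):
    '''tasks: list of (ready_func, final_func) pairs for already-started
    worker threads.'''
    pending = list(tasks)

    def _on_new_frame(trigger_name, data):
        for i, (ready, finalise) in enumerate(pending):
            if ready():
                finalise()
                del pending[i]
                break                # at most one finalisation per frame
        if not pending:
            return DEREGISTER        # everything applied; stop polling
    triggers.add_handler('new frame', _on_new_frame)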

I think I’d only use the threaded versions during simulations where smooth performance really matters - outside of those I’d just have the single-threaded versions under the control of the “changes” trigger. Within simulations, well... if anyone’s silly enough to delete atoms from a running simulation, then the resulting crash should be a valuable lesson. Even then they might get away without a segfault - that would only happen if the deletion happens to coincide with the ~5% of the time the threads would be running. Still, what do you think of the idea of giving each Structure a deletion_allowed (or editing_allowed) flag?



 
 

in reply to:  6 comment:3 by goddard@…, 8 years ago

I don’t think a Structure.deletion_allowed attribute is useful. If your code will crash when a structure is deleted during a calculation, then you should listen for the delete trigger and make sure to prevent that crash. The idea is that your code should defend itself, not that all the other ChimeraX code should check a deletion_allowed flag to accommodate your code. It is not that we are lazy, just that your code should be modular and use existing mechanisms where they exist (like catching a structure delete) to ensure it runs correctly.


in reply to:  7 comment:4 by tic20@…, 8 years ago

My code as it stands should do a *pretty* good job of defending itself: the managers all use the DestructionObserver framework to delete or make safe any of their managed objects that depend on deleted atoms/residues/dihedrals, so it’s all perfectly safe in single-threaded operation. But if (as in this case) I fire off a short-lived thread to run a single loop over a set of atoms, then for the few ms while that loop is running, deletion of any of those atoms will cause a crash, and I don’t think there’s anything I can do about it. In practice, under the conditions I’d be using it in, I think it should be a non-issue - a strongly worded warning not to edit the structure during a running simulation should be enough - but the risk is there.

 
 

in reply to:  8 ; comment:5 by tic20@…, 8 years ago

Thought of a possible workaround from my end. If I define a function num_threads_running(), and a wrapper function as follows:

def wait_for_threads(func):
    def new_func(*args, **kwargs):
        from time import sleep
        while num_threads_running():
            sleep(1e-3)
        return func(*args, **kwargs)
    return new_func

... then on starting the validation framework I could wrap any class methods which would cause disaster if started while a thread is running (i.e. those that can delete atoms - Atoms.delete, Structure.delete, etc.), and restore the original methods on closing the validation manager. Seem reasonable?
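
For concreteness, a hedged sketch of that wrapping and unwrapping, using the wait_for_threads decorator above; Atoms and Structure come from chimerax.atomic, while the guard class itself is hypothetical. (Note that comment 6 below recommends against this monkey-patching approach.)

from chimerax.atomic import Atoms, Structure

class ValidationThreadGuard:
    _targets = ((Atoms, 'delete'), (Structure, 'delete'))

    def activate(self):
        '''Wrap destructive methods so they block until worker threads finish.'''
        self._originals = {}
        for cls, name in self._targets:
            original = getattr(cls, name)
            self._originals[(cls, name)] = original
            setattr(cls, name, wait_for_threads(original))

    def deactivate(self):
        '''Restore the original methods when the validation manager closes.'''
        for (cls, name), original in self._originals.items():
            setattr(cls, name, original)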


in reply to:  9 ; comment:6 by goddard@…, 8 years ago

I would not pursue that approach. Generally you don’t want to dynamically swap out Python methods - that is asking for a debugging nightmare. I see now that the problem you have is that the Python triggers saying an Atom or Structure was deleted fire too late, since they happen after the C++ object has been deleted. You need to know that a C++ object you are using is about to be deleted so you can stop your calculation, and I don’t think we have that. I would be inclined to copy the atom coordinates and have the thread use the copy, making it perfectly safe. Code that works 99.9% of the time is the worst kind: you get to spend hours or tens of hours debugging an obscure case. Once you put many 99.9% solutions into your code, you end up hitting the problems far too often and your development productivity plummets.
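
A minimal sketch of that copy-first approach, assuming numpy-backed Atoms.coords as in the ChimeraX atomic API; validate_rotamers is a hypothetical stand-in for the real validation routine:

import threading

class RotamerTask:
    '''Snapshot the coordinates on the main thread and give the worker only
    the copy, so later edits or deletions of the C++ objects cannot affect it.'''
    def __init__(self, structure):
        self._coords = structure.atoms.coords.copy()   # private snapshot
        self.result = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        self.result = validate_rotamers(self._coords)  # touches only the copy

    def done(self):
        # Poll this from a 'new frame' callback; apply self.result on the
        # main thread once it returns True.
        return not self._thread.is_alive()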

in reply to:  10 ; comment:7 by tic20@…, 8 years ago

OK, I can see the logic in that. A bit of rearrangement involved, but worth it for the improved stability.


comment:8 by Tom Goddard, 7 years ago

Resolution: not a bug
Status: assigned → closed
Summary: C++ thread safety? → Add infrastructure such as locks to support C++ thread use

I think it is impractical to have ChimeraX atomic data changes require obtaining a lock. Instead, in order to use threads, data such as atomic coordinates should be copied so only the thread has access to them.

I don't see any changes proposed by this ticket so I am closing it.
