Opened 8 years ago

Closed 8 years ago

#803 closed defect (fixed)

Multiprocessing "spawn" method does not work

Reported by: Tristan Croll Owned by: tic20@…
Priority: major Milestone:
Component: Core Version:
Keywords: Cc: Greg Couch, Tom Goddard
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

(No idea who best to assign this to, so giving it to Tom for now).

I'm exploring the multiprocessing module as a way to get some parallelism into ISOLDE - specifically, it looks like it should be quite straightforward to get the OpenMM simulation running in a subprocess so I can keep it out of the graphics loop, only updating coordinates when they're ready. There are two general methods for generating the subprocess: 'fork' (just create an exact copy of the current Python process in memory) and 'spawn' (start a new Python instance to communicate with via pipes). The 'fork' method works straightforwardly - but is only available on Unix machines and each subprocess carries all the memory overhead of whatever's loaded in ChimeraX. Both good reasons to avoid it.

The 'spawn' method is more universal and lightweight, but it is currently broken for ChimeraX - although it can be made to work quite straightforwardly. The new processes it attempts to start crash on the unrecognised flag '-s' (and '-E' is also passed and would probably cause trouble as well). These arise in multiprocessing.spawn.get_command_line() lines 88-89, where it calls subprocess._args_from_interpreter_flags() and adds the results to the executable call, assuming it's calling Python directly rather than through the ChimeraX wrapper. I guess it would be simple enough to have ChimeraX ignore those flags, or alternatively to replace the offending lines in spawn.py:

        #opts = util._args_from_interpreter_flags()
        #return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
        return [_python_exe] + ['-c', prog, '--multiprocessing-fork']
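
A third option, which comes up again later in this ticket, is to leave spawn.py alone and instead point multiprocessing at ChimeraX's bundled Python interpreter, which handles '-s' and '-E' natively. A minimal sketch (assuming the interpreter in app_bin_dir is named 'python3.6'):

import os
from multiprocessing import spawn
from chimerax import app_bin_dir

spawn.set_executable(os.path.join(app_bin_dir, 'python3.6'))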

With that patch in place, the following toy example works as expected:

import multiprocessing as mp
from multiprocessing import sharedctypes
from math import ceil

# If using the spawn method, foo needs to be defined in a module that
# the spawned process can find and re-import.
def foo(bfactors,start,end,out):
    for i in range(1000):
        c = bfactors.copy()
        c.sort()
    out[start:end] = bfactors
    

def run_multiproc_test(atoms, nproc):
    
    # Have to explicitly re-import foo here
    from chimerax.isolde.multiproc_test import foo
    ctx = mp.get_context('spawn')
    import numpy
    natoms = len(atoms)
    arr = sharedctypes.Array('d', natoms, lock = False)
    ret = numpy.empty(natoms, numpy.float32)
    stride = int(ceil(natoms/nproc))
    proclist = []
    bfactors = atoms.bfactors
    for i in range(nproc):
        start = stride*i
        end = start+stride
        if end > natoms:
            end = natoms
        p = ctx.Process(target=foo, args=(bfactors[start:end],start,end,arr))
        proclist.append(p)
        p.start()
    
    for p in proclist:
        p.join()
    ret[:] = arr
    return ret

... except that I get the following spammed to the terminal window:

Tool "File History" failed to start
Traceback (most recent call last):
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/tools.py", line 422, in start_tools
    bi.start_tool(session, tool_name)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/toolshed/info.py", line 487, in start_tool
    % tool_name)
chimerax.core.toolshed.ToolshedError: tool "File History" is not supported without a GUI
Tool "Density Map Toolbar" failed to start
Traceback (most recent call last):
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/tools.py", line 422, in start_tools
    bi.start_tool(session, tool_name)
  File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/toolshed/info.py", line 487, in start_tool
    % tool_name)
chimerax.core.toolshed.ToolshedError: tool "Density Map Toolbar" is not supported without a GUI

(the same spam appears when running ChimeraX in any other non-GUI mode, such as for pip install).

Oh, and if running run_multiproc_test() from the console, you'll need to first do:

import sys
mod = sys.modules['__main__']
mod.__spec__ = None

... since for whatever reason IPython's interactive shell DummyMod doesn't have __spec__ defined.

Attachments (3)

shared_array_test.py (3.3 KB ) - added by Tristan Croll 8 years ago.
Thread-safe shared Numpy arrays
shared_array_test-1.py (3.7 KB ) - added by tic20@… 8 years ago.
Added by email2trac
test_multiproc.py (1.3 KB ) - added by Tristan Croll 8 years ago.
Illustration of issue with multiprocessing and OpenCL/CUDA


Change History (44)

comment:1 by pett, 8 years ago

Cc: Greg Couch added
Component: Unassigned → Core

comment:2 by Tom Goddard, 8 years ago

Why not run OpenMM in a separate thread in the same process as the ChimeraX graphics? Separate processes complicate communication.

ChimeraX core/threadq.py has a little support for running calculations in a separate thread with thread-safe queues to request calculations and return results. This is used for example by the molecular surface calculation where chain surfaces are computed using a pool of threads in core/commands/surface.py (look for import of threadq). Maybe this will give a starting point for how to use threads although your case is a little different. An important limitation is that Python can't run in two threads at once. So the trick is that OpenMM has to release the Python interpreter lock when it drops into C++ code (not using any Python) -- you will need to check if the OpenMM Python wrapper releases the lock. Numpy and ChimeraX C++ surface calc routines are examples of code that releases the Python global interpreter lock so that it can run in threads while the main graphics thread continues running.
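
As a rough illustration of the pattern (this is not core/threadq.py itself, just a sketch with made-up names): a worker thread pulls jobs from a queue, and because numpy releases the GIL inside its C routines, the main (graphics) thread keeps running while the calculation proceeds.

import threading, queue
import numpy

def worker(jobs, results):
    while True:
        data = jobs.get()
        if data is None:              # sentinel: shut the worker down
            break
        results.put(data @ data.T)    # the GIL is released inside the matmul

jobs, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(jobs, results), daemon=True)
t.start()
jobs.put(numpy.random.rand(2000, 2000))
# ... the main thread keeps drawing; collect the result when it's ready:
print(results.get().shape)
jobs.put(None)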

in reply to:  3 ; comment:3 by tic20@…, 8 years ago

Hi Tom,

I thought about that approach, but in the end came down in favour of the one I'm taking. It's a prime candidate for Python's brand of "gross" multiprocessing: one heavy, long-running, self-contained process that once started is effectively independent. Yes, there's some back-and-forth communication required, but that's not too difficult: a few shared-memory scalars and arrays where speed is important, and the slower but more flexible multiprocessing.Manager for handling command passing. It's a bit of a job to rearrange things now that ISOLDE's gotten quite big, but it's conceptually straightforward and should end up making things quite a bit cleaner. Also, there's the not-inconsiderable advantage that it would make it possible to implement a client-server arrangement where the simulation no longer has to be on the same machine.

Cheers,

Tristan

comment:4 by Tom Goddard, 8 years ago

Owner: changed from Tom Goddard to tic20@…

Ok, the way to find the best approach is to try them as you are doing. I don't have suggestions on the problems you encountered and don't think anyone here knows this better than you. Ignoring command-line options is not a good solution -- when people start ChimeraX from the command line with invalid options and it doesn't do what they expect because the options were silently ignored, that is very bad. ChimeraX could still start, though, and write a warning about unrecognized options to the shell. The message about a toolbar needing a GUI sounds like the new ChimeraX process started in nogui mode (--nogui flag) and yet tried to show a toolbar. Maybe you have some start-up script that is running and trying to show the density map toolbar (even in nogui mode)? You can test whether ChimeraX is in nogui mode in Python using "session.ui.is_gui".
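
If a start-up script does turn out to be the culprit, one low-tech fix is to make its GUI-only work conditional on that check. A hypothetical sketch (ChimeraX supplies the session object; show_my_toolbars() is a made-up placeholder, not a real ChimeraX call):

def show_my_toolbars(session):
    # placeholder for whatever GUI-only setup the start-up script does
    pass

def setup(session):
    if not session.ui.is_gui:
        return   # running under --nogui (e.g. a spawned worker or pip install)
    show_my_toolbars(session)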

comment:5 by Tom Goddard, 8 years ago

Cc: Tom Goddard added

in reply to:  6 ; comment:6 by tic20@…, 8 years ago

Ah - yes. I'd completely forgotten about the contents of my chimerax_start directory!

comment:7 by Tristan Croll, 8 years ago

Some progress, a slightly more realistic example, and some observations.

  • we don't actually want it to be calling the ChimeraX executable to spawn new worker processes. Those -s and -E switches suppress the user site-packages directory and the PYTHON* environment variables, to create a minimal, predictable Python environment into which multiprocessing loads only the modules and data necessary to run the desired function. That means you really want to call ChimeraX's Python executable directly, avoiding all the ChimeraX-specific setup overhead. Luckily, there's a (seemingly undocumented) function in multiprocessing.spawn, set_executable(), designed for just this purpose. I know ChimeraX has the variable chimerax.app_bin_dir available, but would it be possible to add chimerax.python_executable to make it easier to do this across platforms?
  • the multiprocessing.Array type is *really* slow if used directly:
import multiprocessing as mp
import ctypes
import numpy

mp_arr = mp.Array(ctypes.c_double, 500000)
data_arr = numpy.random.rand(500000).astype(float)
%timeit mp_arr[:] = data_arr
  28.8 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

... but the following works wonders:

shared_numpy_arr = numpy.frombuffer(mp_arr.get_obj()).reshape((250000,2))
%timeit shared_numpy_arr[:] = data_arr.reshape((250000,2))
  186 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

mp_arr[0]
  0.3699979636088202

The downside of this is that while the mp.Array object itself carries a lock, the numpy view circumvents it. If you're running a function where safe concurrent access matters, you'll need to share both mp_arr and shared_numpy_arr (or just share mp_arr and create a new numpy view in the worker process), and when editing do:

with mp_arr.get_lock():
    shared_numpy_arr[:] = whatever

In the example I've hacked together below, I haven't bothered to do this since no two workers ever write to the same array indices.

  • The Pool functionality is a little flaky. Make a mistake in its initialization function, and you'll only find out about it when your worker processes all hang.
  • The start-up overhead is actually not that bad, but it's enough to kill off the idea of using it for simple one-off functions. But if you have a situation where you can start up the pool and then keep re-using it, then there is scope for some reasonable speed-up. On-the-fly calculation of sliding averages while animating through a trajectory, perhaps. In the example I'm just running a loop over chimerax.core.geometry.interpolate_points to give it some moderately-large task (still too small to make the multiprocessing really effective, to be honest). With it sitting in my isolde module (example for 3j3y):
m = session.models.list()[0]
a = m.atoms
starting_coords = a.coords
final_coords = starting_coords+10
from chimerax.isolde import multiproc_test
num_points = 50
nproc = 5
method = 'spawn' # other option is 'fork'
cycles = 5 # To get timing of subsequent runs after the overhead of starting the pool
interp_arr = multiproc_test.run_multiproc_test(starting_coords, final_coords, num_points, nproc, ncycles = cycles, context = method)

  Cycle 0 took 2.20410418510437 seconds
  Cycle 1 took 0.9888038635253906 seconds
  Cycle 2 took 1.006852149963379 seconds
  Cycle 3 took 0.9972550868988037 seconds
  Cycle 4 took 0.9754586219787598 seconds
  Finishing took 0.16719269752502441 seconds


from chimerax.core.geometry import interpolate_points
%timeit interpolate_points(starting_coords, final_coords, 0.5)
  38.4 ms ± 746 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

38.4*50
  1920

... so very roughly a 2-fold speedup from 5 cores. Not great, but better than nothing. Give the workers bigger jobs to do, and the speed-up should improve. For smaller structures it's still substantially slower than the single-threaded version, even on the repeat runs. For my purposes with ISOLDE it should work nicely: just one worker process to deal with, containing the simulation. Lots of shared variables to handle, but that I think I can do.

Cheers,

Tristan

Code itself:

import multiprocessing as mp
from multiprocessing import sharedctypes, Pool, spawn
import ctypes
from math import ceil

try:
    from chimerax import app_bin_dir
    import os
    spawn.set_executable(os.path.join(app_bin_dir, 'python3.6'))
except ImportError:
    # We're not in ChimeraX any more, Toto!
    pass

def error_callback(e):
    print(e)

def _pool_init(c_arr, start_coord_array, end_coord_array, n_points, n_coords):
    '''
    Creates and sets up the shared variables needed by the worker
    processes. Only runs within the workers themselves, so the global
    variables aren't too evil. Not sure if there's a way around using them.
    '''
    import numpy
    global shared_arr
    global start_c
    global end_c
    shared_arr = numpy.frombuffer(c_arr.get_obj()).reshape((n_points, n_coords, 3))
    start_c = numpy.frombuffer(start_coord_array.get_obj()).reshape((n_coords, 3))
    end_c = numpy.frombuffer(end_coord_array.get_obj()).reshape((n_coords, 3))

def _interpolate_worker(interp_fracs, frames, proc_id):
    global shared_arr
    global start_c
    global end_c
    from chimerax.core.geometry import interpolate_points

    for (frac, frame) in zip(interp_fracs, frames):
        shared_arr[frame] = interpolate_points(start_c, end_c, frac)
    return True

def run_multiproc_test(start_coords, end_coords, num_interpolation_points, nproc, ncycles=2, context = 'spawn'):
    from time import time, sleep
    start_time = time()    
    ctx = mp.get_context(context)
    import numpy
    n_coords = len(start_coords)
    assert n_coords == len(end_coords)
    n_points = num_interpolation_points
    c_arr = mp.Array(ctypes.c_double, n_points*n_coords*3)
    start_coord_array = mp.Array(ctypes.c_double, n_coords*3)
    end_coord_array = mp.Array(ctypes.c_double, n_coords*3)
    sca = numpy.frombuffer(start_coord_array.get_obj()).reshape((n_coords,3))
    sca[:] = start_coords
    eca = numpy.frombuffer(end_coord_array.get_obj()).reshape((n_coords,3))
    eca[:] = end_coords
    frames = numpy.array(range(num_interpolation_points),dtype='int')
    fracs = frames / n_points
    stride = int(ceil(n_points/nproc))
    with Pool(processes = nproc, initializer = _pool_init, 
              initargs = (c_arr,start_coord_array,end_coord_array, n_points, n_coords)) as p:
        for cycle in range(ncycles):
            results = []
            for i in range(nproc):
                start = stride*i
                end = start+stride
                if end > n_points:
                    end = n_points
                results.append(p.apply_async(_interpolate_worker,
                    args=(fracs[start:end],frames[start:end], i),
                    error_callback=error_callback))
            count = 0
            while True:
                done = True
                for result in results:
                    if not result.ready():
                        done = False
                if done:
                    break
                count += 1
                if count > 50000:
                    print('Timeout!')
                    break
                sleep(1e-4)
            print('Cycle {} took {} seconds'.format(cycle, time() - start_time))
            start_time = time()
    ret = numpy.frombuffer(c_arr.get_obj()).reshape((n_points, n_coords, 3))
    print('Finishing took {} seconds'.format(time()-start_time))
    return ret

by Tristan Croll, 8 years ago

Attachment: shared_array_test.py added

Thread-safe shared Numpy arrays

comment:8 by Tristan Croll, 8 years ago

I'm not sure if this style of multiprocessing will be sensible in many core ChimeraX situations, but I can imagine it might be a useful addition to the toolset for plugins. In any case, I put together the attached handy little class, SharedNumpyArray, to combine the strengths of multiprocessing.Array and numpy.ndarray. It behaves like a numpy array for the most part, but is generated from a multiprocessing.Array and inherits its Lock object and methods. Running python3.6 shared_array_test.py gives a quick example, but in brief:

import numpy
import multiprocessing as mp
# SharedNumpyArray is defined in the attached shared_array_test.py
mp_array = mp.Array('d', 50)
shared_ndarray = SharedNumpyArray(mp_array).reshape((5,10))

# thread-safe
with shared_ndarray.get_lock():
    do_something(shared_ndarray)

# not thread-safe
do_something(shared_ndarray)

Either way, everything is as fast as standard numpy.
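
For readers without the attachment, a minimal sketch of the idea (not the attached implementation - in particular, this version takes the dtype as an explicit argument rather than deriving it from the Array) might look something like this:

import numpy
import multiprocessing as mp

class SharedNumpyArray(numpy.ndarray):
    '''A numpy view over a multiprocessing.Array that keeps the Array's lock.'''
    def __new__(cls, mp_array, dtype=numpy.double):
        obj = numpy.frombuffer(mp_array.get_obj(), dtype=dtype).view(cls)
        obj._mp_array = mp_array
        return obj

    def __array_finalize__(self, obj):
        if obj is not None:
            self._mp_array = getattr(obj, '_mp_array', None)

    def get_lock(self):
        return self._mp_array.get_lock()

if __name__ == '__main__':
    mp_array = mp.Array('d', 50)
    shared = SharedNumpyArray(mp_array).reshape((5, 10))
    with shared.get_lock():
        shared[:] = 1.0
    print(mp_array[0])   # 1.0 - writes through the view reach the shared buffer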

in reply to:  10 comment:9 by goddard@…, 8 years ago

When you get ISOLDE on toolshed with OpenMM in a separate process I will be interested to try it.  I have long wanted to try ISOLDE with the VR headsets but the graphics has to run fast (90 frames/second) so it needs OpenMM in a separate thread or process.  If the multiprocessing and shared numpy array work well in the distributed ISOLDE we can put it in the core for other developers to use.

in reply to:  11 ; comment:10 by tic20@…, 8 years ago

I'm pretty confident that can be made to happen, and I'm fairly sure I'm getting close. It's such a significant overhaul that it's difficult to test partway through, though.

For the record, the SharedNumpyArray wasn't working quite correctly yet. Turns out that nobody ever thought to provide the ability to query the dtype of an existing multiprocessing.Array, and it behaves like the following:

{{{
import multiprocessing as mp
import ctypes
arr = mp.Array(ctypes.c_float, 10) # An array of 10 float32 values
type(arr[0])
   float # returns a 64-bit Python float
}}}

... which completely broke my workaround for setting the type of the numpy array. The solution was to "subclass" multiprocessing.Array. I put "subclass" in quotes because multiprocessing.sharedctypes is weird as hell - Array() is actually a function, which calls another function, which calls another, which calls yet another that finally creates the object, with each one adding a little to it. I'm sure it made sense to somebody at some point. The attached version now appears to work correctly for a range of different ctypes.
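
For illustration, here is one way to recover the element type from an existing multiprocessing.Array without subclassing - just a sketch of an alternative, not what the attached file does:

{{{
import multiprocessing as mp
import ctypes
import numpy

arr = mp.Array(ctypes.c_float, 10)
ctype = arr.get_obj()._type_            # ctypes.c_float
dtype = numpy.dtype(ctype)              # dtype('float32')
view = numpy.frombuffer(arr.get_obj(), dtype=dtype)
print(view.dtype, len(view))            # float32 10
}}}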


by tic20@…, 8 years ago

Attachment: shared_array_test-1.py added

Added by email2trac

in reply to:  13 ; comment:11 by goddard@…, 8 years ago

Ha, ha.  Any kind of parallelization is likely to be full of implementation surprises.  I’m glad you are blazing a trail!

in reply to:  14 comment:12 by tic20@…, 8 years ago

So, first the good news: I have a threaded simulation running, maintaining a nice 30-50 fps for 5200 atoms.

The bad news: I never *did* have the spawn method working, and there are some enormous obstacles to overcome to make it work:

1) I'm now not sure if it's even possible to use the shared memory types with spawning. When I try to pass them to the pool initializer function it insists on trying to pickle them, and the Lock objects are not picklable. 

2) possibly even more problematic is the "if __name__ == '__main__'" requirement: the newly spawned processes don't *know* they're the spawned ones and try to spawn processes of their own, effectively setting off a fork bomb. The standard workaround is to put the spawn code under the control of that if statement (sketched below), which guarantees that it only runs in the master process - but that only works if __name__ actually *is* '__main__', which it of course isn't for a module.
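
For reference, the standard guard pattern from the multiprocessing documentation looks like this - under 'spawn', each child re-imports the parent's __main__ module, so any process-creation code left outside the guarded block would run again in every child:

{{{
import multiprocessing as mp

def worker(i):
    return i * i

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(4)))
}}}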

So the upshot is that I can't see any way to make this work on Windows, and my searches on the web found nothing but other people also failing.

So until I find a better way (and the time to implement it), I'm afraid I'm going to have to limit ISOLDE to Unix environments. Such is life.

in reply to:  15 ; comment:13 by goddard@…, 8 years ago

Ok. Not too surprising - interprocess communication is a nightmare. That is why I suggest using threads. The two main drawbacks of threads are, first, that you have to deal with the Python global interpreter lock: if the OpenMM SWIG bindings do not release it then that will have to be improved. Maybe OpenMM already releases the GIL - production-ready code running a compute-intensive calculation from Python should be releasing the GIL to allow threading. Fixing this seems like a small issue. The second drawback is that you won't be running OpenMM on a separate machine if you are using just a separate thread. But since the focus is on interactive performance, I think having to deal with network latency to another machine is such a headache that I would not design the code to make that a requirement.
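
A quick, generic way to check whether a wrapped call releases the GIL (a sketch using numpy's sort as a stand-in for the OpenMM call): time one call, then two calls running in parallel threads. If the wall time stays roughly the same, the GIL is being released; if it roughly doubles, it is not.

{{{
import threading, time
import numpy

data = numpy.random.rand(3000000)

def job():
    numpy.sort(data.copy())   # numpy releases the GIL inside the sort

t0 = time.time(); job(); single = time.time() - t0

threads = [threading.Thread(target=job) for _ in range(2)]
t0 = time.time()
for t in threads: t.start()
for t in threads: t.join()
print('one call: %.3f s, two threads in parallel: %.3f s' % (single, time.time() - t0))
}}}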

in reply to:  16 ; comment:14 by tic20@…, 8 years ago

Yep - looks like that's the ultimate way forward. For now, though, I think it would be best if I get this version finalised and then focus on other aspects for a while. At least, once it's finished, the framework should be a good basis for the design of a C++ threaded version.

in reply to:  17 ; comment:15 by tic20@…, 8 years ago

One thing that will be needed to make this happen is the ability to include the OpenMM headers and link against its libraries at compile time. I recall Conrad was working on support for this in the bundle build system, but I don't know where it's up to.

in reply to:  18 ; comment:16 by Conrad Huang, 8 years ago

The build system just adds the platform-specific ChimeraX header and library locations to the generated setup script and uses the standard setuptools functions. The trick is to get the headers and libraries (especially on Windows) into the right places for the "prereq" builds such as OpenMM. Maybe TomG, who got OpenMM to work properly, can chime in here?

Conrad

in reply to:  20 ; comment:18 by goddard@…, 8 years ago

I don't understand - why would we need to compile OpenMM? OpenMM is included in the main ChimeraX distribution. It should release the Python GIL to allow threading. If it does not, then it will be much simpler to submit a pull request to the OpenMM GitHub project to fix this. If it is using SWIG to make the Python wrapper, then "swig -threads …" releases the Python GIL for every wrapped C++ method.

	https://github.com/pandegroup/openmm


in reply to:  21 comment:19 by tic20@…, 8 years ago

I didn't mean OpenMM would need to be compiled. Unless I'm missing something (which is entirely possible - a lot of this is quite new to me), I would need to provide some C functions that call the OpenMM simulation object in a thread - which would require me to #include OpenMM.h. It would be ideal to be able to do so while still using the standard bundle build system.

in reply to:  22 ; comment:20 by goddard@…, 8 years ago

I don’t think you need any C functions.  You would just use OpenMM through the normal Python OpenMM function calls.

comment:21 by Tristan Croll, 8 years ago

Not quite ready to go up on toolshed, but I do have enough working that you should be able to experiment with it in VR if I send you an advance copy. Many things are still broken, but you can start a live simulation (with or without crystallographic maps), and tug atoms.

The interesting thing about the multiprocess implementation is that it leads to the slightly counter-intuitive result that the graphics framerate is actually slower for smaller simulations (~50fps for a 5200-atom simulation where OpenMM takes about 50ms to generate new coordinates, ~35-40fps for 1200 atoms where OpenMM takes about 16ms) - the redraws on updating coordinates are still the major bottleneck, but nothing like as much as before. To get sufficient performance for VR it might be necessary to throttle the simulation thread back to something in the vicinity of 15-20 coordinate updates per second.

in reply to:  24 ; comment:22 by goddard@…, 8 years ago

Yes for VR I would expect to be limiting the MD update rate.  As usual I am swamped with projects to work on, so I don’t want a broken version of anything.  I’ll wait until you have got multi-process ISOLDE stable and on the toolshed.

comment:23 by Tristan Croll, 8 years ago

This just got a bit more complicated. The upshot is that GPU threading and CPU threading are difficult to choreograph together, and neither the OpenCL nor CUDA APIs appear to make it particularly easy. Each seems to expect there to be only one GPU Context per process, and that Context can never be shared between threads. Everything works fine as long as the main ChimeraX process never calls OpenCL or CUDA prior to creating the simulation process - the child just creates its own Context, which is destroyed on termination. But if the new process inherits an existing Context, then the OpenMM initialisation fails. That's OK on Linux at present - no OpenCL or CUDA context is created during a "vanilla" ChimeraX session - but not so much on the Mac. I think this is because Apple's OpenCL implementation blurs the lines between OpenGL and OpenCL to allow them to share GPU memory... hard to say for sure, since it's all closed source. In any case, given a simulation-ready input.pdb, the attached code (test_multiproc.py) runs successfully on Linux but fails on my MacBook Air. Uncomment the do_sim() line to first run a simulation in the main process before creating the fork, and Linux will fail as well.

This is going to be a problem with *any* threaded implementation, and looks like it will take some quite low-level adjustment of the OpenMM code to fix. Until then, I'm afraid that simulations on the Mac will need to be limited to CPU only.
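
The attached test_multiproc.py is the actual test; as a rough sketch of the general pattern being described (the file name, force field and step count here are illustrative assumptions, not taken from the attachment):

import multiprocessing as mp
from simtk import openmm, unit
from simtk.openmm import app

def do_sim():
    # Build and briefly run an OpenMM simulation on the OpenCL platform.
    # Assumes input.pdb is already simulation-ready (hydrogens added etc.).
    pdb = app.PDBFile('input.pdb')
    ff = app.ForceField('amber99sbildn.xml')
    system = ff.createSystem(pdb.topology)
    integrator = openmm.LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond,
                                           0.002*unit.picoseconds)
    platform = openmm.Platform.getPlatformByName('OpenCL')
    sim = app.Simulation(pdb.topology, system, integrator, platform)
    sim.context.setPositions(pdb.positions)
    sim.step(10)

if __name__ == '__main__':
    # do_sim()   # uncomment to create a GPU context in the parent first
    p = mp.get_context('fork').Process(target=do_sim)
    p.start()
    p.join()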

by Tristan Croll, 8 years ago

Attachment: test_multiproc.py added

Illustration of issue with multiprocessing and OpenCL/CUDA

comment:24 by Tristan Croll, 8 years ago

Addendum: when I say "fails on my MacBook" I mean within the ChimeraX environment. If I run it as /Applications/ChimeraX/Contents/bin/python3.6 test_multiproc.py then it succeeds.

in reply to:  28 comment:25 by goddard@…, 8 years ago

I know on the Mac each thread can have its own OpenGL context, although I have not tested this. Here is some useful info on this topic, particularly the section "OpenGL Restricts Each Context to a Single Thread":

https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html

Not sure about the multithreaded OpenCL situation on the Mac. Of course it may be difficult to figure this out and not worth the time. Isn't the aim of your multiprocess/multithread additions to improve interactive performance? If so, then allowing only CPU OpenMM on the Mac goes in the wrong direction, since the GPU version was 3x faster in past ISOLDE versions. It seems to me that it would be good to try to allow OpenMM to work in the main ChimeraX thread as formerly, or in a separate thread where that works. Hopefully it is not too complex to allow both modes. If you redesign so that only the more complex multithreaded mode is supported, I think you are heading for debugging nightmares when the multithreaded code goes haywire and you can't fall back to the single-threaded implementation to determine that it really is a multithreading problem.

in reply to:  29 comment:26 by tic20@…, 8 years ago

It should actually be fairly easy to adapt my new implementation back to an optional single-threaded mode while still maintaining essentially the same architecture. Will work it out once I've cleaned up all the detritus from this round of rebuilding (lots of now-defunct code to remove).

In terms of usability, the primary aim was to ensure that the performance of all the standard ChimeraX interactions (rotating, zooming, panning, menus etc.) is independent of simulation performance. That I've achieved - everything remains smooth and pleasant to work with whether the simulation is running at 2 frames per second or 20. While obviously the latter is preferable, much of what I've been doing lately has been aimed at reducing/removing the need for direct tugging on atoms (the only thing for which simulation frame rate is really critical). So I would call the multiprocess implementation a net win even if simulation performance is reduced (and it's simply beautiful in Linux where the simulation performance is substantially enhanced). Of course, working things out to get to its full potential on all platforms is an important goal, but not as important right now as making it really shine on *one* platform.

In Linux everything is back to working order, by the way. I should be able to put a build on the Toolshed by the weekend. For the Mac, the main thing holding it back (other than the above) is the Clipper build. The PyPI build is essentially ready to go live, but the final go-ahead is dependent on the core Clipper-Python team.

Cheers,

Tristan

in reply to:  30 comment:27 by goddard@…, 8 years ago

That is all great progress, and hard won.  What is the situation on Windows?

in reply to:  31 ; comment:28 by tic20@…, 8 years ago

Windows is going to require a completely different approach, as far as I can tell. Unless someone can show me otherwise, I think Python multiprocessing/threads will be out of the question, and I'd be better off working through the OpenMM C++ API to create the simulation in a C++ thread, to work the same way across all platforms. That will be some way down the track, though. I have lots more to learn before attempting to tackle that, and I need to have something out there and legitimately usable in the meantime.

in reply to:  32 ; comment:29 by goddard@…, 8 years ago

Ok, sounds reasonable.

in reply to:  33 ; comment:30 by tic20@…, 8 years ago

I've just put a new Linux bundle up on the toolshed. Free mode (no maps) and Crystallography mode are working well. Single-particle EM mode has a few bugs, but I haven't really bothered to address them yet because I'm planning to overhaul that anyway to make the look/feel essentially the same as the crystallography mode.

If you want a Mac version to play with, I can send it to you offline, alongside the Clipper wheel files. I don't want to put it up on Toolshed until the new version of Clipper goes live. I've had to make a few compromises to make the Mac build reasonably usable - namely limiting it to CPU-only and dropping back to vacuum conditions rather than implicit solvent. Dropping the implicit solvent is necessary for two reasons: not only is it by far the most computationally heavy task in the simulation, but it turns out to be currently broken on the OpenMM CPU platform (calculates incorrect values that cause the protein to unfold). The good news is that with it removed the CPU-only performance is at least reasonably usable for small simulations, and given a reasonably decent map and some judicious use of secondary structure restraints the vacuum conditions are still good enough to be useful. On the GPU in Linux I get wonderfully smooth performance even when all atoms of a 5200-atom structure are mobile (without maps) - and even a 17000-atom simulation is perfectly usable (after a few slight tweaks I made after uploading the bundle, I get 30-40 fps graphics, 5-10 fps simulation). Already amazing, and there's still so much scope for further gains!

Cheers,

Tristan


in reply to:  34 ; comment:31 by goddard@…, 8 years ago

Ok. Can you put links to a small PDB or mmCIF model and a density map as example data in the ISOLDE toolshed page description? We don't have good ways to add hydrogens to prepare an atomic model for use with OpenMM, so this example data would be handy. When the example data is up I can give the Linux ISOLDE a spin.

in reply to:  35 ; comment:32 by tic20@…, 8 years ago

Even better idea: why don't I bundle a small example with it and add a "load example data" button?

If you just want to play with a model sans map, 1pmx works as-downloaded.

in reply to:  36 ; comment:33 by goddard@…, 8 years ago

Yeah, the load example button sounds useful.

comment:34 by Tristan Croll, 8 years ago

OK, done. If you install the latest version you'll find a "load demo" button at the top of the widget, which will load up a small, 2.9-Angstrom resolution crystal structure from the mid-2000s (2b9r, for the record). I've done some basic preparation work on it (removed a bunch of entirely unjustified water molecules, added missing sidechains, added hydrogens, and cleared up the worst clashes - backbone passing through rings, etc.). It's still pretty horrible, though (as most structures of that era and resolution are), with lots of room for improvement. The whole model should be selected and ready when it loads, so all you'll need to do is click "Go" on the ISOLDE widget - and you should definitely do this with the whole model selected before doing anything else.

A couple of things to note with regard to getting the best performance for VR:

  • validation tasks (Ramachandran look-ups and checking/annotation of cis/trans/twisted peptide bonds) are still in the main process. They're only a few ms each and only run once every 20 simulation frames (and staggered, so Ramachandran happens at 0/20/40/... and peptide bond checking happens at 10/30/50/...), so they shouldn't be too much of an issue - but it shouldn't be that difficult to move most of their work into the simulation process if necessary.
  • you'll get a much more noticeable pause every second or two - this is due to my running surface_zone() on the maps at regular intervals to update their masks to the current atom positions. You'll almost certainly want to turn this off for VR. If isolde is the tool as it appears in session.tools.list, then:
isolde.isolde.params.rounds_per_map_remask = SOME_VERY_LARGE_NUMBER

should do the trick.

  • with a simulation running,

isolde.isolde.tug_atom_to(atom, xyz_in_angstroms_as_numpy_array)

and

isolde.isolde.stop_tugging(atom)

... should be all you need for basic interactions. Just make sure the atom is a heavy, non-hydrogen one.

  • if you ctrl-scroll with the mouse, each map will be selected in turn, with its name displayed on the status bar. Scrolling the mouse (without modifier) changes the contour on the last selected map. I highly recommend setting the sharpened map to a solid, transparent surface with a contour around 2.5 sigma (I've been meaning to make this the default, but haven't gotten to it yet).

in reply to:  38 ; comment:35 by tic20@…, 8 years ago

Ho hum. Forgetting what's in my own code.
{{{isolde.isolde.params.remask_maps_during_sim = False}}}
is much simpler, but needs to be set before you start a simulation.

in reply to:  39 ; comment:36 by goddard@…, 8 years ago

Thanks for the update. The only viable VR platform currently is Windows. The new release of Mac OS this Fall will also support SteamVR, but it will require a new iMac to have adequate graphics, and we don't have one in the lab. Linux is in the worst state; there is a developer SteamVR release with many issues, one of them being "OpenGL applications are currently too slow to use interactively; only the Vulkan Submit path is optimal", as described on the Linux SteamVR development site:

	https://github.com/ValveSoftware/SteamVR-for-Linux

But I can give Linux ISOLDE a try without VR, just to see the current state on Linux.

in reply to:  40 comment:37 by tic20@…, 8 years ago

Just trying to summarise the key points about multiprocessing itself...

- The 'spawn' method, while technically applicable to all platforms, looks like a nightmare to work with. Probably not worth it, on the whole.

- The 'fork' method is quite easy and is fast/flexible with easy use of shared memory, but is limited to Unix OSes. GPU support is flaky - in Linux, OpenCL and CUDA both work as long as they haven't been previously used in the parent process (even if the pre-existing Context has been destroyed... not clear why). On the Mac, it seems to be a no-go. Still, within these limits it works well enough on Linux for now.

- I don't think Python threads will be the best long-term solution for ISOLDE. Yes, it would improve performance compared to a fully single-threaded implementation, but working with the OpenMM Python API there would still be enough purely-Python tasks to make it near-impossible to ever get to VR speeds. Better to just do it properly in the first place and come to terms with the OpenMM C++ API. For the time being I'll get a single-threaded option back in place to provide at least reasonable Windows/Mac support while I get the remaining key features sorted out.

in reply to:  41 ; comment:38 by goddard@…, 8 years ago

Nice summary. Your plan sounds good.

I don't understand the justification for Python threads not working ("with the OpenMM Python API there would still be enough purely-Python tasks to make it near-impossible to ever get to VR speeds"). If there are slow tasks done in Python, then those need to be optimized; often that means moving the Python to C/C++.

in reply to:  42 ; comment:39 by tic20@…, 8 years ago

I probably just need to think/learn more about it, then.

Another interesting little bit of flakiness that crops up under very specific circumstances: if I start ISOLDE, then start the IPython console and do:

`isolde = session.tools.list[-2]`

... then start a simulation, everything will be fine. But after that point, if I close the console while a simulation is running or close the console then start a new simulation, then ChimeraX will crash with one of the following two errors:

{{{
[5430:5430:0916/162411.069077:ERROR:browser_main_loop.cc(272)] Gdk: ChimeraX: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
}}}
or the particularly perplexing:
{{{
[6265:6265:0916/163324.419828:ERROR:browser_main_loop.cc(272)] Gdk: ChimeraX: Fatal IO error 0 (Success) on X server :0.
}}}

... with the graphical windows still hanging around as zombies that have to be killed with xkill.

in reply to:  43 ; comment:40 by tic20@…, 8 years ago

... well holy heck. It turns out that the multiprocessing module has an undocumented feature:

{{{
from multiprocessing.pool import ThreadPool
}}}

gives you a class with an identical API to multiprocessing.Pool, but which uses threads under the surface. A two-line change to my code, and it just magically works on my Mac. And yes, all the key OpenMM functions do indeed release the GIL, so performance is remarkably good for such a lightweight machine.
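
As a minimal, illustrative sketch of that drop-in swap (not my actual pool code): ThreadPool exposes the same API as multiprocessing.Pool but runs its workers as threads in the current process, so shared state needs no pickling and calls that release the GIL run genuinely in parallel.

{{{
from multiprocessing.pool import ThreadPool

def heavy_task(x):
    # In the real case this is the OpenMM call, which releases the GIL
    # while it runs its C++/OpenCL code.
    return x * x

if __name__ == '__main__':
    with ThreadPool(processes=4) as pool:
        print(pool.map(heavy_task, range(8)))
}}}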

comment:41 by Tristan Croll, 8 years ago

Resolution: fixed
Status: assigned → closed