Opened 8 years ago
Closed 8 years ago
#803 closed defect (fixed)
Multiprocessing "spawn" method does not work
| Reported by: | Tristan Croll | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Core | Version: | |
| Keywords: | | Cc: | Greg Couch, Tom Goddard |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
(No idea who best to assign this to, so giving it to Tom for now).
I'm exploring the multiprocessing module as a way to get some parallelism into ISOLDE - specifically, it looks like it should be quite straightforward to get the OpenMM simulation running in a subprocess so I can keep it out of the graphics loop, only updating coordinates when they're ready. There are two general methods for generating the subprocess: 'fork' (just create an exact copy of the current Python process in memory) and 'spawn' (start a new Python instance to communicate with via pipes). The 'fork' method works straightforwardly - but is only available on Unix machines and each subprocess carries all the memory overhead of whatever's loaded in ChimeraX. Both good reasons to avoid it.
The 'spawn' method is more universal and lightweight, but is currently broken for ChimeraX - though it can be made to work quite straightforwardly. The new processes it attempts to start crash with the unrecognised flag '-s' ('-E' is also passed and would probably cause trouble as well). These arise in multiprocessing.spawn.get_command_line() (lines 88-89), where it calls subprocess._args_from_interpreter_flags() and adds the results to the executable call, thinking it's calling Python directly rather than through the ChimeraX wrapper. I guess it would be simple enough to have ChimeraX ignore those flags, or alternatively replace the offending lines in spawn.py:
{{{
#opts = util._args_from_interpreter_flags()
#return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
return [_python_exe] + ['-c', prog, '--multiprocessing-fork']
}}}
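Alternatively, rather than editing the installed spawn.py, the same effect could presumably be had by filtering the offending flags out at runtime - a sketch (untested in ChimeraX itself, and relying on the private get_command_line() internals of Python 3.6):

{{{
import multiprocessing.spawn as spawn

_orig_get_command_line = spawn.get_command_line

def _patched_get_command_line(**kwds):
    # Strip the -s/-E interpreter flags that the ChimeraX wrapper rejects
    return [arg for arg in _orig_get_command_line(**kwds)
            if arg not in ('-s', '-E')]

spawn.get_command_line = _patched_get_command_line
}}}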
Once that's done, the following toy example works as expected:
{{{
import multiprocessing as mp
from multiprocessing import sharedctypes
from math import ceil

# If using the spawn method, foo needs to be defined somewhere that
# ChimeraX knows where to find and re-import it.
def foo(bfactors, start, end, out):
    for i in range(1000):
        c = bfactors.copy()
        c.sort()
    out[start:end] = bfactors

def run_multiproc_test(atoms, nproc):
    # Have to explicitly re-import foo here
    from chimerax.isolde.multiproc_test import foo
    ctx = mp.get_context('spawn')
    import numpy
    natoms = len(atoms)
    arr = sharedctypes.Array('d', natoms, lock = False)
    ret = numpy.empty(natoms, numpy.float32)
    stride = int(ceil(natoms/nproc))
    proclist = []
    bfactors = atoms.bfactors
    for i in range(nproc):
        start = stride*i
        end = start+stride
        if end > natoms:
            end = natoms
        p = ctx.Process(target=foo, args=(bfactors[start:end], start, end, arr))
        proclist.append(p)
        p.start()
    for p in proclist:
        p.join()
    ret[:] = arr
    return ret
}}}
... except that I get the following spammed to the terminal window:
Tool "File History" failed to start Traceback (most recent call last): File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/tools.py", line 422, in start_tools bi.start_tool(session, tool_name) File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/toolshed/info.py", line 487, in start_tool % tool_name) chimerax.core.toolshed.ToolshedError: tool "File History" is not supported without a GUI Tool "Density Map Toolbar" failed to start Traceback (most recent call last): File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/tools.py", line 422, in start_tools bi.start_tool(session, tool_name) File "/home/tic20/apps/chimerax/lib/python3.6/site-packages/chimerax/core/toolshed/info.py", line 487, in start_tool % tool_name) chimerax.core.toolshed.ToolshedError: tool "Density Map Toolbar" is not supported without a GUI
(The same messages also appear when running any other non-GUI task, such as pip install.)
Oh, and if running run_multiproc_test() from the console, you'll need to first do:
{{{
import sys
mod = sys.modules['__main__']
mod.__spec__ = None
}}}
... since for whatever reason IPython's interactive shell DummyMod doesn't have __spec__ defined.
Attachments (3)
Change History (44)
comment:1 by , 8 years ago
| Cc: | added |
|---|---|
| Component: | Unassigned → Core |
comment:2 by , 8 years ago
follow-up: 3 comment:3 by , 8 years ago
Hi Tom, I thought about that approach, but in the end came down in favour of the one I'm taking. It's a prime candidate for Python's brand of "gross" multiprocessing: one heavy, long-running, self-contained process that once started is effectively independent. Yes, there's some back-and-forth communication required, but that's not too difficult: a few shared-memory scalars and arrays where speed is important, and the slower but more flexible multiprocessing.Manager for handling command passing. It's a bit of a job to rearrange things now that ISOLDE's gotten quite big, but it's conceptually straightforward and should end up making things quite a bit cleaner. Also, there's the not-inconsiderable advantage that it would make it possible to implement a client-server arrangement where the simulation no longer has to be on the same machine. Cheers, Tristan
comment:4 by , 8 years ago
| Owner: | changed from | to |
|---|---|---|
Ok, the way to find the best approach is to try them as you are doing. I don't have suggestions on the problems you encountered and don't think anyone here knows this better than you. Ignoring command-line options is not a good solution -- when people start ChimeraX from the command-line with invalid options and it doesn't do what they expect because the options were ignored, that is very bad. The code could start though and write a warning about unrecognized options to the shell.

The message about a toolbar needing a GUI sounds like the new ChimeraX process started in nogui mode (--nogui flag), and yet it tried to show a toolbar. Maybe you have some start-up script that is running and trying to show the density map toolbar (even in nogui mode)? You can test if ChimeraX is in nogui mode in Python using "session.ui.is_gui".
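For example, a start-up script could guard anything GUI-only with a check like this (show_density_map_toolbar is just a hypothetical stand-in for whatever the script actually does):

{{{
# Hypothetical start-up script fragment: only touch GUI tools when a GUI exists
def start(session):
    if session.ui.is_gui:
        show_density_map_toolbar(session)   # hypothetical GUI-only helper
    else:
        session.logger.info('nogui mode: skipping toolbar setup')
}}}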
comment:5 by , 8 years ago
| Cc: | added |
|---|---|
follow-up: 6 comment:6 by , 8 years ago
Ah - yes. I'd completely forgotten about the contents of my chimerax_start directory!
comment:7 by , 8 years ago
Some progress, a slightly more realistic example, and some observations.
- we don't actually want it to be calling the ChimeraX executable to spawn new processes. What those -s and -E switches are supposed to do is turn off all automatic importing, to create an absolutely minimal Python environment into which multiprocessing loads only the modules and data necessary to run the desired function. That means you really want to call ChimeraX's Python executable directly, to avoid all the ChimeraX-specific setup overhead. Luckily, there's a (seemingly undocumented) function in multiprocessing.spawn, set_executable(), designed for just this purpose. I know ChimeraX has the variable chimerax.app_bin_dir available, but would it be possible to add chimerax.python_executable to make it easier to do this across platforms?
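In the meantime, a rough cross-platform guess at the interpreter path can be derived from app_bin_dir and sys.version_info - a sketch (the exact executable name and layout are assumptions and may differ between ChimeraX releases and platforms):

{{{
import os, sys
from multiprocessing import spawn
from chimerax import app_bin_dir

if sys.platform == 'win32':
    exe_name = 'python.exe'
else:
    exe_name = 'python{}.{}'.format(*sys.version_info[:2])
spawn.set_executable(os.path.join(app_bin_dir, exe_name))
}}}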
- the multiprocessing.Array type is *really* slow if used directly:
{{{
import multiprocessing as mp
import ctypes
import numpy

mp_arr = mp.Array(ctypes.c_double, 500000)
data_arr = numpy.random.rand(500000).astype(float)

%timeit mp_arr[:] = data_arr
28.8 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
}}}
... but the following works wonders:
{{{
shared_numpy_arr = numpy.frombuffer(mp_arr.get_obj()).reshape((250000,2))

%timeit shared_numpy_arr[:] = data_arr.reshape((250000,2))
186 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

mp_arr[0]
0.3699979636088202
}}}
The downside of this is that while the mp.Array object itself is thread safe, the numpy array is circumventing that. If you're running a function where thread safety is important, you'll need to share both mp_arr and shared_numpy_arr (or just share mp_arr and create a new numpy array in the worker thread), and when editing do:
{{{
with mp_arr.get_lock():
    shared_numpy_arr[:] = whatever
}}}
In the example I've hacked together below, I haven't bothered to do this since none of the threads ever write to the same array indices.
- The Pool functionality is a little flaky. Make a mistake in its initialization function, and you'll only find out about it when your worker threads all hang.
- The start-up overhead is actually not that bad, but it's enough to kill off the idea of using it for simple one-off functions. But if you have a situation where you can start up the pool and then keep re-using it, then there is scope for some reasonable speed-up. On-the-fly calculation of sliding averages while animating through a trajectory, perhaps. In the example I'm just running a loop over chimerax.core.geometry.interpolate_points to give it some moderately-large task (still too small to make the multiprocessing really effective, to be honest). With it sitting in my isolde module (example for 3j3y):
{{{
m = session.models.list()[0]
a = m.atoms
starting_coords = a.coords
final_coords = starting_coords+10
from chimerax.isolde import multiproc_test
num_points = 50
nproc = 5
method = 'spawn' # other option is 'fork'
cycles = 5 # To get timing of subsequent runs after the overhead of starting the pool
interp_arr = multiproc_test.run_multiproc_test(starting_coords, final_coords,
    num_points, nproc, ncycles = cycles, context = method)

Cycle 0 took 2.20410418510437 seconds
Cycle 1 took 0.9888038635253906 seconds
Cycle 2 took 1.006852149963379 seconds
Cycle 3 took 0.9972550868988037 seconds
Cycle 4 took 0.9754586219787598 seconds
Finishing took 0.16719269752502441 seconds

from chimerax.core.geometry import interpolate_points
%timeit interpolate_points(starting_coords, final_coords, 0.5)
38.4 ms ± 746 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

38.4*50
1920
}}}
... so very roughly a 2-fold speedup from 5 cores. Not great, but better than nothing. Give the worker threads bigger jobs to do, and the speed-up should get better. For smaller structures it's still substantially slower than the single-threaded version even on the repeat runs. For my purposes with ISOLDE it should work nicely: just one thread to deal with, containing the simulation. Lots of shared variables to handle, but that I think I can do.
Cheers,
Tristan
Code itself:
{{{
import multiprocessing as mp
from multiprocessing import sharedctypes, Pool, spawn
import ctypes
from math import ceil

try:
    from chimerax import app_bin_dir
    import os
    spawn.set_executable(os.path.join(app_bin_dir, 'python3.6'))
except:
    # We're not in ChimeraX any more, Toto!
    pass

def error_callback(e):
    print(e)

def _pool_init(c_arr, start_coord_array, end_coord_array, n_points, n_coords):
    '''
    Creates and sets up the shared variables needed by the threads. Only runs
    within the threads themselves, so the global variables aren't too evil.
    Not sure if there's a way around using them.
    '''
    import numpy
    global shared_arr
    global start_c
    global end_c
    shared_arr = numpy.frombuffer(c_arr.get_obj()).reshape((n_points, n_coords, 3))
    start_c = numpy.frombuffer(start_coord_array.get_obj()).reshape((n_coords, 3))
    end_c = numpy.frombuffer(end_coord_array.get_obj()).reshape((n_coords, 3))

def _interpolate_worker(interp_fracs, frames, proc_id):
    global shared_arr
    from chimerax.core.geometry import interpolate_points
    import numpy
    global start_c
    global end_c
    target = numpy.empty((len(frames), len(start_c), 3), numpy.double)
    for (frac, frame) in zip(interp_fracs, frames):
        shared_arr[frame] = interpolate_points(start_c, end_c, frac)
    return True

def run_multiproc_test(start_coords, end_coords, num_interpolation_points,
                       nproc, ncycles=2, context = 'spawn'):
    from time import time, sleep
    start_time = time()
    ctx = mp.get_context(context)
    import numpy
    n_coords = len(start_coords)
    assert n_coords == len(end_coords)
    n_points = num_interpolation_points
    c_arr = mp.Array(ctypes.c_double, n_points*n_coords*3)
    start_coord_array = mp.Array(ctypes.c_double, n_coords*3)
    end_coord_array = mp.Array(ctypes.c_double, n_coords*3)
    sca = numpy.frombuffer(start_coord_array.get_obj()).reshape((n_coords,3))
    sca[:] = start_coords
    eca = numpy.frombuffer(end_coord_array.get_obj()).reshape((n_coords,3))
    eca[:] = end_coords
    frames = numpy.array(range(num_interpolation_points), dtype='int')
    fracs = frames / n_points
    stride = int(ceil(n_points/nproc))
    with Pool(processes = nproc, initializer = _pool_init,
              initargs = (c_arr, start_coord_array, end_coord_array,
                          n_points, n_coords)) as p:
        for cycle in range(ncycles):
            results = []
            for i in range(nproc):
                start = stride*i
                end = start+stride
                if end > n_points:
                    end = n_points
                results.append(p.apply_async(_interpolate_worker,
                    args=(fracs[start:end], frames[start:end], i),
                    error_callback=error_callback))
            count = 0
            while True:
                done = True
                for result in results:
                    if not result.ready():
                        done = False
                if done:
                    break
                count += 1
                if count > 50000:
                    print('Timeout!')
                    break
                sleep(1e-4)
            print('Cycle {} took {} seconds'.format(cycle, time() - start_time))
            start_time = time()
    ret = numpy.frombuffer(c_arr.get_obj()).reshape((n_points, n_coords, 3))
    print('Finishing took {} seconds'.format(time()-start_time))
    return ret
}}}
comment:8 by , 8 years ago
I'm not sure if this style of multiprocessing will be sensible in many core ChimeraX situations, but I can imagine it might be a useful addition to the toolset for plugins. In any case, I put together the attached little handy class, SharedNumpyArray, to combine the strengths of both multiprocessing.Array and numpy.ndarray. It behaves exactly like a numpy array for the most part, but is generated from a multiprocessing.Array and inherits its Lock object and methods. If you run python3.6 shared_array_test.py it will run a quick example, but in brief:
{{{
import numpy
import multiprocessing as mp

mp_array = mp.Array('d', 50)
shared_ndarray = SharedNumpyArray(mp_array).reshape((5,10))

# thread-safe
with shared_ndarray.get_lock():
    do_something(shared_ndarray)

# not thread-safe
do_something(shared_ndarray)
}}}
Either way, everything is as fast as standard numpy.
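The attachment has the full class, but the rough shape of the idea is something like this (a simplified sketch, not the attached code - it assumes the caller supplies the dtype, which is exactly where the real version needed more work, as the next comment describes):

{{{
import numpy
import multiprocessing as mp

class SharedNumpyArray(numpy.ndarray):
    '''Simplified sketch: a numpy view onto a multiprocessing.Array that
    also carries the Array's lock around with it.'''
    def __new__(cls, mp_array, dtype=numpy.double):
        obj = numpy.frombuffer(mp_array.get_obj(), dtype=dtype).view(cls)
        obj._mp_array = mp_array
        return obj

    def __array_finalize__(self, obj):
        # Propagate the backing Array through views and reshapes
        self._mp_array = getattr(obj, '_mp_array', None)

    def get_lock(self):
        return self._mp_array.get_lock()
}}}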
comment:9 by , 8 years ago
When you get ISOLDE on toolshed with OpenMM in a separate process I will be interested to try it. I have long wanted to try ISOLDE with the VR headsets but the graphics has to run fast (90 frames/second) so it needs OpenMM in a separate thread or process. If the multiprocessing and shared numpy array work well in the distributed ISOLDE we can put it in the core for other developers to use.
follow-up: 9 comment:10 by , 8 years ago
I'm pretty confident that can be made to happen, and I'm fairly sure I'm getting close. It's such a significant overhaul that it's difficult to test partway through, though.

For the record, the SharedNumpyArray wasn't working quite correctly yet. Turns out that nobody ever thought to provide the ability to query the dtype of an existing multiprocessing.Array, and it behaves like the following:

{{{
import multiprocessing as mp
import ctypes
arr = mp.Array(ctypes.c_float, 10)  # An array of 10 float32 values
type(arr[0])
float                               # returns a 64-bit Python float
}}}

... which completely broke my workaround for setting the type of the numpy array. The solution was to "subclass" multiprocessing.Array. I put "subclass" in quotes because multiprocessing.sharedctypes is weird as hell - Array() is actually a method, which calls another method which calls another method which calls another method which actually creates the object, and each method adds a little to it. I'm sure it made sense to somebody at some point. The attached version now appears to work correctly for a range of different ctypes.
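For what it's worth, one way to recover the element type from an existing Array is to poke at the underlying ctypes object - a sketch that leans on the semi-private _type_ attribute, so treat it as fragile:

{{{
import multiprocessing as mp
import ctypes, numpy

arr = mp.Array(ctypes.c_float, 10)
ctype = arr.get_obj()._type_          # ctypes.c_float
dtype = numpy.dtype(ctype)            # dtype('float32')
view = numpy.frombuffer(arr.get_obj(), dtype=dtype)
}}}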
follow-up: 10 comment:11 by , 8 years ago
Ha, ha. Any kind of parallelization is likely to be full of implementation surprises. I’m glad you are blazing a trail!
comment:12 by , 8 years ago
So, first the good news: I have a threaded simulation running, maintaining a nice 30-50 fps for 5200 atoms. The bad news: I never *did* have the spawn method working, and there are some enormous obstacles to overcome to make it work:

1) I'm now not sure if it's even possible to use the shared memory types with spawning. When I try to pass them to the pool initializer function it insists on trying to pickle them, and the Lock objects are not picklable.

2) Possibly even more problematic is the "if __name__ == '__main__'" requirement: the newly spawned processes don't *know* they're the spawned ones and try to spawn processes of their own, effectively setting off a fork bomb. The standard workaround is to put the spawn code under the control of the above if statement, which guarantees that it only runs in the master process - but that only works if __name__ actually *is* '__main__', which it of course isn't for a module.

So the upshot is that I can't see any way to make this work in Windows, and my searches on the web found nothing but other people also failing. So until I find a better way (and the time to implement it), I'm afraid I'm going to have to limit ISOLDE to Unix environments. Such is life.
follow-up: 11 comment:13 by , 8 years ago
Ok. Not too surprising, interprocess communication is a nightmare. That is why I suggest using threads. The main two drawbacks of threads are, first, that you have to deal with the Python global interpreter lock — if the OpenMM SWIG bindings do not release it then that will have to be improved. Maybe OpenMM already releases the GIL — production-ready code doing a compute-intensive process run from Python should be releasing the GIL to allow threading. Fixing this seems like a small issue.

The second drawback is that you won't be running the OpenMM on a separate machine if you are using just a separate thread. But since the focus is on interactive performance, I think having to deal with network latency to another machine is such a headache that I would not design the code such that that is a requirement.
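A quick (if crude) way to check whether a wrapped call releases the GIL is to time it running in several threads at once - here using numpy.dot as a stand-in for an OpenMM step, since numpy is known to release the lock:

{{{
import threading, time
import numpy

a = numpy.random.rand(1500, 1500)

def work():
    numpy.dot(a, a)          # releases the GIL while the BLAS call runs

def timed(n_threads):
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    t0 = time.time()
    for t in threads: t.start()
    for t in threads: t.join()
    return time.time() - t0

# If the call releases the GIL, 4 threads take roughly as long as 1
# (given enough cores); if it holds the GIL, expect ~4x the wall time.
print('1 thread :', timed(1))
print('4 threads:', timed(4))
}}}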
follow-up: 12 comment:14 by , 8 years ago
Yep - looks like that's the ultimate way forward. For now, though, I think it would be best if I get this version finalised, then focus on other aspects for a while. At least once it's finished the framework should be a good basis for design of a C++ threaded version.
follow-up: 13 comment:15 by , 8 years ago
One thing that will be needed to make this happen will be the ability to include the OpenMM headers and link its libraries at compile time. I recall Conrad was working on support for this in the bundle build system, but don't know where it's up to.
follow-up: 14 comment:16 by , 8 years ago
The build system just adds the ChimeraX header and library platform-specific locations to the generated setup script and uses the standard setuptools functions. The trick is to get the headers and libraries (especially on Windows) into the right places for the "prereq" builds such as OpenMM. Maybe TomG, who got OpenMM to work properly, can chime in here?

Conrad
follow-up: 15 comment:17 by , 8 years ago
FYI, Python 3.6.2 just came out on 2017-07-17 and has a few multiprocessing bug fixes:
- fix race condition on cleanup
- fix exception handling in feeder
- fix bug in pools raising an exception at the very first of an iterable
- release references to tasks
follow-up: 16 comment:18 by , 8 years ago
I don't understand — why would we need to compile OpenMM? OpenMM is included in the main ChimeraX distribution. It should release the Python GIL to allow threading. If it does not then it will be much simpler to submit a pull request to github OpenMM to fix this. If it is using SWIG to make the Python wrapper then "swig -threads …" releases the Python GIL for every wrapped C++ method. https://github.com/pandegroup/openmm
comment:19 by , 8 years ago
I didn't mean OpenMM would need to be compiled. Unless I'm missing something (which is entirely possible - a lot of this is quite new to me), I would need to provide some C functions that call the OpenMM simulation object in a thread - which would require me to #include OpenMM.h. It would be ideal to be able to do so while still using the standard bundle build system.
follow-up: 18 comment:20 by , 8 years ago
I don’t think you need any C functions. You would just use OpenMM through the normal Python OpenMM function calls.
follow-up: 19 comment:21 by , 8 years ago
Not quite ready to go up on toolshed, but I do have enough working that you should be able to experiment with it in VR if I send you an advance copy. Many things are still broken, but you can start a live simulation (with or without crystallographic maps), and tug atoms.
The interesting thing about the multiprocess implementation is that it leads to the slightly counter-intuitive result that the graphics framerate is actually slower for smaller simulations (~50fps for a 5200-atom simulation where OpenMM takes about 50ms to generate new coordinates, ~35-40fps for 1200 atoms where OpenMM takes about 16ms) - the redraws on updating coordinates are still the major bottleneck, but nothing like as much as before. To get sufficient performance for VR it might be necessary to throttle the simulation thread back to something in the vicinity of 15-20 coordinate updates per second.
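The throttle could be as simple as having the simulation loop skip coordinate pushes that come too soon after the previous one - a hypothetical sketch, where push_coords stands in for whatever hands coordinates back to the GUI side:

{{{
import time

MIN_UPDATE_INTERVAL = 1/20   # cap at ~20 coordinate updates per second

last_push = 0.0
def maybe_push(coords, push_coords):
    global last_push
    now = time.time()
    if now - last_push >= MIN_UPDATE_INTERVAL:
        push_coords(coords)   # hypothetical hand-off to the main process
        last_push = now
}}}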
follow-up: 20 comment:22 by , 8 years ago
Yes for VR I would expect to be limiting the MD update rate. As usual I am swamped with projects to work on, so I don’t want a broken version of anything. I’ll wait until you have got multi-process ISOLDE stable and on the toolshed.
comment:23 by , 8 years ago
This just got a bit more complicated. The upshot is that GPU threading and CPU threading are difficult to choreograph together, and neither the OpenCL nor CUDA APIs appear to make it particularly easy. Each seems to expect there to only be one GPU Context per process, and that Context can never be shared between threads. Everything works fine as long as the main ChimeraX process never calls OpenCL or CUDA prior to creating the simulation process - the thread just creates its own Context which is destroyed on termination. But if the new process inherits an existing Context, then the OpenMM initialisation fails. That's OK in Linux at present - no OpenCL or CUDA context is created during a "vanilla" ChimeraX session - but not so much on the Mac. I think this is because Apple's OpenCL implementation blurs the lines between OpenGL and OpenCL to allow them to share GPU memory... hard to say for sure, since it's all closed source.

In any case, given a simulation-ready input.pdb the attached code runs successfully in Linux but fails on my MacBook Air. Uncomment the do_sim() line to first run a simulation in the main process before creating the fork, and Linux will fail as well.
This is going to be a problem with *any* threaded implementation, and looks like it will take some quite low-level adjustment of the OpenMM code to fix. Until then, I'm afraid that simulations on the Mac will need to be limited to CPU only.
by , 8 years ago
| Attachment: | test_multiproc.py added |
|---|---|
Illustration of issue with multiprocessing and OpenCL/CUDA
follow-up: 22 comment:24 by , 8 years ago
Addendum: when I say "fails on my MacBook" I mean within the ChimeraX environment. If I run it as /Applications/ChimeraX/Contents/bin/python3.6 test_multiproc.py then it succeeds.
comment:25 by , 8 years ago
I know on Mac each thread can have its own OpenGL context, although I have not tested this. Here is some useful info on this topic, particularly the section "OpenGL Restricts Each Context to a Single Thread": https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html

Not sure about the multithread OpenCL situation on Mac. Of course it may be difficult to figure this out and not worth the time. Isn't the aim of your multiprocess/multithread additions to improve interactive performance? If so then allowing only CPU OpenMM on Mac goes in the wrong direction since the GPU version was 3x faster in past ISOLDE versions.

It seems to me that it would be good to try to allow OpenMM to work in the main ChimeraX thread as formerly, or in a separate thread where that works. Hopefully that is not too complex to allow both modes. If you redesign so only the more complex multithread is supported I think you are heading for debugging nightmares when the multithread code goes haywire and you can't fall back to the single-thread implementation to determine that it really is a multithreading problem.
comment:26 by , 8 years ago
It should actually be fairly easy to adapt my new implementation back to an optional single-threaded mode while still maintaining essentially the same architecture. Will work it out once I've cleaned up all the detritus from this round of rebuilding (lots of now-defunct code to remove).

In terms of usability, the primary aim was to ensure that the performance of all the standard ChimeraX interactions (rotating, zooming, panning, menus etc.) is independent of simulation performance. That I've achieved - everything remains smooth and pleasant to work with whether the simulation is running at 2 frames per second or 20. While obviously the latter is preferable, much of what I've been doing lately has been aimed at reducing/removing the need for direct tugging on atoms (the only thing for which simulation frame rate is really critical). So I would call the multiprocess implementation a net win even if simulation performance is reduced (and it's simply beautiful in Linux where the simulation performance is substantially enhanced). Of course, working things out to get to its full potential on all platforms is an important goal, but not as important right now as making it really shine on *one* platform.

In Linux everything is back to working order, by the way. I should be able to put a build on the Toolshed by the weekend. For the Mac, the main thing holding it back (other than the above) is the Clipper build. The PyPI build is essentially ready to go live, but the final go-ahead is dependent on the core Clipper-Python team.

Cheers,
Tristan
comment:27 by , 8 years ago
That is all great progress, and hard won. What is the situation on Windows?
follow-up: 25 comment:28 by , 8 years ago
Windows is going to require a completely different approach, as far as I can tell. Unless someone can show me otherwise, I think Python multiprocessing/threads will be out of the question, and I'd be better off working through the OpenMM C++ API to create the simulation in a C++ thread, to work the same way across all platforms. That will be some way down the track, though. I have lots more to learn before attempting to tackle that, and I need to have something out there and legitimately usable in the meantime.
follow-up: 27 comment:30 by , 8 years ago
I've just put a new Linux bundle up on the toolshed. Free mode (no maps) and Crystallography mode are working well. Single-particle EM mode has a few bugs, but I haven't really bothered to address them yet because I'm planning to overhaul that anyway to make the look/feel essentially the same as the crystallography mode. If you want a Mac version to play with, I can send it to you offline, alongside the Clipper wheel files. I don't want to put it up on Toolshed until the new version of Clipper goes live.

I've had to make a few compromises to make the Mac build reasonably usable - namely limiting it to CPU-only and dropping back to vacuum conditions rather than implicit solvent. Dropping the implicit solvent is necessary for two reasons: not only is it by far the most computationally heavy task in the simulation, but it turns out to be currently broken on the OpenMM CPU platform (calculates incorrect values that cause the protein to unfold). The good news is that with it removed the CPU-only performance is at least reasonably usable for small simulations, and given a reasonably decent map and some judicious use of secondary structure restraints the vacuum conditions are still good enough to be useful.

On the GPU in Linux I get wonderfully smooth performance even when all atoms of a 5200-atom structure are mobile (without maps) - and even a 17000-atom simulation is perfectly usable (after a few slight tweaks I made after uploading the bundle, I get 30-40fps graphics, 5-10 fps simulation). Already amazing, and there's still so much scope for further gains!

Cheers,
Tristan
follow-up: 28 comment:31 by , 8 years ago
Ok. Can you put in the ISOLDE toolshed page description links to a small PDB or mmCIF model and density map as example data? We don’t have good ways to add hydrogens to prepare an atomic model for using OpenMM so this example data would be handy. When example data is up I can give the Linux ISOLDE a spin.
follow-up: 29 comment:32 by , 8 years ago
Even better idea: why don't I bundle a small example with it and add a "load example data" button? If you just want to play with a model sans map, 1pmx works as-downloaded.
follow-up: 31 comment:34 by , 8 years ago
OK, done. If you install the latest version you'll find a "load demo" button at the top of the widget, which will load up a small, 2.9-Angstrom resolution crystal structure from the mid-2000s (2b9r, for the record). I've done some basic preparation work to it (removed a bunch of entirely unjustified water molecules, added missing sidechains, added hydrogens, and cleared up the worst clashes - backbone passing through rings, etc.). It's still pretty horrible, though (as most structures of that era and resolution are), with lots of room for improvement. It should be selected and ready, so all you'll need to do is click "Go" on the ISOLDE widget - and you should definitely do this with the whole model selected before doing anything else.
A couple of things to note with regard to getting the best performance for VR:
- validation tasks (Ramachandran look-ups and checking/annotation of cis/trans/twisted peptide bonds) are still in the main process. They're only a few ms each and only run once every 20 simulation frames (and staggered, so Ramachandran happens at 0/20/40/... and peptide bond checking happens at 10/30/50/...), so they shouldn't be too much of an issue - but it shouldn't be that difficult at all to move most of their work into the simulation process if necessary.
- you'll get a much more noticeable pause every second or two - this is due to my running surface_zone() on the maps at regular intervals to update their masks to the current atom positions. You'll almost certainly want to turn this off for VR. If isolde is the tool as it appears in session.tools.list, then:

{{{
isolde.isolde.params.rounds_per_map_remask = SOME_VERY_LARGE_NUMBER
}}}

should do the trick.
- with a simulation running, isolde.isolde.tug_atom_to(atom, xyz_in_angstroms_as_numpy_array) and isolde.isolde.stop_tugging(atom) should be all you need for basic interactions (a small usage sketch follows after this list). Just make sure the atom is a heavy, non-hydrogen one.
- if you ctrl-scroll with the mouse, each map will select in turn with its name displayed on the status bar. Scrolling the mouse (without modifier) changes the contour on the last selected map. I highly recommend setting the sharpened map to a solid, transparent surface with a contour around 2.5 sigma (I've been meaning to make this the default, but haven't gotten to it yet).
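Putting the tugging calls above together, a hypothetical snippet for driving them from a script (the atom-selection logic is just illustrative):

{{{
import numpy
m = session.models.list()[0]
# pick some heavy (non-hydrogen) atom to pull on
atom = [a for a in m.atoms if a.element.name != 'H'][0]
target = atom.coord + numpy.array([5.0, 0.0, 0.0])   # 5 Angstroms along x
isolde.isolde.tug_atom_to(atom, target)
# ... later, once it has moved where you want it:
isolde.isolde.stop_tugging(atom)
}}}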
follow-up: 32 comment:35 by , 8 years ago
Ho hum. Forgetting what's in my own code. {{{isolde.isolde.params.remask_maps_during_sim = False}}} is much simpler, but needs to be set before you start a simulation.
follow-up: 33 comment:36 by , 8 years ago
Thanks for the update. The only viable VR platform currently is Windows. The new release of Mac OS this Fall will also support SteamVR, but it will require a new iMac to have adequate graphics and we don't have one in the lab. Linux is in the worst state: there is a developer SteamVR with many issues, one of them being "OpenGL applications are currently too slow to use interactively; only the Vulkan Submit path is optimal," as described on the Linux SteamVR development site https://github.com/ValveSoftware/SteamVR-for-Linux

But I can give Linux ISOLDE a try without VR just to see the current state on Linux.
comment:37 by , 8 years ago
Just trying to summarise the key points about multiprocessing itself...

- The 'spawn' method, while technically applicable to all platforms, looks like a nightmare to work with. Probably not worth it, on the whole.
- The 'fork' method is quite easy and is fast/flexible with easy use of shared memory, but is limited to Unix OSes. GPU support is flaky - in Linux, OpenCL and CUDA both work as long as they haven't been previously used in the parent process (even if the pre-existing Context has been destroyed... not clear why). On the Mac, it seems to be a no-go. Still, within these limits it works well enough on Linux for now.
- I don't think Python threads will be the best long-term solution for ISOLDE. Yes, it would improve performance compared to a fully single-threaded implementation, but working with the OpenMM Python API there would still be enough purely-Python tasks to make it near-impossible to ever get to VR speeds. Better to just do it properly in the first place and come to terms with the OpenMM C++ API.

For the time being I'll get a single-threaded option back in place to provide at least reasonable Windows/Mac support while I get the remaining key features sorted out.
follow-up: 35 comment:38 by , 8 years ago
Nice summary. Your plan sounds good. I don’t understand the justification for Python threads not working ("with the OpenMM Python API there would still be enough purely-Python tasks to make it near-impossible to ever get to VR speeds”). If there are slow tasks done in Python then those need to be optimized, often that means moving the Python to C/C++.
follow-up: 36 comment:39 by , 8 years ago
I probably just need to think/learn more about it, then.

Another interesting little bit of flakiness that crops up under very specific circumstances: if I start ISOLDE, then start the IPython console and do `isolde = session.tools.list[-2]`, then start a simulation, everything will be fine. But after that point, if I close the console while a simulation is running, or close the console and then start a new simulation, ChimeraX will crash with one of the following two errors:

{{{
[5430:5430:0916/162411.069077:ERROR:browser_main_loop.cc(272)] Gdk: ChimeraX: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
}}}

or the particularly perplexing:

{{{
[6265:6265:0916/163324.419828:ERROR:browser_main_loop.cc(272)] Gdk: ChimeraX: Fatal IO error 0 (Success) on X server :0.
}}}

... with the graphical windows still hanging around as zombies that have to be killed with xkill.
follow-up: 37 comment:40 by , 8 years ago
... well, holy heck. It turns out that the multiprocessing module has an undocumented feature:

{{{
from multiprocessing.pool import ThreadPool
}}}

gives you a class with an identical API to multiprocessing.Pool, but uses threads under the surface. A two-line change to my code, and it just magically works on my Mac. And yes, all the key OpenMM functions do indeed release the GIL, so performance is remarkably good for such a lightweight machine.
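For anyone following along, a minimal illustration of the thread-backed pool (simulate_chunk is just a placeholder for a call that releases the GIL, e.g. an OpenMM step):

{{{
from multiprocessing.pool import ThreadPool

def simulate_chunk(frac):
    # placeholder for a GIL-releasing call (e.g. an OpenMM step)
    return frac * frac

with ThreadPool(processes=4) as pool:
    results = pool.map(simulate_chunk, [0.1, 0.25, 0.5, 1.0])
print(results)
}}}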
follow-up: 38 comment:41 by , 8 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Why not run OpenMM in a separate thread in the same process as the ChimeraX graphics? Separate processes complicate communication.
ChimeraX core/threadq.py has a little support for running calculations in a separate thread with thread-safe queues to request calculations and return results. This is used for example by the molecular surface calculation where chain surfaces are computed using a pool of threads in core/commands/surface.py (look for import of threadq). Maybe this will give a starting point for how to use threads although your case is a little different. An important limitation is that Python can't run in two threads at once. So the trick is that OpenMM has to release the Python interpreter lock when it drops into C++ code (not using any Python) -- you will need to check if the OpenMM Python wrapper releases the lock. Numpy and ChimeraX C++ surface calc routines are examples of code that releases the Python global interpreter lock so that it can run in threads while the main graphics thread continues running.
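The general shape of that pattern (a generic sketch, not the core/threadq.py API itself): a worker thread consumes jobs from one thread-safe queue and posts results to another, which the graphics thread can drain between redraws.

{{{
import threading, queue

def worker(request_q, result_q):
    while True:
        job = request_q.get()
        if job is None:              # sentinel tells the worker to quit
            break
        result_q.put(job())          # job should release the GIL while it computes

requests, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(requests, results), daemon=True)
t.start()

# e.g. requests.put(lambda: run_one_md_step())   # hypothetical compute call
# ... and in the graphics loop, drain results with results.get_nowait()
}}}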