Opened 8 years ago
Closed 7 years ago
#925 closed defect (fixed)
Severe degradation in threaded performance on Linux
Reported by: | Tristan Croll | Owned by: |
---|---|---|---
Priority: | moderate | Milestone: |
Component: | Core | Version: |
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
This seems to have happened at least a few months ago, but I've only recently had the time to start trying to diagnose it. The results are quite baffling, but I'm putting them out there in hopes that they might ring some bells.
In brief, ISOLDE's current design allows switching between multiprocessing (using os.fork, so working on Linux only) and threading with a one-line change (swapping multiprocessing.pool.Pool for multiprocessing.pool.ThreadPool). When I first switched over to threads the results were excellent - (very) slightly slower than multiprocessing, but still allowing graphics framerates on the order of 20-40 fps with simulations running. That's still true on my MacBook Air: for simulations of 2425 and 8346 residues (achieving approx. 5 and 2 coordinate updates/sec respectively) the graphics framerate averaged over 100 frames is about 50 fps. On my desktop (Xeon E5-2687W, GTX 1080, CentOS 7) the 8346-residue simulation is now getting a paltry 3.4 fps, while on my laptop (i7-6700HQ, GTX 1070, Fedora 24) it's around 8 fps.
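For illustration, a minimal sketch of the one-line switch described above; the worker function and pool size here are hypothetical placeholders, not ISOLDE's actual code:

```python
from multiprocessing.pool import Pool, ThreadPool

def sim_worker(n_steps):
    # Placeholder for the per-update simulation work.
    return n_steps

# pool = Pool(processes=1)        # fork-based multiprocessing (Linux only)
pool = ThreadPool(processes=1)    # threaded variant: the one-line change
result = pool.apply_async(sim_worker, (20,))
print(result.get())
```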
The slowdown appears to be specifically associated with threading - if I switch the desktop over to multiprocessing, the framerate immediately jumps to 17.1 fps. It also doesn't appear to be due to the simulation thread itself: the Python calls in there are minimal, and if I increase the number of simulation steps between updates of the ChimeraX coordinates from 20 to 500, the threaded framerate jumps to 44 fps. I've tried systematically removing all the callbacks I run on updating coordinates, and nothing seems to make a difference. So I'm somewhat at a loss. Any clues as to what might have changed?
To sum up: at present, the exact same code running on today's ChimeraX builds is getting ~15x better graphics performance on the MacBook Air than on heavy-duty Linux machines. Just weird.
Change History (6)
comment:1 by , 8 years ago
Owner: | changed from | to
---|---|---
follow-up: 2 comment:2 by , 8 years ago
Point taken regarding OS, but at least for the short term my hands are somewhat tied by the fact that almost none of the "core" structural biology people seem at all interested in developing for Windows. In particular, I can't do a thing until Clipper-Python is ported over, and despite words to the contrary from various people it seems that if I want anything done on that I still need to do it myself. My short-term goal is to get enough senior structural biologists (the sort who get the job of assessing grant applications) excited to maximise my chances of getting funding - at which point I might actually be able to hire someone to help with that.

Anyway, I think I understand a little better what's going on now. It's not poor sharing of the GPU - quite the opposite, in fact. The GPU is doing *too* good a job, and the reason the Mac does better is paradoxically because its GPU is slow (and appears to halt OpenCL jobs in favour of graphics). It looks to me like a version of the problem described on slide 50 of http://www.dabeaz.com/python/UnderstandingGIL.pdf. Basically, Python 3.2+ only switches threads every 5 ms, or when the GIL is explicitly released. That works fine when the code that releases the GIL is expected to run for much longer than 5 ms, but it becomes fairly disastrous when two or more threads each make lots of short GIL-releasing calls interleaved with a lot of Python. If, say, the ChimeraX thread makes a ctypes call that *should* last 0.1 ms, it yields the GIL to the simulation thread - and if that thread happens to be executing the Python wrapping around the actual GIL-released simulation, ChimeraX won't get the GIL back until the full 5 ms has elapsed. With everything happening in ChimeraX, that adds up *very* quickly.

Anyway, once the problem is diagnosed the solution appears fairly simple. I set a target loop time of 100 ms on the simulation thread and sleep it for whatever time is left over once the simulation steps are done, and my graphics frame rate immediately jumped to 15 fps. Increasing the steps between GUI updates from 20 to 50 (still short enough to be interactively usable) brought me up to >30 fps. So, back in business.
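For reference, a minimal sketch of the throttling workaround described above; step_fn, steps_per_update and stop_event are hypothetical stand-ins for ISOLDE's actual simulation loop:

```python
import time

# Python's default thread switch interval (sys.getswitchinterval()) is 0.005 s,
# i.e. the 5 ms mentioned above.
TARGET_LOOP_TIME = 0.1  # the 100 ms loop budget described in the comment

def simulation_loop(step_fn, steps_per_update, stop_event):
    while not stop_event.is_set():
        start = time.perf_counter()
        for _ in range(steps_per_update):
            step_fn()  # the real simulation work; releases the GIL internally
        # Sleep away whatever is left of the budget so this thread stops
        # competing with the ChimeraX GUI thread for the GIL.
        remaining = TARGET_LOOP_TIME - (time.perf_counter() - start)
        if remaining > 0:
            time.sleep(remaining)
```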
follow-up: 3 comment:3 by , 8 years ago
Not sure if such things exist, unfortunately - at least not yet. OpenMM does have Windows builds, so that side should hopefully not be too challenging. PHENIX has finally started in earnest on its migration to Python 3, so it may well become a viable replacement for Clipper. Watch this space, I suppose.
follow-up: 4 comment:4 by , 8 years ago
Good! Yes, threading has pitfalls, especially Python threading, and I have little experience with it. Good to know to be wary of fast switching between Python threads. Regarding running on Windows, I understand you depend on third-party libraries like Clipper and OpenMM, and if they don't support Windows well, you are in trouble. But you should seriously consider using less capable libraries that run on Windows if they let you reach many more users.
follow-up: 5 comment:5 by , 8 years ago
Still running sluggishly on my desktop (albeit faster than before - closer to 10 fps than 3). That's despite individual CPU and GPU benchmarks yielding very comparable results on each machine. Going to have to put it down to the OS differences - there's a lot of water under the bridge between kernel 3.10 (CentOS 7) and 4.8 (Fedora 24)! Ah, well - CentOS technically shouldn't be able to run ChimeraX anyway, and only does on this machine because I compiled a not-officially-supported version of GCC... The upshot of all of this is not particularly surprising, I suppose: while Python's great for prototyping, ultimately I'm going to have to do all my threading at the C++ level.
comment:6 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
This old ticket is well and truly obsolete now.
So on your MacBook Air you get 2 coordinate updates per second and 50 graphics frames per second for 8346 residues using threads. On the more powerful Linux desktop and laptop machines you get 3.4 and 8 graphics frames per second respectively with threads. On the desktop, if you switch from threads to multiprocessing, you get 17.1 frames per second. And going back to threads on the desktop, increasing the simulation steps per coordinate update from 20 to 500 actually raises the frame rate from 3.4 to 44 frames per second.
A few possibilities come to mind.

1. Both OpenMM and the ChimeraX graphics are trying to use the GPU, and Linux is not handling the sharing of the GPU gracefully. To test this, maybe you can run the simulation on the CPU and see if the graphics framerate goes up (a rough sketch follows below).
2. Somehow the threads are blocking each other much more on Linux. That seems unlikely, but maybe the Python thread implementation differs slightly between Mac and Linux and interacts poorly with the Linux thread scheduler.
3. Somehow X windows is the culprit -- I recall 6 months ago you said status message display was slowing everything down significantly on Linux but not much on Mac. Status message rendering in ChimeraX has since switched from using Qt to using OpenGL. I'm not suggesting it has anything to do with status messages; rather, frequent context switches to the X windows process may be causing slowdowns.
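As a rough sketch of the CPU test in possibility 1, assuming an OpenMM 7-era `simtk.openmm` import path; the trivial one-particle system below is only there to make the example self-contained, since in ISOLDE the topology/system/integrator would come from the existing setup code:

```python
from simtk import unit
from simtk.openmm import Platform, System, VerletIntegrator
from simtk.openmm.app import Element, Simulation, Topology

# Trivial one-particle system purely so the example runs on its own.
system = System()
system.addParticle(1.0 * unit.amu)

topology = Topology()
chain = topology.addChain()
residue = topology.addResidue('DUM', chain)
topology.addAtom('X', Element.getBySymbol('H'), residue)

integrator = VerletIntegrator(1.0 * unit.femtoseconds)

# The key line for the test: request the CPU platform instead of OpenCL/CUDA,
# leaving the GPU entirely to the ChimeraX graphics.
platform = Platform.getPlatformByName('CPU')
simulation = Simulation(topology, system, integrator, platform)
simulation.context.setPositions([(0.0, 0.0, 0.0)] * unit.nanometer)
simulation.step(10)
```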
Lastly, a general but important point: I think you reduce the value of your work by developing on Linux. The majority of people who would benefit from your software will use Windows, some Mac, and lastly Linux. By not developing on your users' most common platforms you reduce the usability of your software. Your problem in this bug is likely a result of the shoddy support for graphics on Linux desktops. Linux is used on servers worldwide, but desktop use is a fringe market (about 3% of desktop computing). The numbers are likely better among your potential users, but I think you are misleading yourself if you think it is the primary platform for your application.