Opened 6 years ago
Closed 6 years ago
#2546 closed defect (not a bug)
Wrong OpenCL driver used on machine with Nvidia and Intel GPUs
| Reported by: | Tristan Croll | Owned by: | Tristan Croll |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Structure Editing | Version: | |
| Keywords: | Cc: | Tom Goddard | |
| Blocked By: | Blocking: | ||
| Notify when closed: | Platform: | Windows 10 | |
| Project: | ChimeraX |
Description
Bug report from Joel Mackay in Sydney. On his Windows machine, attempting to start a simulation in ISOLDE crashes ChimeraX. I'll attach the crash log he sent me, but it boils down to the system providing the Intel OpenCL driver (igdrclneo64.dll) ahead of the Nvidia one. There are many discussions of the same problem online - see https://www.addictivetips.com/windows-tips/force-app-to-use-dedicated-gpu-windows/ and https://www.daz3d.com/forums/discussion/243216/getting-a-fatal-error-on-launch-solved for example. Apparently one fix that, amazingly, seems to work without breaking anything is to just delete that DLL, but that's hardly a general solution. It seems programs (especially games) can register themselves as requiring the high-powered GPU if one is available... might be an idea for ChimeraX to do so?
Otherwise, I can see if I can figure out how to make OpenMM specifically be more picky about which driver it uses.
Attachments (1)
Change History (12)
by , 6 years ago
| Attachment: | chimera_crash.evtx added |
|---|
comment:1 by , 6 years ago
| Component: | Core → Structure Editing |
|---|---|
| Owner: | changed from to |
comment:2 by , 6 years ago
| Cc: | added |
|---|---|
| Owner: | changed from to |
In the ChimeraX Windows launcher C program we have the following declaration of symbol NvOptimusEnablement which is supposed to tell Windows to use Nvidia graphics on dual Intel / Nvidia graphics laptops that use Optimus graphics switching. This may not have any relation to how the GPU for OpenCL use is chosen but perhaps there is a similar mechanism.
I'll let you investigate how to choose the OpenCL device on dual GPU systems. I have not used OpenCL so I don't even know the basics for setting up an OpenCL context. First thing I would look at is whether OpenCL can list and choose the GPU to use. Vulkan works that way, but OpenGL does not have an API to list or choose GPUs. Since OpenCL is much more modern it might have a way to choose. If it does, then one route, and probably the best, is for OpenMM to choose the right GPU. If OpenCL can't choose when creating a context then maybe there is some hack like the OpenGL / Optimus symbol.
/*
 * Make Nvidia Optimus GPU switching choose high performance graphics.
 * http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf
 */
#ifdef _WIN32
_declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
#endif
comment:4 by , 6 years ago
Hmm... while OpenMM does have a mechanism to tell it which GPU to use, it doesn't have any querying tools to help *choose* which GPU to use. The necessary information can be collected quite trivially using PyOpenCL, but precompiled binaries are only provided on PyPI for Linux. Sigh...
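For reference, a minimal sketch of the kind of PyOpenCL query meant here, assuming PyOpenCL and at least one OpenCL driver are installed. The function name `list_opencl_devices` and the dictionary keys are made up for illustration; it returns an empty list if PyOpenCL is missing or no platform can be enumerated.

```python
def list_opencl_devices():
    """Return one dict per visible OpenCL device, or [] if nothing can be queried."""
    try:
        import pyopencl as cl
    except ImportError:
        return []  # PyOpenCL is not installed
    try:
        platforms = cl.get_platforms()
    except Exception:
        return []  # no usable OpenCL platform (driver) was found
    devices = []
    for p_index, platform in enumerate(platforms):
        try:
            platform_devices = platform.get_devices()
        except Exception:
            continue  # this platform reports no devices
        for d_index, device in enumerate(platform_devices):
            devices.append({
                "platform_index": p_index,
                "device_index": d_index,
                "platform": platform.name,
                "vendor": device.vendor,
                "name": device.name,
                # device.type is a bit field; test the GPU bit
                "is_gpu": bool(device.type & cl.device_type.GPU),
                "compute_units": device.max_compute_units,
                "clock_mhz": device.max_clock_frequency,
            })
    return devices


if __name__ == "__main__":
    for info in list_opencl_devices():
        print(info)
```

Note that this only guards against Python-level errors. If a driver DLL crashes natively while being loaded, no `try`/`except` will catch it.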
comment:5 by , 6 years ago
There *are* binaries for all three platforms on Conda Forge, though: https://anaconda.org/conda-forge/pyopencl. Any thoughts?
comment:6 by , 6 years ago
Since OpenMM can choose the GPU for OpenCL, it might be sensible to add to OpenMM a routine (in C/C++) that queries OpenCL for available devices and provides whatever info you think is needed to decide which GPU to use. Another thought: the OpenCL platform and device ids are integers starting from 0. What happens if you just choose 0,0? Can you then ask OpenMM about the GPU it got, such as whether it is Nvidia, Intel or AMD? Then try platform/device = 1,0. You'll probably get an exception if a GPU does not exist. PyOpenCL looks reasonable. We could possibly add Anaconda package installing in prereqs. But do we really want to distribute that? Maybe it should be included in the OpenMM bundle we've been talking about.
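The probing idea above could be sketched roughly as follows. This is untested against a dual-GPU Windows machine and rests on assumptions: the property names `OpenCLPlatformIndex`, `DeviceIndex` and `DeviceName` are what I understand the OpenMM OpenCL platform to expose, so check them against the installed version. The function names and the `max_platforms` / `max_devices` limits are hypothetical.

```python
def describe_openmm_device(platform_index, device_index):
    """Create a throwaway OpenMM context on one OpenCL device and return its name."""
    try:
        import openmm as mm  # newer OpenMM releases
    except ImportError:
        from simtk import openmm as mm  # older releases
    system = mm.System()
    system.addParticle(1.0)  # a context needs at least one particle
    integrator = mm.VerletIntegrator(0.001)
    platform = mm.Platform.getPlatformByName("OpenCL")
    # Assumed property names; values must be passed as strings.
    properties = {
        "OpenCLPlatformIndex": str(platform_index),
        "DeviceIndex": str(device_index),
    }
    context = mm.Context(system, integrator, platform, properties)
    name = platform.getPropertyValue(context, "DeviceName")
    del context, integrator  # release the device
    return name


def probe_devices(describe=describe_openmm_device, max_platforms=4, max_devices=4):
    """Try platform/device pairs in order and collect (platform, device, name) tuples."""
    found = []
    for p in range(max_platforms):
        for d in range(max_devices):
            try:
                found.append((p, d, describe(p, d)))
            except Exception:
                break  # no such device (or OpenMM unavailable): go to next platform
    return found
```

Passing the `describe` callable in makes it possible to exercise the loop without a GPU. As with any probing approach, a driver that crashes natively instead of raising an exception would still take the process down.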
follow-up: 6 comment:7 by , 6 years ago
Being able to install from Conda would be a huge advantage. Quite a few major projects are heading that way (CCTBX and iirc CCP-EM, for a start). Having CCTBX directly available in ChimeraX would make quite a few people very happy.
follow-up: 7 comment:8 by , 6 years ago
OpenMM is also distributed via Conda. Right now we do some painful nonsense where we install by hand from Conda, then make tar balls for the actual ChimeraX build to use to include OpenMM. I don't understand your CCTBX comment. How would someone use CCTBX in ChimeraX unless some GUI or command interface was created? It seems unlikely that people are going to write their own Python to use it. So getting it installed via Conda seems a minor part of being able to make use of it in ChimeraX.
follow-up: 8 comment:9 by , 6 years ago
I’m not talking about end-users. There are lots of methods developers familiar with CCTBX (I’m the only one in the lab here who *doesn’t* use it, as a matter of fact) who like the idea of ChimeraX as a front end. Airlie McCoy’s interested in interfacing to Phaser. Arjen Jakobi’s LocScale uses it, and I know he’s keen on the idea of a ChimeraX-LocScale. In general, it would enable a direct rather than arm’s-length interface to the Phenix suite.
follow-up: 10 comment:11 by , 6 years ago
| Resolution: | → not a bug |
|---|---|
| Status: | assigned → closed |
Turns out the information I read about OpenMM not making any effort to choose a GPU was out of date: reading through the code I found a quite elaborate routine that checks the capabilities of each listed OpenCL device and picks the fastest one. It of course has to rely on the GPUs actually accurately reporting their capabilities (apparently that can be a problem), but I don't think I could improve on it much. Apparently there's no way to uniquely identify a GPU with OpenCL without resorting to some poorly/un-documented vendor-specific code, making it hard to store a "preferred GPU" variable to persist across sessions.
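To illustrate the general shape of that kind of selection (this is a simplified sketch, not OpenMM's actual code), a device can be scored from its reported capabilities and the highest score chosen. The `DeviceInfo` class, the `lanes_per_unit` field and both function names are hypothetical. In particular `lanes_per_unit` stands in for a vendor-specific estimate of processing elements per compute unit, which plain OpenCL does not report directly.

```python
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    name: str
    vendor: str
    is_gpu: bool
    compute_units: int       # as reported by the driver
    clock_mhz: int           # as reported by the driver
    lanes_per_unit: int = 8  # assumed estimate, vendor-specific in practice


def estimated_speed(device):
    """Rough throughput estimate: units x lanes per unit x clock."""
    return device.compute_units * device.lanes_per_unit * device.clock_mhz


def pick_fastest(devices):
    """Prefer GPUs; among the candidates return the highest estimated speed."""
    devices = list(devices)
    candidates = [d for d in devices if d.is_gpu] or devices
    if not candidates:
        return None
    return max(candidates, key=estimated_speed)
```

The result is only as good as the numbers the drivers report. With the per-unit estimate left out, an integrated GPU reporting many small compute units can outscore a discrete card.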
Meanwhile, reading more indicates that the root cause of the crash is that the Intel OpenCL library for Windows is pretty horribly broken, and the only real recourse is to uninstall it. Will see about making a note somewhere in my documentation.
Crash log