Opened 6 years ago

Closed 6 years ago

#2546 closed defect (not a bug)

Wrong OpenCL driver used on machine with Nvidia and Intel GPUs

Reported by: Tristan Croll
Owned by: Tristan Croll
Priority: major
Milestone:
Component: Structure Editing
Version:
Keywords:
Cc: Tom Goddard
Blocked By:
Blocking:
Notify when closed:
Platform: Windows 10
Project: ChimeraX

Description

Bug report from Joel Mackay in Sydney. On his Windows machine, attempting to start a simulation in ISOLDE crashes ChimeraX. I'll attach the crash log he sent me, but it boils down to the system providing the Intel OpenCL driver (igdrclneo64.dll) ahead of the Nvidia one. There are many discussions of the same problem online - see https://www.addictivetips.com/windows-tips/force-app-to-use-dedicated-gpu-windows/ and https://www.daz3d.com/forums/discussion/243216/getting-a-fatal-error-on-launch-solved for example. Apparently one solution that amazingly seems to work without breaking anything is to just delete that DLL, but that's hardly a general solution. It seems it's possible for programs (especially games) to register themselves as requiring the high-powered GPU if available... it might be an idea for ChimeraX to do so.

Otherwise, I can see if I can figure out how to make OpenMM specifically be more picky about which driver it uses.

Attachments (1)

chimera_crash.evtx (68.0 KB ) - added by Tristan Croll 6 years ago.
Crash log


Change History (12)

by Tristan Croll, 6 years ago

Attachment: chimera_crash.evtx added

Crash log

comment:1 by Eric Pettersen, 6 years ago

Component: Core → Structure Editing
Owner: changed from Eric Pettersen to Tom Goddard

comment:2 by Tom Goddard, 6 years ago

Cc: Tom Goddard added
Owner: changed from Tom Goddard to Tristan Croll

In the ChimeraX Windows launcher C program we have the following declaration of the symbol NvOptimusEnablement, which is supposed to tell Windows to use Nvidia graphics on dual Intel/Nvidia laptops that use Optimus graphics switching. This may not have any relation to how the GPU for OpenCL use is chosen, but perhaps there is a similar mechanism.

I'll let you investigate how to choose the OpenCL device on dual-GPU systems. I have not used OpenCL, so I don't even know the basics of setting up an OpenCL context. The first thing I would look at is whether OpenCL can list the GPUs and choose which one to use. Vulkan works that way, but OpenGL has no API to list or choose GPUs. Since OpenCL is much more modern, it might have a way to choose. If it does, then one route, and probably the best, is for OpenMM to choose the right GPU. If OpenCL can't choose when creating a context, then maybe there is some hack like the OpenGL/Optimus symbol.

#ifdef _WIN32
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
#endif

comment:3 by Tom Goddard, 6 years ago

This really is an OpenMM issue since it is OpenMM that is using OpenCL.

comment:4 by Tristan Croll, 6 years ago

Hmm... while OpenMM does have a mechanism to tell it which GPU to use, it doesn't have any querying tools to help *choose* which GPU to use. The necessary information can be collected quite trivially with PyOpenCL, but it's only provided precompiled on PyPI for Linux. Sigh...
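
For illustration, the kind of choice logic this would enable can be sketched in plain Python. Everything below is a hypothetical sketch: the device dictionaries stand in for what an OpenCL enumeration (e.g. via PyOpenCL's get_platforms()/get_devices()) would report, and the prefer-discrete heuristic is my assumption, not ChimeraX or OpenMM code.

```python
# Hypothetical sketch: prefer a discrete GPU over an integrated Intel one.
# The dicts mimic fields an OpenCL device query would report; the vendor
# strings and the scoring rule are illustrative assumptions.

def pick_device(devices):
    """Return the index of the preferred device: non-Intel vendors first,
    ties broken by reported global memory size."""
    def score(dev):
        discrete = 0 if "intel" in dev["vendor"].lower() else 1
        return (discrete, dev.get("global_mem_size", 0))
    return max(range(len(devices)), key=lambda i: score(devices[i]))

devices = [
    {"vendor": "Intel(R) Corporation", "name": "Intel(R) UHD Graphics",
     "global_mem_size": 6 * 2**30},
    {"vendor": "NVIDIA Corporation", "name": "GeForce GTX 1060",
     "global_mem_size": 6 * 2**30},
]
print(devices[pick_device(devices)]["name"])  # -> GeForce GTX 1060
```

The same scoring function could then drive whatever mechanism OpenMM exposes for naming the device to use.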

comment:5 by Tristan Croll, 6 years ago

There *are* binaries for all three platforms on Conda Forge, though: https://anaconda.org/conda-forge/pyopencl. Any thoughts?

in reply to: comment:5 ; comment:6 by Tom Goddard, 6 years ago

Since OpenMM can choose the GPU for OpenCL it might be sensible to add to OpenMM a routine (in C/C++) that queries OpenCL for available devices and provides whatever info you think is needed to make a decision about which GPU to use.

Another thought: the OpenCL platform and device ids are integers starting from 0. What happens if you just choose 0,0? Can you then ask OpenMM about the GPU it got, like whether it is Nvidia, Intel, or AMD? Then try platform/device = 1,0, and so on. You'll probably get an exception if the GPU does not exist.

PyOpenCL looks reasonable. We could possibly add Anaconda package installation to prereqs. But do we really want to distribute that? Maybe it should be included in the OpenMM bundle we've been talking about.
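
The try-each-index idea above can be sketched generically. The probe below uses a stub in place of real context creation; with OpenMM, the equivalent step would be attempting Context creation for each index pair and catching the failure (the stub and the loop bounds are assumptions for illustration, not verified OpenMM API).

```python
# Hypothetical sketch of "just try 0,0, then 0,1, ...": treat an exception
# from context creation as "this platform/device pair does not exist".
# create_context is a stand-in for whatever raises on a bad index pair.

def probe_devices(create_context, max_platforms=4, max_devices=4):
    """Return every (platform, device) index pair that can be opened."""
    found = []
    for p in range(max_platforms):
        for d in range(max_devices):
            try:
                create_context(p, d)
            except Exception:
                continue  # assume failure means the pair doesn't exist
            found.append((p, d))
    return found

def fake_create(p, d):
    # Stub machine: one platform exposing two devices.
    if p != 0 or d > 1:
        raise ValueError("no such platform/device")

print(probe_devices(fake_create))  # -> [(0, 0), (0, 1)]
```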

in reply to: comment:6 ; comment:7 by Tristan Croll, 6 years ago

Being able to install from Conda would be a huge advantage. Quite a few major projects are heading that way (CCTBX and, IIRC, CCP-EM, for a start). Having CCTBX directly available in ChimeraX would make quite a few people very happy.

in reply to: comment:7 ; comment:8 by Tom Goddard, 6 years ago

OpenMM is also distributed via Conda. Right now we do some painful nonsense where we install it by hand from Conda, then make tarballs for the actual ChimeraX build to use to include OpenMM.

I don't understand your CCTBX comment.  How would someone use CCTBX in ChimeraX unless some GUI or command interface was created?  It seems unlikely that people are going to write their own Python to use it.  So getting it installed via Conda seems a minor part of being able to make use of it in ChimeraX.

in reply to: comment:8 ; comment:9 by Tristan Croll, 6 years ago

I'm not talking about end users. Lots of methods developers who are familiar with CCTBX (I'm the only one in the lab here who *doesn't* use it, as a matter of fact) like the idea of ChimeraX as a front end. Airlie McCoy's interested in interfacing to Phaser. Arjen Jacobi's LocScale uses it, and I know he's keen on the idea of a ChimeraX-LocScale. In general, it would enable a direct rather than arm's-length interface to the Phenix suite.

in reply to: comment:9 ; comment:10 by Tom Goddard, 6 years ago

Ok, makes sense.

comment:11 by Tristan Croll, 6 years ago

Resolution: not a bug
Status: assigned → closed

Turns out the information I read about OpenMM not making any effort to choose a GPU was out of date: reading through the code, I found a fairly elaborate routine that checks the capabilities of each listed OpenCL device and picks the fastest one. It of course has to rely on the GPUs accurately reporting their capabilities (apparently that can be a problem), but I don't think I could improve on it much. Apparently there's no way to uniquely identify a GPU with OpenCL without resorting to poorly documented or undocumented vendor-specific code, making it hard to store a "preferred GPU" setting that persists across sessions.
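
As a rough illustration (not OpenMM's actual routine), "pick the fastest by reported capabilities" might score each device by compute units times clock speed. The sketch below also shows why accurate, comparable reporting matters: a "compute unit" means very different things per vendor, so a naive score can favor the integrated GPU. The formula and the numbers are illustrative assumptions.

```python
# Hypothetical sketch of capability-based selection. The scoring formula
# (compute units x max clock) is a common heuristic, not necessarily the
# one OpenMM uses; the values mimic typical driver-reported numbers.

def fastest_device(devices):
    return max(devices,
               key=lambda d: d["max_compute_units"] * d["max_clock_mhz"])

devices = [
    {"name": "Intel(R) UHD Graphics", "max_compute_units": 24,
     "max_clock_mhz": 1150},
    {"name": "GeForce GTX 1060", "max_compute_units": 10,
     "max_clock_mhz": 1708},
]
# An Intel "compute unit" (EU) is far weaker than an Nvidia one (SM),
# so the naive score picks the integrated GPU here:
print(fastest_device(devices)["name"])  # -> Intel(R) UHD Graphics
```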

Meanwhile, further reading indicates that the root cause of the crash is that the Intel OpenCL library for Windows is badly broken, and the only real recourse is to uninstall it. I'll see about making a note somewhere in my documentation.
