Opened 6 years ago

Closed 3 years ago

#2906 closed enhancement (wontfix)

Building OpenMM installer in devtoolset-7

Reported by: Tristan Croll
Owned by: Tristan Croll
Priority: moderate
Milestone:
Component: Build System
Version:
Keywords:
Cc: Greg Couch, Tom Goddard, Eric Pettersen
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

I've just compiled a working OpenMM 7.5 installer in devtoolset-7, installed it into ChimeraX (built in devtoolset-7 on my machine) and built ISOLDE against it - all appears to work fine (although the ChimeraX build has a tendency to segfault on exit - probably something I've done wrong). I'm out of time today, but will summarise what I've done tomorrow. The build is based on the nvidia/cuda:10.2-devel-centos7 Docker image, with modified versions of the scripts at https://github.com/openmm/openmm/tree/master/devtools/packaging/scripts/linux. With a little more modification, their use of miniconda could easily be removed in favour of using ChimeraX's Python.

Change History (15)

comment:1 by Eric Pettersen, 6 years ago

Component: Unassigned → Build System

in reply to:  2 ; comment:2 by Tristan Croll, 6 years ago

I'm afraid I'm unfamiliar with how exactly you're statically linking the required libstdc++ symbols when building ChimeraX itself (and I don't really speak Makefile all that well), but it appears I'm going to need to use the same approach when building OpenMM. With everything built in devtoolset-7 but running in the native CentOS 7 environment, "vanilla" ChimeraX starts and closes fine, but if I start ChimeraX, start ISOLDE, then close ChimeraX, I get the following traceback, which I *think* suggests it's attempting to use the std::string destructor from the system libstdc++ on a GCC 7.3 string. Actually loading a model and running simulations in ISOLDE using OpenCL or CUDA works just fine - it's just this crash at the end that concerns me.


*** Error in `/home/tic20/chimerax-git/chimerax/ChimeraX.app/bin/ChimeraX': double free or corruption (!prev): 0x00000000057bd4f0 ***

#0  0x00007ffff74d9337 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff74daa28 in __GI_abort () at abort.c:90
#2  0x00007ffff751be87 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff762e3b8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff7524679 in _int_free (ar_ptr=0x7ffff786a760 <main_arena>, ptr=<optimized out>, str=0x7ffff762e4e0 "double free or corruption (!prev)", action=3) at malloc.c:4967
#4  0x00007ffff7524679 in _int_free (av=0x7ffff786a760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007fffe6c38b63 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (__a=..., this=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:539
#6  0x00007fffe6c38b63 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (this=<optimized out>, __in_chrg=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:539
#7  0x00007ffff74dcc99 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff786a6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#8  0x00007ffff74dcce7 in __GI_exit (status=status@entry=0) at exit.c:99
#9  0x00007ffff7a26d39 in Py_Exit (sts=sts@entry=0) at Python/pylifecycle.c:2292
#10 0x00007ffff78d93be in handle_system_exit () at Python/pythonrun.c:636
#11 0x00007ffff7a30b17 in PyErr_PrintEx () at Python/pythonrun.c:715
#12 0x00007ffff7a30b17 in PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at Python/pythonrun.c:646
#13 0x00007ffff7a30b2a in PyErr_Print () at Python/pythonrun.c:542
#14 0x00007ffff7a5122d in pymain_run_module (modname=<optimized out>, set_argv0=set_argv0@entry=1) at Modules/main.c:323
#15 0x00007ffff7a56e19 in pymain_main (pymain=0x7fffffffd700) at Modules/main.c:2865
#16 0x00007ffff7a56e19 in pymain_main (pymain=pymain@entry=0x7fffffffd700) at Modules/main.c:3029
#17 0x00007ffff7a572d2 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:3052
#18 0x00000000004008b7 in main ()
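
As a rough, untested diagnostic sketch (not something I've run yet), one could ask the dynamic linker which shared object actually supplies that destructor. _ZNSsD1Ev is the mangled name of the pre-C++11-ABI std::basic_string<char> destructor; dlsym/dladdr report where the default symbol lookup resolves it. Compile with g++ check_dtor.cpp -ldl.

// Rough diagnostic sketch (untested): report which shared object provides the
// old-ABI std::string destructor that the default symbol lookup resolves.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // for RTLD_DEFAULT on glibc
#endif
#include <dlfcn.h>
#include <cstdio>

int main() {
    // _ZNSsD1Ev == std::basic_string<char, std::char_traits<char>,
    // std::allocator<char> >::~basic_string() in the old (GCC 4.x) ABI.
    void* sym = dlsym(RTLD_DEFAULT, "_ZNSsD1Ev");
    if (!sym) {
        std::printf("destructor not found: %s\n", dlerror());
        return 1;
    }
    Dl_info info;
    if (dladdr(sym, &info) && info.dli_fname)
        std::printf("~basic_string() resolves from %s\n", info.dli_fname);
    return 0;
}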



in reply to:  3 ; comment:3 by Tristan Croll, 6 years ago

I suspect I’ve been going around in circles all day, but this seems relevant: https://community.rti.com/forum-topic/double-free-error-exiting-certain-applications. I think the issue is that libOpenMM will need to link against the ChimeraX library that provides the stdc++ symbols, and drop its own link to libstdc++. Will try tomorrow.
 
comment:4 by Greg Couch, 6 years ago

This is a great experiment, and I'd like for you to figure it out, but I wonder why you aren't using the OpenMM C API? Looking at the OpenMM binaries on Anaconda, there are multiple variations, one for each of several different versions of CUDA. Using those "official" binaries would simplify updates, and an OpenMM installation tool could install the one that matches the version of CUDA installed on the system. And if the C API were used, then which C++ compiler was used to build OpenMM wouldn't matter.

That said, you should look at how devtoolset's libstdc++.so is implemented. libstdc++.so is actually a linker script that layers additional functionality on top of the system's libstdc++.so:

/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared )

So your analysis of the bug could be correct if the bug is in the nonshared, i.e., static, part. But I would have expected Red Hat to handle std::string correctly in this case. Or maybe the bug is that ISOLDE is statically linking libstdc++?

comment:5 by Greg Couch, 6 years ago

Forgot to mention that ChimeraX only dynamically links against libstdc++.

in reply to:  6 ; comment:6 by Tristan Croll, 6 years ago

I did think about the C-API approach, but there are a few things I need to do that fairly unavoidably require using the C++ objects. The other approach I *could* use rests on the fact that I don’t currently have any libraries that simultaneously link both OpenMM and ChimeraX. So I *could* technically precompile my OpenMM-requiring libraries using its matching compiler before moving on to the rest of the build... but that makes for a much less neat build system than the current one.

We’ve talked back and forth on this a few times, but in the long term it would probably be best to have OpenMM in its own bundle to allow upgrading independently of the ChimeraX core. This exercise seems like a good step in that direction.
 
comment:7 by Tristan Croll, 6 years ago

Tracked down the cause: OpenMM uses runtime dynamic loading via dlopen to load its plugin libraries, and is calling it with the RTLD_GLOBAL flag (see https://stackoverflow.com/questions/11005881/static-library-loaded-twice). Once upon a time RTLD_GLOBAL was necessary to allow dynamic casting and passing of exceptions between the libraries, but that hasn't been the case since GCC 4.5 (https://github.com/aalexand/sharedlib_typeinfo). If I switch to the RTLD_LOCAL flag instead everything appears to work fine and the crash goes away. I've put this in as a pull request on the OpenMM GitHub - since the latest code requires at least GCC 5, it should be a no-brainer.
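
For reference, the change amounts to flipping one flag in the plugin loader; a minimal sketch of the idea (simplified, not the actual OpenMM code):

// Minimal sketch of the fix (simplified, not the actual OpenMM plugin loader):
// load each plugin into its own local namespace instead of promoting its
// symbols to the global scope.
#include <dlfcn.h>
#include <stdexcept>
#include <string>

void* loadPlugin(const std::string& path) {
    // before: dlopen(path.c_str(), RTLD_LAZY | RTLD_GLOBAL);
    void* handle = dlopen(path.c_str(), RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr)
        throw std::runtime_error(std::string("dlopen failed: ") + dlerror());
    return handle;
}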

in reply to:  8 ; comment:8 by Tristan Croll, 6 years ago

I've just spent the last couple of hours using this build to work with a real case (a particularly nasty low-resolution crystal dataset) and everything seems perfectly stable.


in reply to:  9 ; comment:9 by Tristan Croll, 6 years ago

Peter Eastman on the OpenMM team just accepted my pull request. How about this: when OpenMM 7.5 is released, we use the official builds for Mac and Windows, and I can provide a devtoolset-7 build for Linux? My workstation is already CentOS 7 with all the prerequisites, and I can also put together a singularity recipe for later use.


comment:10 by Greg Couch, 6 years ago

Sounds good. I am still worried about how to handle multiple versions of CUDA.

in reply to:  11 ; comment:11 by Tristan Croll, 6 years ago

I wouldn’t worry too much about that. I only set CUDA as the default in Linux anyway. Yes, it’s slightly faster in simulation rate - but it requires the presence of a compiler and is a bit slower getting started compared to OpenCL. For Linux users who don’t have the correct CUDA version, it’ll silently fall back to OpenCL.
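
For what it's worth, the fallback is just a try-one-then-the-other affair; a rough sketch of that pattern using the OpenMM C++ API (illustrative only, not the actual ISOLDE code):

// Rough sketch of a CUDA -> OpenCL fallback using the OpenMM C++ API
// (illustrative only, not the actual ISOLDE code).
#include <OpenMM.h>
#include <exception>
#include <iostream>

OpenMM::Platform& choose_platform() {
    // Platform plugins (CUDA, OpenCL, CPU) must be loaded before they can be
    // looked up by name.
    OpenMM::Platform::loadPluginsFromDirectory(
        OpenMM::Platform::getDefaultPluginsDirectory());
    try {
        // Throws if no platform named "CUDA" was registered, e.g. because the
        // CUDA plugin failed to load for lack of a matching CUDA runtime.
        return OpenMM::Platform::getPlatformByName("CUDA");
    } catch (const std::exception& e) {
        std::cerr << "CUDA unavailable (" << e.what()
                  << "), falling back to OpenCL\n";
        return OpenMM::Platform::getPlatformByName("OpenCL");
    }
}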
 
comment:12 by Tristan Croll, 6 years ago

A repeat make install in the ChimeraX top-level directory after git pull fails for me in vdocs:

/home/tic20/chimerax-git/chimerax/ChimeraX.app/bin/ChimeraX --nogui --silent --exit --script '_vdoc.py build'
user/index.html: already exists and is not a symlink
make[1]: *** [Makefile:8: install] Error 1
make[1]: Leaving directory '/home/tic20/chimerax-git/chimerax/vdocs'
make: *** [Makefile:38: install] Error 2

I can see the reason: _vdoc.generate_user_index() creates a real index.html in the code directory, which on subsequent runs trips up _vdoc.check_symlink(), since it expects everything in that directory to be a symlink.

comment:13 by Tristan Croll, 6 years ago

By the way: if you want to bump up to devtoolset-7 earlier for ChimeraX 0.93 just let me know - I can do the minimal required patch to the OpenMM 7.4 source and provide a matching build.

comment:14 by Tristan Croll, 5 years ago

I've had a bit more of a look at OpenMM's C API, and it turns out Greg's suggestion may well be possible. It'll take a bit of work (and care - I'm not particularly experienced in C, so I'll have to be extra careful with memory management), but I *should* be able to get there.
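
If I do go down that road, most of the memory-management burden can be pushed into RAII wrappers on the C++ side, so every handle returned by the C layer gets its matching destroy call automatically. A hypothetical sketch (the MMSystem handle and mm_system_create/mm_system_destroy names below are invented for illustration - they are not the real OpenMM C wrapper API):

// Hypothetical sketch: wrapping an opaque C-API handle in a unique_ptr with a
// custom deleter, so every create() is paired with the matching destroy().
// MMSystem / mm_system_create / mm_system_destroy are invented names for
// illustration, NOT the real OpenMM C wrapper API.
#include <memory>

extern "C" {
    typedef struct MMSystem MMSystem;        // opaque handle (hypothetical)
    MMSystem* mm_system_create(void);        // hypothetical constructor
    void      mm_system_destroy(MMSystem*);  // hypothetical destructor
}

struct MMSystemDeleter {
    void operator()(MMSystem* s) const { if (s) mm_system_destroy(s); }
};
using MMSystemPtr = std::unique_ptr<MMSystem, MMSystemDeleter>;

MMSystemPtr make_system() {
    return MMSystemPtr(mm_system_create());  // freed automatically on scope exit
}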

comment:15 by Tristan Croll, 3 years ago

Resolution: wontfix
Status: assigned → closed

Closing out this ancient ticket, since things have well and truly moved on.
