Opened 5 years ago

Closed 5 years ago

#3354 closed defect (fixed)

Test update of bundles with binary (e.g. C++) components

Reported by: Scooter Morris Owned by: pett
Priority: critical Milestone: 1.2
Component: Tool Shed Version:
Keywords: Cc: chimera-programmers, Tristan Croll
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

Atomic is a good example

Change History (16)

comment:1 by pett, 5 years ago

Component: UnassignedTool Shed

comment:2 by pett, 5 years ago

Cc: chimera-programmers added
Owner: changed from Conrad Huang to pett

The proposal is to split such modules into two modules so that they can have separate API version numbers. Most other bundles will depend on the Python-half API. A few will depend on both or the binary-half API.

comment:3 by pett, 5 years ago

Cc: Tristan Croll added

comment:4 by pett, 5 years ago

An alternative from ticket #1321 is that for bundles with CLibrary tags in their bundle_info.xml files, an additional "fake" bundle is created for each such tag. The fake bundle name will be true_bundle_name-CLibrary_name with a default version of 1.0.0. A different version can be specified with the "version" attribute of the CLibrary tag.

This seems like it would be complicated to handle in the Toolshed.

comment:5 by Tom Goddard, 5 years ago

We have lots bundles that include C++ that are mostly for optimization and are private like map._map, geometry._geometry, stl._stl.... But we have almost no C++ bundle code that is linked by C++ code in other bundles. I know ISOLDE does this to access ChimeraX atomic C++ data structures. Are there other cases? There is one very common case -- all of my C++ bundle code uses numpy arrays as input and that is converted to C++ using the "arrays" C++ bundle by directly linking to the C++ libarrays.

If I understand it only the C++ code that is linked by other bundles would need to be separated out into its own bundle as suggested in comment 2. The arrays bundle is already just the C++. The atomic data structures C++ would need to be put in a separate bundle Are there any other cases requiring a separate bundle?

If arrays or the atomic data structures C++ libraries change in an incompatible way then many other bundles (all of my C++ which links to libarrays) would need to be updated. For the arrays bundle I pretty often add new parsing methods for numpy array arguments to its API (requiring a minor version bump). Those do not break API compatibility. So all the dependent bundles would not need updating. But any change to the C++ array class would need a major version increase to arrays requiring all of my other C++ bundles to be updated.

comment:6 by Tom Goddard, 5 years ago

Another aspect of the binary bundle problem is that most of these are python modules and depend on the Python version. If a user installs a third-party binary bundle in ChimeraX 1.0 based on Python 3.7 and then updates to ChimeraX 1.1 which uses Python 3.8, then that bundle will no longer be compatible and must be updated. Maybe the solution for that is binary bundles depend on ChimeraX 1.0 and are not compatible with 1.1.

If we release updates to standard bundles that have C++ code that uses the Python API they also would have to depend on 1.0 and the dependency on core should say it is not compatible with 1.1.

comment:7 by pett, 5 years ago

This split is only for bundles that provide both a Python interface and a direct public C++ interface (i.e. there is an "installedIncludeDir" attribute in their bundle_info.xml). By my count that is exactly 4 bundles: arrays, atomic, atom_search, and pdb. It's not for bundles with private C++ optimization code.

comment:8 by pett, 5 years ago

As for the Python dependency, one possibility is to ship a fake "Python" bundle whose only purpose is to declare an API version number that binary bundles can declare a dependency on. That way, if consecutive versions of ChimeraX use the same Python version, binary bundles will know they are compatible with both.

comment:9 by pett, 5 years ago

I'm not really sure how the Python half should declare it's dependency on the version number of the binary half, since there are certain major version number changes in the binary half (e.g. data layout changes) that it may not care about at all.

comment:10 by Scooter Morris, 5 years ago

Milestone: 1.11.2
Priority: blockercritical

comment:11 by pett, 5 years ago

Transferring our email discussion into this ticket so that is preserved and is findable...

Greg:
For bundles that compile C++ against C++ code in other bundles, I propose that we modify semantic versioning a tiny bit from Major.Minor.Bugfix, and use Major.C++API.PythonAPI.Bugfix. That way, bundles, that depend on a particular C++API, would use a "~=major.C++API.0" dependency to be incompatible with a C++API + 1 release, but would be compatible with a PythonAPI + 1 release.

Should the PythonAPI monotonically increase? That is, not reset to zero when the C++API increases? Unfortunately, a "~=Major.*.PythonAPI" dependency does not work to just depend on the PythonAPI version, so that's a developer's choice.

With the above proposal, all of the bundles that have build dependencies on other bundles, would have that dependency updated to the new form. For example, a dependency on Atomic would become "~=1.0.0".

Eric:
I don’t know that this addresses enough of the issues to be worth the confusion it will engender or the effort involved. Any C++ change that affects the data layout (new instance vars, new virtual methods) will force you to change the major version number of your package if you are seriously using the version number to track C++ compatibility. The proposal makes it impossible for bundles that care only about the Python API to ignore C++ API changes.

We really need two versions numbers associated with a package to correctly handle this.

Eric:
The obvious corollary is that perhaps bundles that offer a public C++ API need to be split into two bundles.

Tom:
You seem to have hit on the simple solution, have separate C++ and Python bundles. This is what we did in Chimera. It is a bit ugly since usually the C++ code is just an optimization and conceptually belongs in the same bundle as the Python code. But I think it is practical and probably a good tradeoff.

Eric:

Given that we've decided that 1.2 is sort of our "infrastructure release", I'd like to make some progress on handling binary bundles better. This has been on pause because Conrad had a tentative alternative proposal that I don't think is actually workable but maybe I wasn't grasping all the details, so I wanted to hear more from Conrad on it. But with his status so nebulous now, maybe we should proceed anyway?
My proposal, anyway, was to split binary bundles into two, so that each could have their own version numbers, and other bundles that actually need to depend on the binary side can declare reasonable dependencies. A follow-on idea is to possibly have a fake "Python" bundle so that the core major version number doesn't have to be increased when we go to a new Python, and Python-lib-creating bundles could declare dependencies on the Python bundle version.
Conrad's idea was to somehow have multiple API version numbers within the same bundle, which I don't think is workable in the Toolshed/pip's installation system, but like I said maybe there's a trick I missed.
So, two questions. 1) Wait for Conrad? 2) Carry the discussion forward in email or devote a (probably substantial) part of a future Zoom group meeting to it?

Scooter:

Well, nebulous or not, it's clear that when Conrad is going to be able to come back, it will only be for one day/week, and given the time gap, there are going to be a lot of things that will probably require his more immediate attention. Furthermore, you all are going to have to be the ones to maintain it. So, I would vote for #2 -- go ahead and carry the discussion forward in e-mail and when there seems to be a solid proposal, we'll devote a future Zoom group meeting to it. Does that make sense to everyone else?

Tom:

Is atomic the only bundle that would need its C++ library put into a separate bundle? The only other bundle C++ library I know of that other bundles compile against is arrays (C++ / numpy array support) and that is C++ with no Python so nothing to split. If it is just one bundle, atomic that is effected I don't think we need a long deliberation about it and your plan of splitting it sounds good.

When we go from Python 3.7 to 3.8 I do not think we need a major version change of core, nor do we need bundles that compile against C Python to list a dependency on the python version because the Python 3.7 or 3.8 dependency is in the wheel name, for example

ChimeraX_Map-1.0.1-cp37-cp37m-macosx_10_15_x86_64.whl

I assume that pip install enforces that this CPython 3.7 bundle will not be installed in a CPython 3.8.

Eric:

In addition to arrays and atomic, atom_search and pdb make C libraries available to other bundles. So three bundles would need splitting.
Your point about the Python version being embedded in the wheel name is a good one. So I think that resolves that issue.
I was soliciting feedback about carrying forward the discussion because there are still issues with the proposal that need resolution. Here are the ones that occur to me:

1) Naming scheme for the C++ bundle.

We should probably have a consistent naming scheme for the C++ half of bundles. Hopefully something that doesn't totally confuse regular users when they see the bundle in the Toolshed or the Updates interface. Is the '+' character legal in a bundle name? If so, then ChimeraX-C++RegularBundleName might work. If not, I'm not sure what name would be clear enough. Maybe ChimeraX-RegularBundleNameLibrary?

2) Dependency between the Python and C++ halves of the bundle.

Declaring the version dependency of the Python half on the C++ half might be tricky because the C++ half can change major version numbers without actually affecting compatibility with the Python half (e.g. a class's data layout changes, but it's external API doesn't). Let's say the C++ half is at 2.1 and the Python half declares its dependency as ~=2.1. Then the C++ half goes to 3.0 but is still 2.1 compatible at the API level. Is there a way to declare the dependency that is basically ~=2.1 or ~=3.0? As far as I know, the ',' in a dependency declaration is and.

Tom:

I think ChimeraX-AtomicLibrary is a better bundle name than ChimeraX-C++Atomic.

I don't understand your dependency example. If AtomicLibrary goes from version 2.1 to 3.0 because of a class layout change with no function API changes then an Atomic wheel which is compiled against AtomicLibrary has to depend on 3.0 and will not work with 2.1. And an older Atomic wheel that depended on 2.1 will not work with the 3.0 version. This is a bit weird since the Atomic code didn't change at all, but one version of the wheel requires AtomicLibrary 2.1 and another version requires 3.0. This is like the dependency on Python 3.7 or 3.8 we just discussed. It is not dependency on the function API, instead it is a statement about how the Atomic wheel was built. Atomic could be compiled against Python 3.7 or 3.8 and against AtomicLibrary 2.1 or 3.0, giving different Atomic wheels from the exact same Atomic code. The Python version issue is handle by putting that version in the wheel name.

I guess when AtomicLibrary goes from 2.1 to 3.0, then the Atomic dependency on AtomicLibrary should change from ~=2.0 to ~=3.0, and the Atomic bundle gets a patch version number bump 1.6.1 to 1.6.2. The older Atomic 1.6.1 wheel only works with AtomicLibrary 2 and the newer Atomic 1.6.2 only works with AtomicLibrary 3.

Eric:

I think ChimeraX-AtomicLibrary is a better bundle name than ChimeraX-C++Atomic.

I agree. Mark that one off.

I'm assuming the Python half is pure Python, i.e. the wrapping also occurs in the C++ half, so the Python side is just using a regular "import" statement to access the wrapped types and functions and making some of them available directly or indirectly in its own public API. The Python half never links against the C++ library.

So for instance in the atomic module, everything in currently in atomic/src goes into the "Python half" and everything in atomic_cpp goes in the "C++ half", which therefore includes cymol.so, libmolc.dylib, _ribbons.so, etc.

Tom:
Ok, I was envisioning AtomicLibrary only contained libatomstruct.dylib and that the derived C++ (cymol.so, libmolc.dylib, _ribbon.so) would stay in Atomic.

So substitute in my previous email example ISOLDE and AtomicLibrary instead of Atomic and AtomicLibrary. ISOLDE will link against libatomstruct.dylib. The purpose of separating the library into its own bundle is to handle that situation where one bundle does link against the C++ in another bundle. So ISOLDE 1.6.1 might link against AtomicLibrary 2.1 and ISOLDE 1.6.2 might link against AtomicLibrary 3, and those ISOLDE wheels would have to have dependencies only on AtomicLibrary 2 in the first case and only on AtomicLibrary 3 in the second case.

Eric:
Continuing to use Atomic as an example, there is no issue with non-Atomic bundles linking against AtomicLibrary, regardless of whether the wrapping occurs in Atomic or AtomicLibrary. Non-Atomic bundles will always depend on AtomicLibrary and will never be compatible across major version number changes. If the wrapping occurs in the Library side, then Atomic can be compatible across some major version number changes. So the question is which side should the wrapping occur on? Here's how I see it:

Wrapping on Library side

This make Atomic pure Python, and possibly installable into a greater number of versions of ChimeraX, since it's compatible with more versions of AtomicLibrary. However, declaring the dependency compatibility across major version numbers seems to be a conundrum.

Wrapping on non-Library side

Atomic becomes link-dependent on AtomicLibrary and therefore not as widely compatible with various versions of ChimeraX. The dependency declaration is straightforward.

My considered opinion is that we go with wrapping on the non-library side, since the dependency declaration really seems to be impossible, and the increased compatibility, while nice, is not a big enough win to justify trying to solve the dependency issue. People will just have to upgrade more bundles or get newer builds.

Tom:
Sounds reasonable.

comment:12 by pett, 5 years ago

Tristan,

Do you make use of the file ctypes_pyinst.h? It is used in wrapping, and so wouldn't normally be put in AtomicLibrary, but if it doesn't live there then Atomic would need to provide an "include" directory...

--Eric

in reply to:  13 ; comment:13 by Tristan Croll, 5 years ago

Nope - I don't directly #include that anywhere.
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 18 September 2020 17:59
Cc: chimera-programmers@cgl.ucsf.edu <chimera-programmers@cgl.ucsf.edu>; pett@cgl.ucsf.edu <pett@cgl.ucsf.edu>; scooter@cgl.ucsf.edu <scooter@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #3354: Test update of bundles with binary (e.g. C++) components

#3354: Test update of bundles with binary (e.g. C++) components
-------------------------------------+----------------------------
          Reporter:  Scooter Morris  |      Owner:  Eric Pettersen
              Type:  defect          |     Status:  assigned
          Priority:  critical        |  Milestone:  1.2
         Component:  Tool Shed       |    Version:
        Resolution:                  |   Keywords:
        Blocked By:                  |   Blocking:
Notify when closed:                  |   Platform:  all
           Project:  ChimeraX        |
-------------------------------------+----------------------------

Comment (by Eric Pettersen):

 Tristan,
         Do you make use of the file ctypes_pyinst.h?  It is used in
 wrapping, and so wouldn't normally be put in AtomicLibrary, but if it
 doesn't live there then Atomic would need to provide an "include"
 directory...

 --Eric

--
Ticket URL: <https://plato.cgl.ucsf.edu/trac/ChimeraX/ticket/3354#comment:12>
ChimeraX <http://www.rbvi.ucsf.edu/chimerax/>
ChimeraX Issue Tracker

comment:14 by pett, 5 years ago

Okay, I've split Atomic into Atomic and AtomicLibrary. Buckle your seatbelts!

Still have as yet to do atom_search and pdb.

comment:15 by pett, 5 years ago

Okay, also split off atom_search_lib from atom_search. Only pdb to go.

comment:16 by pett, 5 years ago

Resolution: fixed
Status: assignedclosed

Okay, split pdb into pdb_lib and pdb. That's all of them.

Note: See TracTickets for help on using tickets.