Opened 8 months ago

Last modified 7 months ago

#16848 assigned enhancement

Update from numpy 1 to numpy 2.

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone:
Component: Core
Version:
Keywords:
Cc: chimerax-programmers
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

This ticket is to track whatever we find about incompatibilities between numpy 1 and numpy 2.

Fortunately, I think all of the C++-level numpy API use is in one place, the "arrays" bundle. We should still check whether other parts of the C++ code use the numpy API directly. Then there is the separate issue of breaking numpy changes at the Python level. I remember reading the long numpy 2 changes document and being slightly scared. For instance, numpy 2 changed the default type promotion rules (NEP 50) in subtle ways: numpy.float32(x) + 1.0 is a float64 in numpy 1 but a float32 in numpy 2, and a float32 array combined with a numpy float64 scalar is now promoted to float64 instead of staying float32. Some of our C++ code only handles float32 or only float64, so this can lead to errors.
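A small illustration of the promotion change, written just for this ticket (run it under numpy 1.x and then 2.x to see the difference; the variable names are only for illustration):

import numpy as np

a = np.zeros(3, dtype=np.float32)

# Python float with a float32 array: float32 under both numpy 1 and numpy 2,
# since Python scalars are treated as "weak" in both promotion schemes.
print((a + 1.0).dtype)                 # float32

# numpy float64 scalar with a float32 array: float32 under numpy 1
# (value-based casting), but promoted to float64 under numpy 2 (NEP 50).
print((a + np.float64(1.0)).dtype)

# numpy float32 scalar with a Python float: float64 under numpy 1,
# float32 under numpy 2.
print((np.float32(3) + 1.0).dtype)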

A very basic issue with numpy 1 vs numpy 2 that I wish I understood: how can you expect every package you use that depends on numpy to agree on a single major version (1 or 2)? Many PyPI packages pin the major version of numpy. With an app like ChimeraX that imports dozens of numpy-dependent PyPI packages, plus others pulled in by extensions (e.g. pyKVFinder), how can you possibly avoid one package requiring numpy 1 and another requiring numpy 2? Since there have been 20 years of Python modules relying on numpy 1, it is hard to imagine any but the simplest numpy applications being able to use numpy 2, because all those legacy numpy 1 packages won't be updated. I must be missing something.
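One way to get a picture of the pinning problem in a particular environment is to list which installed distributions declare a numpy requirement and what constraint they place on it. A minimal sketch using only the standard library (it naively matches any requirement string that starts with "numpy"):

from importlib.metadata import distributions

# Print every installed distribution that declares a dependency on numpy,
# along with the version constraint it places on it.
for dist in distributions():
    for req in (dist.requires or []):
        requirement = req.split(";")[0].strip()
        if requirement.lower().startswith("numpy"):
            print(dist.metadata["Name"], "->", requirement)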

Attachments (1)

npy201.txt (41.6 KB ) - added by Zach Pearson 8 months ago.
Output of 'ruff check src --select NPY201' as of d4581f0b1


Change History (19)

comment:1 by Tom Goddard, 8 months ago

Before putting numpy 2 in a tech preview, some careful study should be done of the changes from numpy 1 to numpy 2, since our C++ code heavily uses the numpy 1 C API. In December we put out a build (I think just one) that had numpy 2 and got bug reports such as #16520, where ribbons did not work correctly. Almost every ChimeraX C++ module I have written to optimize anything (e.g. ribbons) uses the numpy 1 C API, so we need to understand the numpy 2 API changes and compare them against our numpy C++ use.

comment:2 by Tom Goddard, 8 months ago

Ticket #16558 involves an error installing PyKVFinder from PyPI when ChimeraX has numpy 2 installed, because PyKVFinder uses the nptyping package, which is incompatible with numpy 2.

Last edited 8 months ago by Tom Goddard

by Zach Pearson, 8 months ago

Attachment: npy201.txt added

Output of 'ruff check src --select NPY201' as of d4581f0b1

comment:4 by Zach Pearson, 8 months ago

When you whittle away all of the spurious syntax-error warnings in various files, what you're left with is this diff, at least for fixing what NPY201 picks up:

diff --git a/src/bundles/atomic/src/structure.py b/src/bundles/atomic/src/structure.py
index dc0d8f572..097356ac0 100644
--- a/src/bundles/atomic/src/structure.py
+++ b/src/bundles/atomic/src/structure.py
@@ -860,9 +860,9 @@ class Structure(Model, StructureData):
         else:
             r = atoms.residues
             rids = r.unique_ids
-            from numpy import unique, in1d
+            from numpy import unique, isin
             sel_rids = unique(rids[asel])
-            ares = in1d(rids, sel_rids)
+            ares = isin(rids, sel_rids)
             if ares.sum() > na:
                 # Promote to entire residues
                 level = 1005
@@ -870,7 +870,7 @@ class Structure(Model, StructureData):
             else:
                 ssids = r.secondary_structure_ids
                 sel_ssids = unique(ssids[asel])
-                ass = in1d(ssids, sel_ssids)
+                ass = isin(ssids, sel_ssids)
                 if ass.sum() > na:
                     # Promote to secondary structure
                     level = 1004
@@ -884,7 +884,7 @@ class Structure(Model, StructureData):
                         from numpy import array
                         cids = array(r.chain_ids)
                         sel_cids = unique(cids[asel])
-                        ac = in1d(cids, sel_cids)
+                        ac = isin(cids, sel_cids)
                         if ac.sum() > na:
                             # Promote to entire chains
                             level = 1002
@@ -1033,9 +1033,9 @@ class Structure(Model, StructureData):
         elif target_type == '/':
             # There is no "full_chain" property for atoms so we have
             # to do it the hard way
-            from numpy import in1d, invert
+            from numpy import isin, invert
             matched_chain_ids = atoms.filter(a).unique_chain_ids
-            mask = in1d(atoms.chain_ids, matched_chain_ids)
+            mask = isin(atoms.chain_ids, matched_chain_ids)
             if '<' in operator:
                 expand_by = atoms.filter(mask)
             else:
diff --git a/src/bundles/map_data/src/arrays.py b/src/bundles/map_data/src/arrays.py
index 218f9bc2a..57dbc97b7 100644
--- a/src/bundles/map_data/src/arrays.py
+++ b/src/bundles/map_data/src/arrays.py
@@ -245,7 +245,7 @@ def grid_indices(size, data_type):
   # the array contiguous.
   #
   shape = (size[2], size[1], size[0], 3)
-  from numpy import zeros, product, reshape
+  from numpy import zeros, prod, reshape
   indices = zeros(shape, data_type)
   for i in range(size[0]):
     indices[:,:,i,0] = i
@@ -253,7 +253,7 @@ def grid_indices(size, data_type):
     indices[:,j,:,1] = j
   for k in range(size[2]):
     indices[k,:,:,2] = k
-  volume = product(size)
+  volume = prod(size)
   indices = reshape(indices, (volume, 3))
   return indices

diff --git a/src/bundles/map_data/src/readarray.py b/src/bundles/map_data/src/readarray.py
index 9f980a0d1..82f52b9e0 100644
--- a/src/bundles/map_data/src/readarray.py
+++ b/src/bundles/map_data/src/readarray.py
@@ -227,10 +227,10 @@ def allocate_array(size, value_type = float32, step = None, progress = None,
 #
 def report_memory_error(size, value_type, progress):

-    from numpy import dtype, product, float64
+    from numpy import dtype, prod, float64
     vtype = dtype(value_type)
     tsize = vtype.itemsize
-    bytes = product(size, dtype=float64)*float(tsize)
+    bytes = prod(size, dtype=float64)*float(tsize)
     mbytes = bytes / 2**20
     sz = ','.join(['%d' % s for s in size])
     e = ('Could not allocate %.0f Mbyte array of size %s and type %s.\n'
diff --git a/src/bundles/std_commands/src/align.py b/src/bundles/std_commands/src/align.py
index 784ab4e2f..04afcafe0 100644
--- a/src/bundles/std_commands/src/align.py
+++ b/src/bundles/std_commands/src/align.py
@@ -306,10 +306,10 @@ def move_atoms(atoms, to_atoms, tf, move):
 def extend_to_chains(atoms):
     '''Return atoms extended to all atoms in same chains (ie same chain id / structure).'''
     catoms = []
-    from numpy import in1d
+    from numpy import isin
     for s, a in atoms.by_structure:
         satoms = s.atoms
-        catoms.append(satoms.filter(in1d(satoms.chain_ids, a.unique_chain_ids)))
+        catoms.append(satoms.filter(isin(satoms.chain_ids, a.unique_chain_ids)))
     from chimerax.atomic import concatenate, Atoms
     return concatenate(catoms, Atoms)

diff --git a/src/bundles/std_commands/src/sym.py b/src/bundles/std_commands/src/sym.py
index e92938c39..164327ce9 100644
--- a/src/bundles/std_commands/src/sym.py
+++ b/src/bundles/std_commands/src/sym.py
@@ -443,8 +443,8 @@ class Assembly:

     def _partition_atoms(self, atoms, chain_ids):
         mmcif_cids = mmcif_chain_ids(atoms, self.from_mmcif)
-        from numpy import in1d, logical_not
-        mask = in1d(mmcif_cids, chain_ids)
+        from numpy import isin, logical_not
+        mask = isin(mmcif_cids, chain_ids)
         included_atoms = atoms.filter(mask)
         logical_not(mask,mask)
         excluded_atoms = atoms.filter(mask)
Last edited 8 months ago by Zach Pearson

comment:5 by Zach Pearson, 8 months ago

FYI, I'm working on getting our workflows ready for a Python 3.13 tech preview, but I don't intend to release it yet since Python 3.13 forces numpy 2. I just want to get the workflows working with more than one kind of build before the next release window in June.

comment:7 by Tom Goddard, 7 months ago

Adding long long numpy support to the ChimeraX C++ numpy handling is unlikely to solve ChimeraX's numpy 2 problems, and I'd suggest not merging that pull request. None of the ChimeraX C++ code will work with long long. Our many C++ routines that take numpy integer arrays will likely give an error saying they don't handle long long, or will cast to long, which is very inefficient for speed-critical calculations like ribbon geometry, or they might just crash if the needed checking is not present.

comment:8 by Zach Pearson, 7 months ago

I think the issue is that on other platforms long and long long are the same size, but on Windows long is 32-bit and long long is 64-bit. Since NumPy 2.0's release notes say the default integer size on Windows is now int64 (long long) "like other platforms", I guess the default there used to be 32-bit (long).

parse_int_nm has this code:

if (v.value_type() == Numeric_Array::Long_Int && allow_copy)
{
    IArray vi = IArray(v.dimension(), v.sizes());
    vi.set(Reference_Counted_Array::Array<long int>(v));
    v = Numeric_Array(Numeric_Array::Int, vi);
}

which to me just looks like it's saying you can treat a long array as a normal integer array. What would the difference be with long longs?

comment:9 by Tom Goddard, 7 months ago

The ChimeraX C++ code uses numpy int32 for almost all integer arrays for memory efficiency, since the arrays can be large and their memory use is the limiting factor for the size of models. The ribbon code is almost certainly not intentionally using any int64 arrays, so adding long long 64-bit integer support is probably not the solution to the ribbon error seen with numpy 2. More likely, numpy 2 casting rules produced an int64 array, and the code that needs fixing is the place in Python where an int32 array got unintentionally upcast to int64.

The point of not adding long long to the C++ numpy code is to avoid creating a maintenance burden for a feature that is not used.
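A minimal illustration of that kind of unintentional upcast (not the actual ribbon code; the array names are made up):

import numpy as np

tri = np.zeros((4, 3), dtype=np.int32)   # e.g. a triangle index array

# An array built without an explicit dtype gets the platform default integer,
# which under numpy 2 is int64 on all platforms, including Windows.
offsets = np.array([0, 1, 2, 3])
print(offsets.dtype)                     # int64

# Combining the two silently promotes the int32 array to int64, which a C++
# routine expecting int32 will then reject or have to copy.
print((tri + offsets[:, None]).dtype)    # int64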

comment:10 by Tom Goddard, 7 months ago

Regarding the code you show in comment 8, that is copying an int64 array to an int32 array. Mostly our code does not do that; instead it generates an error saying an int32 array is required, so that we can maintain good performance without accidentally copying arrays. I'd have to look into that example to see why we made an exception for it. The exceptions tend to be related to supporting volume data with more than 2 billion voxels.

comment:11 by Zach Pearson, 7 months ago

I'm not saying it's intentionally using it. I'm saying that on macOS, if you compile and run this:

#include <iostream>

int main(void) {
  std::cout << sizeof(long) << std::endl;       // 8 on macOS
  std::cout << sizeof(long long) << std::endl;  // 8 on macOS
  return 0;
}

and then check the default numpy integer dtype:

>>> import numpy
>>> numpy.array([1]).dtype
dtype('int64')

you get the same answer for numpy 1 or 2. Any call to array that hasn't specified the data type has always been 64-bit, except on Windows!

We can certainly fix this by specifying dtype=np.int32 for all calls to array in Python. But no one's complained about memory usage in recent memory!

comment:12 by Tom Goddard, 7 months ago

Very little code in ChimeraX should be using int64 arrays. Despite long being 32-bit on Windows and 64-bit on Mac/Linux, ChimeraX tries to use the same integer size on all platforms, for instance by specifying "int32". If the ribbon code is using 32-bit on Windows and 64-bit on Mac/Linux for something, that is likely a bug that should be fixed.

Memory use is important to the speed and maximum data size of some ChimeraX operations, especially when operating on large volume data.

comment:13 by Zach Pearson, 7 months ago

From that perspective, does the C++ as written have a bug? If long int is 64-bit on macOS, shouldn't this line:

vi.set(Reference_Counted_Array::Array<long int>(v));

be

vi.set(Reference_Counted_Array::Array<int>(v));

if the intent is to copy the 64-bit v into the (we hope) 32-bit vi?

comment:14 by Zach Pearson, 7 months ago

All uses of Long_Int:

bundles git:(develop) Ⲗ rg Long_Int
segment/_segment/pysegment.cpp
38:       a.value_type() == Numeric_Array::Unsigned_Long_Int))

arrays/src/include/arrays/rcarray.h
191:            Long_Int, Unsigned_Long_Int, Float, Double };
230:    case RCANA::Long_Int: f<long> args; break; \
231:    case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \

arrays/_arrays/pythonarray.cpp
110:      dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Int : Numeric_Array::Long_Int); break;
112:      dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Unsigned_Int : Numeric_Array::Unsigned_Long_Int);   break;
1312:  if (v.value_type() == Numeric_Array::Long_Int && allow_copy)

arrays/_arrays/rcarray.cpp
323:    case Long_Int:      return sizeof(long int);
324:    case Unsigned_Long_Int: return sizeof(unsigned long int);
466:    case Long_Int: return "long int";
467:    case Unsigned_Long_Int: return "unsigned long int";

arrays/_arrays/rcarray.h
191:            Long_Int, Unsigned_Long_Int, Float, Double };
230:    case RCANA::Long_Int: f<long> args; break; \
231:    case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \

comment:15 by Zach Pearson, 7 months ago

 bundles git:(develop) Ⲗ rg -w long -tc -tcpp -th > long.txt
 bundles git:(develop) Ⲗ cat long.txt
mlp/_mlp/mlp.cpp:  long xs0 = xyz.stride(0), xs1 = xyz.stride(1);
mlp/_mlp/mlp.cpp:  long ps0 = pot.stride(0), ps1 = pot.stride(1), ps2 = pot.stride(2);
mlp/_mlp/mlp.cpp:  long fs0 = fi.stride(0);
profile_grids/_pg/pg.cpp:       std::vector<long> longs;
chem_group/_cg/cg.cpp:initiate_find_group(CG_Condition* group_rep, std::vector<long>* group_principals,
chem_group/_cg/cg.cpp:                  std::map<long, const Atom*> ring_atom_map;
chem_group/_cg/cg.cpp:  std::vector<long>  group_principals;
alignment_headers/al2co/al2co.c:char *cvector(long nl, long nh);
alignment_headers/al2co/al2co.c:int *ivector(long nl, long nh);
alignment_headers/al2co/al2co.c:double *dvector(long nl, long nh);
alignment_headers/al2co/al2co.c:int **imatrix(long nrl, long nrh, long ncl, long nch);
alignment_headers/al2co/al2co.c:double **dmatrix(long nrl, long nrh, long ncl, long nch);
alignment_headers/al2co/al2co.c:double ***d3tensor(long nrl,long nrh,long ncl,long nch,long ndl,long ndh);
alignment_headers/al2co/al2co.c:int  **identity_imat(long n);
alignment_headers/al2co/al2co.c:else if (flag==2){flag=3;stri[++j]=c;if(j>10){nrerror("string too long:FATAL");exit(0);}continue;}
alignment_headers/al2co/al2co.c:int    **identity_imat(long n){
alignment_headers/al2co/al2co.c:char *cvector(long nl, long nh){
alignment_headers/al2co/al2co.c:int *ivector(long nl, long nh){
alignment_headers/al2co/al2co.c:long *lvector(long nl, long nh){
alignment_headers/al2co/al2co.c:long int *v;
alignment_headers/al2co/al2co.c:v=(long int *)malloc((size_t) ((nh-nl+1+NR_END)*sizeof(long int)));
alignment_headers/al2co/al2co.c:double *dvector(long nl, long nh){
alignment_headers/al2co/al2co.c:int **imatrix(long nrl, long nrh, long ncl, long nch){
alignment_headers/al2co/al2co.c:long i, nrow=nrh-nrl+1,ncol=nch-ncl+1;
alignment_headers/al2co/al2co.c:double **dmatrix(long nrl, long nrh, long ncl, long nch){
alignment_headers/al2co/al2co.c:long i, nrow=nrh-nrl+1,ncol=nch-ncl+1;
alignment_headers/al2co/al2co.c:double ***d3tensor(long nrl,long nrh,long ncl,long nch,long ndl,long ndh){
alignment_headers/al2co/al2co.c:long i,j,nrow=nrh-nrl+1,ncol=nch-ncl+1,ndep=ndh-ndl+1;
mmcif/mmcif_cpp/readcif_cpp/readcif.cpp:        long page_size = sysconf(_SC_PAGESIZE);
mmcif/mmcif_cpp/readcif_cpp/readcif.cpp:                // TODO: if ";\" then fold long lines
mmcif/mmcif_cpp/readcif_cpp/readcif.h:    long long iv = 0;
core/core_cpp/pysupport/convert.h:    auto pyobj = PyLong_FromLong(static_cast<long>(integer));
core/core_cpp/pysupport/convert.h:inline long pyint_to_clong(PyObject* pyint, const char* item_description) {
mmcif/mmcif_cpp/mmcif.cpp:    std::map <long, std::pair<Atom*, char>> atom_lookup;
mmcif/mmcif_cpp/mmcif.cpp:        long seq_id;
mmcif/mmcif_cpp/mmcif.cpp:        ResidueKey(const string& e, long n, const ResName& m):
mmcif/mmcif_cpp/mmcif.cpp:                ^ hash<long>()(k.seq_id)
mmcif/mmcif_cpp/mmcif.cpp:        long seq_id;
mmcif/mmcif_cpp/mmcif.cpp:        PolySeq(long s, const ResName& m, bool h):
mmcif/mmcif_cpp/mmcif.cpp:    Residue* find_residue(int model_num, const ChainID& chain_id, long position, const ResName& name);
mmcif/mmcif_cpp/mmcif.cpp:ExtractMolecule::find_residue(int model_num, const ChainID& chain_id, long position, const ResName& name) {
mmcif/mmcif_cpp/mmcif.cpp:    long position;                // label_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long auth_position = INT_MAX; // auth_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long serial_num = 0;          // id
mmcif/mmcif_cpp/mmcif.cpp:    long user_position;           // auth_position if given else position
mmcif/mmcif_cpp/mmcif.cpp:    long atom_serial = 0;
mmcif/mmcif_cpp/mmcif.cpp:    long serial_num = 0;          // id
mmcif/mmcif_cpp/mmcif.cpp:    long position1, position2;              // ptnr[12]_label_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long auth_position1 = INT_MAX,          // ptnr1_auth_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long position1, position2;              // (beg|end)_label_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long position1, position2;              // (beg|end)_label_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long position1, position2;              // range_[12]_label_seq_id
mmcif/mmcif_cpp/mmcif.cpp:    long seq_id = 0;
mmcif/mmcif_cpp/readcif_cpp/counts.cpp:    unsigned long num_rows = 0;
core/core_cpp/mac_util/enablemultitouch_mac.h:extern "C" int enable_mac_multitouch(long window_id);
morph/_morph/morph.cpp:  long n = i.size();
morph/_morph/morph.cpp:  for (long j = 0 ; j < n ; j += 4)
morph/_morph/morph.cpp:  long n = i.size();
morph/_morph/morph.cpp:  for (long j = 0 ; j < n ; ++j)
morph/_morph/morph.cpp:  long n = i.size();
morph/_morph/morph.cpp:  for (long j = 0 ; j < n ; ++j)
core/core_cpp/mac_util/enablemultitouch.cpp:  long win_id;
core/core_cpp/chutil/CString.h:        error_msg << "\" too long, maximum ";
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:struct option              /* specification for a long form option...      */
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * ...for the long form API only; keep this for compatibility.
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:{ /* When attempting to match a command line argument to a long form option,
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   * messages, for invalid arguments to long form options ...
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:       * only display one hyphen, for implicit long form options,
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   * long form options.
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   * on resolution of a long form option.
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   * ambiguously matched long form option string is valid as
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:     /* and it looks like a long format option ...
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:      * it to force explicit handling as a long format option.
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   /* it's not an explicit long option, and `getopt_long_only' isn't active,
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   /* the current argument is a long form option, (either explicitly,
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:     /* scan the list of defined long form options ...
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:            * partial long option isn't valid for short option
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:   /* if here, then we had what SHOULD have been a long form option,
mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:       * as long, nor implicitly matched as such, and the argument isn't
segment/_segment/segsurf.cpp:  // long t = (long)std::chrono::duration_cast<std::chrono::nanoseconds>(end-begin).count();
atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.declare.h:    long  get_py_int_attr(const char* attr_name, bool create=false) const;
atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.declare.h:    long  get_py_int_attr(std::string& attr_name, bool create=false) const;
segment/_segment/pysegment.cpp:      (sizeof(unsigned long) == sizeof(unsigned int) &&
atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.instantiate.h:long PythonInstance<C>::get_py_int_attr(std::string& attr_name, bool create) const
atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.instantiate.h:long
atomic_lib/atomic_cpp/element_cpp/Element.h:    long  hash() const { return number(); }
atomic_lib/atomic_cpp/atomstruct_cpp/seq_assoc.cpp:            // long "covalent" bonds across gaps, some of the
atomic_lib/atomic_cpp/atomstruct_cpp/seq_assoc.cpp:                        //   between a normal-numbered segment and a long
atomic/atomic_cpp/molc.cpp:        PyTuple_SET_ITEM(tuple, 1, PyLong_FromLong(static_cast<long>(arv.num_errors)));
atomic/atomic_cpp/molc.cpp:            PyObject *ptype = PyLong_FromLong((long)pt);
atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp:long
atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp:    long value = 0;
atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp:        long v = reinterpret_cast<long>(b);
atomic_lib/atomic_cpp/atomstruct_cpp/ChangeTracker.h:    long  num_deleted = 0;
atomic_lib/atomic_cpp/atomstruct_cpp/AtomTypes.cpp:            // takes too long to do a massive fused-ring system;
surface/_surface/triangulate.cpp:// technique that generates long skinny triangles.
surface/_surface/triangulate.cpp:  int k = static_cast<int>(reinterpret_cast<long>(vertex_data));
atomic_lib/atomic_cpp/atomstruct_cpp/Structure.cpp:                    long ci = ii + i2c_offset;
atomic_lib/atomic_cpp/atomstruct_cpp/Structure.cpp:                    if (ci >= (long)complete_size) {
atomic_lib/atomic_cpp/atomstruct_cpp/AtomicStructure.cpp:            // the estimated length can be too long...
atomic_lib/atomic_cpp/atomstruct_cpp/AtomicStructure.cpp:            // a _lot_ of errors can make it take a very long time to find the
atomic_lib/atomic_cpp/atomstruct_cpp/Ring.h:    long  hash() const;
surface/_surface/refinemesh.cpp:// edge.  The algorithm divides long edges, collapses short edges, and swaps
surface/_surface/refinemesh.cpp:  //  std::cerr << "split long edges\n";
pdb/pdbio_cpp/PDBio.cpp:    std::map<const Residue*, long> res_order;
pdb/pdbio_cpp/PDBio.cpp:    long i = 0;
alphafold/src/kmer_search/make_kmer_index.c:long num_kmers(int k);
alphafold/src/kmer_search/make_kmer_index.c:long kmer_counts(FILE *seqs_file, int k, int unique, long kmin, long kmax, unsigned int *counts);
alphafold/src/kmer_search/make_kmer_index.c:unsigned short *sequence_entry_sizes(FILE *seqs_file, long nseq);
alphafold/src/kmer_search/make_kmer_index.c:                       int unique, long kmin, long kmax, unsigned int *seqi, long seqi_len);
alphafold/src/kmer_search/make_kmer_index.c:void create_index_files(const char *sequences_path, int k, int unique, long max_memory)
alphafold/src/kmer_search/make_kmer_index.c:  long nk = num_kmers(k);
alphafold/src/kmer_search/make_kmer_index.c:  long kmin = 0, kmax = nk;
alphafold/src/kmer_search/make_kmer_index.c:  long num_sequences = kmer_counts(seqs_file, k, unique, kmin, kmax, counts);
alphafold/src/kmer_search/make_kmer_index.c:      long ni = 0;
alphafold/src/kmer_search/make_kmer_index.c:      for (long i = 0 ; i < nk  ; ++i)
alphafold/src/kmer_search/make_kmer_index.c:  long max_seqi = max_memory / sizeof(unsigned int);
alphafold/src/kmer_search/make_kmer_index.c:      long ni = 0;
alphafold/src/kmer_search/make_kmer_index.c:unsigned short *sequence_entry_sizes(FILE *seqs_file, long nseq)
alphafold/src/kmer_search/make_kmer_index.c:  for (long i = 0 ; i < nseq ; ++i)
alphafold/src/kmer_search/make_kmer_index.c:long kmer_counts(FILE *seqs_file, int k, int unique, long kmin, long kmax, unsigned int *counts)
alphafold/src/kmer_search/make_kmer_index.c:  long nseq = 0;
alphafold/src/kmer_search/make_kmer_index.c:                       int unique, long kmin, long kmax, unsigned int *seqi, long seqi_len)
alphafold/src/kmer_search/make_kmer_index.c:  long noff = kmax - kmin;
alphafold/src/kmer_search/make_kmer_index.c:  unsigned long *offsets = (unsigned long *) malloc(noff * sizeof(unsigned long));
alphafold/src/kmer_search/make_kmer_index.c:  unsigned long csum = 0;
alphafold/src/kmer_search/make_kmer_index.c:  for (long i = 0 ; i < noff ; ++i)
alphafold/src/kmer_search/make_kmer_index.c:  for (long i = 0 ; i < noff ; ++i)
alphafold/src/kmer_search/make_kmer_index.c:  long nseq = 0;
alphafold/src/kmer_search/make_kmer_index.c:      unsigned long o = offsets[kii] + counts[kii];
alphafold/src/kmer_search/make_kmer_index.c:long num_kmers(int k)
alphafold/src/kmer_search/make_kmer_index.c:  long nk = 1;
alphafold/src/kmer_search/make_kmer_index.c:      long nk = num_kmers(k);
alphafold/src/kmer_search/make_kmer_index.c:      long nk = num_kmers(k);
arrays/_arrays/rcarray.h:    case RCANA::Long_Int: f<long> args; break; \
arrays/_arrays/rcarray.h:    case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \
arrays/_arrays/rcarray.cpp:    case Long_Int:      return sizeof(long int);
arrays/_arrays/rcarray.cpp:    case Unsigned_Long_Int: return sizeof(unsigned long int);
arrays/_arrays/rcarray.cpp:    case Long_Int: return "long int";
arrays/_arrays/rcarray.cpp:    case Unsigned_Long_Int: return "unsigned long int";
arrays/_arrays/pythonarray.cpp:      // Make 32-bit integers always be int instead of long on 32-bit machines.
arrays/_arrays/pythonarray.cpp:      dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Int : Numeric_Array::Long_Int); break;
arrays/_arrays/pythonarray.cpp:      dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Unsigned_Int : Numeric_Array::Unsigned_Long_Int);   break;
arrays/_arrays/pythonarray.cpp:    case NPY_LONG: name = "long"; break;
arrays/_arrays/pythonarray.cpp:// Returns an array of long which will be 32-bit on Windows,
arrays/_arrays/pythonarray.cpp:    long *py_data = (long *)PyArray_DATA((PyArrayObject *)a);
arrays/_arrays/pythonarray.cpp:      vi.set(Reference_Counted_Array::Array<long int>(v));
arrays/_arrays/pythonarray.cpp:      PyErr_Format(PyExc_TypeError, "array type must be int or long int, got %s",
arrays/_arrays/pythonarray.h:// single void * value from Python long
pdb_lib/connect_cpp/connect.cpp:static std::map<Element::AS, unsigned long>  _saturationMap = {
pdb_lib/connect_cpp/connect.cpp:        // turn long inter-residue bonds into "missing structure" pseudobonds

comment:16 by Zach Pearson, 7 months ago

I wrote a test program to confirm the sizes of various data types on my Mac:

#include <iostream>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <numpy/arrayobject.h>

int main(int argc, char *argv[]) {
  std::cout << "int: " << sizeof(int) << std::endl;
  std::cout << "long: " << sizeof(long) << std::endl;
  std::cout << "long long: " << sizeof(long long) << std::endl;
  std::cout << "npy_int: " << sizeof(npy_int) << std::endl;
  std::cout << "npy_long: " << sizeof(npy_long) << std::endl;
  std::cout << "npy_longlong: " << sizeof(npy_longlong) << std::endl;
  std::cout << "npy_int64: " << sizeof(npy_int64) << std::endl;
  return 0;
}

Which I then compiled: g++ -IChimeraX.app/Contents/lib/python3.11/site-packages/numpy/core/include -IChimeraX.app/Contents/include/python3.11 test.cpp

>> ./a.out
int: 4
long: 8
long long: 8
npy_int: 4
npy_long: 8
npy_longlong: 8
npy_int64: 8

The same on Windows gives:

>> ./a.out
int: 4
long: 4
long long: 8
npy_int: 4
npy_long: 4
npy_longlong: 8
npy_int64: 8
Last edited 7 months ago by Zach Pearson

comment:17 by Zach Pearson, 7 months ago

I was able to fix the bounding box traceback without the arrays patch by setting the data type of the vertex and triangle arrays to int32. But I still got a traceback in the same place because the edge mask was left as a bare Python list. I also noticed a traceback for orthoplanes, but not for tilted slab. Tilted slab relies on code from the surface bundle that hands it back correctly typed arrays, but the orthoplanes and bounding box outlines don't -- they hand back bare lists. I plan to make the other two methods return correctly typed numpy arrays tomorrow, but I'm finally at a point where I can put this problem down for the night and not have it gnaw at me (I nerd-sniped myself).
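For reference, a minimal sketch of that kind of fix (hypothetical function and values, not the actual outline code):

import numpy as np

def outline_geometry():
    # Hypothetical geometry built as plain Python lists ...
    vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    triangles = [[0, 1, 2]]
    edge_mask = [0x7]

    # ... returned as explicitly typed numpy arrays, so the C++ layer sees
    # float32 coordinates and int32 indices on every platform instead of a
    # platform-dependent default integer type.
    return (np.array(vertices, dtype=np.float32),
            np.array(triangles, dtype=np.int32),
            np.array(edge_mask, dtype=np.uint8))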

comment:18 by Zach Pearson, 7 months ago

I tracked the ribbon error down to display_ranges. It's a bare Python list that gets passed to some C++ code in ribbon_cpp, which then passes it to the array parsing code in arrays/pythonarray.cpp (parse_int_n2_array is the specific call, on line 1154 of xsection.cpp). parse_int_n2_array calls parse_int_nm_array, which calls array_from_python, which calls PyArray_TYPE(a) on the Python list. It must in effect ask: what data type would this Python list have if converted to a numpy array? Of course, the new default data type is NPY_LONGLONG on Windows, so it crashes out because we don't recognize it.

We could add

from numpy import array, int32
ranges = array(ranges, dtype=int32)

to _ribbon_geometry in ribbon.py, or make display_ranges a numpy array in the first place instead of a Python list.
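A sketch of that second option (hypothetical helper name; the real display_ranges handling lives in ribbon.py):

import numpy as np

# Keep the residue display ranges as an n x 2 int32 array from the start, so
# every code path that hands them to the C++ ribbon routines sees a consistent
# dtype regardless of platform or numpy version.
def ranges_array(range_pairs):
    return np.array(range_pairs, dtype=np.int32).reshape(-1, 2)

display_ranges = ranges_array([(0, 10), (12, 20)])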
