Opened 8 months ago
Last modified 7 months ago
#16848 assigned enhancement
Update from numpy 1 to numpy 2.
Reported by: | Tom Goddard | Owned by: | Tom Goddard |
---|---|---|---|
Priority: | moderate | Milestone: | |
Component: | Core | Version: | |
Keywords: | Cc: | chimerax-programmers | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
This ticket is to track whatever we find about incompatibilities between numpy 1 and numpy 2.
Fortunately all the C++ level numpy API use is I think in one place in the "arrays" bundle. Actually we should check if other parts of the C++ code directly use the numpy API. Then there is the separate issue of breaking numpy changes at the Python level. I remember reading the long numpy 2 changes document and being slightly scared. For instance numpy 2 changed the default casting of arrays in subtle ways like (numpy-float32-array) + 1.0 is a float64 array in numpy 1 but is a float32 array in numpy 2. Some of our C++ code only handles float32 or only float64 so this can lead to errors.
A very basic issue with numpy 1 vs numpy 2 that I wish I understood is how can you expect every single package you use that depends on numpy to use a single version (1 or 2)? Many PyPi packages pin the major version of numpy. So with an app like ChimeraX that imports dozens of numpy dependent PyPi packages and others that are pulled in by extensions (e.g. pyKVFinder) how can you possibly hope to avoid one package requiring numpy 1 and another requiring numpy 2? Since there have been 20 years of Python modules relying on numpy 1, it is hard to imagine any but the simplest numpy applications being able to use numpy 2 because all those legacy numpy 1 packages won't be updated to numpy 2. I must be missing something.
Attachments (1)
Change History (19)
comment:1 by , 8 months ago
comment:2 by , 8 months ago
comment:3 by , 8 months ago
Numpy 2 migration guide
https://numpy.org/doc/stable/numpy_2_0_migration_guide.html
Numpy C ABI compatibility
https://numpy.org/doc/stable/dev/depending_on_numpy.html#numpy-2-abi-handling
by , 8 months ago
Attachment: | npy201.txt added |
---|
Output of 'ruff check src --select NPY201' as of d4581f0b1
comment:4 by , 8 months ago
When you whittle away all of the false-warning syntax errors in various files, what you're left with is this diff, at least for fixing what NPY201 picks up:
diff --git a/src/bundles/atomic/src/structure.py b/src/bundles/atomic/src/structure.py index dc0d8f572..097356ac0 100644 --- a/src/bundles/atomic/src/structure.py +++ b/src/bundles/atomic/src/structure.py @@ -860,9 +860,9 @@ class Structure(Model, StructureData): else: r = atoms.residues rids = r.unique_ids - from numpy import unique, in1d + from numpy import unique, isin sel_rids = unique(rids[asel]) - ares = in1d(rids, sel_rids) + ares = isin(rids, sel_rids) if ares.sum() > na: # Promote to entire residues level = 1005 @@ -870,7 +870,7 @@ class Structure(Model, StructureData): else: ssids = r.secondary_structure_ids sel_ssids = unique(ssids[asel]) - ass = in1d(ssids, sel_ssids) + ass = isin(ssids, sel_ssids) if ass.sum() > na: # Promote to secondary structure level = 1004 @@ -884,7 +884,7 @@ class Structure(Model, StructureData): from numpy import array cids = array(r.chain_ids) sel_cids = unique(cids[asel]) - ac = in1d(cids, sel_cids) + ac = isin(cids, sel_cids) if ac.sum() > na: # Promote to entire chains level = 1002 @@ -1033,9 +1033,9 @@ class Structure(Model, StructureData): elif target_type == '/': # There is no "full_chain" property for atoms so we have # to do it the hard way - from numpy import in1d, invert + from numpy import isin, invert matched_chain_ids = atoms.filter(a).unique_chain_ids - mask = in1d(atoms.chain_ids, matched_chain_ids) + mask = isin(atoms.chain_ids, matched_chain_ids) if '<' in operator: expand_by = atoms.filter(mask) else: diff --git a/src/bundles/map_data/src/arrays.py b/src/bundles/map_data/src/arrays.py index 218f9bc2a..57dbc97b7 100644 --- a/src/bundles/map_data/src/arrays.py +++ b/src/bundles/map_data/src/arrays.py @@ -245,7 +245,7 @@ def grid_indices(size, data_type): # the array contiguous. # shape = (size[2], size[1], size[0], 3) - from numpy import zeros, product, reshape + from numpy import zeros, prod, reshape indices = zeros(shape, data_type) for i in range(size[0]): indices[:,:,i,0] = i @@ -253,7 +253,7 @@ def grid_indices(size, data_type): indices[:,j,:,1] = j for k in range(size[2]): indices[k,:,:,2] = k - volume = product(size) + volume = prod(size) indices = reshape(indices, (volume, 3)) return indices diff --git a/src/bundles/map_data/src/readarray.py b/src/bundles/map_data/src/readarray.py index 9f980a0d1..82f52b9e0 100644 --- a/src/bundles/map_data/src/readarray.py +++ b/src/bundles/map_data/src/readarray.py @@ -227,10 +227,10 @@ def allocate_array(size, value_type = float32, step = None, progress = None, # def report_memory_error(size, value_type, progress): - from numpy import dtype, product, float64 + from numpy import dtype, prod, float64 vtype = dtype(value_type) tsize = vtype.itemsize - bytes = product(size, dtype=float64)*float(tsize) + bytes = prod(size, dtype=float64)*float(tsize) mbytes = bytes / 2**20 sz = ','.join(['%d' % s for s in size]) e = ('Could not allocate %.0f Mbyte array of size %s and type %s.\n' diff --git a/src/bundles/std_commands/src/align.py b/src/bundles/std_commands/src/align.py index 784ab4e2f..04afcafe0 100644 --- a/src/bundles/std_commands/src/align.py +++ b/src/bundles/std_commands/src/align.py @@ -306,10 +306,10 @@ def move_atoms(atoms, to_atoms, tf, move): def extend_to_chains(atoms): '''Return atoms extended to all atoms in same chains (ie same chain id / structure).''' catoms = [] - from numpy import in1d + from numpy import isin for s, a in atoms.by_structure: satoms = s.atoms - catoms.append(satoms.filter(in1d(satoms.chain_ids, a.unique_chain_ids))) + catoms.append(satoms.filter(isin(satoms.chain_ids, a.unique_chain_ids))) from chimerax.atomic import concatenate, Atoms return concatenate(catoms, Atoms) diff --git a/src/bundles/std_commands/src/sym.py b/src/bundles/std_commands/src/sym.py index e92938c39..164327ce9 100644 --- a/src/bundles/std_commands/src/sym.py +++ b/src/bundles/std_commands/src/sym.py @@ -443,8 +443,8 @@ class Assembly: def _partition_atoms(self, atoms, chain_ids): mmcif_cids = mmcif_chain_ids(atoms, self.from_mmcif) - from numpy import in1d, logical_not - mask = in1d(mmcif_cids, chain_ids) + from numpy import isin, logical_not + mask = isin(mmcif_cids, chain_ids) included_atoms = atoms.filter(mask) logical_not(mask,mask) excluded_atoms = atoms.filter(mask)
comment:5 by , 8 months ago
FYI, I'm working on getting our workflows working for a Python 3.13 tech preview but I don't intend to release it yet since Python 3.13 forces numpy>2. I just want to get the workflows working with more than one kind of build before the next release window in June.
comment:7 by , 7 months ago
Adding longlong numpy support to the ChimeraX C++ numpy handling is unlikely to solve ChimeraX numpy 2 problems and I'd suggest not merging that pull request. None of the ChimeraX C++ code will work with longlong. Our many C++ routines that take numpy integer arrays will likely give an error that it doesn't handle longlong, or it will cast to long which is very inefficient for speed critical calculations like ribbon geometry, or it might just crash if the needed checking is not present.
comment:8 by , 7 months ago
I think the issue is that on other platforms, longs and long longs are the same size, but on Windows longs are 32 bit and long longs are 64 bit. I guess since Numpy 2.0's release notes say the default size on Windows is int64 (long long) now "like other platforms", the default used to be 32 bit (long).
parse_int_nm has this code:
if (v.value_type() == Numeric_Array::Long_Int && allow_copy)
{
IArray vi = IArray(v.dimension(), v.sizes());
vi.set(Reference_Counted_Array::Array<long int>(v));
v = Numeric_Array(Numeric_Array::Int, vi);
}
which to me just looks like it's saying you can treat a long array as a normal integer array. What would the difference be with long longs?
comment:9 by , 7 months ago
The ChimeraX C++ uses numpy int32 for almost all integer arrays for memory efficiency since the arrays can be large and the limiting factor for the size of models. The ribbon code is almost certainly not intentionally using any int64 arrays and so adding long long 64-bit integer support is probably not the solution of the ribbon error seen with numpy 2. More likely is that numpy 2 casting rules made an int64 array and our code that needs fixing is the place in Python where an int32 array got unintentionally upcast to int64.
The purpose of not adding long long to the C++ numpy code is to avoid creating maintenance burden for features that are not used.
comment:10 by , 7 months ago
Regarding the code you show in comment 8, that is copying an int64 array to a int32 array. Mostly our code does not do that and instead generates an error saying an int32 array is required so that we can maintain good performance without accidentally copying arrays. I'd have to look into that example to see why we made an exception for it. The exceptions tend to be related to supporting volume data with more than 2 billion voxels.
comment:11 by , 7 months ago
I'm not saying it's intentionally using it. I'm saying that on macOS:
compile this and run it:
#include <iostream> int main(void) { std::cout << sizeof(long) << std::endl; std::cout << sizeof(long long) << std::endl; return 0; }
>> import numpy >> numpy.array([1]).dtype dtype('int64')
For numpy 1 or 2. Any call to array that hasn't specified the data type has always been 64 bit, except on Windows!
We can certainly fix this by specifying dtype=np.int32 for all calls to array in Python. But no one's complained about memory usage in recent memory!
comment:12 by , 7 months ago
Very little code in ChimeraX should be using int64 arrays. Despite the difference in long on Windows being 32-bit and long on Mac/Linux being 64-bit, ChimeraX on all platforms tries to use the same integer size for instance by specifying "int32". If the ribbon code is using 32-bit on Windows and 64-bit on Mac/Linux for something that is likely a bug that should be fixed.
The memory use is important to the speed and maximum data size of some ChimeraX operations. This is especially true when operating on large volume data.
comment:13 by , 7 months ago
From that perspective, does the C++ as written have a bug? If long int is 64 bit on macOS shouldn't this line:
vi.set(Reference_Counted_Array::Array<long int>(v));
be
vi.set(Reference_Counted_Array::Array<int>(v));
If the intent is to copy 64-bit v into (we hope) 32-bit vi?
comment:14 by , 7 months ago
All uses of Long_Int:
bundles git:(develop) Ⲗ rg Long_Int segment/_segment/pysegment.cpp 38: a.value_type() == Numeric_Array::Unsigned_Long_Int)) arrays/src/include/arrays/rcarray.h 191: Long_Int, Unsigned_Long_Int, Float, Double }; 230: case RCANA::Long_Int: f<long> args; break; \ 231: case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \ arrays/_arrays/pythonarray.cpp 110: dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Int : Numeric_Array::Long_Int); break; 112: dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Unsigned_Int : Numeric_Array::Unsigned_Long_Int); break; 1312: if (v.value_type() == Numeric_Array::Long_Int && allow_copy) arrays/_arrays/rcarray.cpp 323: case Long_Int: return sizeof(long int); 324: case Unsigned_Long_Int: return sizeof(unsigned long int); 466: case Long_Int: return "long int"; 467: case Unsigned_Long_Int: return "unsigned long int"; arrays/_arrays/rcarray.h 191: Long_Int, Unsigned_Long_Int, Float, Double }; 230: case RCANA::Long_Int: f<long> args; break; \ 231: case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \
comment:15 by , 7 months ago
bundles git:(develop) Ⲗ rg -w long -tc -tcpp -th > long.txt bundles git:(develop) Ⲗ cat long.txt mlp/_mlp/mlp.cpp: long xs0 = xyz.stride(0), xs1 = xyz.stride(1); mlp/_mlp/mlp.cpp: long ps0 = pot.stride(0), ps1 = pot.stride(1), ps2 = pot.stride(2); mlp/_mlp/mlp.cpp: long fs0 = fi.stride(0); profile_grids/_pg/pg.cpp: std::vector<long> longs; chem_group/_cg/cg.cpp:initiate_find_group(CG_Condition* group_rep, std::vector<long>* group_principals, chem_group/_cg/cg.cpp: std::map<long, const Atom*> ring_atom_map; chem_group/_cg/cg.cpp: std::vector<long> group_principals; alignment_headers/al2co/al2co.c:char *cvector(long nl, long nh); alignment_headers/al2co/al2co.c:int *ivector(long nl, long nh); alignment_headers/al2co/al2co.c:double *dvector(long nl, long nh); alignment_headers/al2co/al2co.c:int **imatrix(long nrl, long nrh, long ncl, long nch); alignment_headers/al2co/al2co.c:double **dmatrix(long nrl, long nrh, long ncl, long nch); alignment_headers/al2co/al2co.c:double ***d3tensor(long nrl,long nrh,long ncl,long nch,long ndl,long ndh); alignment_headers/al2co/al2co.c:int **identity_imat(long n); alignment_headers/al2co/al2co.c:else if (flag==2){flag=3;stri[++j]=c;if(j>10){nrerror("string too long:FATAL");exit(0);}continue;} alignment_headers/al2co/al2co.c:int **identity_imat(long n){ alignment_headers/al2co/al2co.c:char *cvector(long nl, long nh){ alignment_headers/al2co/al2co.c:int *ivector(long nl, long nh){ alignment_headers/al2co/al2co.c:long *lvector(long nl, long nh){ alignment_headers/al2co/al2co.c:long int *v; alignment_headers/al2co/al2co.c:v=(long int *)malloc((size_t) ((nh-nl+1+NR_END)*sizeof(long int))); alignment_headers/al2co/al2co.c:double *dvector(long nl, long nh){ alignment_headers/al2co/al2co.c:int **imatrix(long nrl, long nrh, long ncl, long nch){ alignment_headers/al2co/al2co.c:long i, nrow=nrh-nrl+1,ncol=nch-ncl+1; alignment_headers/al2co/al2co.c:double **dmatrix(long nrl, long nrh, long ncl, long nch){ alignment_headers/al2co/al2co.c:long i, nrow=nrh-nrl+1,ncol=nch-ncl+1; alignment_headers/al2co/al2co.c:double ***d3tensor(long nrl,long nrh,long ncl,long nch,long ndl,long ndh){ alignment_headers/al2co/al2co.c:long i,j,nrow=nrh-nrl+1,ncol=nch-ncl+1,ndep=ndh-ndl+1; mmcif/mmcif_cpp/readcif_cpp/readcif.cpp: long page_size = sysconf(_SC_PAGESIZE); mmcif/mmcif_cpp/readcif_cpp/readcif.cpp: // TODO: if ";\" then fold long lines mmcif/mmcif_cpp/readcif_cpp/readcif.h: long long iv = 0; core/core_cpp/pysupport/convert.h: auto pyobj = PyLong_FromLong(static_cast<long>(integer)); core/core_cpp/pysupport/convert.h:inline long pyint_to_clong(PyObject* pyint, const char* item_description) { mmcif/mmcif_cpp/mmcif.cpp: std::map <long, std::pair<Atom*, char>> atom_lookup; mmcif/mmcif_cpp/mmcif.cpp: long seq_id; mmcif/mmcif_cpp/mmcif.cpp: ResidueKey(const string& e, long n, const ResName& m): mmcif/mmcif_cpp/mmcif.cpp: ^ hash<long>()(k.seq_id) mmcif/mmcif_cpp/mmcif.cpp: long seq_id; mmcif/mmcif_cpp/mmcif.cpp: PolySeq(long s, const ResName& m, bool h): mmcif/mmcif_cpp/mmcif.cpp: Residue* find_residue(int model_num, const ChainID& chain_id, long position, const ResName& name); mmcif/mmcif_cpp/mmcif.cpp:ExtractMolecule::find_residue(int model_num, const ChainID& chain_id, long position, const ResName& name) { mmcif/mmcif_cpp/mmcif.cpp: long position; // label_seq_id mmcif/mmcif_cpp/mmcif.cpp: long auth_position = INT_MAX; // auth_seq_id mmcif/mmcif_cpp/mmcif.cpp: long serial_num = 0; // id mmcif/mmcif_cpp/mmcif.cpp: long user_position; // auth_position if given else position mmcif/mmcif_cpp/mmcif.cpp: long atom_serial = 0; mmcif/mmcif_cpp/mmcif.cpp: long serial_num = 0; // id mmcif/mmcif_cpp/mmcif.cpp: long position1, position2; // ptnr[12]_label_seq_id mmcif/mmcif_cpp/mmcif.cpp: long auth_position1 = INT_MAX, // ptnr1_auth_seq_id mmcif/mmcif_cpp/mmcif.cpp: long position1, position2; // (beg|end)_label_seq_id mmcif/mmcif_cpp/mmcif.cpp: long position1, position2; // (beg|end)_label_seq_id mmcif/mmcif_cpp/mmcif.cpp: long position1, position2; // range_[12]_label_seq_id mmcif/mmcif_cpp/mmcif.cpp: long seq_id = 0; mmcif/mmcif_cpp/readcif_cpp/counts.cpp: unsigned long num_rows = 0; core/core_cpp/mac_util/enablemultitouch_mac.h:extern "C" int enable_mac_multitouch(long window_id); morph/_morph/morph.cpp: long n = i.size(); morph/_morph/morph.cpp: for (long j = 0 ; j < n ; j += 4) morph/_morph/morph.cpp: long n = i.size(); morph/_morph/morph.cpp: for (long j = 0 ; j < n ; ++j) morph/_morph/morph.cpp: long n = i.size(); morph/_morph/morph.cpp: for (long j = 0 ; j < n ; ++j) core/core_cpp/mac_util/enablemultitouch.cpp: long win_id; core/core_cpp/chutil/CString.h: error_msg << "\" too long, maximum "; mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:struct option /* specification for a long form option... */ mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * ...for the long form API only; keep this for compatibility. mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h:{ /* When attempting to match a command line argument to a long form option, mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * messages, for invalid arguments to long form options ... mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * only display one hyphen, for implicit long form options, mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * long form options. mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * on resolution of a long form option. mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * ambiguously matched long form option string is valid as mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: /* and it looks like a long format option ... mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * it to force explicit handling as a long format option. mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: /* it's not an explicit long option, and `getopt_long_only' isn't active, mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: /* the current argument is a long form option, (either explicitly, mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: /* scan the list of defined long form options ... mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * partial long option isn't valid for short option mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: /* if here, then we had what SHOULD have been a long form option, mmcif/mmcif_cpp/readcif_cpp/cifgrep/getopt.h: * as long, nor implicitly matched as such, and the argument isn't segment/_segment/segsurf.cpp: // long t = (long)std::chrono::duration_cast<std::chrono::nanoseconds>(end-begin).count(); atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.declare.h: long get_py_int_attr(const char* attr_name, bool create=false) const; atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.declare.h: long get_py_int_attr(std::string& attr_name, bool create=false) const; segment/_segment/pysegment.cpp: (sizeof(unsigned long) == sizeof(unsigned int) && atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.instantiate.h:long PythonInstance<C>::get_py_int_attr(std::string& attr_name, bool create) const atomic_lib/atomic_cpp/pyinstance_cpp/PythonInstance.instantiate.h:long atomic_lib/atomic_cpp/element_cpp/Element.h: long hash() const { return number(); } atomic_lib/atomic_cpp/atomstruct_cpp/seq_assoc.cpp: // long "covalent" bonds across gaps, some of the atomic_lib/atomic_cpp/atomstruct_cpp/seq_assoc.cpp: // between a normal-numbered segment and a long atomic/atomic_cpp/molc.cpp: PyTuple_SET_ITEM(tuple, 1, PyLong_FromLong(static_cast<long>(arv.num_errors))); atomic/atomic_cpp/molc.cpp: PyObject *ptype = PyLong_FromLong((long)pt); atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp:long atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp: long value = 0; atomic_lib/atomic_cpp/atomstruct_cpp/Ring.cpp: long v = reinterpret_cast<long>(b); atomic_lib/atomic_cpp/atomstruct_cpp/ChangeTracker.h: long num_deleted = 0; atomic_lib/atomic_cpp/atomstruct_cpp/AtomTypes.cpp: // takes too long to do a massive fused-ring system; surface/_surface/triangulate.cpp:// technique that generates long skinny triangles. surface/_surface/triangulate.cpp: int k = static_cast<int>(reinterpret_cast<long>(vertex_data)); atomic_lib/atomic_cpp/atomstruct_cpp/Structure.cpp: long ci = ii + i2c_offset; atomic_lib/atomic_cpp/atomstruct_cpp/Structure.cpp: if (ci >= (long)complete_size) { atomic_lib/atomic_cpp/atomstruct_cpp/AtomicStructure.cpp: // the estimated length can be too long... atomic_lib/atomic_cpp/atomstruct_cpp/AtomicStructure.cpp: // a _lot_ of errors can make it take a very long time to find the atomic_lib/atomic_cpp/atomstruct_cpp/Ring.h: long hash() const; surface/_surface/refinemesh.cpp:// edge. The algorithm divides long edges, collapses short edges, and swaps surface/_surface/refinemesh.cpp: // std::cerr << "split long edges\n"; pdb/pdbio_cpp/PDBio.cpp: std::map<const Residue*, long> res_order; pdb/pdbio_cpp/PDBio.cpp: long i = 0; alphafold/src/kmer_search/make_kmer_index.c:long num_kmers(int k); alphafold/src/kmer_search/make_kmer_index.c:long kmer_counts(FILE *seqs_file, int k, int unique, long kmin, long kmax, unsigned int *counts); alphafold/src/kmer_search/make_kmer_index.c:unsigned short *sequence_entry_sizes(FILE *seqs_file, long nseq); alphafold/src/kmer_search/make_kmer_index.c: int unique, long kmin, long kmax, unsigned int *seqi, long seqi_len); alphafold/src/kmer_search/make_kmer_index.c:void create_index_files(const char *sequences_path, int k, int unique, long max_memory) alphafold/src/kmer_search/make_kmer_index.c: long nk = num_kmers(k); alphafold/src/kmer_search/make_kmer_index.c: long kmin = 0, kmax = nk; alphafold/src/kmer_search/make_kmer_index.c: long num_sequences = kmer_counts(seqs_file, k, unique, kmin, kmax, counts); alphafold/src/kmer_search/make_kmer_index.c: long ni = 0; alphafold/src/kmer_search/make_kmer_index.c: for (long i = 0 ; i < nk ; ++i) alphafold/src/kmer_search/make_kmer_index.c: long max_seqi = max_memory / sizeof(unsigned int); alphafold/src/kmer_search/make_kmer_index.c: long ni = 0; alphafold/src/kmer_search/make_kmer_index.c:unsigned short *sequence_entry_sizes(FILE *seqs_file, long nseq) alphafold/src/kmer_search/make_kmer_index.c: for (long i = 0 ; i < nseq ; ++i) alphafold/src/kmer_search/make_kmer_index.c:long kmer_counts(FILE *seqs_file, int k, int unique, long kmin, long kmax, unsigned int *counts) alphafold/src/kmer_search/make_kmer_index.c: long nseq = 0; alphafold/src/kmer_search/make_kmer_index.c: int unique, long kmin, long kmax, unsigned int *seqi, long seqi_len) alphafold/src/kmer_search/make_kmer_index.c: long noff = kmax - kmin; alphafold/src/kmer_search/make_kmer_index.c: unsigned long *offsets = (unsigned long *) malloc(noff * sizeof(unsigned long)); alphafold/src/kmer_search/make_kmer_index.c: unsigned long csum = 0; alphafold/src/kmer_search/make_kmer_index.c: for (long i = 0 ; i < noff ; ++i) alphafold/src/kmer_search/make_kmer_index.c: for (long i = 0 ; i < noff ; ++i) alphafold/src/kmer_search/make_kmer_index.c: long nseq = 0; alphafold/src/kmer_search/make_kmer_index.c: unsigned long o = offsets[kii] + counts[kii]; alphafold/src/kmer_search/make_kmer_index.c:long num_kmers(int k) alphafold/src/kmer_search/make_kmer_index.c: long nk = 1; alphafold/src/kmer_search/make_kmer_index.c: long nk = num_kmers(k); alphafold/src/kmer_search/make_kmer_index.c: long nk = num_kmers(k); arrays/_arrays/rcarray.h: case RCANA::Long_Int: f<long> args; break; \ arrays/_arrays/rcarray.h: case RCANA::Unsigned_Long_Int: f<unsigned long> args; break; \ arrays/_arrays/rcarray.cpp: case Long_Int: return sizeof(long int); arrays/_arrays/rcarray.cpp: case Unsigned_Long_Int: return sizeof(unsigned long int); arrays/_arrays/rcarray.cpp: case Long_Int: return "long int"; arrays/_arrays/rcarray.cpp: case Unsigned_Long_Int: return "unsigned long int"; arrays/_arrays/pythonarray.cpp: // Make 32-bit integers always be int instead of long on 32-bit machines. arrays/_arrays/pythonarray.cpp: dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Int : Numeric_Array::Long_Int); break; arrays/_arrays/pythonarray.cpp: dtype = (sizeof(int) == sizeof(long) ? Numeric_Array::Unsigned_Int : Numeric_Array::Unsigned_Long_Int); break; arrays/_arrays/pythonarray.cpp: case NPY_LONG: name = "long"; break; arrays/_arrays/pythonarray.cpp:// Returns an array of long which will be 32-bit on Windows, arrays/_arrays/pythonarray.cpp: long *py_data = (long *)PyArray_DATA((PyArrayObject *)a); arrays/_arrays/pythonarray.cpp: vi.set(Reference_Counted_Array::Array<long int>(v)); arrays/_arrays/pythonarray.cpp: PyErr_Format(PyExc_TypeError, "array type must be int or long int, got %s", arrays/_arrays/pythonarray.h:// single void * value from Python long pdb_lib/connect_cpp/connect.cpp:static std::map<Element::AS, unsigned long> _saturationMap = { pdb_lib/connect_cpp/connect.cpp: // turn long inter-residue bonds into "missing structure" pseudobonds
comment:16 by , 7 months ago
I wrote a test program to confirm the sizes of various data types on my Mac:
#include <iostream> #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION #include <numpy/arrayobject.h> int main(int argc, char *argv[]) { std::cout << "int: " << sizeof(int) << std::endl; std::cout << "long: " << sizeof(long) << std::endl; std::cout << "long long: " << sizeof(long long) << std::endl; std::cout << "npy_int: " << sizeof(npy_int) << std::endl; std::cout << "npy_long: " << sizeof(npy_long) << std::endl; std::cout << "npy_longlong: " << sizeof(npy_longlong) << std::endl; std::cout << "npy_int64: " << sizeof(npy_int64) << std::endl; return 0; }
Which I then compiled: g++ -IChimeraX.app/Contents/lib/python3.11/site-packages/numpy/core/include -IChimeraX.app/Contents/include/python3.11 test.cpp
>> ./a.out int: 4 long: 8 long long: 8 npy_int: 4 npy_long: 8 npy_longlong: 8 npy_int64: 8
The same on Windows gives:
>> ./a.out int: 4 long: 4 long long: 8 npy_int: 4 npy_long: 4 npy_longlong: 8 npy_int64: 8
comment:17 by , 7 months ago
I was able to fix the bounding box traceback without the arrays patch, by setting the data type of the vertex and triangle arrays to int32. But I still got a traceback in the same place because the edge mask was left as a bare Python list. And I noticed I got a traceback for orthoplanes too, but not tilted slab. Tilted slab relies on some code from the surface bundle which hands it back correctly typed arrays, but the orthoplanes and and bounding box outlines don't -- they hand back bare lists. I plan to make the other two methods return correctly typed numpy arrays tomorrow, but I'm finally at a point where I can put this problem down for the night and not have it gnaw at me (I nerd sniped myself).
comment:18 by , 7 months ago
I tracked the ribbon error down to the display_ranges. It's a bare Python list that gets passed to some C++ code in ribbon_cpp, which then passes it to the array parsing code in arrays/pythonarray.cpp (parse_int_n2_array is the specific call, on line 1154 of xsection.cpp). parse_int_n2_array calls parse_int_nm_array which calls array_from_python which calls PyArray_TYPE(a) on the python list. It must think, what would the data type be if I converted this Python list to a numpy array? Of course, the new default data type is NPY_LONGLONG on Windows, so it crashes out because we don't recognize it.
We could add
import numpy, int32
ranges = array(ranges, dtype=int32)
to _ribbon_geometry in ribbon.py or make the display_range a numpy array in the first place instead of a Python array.
Before putting numpy 2 in a tech preview some careful study should be done to the changes from numpy 1 to numpy 2 since our C++ code heavily uses the numpy 1 C API. In December we put out a build I think just one that had numpy 2 and got bug reports such as #16520 where ribbons did not work correctly. Almost every ChimeraX C++ module I have written to optimize anything (e.g. ribbons) uses numpy 1 C API so we need to understand the numpy 2 API changes and compare against our numpy C++ use.