Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1063 closed defect (not a bug)

Collections don't accept Numpy arrays with mismatched type

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Core Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

import numpy
m = session.models.list()[0]
m.atoms.coords = m.atoms.coords.astype(numpy.float32)
---------------------------------------------------------------------------
ArgumentError                             Traceback (most recent call last)
<ipython-input-3-98d541f0a9d6> in <module>()
----> 1 m.atoms.coords = m.atoms.coords.astype(numpy.float32)

~/apps/chimerax/lib/python3.6/site-packages/chimerax/core/atomic/molc.py in set_prop(self, values)
    194                     va[:] = values
    195                     v = pointer(va)
--> 196                 cset(self._c_pointers, n, v)
    197 
    198         return property(get_prop, set_prop, doc = doc)

ArgumentError: argument 3: <class 'TypeError'>: expected LP_c_double instance instead of LP_c_float

This came up for me because I have a Collection class with a 64-bit cvec_property that uses Atoms.elements.masses (which returns a float32 array) as an input to calculate spring constants. Of course I can cast it to the required type myself, but would it be better to change line 181 of molc.py to the following?

                if isinstance(values,ndarray) and values.ndim == vdim and values.dtype == value_type:

Change History (7)

comment:1 by Tom Goddard, 8 years ago

Copying numpy arrays is surprisingly expensive. The current ctypes design was intentional to obtain the best performance so you will not get implicit copies slowing down your code that you never detect. Instead you have to explicitly cast the numpy array to the correct type (xyz.astype(float32)) and that makes you aware of the performance issue and often you can then make a float32 array in the first place and avoid the copy.

So I'm not keen on adding implicit type conversions in the array interfaces that are intended to give high speed.

But there is an unpleasant wrinkle to this. The actual atom coordinates in ChimeraX C++ are float64, but the Python interface only accepts float32. Sometimes you really want float64 in Python (maybe it is what is natively used in OpenMM). So really it should be possible to set atom coordinates from Python with float64 or float32. In fact maybe it should only accept float64. I'm not sure if float32 offers significantly better performance (reduced memory bandwidth or faster floating point arithmetic) than float 64. If Python can pass float64 atom coordinates to C++ you will of course want to also be able to get float64 coordinates. It is a bit ugly to choose between float32 and float64 value types when getting coordinates -- we certainly would want to use the right default so most developers never need to know about this esoterica.

You are concerned about speed. Perhaps you have an opinion about whether float32 or float64 coordinates are a better default in Python.

in reply to:  2 ; comment:2 by tic20@…, 8 years ago

Actually, the coordinates are now passed to Python as a float64 array 
(just checked). Not sure when that was changed, but I think it was a 
while back. Most other properties are still float32.

While tuning my C++ Ramachandran/rotamer interpolator I did do some 
comparisons of float/double performance (running over tens of thousands 
to millions of interpolations). The performance difference was 
negligible (in keeping with this StackExchange post: 
https://stackoverflow.com/questions/3426165/is-using-double-faster-than-float 
- there's no difference in arithmetic speed, only memory bandwidth), so 
I decided to standardise on double for all my C++ classes (except where 
interfacing with existing ChimeraX float functions).  Double is 
certainly a more "natural" fit to Python, which doesn't have a 32-bit 
floating point type outside of Numpy.

On 2018-03-20 19:15, ChimeraX wrote:

comment:3 by Tom Goddard, 8 years ago

My previous comment was mostly wrong. ChimeraX Atoms.coords uses float64 for getting and setting and not float32. That is probably good because the C++ also uses float64 for the atom coordinates.

As I mentioned I don't want the code to implicitly cast your float32 coordinates to float64 when you do atoms.coords = xyz32bit because this will hide a performance problem. On the other hand if you have to convert to a float64 numpy array that is slower than it should be and I could add code that can pass the float32 coords to C++ without making a copy. I don't want to spend the time to write that code unless it really will be helpful as their are many other priorities.

in reply to:  4 ; comment:4 by tic20@…, 8 years ago

No, none of this is in performance-critical code for me. Now that I understand the reasoning I’m happy to work with the existing framework.

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 


comment:5 by Tom Goddard, 8 years ago

Resolution: not a bug
Status: assignedclosed

If high speed setting float32 atom coordinates is needed in the future then we can make the code automatically do that by detecting the array is float32 and calling a different C++ setting method. Won't do that until it is needed.

in reply to:  6 ; comment:6 by tic20@…, 8 years ago

Actually, one other tangentially-related thought that might save 
headaches for someone down the track. Numpy has a useful (and fast - 
sub-microsecond) call to check if the contents of an array are 
C-contiguous:

{{{
import numpy
arr = numpy.zeros((1000,3))
arr.data.c_contiguous
   True
arr[:,3].data.c_contiguous
   False
}}}

It would probably be a good idea to check for this in the Collection 
code and trigger a copy if it's not contiguous. It's rather easy to 
imagine, for example, someone aggregating alternative sets of values for 
a property into different columns of a Numpy array, and getting 
unexpected results when they slice them out and attempt to apply them.

On 2018-03-20 20:45, ChimeraX wrote:

comment:7 by Tom Goddard, 8 years ago

Ok. I made setting a Collection property convert non-contiguous arrays to contiguous arrays.

Note: See TracTickets for help on using tickets.