#1063 closed defect (not a bug)
Collections don't accept Numpy arrays with mismatched type
| Reported by: | Tristan Croll | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Core | Version: | |
| Keywords: | Cc: | ||
| Blocked By: | Blocking: | ||
| Notify when closed: | Platform: | all | |
| Project: | ChimeraX |
Description
import numpy
m = session.models.list()[0]
m.atoms.coords = m.atoms.coords.astype(numpy.float32)
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-3-98d541f0a9d6> in <module>()
----> 1 m.atoms.coords = m.atoms.coords.astype(numpy.float32)
~/apps/chimerax/lib/python3.6/site-packages/chimerax/core/atomic/molc.py in set_prop(self, values)
194 va[:] = values
195 v = pointer(va)
--> 196 cset(self._c_pointers, n, v)
197
198 return property(get_prop, set_prop, doc = doc)
ArgumentError: argument 3: <class 'TypeError'>: expected LP_c_double instance instead of LP_c_float
This came up for me because I have a Collection class with a 64-bit cvec_property that uses Atoms.elements.masses (which returns a float32 array) as an input to calculate spring constants. Of course I can cast it to the required type myself, but would it be better to change line 181 of molc.py to the following?
if isinstance(values,ndarray) and values.ndim == vdim and values.dtype == value_type:
Change History (7)
comment:1 by , 8 years ago
follow-up: 2 comment:2 by , 8 years ago
Actually, the coordinates are now passed to Python as a float64 array (just checked). Not sure when that was changed, but I think it was a while back. Most other properties are still float32. While tuning my C++ Ramachandran/rotamer interpolator I did do some comparisons of float/double performance (running over tens of thousands to millions of interpolations). The performance difference was negligible (in keeping with this StackExchange post: https://stackoverflow.com/questions/3426165/is-using-double-faster-than-float - there's no difference in arithmetic speed, only memory bandwidth), so I decided to standardise on double for all my C++ classes (except where interfacing with existing ChimeraX float functions). Double is certainly a more "natural" fit to Python, which doesn't have a 32-bit floating point type outside of Numpy. On 2018-03-20 19:15, ChimeraX wrote:
comment:3 by , 8 years ago
My previous comment was mostly wrong. ChimeraX Atoms.coords uses float64 for getting and setting and not float32. That is probably good because the C++ also uses float64 for the atom coordinates.
As I mentioned I don't want the code to implicitly cast your float32 coordinates to float64 when you do atoms.coords = xyz32bit because this will hide a performance problem. On the other hand if you have to convert to a float64 numpy array that is slower than it should be and I could add code that can pass the float32 coords to C++ without making a copy. I don't want to spend the time to write that code unless it really will be helpful as their are many other priorities.
follow-up: 4 comment:4 by , 8 years ago
No, none of this is in performance-critical code for me. Now that I understand the reasoning I’m happy to work with the existing framework. Tristan Croll Research Fellow Cambridge Institute for Medical Research University of Cambridge CB2 0XY
comment:5 by , 8 years ago
| Resolution: | → not a bug |
|---|---|
| Status: | assigned → closed |
If high speed setting float32 atom coordinates is needed in the future then we can make the code automatically do that by detecting the array is float32 and calling a different C++ setting method. Won't do that until it is needed.
follow-up: 6 comment:6 by , 8 years ago
Actually, one other tangentially-related thought that might save
headaches for someone down the track. Numpy has a useful (and fast -
sub-microsecond) call to check if the contents of an array are
C-contiguous:
{{{
import numpy
arr = numpy.zeros((1000,3))
arr.data.c_contiguous
True
arr[:,3].data.c_contiguous
False
}}}
It would probably be a good idea to check for this in the Collection
code and trigger a copy if it's not contiguous. It's rather easy to
imagine, for example, someone aggregating alternative sets of values for
a property into different columns of a Numpy array, and getting
unexpected results when they slice them out and attempt to apply them.
On 2018-03-20 20:45, ChimeraX wrote:
comment:7 by , 8 years ago
Ok. I made setting a Collection property convert non-contiguous arrays to contiguous arrays.
Copying numpy arrays is surprisingly expensive. The current ctypes design was intentional to obtain the best performance so you will not get implicit copies slowing down your code that you never detect. Instead you have to explicitly cast the numpy array to the correct type (xyz.astype(float32)) and that makes you aware of the performance issue and often you can then make a float32 array in the first place and avoid the copy.
So I'm not keen on adding implicit type conversions in the array interfaces that are intended to give high speed.
But there is an unpleasant wrinkle to this. The actual atom coordinates in ChimeraX C++ are float64, but the Python interface only accepts float32. Sometimes you really want float64 in Python (maybe it is what is natively used in OpenMM). So really it should be possible to set atom coordinates from Python with float64 or float32. In fact maybe it should only accept float64. I'm not sure if float32 offers significantly better performance (reduced memory bandwidth or faster floating point arithmetic) than float 64. If Python can pass float64 atom coordinates to C++ you will of course want to also be able to get float64 coordinates. It is a bit ugly to choose between float32 and float64 value types when getting coordinates -- we certainly would want to use the right default so most developers never need to know about this esoterica.
You are concerned about speed. Perhaps you have an opinion about whether float32 or float64 coordinates are a better default in Python.