Opened 8 years ago
Closed 8 years ago
#778 closed defect (fixed)
changing coordinates very slow for large structures
Reported by: Tristan Croll
Owned by: Tom Goddard
Priority: major
Milestone:
Component: Graphics
Version:
Keywords:
Cc: Tristan Croll, pett@…
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX
Description
It would seem that in large structures (over ~100k atoms), moving any atom(s) triggers something rather expensive. It's making ISOLDE almost unusably slow in large constructs, even when the fraction I'm simulating is tiny. As a simple demo:
```python
import numpy
from time import time

class MoveTimer:
    def __init__(self, session, cycles, indices, model):
        self.start_time = time()
        self.counter = 0
        self.max_cycles = cycles
        self.session = session
        self.atoms = model.atoms[indices]
        self.model = model
        self.length = len(self.atoms)
        self.times = numpy.empty(cycles)
        self.handler = session.triggers.add_handler('new frame', self.time_change_coords)

    def time_change_coords(self, *_):
        if self.counter >= self.max_cycles:
            print('Updating coordinates for {} atoms in a structure with {} total atoms '
                  'took {:0.4f} +/- {:0.4f} seconds per frame'.format(
                      self.length, len(self.model.atoms),
                      self.times.mean(), self.times.std()))
            self.session.triggers.remove_handler(self.handler)
        else:
            self.times[self.counter] = time() - self.start_time
            self.start_time = time()
            targets = numpy.random.rand(self.length, 3) * 50
            self.atoms.coords = targets
            self.counter += 1

indices = numpy.array([i for i in range(100, 110)])
m0 = ...  # model with 184,550 atoms, displayed in stick representation with simple lighting
m1 = ...  # model with 5,133 atoms, identical representation
```

```
MoveTimer(session, 50, indices, m0)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.2716 +/- 0.0444 seconds per frame
MoveTimer(session, 50, indices, m1)
> Updating coordinates for 10 atoms in a structure with 5133 total atoms took 0.0356 +/- 0.0095 seconds per frame

indices = numpy.array([10], numpy.int)
MoveTimer(session, 50, indices, m0)
> Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.2829 +/- 0.0444 seconds
MoveTimer(session, 50, indices, m1)
> Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0343 +/- 0.0084 seconds
```
If I hide everything but the single mobile atom:
```
MoveTimer(session, 50, indices, m0)
> Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.1864 +/- 0.0460 seconds per frame
MoveTimer(session, 50, indices, m1)
> Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0310 +/- 0.0046 seconds per frame
```
In fact, even if all atoms are hidden:
```
Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.1346 +/- 0.0383 seconds per frame
Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0283 +/- 0.0074 seconds per frame
```
Attachments (3)
Change History (54)
follow-up: 2 comment:2 by , 8 years ago
Are ribbons being shown? Ribbon recalculations are very slow. —Eric
follow-up: 3 comment:3 by , 8 years ago
Eric implemented the structure change tracking. Assign the ticket to him and put me on the cc list. Of course we would love it if you isolated the problem and fixed it! We are about to make a second alpha release though (next week), so don't push changes to the core without discussing first.
follow-up: 4 comment:4 by , 8 years ago
Nope. Just bonds - but because any change triggers a complete recalculation of atom spheres and bond geometry regardless of their display state or whether they've moved, it starts to really crawl for big structures. I'm working on a fix, as long as you're OK with it... my first real foray into your C++ API - nice!

The basic idea: for each object which tracks its changes in ChangeTracker, add a boolean `_changed` private flag, plus `changed()`/`set_changed(bool flag)` functions. Then, when an object calls:

```cpp
structure()->change_tracker()->add_modified(this, ChangeTracker::REASON_ALT_LOC);
```

... `ChangeTracker::add_modified` in turn writes back to the object:

```cpp
ptr->set_changed(true);
```

... and then once `changed()` and `set_changed()` are exposed to Python through molc.py, it should be trivial to limit structure.py to redrawing only the parts that have actually changed, then reset the flags for the next graphics iteration. Any thoughts?
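As a rough Python sketch of the dirty-flag pattern being proposed (`DummyAtom` and `redraw_changed` are hypothetical stand-ins for illustration only; the real objects live in the C++ layer):

```python
class DummyAtom:
    """Toy stand-in for a C++ Atom that remembers whether it changed."""
    def __init__(self, name):
        self.name = name
        self._changed = False

    def changed(self):
        return self._changed

    def set_changed(self, flag):
        self._changed = flag

    def move(self):
        # Any mutation marks the object dirty, mirroring
        # ChangeTracker::add_modified + ptr->set_changed(true).
        self._changed = True

def redraw_changed(atoms):
    """Redraw only dirty atoms, then reset their flags for the next frame."""
    dirty = [a for a in atoms if a.changed()]
    for a in dirty:
        a.set_changed(False)
    return [a.name for a in dirty]

atoms = [DummyAtom(n) for n in ('N', 'CA', 'C')]
atoms[1].move()
print(redraw_changed(atoms))  # only the moved atom: ['CA']
print(redraw_changed(atoms))  # flags were reset: []
```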
comment:5 by , 8 years ago
Cc: added
Owner: changed
I don't seem to have the ability to cc in Tom - the form only gives me the ability to add myself to the cc list. But reassigning to Eric as requested.
follow-up: 6 comment:6 by , 8 years ago
Couldn't you just get the atoms that changed for this structure (in Python) with:

```python
ct = session.change_tracker
changes = Changes(ct.changes)
changed_atoms = changes.modified_atoms()
structure_atoms = changed_atoms.filter(changed_atoms.structures == struct)
```

—Eric
follow-up: 7 comment:7 by , 8 years ago
I don't think it will take a tenth of a second to update the graphics for 100000 atoms and bonds. You need to know more precisely where the time is being used. It would be complex to update just the changed atoms and bonds in the graphics arrays. Did ChimeraX from 3 months ago have this problem?
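To find out where the time actually goes, one generic option (a sketch using only the standard library, not ChimeraX-specific code; `expensive_update` is an invented placeholder) is Python's built-in profiler:

```python
import cProfile
import pstats
import io

def expensive_update():
    # Placeholder for e.g. a structure graphics-update routine.
    total = 0
    for i in range(100000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
expensive_update()
profiler.disable()

# Report the functions sorted by cumulative time spent in them.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
print('expensive_update' in stream.getvalue())  # True
```

Profiling a real redraw would mean wrapping the update call the same way inside the frame trigger handler.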
follow-up: 8 comment:8 by , 8 years ago
Hmm... I suppose that *would* be much easier. Sometimes the obvious answer is staring you right in the face. :)

Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
follow-up: 9 comment:9 by , 8 years ago
That I couldn't tell you, to be quite honest. I've only started playing with larger structures in ISOLDE over the past month or two. Will try to dig into it a bit more.
follow-up: 10 comment:10 by , 8 years ago
Actually, thinking about it a little more, there's still a good reason to do it my way. To update only those entities that have changed you need their indices - and changed_atoms.indices(atoms) takes about 10 ms for 200k atoms all by itself. Judging by the performance of other molarray functions, a hypothetical call to atoms[changed] should be an order of magnitude faster.
comment:11 by , 8 years ago
Cc: added
Component: Unassigned → Graphics
Owner: changed
For a speed optimization problem you have to first find where the time is going. It doesn't make sense to propose a solution until you know that. Your timing code appears to time the entire delay between two redraws (including graphics vertical sync, probably at 60 frames/second), and who knows what code is listening to the atom changes; most of the time may not even be in the graphics.
If you want to reduce 100 ms to 10 ms but there are 5 parts of the code each taking 2 ms, it could be a huge job to improve most of those enough to reduce the time.
So here is one way to get a better idea where the time is going. First, guess. Maybe it is computing and updating the bond positions (this produces a new 4x4 matrix for every bond when any atom moves). So go into structure.py and put a "return" in to prevent the bond positions from updating. Run your timing test and see how much faster it is. In general, disable optional code (like updating bond positions), see how the time changes, and try to find where most of the time is going.
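The disable-and-time approach can be made a little more systematic with a small helper; this is a generic sketch (`step_timer` and the stand-in workloads are invented for illustration, not ChimeraX code):

```python
import time
from contextlib import contextmanager

@contextmanager
def step_timer(label, results):
    """Accumulate wall-clock time for one named step of an update."""
    t0 = time.perf_counter()
    yield
    results[label] = results.get(label, 0.0) + time.perf_counter() - t0

results = {}
with step_timer('atoms', results):
    sum(i * i for i in range(200000))   # stand-in for the atom graphics update
with step_timer('bonds', results):
    sum(i * i for i in range(400000))   # stand-in for the bond graphics update

# The step with the biggest share is the one worth optimizing first.
slowest = max(results, key=results.get)
print(slowest)
```

Wrapping each candidate step of the redraw this way localizes the cost without having to comment code in and out.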
Another important thing to understand is that you can't just move a few bonds in the array of 100,000 that is being handed to OpenGL. You have to hand it the whole new array (100000 * 4 * 4 * 4 bytes = 6.4 Mbytes per draw), and if that is to run at, say, 100 times per second, that is 640 Mbytes/second of bandwidth to the graphics card just for the bonds. On the CPU we are also recomputing a whole new array of bonds, probably another 640 Mbytes/sec of bandwidth. Very complex optimization could improve this, but you'd have to understand in great detail where the performance bottlenecks are. Our code has not been optimized much and there is probably lots of low-hanging fruit to speed it up that does not involve trying to limit updates to subsets of atoms and bonds.
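As a quick sanity check of the bandwidth arithmetic above (each bond carries a 4x4 matrix of 4-byte floats):

```python
n_bonds = 100_000
bytes_per_matrix = 4 * 4 * 4            # 4x4 floats, 4 bytes each = 64 bytes

per_draw_mb = n_bonds * bytes_per_matrix / 1e6   # handed to OpenGL per draw
per_second_mb = per_draw_mb * 100                # at 100 redraws per second

print(per_draw_mb, per_second_mb)  # 6.4 640.0
```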
Will be interested to hear what you find.
follow-up: 12 comment:12 by , 8 years ago
Well, once the structure gets *really* large then the delays certainly add up elsewhere, but in an "only fairly large" structure it seems to be mostly in bond and atom recalculation. Working with a particularly large structure (HIV capsid, 3j3y) to really bring it out:

```
MoveTimer(session, 10, indices, m)
```

... after editing structure.py to time each key step in recalculation (`_create_ribbon_graphics()`, `_update_ribbon_tethers()`, `_update_atom_graphics()`, `_update_bond_graphics()`, the `pbg._update_graphics()` loop, `_update_ribbon_graphics()`):

```
Ribbon creation took 2.6941299438476562e-05
Ribbon tether update took 4.982948303222656e-05
Update atom graphics took 0.36548638343811035
Update bond graphics took 0.6054508686065674
Update pseudobondgroups took 6.556510925292969e-05
Update ribbon took 0.02550506591796875
```

... which adds up to roughly a second. For a less extreme structure (4v8r):

```
Updating coordinates for 10 atoms in a structure with 128780 total atoms took 0.1138 +/- 0.0354 seconds per frame
Ribbon creation took 2.9087066650390625e-05
Ribbon tether update took 5.1975250244140625e-05
Update atom graphics took 0.02771759033203125
Update bond graphics took 0.036278486251831055
Update pseudobondgroups took 0.0020813941955566406
Update ribbon took 0.0026438236236572266
```

... making up roughly half of the total time, most of which could be made to go away in the sort of case I'm talking about where <<5% of all atoms change. Still, perhaps it will be best if I bite the bullet and take the approach that VMD took for interactive MD: split the simulated portion off into its own temporary molecule and hide the equivalent atoms in the original. It would take a fair bit of work, but looks like it'll be worth it in the end.
follow-up: 13 comment:13 by , 8 years ago
Possibly those atom and bond graphics update times can be improved directly. The next step would be to go into those routines and see which particular lines are taking the time. For the 4v8r case you say the 0.06 seconds for updating atom/bond graphics is roughly half the total time. We would still need to know where the other half is going, since even if you brought the graphics update time to zero you would only get a speedup of a factor of 2, and I'm sure you want more. I agree that it will probably be more successful to use a separate partial molecule if you really want the best speed for the least coding effort.
follow-up: 14 comment:14 by , 8 years ago
The following simple modifications in structure.py (to limit atom and bond redrawing to only visible atoms) go a long way towards improving things (in ISOLDE, I hide everything that isn't in the immediate vicinity of the mobile selection as a matter of course):

```python
def _update_atom_graphics(self, changes = StructureData._ALL_CHANGE):
    atoms = self.atoms  # optimization, avoid making new numpy array from C++
    avis = atoms.visibles
    vis_atoms = atoms[avis]
    p = self._atoms_drawing
    if p is None:
        if avis.sum() == 0:
            return
        self._atoms_drawing = p = self.new_drawing('atoms')
        self._atoms_drawing.custom_x3d = self._custom_atom_x3d

    # Update level of detail of spheres
    self._level_of_detail.set_atom_sphere_geometry(p)

    #~ if changes & self._SHAPE_CHANGE:
    #~     # Set instanced sphere center position and radius
    #~     n = len(atoms)
    #~     from numpy import empty, float32, multiply
    #~     xyzr = empty((n, 4), float32)
    #~     xyzr[:, :3] = atoms.coords
    #~     xyzr[:, 3] = self._atom_display_radii(atoms)

    if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
        # Set atom colors
        #~ p.colors = atoms.colors
        p.colors = vis_atoms.colors

    if changes & self._SHAPE_CHANGE:
        # Set instanced sphere center position and radius
        n = len(vis_atoms)
        from numpy import empty, float32, multiply
        xyzr = empty((n, 4), float32)
        xyzr[:, :3] = vis_atoms.coords
        xyzr[:, 3] = self._atom_display_radii(vis_atoms)

        from ..geometry import Places
        p.positions = Places(shift_and_scale=xyzr)
        #~ p.display_positions = avis

def _update_bond_graphics(self, changes = StructureData._ALL_CHANGE):
    bonds = self.bonds  # optimization, avoid making new numpy array from C++
    bvis = bonds.showns
    vis_bonds = bonds[bvis]
    p = self._bonds_drawing
    if p is None:
        if sum(bvis) == 0:
            return
        self._bonds_drawing = p = self.new_drawing('bonds')
        self._bonds_drawing.custom_x3d = self._custom_bond_x3d

    # Update level of detail of cylinders
    self._level_of_detail.set_bond_cylinder_geometry(p)

    #~ if changes & (self._SHAPE_CHANGE | self._SELECT_CHANGE):
    #~     bond_atoms = bonds.atoms
    #~ if changes & self._SHAPE_CHANGE:
    #~     ba1, ba2 = bond_atoms
    #~     p.positions = _halfbond_cylinder_placements(ba1.coords, ba2.coords, bonds.radii)
    #~     p.display_positions = _shown_bond_cylinders(bonds)
    #~ if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
    #~     p.colors = c = bonds.half_colors
    #~ if changes & (self._SELECT_CHANGE | self._SHAPE_CHANGE):
    #~     p.selected_positions = _selected_bond_cylinders(bond_atoms)

    if changes & (self._SHAPE_CHANGE | self._SELECT_CHANGE):
        bond_atoms = vis_bonds.atoms
    if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
        #~ p.colors = c = bonds.half_colors
        p.colors = c = vis_bonds.half_colors
    if changes & self._SHAPE_CHANGE:
        ba1, ba2 = bond_atoms
        p.positions = _halfbond_cylinder_placements(ba1.coords, ba2.coords, vis_bonds.radii)
        p.display_positions = _shown_bond_cylinders(vis_bonds)
    if changes & (self._SELECT_CHANGE | self._SHAPE_CHANGE):
        p.selected_positions = _selected_bond_cylinders(bond_atoms)
```

Certainly not a magic bullet, but from the timings below it would be enough for me to defer the problem for a while - it seems to stabilise at a roughly 4-fold increase in frame rate for large structures. For the 184k atom case, in real-world ISOLDE simulations it takes me from a more-or-less unusable 1-1.5 fps to a still-slow-but-usable 4-6 fps. Obviously I'll still need to take further steps down the track, but it's a start. As far as I can tell, everything else still works fine with the changes.
Before:

```
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.2632 +/- 0.0385 seconds per frame
Changed 2440800 atom styles
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 2.5118 +/- 1.2431 seconds per frame
```

Only first 100 residues shown:

```
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.1759 +/- 0.0436 seconds per frame
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 1.9635 +/- 0.9822 seconds per frame
```

After, all atoms shown:

```
Updating coordinates for 10 atoms in a structure with 26740 total atoms took 0.0515 +/- 0.0051 seconds per frame
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.2598 +/- 0.0436 seconds per frame
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 2.7552 +/- 1.4352 seconds per frame
```

After, only first 100 residues shown:

```
Updating coordinates for 10 atoms in a structure with 26740 total atoms took 0.0320 +/- 0.0022 seconds per frame
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0631 +/- 0.0089 seconds per frame
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 0.7370 +/- 0.1270 seconds per frame
```
follow-up: 15 comment:15 by , 8 years ago
Wouldn't this change make the sequence:

1. set color of all atoms, both displayed and undisplayed
2. display all atoms

not update the displayed color of atoms? That is, the atom colors will be set, but the drawing colors will not?

Conrad
follow-up: 16 comment:16 by , 8 years ago
... and this change to Structure._atom_bounds saves 20-25 ms (from 25-30 ms down to 3 ms) when I have only 100 residues (1,630 atoms) of my 184,550-atom structure displayed:

```python
def _atom_bounds(self):
    if not self._atom_bounds_needs_update:
        return self._cached_atom_bounds
    a = self.atoms
    a = a[a.displays]
    #disp = a.displays
    xyz = a.coords  #[disp]
    radii = a.radii  #[disp]
    from .. import geometry
    b = geometry.sphere_bounds(xyz, radii)
    self._cached_atom_bounds = b
    self._atom_bounds_needs_update = False
    return b
```

That gets me to a framerate of about 8 fps (125 ms per frame) for a simulation where ISOLDE takes about 75 ms per iteration (from which I can probably fairly easily trim 20-30 ms). Good enough for now.
follow-up: 17 comment:17 by , 8 years ago
Just tried that:

1. with all atoms displayed, set atoms to colour by element
2. limit the display to a subset
3. set atoms to colour by chain
4. show all atoms

... and all atoms are coloured by chain, as expected.
comment:18 by , 8 years ago
The case Conrad asks about of coloring undisplayed atoms and later showing those atoms will work with Tristan's code because it updates the drawing sphere colors on SHAPE_CHANGED and that happens when any atom is shown or hidden.
My initial take is that Tristan's changes should not cause any problem, they simply put into the Drawing only the visible atom spheres instead of the current code that includes all atom spheres and simply hides the undisplayed ones. I don't remember why I didn't do it originally as Tristan suggests -- maybe I thought it reduced deleting and creating new OpenGL vertex buffers, but I don't even think that is the case. I'll analyze the code changes more carefully and commit them if it all looks good. My one reservation in my quick look is that it slightly slows down the case of all atoms being displayed, which is a very common case, while it speeds up the case of moving atoms when only a small fraction are visible, a rare case -- basically just ISOLDE. I know Tristan does not see ISOLDE as a rare case, but I mean 95% of the people running at any given instant are probably not using ISOLDE.
follow-up: 19 comment:19 by , 8 years ago
Is it actually that common to have *every* atom displayed, though? In my experience that's really quite rare - one is usually hiding as many atoms as possible to make it easier to focus on and interpret a region of interest.
follow-up: 20 comment:20 by , 8 years ago
I think the most common situation has ribbons shown so backbone atoms and maybe many side chains are hidden. So I agree probably having most atoms hidden is typical.
comment:21 by , 8 years ago
The atom and bond graphics update improvements look reasonable. But there are changes in other places that would be required (picking and x3d) that involve about 3 times as many code changes. Look for all uses of _atoms_drawing and _bonds_drawing to find the places where more changes would be needed.
If you want these changes made I'd like a complete set of code changes from you (or you can wait until it becomes my highest priority, a long wait probably). I'd suggest providing those changes by giving me a new structure.py file (and pbgroup.py, which will also need to follow the bond changes since pseudobonds share code with bonds). It is quite difficult for me to examine your changes made by commenting out code. With a whole new file I can easily see the diffs against the current git version. The typical git way to propose these changes would be via a pull request, but we have not used that yet for ChimeraX, so that approach to providing your changes would likely take more work.
comment:22 by , 8 years ago
I have included your atom bounds improvement. Surprisingly the numpy masking operations coords[disp] and radii[disp] take much longer than forming atoms[disp] which is a masking operation with our Atoms collection class. It never ceases to horrify me how slow basic numpy operations on arrays are.
comment:23 by , 8 years ago
My previous comment mis-analyzed the improvement in atom bounds speed. Our C++ atoms.coords and atoms.radii methods take more time than slicing the numpy arrays. For 3j3q, looking up atom coords took 65 msec and slicing took 60 msec; looking up atom radii took 85 msec while slicing took 2 msec; total bounds time is 270 msec with the original code. Radii lookup is horribly slow because it is recomputing the default radius from elements and number of bonds every time -- ugh. Coordinate slicing I guess is slow because it is a 2D array, while the radii slicing is fast as a 1D array. Coordinate lookup handles alt locs and active coordinate set lookup, but I haven't yet determined why it is so slow.
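The 1D-versus-2D slicing cost is easy to experiment with in plain numpy (a generic demonstration, not ChimeraX code; the array sizes and display fraction are arbitrary):

```python
import numpy
import time

n = 1_000_000
coords = numpy.random.rand(n, 3).astype(numpy.float32)  # shaped like atoms.coords
radii = numpy.random.rand(n).astype(numpy.float32)      # shaped like atoms.radii
mask = numpy.random.rand(n) < 0.01                      # ~1% of atoms displayed

t0 = time.perf_counter()
xyz = coords[mask]          # boolean mask on a 2D array (copies selected rows)
t_2d = time.perf_counter() - t0

t0 = time.perf_counter()
r = radii[mask]             # boolean mask on a 1D array
t_1d = time.perf_counter() - t0

print(xyz.shape, r.shape)   # both keep only the masked entries
```

Comparing `t_2d` and `t_1d` across array shapes gives a feel for how much of the `_atom_bounds` cost is the slicing itself versus the C++ property lookup.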
comment:25 by , 8 years ago
I did some more playing around today with my original idea, which maintains the complete Atom and Bond drawings at all times but only updates the parts that are both visible and have changed since the last time they were drawn.
Short description: in the C++ core Atom and Bond objects I added the following to the headers:
```cpp
private:
    bool _changed = false;
public:
    bool changed() const { return _changed; }
    void set_changed(bool flag) { _changed = flag; }
    void track_change(const std::string& reason) {
        change_tracker()->add_modified(this, reason);
        this->set_changed(true);
    }
```
In the Atom case I also added a bonds_track_change function:
```cpp
void Atom::bonds_track_change(const std::string& reason)
{
    for (std::vector<Bond*>::iterator b = _bonds.begin(); b != _bonds.end(); ++b) {
        (*b)->track_change(reason);
    }
}
```
... so that bonds can be notified when their atoms change in a way that affects their drawing (color, coord, display). After passing the changed() and set_changed() functions up to Python, then in structure._update_graphics() one can do:
```python
import numpy
atoms = self.atoms
avis = atoms.visibles
achanged = atoms._changed
vis_atom_changes = numpy.logical_and(avis, achanged)
ch_atoms = atoms[vis_atom_changes]

bonds = self.bonds
bvis = bonds.showns
bchanged = bonds._changed
vis_bond_changes = numpy.logical_and(bvis, bchanged)
ch_bonds = bonds[vis_bond_changes]
half_bond_change_filter = numpy.concatenate((vis_bond_changes, vis_bond_changes))
```
... and change _update_atom_graphics() and _update_bond_graphics() to work only on the results. To handle the cases where the model itself changes (i.e. atoms are added or deleted), I've included a force_recalc flag which causes a redraw of everything from scratch.
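The update-only-what-changed scheme with a force_recalc escape hatch can be sketched with a toy class (`GraphicsUpdater` is hypothetical, purely to illustrate the control flow described above):

```python
import numpy

class GraphicsUpdater:
    """Toy model of the scheme: rebuild everything when the model itself
    changed (atoms added/deleted), otherwise update only changed entries."""

    def __init__(self, n):
        self.colors = numpy.zeros((n, 4), numpy.uint8)
        self.rebuilds = 0
        self.partial_updates = 0

    def update(self, changed_mask, new_colors, force_recalc=False):
        if force_recalc or len(changed_mask) != len(self.colors):
            # Model changed: redraw everything from scratch.
            self.colors = new_colors.copy()
            self.rebuilds += 1
        else:
            # Only touch the rows that actually changed.
            self.colors[changed_mask] = new_colors[changed_mask]
            self.partial_updates += 1

u = GraphicsUpdater(5)
mask = numpy.array([False, True, False, False, True])
colors = numpy.full((5, 4), 255, numpy.uint8)
u.update(mask, colors)                      # in-place partial update
u.update(mask, colors, force_recalc=True)   # full rebuild
print(u.partial_updates, u.rebuilds)        # 1 1
```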
It's required quite an extensive list of changes to allow the relevant arrays to be changed in place, but the results are quite pleasing. The resulting drawing objects are all identical as far as the rest of ChimeraX is concerned, and the timings are more-or-less on a par with the other approach - significantly faster when all atoms are shown, slightly slower when the display is restricted to smaller numbers of atoms.
For 3j3q, all atoms shown:
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 1.5500 +/- 0.7743 seconds per frame
First 100 residues shown:
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 0.4485 +/- 0.2115 seconds per frame
For my intermediate-sized example, all atoms shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.1233 +/- 0.0319 seconds per frame
First 100 residues shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0605 +/- 0.0109 seconds per frame
It just struck me that a really easy way to get blazingly fast speed when necessary would be to add a function, say, Structure.restrict_drawing(atoms), such that one could do:
```python
def _update_graphics(self, ...):
    if self.restrict_to_subset:
        atoms = self._restrict_atoms
        bonds = self._restrict_bonds
    else:
        atoms = self.atoms
        bonds = self.bonds
```
... and nothing much else would need to change.
Will do a bit more double-checking and try to bundle it all together and send tomorrow.
comment:26 by , 8 years ago
To summarize your optimization ideas presented in previous comments:
1) Have the atom Drawing only contain spheres for the visible atoms, and the bond drawing only cylinders for visible bonds.
2) Same as 1, plus when updating only change coordinates and colors for atoms/bonds that have actually changed.
Optimization 1) can improve speed a lot when only a small fraction of atoms are shown. Optimization 2) can further improve speed when only a small fraction of visible atoms are changed.
For your extreme 3j3q test case (2.4 million atoms), the times for optimization method (2), 1.55 seconds per frame to change 10 atom coordinates when all atoms are shown and 0.45 seconds per frame when 100 residues are shown, seem abysmal. I wonder why it is taking so long. Is it using ambient lighting, with all that time in the lighting recalculation?
follow-up: 27 comment:27 by , 8 years ago
Not quite. Option (2) keeps the entire drawing geometry in memory as per the status quo, but limits recalculation to only atoms/bonds that are both changed and visible, modifying the existing Places and colors arrays by numpy slicing. A fair bit of the time cost just goes into the initial setup. On my laptop (admittedly a little faster than the desktop I was using for the previous timings), for 3j3y:

```
%timeit a = m.atoms
2 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a.visibles
37 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

... and about the same for a._changed.

```
%timeit b = m.bonds
2.07 ms ± 26.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit b.showns
74.8 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

... not sure if this would be the same for b._changed, or if it's closer to the atoms case. But that's already 190-230 ms gone before even starting on the main calculations. It would take quite a lot of re-jigging of the core machinery (having the Atoms, Bonds etc. write these flags to a common array?) to optimise past that.
follow-up: 28 comment:28 by , 8 years ago
Ok. But the option 2 I described cuts the number of atoms down to the visible ones, so a.visibles and b.showns, which take 112 msec for 3j3q with all atoms shown, would take less time. (I don't understand why you said 190-230 msec in your previous comment.) But the 75 msec for b.showns seems ridiculously slow, although I did observe a.coords taking 66 msec when I was testing yesterday, which also seemed very slow; timing subparts did not reveal any clear way to make the C++ faster.

A standard approach in ChimeraX to optimize is to replace the vector numpy operations with C++ code that gets rid of as many temporary numpy arrays (like b.showns, a.visibles) as possible. This is not difficult if the code is amenable, and many examples are in src/atomic/molc.cpp, which is the code that provides Python interfaces to the molecule C++ using ctypes.

Thanks for investigating the optimization possibilities. I liked your idea 1) to have the Drawing just represent the visible objects -- very sensible. That combined with some C++ to optimize the slow update steps could make it very fast.
follow-up: 29 comment:29 by , 8 years ago
There's also the new a._changed and b._changed, with timings about the same as a.visibles. I think these make sense - organization becomes so much easier when the objects themselves know when they need updating. With a little tweaking the ._changed call could be run on only the visible atoms (or vice versa) - but you're right, further optimization is probably better at the lower level. Will play some more - I'm enjoying learning about how the innards work!
follow-up: 30 comment:30 by , 8 years ago
Slightly off-topic, but still on the subject of large structures: when I have 3j3y loaded, then:

```
a = m.atoms
a.disp<tab>
```

... hangs for almost a minute before autocompleting.
follow-up: 31 comment:31 by , 8 years ago
The shell tab completion evaluates every property matching the prefix -- I guess it wants to know the values for some reason. This is reckless since some properties may require lots of computation. I consider this a defect of the shell. It should not be evaluating properties in order to autocomplete them by name. Feel free to investigate whether the shell (qtconsole.rich_jupyter_widget) has options to disable evaluating all properties during autocomplete.
comment:32 by , 8 years ago
OK, serious progress now.
What I've done:
- Added functions to get/set flags called visible_atoms_changed and visible_bonds_changed to Structure.cpp, and made every Atom and Bond function that can affect these set the relevant flag to True. They're reset back to False by each run of structure._update_graphics_if_needed().
- Added structure_visible_bonds() and structure_visible_atoms() functions that get arrays of visible bonds and atoms directly without having to filter down from the list of all atoms. These take 1.3 and 5.1ms respectively for my 185k atom case - to speed things up further structure.py now caches these, using the visible_xxx_changed flag to decide if it needs to update the cache.
- structure.py draws only visible atoms and bonds. If the number of either changes, the relevant drawing is redrawn from scratch, otherwise the existing drawing is updated from only the changed atoms.
- other places within structure.py where only visible atoms are relevant (e.g. _atom_bounds()) have been changed to use the fast/cached visible_atoms approach.
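The caching scheme in these bullets can be sketched with a toy class (`VisibleAtomCache` is hypothetical, not the real API; the real dirty flag is set from C++):

```python
class VisibleAtomCache:
    """Toy version of the caching scheme: the expensive visible-atom
    lookup is recomputed only when a dirty flag has been set."""

    def __init__(self, atoms):
        self._atoms = atoms              # list of (name, shown) pairs
        self._visible_changed = True     # stands in for visible_atoms_changed
        self._cached = None
        self.recomputes = 0

    def set_visible(self, i, shown):
        self._atoms[i] = (self._atoms[i][0], shown)
        self._visible_changed = True     # invalidate the cache

    @property
    def visible_atoms(self):
        if self._visible_changed:
            self._cached = [n for n, shown in self._atoms if shown]
            self._visible_changed = False
            self.recomputes += 1
        return self._cached

c = VisibleAtomCache([('N', True), ('CA', False), ('C', True)])
print(c.visible_atoms)   # ['N', 'C']
print(c.visible_atoms)   # served from cache, no recompute
c.set_visible(1, True)
print(c.visible_atoms)   # ['N', 'CA', 'C'] after invalidation
print(c.recomputes)      # 2
```

The payoff is that repeated consumers of the visible set (graphics update, bounds, picking) all share one recomputation per change.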
The upshot:
185k atoms, all atoms shown, sticks, simple lighting:

```
MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.1206 +/- 0.0243 seconds per frame
%timeit m._update_graphics()
> 11 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

185k atoms, first 100 residues shown, sticks, simple lighting:

```
MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0285 +/- 0.0108 seconds per frame
%timeit m._update_graphics()
> 1.58 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

2.44M atoms, all atoms shown, sticks, simple lighting:

```
MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 1.4435 +/- 0.2139 seconds per frame
%timeit m._update_graphics()
103 ms ± 3.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

... of which it should be possible to make about half go away with some tweaking of _update_ribbon_graphics() to exit early when no ribbons are displayed:

```
%timeit m._update_ribbon_graphics()
> 43.7 ms ± 7.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

2.44M atoms, first 100 residues shown, sticks, simple lighting:

```
MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 0.0694 +/- 0.0175 seconds per frame
%timeit m._update_graphics()
> 6.1 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

... of which about 5 ms is _update_ribbon_graphics().
So there's still a little bit of overhead somewhere that's being triggered by coordinate updates but doesn't appear to be part of the graphics loop (70ms for the 2.44M atom structure vs 28.5ms for the 185k with comparable numbers of atoms displayed), but it will probably be relatively hard to track down (the timing is consistent with 1-2 calls to structure.atoms.<property>). But then, it's getting pretty academic at this point - once your structure gets to this sort of size, you'd be crazy *not* to break it down into manageable chunks unless you absolutely have to work with the whole thing at once. I'll attach the diff in a follow-up.
comment:33 by , 8 years ago
... oh, and as far as I can tell picking still seems to work just fine, without me having done much of anything to it. Rather amazing, considering.
comment:34 by , 8 years ago
Spoke too soon. Selection highlighting *isn't* working properly. Could have sworn it was...
comment:35 by , 8 years ago
OK, that should be better. For the picking, all I had to do was replace self.atoms with self.visible_atoms and self.bonds with self.visible_bonds, and all now looks normal. Also squashed another bug by adding the rather interesting line:
p.positions = p.positions
... so the atoms and bonds drawings know that their positions arrays have changed. The bug was unmasked when I changed the colour and selection handling code to run only under the control of self._COLOR_CHANGE and self._SELECT_CHANGE respectively (previously they would also run on self._SHAPE_CHANGE, but I believe that's now unnecessary).
One thing that will need to be checked (and should be easy to fix if it doesn't already work) is what happens when atoms are added to or removed from an already-displayed structure. As long as the change sets _visible_atoms_changed and _visible_bonds_changed (where applicable) to True, it should just work.
comment:36 by , 8 years ago
Hmm... Not sure why it should have changed overnight, but this morning when I run the 2.44M atom test with only the first 100 residues shown, it times reproducibly at 30-40ms per frame, almost indistinguishable from the 185k atom case.
comment:38 by , 8 years ago
I looked over your code changes, and I think they are an interesting direction that will greatly improve performance. But they add complexity, very likely add bugs, and they are in code that Eric (C++) and I (Python) maintain, so we don't want more complexity than necessary. We want the ideal world of maximum speed without too much complexity, so I have some questions.
1) Why aren't you using the current C++ change tracker, which keeps track of every changed atom? Instead you add _changed attributes to atoms and bonds and lots of new methods and code to set those new attributes. It doesn't seem like any of that is needed, and Eric and I (granted, without going deeply into it) think the current C++ code can do what you are doing (updating graphics of only changed atoms) with no modifications.
2) Do we need to update only the changed atoms for the graphics to be fast? Your test case of ~200,000 atoms while moving just 10 atoms interactively with the mouse (say, one side chain) seems reasonable. Can we make that fast without the added complexity of updating only changed atoms -- in other words, pretend all the atoms moved and make it run as fast as it can, for instance by moving a few time-critical numpy Python code segments to C++? Having to recompute all the bond cylinder 4x4 matrices for 200,000 atoms might be the bottleneck.
I think we should first make changes that add no complexity and see what the best performance is that we can get. You were on that track initially, just changing the Drawing to contain only the visible atoms instead of containing all atoms plus a mask (Drawing.displayed_positions) of which are visible. I think that approach is no more complex than the current code. I know you tried it and timed it, but I don't think you made every effort to retain that simple approach and make it run fast (by, say, adding a few routines in C++). I think we should start with that change. If after our best effort on that we want it faster, then we can pursue updating only changed atoms/bonds using the simplest code possible. Does this sound like a good plan?
We are soon to release alpha 2 (in 1 or 2 weeks), and I think we probably won't put these changes in until after that, because critical code changes demand weeks of testing to ensure they are stable, and that would delay our release.
follow-up: 37 comment:39 by , 8 years ago
The short answer is that keeping everything within (or even close to) a 10 ms footprint to maintain 100 fps is quite a task, when getting the coordinates of 100k atoms into Python already takes about 7.5 ms. Looking at it this morning, I see I can do quite a bit more within the change tracker framework (and will try to revise to do so). Adding a _VISIBLE_ATOMS_CHANGE flag to the bit mask should be straightforward. But I still think there's a lot of value in having the atoms and bonds record directly when they've changed. Assuming for the time being the strategy is to limit geometry recalculation to the changed atoms, then I need the indices of the changed atoms in the array of visible atoms a. If each atom knows it's changed, then this is a trivial lookup:

changed_mask = a._changed
changed_atoms = a[changed_mask]
# about 400 microseconds if we have 50k visible atoms

On the other hand, the change manager just gives a list of all atoms (across all structures) that have changed. To get that mask, you'd have to do:

changed_atoms = self.session.change_tracker.changes['Atom'].modified
# about 330 microseconds
structure_changed_atoms = changed_atoms.filter(changed_atoms.structures == self)
# variable - 50 us for 1000 changed atoms, 435 us for 50000
changed_indices = self.visible_atoms.indices(structure_changed_atoms)
# prohibitive - already 24 ms when 1000 out of 50k visible atoms have changed

... and similar timing for bonds. So for a quite modest increase in complexity, you get a very fast way to limit all following calculations to only what's necessary.

I think so, yes - unless just about *everything* is moved to C. It's not necessarily just the geometry calculations, it's the look-ups.
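The difference between the two lookup paths can be sketched with plain numpy (illustrative stand-in arrays, not the ChimeraX Atoms API):

```python
import numpy

n_visible = 50000
visible_ids = numpy.arange(n_visible)     # stand-in for the visible atoms array

# Per-atom flag: the changed subset is a single boolean-mask index.
changed_mask = numpy.zeros(n_visible, bool)
changed_mask[100:110] = True
changed_ids = visible_ids[changed_mask]   # one vectorised operation

# Global change list: the reported atoms must first be located within the
# visible array - the step that becomes prohibitive for large structures.
reported = numpy.arange(100, 110)         # ids reported by a change tracker
located = numpy.searchsorted(visible_ids, reported)
```

The mask lookup is O(n) with a tiny constant, while the locate-in-collection step is where the 24 ms figure above comes from.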
If I have 3j3y loaded as Model m:

# Best-case scenario: all atoms sequential in memory
a = m.atoms[0:100000]
coords = a.coords  # 2.77 ms

# More realistic scenario: atoms scattered throughout structure
import numpy
indices = numpy.random.randint(0, 2000000, 100000)
indices.sort()
a = m.atoms[indices]
coords = a.coords
# 7 ms - most of the budget blown already, and there are three similarly-sized
# coordinate lookups in _update_atom_graphics() and _update_bond_graphics()

A more complete example, working through _update_bond_graphics. For 100k bonds b:

ba1, ba2 = b.atoms
# 1-4 ms, depending on arrangement in memory. Following timings are based
# on a random selection of bonds.
coords1 = ba1.coords  # 7.4 ms
coords2 = ba2.coords  # 7.4 ms
radii = b.radii  # 1.1 ms
positions = _halfbond_cylinder_placements(coords1, coords2, radii)  # 4.9 ms
# ... so 20-25 ms total.

On the other hand, using a random mask (50k true) to define "changed" bonds:

ch_bonds = b[mask]  # 550 us

... while the rest now takes 11 ms. Better, but still over budget. I guess the obvious thing here would be to define a new _halfbond_cylinder_placements(bonds, mask, result_array) which does everything in C, replacing result_array[i] when mask[i] is true - which wouldn't be too difficult, by the look of things, and would probably get a few-fold improvement.

Then we get to ribbons: the obvious easy thing to do for now would be to define a num_ribbons_visible property of Structure analogous to num_atoms_visible, and have the ribbon code exit immediately if it returns 0. In most cases with structures of non-insane size it's not particularly problematic, but for 3j3y with no ribbons shown _update_ribbon_graphics() takes almost 100 ms.
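For reference, the per-bond 4x4 placement that a routine like _halfbond_cylinder_placements has to compute looks roughly like this. This is a numpy sketch under the assumption of a unit cylinder of height 1 along z, centred at the origin; the function name and frame construction are illustrative, not the ChimeraX code:

```python
import numpy

def cylinder_placement(p0, p1, radius):
    '''4x4 matrix scaling/rotating/translating a unit z-axis cylinder
    (height 1, centred at the origin) onto the segment p0 -> p1.'''
    v = p1 - p0
    length = numpy.linalg.norm(v)
    z = v / length
    # Any reference vector not parallel to z gives an orthonormal frame.
    ref = (numpy.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9
           else numpy.array([0.0, 1.0, 0.0]))
    x = numpy.cross(ref, z)
    x /= numpy.linalg.norm(x)
    y = numpy.cross(z, x)
    m = numpy.eye(4)
    m[:3, 0] = x * radius      # scale cylinder cross-section by the radius
    m[:3, 1] = y * radius
    m[:3, 2] = z * length      # stretch unit height to the bond length
    m[:3, 3] = (p0 + p1) / 2   # move cylinder centre to the bond midpoint
    return m
```

Doing this per bond in Python-level numpy creates several temporary arrays, which is why batching the whole computation (and the mask test) in one C loop pays off.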
by , 8 years ago
Attachment: changes_take_3.diff added
Updated to make better use of the ChangeManager framework, further speed improvements.
comment:40 by , 8 years ago
OK, take 3:
- added two new flags to GraphicsContainer::ChangeType:

ATOM_VIS_CHANGE = (1 << 4),
BOND_VIS_CHANGE = (1 << 5)

and updated everything accordingly, so that Structure strictly uses the ChangeTracker framework to check if atom or bond visibility has changed.
- renamed the _changed flag to _changed_since_last_drawn (and similarly changed the names of all associated functions) in Atom.h and Bond.h to better reflect its intent. Removed it from higher-level objects (Residue and Structure). There is perhaps some argument to be made for adding a similar flag to Residue (_any_ribbon_atom_changed?) to make it easy to limit ribbon redrawing to only where necessary, but I haven't touched that.
- added some more caching to structure.py to minimise unnecessary lookups per GUI update. The full list is now:

self._cached_num_atoms_visible = self._num_atoms_visible
self._cached_visible_atoms = self._visible_atoms
self._cached_num_bonds_visible = self._num_bonds_visible
self._cached_visible_bonds = self._visible_bonds
self._cached_visible_bond_atoms = self.visible_bonds.atoms

... and the actual get functions look like:

@property
def num_atoms_visible(self):
    '''
    Number of atoms currently visible in this structure.
    Cached for performance.
    '''
    if self._graphics_changed & self._VISIBLE_ATOMS_CHANGE:
        self._cached_num_atoms_visible = self._num_atoms_visible
    return self._cached_num_atoms_visible
- Added a trivial function, positions_changed(), to drawing.py to let it know when its positions object has changed internally. Avoids using the rather strange-looking p.positions = p.positions.
- Added the line:

self.atoms.radii = self.atoms.default_radii

as per #789.
- Added the following to the head of Structure._update_ribbon_graphics() and Structure._update_ribbon_tethers():

if not self.ribbon_display_count:
    return
With all that done, it looks to me like Structure._update_graphics() won't be the bottleneck in most real-world cases. Some timings (all in the 184k atom structure, stick display, no ribbons, simple lighting):
def time_update_graphics(m, indices):
    m.atoms[indices].coords = numpy.random.rand(len(indices), 3) * 50
    m._update_graphics_if_needed()

# 1000 atoms displayed, 10 atoms moving:
%timeit time_update_graphics(m, indices)
> 1.2 ms ± 83.3 µs per loop
MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0174 +/- 0.0041 seconds per frame

# 10,000 atoms displayed, 1000 atoms moving:
%timeit time_update_graphics(m, indices)
> 1.89 ms ± 157 µs per loop
MoveTimer(session, 50, indices, m)
> Updating coordinates for 1000 atoms in a structure with 184550 total atoms took 0.0337 +/- 0.0057 seconds per frame

# 100,000 atoms displayed, 100 atoms moving:
%timeit time_update_graphics(m, indices)
> 3.78 ms ± 81.1 µs per loop
MoveTimer(session, 50, indices, m)
> Updating coordinates for 100 atoms in a structure with 184550 total atoms took 0.0982 +/- 0.0177 seconds per frame
About 10ms of that last case is the bounds calculation, so it's probably not the biggest bottleneck.
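The bounds step itself is just a per-axis min/max reduction over the sphere extents; in numpy terms (a sketch, not the ChimeraX implementation):

```python
import numpy

def atom_bounds(coords, radii):
    '''Axis-aligned bounds of a set of spheres: per-axis minimum of
    centre - radius and maximum of centre + radius.'''
    r = radii[:, None]   # broadcast each radius over x, y, z
    return (coords - r).min(axis=0), (coords + r).max(axis=0)
```

For 100k atoms this builds two temporary (n, 3) arrays, which is part of why a single C++ loop (shown in a later comment) wins.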
All existing functionality appears to be working as expected. This includes:
- mouse picking (ctrl-click and ctrl-drag) select and highlight the expected atoms.
- "promotion/demotion" of selections via the up- and down-arrows correctly update the highlighting
- switching between ball, stick and ball-and-stick render modes
- changing atom colours
- hiding/displaying any selection
follow-up: 39 comment:41 by , 8 years ago
I'm going to look at doing the optimization with no C++ Atom/Bond changes, since the change tracker already reports which atoms changed, and there is already C++ code to find the indices of those atoms in another atom collection (the visible atoms).
The "p.positions = p.positions" obviously indicates a problem. The problem is that the Places() object is immutable, and you added methods set_shift_and_scale() and set_opengl_matrices() to change it. With that change any code that uses Places() will have to consider the possibility that they have changed. There is no change notification currently. I'm not excited about this change.
Setting the radius to the default radius changes the behavior in that editing the structure no longer updates the radius (which depends on bond connectivity).
Sorry to be slowing down your efforts! New cached attributes added to optimize things are a source of very hard to debug problems caused by cached values not being updated when they should be. Also, new redundant APIs like atoms._changed start looking very mysterious a year later when we forget they were an optimization. My first goal in optimizing is to change as little as possible in terms of APIs, caching and complexity. Maybe your design is the simplest. But if I had made it myself I wouldn't accept it as a reasonable speed/complexity tradeoff. But I see you need more speed, so I'll put aside light microscopy work for now and run some tests.
comment:42 by , 8 years ago
Quick answer: it's actually atoms.display_radii that does the calculation you refer to (I believe this is what should actually be getting called in _atom_bounds(), and I changed the code today accordingly). At present, atoms.radii just directly calls the C++ Atom::radius() function, which returns the stored value if nonzero, or calculates and returns Atom::default_radius() otherwise. The reason it's currently so slow is that nothing ever sets a nonzero value, so default_radius() gets called every time.
comment:43 by , 8 years ago
Atom::radius() is behaving as designed: the radius attribute is supposed to be zero unless manually overridden, so that the atom radius reflects the current bonds made by the atom. More sophisticated code that only recomputes the radius when bond patterns or element type change is possible, but has not been done yet. I think I used Atom::radius() in the bounding box calculation, even though it isn't quite right, for a little better performance.
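The fallback being described is the classic "zero means unset" pattern; a minimal sketch (not the real Atom class, and with the default passed in rather than computed from bonds):

```python
class Atom:
    '''Sketch of the behaviour described above: a stored radius of zero
    means "not manually overridden", so the computed default is returned.'''
    def __init__(self, default_radius):
        self._radius = 0.0                 # zero = use the default
        self._default_radius = default_radius

    @property
    def radius(self):
        # Fall back to the (potentially expensive) default calculation
        # whenever no explicit radius has been set.
        return self._radius if self._radius > 0 else self._default_radius

    @radius.setter
    def radius(self, r):
        self._radius = r
```

This is why the earlier `self.atoms.radii = self.atoms.default_radii` line speeds things up: it replaces the repeated default calculation with a stored value, at the cost of the radius no longer tracking bond-connectivity changes.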
comment:44 by , 8 years ago
I tried some simple optimizations that sped up the graphics update in stick mode with all-atom display of 240,000 atoms (PDB 4v6x) from 120 msec to 55 msec per frame. This factor-of-2 speed-up should apply whether all atoms are shown or just a few. A lot of time was spent computing bond cylinder positions using a mix of numpy and C++; I put it all in C++ (about 30 lines), getting rid of the numpy temporary arrays of coordinates and radii. Also, I added a few more reasons to the graphics change bit field (add/delete and display) so that graphics updates of Drawing arrays are not done unless needed.
It looks to me like the whole C++ graphics change tracking (the GraphicsContainer class) is obsolete. It was added before TrackChanges was implemented, and I think TrackChanges Reasons can probably completely replace the graphics change tracking, simplifying the code. Needs a closer look.
I have not pushed these changes since tomorrow we plan to make an alpha2 release.
I still think the graphics code should be revised to make spheres/cylinders for only the visible atoms and bonds, instead of for all atoms and bonds with an additional display mask as currently done. This will be faster even in the case where all atoms and bonds are shown, I think, because OpenGL won't need to handle the mask.
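The two strategies can be sketched with stand-in arrays (illustrative only, not the Drawing API):

```python
import numpy

n = 8
positions = numpy.random.rand(n, 3).astype(numpy.float32)
shown = numpy.array([True, False, True, True, False, False, True, False])

# Current approach: one instance per atom plus a display mask; every draw
# carries all n instances and the mask, even when most atoms are hidden.
masked_instances = n

# Proposed approach: rebuild a compact positions array so that only the
# visible instances ever reach OpenGL.
compact_positions = positions[shown]
compact_instances = len(compact_positions)
```

The trade-off is that the compact array must be rebuilt whenever visibility changes, but each draw call then handles only `compact_instances` instances with no mask test.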
Whether we need to attempt updating of just the individual changed atoms and bonds to get more speed should be evaluated after the simpler optimizations are in and tested.
follow-up: 42 comment:45 by , 8 years ago
With regard to the positions and changing the arrays in place: fair enough. If I instead do, for example:

ssa = p.positions.shift_and_scale_array()
ssa[ch_filter] = xyzr
p.positions = Places(shift_and_scale=ssa)

... then the cost is quite minimal and those extra functions become obsolete.

In terms of whether or not to limit redrawing to changed atoms at all, it's going to depend a lot on the precise situation. A fairly typical situation for me would have 1000 atoms mobile and 10,000 visible. With my current approach (of course, without your optimisation to bond drawing), redrawing just the mobile 1,000 vs all 10,000 works out to 19 vs 34 ms per frame.

Radii: I see. So you'd need a callback to update the radius when an atom changes its bond(s) or an ion changes its pseudobond(s).

In terms of caching, I think it will make sense to cache at least the visible atoms and bonds, since fetching these becomes a serious time-sink for larger structures - and the cost seems to be mostly in the actual C++ loops rather than in Python. For the HIV capsid:

%timeit a = m.atoms
%timeit vis = a.visibles
%timeit a[vis]
# if no atoms visible
# if all atoms visible
# 35~45 ms total

If I instead define the following in molc.cpp:

extern "C" EXPORT void structure_visible_atoms(void *mols, size_t n, pyobject_t *atoms)
{
    Structure **m = static_cast<Structure **>(mols);
    try {
        for (size_t i = 0; i < n; ++i) {
            const Structure::Atoms &a = m[i]->atoms();
            for (size_t j = 0, size = a.size(); j < size; ++j) {
                if (a[j]->visible())
                    *atoms++ = a[j];
            }
        }
    } catch (...) {
        molc_error();
    }
}

... and expose it to Python via molobject.py as:

_visible_atoms = c_property('structure_visible_atoms', cptr, 'num_atoms_visible',
                            astype = _atoms, read_only = True)
''':class:`.Atoms` containing only the visible atoms of the structure. Read only.'''

...
and make sure num_atoms_visible is cached to take its timing out of the equation, then:

m.atoms.displays = 0
# with m.num_atoms_visible == 0, m._visible_atoms will return a
# zero-length Atoms array so the Python overhead is minimal.
%timeit m._visible_atoms

That works out at just 10 ns per atom, but it's still not fast enough - unless you want to take the next step and go parallel, for any really large structure just this one loop through the Atoms array will put a significant dent in any animation framerate if it has to happen every time something changes. And this is by no means the largest structure out there - when this HIV capsid work was being done, the fitting simulation was run in a box of explicit solvent with a grand total of 64 million atoms, and I've heard tell of work with 100-million-plus atom assemblies. Especially with the rapid development of high-resolution cryoelectron tomography, experiments at this scale are going to become more commonplace, so I think it's worth considering strategies with that in mind.

All that being said, I do understand where you're coming from with respect to cached values. But at the same time, I think one would have to try quite hard to break this caching approach. With the code as written, the moment the change manager registers a change in visibility, all calls revert to using the full lookup functions until the relevant change bit is reset - which is only ever done as a final step in Structure._update_graphics_if_needed().
follow-up: 43 comment:46 by , 8 years ago
Argh! Re-sending the timings, because the server helpfully edited out all my lines beginning with ">"...

%timeit a = m.atoms
4.24 ms ± 396 µs per loop
%timeit vis = a.visibles
31 ms ± 1.36 ms per loop
%timeit a[vis]
# if no atoms visible
174 µs ± 17.3 µs per loop
# if all atoms visible
9.48 ms ± 1.93 ms per loop
# 35~45 ms total

m.atoms.displays = 0
# with m.num_atoms_visible == 0, m._visible_atoms will return a
# zero-length Atoms array so the Python overhead is minimal.
%timeit m._visible_atoms
24.5 ms ± 916 µs per loop
comment:47 by , 8 years ago
A little more tinkering today - as much as a learning exercise as anything else. A C++-only atom bounds calculation. For what it's worth, the speed gain isn't enormous if the radii are pre-calculated - but it's not negligible either.
molc.cpp:
extern "C" EXPORT void atom_bounds(void *atoms, size_t n, float32_t *minmax) { Atom **a = static_cast<Atom **>(atoms); try { float32_t x, y, z, xmin, xmax, ymin, ymax, zmin, zmax, r; float32_t x_minus_r, y_minus_r, z_minus_r, x_plus_r, y_plus_r, z_plus_r; Coord c; Atom* this_a = a[0]; r = this_a->radius(); c = this_a->coord(); x = c[0]; y = c[1]; z = c[2]; xmin=x-r; ymin=y-r; zmin=z-r; xmax=x+r; ymax=y+r; zmax=z+r; for (size_t j=1; j<n ; ++j ) { this_a = a[j]; r = this_a->radius(); c = this_a->coord(); x=c[0]; y=c[1]; z=c[2]; x_minus_r = x-r; if (x_minus_r < xmin) { xmin = x_minus_r; } else { x_plus_r = x+r; if (x_plus_r > xmax) xmax = x_plus_r; } y_minus_r = y-r; if (y_minus_r < ymin) { ymin = y_minus_r; } else { y_plus_r = y+r; if (y_plus_r > ymax) ymax = y_plus_r; } z_minus_r = z-r; if (z_minus_r < zmin) { zmin = z_minus_r; } else { z_plus_r = z+r; if (z_plus_r > zmax) zmax = z_plus_r; } } minmax[0] = xmin; minmax[1] = ymin; minmax[2] = zmin; minmax[3] = xmax; minmax[4] = ymax; minmax[5] = zmax; } catch (...) { molc_error(); } }
molarray.py, in class Atoms:
@property
def _bounds(self):
    '''Atom bounds in scene'''
    bounds = empty([2,3], float32)
    f = c_function('atom_bounds',
                   args = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p])
    f(self._c_pointers, len(self), pointer(bounds))
    from ..geometry.bounds import Bounds
    return Bounds(*bounds)
Note: in the below tests Structure._atom_bounds uses the cached self.visible_atoms array (all atoms are visible).
def time_atom_bounds(model):
    model._atom_bounds_needs_update = True
    return model._atom_bounds()
For 184k atoms, no radii pre-set:
%timeit b = time_atom_bounds(m)
21.1 ms ± 32.8 µs per loop
%timeit b = m.visible_atoms._bounds
12.6 ms ± 161 µs per loop
#Setting the radii to constant values
m.atoms.radii = m.atoms.default_radii
%timeit b = time_atom_bounds(m)
11.2 ms ± 1.24 ms per loop
%timeit m.visible_atoms._bounds
9.77 ms ± 49.4 µs per loop
# And of course, making sure the result is the same:
bounds_a = m._atom_bounds()
bounds_b = m.visible_atoms._bounds
bounds_a.xyz_min - bounds_b.xyz_min
array([ 0., 0., 0.], dtype=float32)
bounds_a.xyz_max - bounds_b.xyz_max
array([ 0., 0., 0.], dtype=float32)
follow-up: 45 comment:48 by , 8 years ago
A factor of almost 2 is nice. This is the kind of optimization I like to make — simple and clear code in C++ that replaces slow numpy vector calculations — and it makes the Python even more readable.
follow-up: 46 comment:49 by , 8 years ago
I have checked in optimized bond cylinder position calculation code, improved atom/bond graphics update detection (to not update colors, selection or positions unless necessary), and changed the drawing code to only create spheres and cylinders for the visible atoms and bonds. These changes increased rendering speed from 4 frames/sec to 10 frames/sec when changing atom coordinates for 240,000 atoms shown as sticks with simple lighting (PDB 4v6x).
My current timings for the above test case indicate about 50 msec/frame spent updating the position arrays when moving atoms, and about 50 msec/frame from other sources, like the OpenGL buffer updates that happen during the draw() call.
More could be done to optimize rendering speed while moving atoms. But I spent about a full day and a half on this so far which I think is probably more than it merits given the many other priorities (e.g. getting funding). If you would like to go further, please look first towards optimizations that do not increase the complexity of the code.
follow-up: 48 comment:51 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
As described in the previous comments, some changes allowed faster rendering when atoms are moved. If further optimizations are needed, the first efforts should try to optimize without adding additional logic complexity (e.g. no extra cached state). More can be done within those bounds, and that should be the first approach before more complex solutions are tried.