Opened 8 years ago

Closed 8 years ago

#778 closed defect (fixed)

changing coordinates very slow for large structures

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: major Milestone:
Component: Graphics Version:
Keywords: Cc: Tristan Croll, pett@…
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

It would seem that in large structures (i.e. over 100k atoms), moving any atom(s) triggers something rather expensive. It's making ISOLDE almost unusably slow in large constructs, even when the fraction I'm simulating is tiny. As a simple demo:

import numpy
from time import time
class MoveTimer:
    
    def __init__(self, session, cycles, indices, model):
        self.start_time = time()
        self.counter = 0
        self.max_cycles = cycles
        self.session = session
        self.atoms = model.atoms[indices]
        self.model = model
        self.length = len(self.atoms)
        self.times = numpy.empty(cycles)
        self.handler = session.triggers.add_handler('new frame', self.time_change_coords)
    def time_change_coords(self,*_):
        if self.counter >= self.max_cycles:
            print('Updating coordinates for {} atoms in a structure with {} total atoms took {:0.4f} +/- {:0.4f} seconds per frame'
                    .format(self.length, len(self.model.atoms), self.times.mean(), self.times.std()))
            self.session.triggers.remove_handler(self.handler)
        else:
            self.times[self.counter] = time() - self.start_time
            self.start_time = time()
            targets = numpy.random.rand(self.length, 3)*50
            self.atoms.coords = targets
            self.counter += 1


indices = numpy.array([i for i in range(100,110)])
m0 = {model with 184,550 atoms, displayed in stick representation with simple lighting}
m1 = {model with 5133 atoms, identical representation}
MoveTimer(session, 50, indices, m0)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.2716 +/- 0.0444 seconds per frame

MoveTimer(session, 50, indices, m1)
> Updating coordinates for 10 atoms in a structure with 5133 total atoms took 0.0356 +/- 0.0095 seconds per frame

indices = numpy.array([10], numpy.int)
MoveTimer(session, 50, indices, m0)
Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.2829 +/- 0.0444 seconds

MoveTimer(session, 50, indices, m1)
Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0343 +/- 0.0084 seconds

If I hide everything but the single mobile atom:

MoveTimer(session, 50, indices, m0)
> Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.1864 +/- 0.0460 seconds per frame

MoveTimer(session, 50, indices, m0)
> Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0310 +/- 0.0046 seconds per frame

In fact, even if all atoms are hidden:

Updating coordinates for 1 atoms in a structure with 184550 total atoms took 0.1346 +/- 0.0383 seconds per frame
Updating coordinates for 1 atoms in a structure with 5133 total atoms took 0.0283 +/- 0.0074 seconds per frame

Attachments (3)

changes.diff (39.5 KB ) - added by Tristan Croll 8 years ago.
Optimisations to atom redrawing
changes_take_2.diff (42.6 KB ) - added by Tristan Croll 8 years ago.
Corrected a few bugs in the previous diff
changes_take_3.diff (45.9 KB ) - added by Tristan Croll 8 years ago.
Updated to make better use of the ChangeManager framework, further speed improvements.

Download all attachments as: .zip

Change History (54)

in reply to:  1 ; comment:1 by tic20@…, 8 years ago

OK, I can see why this is happening: in the current implementation of 
structure.py, changes are only detected at the structure level, and 
trigger a redraw of the whole structure even if the change is only in a 
single atom. The change_handler does keep track of which atoms have 
changed, but doesn't group them by structure - so finding the subset of 
atoms that have changed within the structure is going to be slow in 
Python. I think the elegant thing to do would be to have each molobject 
have a boolean property "_changed" and pass this up through the 
corresponding molarray. Then within structure.py, you could do:

atoms = self.atoms
changed_atoms = atoms._changed
changed_bonds = bonds._changed
changed_residues = residues._changed

... and limit the geometry recalculation to only these, then set the 
flags back to false when done. It's about time I started learning to 
find my way around in the ChimeraX core, so I'd like to take a stab at 
this if that's OK.

On 2017-08-04 11:58, ChimeraX wrote:

in reply to:  2 ; comment:2 by pett, 8 years ago

Are ribbons being shown?  Ribbon recalculations are very slow.

—Eric


in reply to:  3 ; comment:3 by goddard@…, 8 years ago

Eric implemented the structure change tracking.  Assign the ticket to him and put me on the cc list.  Of course we would love if you isolate the problem and fix it!  We are about to make a second alpha release though (next week) so don't push changes to the core without discussing first.

in reply to:  4 ; comment:4 by tic20@…, 8 years ago

Nope. Just bonds - but because any change triggers a complete 
recalculation of atom spheres and bond geometry regardless of their 
display state or whether they've moved, it starts to really crawl for 
big structures. I'm working on a fix, as long as you're ok with it... my 
first real foray into your C++ API - nice! The basic idea is:

For each object which tracks its changes in ChangeTracker, add a boolean 
_changed private flag, and changed()/set_changed(bool flag) functions.

Then, when an object calls:
structure()->change_tracker()->add_modified(this, 
ChangeTracker::REASON_ALT_LOC);

ChangeTracker::add_modified in turn writes back to the object:
ptr->set_changed(true)

... and then once changed() and set_changed() are exposed to python 
through molc.py, it should be trivial to limit structure.py to only 
redraw the parts that have actually changed, and then reset the flags 
for the next graphics iteration.

Any thoughts?


On 2017-08-04 17:07, Eric Pettersen wrote:

comment:5 by Tristan Croll, 8 years ago

Cc: Tristan Croll added
Owner: changed from Tom Goddard to eric

I don't seem to have the ability to cc in Tom - the form only gives me the ability to add myself to the cc list. But reassigning to Eric as requested.

in reply to:  6 ; comment:6 by pett, 8 years ago

Couldn't you just get the atoms that changed for this structure (in Python) with:

ct = session.change_tracker
changes = Changes(ct.changes)
changed_atoms = changes.modified_atoms()
structure_atoms = changed_atoms.filter(changed_atoms.structures == struct)

—Eric


in reply to:  7 ; comment:7 by goddard@…, 8 years ago

I don't think it will take a tenth of a second to update the graphics for 100000 atoms and bonds. You need to know more precisely where the time is being used.  It would be complex to update just the changed atoms and bonds in the graphics arrays.  Did ChimeraX from 3 months ago have this problem?

in reply to:  8 ; comment:8 by tic20@…, 8 years ago

Hmm... I suppose that *would* be much easier. Sometimes the obvious answer is staring you right in the face. :)

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 

in reply to:  9 ; comment:9 by tic20@…, 8 years ago

That I couldn't tell you, to be quite honest. I've only started playing with larger structures in ISOLDE over the past month or two. Will try to dig into it a bit more.

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 

in reply to:  10 ; comment:10 by tic20@…, 8 years ago

Actually, thinking about it a little more there's still a good reason to do it my way. To update only those entities that have changed you need their indices - and changed_atoms.indices(atoms) takes about 10ms for 200k atoms all by itself. Judging by the performance of other molarray functions, a hypothetical call to atoms[changed] should be an order of magnitude faster.

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 

comment:11 by Tom Goddard, 8 years ago

Cc: pett@… added
Component: UnassignedGraphics
Owner: changed from eric to Tom Goddard

For a speed optimization problem you have to first find where the time is going. It doesn't make sense to propose a solution until you know that. Your timing code appears to time the entire delay between two redraws (including graphics vertical sync probably at 60 frames / second), and who knows what all code is listening to the atom changes, most of the time may not even be in the graphics.
If you want to reduce 100 ms to 10 ms but there are 5 parts of the code each taking 2 ms it could be a huge job to improve most of those to reduce the time.

So here is one way to get a better idea where the time is going. First guess. Maybe it is computing and updating the bond positions (this produces a new 4x4 matrix for all bonds when any atom moves). So go into structure.py and put a "return" in to prevent the bond positions from updating. Run your timing test and see how much faster it is. In general, disable optional code (like updating bond positions) and see how the time changers and try to find where most of the time is going.

Another important thing to understand is that you can't just move a few bonds in the array of 100,000 that is being handed to OpenGL. You have to hand it the whole new array (100000 * 4 * 4 * 4 - 6.4 Mbytes per draw) and if that is to run at say 100 times per second that is 640 Mbytes / second of bandwidth to the graphics card just for the bonds. And on the CPU we are also recomputing a whole new array of bonds, probably another 640 Mbytes / sec of bandwidth. Very complex optimization could improve this but you'd have to understand in great detail where the performance bottlenecks are. Our code has not be optimized much and there is probably lots of low hanging fruit to speed it up that does not involve trying to limit updates to subsets of atoms and bonds.

Will be interested to hear what you find.

in reply to:  12 ; comment:12 by tic20@…, 8 years ago

Well, once the structure gets *really* large then the delays certainly 
add up elsewhere, but in a "only fairly large" structure it seems to be 
mostly in bond and atom recalculation.

Working with a particularly large structure (HIV capsid, 3j3y) to really 
bring it out:

MoveTimer(session, 10, indices, m)


... but after editing structure.py to time each key step in 
recalculation (_create_ribbon_graphics(), _update_ribbon_tethers(), 
_update_atom_graphics(), _update_bond_graphics(), the 
pbg._update_graphics() loop, _update_ribbon_graphics()):

Ribbon creation took 2.6941299438476562e-05
Ribbon tether update took 4.982948303222656e-05
Update atom graphics took 0.36548638343811035
Update bond graphics took 0.6054508686065674
Update pseudobondgroups took 6.556510925292969e-05
Update ribbon took 0.02550506591796875

... which adds up to roughly a second.


For a less extreme structure (4v8r):

Updating coordinates for 10 atoms in a structure with 128780 total atoms 
took 0.1138 +/- 0.0354 seconds per frame

Ribbon creation took 2.9087066650390625e-05
Ribbon tether update took 5.1975250244140625e-05
Update atom graphics took 0.02771759033203125
Update bond graphics took 0.036278486251831055
Update pseudobondgroups took 0.0020813941955566406
Update ribbon took 0.0026438236236572266

... making roughly half of the total time, most of which could be made 
to go away in the sort of case I'm talking about where <<5% of all atoms 
change.

Still, perhaps it will be best if I bite the bullet and take the 
approach that VMD took for interactive MD: split the simulated portion 
off into its own temporary molecule and hide the equivalent atoms in the 
original. Would take a fair bit of work, but looks like it'll be worth 
it in the end.


On 2017-08-04 18:22, ChimeraX wrote:

in reply to:  13 ; comment:13 by goddard@…, 8 years ago

Possibly those update atom graphics and update bond graphics times can be improved directly.  The next step would be to go into those routines and see which particular lines are taking the time.

For the 4v8r case you say the 0.06 seconds for updating atom/bond graphics is taking roughly half the time.  Still would need to know where the other half is, since even if you brought the graphics update time to zero you would only get a speedup of a factor of 2 which and I’m sure you want more.

I agree that it will probably be more successful to use a separate partial molecule if you really want the best speed for the least coding effort.

in reply to:  14 ; comment:14 by tic20@…, 8 years ago

The following simple modifications in structure.py (to limit atom and 
bond redrawing to only visible atoms) goes a long way towards improving 
things (in ISOLDE, I hide everything that isn't in the immediate 
vicinity of the mobile selection as a matter of course):

     def _update_atom_graphics(self, changes = 
StructureData._ALL_CHANGE):
         atoms = self.atoms  # optimzation, avoid making new numpy array 
from C++
         avis = atoms.visibles
         vis_atoms = atoms[avis]
         p = self._atoms_drawing
         if p is None:
             if avis.sum() == 0:
                 return
             self._atoms_drawing = p = self.new_drawing('atoms')
             self._atoms_drawing.custom_x3d = self._custom_atom_x3d
             # Update level of detail of spheres
             self._level_of_detail.set_atom_sphere_geometry(p)

         #~ if changes & self._SHAPE_CHANGE:
             #~ # Set instanced sphere center position and radius
             #~ n = len(atoms)
             #~ from numpy import empty, float32, multiply
             #~ xyzr = empty((n, 4), float32)
             #~ xyzr[:, :3] = atoms.coords
             #~ xyzr[:, 3] = self._atom_display_radii(atoms)

         if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
             # Set atom colors
             #~ p.colors = atoms.colors
             p.colors = vis_atoms.colors

         if changes & self._SHAPE_CHANGE:
             # Set instanced sphere center position and radius
             n = len(vis_atoms)
             from numpy import empty, float32, multiply
             xyzr = empty((n, 4), float32)
             xyzr[:, :3] = vis_atoms.coords
             xyzr[:, 3] = self._atom_display_radii(vis_atoms)

             from ..geometry import Places
             p.positions = Places(shift_and_scale=xyzr)
             #~ p.display_positions = avis

     def _update_bond_graphics(self, changes = 
StructureData._ALL_CHANGE):
         bonds = self.bonds  # optimzation, avoid making new numpy array 
from C++
         bvis = bonds.showns
         vis_bonds = bonds[bvis]
         p = self._bonds_drawing
         if p is None:
             if sum(bvis) == 0:
                 return
             self._bonds_drawing = p = self.new_drawing('bonds')
             self._bonds_drawing.custom_x3d = self._custom_bond_x3d
             # Update level of detail of cylinders
             self._level_of_detail.set_bond_cylinder_geometry(p)

         #~ if changes & (self._SHAPE_CHANGE | self._SELECT_CHANGE):
             #~ bond_atoms = bonds.atoms
         #~ if changes & self._SHAPE_CHANGE:
             #~ ba1, ba2 = bond_atoms
             #~ p.positions = _halfbond_cylinder_placements(ba1.coords, 
ba2.coords, bonds.radii)
             #~ p.display_positions = _shown_bond_cylinders(bonds)
         #~ if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
             #~ p.colors = c = bonds.half_colors
         #~ if changes & (self._SELECT_CHANGE | self._SHAPE_CHANGE):
             #~ p.selected_positions = 
_selected_bond_cylinders(bond_atoms)


         if changes & (self._SHAPE_CHANGE | self._SELECT_CHANGE):
             bond_atoms = vis_bonds.atoms
         if changes & (self._COLOR_CHANGE | self._SHAPE_CHANGE):
             #~ p.colors = c = bonds.half_colors
             p.colors = c = vis_bonds.half_colors
         if changes & self._SHAPE_CHANGE:
             ba1, ba2 = bond_atoms
             p.positions = _halfbond_cylinder_placements(ba1.coords, 
ba2.coords, vis_bonds.radii)
             p.display_positions = _shown_bond_cylinders(vis_bonds)
         if changes & (self._SELECT_CHANGE | self._SHAPE_CHANGE):
             p.selected_positions = _selected_bond_cylinders(bond_atoms)


Certainly not a magic bullet, but from the timings below it would be 
enough for me to defer the problem for a while - seems to stabilise to a 
roughly 4-fold increase in frame rate for large structures. For the 184k 
atom case, in real-world ISOLDE simulations it takes me from a 
more-or-less unusable 1-1.5 fps to a still-slow-but-usable 4-6 fps. 
Obviously I'll still need to take further steps down the track, but it's 
a start. As far as I can tell, everything else still works fine with the 
changes.

Before:
Updating coordinates for 10 atoms in a structure with 184550 total atoms 
took 0.2632 +/- 0.0385 seconds per frame

Changed 2440800 atom styles
Updating coordinates for 10 atoms in a structure with 2440800 total 
atoms took 2.5118 +/- 1.2431 seconds per frame

Only first 100 residues shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms 
took 0.1759 +/- 0.0436 seconds per frame

Updating coordinates for 10 atoms in a structure with 2440800 total 
atoms took 1.9635 +/- 0.9822 seconds per frame






After:
All atoms shown:
Updating coordinates for 10 atoms in a structure with 26740 total atoms 
took 0.0515 +/- 0.0051 seconds per frame

All atoms shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms 
took 0.2598 +/- 0.0436 seconds per frame

Updating coordinates for 10 atoms in a structure with 2440800 total 
atoms took 2.7552 +/- 1.4352 seconds per frame

Only first 100 residues shown:
Updating coordinates for 10 atoms in a structure with 26740 total atoms 
took 0.0320 +/- 0.0022 seconds per frame

Updating coordinates for 10 atoms in a structure with 184550 total atoms 
took 0.0631 +/- 0.0089 seconds per frame

Updating coordinates for 10 atoms in a structure with 2440800 total 
atoms took 0.7370 +/- 0.1270 seconds per frame


On 2017-08-04 19:08, ChimeraX wrote:

in reply to:  15 ; comment:15 by Conrad Huang, 8 years ago

Wouldn't this change make the sequence:

   1. set color of all atoms, both displayed and undisplayed
   2. display all atoms

not update the displayed color of atoms?  That is, the atom colors will 
be set, but the drawing colors will not?

Conrad

On 8/8/2017 4:50 AM, Tristan Croll wrote:

in reply to:  16 ; comment:16 by tic20@…, 8 years ago

... and this change to Structure._atom_bounds saves 20-25ms (from 
25-30ms down to 3ms) when I have only 100 residues (1630 atoms) of my 
184,550-atom structure displayed:

     def _atom_bounds(self):
         if not self._atom_bounds_needs_update:
             return self._cached_atom_bounds
         a = self.atoms
         a = a[a.displays]
         #disp = a.displays
         xyz = a.coords #[disp]
         radii = a.radii #[disp]
         from .. import geometry
         b = geometry.sphere_bounds(xyz, radii)
         self._cached_atom_bounds = b
         self._atom_bounds_needs_update = False
         return b

That gets me to a framerate of about 8fps (125ms per frame) for a 
simulation where ISOLDE takes about 75ms per iteration (from which I can 
probably fairly easily trim 20-30 ms). Good enough for now.


On 2017-08-08 12:51, ChimeraX wrote:

in reply to:  17 ; comment:17 by tic20@…, 8 years ago

Just tried that:

1. with all atoms displayed, set atoms to colour by element.
2. limit the display to a subset
3. set atoms to colour by chain
4. show all atoms

... and all atoms are coloured by chain, as expected.

On 2017-08-08 18:16, Conrad Huang wrote:

comment:18 by Tom Goddard, 8 years ago

The case Conrad asks about of coloring undisplayed atoms and later showing those atoms will work with Tristan's code because it updates the drawing sphere colors on SHAPE_CHANGED and that happens when any atom is shown or hidden.

My initial take is that Tristan's changes should not cause any problem, they simply put into the Drawing only the visible atom spheres instead of the current code that includes all atom spheres and simply hides the undisplayed ones. I don't remember why I didn't do it originally as Tristan suggests -- maybe I thought it reduced deleting and creating new OpenGL vertex buffers, but I don't even think that is the case. I'll analyze the code changes more carefully and commit them if it all looks good. My one reservation in my quick look is that it slightly slows down the case of all atoms being displayed, which is a very common case, while it speeds up the case of moving atoms when only a small fraction are visible, a rare case -- basically just ISOLDE. I know Tristan does not see ISOLDE as a rare case, but I mean 95% of the people running at any given instant are probably not using ISOLDE.

in reply to:  19 ; comment:19 by tic20@…, 8 years ago

Is it actually that common to have *every* atom displayed, though? In my experience that's really quite rare - one is usually hiding as many atoms as possible to make it easier to focus on and interpret a region of interest.



Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY




in reply to:  20 ; comment:20 by goddard@…, 8 years ago

I think the most common situation has ribbons shown so backbone atoms and maybe many side chains are hidden.  So I agree probably having most atoms hidden is typical.

comment:21 by Tom Goddard, 8 years ago

The atom and bond graphics update improvements look reasonable. But there are changes in other places that would be required (picking and x3d) that involve about 3 times as many code changes. Look for all uses of _atoms_drawing and _bonds_drawing to find the places where more changes would be needed.

If you want these changes made I'd like a complete set of code changes from you (or you can wait until it becomes my highest priority, a long wait probably). I'd suggest providing those changes by giving me a new structure.py files (and pbgroup.py will also need to follow bond changes since they share code with bonds). It is quite difficult for me to examine your changes made by commenting out code. With a whole new file I can easily see the diffs to the current git version. The typical git way to propose these changes would be via a pull request, but we have not used that yet for ChimeraX, so that approach to providing your changes would likely take more work.

comment:22 by Tom Goddard, 8 years ago

I have included your atom bounds improvement. Surprisingly the numpy masking operations coords[disp] and radii[disp] take much longer than forming atoms[disp] which is a masking operation with our Atoms collection class. It never ceases to horrify me how slow basic numpy operations on arrays are.

comment:23 by Tom Goddard, 8 years ago

My previous comment mis-analyzed the improvement in atom bounds speed. Our C++ atoms.coords and atoms.radii methods take more time than slicing the numpy arrays. For 3j3q looking up atom coords too 65 msec and slicing took 60 msec, looking up atom radii took 85 msec, and slicing took 2 msec, total bounds time is 270 msec with original code. Radii lookup is horribly slow because it is recomputing the default radius from elements and number of bonds every time -- ugh. Coordinate slicing I guess is slow because it is a 2D array while the radii slicing is fast as a 1D array. Coordinate lookup handles alt locs and active coordinate set lookup, but I haven't yet determined why it is so slow.

comment:24 by Tom Goddard, 8 years ago

Made a new ticket to speed up atom radius lookup #789, low priority.

comment:25 by Tristan Croll, 8 years ago

I did some more playing around today with my original idea, which maintains the complete Atom and Bond drawings at all times but only updates the parts that are both visible and have changed since the last time they were drawn.

Short description: in the C++ core Atom and Bond objects I added the following to the headers:

private:
    bool _changed = false;
public:
    bool changed() const { return _changed; }
    void set_changed(bool flag) { _changed = flag; }

    void track_change(const std::string& reason) {
        change_tracker()->add_modified(this, reason);
        this->set_changed(true);
    }

In the Atom case I also added a bonds_track_change function:

void 
Atom::bonds_track_change(const std::string& reason) {
    for(std::vector<Bond*>::iterator b = _bonds.begin(); b != _bonds.end(); ++b) {
        (*b)->track_change(reason);
    }
}

... so that bonds can be notified when their atoms change in a way that affects their drawing (color, coord, display). After passing the changed() and set_changed() functions up to Python, then in structure._update_graphics() one can do:

import numpy
atoms = self.atoms
avis = atoms.visibles
achanged = atoms._changed
vis_atom_changes = numpy.logical_and(avis, achanged)
ch_atoms = atoms[vis_atom_changes]

bonds = self.bonds
bvis = bonds.showns
bchanged = bonds._changed
vis_bond_changes = numpy.logical_and(bvis, bchanged)
ch_bonds = bonds[vis_bond_changes]
half_bond_change_filter = numpy.concatenate((vis_bond_changes, vis_bond_changes))

... and change _update_atom_graphics() and _update_bond_graphics() to work only on the results. To handle the cases where the model itself changes (i.e. atoms are added or deleted), I've included a force_recalc flag which causes a redraw of everything from scratch.

It's required quite an extensive list of changes to allow the relevant arrays to be changed in place, but the results are quite pleasing. The resulting drawing objects are all identical as far as the rest of ChimeraX is concerned, and the timings are more-or-less on a par with the other approach - significantly faster when all atoms are shown, slightly slower when the display is restricted to smaller numbers of atoms.

For 3j3q, all atoms shown:
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 1.5500 +/- 0.7743 seconds per frame

First 100 residues shown:
Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 0.4485 +/- 0.2115 seconds per frame

For my intermediate-sized example, all atoms shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.1233 +/- 0.0319 seconds per frame

First 100 residues shown:
Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0605 +/- 0.0109 seconds per frame

It just struck me that a really easy way to get blazingly fast speed when necessary would be to add a function, say, Structure.restrict_drawing(atoms), such that one could do:

    def _update_graphics(self, ...):
        if self.restrict_to_subset:
            atoms = self._restrict_atoms
            bonds = self._restrict_bonds
        else:
            atoms = self.atoms
            bonds = self.bonds

... and nothing much else would need to change.

Will do a bit more double-checking and try to bundle it all together and send tomorrow.

comment:26 by Tom Goddard, 8 years ago

To summarize your optimization ideas presented in previous comments:

1) Have the atom Drawing only contain spheres for the visible atoms, and the bond drawing only cylinders for visible bonds.
2) Same as 1, plus when updating only change coordinates and colors for atoms/bonds that have actually changed.

Optimization 1) can improve speed a lot when only a small fraction of atoms are shown. Optimization 2) can further improve speed when only a small fraction of visible atoms are changed.

For your extreme 3j3q test case (2.4 million atoms) the times for optimization method (2) of 1.55 seconds per frame to change 10 atom coordinates when all atoms shown and 0.45 seconds per frame when 100 residues shown seems abysmal. I wonder why it is taking so long. Is it using ambient lighting and all that time is in the lighting recalculation?

in reply to:  27 ; comment:27 by tic20@…, 8 years ago

Not quite.

Option (2) keeps the entire drawing geometry in memory as per the status 
quo, but limits recalculation to only atoms/bonds that are both changed 
and visible, modifying the existing Places and colors arrays by Numpy 
slicing. A fair bit of the time cost just goes into the initial setup. 
On my laptop (admittedly a little faster than the desktop I was using 
for the previous timings), for 3j3y:

%timeit a = m.atoms
2 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a.visibles
37 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

... and about the same for a._changed

%timeit b = m.bonds
2.07 ms ± 26.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit b.showns
74.8 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

... not sure if this would be the same for b._changed, or if it's closer 
to the atoms case.

But that's already 190-230 ms gone before even starting on the main 
calculations. It would take quite a lot of re-jigging of the core 
machinery (having the Atoms, Bonds etc. write these flags to a common 
array?) to optimise past that.

On 2017-08-09 18:40, ChimeraX wrote:

in reply to:  28 ; comment:28 by goddard@…, 8 years ago

Ok.  But the option 2 I described cuts down the number of atoms to the visible ones so a.visibles and b.showns that take 112 msec for 3j3q all atoms would take less time.  (I don’t understand why you said 190-230 msec in your previous comment).  But the 75 msec for b.showns seems ridiculously slow, although I did observe a.coords taking 66 msec when I was testing yesterday which also seemed very slow, but timing subparts did not reveal and clear way to make the C++ faster.

A standard approach in ChimeraX to optimize is replace the vector numpy operations with C++ code that gets rid of as many temporary numpy arrays (like b.showns, a.visibles) as possible.  This is not difficult if the code is amenable and many examples are in src/atomic/molc.cpp which is the code that provides Python interfaces to the molecule C++ using ctypes.

Thanks for investigating the optimiization possibilities.  I liked your idea 1) to have the Drawing just represent the visible objects — very sensible.  That combined with some C++ to optimize the slow update stuff could make it very fast.

in reply to:  29 ; comment:29 by tic20@…, 8 years ago

There's also the new a._changed and b._changed, with timeframes about the same as a.visibles. I think these make sense - organization becomes so much easier when the objects themselves know when they need updating. With a little tweaking the ._changed call could be run on only the visible atoms (or vice versa) - but you're right, further optimization is probably better at the lower level. Will play some more - I'm enjoying learning about how the innards work!

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 

in reply to:  30 ; comment:30 by tic20@…, 8 years ago

Slightly off-topic, but still on the subject of large structures: when I 
have 3j3y loaded, then:

a = m.atoms
a.disp<tab>

... hangs for almost a minute before autocompleting.

On 2017-08-09 20:08, ChimeraX wrote:

in reply to:  31 ; comment:31 by goddard@…, 8 years ago

The shell tab completion evaluates every single matching the property — I guess it wants to know the values for some reason.  This is reckless since some properties may require lots of computation.  I consider this a defect of the shell.  It should not be evaluating properties in order to autocomplete them by name.  Feel free to investigate whether the shell (qtconsole.rich_jupyter_widget) has options to disable evaluating all properties during autocomplete.

comment:32 by Tristan Croll, 8 years ago

OK, serious progress now.

What I've done:

  • Added functions to get/set flags called visible_atoms_changed and visible_bonds_changed to Structure.cpp, and made every Atom and Bond function that can affect these set the relevant flag to True. They're reset back to False by each run of structure._update_graphics_if_needed().
  • Added structure_visible_bonds() and structure_visible_atoms() functions that get arrays of visible bonds and atoms directly without having to filter down from the list of all atoms. These take 1.3 and 5.1ms respectively for my 185k atom case - to speed things up further structure.py now caches these, using the visible_xxx_changed flag to decide if it needs to update the cache.
  • structure.py draws only visible atoms and bonds. If the number of either changes, the relevant drawing is redrawn from scratch, otherwise the existing drawing is updated from only the changed atoms.
  • other places within structure.py where only visible atoms are relevant (e.g. _atom_bounds()) have been changed to use the fast/cached visible_atoms approach.

The upshot:

185k atoms, all atoms shown, sticks, simple lighting:

MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.1206 +/- 0.0243 seconds per frame

%timeit m._update_graphics()
> 11 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

185k atoms, first 100 residues shown, sticks, simple lighting:

MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0285 +/- 0.0108 seconds per frame

%timeit m._update_graphics()
> 1.58 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2.44M atoms, all atoms shown, sticks, simple lighting:

MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 1.4435 +/- 0.2139 seconds per frame
%timeit m._update_graphics()
103 ms ± 3.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

... of which it should be possible to make about half go away with some tweaking of _update_ribbon_graphics to exit early when no ribbons are displayed:
%timeit m._update_ribbon_graphics()
> 43.7 ms ± 7.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

2.44M atoms, sticks, first 100 residues shown, sticks, simple lighting:

MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 2440800 total atoms took 0.0694 +/- 0.0175 seconds per frame

%timeit m._update_graphics()
> 6.1 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

... of which about 5ms is _update_ribbon_graphics()

So there's still a little bit of overhead somewhere that's being triggered by coordinate updates but doesn't appear to be part of the graphics loop (70ms for the 2.44M atom structure vs 28.5ms for the 185k with comparable numbers of atoms displayed), but it will probably be relatively hard to track down (the timing is consistent with 1-2 calls to structure.atoms.<property>). But then, it's getting pretty academic at this point - once your structure gets to this sort of size, you'd be crazy *not* to break it down into manageable chunks unless you absolutely have to work with the whole thing at once. I'll attach the diff in a follow-up.

by Tristan Croll, 8 years ago

Attachment: changes.diff added

Optimisations to atom redrawing

comment:33 by Tristan Croll, 8 years ago

... oh, and as far as I can tell picking still seems to work just fine, without me having done much of anything to it. Rather amazing, considering.

comment:34 by Tristan Croll, 8 years ago

Spoke too soon. Selection highlighting *isn't* working properly. Could have sworn it was...

by Tristan Croll, 8 years ago

Attachment: changes_take_2.diff added

Corrected a few bugs in the previous diff

comment:35 by Tristan Croll, 8 years ago

OK, that should be better. For the picking, all I had to do was replace self.atoms with self.visible_atoms and self.bonds with self.visible_bonds, and all now looks normal. Also squashed another bug - adding the rather interesting line:

p.positions = p.positions

... so the atoms and bonds drawings know that their positions arrays have changed. The bug was unmasked when I changed the colour and selection handling code to run only under the control of self._COLOR_CHANGE and self._SELECT_CHANGE respectively (previously they would also run on self._SHAPE_CHANGE, but I believe that's now unnecessary).

One thing that will need to be checked (should be easy to fix if it doesn't already work) is what happens if atoms are added or removed to an already-displayed structure. As long as the the change sets _visible_atoms_changed and _visible_bonds_changed (where applicable) to True, it should just work.

comment:36 by Tristan Croll, 8 years ago

Hmm... Not sure why it should have changed overnight, but this morning when I run the 2.44M atom test with only the first 100 residues shown, it times reproducibly at 30-40ms per frame, almost indistinguishable from the 185k atom case.

in reply to:  39 comment:37 by goddard@…, 8 years ago

I hope to try your patch today, was too busy yesterday to look into it.

comment:38 by Tom Goddard, 8 years ago

I looked over your code changes, and I think they are an interesting direction that will greatly improve performance. But they add complexity, very likely add bugs, and it is in code that Eric (C++) and I (Python) maintain, so we don't like more complexity. So we want the ideal world of maximum speed without too much complexity. So I have some questions.

1) Why aren't you using the current C++ change tracker which keeps track of every atom changed? Instead you add _changed attributes to atoms and bonds and lots of new methods and code to set those new attributes. It doesn't seem like any of that is needed and Eric and I (granted without going deeply into it) think the current C++ code can do what you are doing (updating graphics of only changed atoms) with no modifications to the C++.

2) Do we need to update only the changed atoms for the graphics to be fast? Your test case of ~200,000 atoms and moving just 10 atoms interactively with the mouse, say one side chain seems reasonable. Can we make that fast without the added complexity of updating only changed atoms -- in other words, pretend all the atoms moved and make it run as fast as it can, for instance by moving a few time critical numpy Python code segments to C++? Having to recompute all the bond cylinder 4x4 matrices for 200,000 atoms might be the bottleneck.

I think we should first do changes that make the code no more complex and see what is the best performance we get. You were on that track initially, just changing the Drawing to contain only the visible atoms instead of containing all atoms and a mask (Drawing.displayed_positions) of which are visible. I think that approach is no more complex than current code. I know you tried it and timed it, but I don't think you made every effort to retain that simple approach and make it run fast (by say doing adding a few routines in C++). I think we should start with that change. If after our best effort on that we want it faster, then we can pursue updating only changed atoms/bonds using the simplest code possible. Does this sound like a good plan?

We are soon to release alpha 2 (in 1 or 2 weeks) and I think we probably won't put these changes in until after that because any critical code changes demand weeks of testing to assure it is stable and that would delay our release.

in reply to:  41 ; comment:39 by tic20@…, 8 years ago

The short answer is that keeping everything within (or even close to) a 
10ms footprint to maintain 100fps is quite a task, when getting the 
coordinates of 100k atoms into Python already takes about 7.5ms.


Looking at it this morning, I see I can do quite a bit more within the 
change tracker framework (and will try to revise to do so). Adding a 
_VISIBLE_ATOMS_CHANGE flag to the bit mask should be straightforward. 
But I still think there's a lot of value in having the atoms and bonds 
record directly when they've changed. Assuming for the time being the 
strategy is to limit geometry recalculation to the changed atoms, then I 
need the indices of the changed atoms in the array of visible atoms a. 
If each atom knows it's changed, then this is a trivial lookup:
changed_mask = a._changed
changed_atoms = a[changed_mask]

- about 400 microseconds if we have 50k visible atoms

On the other hand, the change manager just gives a list of all atoms 
(across all structures) that have changed. To get that mask, you'd have 
to do:
changed_atoms = self.session.change_tracker.changes['Atom'].modified
(about 330 microseconds)
structure_changed_atoms = changed_atoms.filter(changed_atoms.structures 
== self)
(variable - 50 us for 1000 changed atoms, 435 us for 50000)
changed_indices = self.visible_atoms.indices(structure_changed_atoms)
(prohibitive - already 24ms when 1000 out of 50k visible atoms have 
changed)

... and similar timing for bonds. So for a quite modest increase in 
complexity, you get a very fast way to limit all following calculations 
to only what's necessary.



I think so, yes - unless just about *everything* is moved to C. It's not 
necessarily just the geometry calculations, it's the look-ups. If I have 
3j3y loaded as Model m:

# Best-case scenario: all atoms sequential in memory
a = m.atoms[0:100000]
coords = a.coords
# 2.77 ms

# More realistic scenario: atoms scattered throughout structure
import numpy
indices = numpy.random.randint(0, 2000000, 100000)
indices.sort()
a= m.atoms[indices]
coords = a.coords
# 7ms - most of the budget blown already, and there are three 
similarly-sized
# coordinate lookups in _update_atom_graphics() and 
_update_bond_graphics()

A more complete example, working through _update_bond_graphics:

For 100k bonds b:

ba1, ba2 = b.atoms
# 1-4 ms, depending on arrangement in memory. Following timings are 
based
# on a random selection of bonds.

coords1 = ba1.coords
# 7.4 ms
coords2 = ba2.coords
# 7.4 ms
radii = b.radii
# 1.1 ms
positions = _halfbond_cylinder_placements(coords1, coords2, radii)
# 4.9ms

# ... so 20-25 ms total.

On the other hand, using a random mask (50k true) to define "changed" 
bonds:
ch_bonds = b[mask]
# 550 us

... while the rest now takes 11 ms. Better, but still over budget. I 
guess the obvious thing here would be to define a new 
_halfbond_cylinder_placements(bonds, mask, result_array) which does 
everything in C, replacing result_array[i] when mask[i] is true - which 
wouldn't be too difficult, by the look of things, and would probably get 
a few-fold improvement.

Then we get to ribbons: the obvious easy thing to do for now would be to 
define a num_ribbons_visible property of Structure analogous to 
num_atoms_visible, and have the ribbon code exit immediately if it 
returns 0. In most cases with structures of non-insane size it's not 
particularly problematic, but for 3j3y with no ribbons shown 
_update_ribbon_graphics() takes almost 100ms.




On 2017-08-12 01:48, ChimeraX wrote:

by Tristan Croll, 8 years ago

Attachment: changes_take_3.diff added

Updated to make better use of the ChangeManager framework, further speed improvements.

comment:40 by Tristan Croll, 8 years ago

OK, take 3:

  • added two new flags to GraphicsContainer::ChangeType:

ATOM_VIS_CHANGE = (1 << 4),
BOND_VIS_CHANGE = (1 << 5)

and updated everything accordingly so that Structure strictly uses the ChangeTracker framework to check if atom or bond visibility has changed.

  • renamed the _changed flag to _changed_since_last_drawn (and similarly changed the names of all associated functions) in Atom.h and Bond.h to better reflect their intent. Removed it from higher-level objects (Residue and Structure). There is perhaps some argument to be made for adding a similar flag to Residue (_any_ribbon_atom_changed ?) to make it easy to limit ribbon redrawing to only where necessary, but I haven't touched that.
  • added some more caching to structure.py to minimise unnecessary lookups per GUI update. The full list is now:
            self._cached_num_atoms_visible = self._num_atoms_visible
            self._cached_visible_atoms = self._visible_atoms
            self._cached_num_bonds_visible = self._num_bonds_visible
            self._cached_visible_bonds = self._visible_bonds   
            self._cached_visible_bond_atoms = self.visible_bonds.atoms
    
    ... and the actual get functions look like:
        @property
        def num_atoms_visible(self):
            '''
            Number of atoms currently visible in this structure. Cached for
            performance.
            '''
            if self._graphics_changed & self._VISIBLE_ATOMS_CHANGE:
                self._cached_num_atoms_visible = self._num_atoms_visible
            return self._cached_num_atoms_visible
    
  • Added a trivial function, positions_changed() to drawing.py to let it know when its positions object has changed internally. Avoids using the rather strange-looking p.positions = p.positions.
  • Added the line:
            self.atoms.radii = self.atoms.default_radii  
    
    as per #789.
  • Added the following to the head of Structure._update_ribbon_graphics() and Structure._update_ribbon_tethers:
            if not self.ribbon_display_count:
                return
    

With all that done, it looks to me like Structure._update_graphics() won't be the bottleneck in most real-world cases. Some timings (all in the 184k atom structure, stick display, no ribbons, simple lighting):

def time_update_graphics(m, indices):
    m.atoms[indices].coords = numpy.random.rand(len(indices),3)*50
    m._update_graphics_if_needed()

# 1000 atoms displayed, 10 atoms moving:
%timeit time_update_graphics(m, indices)
> 1.2 ms ± 83.3 µs per loop

MoveTimer(session, 50, indices, m)
> Updating coordinates for 10 atoms in a structure with 184550 total atoms took 0.0174 +/- 0.0041 seconds per frame

# 10,000 atoms displayed, 1000 atoms moving:
%timeit time_update_graphics(m, indices)
> 1.89 ms ± 157 µs per loop

MoveTimer(session, 50, indices, m)
> Updating coordinates for 1000 atoms in a structure with 184550 total atoms took 0.0337 +/- 0.0057 seconds per frame

# 100,000 atoms displayed, 100 atoms moving:
%timeit time_update_graphics(m, indices)
3.78 ms ± 81.1 µs per loop

MoveTimer(session, 50, indices, m)
> Updating coordinates for 100 atoms in a structure with 184550 total atoms took 0.0982 +/- 0.0177 seconds per frame

About 10ms of that last case is the bounds calculation, so it's probably not the biggest bottleneck.

All existing functionality appears to be working as expected. This includes:

  • mouse picking (ctrl-click and ctrl-drag) select and highlight the expected atoms.
  • "promotion/demotion" of selections via the up- and down-arrows correctly update the highlighting
  • switching between ball, stick and ball-and-stick render modes
  • changing atom colours
  • hiding/displaying any selection

comment:41 by Tom Goddard, 8 years ago

I'm going to look at doing the optimization with no C++ Atom/Bond changes since the code already reports what atoms and there is already C++ code to find the indices of those atoms in another atom collection (the visible atoms).

The "p.positions = p.positions" obviously indicates a problem. The problem is that the Places() object is immutable, and you added methods set_shift_and_scale() and set_opengl_matrices() to change it. With that change any code that uses Places() will have to consider the possibility that they have changed. There is no change notification currently. I'm not excited about this change.

Setting the radius to the default radius changes the behavior in that editing the structure no longer updates the radius (which depends on bond connectivity).

Sorry to be slowing down your efforts! New cached attributes to optimize things are source of very hard to debug problems caused by cached values not being updated when they should be. Also new redundant APIs like atoms._changed start looking very mysterious a year later when we forget it was an optimization. My first goal in optimizing is to change as little as possible in terms of APIs and caching and complexity. Maybe your design is the simplest. But if I made it myself I wouldn't accept it as a reasonable speed / complexity tradeoff. But I see you need more speed so I'll put aside light microscopy work for now and run some tests.

in reply to:  45 comment:42 by tic20@…, 8 years ago

Quick answer: it's actually atoms.display_radii that does the calculation you refer to (I believe this is what should actually be getting called in _atom_bounds(), and changed the code today accordingly. At present, atoms.radii just directly calls the C++ Atom::radius() function, which returns the stored value if nonzero or calculates and returns Atom::default_radius() otherwise. The reason it's currently so slow is that nothing ever sets a nonzero value so default_radius() gets called every time.

 
 
Tristan Croll
Research Fellow
Cambridge Institute for Medical Research
University of Cambridge CB2 0XY
 

 

in reply to:  46 comment:43 by goddard@…, 8 years ago

Atom::radius() is behaving as designed, the radius attribute is supposed to be zero unless manually overridden, so that the atom radius reflects the current bonds made by the atom.  More sophisticated code that only recomputes the radius when bond patterns or element type change is possible but has not been done yet.

I think I used Atom::radius() in the bounding box calculation even though it isn’t quite right for a little better performance.

comment:44 by Tom Goddard, 8 years ago

I tried some simple optimizations that sped up graphics update in stick mode with all atom display of 240,000 atoms (pdb 4v6x) from 120 msec to 55 msec per frame. This factor of 2 speed-up should apply whether all atoms are shown or just a few atoms. A lot of time was spent computing bond cylinder positions using combined numpy and C++. I put it all in C++ (about 30 lines) getting rid of the numpy temporary arrays of coordinates and radii. Also I added a few more reasons to the graphics change bit field (add/delete and display) so that graphics updates of Drawing arrays are not done unless needed.

It looks to me like the whole C++ graphics changes tracking (GraphicsContainer class) is obsolete. It was added before TrackChanges was implemented and I think TrackChanges Reasons can probably completely replace the graphics changes tracking simplifying the code. Needs a closer look.

I have not pushed these changes since tomorrow we plan to make an alpha2 release.

I still think the graphics code should be revised to make spheres/cylinders for only visible atoms and bonds instead of for all atoms and bonds with an additional display mask as currently done. This will be faster even in the case of all atoms and bonds shown I think because the OpenGL won't need to handle the mask.

Whether we need to attempt updating of just the individual changed atoms and bonds to get more speed should be evaluated after the simpler optimizations are in and tested.

in reply to:  48 ; comment:45 by tic20@…, 8 years ago

With regard to the positions and changing the arrays in place: fair 
enough. If I instead do, for example:

             ssa = p.positions.shift_and_scale_array()
             ssa[ch_filter] = xyzr
             p.positions = Places(shift_and_scale=ssa)

... then the cost is quite minimal and those extra functions become 
obsolete.

In terms of whether or not to limit redrawing to changed atoms at all, 
it's going to depend a lot on the precise situation. A fairly typical 
situation for me would have 1000 atoms mobile and 10,000 visible. With 
my current approach (of course, without your optimisation to bond 
drawing), redrawing just the mobile 1,000 vs all 10,000 works out to 19 
vs 34ms per frame.

Radii: I see. So you'd need a callback to update the radius when an atom 
changes its bond(s) or an ion changes its pseudobond(s).

In terms of caching, I think it will make sense to cache at least the 
visible atoms and bonds, since fetching these becomes a serious 
time-sink for larger structures - and the cost seems to be mostly in the 
actual C++ loops rather than in Python. For the HIV capsid:

%timeit a = m.atoms
%timeit vis = a.visibles
%timeit a[vis]
# if no atoms visible
# if all atoms visible

# 35~45 ms total

If I instead define the following in molc.cpp:

extern "C" EXPORT void structure_visible_atoms(void *mols, size_t n, 
pyobject_t *atoms)
{
     Structure **m = static_cast<Structure **>(mols);
     try {
         for (size_t i = 0; i < n; ++i) {
             const Structure::Atoms &a = m[i]->atoms();
             for (size_t j = 0, size = a.size(); j < size; ++j) {
                 if ( a[j]->visible() )
                     *atoms++ = a[j];
             }
         }
     } catch (...) {
         molc_error();
     }
}

... and expose it to Python via molobject.py as
     _visible_atoms = c_property('structure_visible_atoms', cptr, 
'num_atoms_visible', astype = _atoms,
                           read_only = True)
     ''':class:`.Atoms` containing only the visible atoms of the 
structure. Read only.'''

... and make sure num_atoms_visible is cached to take its timing out of 
the equation, then:

m.atoms.displays = 0
# with m.num_atoms_visible == 0, m._visible_atoms will return a
# zero-length Atoms array so the Python overhead is minimal.
%timeit m._visible_atoms

That works out at just 10ns per atom, but it's still not fast enough - 
unless you want to take the next step and go parallel, for any really 
large structure just this one loop through the Atoms array will put a 
significant dent in any animation framerate if it has to happen every 
time something changes. And this is by no means the largest structure 
out there - when this HIV capsid work was being done the fitting 
simulation was run in a box of explicit solvent with a grand total of 64 
million atoms, and I've heard tell of work with 100 million plus atom 
assemblies. Especially with the rapid development of high-resolution 
cryoelectron tomography experiments at this scale are going to become 
more commonplace, so I think it's worth considering strategies with that 
in mind.

All that being said, I do understand where you're coming from with 
respect to cached values. But at the same time I think one would have to 
try quite hard to break this caching approach. With the code as written, 
the moment the change manager registers a change in visibility then all 
calls revert to using the full lookup functions until the relevant 
change bit is reset - which is only ever done as a final step in 
Structure._update_graphics_if_needed().




On 2017-08-14 19:02, ChimeraX wrote:

in reply to:  49 ; comment:46 by tic20@…, 8 years ago

Argh! Re-sending the timings because the server helpfully edited out all 
my lines beginning with ">"...

%timeit a = m.atoms

     4.24 ms ± 396 µs per loop

%timeit vis = a.visibles

     31 ms ± 1.36 ms per loop

%timeit a[vis]
# if no atoms visible

     174 µs ± 17.3 µs per loop

# if all atoms visible

     9.48 ms ± 1.93 ms per loop


# 35~45 ms total



m.atoms.displays = 0
# with m.num_atoms_visible == 0, m._visible_atoms will return a
# zero-length Atoms array so the Python overhead is minimal.
%timeit m._visible_atoms

     24.5 ms ± 916 µs per loop


That works out at just 10ns per atom, but it's still not fast enough - 
unless you want to take the next step and go parallel, for any really 
large structure just this one loop through the Atoms array will put a 
significant dent in any animation framerate if it has to happen every 
time something changes. And this is by no means the largest structure 
out there - when this HIV capsid work was being done the fitting 
simulation was run in a box of explicit solvent with a grand total of 64 
million atoms, and I've heard tell of work with 100 million plus atom 
assemblies. Especially with the rapid development of high-resolution 
cryoelectron tomography experiments at this scale are going to become 
more commonplace, so I think it's worth considering strategies with that 
in mind.

All that being said, I do understand where you're coming from with 
respect to cached values. But at the same time I think one would have to 
try quite hard to break this caching approach. With the code as written, 
the moment the change manager registers a change in visibility then all 
calls revert to using the full lookup functions until the relevant 
change bit is reset - which is only ever done as a final step in 
Structure._update_graphics_if_needed().



On 2017-08-15 13:40, Tristan Croll wrote:

comment:47 by Tristan Croll, 8 years ago

A little more tinkering today - as much as a learning exercise as anything else. A C++-only atom bounds calculation. For what it's worth, the speed gain isn't enormous if the radii are pre-calculated - but it's not negligible either.

molc.cpp:

extern "C" EXPORT void atom_bounds(void *atoms, size_t n, float32_t *minmax)
{
    Atom **a = static_cast<Atom **>(atoms);
    try {
        float32_t x, y, z, xmin, xmax, ymin, ymax, zmin, zmax, r;
        float32_t x_minus_r, y_minus_r, z_minus_r, x_plus_r, y_plus_r, z_plus_r;
        Coord c;
        Atom* this_a = a[0];
        r = this_a->radius();
        c = this_a->coord();
        x = c[0]; y = c[1]; z = c[2];
        xmin=x-r; ymin=y-r; zmin=z-r; xmax=x+r; ymax=y+r; zmax=z+r;

        for (size_t j=1; j<n ; ++j )
        {
            this_a = a[j];
            r = this_a->radius();
            c = this_a->coord();
            x=c[0]; y=c[1]; z=c[2];
            
            x_minus_r = x-r;
            if (x_minus_r < xmin) {
                xmin = x_minus_r;
            } else {
                x_plus_r = x+r;
                if (x_plus_r > xmax) xmax = x_plus_r;
            }
            
            y_minus_r = y-r;
            if (y_minus_r < ymin) {
                ymin = y_minus_r;
            } else {
                y_plus_r = y+r;
                if (y_plus_r > ymax) ymax = y_plus_r;
            }
            
            z_minus_r = z-r;
            if (z_minus_r < zmin) {
                zmin = z_minus_r;
            } else {
                z_plus_r = z+r;
                if (z_plus_r > zmax) zmax = z_plus_r;
            }
            
        }
             
        minmax[0] = xmin;
        minmax[1] = ymin;
        minmax[2] = zmin;
        minmax[3] = xmax;
        minmax[4] = ymax;
        minmax[5] = zmax;
        
    } catch (...) {
        molc_error();
    }
}

molarray.py, in class Atoms:

    @property
    def _bounds(self):
        '''Atom bounds in scene'''
        bounds = empty([2,3], float32)
        f = c_function('atom_bounds',
                        args = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
             )
        f(self._c_pointers, len(self), pointer(bounds))
        from ..geometry.bounds import Bounds
        return Bounds(*bounds)

Note: in the below tests Structure._atom_bounds uses the cached self.visible_atoms array (all atoms are visible).

def time_atom_bounds(model):

model._atom_bounds_needs_update = True
return model._atom_bounds()

For 184k atoms, no radii pre-set:

%timeit b = time_atom_bounds(m)

21.1 ms ± 32.8 µs per loop

%timeit b = m.visible_atoms._bounds

12.6 ms ± 161 µs per loop

#Setting the radii to constant values

m.atoms.radii = m.atoms.default_radii
%timeit b = time_atom_bounds(m)

11.2 ms ± 1.24 ms per loop

%timeit m.visible_atoms._bounds

9.77 ms ± 49.4 µs per loop

# And of course, making sure the result is the same:
bounds_a = m._atom_bounds()
bounds_b = m.visible_atoms._bounds

bounds_a.xyz_min - bounds_b.xyz_min

array([ 0., 0., 0.], dtype=float32)

bounds_a.xyz_max - bounds_b.xyz_max

array([ 0., 0., 0.], dtype=float32)

in reply to:  51 ; comment:48 by goddard@…, 8 years ago

A factor of almost 2 is nice.  This is the kind of optimization I like to make — simple and clear code in C++ that replaces slow numpy vector calcluations — makes the Python even more readable.


comment:49 by Tom Goddard, 8 years ago

I have checked in optimized bond cylinder position calculation code, improved atom/bond graphics update detection to not update colors, selection, positions unless necessary, and I change the drawing code to only create spheres and cylinders for the visible atoms and bonds. These changes increased rendering speed from 4 frames/sec to 10 frames/sec when changing atom coordinates for 240,000 atoms shown as stick, simple lighting, PDB 4v6x.

My current timings for the above test case indicate about 50 msec/frame used to update the position arrays when moving atoms, and about 50 msec/frame from other sources, like updating the OpenGL buffers that happen during the draw() call.

More could be done to optimize rendering speed while moving atoms. But I spent about a full day and a half on this so far which I think is probably more than it merits given the many other priorities (e.g. getting funding). If you would like to go further, please look first towards optimizations that do not increase the complexity of the code.

comment:50 by Tristan Croll, 8 years ago

Thanks for bearing with me, Tom. Much appreciated.

comment:51 by Tom Goddard, 8 years ago

Resolution: fixed
Status: assignedclosed

As described in the previous 46 comments, some changes allowed faster rendering when atoms are moved. If further optimizations are needed the first efforts should try to optimize without adding additional logic complexity (e.g. no extra cached state). More can be done within these bounds and that would be the first approach before more complex solutions are tried.

Note: See TracTickets for help on using tickets.