#3000 closed enhancement (fixed)
Show / hide slow on very large structures due to change tracking
Reported by: | Tom Goddard | Owned by: | Tom Goddard |
---|---|---|---|
Priority: | moderate | Milestone: | |
Component: | Performance | Version: | |
Keywords: | Cc: | pett | |
Blocked By: | Blocking: | ||
Notify when closed: | Platform: | all | |
Project: | ChimeraX |
Description
Show and hide commands for atoms on large structure such as 6q3g (1 million atoms) are sluggish due to atomic ChangeTracker (C++) accumulating all the modified atoms in a std::set. These commands become 5 times faster (taking 0.16 seconds instead of 0.80 seconds) if the modified atoms are not accumulated. The actual set of atoms modified are rarely needed by change trigger handlers. So this 5x performance decrease is for no purpose. Snappy performance on large structures is a selling point of ChimeraX so we should figure out how to optimize this.
I don't have a good idea for optimizing it. I tried std::unordered_set for modified objects and it was ~30% slower. std::vector would probably be much faster, but there could be tons of undesired duplicates. Remarkably half the time is in creating the modified std::set and half is in clearing it.
Optimally we would not even accumulate the modified atoms if no one is using them. But I don't know how to achieve that. Maybe client code has to actually request when registering that it wants modified atoms. It could be a global request that effects all change tracking. I should do a survey to see if any code at all looks at the modified atoms.
Maybe nicer would be to just mark modified atoms. Then only collect them when a handler asks for them. The mark could be an integer attribute of the C++ Atom class. To avoid having to clear it after every change the integer marking value can be increased by 1 each time a "clear" is done.
Change History (15)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Probably should have said that "Changes" is atomic.changes.Changes and the _changes attribute is a local class created in atomic.molobject.ChangeTracker.changes
comment:3 by , 6 years ago
I'd be a bit concerned that the vector can be very large with duplicate atoms if any code modifies atoms over and over. And we can't remove items from the vector one at a time during delete or we will hit disastrous N2 behavior. But we could convert to a set on the fly if delete is needed. But I think the duplicates problem can't be ignored. One solution would be Atom having a byte to mark that it has already been recorded as modified. Then it could be added to the vector only if it was not yet marked. But ChangeTracker handles all objects (Residues, Bonds, ...) generically so that won't work without a base class for all those objects.
comment:4 by , 6 years ago
I searched all ChimeraX code for modified_atoms then name of the Python method that returns the list of atoms. It is only used in two places. Bond rotation manager uses modified_atoms().unique_structures -- so it is only using the structures of the modified atoms. And molecular surfaces intersects the modified_atoms() with its surface atoms if modified_reasons() contains "coord changed" so it can update the surface. So really only one place, molecular structures is using the modified atoms. We are paying a pretty heavy price in performance everywhere to handle rare cases efficiently. Better to be efficient almost everywhere and let rare cases be less efficient. For example if modified atoms were only identified by what structures they belong to, then all molecular surfaces involving a moved atom would get updated.
comment:5 by , 6 years ago
Keep in mind that these sets/vectors get emptied every time atomic's check_for_changes runs, which is after every frame draw and every command execution and whenever it gets called "manually" as needed from code, so the only chance it really has to "build up" is in some kind of Python script/code that never calls check_for_changes either directly or incidentally -- and it would only matter for a large number of atom-level modifications on a large structure.
Also, you should be aware than deletion from the modified list only happens if _part_ of a structure gets deleted. If the entire structure is being deleted it does not come into play. And those atoms had to also to have been modified since the last check for changes.
I can sense that we are not going to reach consensus on this idea so I have an alternate proposal. We #ifdef out all the currently unused atom-modification tracked changes with e.g. #ifdef ATOM_HIDE_CHANGE_TRACKED and document in changes.modified_atoms what changes actually get tracked. Since the only one currently used is the rarely-happens "coord changed", I think that almost completely addresses this issue. Gives us flexibility to add a few more in the future if we deem them necessary. I don't think any code will ever care about the "big ones" like hide, show, color.
follow-up: 6 comment:6 by , 6 years ago
That is a nice idea, only keep the modified atoms for "coord changed" and possibly other reasons that could be added in the future. Let's mull that one over. You are right it will be hard for me to swallow using the vector. I just don't want to be debugging weird corner cases where some Python code runs out of memory when it doesn't appear to be allocating anything, just doing lots of computations and structure twiddling.
follow-up: 7 comment:7 by , 6 years ago
I don't think we can just ifdef out the Atom add_modified() calls. We still want the reasons atoms changed. But we can change add_modified() making a version that only takes a reason. Hmm... It is a template method and needs to know what type of object was modified. Could use the current add_modified() and pass it a null Atom pointer, or something like add_modified<Atom>(structure, reason_changed).
follow-up: 8 comment:8 by , 6 years ago
Do we really want the reason atoms changed without the actual atoms that changed? I’m dubious. I’d say go with the simple solution first. If necessary, later we can add an #else to the #ifdefs that adds a reason without adding the atom if it comes to that.
follow-up: 9 comment:9 by , 6 years ago
Ok, but I didn't search for code that is using atom change reasons and there might be quite a number of those. I will look tomorrow.
comment:11 by , 6 years ago
Here is a list of all ChimeraX code (not including ISOLDE) using changes.atom_reasons()
Model Panel: color changed, display changed
Clashes: coord changed, alt_loc changed
Bond Rotation: coord changed
Label: name changed
Molecular surface: coord changed
Nucleotides: coord changed, display changed
Interfaces: display changed
Only Bond Rotation and Molecular surface use changes.modified_atoms().
So it looks like we need to report atom display changed and color changed even if the atoms are not included in the modified list.
comment:12 by , 6 years ago
Here is baseline command timing on show/hide/color for a million atom structure. Each command run 3 times because there is some variation.
UCSF ChimeraX version: 0.93 (2020-04-02)
time open 6q3g
command time 4.538 seconds
draw time 0.09639 seconds
time hide
command time 0.8146 seconds
draw time 0.1359 seconds
time show
command time 0.9189 seconds
draw time 0.3367 seconds
time hide
command time 0.8661 seconds
draw time 0.1272 seconds
time show
command time 0.9268 seconds
draw time 0.3405 seconds
time hide
command time 0.9247 seconds
draw time 0.1333 seconds
time show
command time 0.8825 seconds
draw time 0.3555 seconds
time color pink
command time 2.053 seconds
draw time 0.07301 seconds
time color bypoly
command time 1.533 seconds
draw time 0.07154 seconds
time color bychain
command time 2.744 seconds
draw time 0.06679 seconds
time color pink
command time 1.231 seconds
draw time 0.07486 seconds
time color bypoly
command time 1.567 seconds
draw time 0.08178 seconds
time color bychain
command time 2.755 seconds
draw time 0.07937 seconds
time color pink
command time 1.296 seconds
draw time 0.07878 seconds
time color bypoly
command time 1.526 seconds
draw time 0.0717 seconds
time color bychain
command time 2.735 seconds
draw time 0.07251 seconds
And below is timing with display changed and color changed not adding atoms to modified set.
time open 6q3g
command time 4.285 seconds
draw time 0.09344 seconds
time hide
command time 0.1412 seconds
draw time 0.142 seconds
time show
command time 0.141 seconds
draw time 0.3455 seconds
time hide
command time 0.1899 seconds
draw time 0.1262 seconds
time show
command time 0.1412 seconds
draw time 0.3469 seconds
time hide
command time 0.1399 seconds
draw time 0.1431 seconds
time show
command time 0.1389 seconds
draw time 0.3737 seconds
time color pink
command time 1.47 seconds
draw time 0.07216 seconds
time color bypoly
command time 0.8173 seconds
draw time 0.07489 seconds
time color bychain
command time 2.007 seconds
draw time 0.07602 seconds
time color pink
command time 0.6549 seconds
draw time 0.08724 seconds
time color bypoly
command time 0.8243 seconds
draw time 0.08635 seconds
time color bychain
command time 2.006 seconds
draw time 0.07628 seconds
time color pink
command time 0.5942 seconds
draw time 0.08281 seconds
time color bypoly
command time 0.8078 seconds
draw time 0.08119 seconds
time color bychain
command time 1.967 seconds
draw time 0.07652 seconds
comment:13 by , 6 years ago
Summarizing the previous results, taking the fastest times for the 3 tries, first time with modified atoms, second without modified atoms. Times in seconds. We see show/hide is much faster (6x) and coloring is 2x faster in two cases, and 1.4x for bychain.
open 4.5 4.3
hide 0.13 0.81
show 0.14 0.88
color pink 0.59 1.23
color bypoly 0.81 1.53
color bychain 1.97 2.74
comment:14 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed.
I've exclude atoms from the modified list on display and color changes. The atom reasons are still updated. The speed gains are large and no current code was using the modified atoms during display/color changes. In the future we might optimize ChangeTracker C++ code and put the atoms back into the modified set.
comment:15 by , 4 years ago
Got snagged by this today since I'd forgotten that modified_atoms() returns nothing for color changes, and had to go through a whole debugging cycle before rediscovering this.
Possibly you could switch to vectors (should be 20x faster for 6q3g) and then convert them to sets on demand when the 'created' or 'modified' attributes of Changes._changes gets accessed (and cache the set of course). You obviously win with avoiding set creation when unneeded, but lose some back for those cases where you have to create both a vector and a set, and also in the C++ layer sometimes when an instance gets both modified and deleted, because the removal of the deleted item will be slower from a vector than from a set (and it may need to be deleted multiple times from a vector). My guess is that overall it would be a big win though.