#6243 closed enhancement (fixed)
Mesh calculation from triangles very slow
| Reported by: | tic20@… | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Graphics | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
As noted in the comments on ticket #5964, computing mesh edges from triangles for a volume surface is very slow. Because the `_graphics.masked_edges()` routine uses a `std::set` (as described in that earlier ticket), mesh display is about 5 times slower than triangle display when the surface changes. The mesh edges are also computed twice, because the graphics Drawing code updates both the buffers for the whole model and the separate highlight buffers, even when nothing is highlighted.
Tristan Croll tried some ideas for optimizing the masked_edges() routine to avoid std::set and allow duplicate edges (see #5964).
It would also be nice to optimize away the gratuitous updating of the highlight OpenGL buffers when nothing is highlighted.
Attachments (2)
Change History (8)
comment:1 by , 4 years ago
Type: defect → enhancement
by , 4 years ago
Attachment: mesh_edges.cpp.diff added
comment:2 by , 4 years ago
I found this discussion really instructive: https://douglasorr.github.io/2019-09-hash-vs-sort/article.html. Short story is that, at least for simple integer values, doing `sort` followed by `unique` on a `std::vector` is as fast as or faster than `unordered_set`, particularly for very large sets where the capacity of the L3 cache is exceeded. Sure enough, the attached implementation (reversibly combining the two vertex indices into a single `size_t`, working on a vector of those, and then separating back into two `int`s at the end) is about 3 times faster than the `set` implementation for three test cases with 100k to 32m triangles (timing just the `mesh_edges` call by putting calls to `time.perf_counter` either side of it in `drawing.py`):
| PDB ID | Num triangles | Original mean | Original std | Vector mean | Vector std | Speedup mean | Speedup std |
|---|---|---|---|---|---|---|---|
| 3io0 | 110000 | 19.5 | 2.2 | 6.7 | 0.79 | 2.9 | 0.67 |
| 7ogu | 1700000 | 357.7 | 6.94 | 125.8 | 3.9 | 2.8 | 0.14 |
| 4v99 | 32000000 | 5358.4 | 269.23 | 1649.2 | 15.07 | 3.2 | 0.19 |
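A minimal sketch of the technique described in this comment, assuming 32-bit vertex indices packed into a 64-bit key (the function names here are illustrative, not the code from the attached mesh_edges.cpp.diff):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <utility>
#include <vector>

// Pack an undirected edge (a, b) into one 64-bit key.  Ordering the
// pair first makes (a, b) and (b, a) produce the same key, so shared
// edges become duplicates that sort next to each other.
static uint64_t edge_key(uint32_t a, uint32_t b)
{
    if (a > b)
        std::swap(a, b);
    return (uint64_t(a) << 32) | uint64_t(b);
}

// Collect the unique edges of a triangle list.  Interior edges appear
// twice (once per adjacent triangle) and are removed by sort + unique.
std::vector<std::pair<uint32_t, uint32_t>>
unique_edges(const std::vector<std::array<uint32_t, 3>> &triangles)
{
    std::vector<uint64_t> keys;
    keys.reserve(3 * triangles.size());
    for (const auto &t : triangles) {
        keys.push_back(edge_key(t[0], t[1]));
        keys.push_back(edge_key(t[1], t[2]));
        keys.push_back(edge_key(t[2], t[0]));
    }
    std::sort(keys.begin(), keys.end());   // linear, cache-friendly passes
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());

    // Separate each key back into its two vertex indices.
    std::vector<std::pair<uint32_t, uint32_t>> edges;
    edges.reserve(keys.size());
    for (uint64_t k : keys)
        edges.emplace_back(uint32_t(k >> 32), uint32_t(k & 0xffffffffu));
    return edges;
}
```

For two triangles sharing an edge, e.g. (0,1,2) and (1,2,3), the six raw edges collapse to five unique ones.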
comment:3 by , 4 years ago
I found your vector / sort / unique method of removing duplicates to be about 4x faster on macOS 10.15.7. So I changed the code to use that.
In the test_mesh.py benchmark (attached), which computes contour surfaces and shows meshes at a variety of thresholds, the original std::set code took 150 seconds. (Be careful that the recent-files history is not shown while running, since that prevents graphics redraw and hence the edge calculation.) Running the benchmark with surface display instead took 30 seconds, so it appears 120 seconds were spent computing the unique mesh edges. The new sorting code takes 60 seconds for mesh display on this benchmark, so it spends about 30 seconds computing unique mesh edges, a 4-fold speed-up of that computation.
I also tried a multipass simple hash array of 3*(num triangles) size_t entries. It typically made 8-10 passes on the test_mesh.py benchmark and took almost exactly the same time, 59 seconds, but is somewhat more complex. I expected it to be faster; probably memory access is slow because hashing scatters reads and writes across a large array that does not fit in the CPU cache, while sorting has excellent cache behavior.
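The multipass hash-array idea might look roughly like this (an illustrative sketch with assumed names and hash constant, not the code that was benchmarked):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Deduplicate 64-bit edge keys with a fixed-size open hash table.
// Keys that collide with a *different* key are deferred to the next
// pass; each pass inserts at least one new unique key, so the loop
// terminates.
std::vector<uint64_t> hash_unique(std::vector<uint64_t> keys)
{
    const uint64_t EMPTY = ~uint64_t(0);   // assumes no key is all-ones
    std::vector<uint64_t> table(keys.size(), EMPTY);
    std::vector<uint64_t> unique_keys, deferred;
    while (!keys.empty()) {
        for (uint64_t k : keys) {
            size_t i = size_t(k * 0x9e3779b97f4a7c15ull) % table.size();
            if (table[i] == EMPTY) {       // free slot: first sighting
                table[i] = k;
                unique_keys.push_back(k);
            } else if (table[i] != k) {    // collision: retry next pass
                deferred.push_back(k);
            }                              // table[i] == k: duplicate, drop
        }
        std::fill(table.begin(), table.end(), EMPTY);
        keys.swap(deferred);
        deferred.clear();
    }
    return unique_keys;
}
```

Each pass touches the whole table in effectively random order, which is consistent with the observation above that cache behavior, not instruction count, dominates.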
One other test I tried was simply keeping duplicate edges. That took 40 seconds with test_mesh.py. Since 30 seconds go to the contour calculation, 10 seconds are spent even with no filtering of duplicate mesh edges at all.
The graphics code still runs the masked_edges() C++ code twice for every update because it always computes the highlight buffer. Commenting out the highlight-buffer update reduced the test_mesh.py time to 45 seconds, as expected: the new sort method takes 30 seconds (the remaining 30 seconds being the contouring calculation), so halving it saves 15 seconds. I'll look into whether the graphics can avoid computing the highlight OpenGL buffers when no highlight is shown.
comment:4 by , 4 years ago
Resolution: → fixed
Status: assigned → closed
Fixed.
The masked edges calculation now uses C++ vector, sort, and unique instead of set and is 3 or 4 times faster. I made a separate ticket #6297 to reduce the two calculations of the mesh edges to one.
comment:5 by , 4 years ago
Just a note for if/when it becomes desirable to optimise this further: an advantage of the sort/unique approach is that there are well-established and supported GPU algorithms for both tasks. See for example the thrust library (https://docs.nvidia.com/cuda/thrust/index.html - probably not the best general-purpose solution since it's specific to Nvidia, but the documentation is pretty good).
comment:6 by , 4 years ago
Thanks. I don't think GPU optimization of ChimeraX code is likely to happen. Of course we'll use OpenMM and machine learning Python packages that take advantage of it. We are too small a team working on too many things to get to that level of optimization in our own code.
Replaced set with size_t vector sort and unique