Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#6243 closed enhancement (fixed)

Mesh calculation from triangles very slow

Reported by: tic20@… Owned by: Tom Goddard
Priority: moderate Milestone:
Component: Graphics Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

As indicated in comments in ticket #5964 computing mesh edges from triangles for a volume surface is very slow. The _graphics.masked_edges() routine makes the mesh display about 5 times slower than triangle display when the surface changes due to the use of std::set as described in that earlier ticket. Also the mesh edges are computed twice because the graphics Drawing code updates the buffers for the whole model and the separate ones for the highlighted model even when nothing is highlighted.

Some ideas to optimize the masked_edges() routine to not use a std::set and allow duplicate edges were tried by Tristan Croll (see #5964).

It would be nice to optimize out the gratuitous updating of highlighting OpenGL buffers when nothing is highlighted.

Attachments (2)

mesh_edges.cpp.diff (2.5 KB ) - added by Tristan Croll 4 years ago.
Replaced set with size_t vector sort and unique
test_mesh.py (522 bytes ) - added by Tom Goddard 4 years ago.
Test combined speed contouring and showing meshes.

Download all attachments as: .zip

Change History (8)

comment:1 by Tom Goddard, 4 years ago

Type: defectenhancement

by Tristan Croll, 4 years ago

Attachment: mesh_edges.cpp.diff added

Replaced set with size_t vector sort and unique

comment:2 by Tristan Croll, 4 years ago

I found this discussion really instructive: https://douglasorr.github.io/2019-09-hash-vs-sort/article.html. Short story is that at least for simple integer values, doing sort followed by unique on a std::vector is as fast or faster than unordered_set, particularly for very large sets where the capacity of the L3 cache is exceeded. Sure enough, the attached implementation (reversibly combining the two vertex indices into a single size_t, working on a vector of that and then separating back into two ints at the end) is about 3 times faster than the set implementation for three test cases with 100k to 32m triangles (timing just the mesh_edges call by putting calls to time.perf_counter either side of it in drawing.py):

PDB ID  Num triangles  Original mean  Original std  Vector mean  Vector_std  Speedup mean  Speedup std
3io0    110000	       19.5           2.2           6.7          0.79        2.9           0.67
7ogu    1700000        357.7          6.94          125.8        3.9         2.8           0.14
4v99    32000000       5358.4         269.23        1649.2       15.07       3.2           0.19

by Tom Goddard, 4 years ago

Attachment: test_mesh.py added

Test combined speed contouring and showing meshes.

comment:3 by Tom Goddard, 4 years ago

I found your vector / sort / unique method of removing duplicates to be about 4x faster on macOS 10.15.7. So I changed the code to use that.

In the test_mesh.py benchmark (attached) which computes contour surfaces and shows meshes at a variety of thresholds the original std::set code took 150 seconds (be careful that recent file history is not shown when running, since it will prevent graphics redraw and edge calculation). Running the benchmark with surface display took 30 seconds, so it appears 120 seconds were computing the unique mesh edges. The new sorting code takes 60 seconds for mesh on this benchmark, so it appears to take 30 seconds for computing unique mesh edges, a 4-fold speed-up of that computation.

I also tried a multipass simple hash array of size 3*(num triangles) of size_t. It typically did 8-10 passes for the test_mesh.py benchmark and took almost exactly the same time 59 seconds, but is somewhat more complex. I expected it to be faster, but probably memory cache speed is poor because of hashing into a large array that does not fit in CPU cache. Sorting probably has excellent cache performance.

One other test I tried was just keeping duplicate edges. That took 40 seconds with test_mesh.py. Since 30 seconds is being taken for contour calculation, 10 seconds is taken even with no filtering of duplicate mesh edges.

The graphics code still runs the masked_edge() C++ code twice for every update because it always computes the highlight buffer. Commenting out the highlight buffer update code reduced the test_mesh.py time to 45 seconds, as expected since the new sort method takes 30 seconds (with remaining 30 seconds for contouring calculation) so that time is halved to 15 seconds. I'll look if the graphics can avoid computing the highlight opengl buffers if no highlight is shown.

comment:4 by Tom Goddard, 4 years ago

Resolution: fixed
Status: assignedclosed

Fixed.

The masked edges calculation now uses C++ vector, sort, and unique instead of set and is 3 or 4 times faster. I made a separate ticket #6297 to reduce the two calculations of the mesh edges to one.

in reply to:  7 comment:5 by Tristan Croll, 4 years ago

Just a note for if/when it becomes desirable to optimise this further: an advantage of the sort/unique approach is that there are well-established and supported GPU algorithms for both tasks. See for example the thrust library (https://docs.nvidia.com/cuda/thrust/index.html - probably not the best general-purpose solution since it's specific to Nvidia, but the documentation is pretty good).

________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 03 March 2022 00:32
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #6243: Mesh calculation from triangles very slow

#6243: Mesh calculation from triangles very slow
----------------------------------+-------------------------
          Reporter:  tic20@…      |      Owner:  Tom Goddard
              Type:  enhancement  |     Status:  closed
          Priority:  moderate     |  Milestone:
         Component:  Graphics     |    Version:
        Resolution:  fixed        |   Keywords:
        Blocked By:               |   Blocking:
Notify when closed:               |   Platform:  all
           Project:  ChimeraX     |
----------------------------------+-------------------------
Changes (by Tom Goddard):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 Fixed.

 The masked edges calculation now uses C++ vector, sort, and unique instead
 of set and is 3 or 4 times faster.  I made a separate ticket #6297 to
 reduce the two calculations of the mesh edges to one.

--
Ticket URL: <https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/6243#comment:4>
ChimeraX <https://www.rbvi.ucsf.edu/chimerax/>
ChimeraX Issue Tracker

in reply to:  8 comment:6 by Tom Goddard, 4 years ago

Thanks.  I don't think GPU optimization of ChimeraX code is likely to happen.  Of course we'll use OpenMM and machine learning Python packages that take advantage of it.  We are too small a team working on too many things to get to that level of optimization in our own code.
Note: See TracTickets for help on using tickets.