Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#5964 closed enhancement (fixed)

Optimizing ISOLDE map zoning using JAX

Reported by: Tristan Croll Owned by: Tristan Croll
Priority: normal Milestone:
Component: Performance Version:
Keywords: Cc: Tom Goddard
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Windows-10-10.0.19041
ChimeraX Version: 1.3rc202112030319 (2021-12-03 03:19:12 UTC)
Description
I've just been working through the introductory JAX tutorial at https://colab.research.google.com/github/google/jax/blob/main/docs/jax-101/01-jax-basics.ipynb. Seems worth a look: it was developed by Google/DeepMind primarily for machine learning work, but it provides ridiculously simple GPU parallelisation of a lot of NumPy tasks. Could there be applications in accelerating the main graphics loop?

OpenGL version: 3.3.0 NVIDIA 497.29
OpenGL renderer: NVIDIA GeForce RTX 2080/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Notebook                        
Model: P7xxTM1
OS: Microsoft Windows 10 Education (Build 19041)
Memory: 68,654,501,888
MaxProcessMemory: 137,438,953,344
CPU: 16 Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
OSLanguage: en-GB
Locale: ('en_GB', 'cp1252')
PyQt5 5.15.2, Qt 5.15.2
Installed Packages:
    -: imerax-clipper
    -himerax-clipper: 0.17.0
    -himerax-isolde: 1.3.dev33
    alabaster: 0.7.12
    appdirs: 1.4.4
    Babel: 2.9.1
    backcall: 0.2.0
    blockdiag: 2.0.1
    certifi: 2021.10.8
    cftime: 1.5.1.1
    charset-normalizer: 2.0.8
    ChimeraX-AddCharge: 1.2.2
    ChimeraX-AddH: 2.1.11
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.2.3
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.1
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.31
    ChimeraX-AtomicLibrary: 4.2
    ChimeraX-AtomSearch: 2.0
    ChimeraX-AtomSearchLibrary: 1.0
    ChimeraX-AxesPlanes: 2.0
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.0
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.6.1
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.2
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.2.2
    ChimeraX-Clipper: 0.17.0
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5
    ChimeraX-CommandLine: 1.1.5
    ChimeraX-ConnectStructure: 2.0
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.3rc202112030319
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.3.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2.2
    ChimeraX-Dicom: 1.0
    ChimeraX-DistMonitor: 1.1.5
    ChimeraX-DistUI: 1.0
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.1
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1.2
    ChimeraX-Help: 1.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ISOLDE: 1.3
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.4
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 2.0.4
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.4
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.2.6
    ChimeraX-ModelPanel: 1.2.1
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.7
    ChimeraX-PDB: 2.6.5
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.0.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0.1
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.4.6
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.6.1
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-Struts: 1.0
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1
    ChimeraX-ToolshedUtils: 1.2
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.13.7
    ChimeraX-uniprot: 2.2
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.0.1
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    comtypes: 1.1.10
    cxservices: 1.1
    cycler: 0.11.0
    Cython: 0.29.24
    decorator: 5.1.0
    docutils: 0.17.1
    filelock: 3.0.12
    funcparserlib: 0.3.6
    grako: 3.16.5
    h5py: 3.6.0
    html2text: 2020.1.16
    idna: 3.3
    ihm: 0.21
    imagecodecs: 2021.4.28
    imagesize: 1.3.0
    ipykernel: 5.5.5
    ipython: 7.23.1
    ipython-genutils: 0.2.0
    jedi: 0.18.0
    Jinja2: 3.0.1
    jupyter-client: 6.1.12
    jupyter-core: 4.9.1
    kiwisolver: 1.3.2
    lxml: 4.6.3
    lz4: 3.1.3
    MarkupSafe: 2.0.1
    matplotlib: 3.4.3
    matplotlib-inline: 0.1.3
    msgpack: 1.0.2
    netCDF4: 1.5.7
    networkx: 2.6.3
    numexpr: 2.8.0
    numpy: 1.21.2
    openvr: 1.16.801
    packaging: 21.3
    ParmEd: 3.4.3
    parso: 0.8.3
    pickleshare: 0.7.5
    Pillow: 8.3.2
    pip: 21.2.4
    pkginfo: 1.7.1
    prompt-toolkit: 3.0.23
    psutil: 5.8.0
    pycollada: 0.7.1
    pydicom: 2.1.2
    Pygments: 2.10.0
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 3.0.6
    PyQt5-commercial: 5.15.2
    PyQt5-sip: 12.8.1
    PyQtWebEngine-commercial: 5.15.2
    python-dateutil: 2.8.2
    python-igraph: 0.9.7
    pytz: 2021.3
    pywin32: 228
    pyzmq: 22.3.0
    qtconsole: 5.1.1
    QtPy: 1.11.2
    RandomWords: 0.3.0
    requests: 2.26.0
    scipy: 1.7.1
    setuptools: 57.5.0
    sfftk-rw: 0.7.1
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    Sphinx: 4.2.0
    sphinx-autodoc-typehints: 1.12.0
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 2.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-jurko: 0.6
    tables: 3.6.1
    texttable: 1.6.4
    tifffile: 2021.4.8
    tinyarray: 1.2.3
    tornado: 6.1
    traitlets: 5.1.1
    urllib3: 1.26.7
    versioneer: 0.21
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.37.0
    wheel-filename: 1.3.0
    WMI: 1.5.1

Attachments (5)

isolde_3io0_frame_times.png (241.3 KB ) - added by Tristan Croll 4 years ago.
ISOLDE improved frame timings 3io0 impact of wireframe.png (239.1 KB ) - added by Tristan Croll 4 years ago.
Code refactored to limit to one new surface per graphics update. Histograms of per-frame times for (left) four transparent surfaces (right) three transparent surfaces + 1 wireframe.
mesh_edges.cpp.diff (1.8 KB ) - added by Tristan Croll 4 years ago.
Added by email2trac
test_mesh.py (464 bytes ) - added by Tom Goddard 4 years ago.
Benchmark mesh edges by rendering map at various levels.
mesh_edges.cpp-1.diff (2.0 KB ) - added by Tristan Croll 4 years ago.
Added by email2trac


Change History (49)

in reply to:  1 ; comment:1 by Tristan Croll, 4 years ago

Their full set of interactive tutorials is at https://jax.readthedocs.io/en/latest/jax-101/index.html.

________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 19 January 2022 19:10
To: Tristan Croll <tic20@cam.ac.uk>
Subject: [ChimeraX] #5964: ChimeraX bug report submission


comment:2 by pett, 4 years ago

Cc: chimera-programmers added
Component: Unassigned → Performance
Owner: set to Tom Goddard
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: ChimeraX bug report submission → Possible numpy speedup
Type: defect → enhancement

comment:3 by Tom Goddard, 4 years ago

AlphaFold uses jax, and my impression of its use in AlphaFold is that it produces less stable code and hard-to-debug errors in exchange for faster speed. A bad deal for ChimeraX. But I am curious: what in ChimeraX would it speed up? We use numpy often, but where numpy is slow I often use C++. Knowing which numpy code in ChimeraX is slowing down users would be helpful.

in reply to:  4 ; comment:4 by Tristan Croll, 4 years ago

Turns out they don't distribute Windows builds on PyPI either. Shame. The places I thought it could help are the various calculations done on large arrays of data (e.g. the bond transforms, calculations on map values, etc.), where it gives GPU acceleration essentially for free. It's certainly not a general-purpose thing and not particularly helpful for small arrays, but it looked like it could help with some of the performance-critical steps... but I can see the deal-breakers.

in reply to:  5 ; comment:5 by goddard@…, 4 years ago

Things like bond transforms and calculating map values are exactly the kind of things that are in C++ now, not in numpy.  So again I think the real question is what specific calculations need optimizing to improve user experience.  Jax in ChimeraX is a solution looking for a problem.  As usual identifying the most important problems is the hard part -- solving them is usually easy.

in reply to:  6 ; comment:6 by Tristan Croll, 4 years ago

Hmm... seems I'd forgotten just how fast the bond drawing update is these days! I get 1-2 ms for a fairly large structure. Judging from the timings in the example notebooks I suspect that on a machine with a good GPU the JAX jit compilation of an equivalent pure-Numpy implementation may bring that down to sub-millisecond - but the downside of course would be that on machines without capable GPUs it would be substantially slower than your existing code. So I can see why you wouldn't want to go that way.

in reply to:  7 ; comment:7 by goddard@…, 4 years ago

Bond drawing update is computing the cylinder position matrices in C++, not with numpy. The C++ is probably about 100 times faster than numpy in this case (I am not exaggerating this; the transform calculation would not vectorize well with all the array copies numpy likes to make). So even if JAX makes numpy 10x faster, it will still be a lot slower than the current C++.
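To make the point about array copies concrete, here is a hypothetical pure-NumPy version of bond cylinder placement. It is not ChimeraX's code, which does this in C++. It assumes a unit cylinder of radius 1 and height 1, centred on the origin along z. Each bond gets a 3x4 affine matrix that scales the cylinder, rotates z onto the bond axis, and moves it to the bond midpoint. The function name and the `radius` default are illustrative only.

```python
import numpy as np

def cylinder_matrices(a, b, radius=0.2):
    """(N, 3, 4) affine matrices mapping a unit cylinder (radius 1, height 1,
    centred on the origin along z) onto bonds running from a[i] to b[i].

    Most lines allocate fresh (N, 3) or (N, 3, 3) temporaries; a single C++
    loop over bonds avoids that memory traffic."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = b - a
    length = np.linalg.norm(d, axis=1)
    w = d / length[:, None]                          # unit bond axis (new z)
    # Helper vector not parallel to w: x axis unless w is close to x.
    helper = np.where(np.abs(w[:, :1]) < 0.9, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    u = np.cross(helper, w)
    u /= np.linalg.norm(u, axis=1)[:, None]          # new x axis
    v = np.cross(w, u)                               # new y axis (right-handed)
    rot = np.stack([u, v, w], axis=2)                # columns are u, v, w
    scale = np.stack([np.full_like(length, radius),
                      np.full_like(length, radius), length], axis=1)
    m = np.empty((len(a), 3, 4))
    m[:, :, :3] = rot * scale[:, None, :]            # rotation times diag(scale)
    m[:, :, 3] = 0.5 * (a + b)                       # translate to bond midpoint
    return m
```

Zero-length bonds would produce NaNs here. A production version would need to guard against them, which adds yet another pass over the arrays.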

in reply to:  8 ; comment:8 by Tristan Croll, 4 years ago

OK, a somewhat more real-world example: ISOLDE's live map updates with crystallographic datasets require reapplying a zone mask every time the map data is updated and re-contoured. That gets painfully slow when the number of vertices involved grows into the hundreds of thousands (particularly since there are typically four contours displayed: two different 2Fo-Fc contours, plus the positive and negative Fo-Fc contours). I did some playing around with optimisation quite some time back, and the fastest approach I came up with at the time was to define the mask as a fairly coarse binary grid, and decide which triangles to display based on the interpolated value at their coordinates (using Volume.interpolated_values, with trilinear interpolation to avoid jagged edges). For a test case with 900k vertices, the interpolated_values() calculation takes 18 ms for each surface. JAX provides its own implementation of scipy.ndimage.map_coordinates(), which does trilinear interpolation when order=1 (note that it only supports orders 0 and 1):

from jax.scipy.ndimage import map_coordinates
# matrix: the 3D map array; points: (N, 3) positions in array-index units, same axis order as matrix
map_coordinates(matrix, points.T, order=1)

That times at about 2 ms on my RTX 2080 laptop, using the Windows jaxlib build from https://github.com/cloudhan/jax-windows-builder and jax from PyPI (vanilla SciPy takes about 55 ms)... although I currently get different answers from v.interpolated_values(), probably because I got the transforms wrong.

Open to suggestions about other ways to improve this, by the way... it's the only real source of framerate drop in ISOLDE these days (turn it off and the rate is a steady 60fps even for big systems).
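Below is a minimal NumPy sketch of the coarse-grid approach described above. It rests on a few assumptions. The mask is a 0/1 array indexed (z, y, x), which I believe is how ChimeraX map matrices are laid out. The grid is axis-aligned, with a known xyz origin and step. `map_coordinates` expects coordinates in the array's own axis order, so one plausible source of mismatched answers is passing xyz points straight to a zyx-ordered array. The helper names (`xyz_to_zyx_indices`, `trilinear`, `vertices_in_zone`) are hypothetical, not ChimeraX or JAX API.

```python
import numpy as np

def xyz_to_zyx_indices(points_xyz, origin, step):
    """Convert (N, 3) xyz points to (3, N) fractional grid indices in (z, y, x)
    order, assuming an axis-aligned grid with the given xyz origin and step.
    This (3, N), array-axis-ordered layout is what map_coordinates expects."""
    ijk = (np.asarray(points_xyz, dtype=float) - origin) / step  # columns i, j, k
    return ijk[:, ::-1].T  # reorder to k, j, i and transpose to (3, N)

def trilinear(array_zyx, coords_zyx):
    """Pure-NumPy trilinear interpolation at (3, N) fractional indices.
    Needs at least 2 points along each axis. Out-of-grid points are clamped to
    the edge (map_coordinates' default mode='constant' returns cval instead)."""
    shape = np.array(array_zyx.shape)
    c = np.clip(coords_zyx, 0, (shape - 1)[:, None])
    lo = np.minimum(np.floor(c).astype(int), (shape - 2)[:, None])  # lo + 1 stays in grid
    f = c - lo  # fractional offsets in [0, 1]
    result = np.zeros(c.shape[1])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((f[0] if dz else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dx else 1 - f[2]))
                result += w * array_zyx[lo[0] + dz, lo[1] + dy, lo[2] + dx]
    return result

def vertices_in_zone(mask_zyx, origin, step, vertices_xyz, cutoff=0.5):
    """Per-vertex bool: the interpolated value of a 0/1 mask grid exceeds cutoff."""
    coords = xyz_to_zyx_indices(vertices_xyz, origin, step)
    return trilinear(np.asarray(mask_zyx, dtype=float), coords) > cutoff
```

For points inside the grid, `jax.scipy.ndimage.map_coordinates(mask, coords, order=1)` on the same `coords` should give the same values as `trilinear`. Edge handling differs, as the docstring notes.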


in reply to:  9 ; comment:9 by Tristan Croll, 4 years ago

Make that about 3.5 ms: a fairer comparison includes pulling a value out of the result, since JAX doesn't copy it back from GPU to CPU memory until then.
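The 2 ms versus 3.5 ms difference is presumably a consequence of JAX's asynchronous dispatch. A call returns before the device has finished, so timing only the call mostly measures dispatch. The helper below is a small, hypothetical timing function (not part of JAX or ChimeraX). It forces completion through a `sync` callback before stopping the clock, and it drops warm-up calls so one-off JIT compilation does not skew the result.

```python
import statistics
import time

def time_call(fn, *args, sync=None, repeats=20, warmup=2):
    """Return the median wall-clock seconds per call of fn(*args).

    sync, if given, is called on each result before the clock stops. For
    asynchronously dispatched results it should block until the work is done;
    otherwise only the dispatch time is measured. Warm-up calls are excluded
    so one-off costs such as JIT compilation do not skew the median."""
    for _ in range(warmup):
        r = fn(*args)
        if sync is not None:
            sync(r)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        r = fn(*args)
        if sync is not None:
            sync(r)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)
```

For a JAX result, `sync=lambda r: r.block_until_ready()` waits for the computation to finish. `sync=np.asarray` also includes the device-to-host copy, which is closer to what the 3.5 ms figure measures.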

Anyway, a better solution to my problem would probably be to rework the contouring algorithm so it never bothers to calculate triangles outside the mask in the first place, making the after-the-fact culling unnecessary. That's way off the topic of this thread, though, and would be my job, since I'm already using a forked version of the contouring method modified to run in its own C++ thread.
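One way to avoid calculating triangles outside the mask is to decide up front which grid cells the contouring pass visits. The sketch below assumes the mask has been resampled onto the same grid as the map, which differs from the coarse 1.5 Å mask ISOLDE currently uses. Under that assumption, a cell is active when any of its eight corner points lies inside the mask. With a trilinearly interpolated 0/1 mask, a cell whose corners are all outside has a mask value of 0 everywhere inside it. `active_cells` is a hypothetical name; the contouring itself is not shown.

```python
import numpy as np

def active_cells(mask_zyx):
    """Boolean (nz-1, ny-1, nx-1) array marking contouring cells that have at
    least one of their 8 corner grid points inside the mask.

    A marching-cubes style contourer could skip every cell where this is
    False, so no triangles are generated outside the zone at all."""
    m = np.asarray(mask_zyx, dtype=bool)
    cells = np.zeros(tuple(s - 1 for s in m.shape), dtype=bool)
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                # Shifted views: corner (dz, dy, dx) of every cell at once.
                cells |= m[dz:m.shape[0] - 1 + dz,
                           dy:m.shape[1] - 1 + dy,
                           dx:m.shape[2] - 1 + dx]
    return cells
```

`np.argwhere(cells)` lists the active cells. Their minimum and maximum indices also give a bounding sub-box, in case a contourer can only be restricted to a box.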

in reply to:  10 ; comment:10 by goddard@…, 4 years ago

There are likely many ways to speed up your zone operation, but I don't understand even the basics of what you are doing. Maybe you contour a very large region and then make a new, much smaller triangle list; maybe you don't make a new triangle list but use the Drawing triangle_mask capability. I didn't understand how you decide which triangles to keep (the coarse binary grid stuff). Aren't you just taking a spherical region, which would involve computing each triangle vertex's distance to the center? And I am baffled about how zone masking could be a slower computation than computing the contour surface itself when the map data changes. So I understand almost nothing about optimizing this zoning. Still, I suspect this is not something any GPU computation would speed up (unless an OpenGL shader culls the triangles for the zone). Zoning seems like such a fast operation, just a linear pass over the triangles, that the parallelization of a GPU is not likely to help because it is memory-bandwidth limited. If you are using the Drawing triangle_mask, it may well be that the bottleneck is some slow numpy that simply culls the full triangle array using the mask. I see that in the Drawing class the masked_elements() routine applies the mask to the Nx3 triangle array using "ta = ta[tmask, :]". Numpy is so darn slow at everything that I can imagine even that simple operation being 10x slower than trivial C++ code.
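The linear pass described above can be sketched in NumPy as two steps. First, a gather plus a reduction turns a per-vertex keep mask into a per-triangle mask. Second, a row selection does what `ta = ta[tmask, :]` does. The function names are hypothetical. Whether a triangle needs all three vertices kept, or just one, is left as a parameter because the thread does not say which rule is used.

```python
import numpy as np

def triangle_mask_from_vertices(triangles, vertex_keep, require_all=True):
    """Per-triangle bool mask from a per-vertex bool mask.

    triangles: (M, 3) integer vertex indices; vertex_keep: (N,) bool.
    With require_all=True a triangle is kept only if all three vertices are
    kept; with require_all=False one kept vertex is enough."""
    per_corner = vertex_keep[triangles]   # (M, 3) bool: one gather pass
    return per_corner.all(axis=1) if require_all else per_corner.any(axis=1)

def masked_triangles(triangles, tmask):
    """Same result as triangles[tmask, :]: copy only the kept rows."""
    return np.compress(tmask, triangles, axis=0)
```

Each NumPy step here makes its own pass over memory and allocates its own temporary. A C++ loop could fuse the gather, the reduction and the copy into a single pass, which is the kind of gain suggested above.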

in reply to:  11 ; comment:11 by Tristan Croll, 4 years ago

This is way off the original topic and is probably better discussed in a new ticket, but let me see if I can summarise the current state of things with respect to volume masking.

First, the why: for ISOLDE, masking the map down to surround some given subset of atoms (i.e. the effect of the "surface zone {map} near {sel}" command) is often necessary, most commonly to be able to clearly see the currently mobile atoms in a simulation. That's particularly important in crystallographic contexts, where the map is composed of infinitely repeating and typically intertwined tiles: if you don't mask, you end up with a huge amount of extra density. To compound the problem, with crystallographic data the map is regularly updating to reflect the atom positions, and it needs to be re-contoured and re-masked every time. Obviously quite a challenge to do without seriously hanging up the graphics... but here's the progress I've made.

Firstly, I've essentially taken the contouring out of the equation by wrapping your existing C++ contouring code into a threaded version (https://github.com/tristanic/chimerax-clipper/tree/master/src/contour) so that part never holds up the graphics (the new contours are applied only when they're ready, on the 'new frame' trigger). The biggest time-sink causing the framerate to drop is the rezoning (i.e. the call to VolumeSurface.auto_remask_triangles()). I've managed some speedups on that (detailed below) but it's still not enough.
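The "compute in a thread, apply on the next 'new frame' trigger" pattern can be sketched in Python with `concurrent.futures`. This is a hypothetical illustration, not the chimerax-clipper code: `AsyncContourer`, `contour_func` and `apply_func` are made-up names, and the real implementation does its threading in C++.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncContourer:
    """Hypothetical sketch: run contouring in worker threads and apply results
    only from the per-frame callback, so a slow contour never blocks drawing.
    Real parallelism needs the contour routine to release the GIL (as C++
    extension code can); pure-Python work would still serialise."""

    def __init__(self, contour_func, max_workers=2):
        self._contour = contour_func
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}                      # surface id -> Future

    def request(self, surface_id, data, level):
        # At most one outstanding job per surface; a request made while one is
        # still running is ignored, and the caller can simply ask again later.
        if surface_id not in self._pending:
            self._pending[surface_id] = self._pool.submit(self._contour, data, level)

    def on_new_frame(self, apply_func):
        # Called on each new frame: apply whatever has finished, never wait.
        for sid, fut in list(self._pending.items()):
            if fut.done():
                del self._pending[sid]
                apply_func(sid, fut.result())

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The key design point is that `on_new_frame` only polls `done()`, so the graphics thread never blocks on a contour that is still running.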

The official ChimeraX "surface zone" command uses chimerax.geometry.find_close_points() to cull the out-of-range vertices. That's fast enough for most standard uses (particularly cryo-EM) with relatively small maps and single contours displayed, but far too slow for this context. For a scenario where model m has 7512 atoms and surface s has 445k vertices:

%timeit find_close_points(m.atoms.coords, s.vertices, 3.0)
52.5 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit find_close_points(m.atoms.coords, s.vertices, 6.0)
134 ms ± 594 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit find_close_points(m.atoms.coords, s.vertices, 12.0)
466 ms ± 995 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

... and multiplying that by 4 surfaces gives you ~200-2000 milliseconds for this step alone. Instead, what I currently do is the following:

1. Generate a binary mask around the atoms on a cubic grid (currently at a spacing of 1.5 A) (core code at https://github.com/tristanic/chimerax-clipper/blob/master/src/maps/_maps/mask.cpp). That's super-fast for small masking radii, but still gets slow for larger radii (although it could be sped up dramatically with little negative effect by more intelligently choosing the grid spacing - something like max(1.5, radius/3)):


%timeit mask.generate_mask(m.atoms.coords, 3, reuse_existing=False)
3.95 ms ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit mask.generate_mask(m.atoms.coords, 6, reuse_existing=False)
14.3 ms ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit mask.generate_mask(m.atoms.coords, 12, reuse_existing=False)
75.8 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Anyway, the advantage here is that this mask can then be re-used for all the surfaces you plan to mask, and it makes the next step much faster:

2. Rather than using find_close_points(), you can just use mask.interpolated_values(s.vertices) > some_threshold to decide which triangles to keep - a job that's O(1) with respect to the mask and O(n) with respect to the vertices.

# 3A cutoff
%timeit mask.interpolated_values(s.vertices) > zm.threshold
8.35 ms ± 29.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 6A cutoff
%timeit mask.interpolated_values(s.vertices) > zm.threshold
11.6 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 12A cutoff
%timeit mask.interpolated_values(s.vertices) > zm.threshold
14.9 ms ± 41.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

(speed difference is simply due to the increased number of vertices)
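The two steps above can be sketched in pure NumPy to show the idea: build a coarse 0/1 grid around the atoms once, then classify every vertex with a single trilinear lookup. This is a minimal sketch, assuming a float32 mask indexed [z, y, x], a 0.5 threshold (the real `zm.threshold` value isn't given here), and hypothetical function names; it is not the chimerax-clipper `mask.cpp` code.

```python
import numpy as np

def mask_spacing(radius, min_spacing=1.5):
    # Spacing heuristic suggested in step 1: max(1.5, radius/3), in Angstroms.
    return max(min_spacing, radius / 3.0)

def generate_mask(coords, radius, spacing):
    """Step 1: float32 0/1 mask indexed [z, y, x]; 1 where a grid point is
    within `radius` of any atom. Returns (mask, origin_xyz)."""
    pad = radius + 2 * spacing
    origin = coords.min(axis=0) - pad
    shape_xyz = np.ceil((coords.max(axis=0) + pad - origin) / spacing).astype(int) + 1
    mask = np.zeros(tuple(int(n) for n in shape_xyz[::-1]), dtype=np.float32)
    for c in coords:
        # Only touch the small sub-box around each atom.
        lo = np.maximum(np.floor((c - radius - origin) / spacing).astype(int), 0)
        hi = np.minimum(np.ceil((c + radius - origin) / spacing).astype(int) + 1, shape_xyz)
        x = origin[0] + spacing * np.arange(lo[0], hi[0])
        y = origin[1] + spacing * np.arange(lo[1], hi[1])
        z = origin[2] + spacing * np.arange(lo[2], hi[2])
        d2 = ((z[:, None, None] - c[2]) ** 2 + (y[None, :, None] - c[1]) ** 2
              + (x[None, None, :] - c[0]) ** 2)
        block = mask[lo[2]:hi[2], lo[1]:hi[1], lo[0]:hi[0]]   # a view into mask
        block[d2 <= radius * radius] = 1.0
    return mask, origin

def trilinear(mask, origin, spacing, points):
    """Step 2: trilinear interpolation of mask at xyz points (0 outside the grid)."""
    g = (points - origin) / spacing
    i0 = np.floor(g).astype(int)
    f = g - i0
    nz, ny, nx = mask.shape
    inside = np.all((i0 >= 0) & (i0 + 1 < [nx, ny, nz]), axis=1)
    out = np.zeros(len(points))
    x, y, z = i0[inside].T
    fx, fy, fz = f[inside].T
    v = np.zeros(len(x))
    for dz, wz in ((0, 1 - fz), (1, fz)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            for dx, wx in ((0, 1 - fx), (1, fx)):
                v += wz * wy * wx * mask[z + dz, y + dy, x + dx]   # 8 lookups per point
    out[inside] = v
    return out
```

Usage would be `keep = trilinear(mask, origin, spacing, vertices) > 0.5`, with the same mask reused for every surface, which is where the saving over per-surface distance searches comes from.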

So for the four-map case we're down from ~200ms to (3.95+8.35*4) ≈ 37ms for the 3A cutoff, and from ~2000ms to (75.8+14.9*4) = 135.4ms for the 12A cutoff. That's fast enough to make the dynamic maps usable, but still a noticeable and annoying hitch in 2D and a complete killer in VR. The most sensible next step would probably be to also put this in its own thread, but I've held off so far because it will make the choreography even more complex than it already is. I guess that's why I was hoping that JAX might offer an easy alternative.
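The per-update arithmetic above, set against a frame budget, as a tiny helper. The 90 fps figure is an assumption (a typical VR headset refresh rate), not something measured in this ticket.

```python
def zone_cost_ms(mask_ms, interp_ms, n_surfaces):
    # One shared mask, then one interpolation pass per surface.
    return mask_ms + interp_ms * n_surfaces

def frame_budget_ms(fps):
    # Time available per frame at a given refresh rate.
    return 1000.0 / fps

for label, mask_ms, interp_ms in (("3A", 3.95, 8.35), ("12A", 75.8, 14.9)):
    cost = zone_cost_ms(mask_ms, interp_ms, 4)
    print(f"{label}: {cost:.1f} ms, {cost / frame_budget_ms(90):.1f} frames at 90 fps")
```

Even the 3A case spans a few frames at VR rates, which is why the hitch is so visible there.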

________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 22 January 2022 01:01
To: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Cc: chimera-programmers@cgl.ucsf.edu <chimera-programmers@cgl.ucsf.edu>
Subject: Re: [ChimeraX] #5964: Possible numpy speedup

#5964: Possible numpy speedup
------------------------------------+-------------------------
          Reporter:  Tristan Croll  |      Owner:  Tom Goddard
              Type:  enhancement    |     Status:  assigned
          Priority:  normal         |  Milestone:
         Component:  Performance    |    Version:
        Resolution:                 |   Keywords:
        Blocked By:                 |   Blocking:
Notify when closed:                 |   Platform:  all
           Project:  ChimeraX       |
------------------------------------+-------------------------

--
Ticket URL: <https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/5964#comment:10>
ChimeraX <https://www.rbvi.ucsf.edu/chimerax/>
ChimeraX Issue Tracker

in reply to:  12 ; comment:12 by goddard@…, 4 years ago

I think this is on topic for this ticket about speeding up numpy with JAX. It is important to identify a case where it would help, to concretely assess the trade-offs.

Thanks for the details.  I did not get from the earlier discussion that this was a zone around a subset of atoms, thought it was the roaming spherical zone previously.  Of course a grid will be faster than find_close_points().  It might be worth trying mask.interpolated_values(s.vertices, method = 'nearest').  The default interpolation method of linear does 8 times more map lookups and adds 10-20 floating point operations per point.  Not sure if that matters at all in the time but worth a try.  Linear may make less jagged surface edges than nearest.  Another thing that may significantly reduce the mask interpolation time is avoiding reallocating the returned array of interpolated values.  The map_data.arrays.interpolated_data_values() takes a values argument (numpy float32 array equal in length to points array).  Allocating numpy arrays can be very slow so avoiding that by using the same array each time might help.  Of course your vertex array size keeps changing on each recontour, but just as a test to see how much this is slowing it down you might allocate a big enough array to handle the case you are testing.
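The two suggestions here (nearest-neighbour lookup and reusing the output array) can be combined in a small NumPy sketch. `nearest_values` is a hypothetical stand-in, not the `map_data.arrays.interpolated_data_values()` signature; it assumes a mask indexed [z, y, x] with a given xyz origin and grid spacing.

```python
import numpy as np

def nearest_values(mask, origin, spacing, points, out=None):
    """Nearest-grid-point lookup of mask (indexed [z, y, x]) at xyz points.
    One memory read per point, versus eight for trilinear interpolation.
    If `out` is given (float32, at least len(points) long) it is reused
    instead of allocating a new result array each call."""
    n = len(points)
    if out is None:
        out = np.empty(n, dtype=np.float32)
    res = out[:n]                                   # view into the reusable buffer
    idx = np.rint((points - origin) / spacing).astype(np.intp)
    nz, ny, nx = mask.shape
    inside = np.all((idx >= 0) & (idx < [nx, ny, nz]), axis=1)
    res[:] = 0                                      # points off the grid read as 0
    res[inside] = mask[idx[inside, 2], idx[inside, 1], idx[inside, 0]]
    return res
```

Only the result buffer is reused here; the index arithmetic still allocates temporaries, which a C++ loop would avoid. Sizing the buffer to the largest expected vertex count lets it survive recontours that change the vertex count.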

Another thought on optimizing that mask operation.  If your atom subset is small then almost all of those surface points are far from atoms and it may be worth culling those far points faster or sooner.  To cull them sooner you could compute the contour surface only in a minimal size box that contains the atoms plus a little padding.  This will of course immensely speed up the contour surface calculation too.  Maybe you already do that.
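Computing that minimal box is a couple of lines; here is a sketch, assuming an axis-aligned map with a known xyz origin and uniform spacing, data indexed [k, j, i], and hypothetical helper names.

```python
import numpy as np

def padded_ijk_region(atom_xyz, origin, spacing, grid_shape_xyz, pad):
    """Index bounds (inclusive lo, exclusive hi) of the grid sub-box enclosing
    the atoms plus `pad` Angstroms, clipped to the map. Contouring only this
    sub-box means far-away density is never triangulated at all."""
    lo = np.floor((atom_xyz.min(axis=0) - pad - origin) / spacing).astype(int)
    hi = np.ceil((atom_xyz.max(axis=0) + pad - origin) / spacing).astype(int) + 1
    lo = np.clip(lo, 0, grid_shape_xyz)
    hi = np.clip(hi, 0, grid_shape_xyz)
    return lo, hi

def crop(data_zyx, lo, hi):
    # Data is indexed [k, j, i] (z, y, x), so the xyz bounds are applied reversed.
    return data_zyx[lo[2]:hi[2], lo[1]:hi[1], lo[0]:hi[0]]
```

The crop is a view, so it costs nothing until the contour code reads it.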

Another major factor is how often you rezone. If you rezone every graphics frame when the atoms have only moved a tiny amount and you don't have a new contour surface, this is just wasted time. So reducing the updates is another approach. This has the drawback that if you update every 10 frames you may end up seeing a glitch every 10 frames (several times a second), which is visually annoying -- definitely a disaster for VR.

Maybe the most sensible approach is to put the rezoning in a separate thread as you mentioned.

Also I am not sure you've captured all the factors of the rezoning affecting the frame rate. There is code to compute the triangle mask from the vertex mask. Then changing the triangle mask every frame is probably going to have a significant cost in the graphics code, which will be applying that mask to get a reduced triangle array, and allocating a new OpenGL vertex buffer and filling it. The time for that could be significant.


comment:13 by Tom Goddard, 4 years ago

Cc: Tom Goddard added; chimera-programmers removed
Owner: changed from Tom Goddard to Tristan Croll
Summary: changed from Possible numpy speedup to Optimizing ISOLDE map zoning using JAX

comment:14 by Tristan Croll, 4 years ago

Yes - at the moment I limit the re-masking to twice a second, or when the map values change. If the map is unchanging, then there's an option in the GUI to turn off remasking entirely during simulations - so the problem can be pretty trivially avoided for the EM case. The mask is indeed calculated in a minimal box+padding around the model. I ended up going for the linear interpolation rather than nearest-neighbor because it allowed for a *much* coarser mask while still giving quite nice-looking results - the jagged artifacts you get with a coarse mask and nearest-neighbor interpolation aren't pretty.

You're right that this doesn't cover *all* of the rate-limiting factors, but it's definitely the single most costly step - roughly 70% of the cost of a map update, I think.
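The update policy described above (remask at most twice a second, or whenever the map values change) could be expressed as a small gate. This is a hypothetical sketch of one reading of that policy, not ISOLDE's code; the injectable clock is only there to make it testable.

```python
import time

class RemaskThrottle:
    """Hypothetical helper: decide whether to remask on this frame.
    Remask whenever the map has changed, or when atoms moved and at least
    `min_interval` seconds have passed since the last remask."""

    def __init__(self, min_interval=0.5, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock
        self._last = None            # time of the last remask, None before the first

    def should_remask(self, map_changed, atoms_moved):
        now = self._clock()
        due = self._last is None or (now - self._last) >= self.min_interval
        if map_changed or (atoms_moved and due):
            self._last = now
            return True
        return False
```

A map-driven remask resets the timer, so atom-driven remasks don't pile up right after a new contour arrives.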

comment:15 by Tristan Croll, 4 years ago

Hmm - actually, I'm forgetting what I have and haven't done. The GUI *doesn't* have the option to turn off the remasking at the moment (although it should - oversight on my part) - what it does have is the ability to turn off the live crystallographic structure factor recalculation, which has the side-effect of turning off the live remasking. Will look into correcting that.

in reply to:  16 ; comment:16 by goddard@…, 4 years ago

Your test timing with 445K vertices is spending most of its time in the interpolated_values() call, 8 msec. As I mentioned before, these linear scans of data are going to be memory-bandwidth limited. For instance, your vertices are 5 Mbytes (445K * 12 bytes) of data; to read that in, say, 1 msec would require 5 Gbytes/sec of bandwidth. And it is doing trilinear interpolation, so it samples 8 float values for each vertex, reading roughly another 14 Mbytes (445K * 8 * 4 bytes) from the mask array (random access, which could result in cache misses if the mask array does not fit in CPU cache), so a 1 msec time would take about 20 Gbytes/sec of bandwidth at least. With clock speeds of a few GHz, and reading all these 4-byte values so you aren't using the full memory bus data width, it may simply be impossible to get to 1 msec. I'd guess another factor of 2 speedup over your 8 msec could be obtained if you work hard enough. JAX is unlikely to get you very far, because you just add the cost of transferring all this data back and forth to the GPU, although the GPU memory bandwidth will probably be a good deal higher and things like interpolation could potentially be done in parallel. Still, I would say you need to think about how fast is good enough, and about what user interface would allow users to turn off some of the slow stuff if they need to.
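The bandwidth estimate is simple arithmetic; as a function it is easy to try other sizes or sampling schemes. Assumptions: float32 xyz vertices, float32 mask samples, 8 samples per vertex for trilinear and 1 for nearest, with output writes and cache behaviour ignored. Under these assumptions 445K vertices need about 19.6 GB/s for a 1 ms pass, or about 2.4 GB/s at the measured 8 ms.

```python
def min_bandwidth_gb_per_s(n_vertices, ms, samples_per_vertex=8,
                           bytes_per_coord=4, bytes_per_sample=4):
    """Lower bound on memory bandwidth (GB/s) to finish one pass in `ms`.
    Counts only vertex reads and mask sample reads, not output writes."""
    vertex_bytes = n_vertices * 3 * bytes_per_coord          # xyz per vertex
    mask_bytes = n_vertices * samples_per_vertex * bytes_per_sample
    return (vertex_bytes + mask_bytes) / (ms * 1e-3) / 1e9
```

Setting `samples_per_vertex=1` shows how much of the traffic a nearest-neighbour lookup would remove (about 7.1 GB/s for the same 1 ms target).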

comment:17 by Tristan Croll, 4 years ago

Found a few opportunities for optimisation without going to the extent of threading the masking task - the more I look at that, the more I realise it would take a massive overhaul to do well, and I'm no longer convinced it's the source of the pronounced "hitches". The biggest gain was actually in copying the volume data to the thread: I was using a function that takes the data in kji order and copies it (in C++, using PyBind11's wrapper around numpy slicing) into the ijk order required by the contouring code. But then I remembered that Clipper's crystallographic maps are natively in ijk order already (what Volume sees is the transposed view), so all I had to do was provide it from Python as numpy.ascontiguousarray(data.T), and the C++ copying became a simple linear iteration. For a 900k voxel test case that cuts the cost of that step from 3ms to 1ms per surface (times 4 surfaces in the standard view). I also found where I was doing some silly duplication of work because I hadn't fully thought through the change from find_close_points() to my mask-based approach. Things are definitely a bit smoother now, with most of the frames in which the maps are recontoured taking about 15 ms longer than the typical frame... but there's still a handful of frames that take substantially longer, particularly when the map is larger. I'll attach some illustrative histograms using the isolde demo crystal_intro test case. On my machine the maps are updating about 3-4 times per second, with an overall graphics rate (reported by `graphics rate true`) of around 45-50 fps, so from the ratio of peak heights it's pretty clear where (most of) the map update frames fall. But where does the handful of much slower frames come from? Python's garbage collection cleaning up the discarded surfaces, perhaps?
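Why `numpy.ascontiguousarray(data.T)` is essentially free here: if the kji array Volume sees is itself a transposed view of a C-contiguous ijk buffer, transposing back restores C order and NumPy returns the original memory without copying. A small demonstration, with `as_ijk_contiguous` as a hypothetical wrapper name:

```python
import numpy as np

def as_ijk_contiguous(kji_data):
    """Return an ijk-ordered, C-contiguous array for C++ code that iterates
    linearly. If kji_data is a transposed view of an ijk C-contiguous array,
    no copy is made; otherwise NumPy copies into C order."""
    return np.ascontiguousarray(kji_data.T)

native_ijk = np.arange(24, dtype=np.float32).reshape(4, 3, 2)  # stand-in for the native map
kji_view = native_ijk.T                                        # what the Volume side sees
ijk = as_ijk_contiguous(kji_view)
print(ijk.flags["C_CONTIGUOUS"], np.shares_memory(ijk, native_ijk))
```

If the kji data were genuinely stored in kji order, the same call would fall back to a full copy, so the saving depends on how the map is stored natively.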

by Tristan Croll, 4 years ago

Attachment: isolde_3io0_frame_times.png added

in reply to:  19 comment:18 by Tristan Croll, 4 years ago

Running cProfile over the entire ChimeraX application with ISOLDE running the demo simulation for a few minutes suggests that the main remaining source of lag is `chimerax.graphics._graphics.masked_edges`:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1  115.156  115.156  245.959  245.959 {built-in method exec_}
     1146   24.355    0.021   24.355    0.021 {built-in method chimerax.graphics._graphics.masked_edges}
  114/108    7.304    0.064    7.317    0.068 {built-in method _imp.create_dynamic}

... which if I understand correctly is only being called for the map displayed as wireframe. Since each individual surface is being contoured asynchronously, I think all the really big lag spikes are happening when multiple surfaces happen to be ready in the same frame. If I switch all maps to transparent surface mode everything gets noticeably smoother (backed up by a histogram of frame timings without any of the 80-100ms events). Anyway, is there a way for me to avoid the `masked_edges()` call, and tell it to just display all the edges?

Beyond that, I suppose the next easiest thing I could do without making enormous changes would be to add some mechanism to make sure only one contour is applied in a given graphics update. Wouldn't improve the overall frame rate, but at least the effects would be smoothed over multiple frames.
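Limiting how many finished contours are applied per graphics update can be done with a small queue drained in the new-frame callback. This is a hypothetical sketch (`OnePerFrameApplier` is a made-up name). It assumes `add_ready` is called from the GUI thread; if worker threads call it directly, guard the deque with a lock.

```python
from collections import deque

class OnePerFrameApplier:
    """Hypothetical sketch: finished contours are queued, and at most
    `per_frame` of them are applied on each new-frame callback, so several
    surfaces finishing together are spread over consecutive frames."""

    def __init__(self, per_frame=1):
        self.per_frame = per_frame
        self._ready = deque()

    def add_ready(self, surface_id, result):
        # Drop any older queued result for the same surface; only the newest matters.
        self._ready = deque((s, r) for s, r in self._ready if s != surface_id)
        self._ready.append((surface_id, result))

    def on_new_frame(self, apply_func):
        for _ in range(min(self.per_frame, len(self._ready))):
            sid, result = self._ready.popleft()
            apply_func(sid, result)
```

As noted, this does not reduce total work; it only spreads it so no single frame absorbs several surface updates.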



________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 21 February 2022 19:39
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #5964: Optimizing ISOLDE map zoning using JAX


by Tristan Croll, 4 years ago

Code refactored to limit to one new surface per graphics update. Histograms of per-frame times for (left) four transparent surfaces (right) three transparent surfaces + 1 wireframe.

comment:19 by Tristan Croll, 4 years ago

Limiting to one surface update per new frame turned out to be easier than I expected, and it definitely does smooth things out a bit. If I set all the surfaces to transparent, then with the whole of 3io0 mobile (229 residues) I'm getting ~55 fps on my Windows machine (specs in the original post on the ticket). Switching one to wireframe brings back the occasional big delay. If the masked_edges() call can be factored out then we're at least starting to approach VR-level performance even with live maps running.

in reply to:  22 comment:20 by goddard@…, 4 years ago

I will look at how masked_edges() can be optimized.

comment:21 by Tom Goddard, 4 years ago

The _graphics.masked_edges() code is already custom-written C++. It takes a triangulated surface and computes a list of edges to draw the mesh. It is slow because it uses C++ std::set, which is a dog (a binary tree instead of a hash map), in order to remove duplicate edges: usually one triangle will have an edge from vertex i to j and another triangle will have an edge from j to i, and we only want one of these directed edges. If we knew that the surface is closed and manifold (exactly two triangles meet at each edge, with opposite edge directions) we could just take the i<j edge and would not need a set. But the code has no way to know if the surface is closed.

I don't see any easy way to optimize the set out of this, but tell me if you have an idea.
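For comparison, here is a NumPy sketch (hypothetical function names) of three ways to deduplicate mesh edges: a hash set, a sort-based `np.unique`, and the closed-manifold i<j shortcut described above. It also shows why the shortcut is unsafe for zoned surfaces: removing one face of a tetrahedron leaves boundary edges that only appear as j->i, and the shortcut drops them. Whether a sort-based approach in C++ would beat std::set has not been tested here.

```python
import numpy as np

def edges_with_set(triangles):
    """Undirected edges via a hash set (Python's set is hash-based,
    closer to std::unordered_set than to std::set)."""
    seen = set()
    for a, b, c in triangles.tolist():
        for i, j in ((a, b), (b, c), (c, a)):
            seen.add((i, j) if i < j else (j, i))
    return np.array(sorted(seen), dtype=np.int32).reshape(-1, 2)

def _directed_edges(triangles):
    # Every triangle contributes three directed edges: 0->1, 1->2, 2->0.
    return np.concatenate([triangles[:, [0, 1]], triangles[:, [1, 2]],
                           triangles[:, [2, 0]]])

def edges_with_unique(triangles):
    """Undirected edges via sorting: normalise each edge to (min, max) and
    let np.unique remove duplicates, with no tree or hash table."""
    e = _directed_edges(triangles)
    e.sort(axis=1)
    return np.unique(e, axis=0)

def edges_closed_manifold(triangles):
    """The i<j shortcut: valid ONLY for closed, consistently oriented manifolds,
    where every edge appears once as i->j and once as j->i."""
    e = _directed_edges(triangles)
    return e[e[:, 0] < e[:, 1]]

tet = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]], dtype=np.int32)
print(len(edges_with_unique(tet)), len(edges_closed_manifold(tet)))          # closed
print(len(edges_with_unique(tet[:3])), len(edges_closed_manifold(tet[:3])))  # one face masked out
```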

in reply to:  24 ; comment:22 by Tristan Croll, 4 years ago

Well... switching to the hash-based `unordered_set` as per the attached diff gets a roughly 30% speedup (from 22 to 16ms) on the 3io0 demo model. For a much bigger test case (7ogu, ~2.1m triangles per surface) the overall effect is closer to 2-fold - the slow frames take 500-600ms vs 1000-1100. Not a silver bullet, but an easy improvement... I'm guessing the ultimate solution once again boils down to threading.
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 23 February 2022 08:05
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #5964: Optimizing ISOLDE map zoning using JAX



mesh_edges.cpp.diff

by Tristan Croll, 4 years ago

Attachment: mesh_edges.cpp.diff added

Added by email2trac

comment:23 by Tom Goddard, 4 years ago

I could change to std::unordered_set. I recall benchmarks where it was sometimes slower and sometimes faster than std::set. In theory it should be faster, but every time I investigated switching some code from set to unordered_set it was a bust, with erratic improvement, so I did not switch.

comment:24 by Tom Goddard, 4 years ago

If you think you have done adequate test cases for unordered_set and want me to switch it say so.

in reply to:  28 comment:25 by Tristan Croll, 4 years ago

I'd say go for it - in my tests it's consistently and substantially faster for this application. Makes sense, I think: unordered_set is supposed to be O(1) on average where set is O(logN) - so you'd expect to see the biggest gains on very large sets like this.
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 23 February 2022 19:10
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #5964: Optimizing ISOLDE map zoning using JAX


comment:26 by Tom Goddard, 4 years ago

I ran a benchmark on your unordered_set change, 34% slower (206 seconds vs 154 seconds) using EMDB map 12873 with 700,000 to 10,000,000 triangle meshes on macOS 10.15.7, Intel quad-core i7, MacBookPro10,1.

Just like all my past experience, C++ unordered_set vs set is a total crap shoot.

I'm going to leave using std::set, unless convincing testing on multiple platforms is done.

My benchmark code is attached.
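Since Windows and macOS results disagree, a harness that reports medians over several runs and several input sizes makes cross-platform comparisons easier to trust. This is a generic sketch, not the attached test_mesh.py (whose contents aren't shown here); `compare` is a hypothetical helper and the candidate functions are placeholders.

```python
import statistics
import timeit

def compare(funcs, make_input, sizes, repeat=5, number=1):
    """Median seconds per call for each named function at each input size.
    Medians over several repeats are less sensitive to one-off stalls
    than a single timing; run the same script on each target platform."""
    results = {}
    for n in sizes:
        data = make_input(n)
        for name, f in funcs.items():
            times = timeit.repeat(lambda: f(data), repeat=repeat, number=number)
            results[(name, n)] = statistics.median(times) / number
    return results
```

For example, `compare({"set": lambda a: sorted(set(a.tolist())), "unique": np.unique}, make_input, [10**5, 10**6])` would time two deduplication strategies over growing inputs.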

by Tom Goddard, 4 years ago

Attachment: test_mesh.py added

Benchmark mesh edges by rendering map at various levels.

in reply to:  31 comment:27 by Tristan Croll, 4 years ago

Sigh. That's a real shame.
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 23 February 2022 20:04
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #5964: Optimizing ISOLDE map zoning using JAX


in reply to:  32 ; comment:28 by Tristan Croll, 4 years ago

The caveat, of course, is that so far I've only tested on Windows...

comment:29 by Tom Goddard, 4 years ago

Out of curiosity I ran my benchmark with surface style instead of mesh: 31 seconds, 5 times faster than mesh! It's not clear that this is all due to the mesh edges code; it could also be other parts of the OpenGL rendering.

Another test shows that mesh display really is about 5 times slower because of the std::set. I replaced the std::set with a std::vector and kept only the edges with i0 < i1. Since these meshes always have one edge (i0,i1) and one (i1,i0), this gives the right answer in this case: 36 seconds in my mesh benchmark displaying mesh, vs 154 seconds with std::set.
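A minimal C++ sketch of the std::vector approach described above, assuming triangles arrive as arrays of three vertex indices and the mesh is closed and consistently oriented, so each edge appears exactly once as (i0,i1) and once as (i1,i0). The function name `unique_edges_oriented` is hypothetical; this is not the ChimeraX mesh_edges code.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: extract each undirected edge once from a closed,
// consistently oriented triangle mesh. Every edge occurs once as (i0,i1) in
// one triangle and once as (i1,i0) in its neighbour, so keeping only the
// sides with i0 < i1 removes duplicates without any set or hash lookup.
std::vector<std::pair<int, int>>
unique_edges_oriented(const std::vector<std::array<int, 3>>& triangles)
{
    std::vector<std::pair<int, int>> edges;
    edges.reserve(triangles.size() * 3 / 2);   // closed mesh: E = 3T/2
    for (const auto& t : triangles) {
        for (int k = 0; k < 3; ++k) {
            int i0 = t[k];
            int i1 = t[(k + 1) % 3];
            if (i0 < i1)                        // keep one of the two copies
                edges.emplace_back(i0, i1);
        }
    }
    return edges;
}
```

The filter costs one comparison per triangle side and no lookups. On meshes with boundary edges or mixed winding it silently drops or duplicates edges, so it is only safe under the stated assumption.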

So optimization here would be nice. It is a shame the C++ standard library does not have a fast hash map implementation.

in reply to:  34 comment:30 by Tristan Croll, 4 years ago

One more thing that might be worth a try: since you know in advance how many triangles there are, you can use unordered_map.reserve() to allocate space up front for the expected number of edges (n_triangles * 2 is a generous estimate, I think). Doing so (and also using a slightly different hash that should be collision-proof) seems to give slightly better performance on my Windows machine; perhaps the effect will be more pronounced on the Mac? It's possible that the Mac implementation is just more conservative about how much extra space it allocates when it needs to resize, which is somewhat expensive because the hash table has to be rebuilt each time.
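A sketch of the reserve() idea, assuming each edge is normalized to (min, max) and packed into a 64-bit key. It uses std::unordered_set (the container discussed elsewhere in this ticket; reserve() exists on unordered_map too), and `unique_edges_hashed` is a hypothetical name. Reserving 2 × n_triangles slots up front typically avoids any rehashing during insertion, since a closed mesh has only 1.5 edges per triangle.

```cpp
#include <array>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: deduplicate undirected edges with a hash set whose
// capacity is reserved before inserting anything.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
unique_edges_hashed(const std::vector<std::array<std::uint32_t, 3>>& triangles)
{
    std::unordered_set<std::uint64_t> seen;
    // Generous estimate of ~2 edges per triangle (a closed mesh has 1.5).
    seen.reserve(triangles.size() * 2);

    std::vector<std::pair<std::uint32_t, std::uint32_t>> edges;
    for (const auto& t : triangles) {
        for (int k = 0; k < 3; ++k) {
            std::uint32_t a = t[k];
            std::uint32_t b = t[(k + 1) % 3];
            if (a > b) std::swap(a, b);                 // (min, max) order
            std::uint64_t key = (std::uint64_t(a) << 32) | b;
            if (seen.insert(key).second)                // true on first sight
                edges.emplace_back(a, b);
        }
    }
    return edges;
}
```

Unlike the i0 < i1 filter, this works for any mesh, including ones with boundaries, at the cost of one hash lookup per triangle side.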

mesh_edges.cpp-1.diff

by Tristan Croll, 4 years ago

Attachment: mesh_edges.cpp-1.diff added

Added by email2trac

in reply to:  36 ; comment:31 by Tristan Croll, 4 years ago

Another thought before I turn in for the night: what are the practical consequences of just not removing duplicate edges? I just tried replacing the unordered_map with a simple vector, and for my 7ogu test case the maximum time per frame dropped from ~800 ms to ~300 ms. The visual appearance seems identical.

comment:32 by Tom Goddard, 4 years ago

Just keeping all edges would usually produce twice as many edges, each one drawn right on top of its duplicate. It will be slower to render. It will result in bad behavior in rare operations like converting a mesh to a ball-and-stick model. It mostly just trades one performance issue (calculating edges) for another (rendering speed). I am not in favor of optimizations that screw up the resulting data -- that just sabotages future work, which will not expect all the duplicate edges.
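To make the "twice as many edges" point concrete: a closed mesh with T triangles has 3T triangle sides but only 3T/2 distinct edges, since each edge is shared by two triangles; boundary edges belong to a single triangle, hence "usually". The illustrative `count_edges` helper below (a hypothetical name) counts both, using sort + std::unique for deduplication, which is a different technique from the std::set used in ChimeraX.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: number of edges emitted if duplicates are kept
// (one per triangle side) versus the number of distinct undirected edges.
struct EdgeCounts {
    std::size_t all_sides;
    std::size_t distinct;
};

EdgeCounts count_edges(const std::vector<std::array<int, 3>>& triangles)
{
    std::vector<std::pair<int, int>> sides;
    sides.reserve(triangles.size() * 3);
    for (const auto& t : triangles) {
        for (int k = 0; k < 3; ++k) {
            int a = t[k];
            int b = t[(k + 1) % 3];
            sides.emplace_back(std::min(a, b), std::max(a, b));
        }
    }
    std::size_t all = sides.size();
    // Sorting brings identical (min, max) pairs together; unique drops repeats.
    std::sort(sides.begin(), sides.end());
    sides.erase(std::unique(sides.begin(), sides.end()), sides.end());
    return {all, sides.size()};
}
```

For a closed tetrahedron (4 triangles) keeping all sides gives 12 edges versus 6 distinct ones; for two triangles sharing a diagonal it gives 6 versus 5.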

I've spent too much time on this optimization already. You will have to work on it and do a careful analysis (i.e. on different platforms and data sets) if you think it is worth your time.

in reply to:  38 comment:33 by Tristan Croll, 4 years ago

Point taken. Will try to avoid taking more of your time, but please bear with me through one more post: if I were to adjust the code to make the duplicate edge culling optional (and true by default) would you consider incorporating that? Then I could turn it off just for those maps controlled by Clipper. My reasoning is that the contour surfaces generated by Clipper, like essentially all ISOLDE's visualisations, are designed to be dynamic and ephemeral things. Not meant for production graphics or export elsewhere, not pickable - just a visual guide to the current state of the density. In the scenarios that really matter they're also on the relatively small side. Since the GPU seems to handle the extra edges with aplomb, it's an easy opportunity for really significant improvement. After turning off the removal of duplicate edges last night, I ran through fixing up the "isolde demo crystal_intro" with all atoms mobile... a buttery smooth experience, with just barely-noticeable lag on map updates.

One other question/comment: unless I misunderstand, duplicate edge culling is currently only happening for wireframe contours, not solid surfaces. How does that tally with your comments regarding rendering speed etc.?

in reply to:  39 ; comment:34 by Tristan Croll, 4 years ago

Sorry... one thing that does feel like a bug: it turns out _graphics.masked_edges() is being called twice for each wireframe surface, once for the surface drawing itself and once for the highlight (even when nothing is actually highlighted). That doesn't seem right to me?

comment:35 by Tom Goddard, 4 years ago

I am not liking the idea of adding, say, a Drawing.allow_duplicate_mesh_edges attribute that then gets passed to the C++. This exposes some ugly optimizations at 3 levels: your ISOLDE code, the graphics Python code, and the graphics C++ code. Adding that to our graphics system, which is already too complex for us to adequately maintain, seems like a bad idea. Better is to make an optimization directly in the C++ that fixes the problem for all meshes and contains the complexity in one piece of code that will almost never be looked at or changed. So every effort should be made to make the optimization there. Have you tried the unordered_set.reserve() call? I doubt it will help. But it is surely crazy to put optimization code in 3 places without testing simpler ideas.

comment:36 by Tom Goddard, 4 years ago

General suggestion: when replying to tickets, delete the previous quoted messages. Otherwise the Trac database page for this ticket is covered with repeated messages, making it cumbersome to read through the comments.

comment:37 by Tristan Croll, 4 years ago

I did a bit more messing around, but didn't find anything amazing. Calling unordered_set.reserve() perhaps helped a little. What seemed to help more was improving the hash function: the simple edge.first << 32 | edge.second was still too "lumpy", leading to many buckets with 8-9 entries even when max_load_factor was set to 0.25 (i.e. the number of buckets was 4 times the expected number of edges). Replacing it with std::hash<size_t>{}(edge.first << 32 | edge.second) did a lot better (no more than 3 edges in any bucket), giving a further slight improvement. Nothing to write home about, though.
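One plausible reading of the "lumpy" buckets, sketched under stated assumptions: if the hash table picks a bucket from the low bits of the hash (as power-of-two-sized tables do), then with the packed key edge.first << 32 | edge.second the bucket depends only on edge.second, so every edge ending at the same vertex lands in the same bucket. Whether std::hash<size_t> mixes bits is implementation-defined (libstdc++ and libc++ return the value unchanged), so this sketch swaps in an explicit mixer, the splitmix64 finalizer, which behaves the same on every platform. Both struct names are hypothetical, and a 64-bit size_t is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Packed key: unique per edge, but its low bits are just edge.second, so a
// table that masks low bits puts all edges sharing that vertex together.
struct PackedEdgeHash {
    std::size_t operator()(const Edge& e) const noexcept {
        return static_cast<std::size_t>((std::uint64_t(e.first) << 32) | e.second);
    }
};

// Same key passed through the splitmix64 finalizer so that every input bit
// influences the low output bits used to choose a bucket.
struct MixedEdgeHash {
    std::size_t operator()(const Edge& e) const noexcept {
        std::uint64_t x = (std::uint64_t(e.first) << 32) | e.second;
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return static_cast<std::size_t>(x);
    }
};
```

Either struct plugs in as the hasher template argument, e.g. `std::unordered_set<Edge, MixedEdgeHash>`, alongside the max_load_factor and reserve() settings described above.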

I also did a bit of looking around and came upon this header-only hash map implementation (https://github.com/Tessil/robin-map), boasting some pretty impressive benchmarks (https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html) and an API nearly identical to the STL one. But either I did something horribly wrong or it's broken: when I added it and tried running the result, it froze my entire machine for a few minutes before crashing ChimeraX. So I won't be trying that again.

Is the fact that masked_edges runs twice for every mesh redraw expected behaviour? Unless I'm missing something it's running on the same set of triangles both times, so the highlight drawing could just use (or at least start from) the result of the first one?

comment:38 by Tom Goddard, 4 years ago

You are right: _graphics.masked_edges() is being run twice on the same triangles with the same edge and triangle masks, so it is a completely duplicated calculation every time the mesh changes when nothing is selected. That is rather poor. But yes, it is expected. The code keeps a set of OpenGL buffers for the drawing and another for the highlighted part of the drawing; both are always updated when the triangles change, even if nothing is selected, and each computes its buffer contents independently. Many optimizations could be done, and it would of course be nice to make them. But the code is already very complex, and update failures caused by optimizations that missed some of the cases needing an update are so rampant that it will take careful consideration to find a simple and bug-free optimization. I don't really have time for it. This is comment 38 on this ticket and I have hundreds of other issues. Sorry, I know it is pathetic that the graphics is gratuitously twice as slow as it needs to be. Unfortunately a well-tuned graphics library would take a full-time developer entirely devoted to it, so ours is far, far from being optimized. If you want to investigate that labyrinth because it is worth it for your use, let me know what you find. Any fix will need to be simple and dead-obvious that it does not introduce updating bugs.
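A hypothetical sketch of one simple way to avoid the duplicated calculation: cache the last edge computation together with its inputs and reuse the result when the highlight drawing asks again with identical triangles and mask. The class name, the flat-array layouts, and the injected compute function are all assumptions, not ChimeraX's _graphics API.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical memo cache: returns the previous result when called again
// with identical inputs, e.g. by the main drawing and then its highlight.
class MaskedEdgeCache {
public:
    using Triangles = std::vector<std::int32_t>;   // 3 vertex indices per triangle
    using Mask      = std::vector<std::uint8_t>;   // one flag per triangle
    using Edges     = std::vector<std::int32_t>;   // 2 vertex indices per edge
    using Compute   = std::function<Edges(const Triangles&, const Mask&)>;

    explicit MaskedEdgeCache(Compute compute) : compute_(std::move(compute)) {}

    const Edges& get(const Triangles& tris, const Mask& mask) {
        // Compare full contents (O(n)) instead of trusting a change flag.
        if (!valid_ || tris != tris_ || mask != mask_) {
            tris_ = tris;
            mask_ = mask;
            edges_ = compute_(tris, mask);
            valid_ = true;
        }
        return edges_;
    }

private:
    Compute compute_;
    Triangles tris_;
    Mask mask_;
    Edges edges_;
    bool valid_ = false;
};
```

Keying the cache on the full input contents, rather than on a change flag, means it cannot hand back stale edges when some update path forgets to invalidate it; the price is a copy of the inputs and an O(n) comparison per call, which is much cheaper than rebuilding the edge set.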

comment:39 by Tom Goddard, 4 years ago

Made a new ticket #6243 for optimizing mesh edge calculation from triangulated surface.

comment:40 by Tom Goddard, 4 years ago

I will look at this and post results to #6243. Always making twice as many OpenGL buffers as needed when nothing is highlighted is too horrible. I had forgotten that I used to like optimization problems, back when I had more time. Sorry for being cranky in earlier comments -- too much work, too little fun. I definitely appreciate your discovery of these problems.

comment:41 by Tristan Croll, 4 years ago

Don't worry - I know that feeling far too well lately!

comment:42 by Tom Goddard, 4 years ago

Resolution: fixed
Status: assigned → closed

Fixed.

Improved mesh edge performance by about 6x (tickets #6243 and #6297). The optimizations did not use JAX.

in reply to:  48 comment:43 by Tristan Croll, 4 years ago

Spectacular - thanks so much for your help with this! Just trying out the 3io0 demo model in today's daily build - with the whole model mobile the lag in updating maps is now essentially unnoticeable. Where before there were 80-100ms frames 1-2 times per second, now there are no frames over 50ms. 81% of frames are faster than 25ms, and 98% faster than 40ms. Overall framerate is steady at just over 50fps. This will make a massive difference to overall usability with crystallographic data.

comment:44 by Tom Goddard, 4 years ago

Yeah, easy optimizations like this improve the user experience with so little coding effort. If I ever object to optimizing something easy, remind me of this!
