Opened 4 years ago

Closed 4 years ago

#5143 closed enhancement (wontfix)

Avoid recalculating idatm_types for entire model after changes?

Reported by: Tristan Croll Owned by: pett
Priority: major Milestone:
Component: Structure Editing Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

When working on large models (particularly those with lots of rings on them), any change that triggers recalculation of idatm_type information can get unexpectedly slow (up to tens of seconds in the worst cases) because the types are always recalculated for the entire model. Since only bonded interactions are considered in the atom typing algorithm, this could be improved by only recalculating types for the connected fragments affected by the change. The following snippet uses python-igraph to partition a model into individual connected components. Running time for typical systems is pretty trivial, but gets a bit more substantial for larger ones (for all cases below, hydrogens have first been added with addh):

7lqi (800 residues, 16300 atoms, 99 connected groups): 0.034 seconds
4x2u (2156 residues, 18243 atoms, 1243 connected groups): 0.048 seconds
3mgg (1119 residues, 22111 atoms, 26 connected groups): 0.050 seconds
4wsa (2207 residues, 35755 atoms, 7 connected groups): 0.070 seconds
7cow (2405 residues, 49777 atoms, 34 connected groups): 0.089 seconds
7b9v (12878 residues, 211756 atoms, 136 connected groups): 0.309 seconds
7miz (27288 residues, 423137 atoms, 186 connected groups): 0.630 seconds
3j3q (313236 residues, 4884334 atoms, 1355 connected groups): 6.643 seconds

Seems like it would be fairly straightforward from here to identify the fragments affected by changes (find the group for added/changed atoms, or those that are/were attached to added or deleted atoms or bonds). Or am I missing something critical?

`
def connected_components_igraph(model, include_metal_pseudobonds=True):


Returns two arrays:

  • the model atoms
  • a list of integers of the same length as the atom list, where the value at index i indicates the connected group to which atom i belongs.


import igraph
import numpy
g = igraph.Graph()
atoms = model.atoms
bonds = model.bonds
bi = [atoms.indices(ba) for ba in bonds.atoms]
edges = numpy.array(bi).T
g.add_vertices(range(len(atoms)))
g.add_edges(edges)
if include_metal_pseudobonds:

pbg = model.pseudobond_group(model.PBG_METAL_COORDINATION, create_type=None)
if pbg is not None:

pb = pbg.pseudobonds
if len(pb):

pba = pb.atoms
pi = [atoms.indices(ba) for ba in pba]
edges = numpy.array(pi).T
g.add_edges(edges)

components = g.components()
membership = numpy.array(components.membership)
return atoms, membership

`

Change History (6)

comment:1 by pett, 4 years ago

Component: UnassignedStructure Editing
Priority: blockermajor
Status: assignedaccepted
Type: defectenhancement

The way atom typing works is that operations that could invalidate atom types set a dirty bit in Structure, and when atom types are requested that bit is checked and if set all types are recomputed. It would require some significant re-architecting to restrict atom typing to "pertinent" sections of the structure. There is already code (Structure::bonded_groups) for finding connected components, so there is no need to try to incorporate igraph.

I am in no way saying this won't happen (or some equivalent improvement), just placing a mild brake on expectations as to how soon it happens.

comment:2 by pett, 4 years ago

Any change I make also has to not impede simply opening a structure, where thousands of atoms and bonds can be created before the first idatm_type request.

in reply to:  3 ; comment:3 by Tristan Croll, 4 years ago

I should have realised you'd already have a fast tool for partitioning into connected fragments!

No real urgency on my part - this is more like a "seems worthwhile when there's time" suggestion.


________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 01 September 2021 23:46
Cc: pett@cgl.ucsf.edu <pett@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #5143: Avoid recalculating idatm_types for entire model after changes?

#5143: Avoid recalculating idatm_types for entire model after changes?
----------------------------------------+----------------------------
          Reporter:  Tristan Croll      |      Owner:  Eric Pettersen
              Type:  enhancement        |     Status:  accepted
          Priority:  major              |  Milestone:
         Component:  Structure Editing  |    Version:
        Resolution:                     |   Keywords:
        Blocked By:                     |   Blocking:
Notify when closed:                     |   Platform:  all
           Project:  ChimeraX           |
----------------------------------------+----------------------------

Comment (by Eric Pettersen):

 Any change I make also has to not impede simply opening a structure, where
 thousands of atoms and bonds can be created before the first idatm_type
 request.

--
Ticket URL: <https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/5143#comment:2>
ChimeraX <https://www.rbvi.ucsf.edu/chimerax/>
ChimeraX Issue Tracker

comment:4 by pett, 4 years ago

Resolution: fixed
Status: acceptedclosed

Well, I committed changes and tweaks that make IDATM computation take about 40% of the time that it used to. I don't know if that's good enough, but I'm closing this ticket and if it's not good enough then go ahead and reopen it. :-). The version number for AtomicLibrary went to 5.0 because I had to add a private variable to the Structure class.

comment:5 by pett, 4 years ago

Resolution: fixed
Status: closedreopened

I have further ideas for speed up (namely, make the dirty bit per-residue instead of per-structure), but that will take some work and I probably need to work on some other things first.

comment:6 by pett, 4 years ago

Resolution: wontfix
Status: reopenedclosed

After further reflection, the simple Residue dirty bit I was envisioning is problematic since IDATM changes can have ripple effects out to grandchild atoms. A solid implementation would in fact require dirtying the connected fragment (as suggested in the original description) which could have deleterious effects in other usage scenarios. If the 60% speedup already implemented proves to be still insufficient, probably the best solution is for the ISOLDE code to itself update the computed IDATM types of affected atoms and unset the Structure dirty bit. I can provide guidance on that if it proves necessary.

Note: See TracTickets for help on using tickets.