Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#5370 closed defect (duplicate)

matchmaker takes 45 seconds to align one small chain to virus capsid due to DSSP calculation

Reported by: goddard@…
Owned by: pett
Priority: normal
Milestone:
Component: Performance
Version:
Keywords:
Cc: Elaine Meng, Zach Pearson
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        macOS-10.16-x86_64-i386-64bit
ChimeraX Version: 1.3.dev202110080903 (2021-10-08 09:03:01 UTC)
Description
Match maker takes a very long time to align a modest size structure with almost identical sequence -- hung for 45 seconds.

Log:
UCSF ChimeraX version: 1.3.dev202110080903 (2021-10-08)  
© 2016-2021 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open 6u0v format mmcif fromDatabase pdb

6u0v title:  
Atomic-Resolution Cryo-EM Structure of AAV2 VLP [more info...]  
  
Chain information for 6u0v #1  
---  
Chain | Description | UniProt  
1 2 3 4 5 6 7 8 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e
f g h i j k l m n o p q r s t u v w x y z | Capsid protein VP1 | CAPSD_AAV2S  
  

> open 6ihb format mmcif fromDatabase pdb

6ihb title:  
Adeno-Associated Virus 2 in complex with AAVR [more info...]  
  
Chain information for 6ihb #2  
---  
Chain | Description | UniProt  
A | Capsid protein VP1 | CAPSD_AAV2S  
R | Dyslexia-associated protein KIAA0319-like protein | K319L_HUMAN  
  
6ihb mmCIF Assemblies  
---  
1| complete icosahedral assembly  
2| icosahedral asymmetric unit  
3| icosahedral pentamer  
4| icosahedral 23 hexamer  
5| icosahedral asymmetric unit, std point frame  
  

> time mm #2/A #1/1

> mmaker #2/A #1/1

> matchmaker #2/A #1/1

Missing required "to" argument  

> time mm #2/A to #1/1

> mmaker #2/A to #1/1

Parameters  
---  
Chain pairing | bb  
Alignment algorithm | Needleman-Wunsch  
Similarity matrix | BLOSUM-62  
SS fraction | 0.3  
Gap open (HH/SS/other) | 18/18/6  
Gap extend | 1  
SS matrix |  |  | H | S | O  
---|---|---|---  
H | 6 | -9 | -6  
S |  | 6 | -6  
O |  |  | 4  
Iteration cutoff | 2  
  
Matchmaker 6u0v, chain 1 (#1) with 6ihb, chain A (#2), sequence alignment
score = 2558.4  
RMSD between 510 pruned atom pairs is 0.501 angstroms; (across all 517 pairs:
0.592)  
  
command time 45.16 seconds  
draw time 0.1094 seconds  




OpenGL version: 4.1 ATI-4.6.20
OpenGL renderer: AMD Radeon Pro Vega 20 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro15,3
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.4 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1554.140.20.0.0 (iBridge: 18.16.14759.0.1,0)

Software:

    System Software Overview:

      System Version: macOS 11.6 (20G165)
      Kernel Version: Darwin 20.6.0
      Time since boot: 6 days 11:50

Graphics/Displays:

    Intel UHD Graphics 630:

      Chipset Model: Intel UHD Graphics 630
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel
      Device ID: 0x3e9b
      Revision ID: 0x0002
      Automatic Graphics Switching: Supported
      gMux Version: 5.0.0
      Metal Family: Supported, Metal GPUFamily macOS 2

    Radeon Pro Vega 20:

      Chipset Model: Radeon Pro Vega 20
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x8
      VRAM (Total): 4 GB
      Vendor: AMD (0x1002)
      Device ID: 0x69af
      Revision ID: 0x00c0
      ROM Revision: 113-D2060I-087
      VBIOS Version: 113-D20601MA0T-016
      Option ROM Version: 113-D20601MA0T-016
      EFI Driver Version: 01.01.087
      Automatic Graphics Switching: Supported
      gMux Version: 5.0.0
      Metal Family: Supported, Metal GPUFamily macOS 2
      Displays:
        Color LCD:
          Display Type: Built-In Retina LCD
          Resolution: 2880 x 1800 Retina
          Framebuffer Depth: 24-Bit Color (ARGB8888)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal

Locale: (None, 'UTF-8')
PyQt5 5.15.2, Qt 5.15.2
Installed Packages:
    alabaster: 0.7.12
    appdirs: 1.4.4
    appnope: 0.1.2
    Babel: 2.9.1
    backcall: 0.2.0
    blockdiag: 2.0.1
    certifi: 2021.5.30
    cftime: 1.5.1
    charset-normalizer: 2.0.6
    ChimeraX-AddCharge: 1.1.4
    ChimeraX-AddH: 2.1.10
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.2.2
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.1
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.30.2
    ChimeraX-AtomicLibrary: 4.1.4
    ChimeraX-AtomSearch: 2.0
    ChimeraX-AtomSearchLibrary: 1.0
    ChimeraX-AxesPlanes: 2.0
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.0
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.6
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.2
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.1.1
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5
    ChimeraX-CommandLine: 1.1.5
    ChimeraX-ConnectStructure: 2.0
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.3.dev202110080903
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.3.1
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2.1
    ChimeraX-Dicom: 1.0
    ChimeraX-DistMonitor: 1.1.5
    ChimeraX-DistUI: 1.0
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.1
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1.1
    ChimeraX-Help: 1.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.4
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 2.0.2
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.4
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.2.2
    ChimeraX-ModelPanel: 1.2
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.7
    ChimeraX-PDB: 2.6.4
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-Phenix: 0.3
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.0.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.4.4
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.6
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-Struts: 1.0
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1
    ChimeraX-ToolshedUtils: 1.2
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.13.4
    ChimeraX-uniprot: 2.2
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.0.1
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    cxservices: 1.1
    cycler: 0.10.0
    Cython: 0.29.24
    decorator: 5.1.0
    docutils: 0.17.1
    filelock: 3.0.12
    funcparserlib: 0.3.6
    grako: 3.16.5
    h5py: 3.4.0
    html2text: 2020.1.16
    idna: 3.2
    ihm: 0.21
    imagecodecs: 2021.4.28
    imagesize: 1.2.0
    ipykernel: 5.5.5
    ipython: 7.23.1
    ipython-genutils: 0.2.0
    jedi: 0.18.0
    Jinja2: 3.0.1
    joblib: 1.0.1
    jupyter-client: 6.1.12
    jupyter-core: 4.8.1
    kiwisolver: 1.3.2
    lxml: 4.6.3
    lz4: 3.1.3
    MarkupSafe: 2.0.1
    matplotlib: 3.4.3
    matplotlib-inline: 0.1.3
    mrcfile: 1.3.0
    msgpack: 1.0.2
    netCDF4: 1.5.7
    networkx: 2.6.3
    numexpr: 2.7.3
    numpy: 1.21.2
    openvr: 1.16.801
    packaging: 21.0
    pandas: 1.3.2
    ParmEd: 3.2.0
    parso: 0.8.2
    pexpect: 4.8.0
    pickleshare: 0.7.5
    Pillow: 8.3.2
    pip: 21.2.4
    pkginfo: 1.7.1
    prompt-toolkit: 3.0.20
    psutil: 5.8.0
    ptyprocess: 0.7.0
    pycollada: 0.7.1
    pydicom: 2.1.2
    Pygments: 2.10.0
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 2.4.7
    PyQt5-commercial: 5.15.2
    PyQt5-sip: 12.8.1
    PyQtWebEngine-commercial: 5.15.2
    Pyro4: 4.81
    Pyro5: 5.12
    python-dateutil: 2.8.2
    pytz: 2021.3
    pyzmq: 22.3.0
    qtconsole: 5.1.1
    QtPy: 1.11.2
    RandomWords: 0.3.0
    rdkit-pypi: 2021.3.5.1
    requests: 2.26.0
    scikit-learn: 1.0
    scipy: 1.7.1
    serpent: 1.40
    setuptools: 57.5.0
    sfftk-rw: 0.7.1
    six: 1.16.0
    snowballstemmer: 2.1.0
    sortedcontainers: 2.4.0
    Sphinx: 4.2.0
    sphinx-autodoc-typehints: 1.12.0
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 2.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-jurko: 0.6
    threadpoolctl: 3.0.0
    tifffile: 2021.4.8
    tinyarray: 1.2.3
    tornado: 6.1
    traitlets: 5.1.0
    urllib3: 1.26.7
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.37.0
    wheel-filename: 1.3.0

Change History (15)

comment:1 by goddard@…, 4 years ago

If I delete the 59 extra chains of 6u0v other than chain 1

delete ~/1 & #1

then matchmaker completes in 0.1 seconds

time mm #2/A to #1/1
command time 0.119 seconds

So for some reason matchmaker seems to be slowed down by the 59 chains it should be ignoring.

comment:2 by pett, 4 years ago

Cc: Elaine Meng, Zach Pearson added
Component: Unassigned → Structure Comparison
Owner: set to pett
Platform: all
Project: ChimeraX
Status: new → accepted
Summary: ChimeraX bug report submission → single chain matching

comment:3 by pett, 4 years ago

Component: Structure Comparison → Performance
Resolution: duplicate
Status: accepted → closed

This is not actually caused by chain matching; it is because DSSP on the large structure is very slow (#4759). The only way to avoid this right now is to pass the "computeSS false" argument to matchmaker.

I did add code to prune the chain matching down to chains that actually contain requested residues, but that only sped things up by about a second and a half in my tests. Nonetheless, I will commit that change on the develop branch.
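
For illustration, the "computeSS false" workaround above could be scripted as in the following sketch. The helper `matchmaker_command` and its parameter names are hypothetical and not part of ChimeraX; only the command keywords `computeSS false` and `ssFraction false` come from this ticket.

```python
def matchmaker_command(match_spec, ref_spec, compute_ss=True, use_ss=True):
    """Build a matchmaker command string.

    Hypothetical helper for illustration; it only assembles text and
    does not call into ChimeraX.
    """
    parts = ["matchmaker", match_spec, "to", ref_spec]
    if not use_ss:
        # Drop secondary structure from the alignment score entirely.
        parts += ["ssFraction", "false"]
    elif not compute_ss:
        # Keep secondary structure scoring, but reuse the existing
        # assignments instead of recomputing them with DSSP.
        parts += ["computeSS", "false"]
    return " ".join(parts)


# The match from the report, with the DSSP recalculation skipped.
print(matchmaker_command("#2/A", "#1/1", compute_ss=False))
```

Inside ChimeraX, the resulting string could presumably be typed at the command line or passed to `chimerax.core.commands.run(session, ...)`.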

comment:4 by goddard@…, 4 years ago

So matchmaker is going to run DSSP on the structure by default even if we already have secondary structure records in the target? That seems a recipe for making matchmaker very slow when the target is any large structure (e.g. a ribosome). Does it cache the DSSP results? It seems like this would really kill performance of AlphaFold using matchmaker on each chain of a large structure to align each AlphaFold model. If the target is a structure not from the PDB and had no secondary-structure metadata, so ChimeraX ran DSSP when it was opened, does matchmaker know that, or does it run DSSP again on the whole structure every time a match is done?

comment:5 by Elaine Meng, 4 years ago

This was based on experiments run for our original publication on the algorithm, finding that superpositions were better when we re-calculated secondary structure for PDB entries even though they already had assignments in their files.  Apparently the assignments already in the files tended to be quite inconsistent, having come from a variety of methods.    

comment:6 by pett, 4 years ago

As per the Matchmaker paper, the resulting matching is measurably better if the secondary structure determination in both structures is consistent.


Yes, the results are cached.


DSSP will not be run until secondary structure information is requested.  So for large structures that don't initially display ribbons, DSSP has most likely not been run before matchmaker executes.  That said, the caching only persists within one execution of a match_maker command.  I could look to improve that, but I think the first priority is to make DSSP faster.
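
A cache that outlives a single command might look like the sketch below. It is not how ChimeraX stores secondary structure; the class `SecondaryStructureCache`, the `version` counter used for invalidation, and the `fake_dssp` stand-in are all assumptions made for illustration.

```python
class SecondaryStructureCache:
    """Hypothetical cache of secondary structure results across commands."""

    def __init__(self, compute):
        self._compute = compute  # expensive function: structure -> assignments
        self._results = {}       # structure key -> (version, assignments)

    def get(self, key, version, structure):
        cached = self._results.get(key)
        if cached is not None and cached[0] == version:
            # Structure unchanged since the last computation: reuse the result.
            return cached[1]
        assignments = self._compute(structure)
        self._results[key] = (version, assignments)
        return assignments


calls = []


def fake_dssp(structure):
    """Stand-in for the slow DSSP run; records each invocation."""
    calls.append(structure)
    return {"chain 1": "placeholder assignments"}


cache = SecondaryStructureCache(fake_dssp)
for _ in range(60):  # e.g. one match per chain of a 60-chain capsid
    cache.get("6u0v", version=1, structure="6u0v coordinates")
print(len(calls))  # prints 1: the expensive step ran only once
```

The `version` argument stands for whatever signal indicates that coordinates changed; bumping it forces a recomputation.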


comment:7 by goddard@…, 4 years ago

OK. The alphafold match command runs matchmaker on every chain in the target structure against the AlphaFold database models. So I guess that would take 45 minutes of matching on this 6u0v structure (60 chains at 45 seconds each), while based on my single-chain test (0.1 second to do the match) it would take 6 seconds. (Mercifully the AlphaFold database does not have a match for this structure, but it probably will in the future.) Performance is unexpectedly bad. Caching DSSP would reduce the time from 45 minutes to 45 seconds.

Maybe 6u0v (249,000 protein atoms) is an especially slow DSSP case. I tested the speed of alphafold match on the yeast 80S ribosome dimer 4v7r (79,000 protein atoms, 310,000 total atoms), matching 104 chain models with AlphaFold database models. The first time, when all structures had to be fetched from the AlphaFold database, it took 550 seconds; the second time, using cached AlphaFold models, it took 523 seconds; and with computeSS false in matchmaker it took 113 seconds. So alphafold match takes about 9 minutes with compute_ss = true versus 2 minutes with compute_ss = false.
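
The estimates in this comment can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above (60 chains, 45 seconds per match with DSSP, 0.1 second without, and the 4v7r timings); the variable names are illustrative.

```python
chains = 60        # 6u0v capsid: chain 1 plus the 59 other chains
slow_match = 45.0  # seconds per match when DSSP runs on the whole capsid
fast_match = 0.1   # seconds per match when DSSP cost is absent

# DSSP recomputed for every chain match.
uncached_total = chains * slow_match
# Matching cost alone, with no DSSP at all.
no_dssp_total = chains * fast_match
# DSSP paid once on the first match, then reused for the remaining chains.
cached_total = slow_match + (chains - 1) * fast_match

print(f"uncached: {uncached_total / 60:.0f} minutes")
print(f"no DSSP:  {no_dssp_total:.0f} seconds")
print(f"cached:   {cached_total:.0f} seconds")

# 4v7r ribosome: cached AlphaFold models, default settings vs computeSS false.
speedup = 523 / 113
print(f"4v7r speedup with computeSS false: {speedup:.1f}x")
```

With caching, the total is dominated by the single DSSP run, which is why the estimate drops from roughly 45 minutes to under a minute.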

comment:8 by pett, 4 years ago

For "alphafold match", where the sequences are identical or extremely similar except for missing segments, you shouldn't bother with secondary structure matching at all.  Use the "ssFraction false" option to matchmaker.


comment:9 by goddard@…, 4 years ago

AlphaFold match can fetch very dissimilar sequences since it will do a BLAT sequence search and find the closest hit.  I am thinking computeSS false may be a sensible compromise.  But for small structures I'd prefer to use the default matchmaker settings.  So I am undecided what to do.

comment:10 by pett, 4 years ago

I think that if AlphaFold comes up with a different fold for a dissimilar sequence, the SS matching may produce worse results. Up to you, of course.


comment:11 by Tom Goddard, 4 years ago

Besides AlphaFold there is the question of what matchmaker options the Blast Protein tool should use. This bug report was the result of running Blast and having it take 45 seconds to load and align a small protein to a virus capsid monomer. Other large complexes like ribosomes will also be very slow for loading Blast hits because of the matchmaker alignment step using dssp. Should Blast also use matchmaker option "ssFraction false"?

comment:12 by Elaine Meng, 4 years ago

Not in my opinion. For Blast PDB you would want to use secondary structure to align structurally similar but sequence-distant hits effectively. I rather suspect your HUGE complexes are going to be a minuscule proportion of use cases. However, if you really can't stand defaults I would vote for "compute" false, avoiding the recalculation of secondary structure, rather than discarding it altogether.

comment:13 by pett, 4 years ago

Agree with Elaine.

In the long view, you do nothing, because DSSP will be faster. In the short view, you could possibly do "computeSS false" and live with the fact that the secondary structure assignments may be slightly off, but using the secondary structure is important for structurally aligning similar regions, so you don't want to do "ssFraction false".

comment:14 by Elaine Meng, 4 years ago

I'm also skeptical of discarding secondary structure entirely for matching of AlphaFold database entries. I've spent a lot of time on comparisons with known structures, and when there is some known similar structure (e.g. basically all situations where there is a query structure to match with, unless the person has generated it on their own with some other modeling) the secondary structure is in high agreement with that structure. AlphaFold modeling is not like first-principles physics-based de novo modeling... from what I've seen, it hews significantly to the existing PDB database structures as templates.

comment:15 by Tom Goddard, 4 years ago

Summary: single chain matching → matchmaker takes 45 seconds to align one small chain to virus capsid due to DSSP calculation

Made alphafold match command use matchmaker computeSS false to speed it up when matching large structures. The alphafold fetch and predict commands use the default computeSS true since those only match one chain. Changed both 1.3 and 1.4 builds.
