#5370 closed defect (duplicate)
matchmaker takes 45 seconds to align one small chain to virus capsid due to DSSP calculation
Reported by: | | Owned by: | pett
---|---|---|---
Priority: | normal | Milestone: |
Component: | Performance | Version: |
Keywords: | | Cc: | Elaine Meng, Zach Pearson
Blocked By: | | Blocking: |
Notify when closed: | | Platform: | all
Project: | ChimeraX | |
Description
The following bug report has been submitted:
Platform: macOS-10.16-x86_64-i386-64bit
ChimeraX Version: 1.3.dev202110080903 (2021-10-08 09:03:01 UTC)
Description
Match maker takes a very long time to align a modest size structure with almost identical sequence -- hung for 45 seconds.
Log:
UCSF ChimeraX version: 1.3.dev202110080903 (2021-10-08)
© 2016-2021 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX
> open 6u0v format mmcif fromDatabase pdb
6u0v title:
Atomic-Resolution Cryo-EM Structure of AAV2 VLP

Chain information for 6u0v #1
Chain | Description | UniProt
---|---|---
1 2 3 4 5 6 7 8 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z | Capsid protein VP1 | CAPSD_AAV2S

> open 6ihb format mmcif fromDatabase pdb
6ihb title:
Adeno-Associated Virus 2 in complex with AAVR

Chain information for 6ihb #2
Chain | Description | UniProt
---|---|---
A | Capsid protein VP1 | CAPSD_AAV2S
R | Dyslexia-associated protein KIAA0319-like protein | K319L_HUMAN

6ihb mmCIF Assemblies
1 | complete icosahedral assembly
2 | icosahedral asymmetric unit
3 | icosahedral pentamer
4 | icosahedral 23 hexamer
5 | icosahedral asymmetric unit, std point frame
> time mm #2/A #1/1
> mmaker #2/A #1/1
> matchmaker #2/A #1/1
Missing required "to" argument
> time mm #2/A to #1/1
> mmaker #2/A to #1/1
Parameters
Chain pairing | bb
Alignment algorithm | Needleman-Wunsch
Similarity matrix | BLOSUM-62
SS fraction | 0.3
Gap open (HH/SS/other) | 18/18/6
Gap extend | 1

SS matrix | H | S | O
---|---|---|---
H | 6 | -9 | -6
S | | 6 | -6
O | | | 4

Iteration cutoff | 2
Matchmaker 6u0v, chain 1 (#1) with 6ihb, chain A (#2), sequence alignment score = 2558.4
RMSD between 510 pruned atom pairs is 0.501 angstroms; (across all 517 pairs: 0.592)
command time 45.16 seconds
draw time 0.1094 seconds
OpenGL version: 4.1 ATI-4.6.20
OpenGL renderer: AMD Radeon Pro Vega 20 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.
Hardware:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro15,3
Processor Name: 8-Core Intel Core i9
Processor Speed: 2.4 GHz
Number of Processors: 1
Total Number of Cores: 8
L2 Cache (per Core): 256 KB
L3 Cache: 16 MB
Hyper-Threading Technology: Enabled
Memory: 32 GB
System Firmware Version: 1554.140.20.0.0 (iBridge: 18.16.14759.0.1,0)
Software:
System Software Overview:
System Version: macOS 11.6 (20G165)
Kernel Version: Darwin 20.6.0
Time since boot: 6 days 11:50
Graphics/Displays:
Intel UHD Graphics 630:
Chipset Model: Intel UHD Graphics 630
Type: GPU
Bus: Built-In
VRAM (Dynamic, Max): 1536 MB
Vendor: Intel
Device ID: 0x3e9b
Revision ID: 0x0002
Automatic Graphics Switching: Supported
gMux Version: 5.0.0
Metal Family: Supported, Metal GPUFamily macOS 2
Radeon Pro Vega 20:
Chipset Model: Radeon Pro Vega 20
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 4 GB
Vendor: AMD (0x1002)
Device ID: 0x69af
Revision ID: 0x00c0
ROM Revision: 113-D2060I-087
VBIOS Version: 113-D20601MA0T-016
Option ROM Version: 113-D20601MA0T-016
EFI Driver Version: 01.01.087
Automatic Graphics Switching: Supported
gMux Version: 5.0.0
Metal Family: Supported, Metal GPUFamily macOS 2
Displays:
Color LCD:
Display Type: Built-In Retina LCD
Resolution: 2880 x 1800 Retina
Framebuffer Depth: 24-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Automatically Adjust Brightness: Yes
Connection Type: Internal
Locale: (None, 'UTF-8')
PyQt5 5.15.2, Qt 5.15.2
Installed Packages:
alabaster: 0.7.12 appdirs: 1.4.4 appnope: 0.1.2 Babel: 2.9.1 backcall: 0.2.0 blockdiag: 2.0.1 certifi: 2021.5.30 cftime: 1.5.1 charset-normalizer: 2.0.6 ChimeraX-AddCharge: 1.1.4 ChimeraX-AddH: 2.1.10 ChimeraX-AlignmentAlgorithms: 2.0 ChimeraX-AlignmentHdrs: 3.2 ChimeraX-AlignmentMatrices: 2.0 ChimeraX-Alignments: 2.2.2 ChimeraX-AlphaFold: 1.0 ChimeraX-AltlocExplorer: 1.0.1 ChimeraX-AmberInfo: 1.0 ChimeraX-Arrays: 1.0 ChimeraX-Atomic: 1.30.2 ChimeraX-AtomicLibrary: 4.1.4
ChimeraX-AtomSearch: 2.0 ChimeraX-AtomSearchLibrary: 1.0 ChimeraX-AxesPlanes: 2.0 ChimeraX-BasicActions: 1.1 ChimeraX-BILD: 1.0 ChimeraX-BlastProtein: 2.0 ChimeraX-BondRot: 2.0 ChimeraX-BugReporter: 1.0 ChimeraX-BuildStructure: 2.6 ChimeraX-Bumps: 1.0 ChimeraX-BundleBuilder: 1.1 ChimeraX-ButtonPanel: 1.0 ChimeraX-CageBuilder: 1.0 ChimeraX-CellPack: 1.0 ChimeraX-Centroids: 1.2 ChimeraX-ChemGroup: 2.0 ChimeraX-Clashes: 2.1.1 ChimeraX-ColorActions: 1.0 ChimeraX-ColorGlobe: 1.0 ChimeraX-ColorKey: 1.5 ChimeraX-CommandLine: 1.1.5 ChimeraX-ConnectStructure: 2.0 ChimeraX-Contacts: 1.0 ChimeraX-Core: 1.3.dev202110080903 ChimeraX-CoreFormats: 1.1 ChimeraX-coulombic: 1.3.1 ChimeraX-Crosslinks: 1.0 ChimeraX-Crystal: 1.0 ChimeraX-CrystalContacts: 1.0 ChimeraX-DataFormats: 1.2.1 ChimeraX-Dicom: 1.0 ChimeraX-DistMonitor: 1.1.5 ChimeraX-DistUI: 1.0 ChimeraX-Dssp: 2.0 ChimeraX-EMDB-SFF: 1.0 ChimeraX-ExperimentalCommands: 1.0 ChimeraX-FileHistory: 1.0 ChimeraX-FunctionKey: 1.0 ChimeraX-Geometry: 1.1 ChimeraX-gltf: 1.0 ChimeraX-Graphics: 1.1 ChimeraX-Hbonds: 2.1.1 ChimeraX-Help: 1.2 ChimeraX-HKCage: 1.3 ChimeraX-IHM: 1.1 ChimeraX-ImageFormats: 1.2 ChimeraX-IMOD: 1.0 ChimeraX-IO: 1.0.1 ChimeraX-ItemsInspection: 1.0 ChimeraX-Label: 1.1 ChimeraX-ListInfo: 1.1.1 ChimeraX-Log: 1.1.4 ChimeraX-LookingGlass: 1.1 ChimeraX-Maestro: 1.8.1 ChimeraX-Map: 1.1 ChimeraX-MapData: 2.0 ChimeraX-MapEraser: 1.0 ChimeraX-MapFilter: 2.0 ChimeraX-MapFit: 2.0 ChimeraX-MapSeries: 2.1 ChimeraX-Markers: 1.0 ChimeraX-Mask: 1.0 ChimeraX-MatchMaker: 2.0.2 ChimeraX-MDcrds: 2.6 ChimeraX-MedicalToolbar: 1.0.1 ChimeraX-Meeting: 1.0 ChimeraX-MLP: 1.1 ChimeraX-mmCIF: 2.4 ChimeraX-MMTF: 2.1 ChimeraX-Modeller: 1.2.2 ChimeraX-ModelPanel: 1.2 ChimeraX-ModelSeries: 1.0 ChimeraX-Mol2: 2.0 ChimeraX-Morph: 1.0 ChimeraX-MouseModes: 1.1 ChimeraX-Movie: 1.0 ChimeraX-Neuron: 1.0 ChimeraX-Nucleotides: 2.0.2 ChimeraX-OpenCommand: 1.7 ChimeraX-PDB: 2.6.4 ChimeraX-PDBBio: 1.0 ChimeraX-PDBLibrary: 1.0.2 ChimeraX-PDBMatrices: 1.0 
ChimeraX-Phenix: 0.3 ChimeraX-PickBlobs: 1.0 ChimeraX-Positions: 1.0 ChimeraX-PresetMgr: 1.0.1 ChimeraX-PubChem: 2.1 ChimeraX-ReadPbonds: 1.0 ChimeraX-Registration: 1.1 ChimeraX-RemoteControl: 1.0 ChimeraX-ResidueFit: 1.0 ChimeraX-RestServer: 1.1 ChimeraX-RNALayout: 1.0 ChimeraX-RotamerLibMgr: 2.0 ChimeraX-RotamerLibsDunbrack: 2.0 ChimeraX-RotamerLibsDynameomics: 2.0 ChimeraX-RotamerLibsRichardson: 2.0 ChimeraX-SaveCommand: 1.5 ChimeraX-SchemeMgr: 1.0 ChimeraX-SDF: 2.0 ChimeraX-Segger: 1.0 ChimeraX-Segment: 1.0 ChimeraX-SelInspector: 1.0 ChimeraX-SeqView: 2.4.4 ChimeraX-Shape: 1.0.1 ChimeraX-Shell: 1.0 ChimeraX-Shortcuts: 1.1 ChimeraX-ShowAttr: 1.0 ChimeraX-ShowSequences: 1.0 ChimeraX-SideView: 1.0 ChimeraX-Smiles: 2.1 ChimeraX-SmoothLines: 1.0 ChimeraX-SpaceNavigator: 1.0 ChimeraX-StdCommands: 1.6 ChimeraX-STL: 1.0 ChimeraX-Storm: 1.0 ChimeraX-Struts: 1.0 ChimeraX-Surface: 1.0 ChimeraX-SwapAA: 2.0 ChimeraX-SwapRes: 2.1 ChimeraX-TapeMeasure: 1.0 ChimeraX-Test: 1.0 ChimeraX-Toolbar: 1.1 ChimeraX-ToolshedUtils: 1.2 ChimeraX-Tug: 1.0 ChimeraX-UI: 1.13.4 ChimeraX-uniprot: 2.2 ChimeraX-UnitCell: 1.0 ChimeraX-ViewDockX: 1.0.1 ChimeraX-VIPERdb: 1.0 ChimeraX-Vive: 1.1 ChimeraX-VolumeMenu: 1.0 ChimeraX-VTK: 1.0 ChimeraX-WavefrontOBJ: 1.0 ChimeraX-WebCam: 1.0 ChimeraX-WebServices: 1.0 ChimeraX-Zone: 1.0 colorama: 0.4.4 cxservices: 1.1 cycler: 0.10.0 Cython: 0.29.24 decorator: 5.1.0 docutils: 0.17.1 filelock: 3.0.12 funcparserlib: 0.3.6 grako: 3.16.5 h5py: 3.4.0 html2text: 2020.1.16 idna: 3.2 ihm: 0.21 imagecodecs: 2021.4.28 imagesize: 1.2.0 ipykernel: 5.5.5 ipython: 7.23.1 ipython-genutils: 0.2.0 jedi: 0.18.0 Jinja2: 3.0.1 joblib: 1.0.1 jupyter-client: 6.1.12 jupyter-core: 4.8.1 kiwisolver: 1.3.2 lxml: 4.6.3 lz4: 3.1.3 MarkupSafe: 2.0.1 matplotlib: 3.4.3 matplotlib-inline: 0.1.3 mrcfile: 1.3.0 msgpack: 1.0.2 netCDF4: 1.5.7 networkx: 2.6.3 numexpr: 2.7.3 numpy: 1.21.2 openvr: 1.16.801 packaging: 21.0 pandas: 1.3.2 ParmEd: 3.2.0 parso: 0.8.2 pexpect: 4.8.0 pickleshare: 0.7.5 
Pillow: 8.3.2 pip: 21.2.4 pkginfo: 1.7.1 prompt-toolkit: 3.0.20 psutil: 5.8.0 ptyprocess: 0.7.0 pycollada: 0.7.1 pydicom: 2.1.2 Pygments: 2.10.0 PyOpenGL: 3.1.5 PyOpenGL-accelerate: 3.1.5 pyparsing: 2.4.7 PyQt5-commercial: 5.15.2 PyQt5-sip: 12.8.1 PyQtWebEngine-commercial: 5.15.2 Pyro4: 4.81 Pyro5: 5.12 python-dateutil: 2.8.2 pytz: 2021.3 pyzmq: 22.3.0 qtconsole: 5.1.1 QtPy: 1.11.2 RandomWords: 0.3.0 rdkit-pypi: 2021.3.5.1 requests: 2.26.0 scikit-learn: 1.0 scipy: 1.7.1 serpent: 1.40 setuptools: 57.5.0 sfftk-rw: 0.7.1 six: 1.16.0 snowballstemmer: 2.1.0 sortedcontainers: 2.4.0 Sphinx: 4.2.0 sphinx-autodoc-typehints: 1.12.0 sphinxcontrib-applehelp: 1.0.2 sphinxcontrib-blockdiag: 2.0.0 sphinxcontrib-devhelp: 1.0.2 sphinxcontrib-htmlhelp: 2.0.0 sphinxcontrib-jsmath: 1.0.1 sphinxcontrib-qthelp: 1.0.3 sphinxcontrib-serializinghtml: 1.1.5 suds-jurko: 0.6 threadpoolctl: 3.0.0 tifffile: 2021.4.8 tinyarray: 1.2.3 tornado: 6.1 traitlets: 5.1.0 urllib3: 1.26.7 wcwidth: 0.2.5 webcolors: 1.11.1 wheel: 0.37.0 wheel-filename: 1.3.0
Change History (15)
comment:2 by , 4 years ago
Cc: | added |
---|---|
Component: | Unassigned → Structure Comparison |
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → accepted |
Summary: | ChimeraX bug report submission → single chain matching |
comment:3 by , 4 years ago
Component: | Structure Comparison → Performance |
---|---|
Resolution: | → duplicate |
Status: | accepted → closed |
This is not actually caused by chain matching; it is because DSSP on the large structure is very slow (#4759). The only way to avoid this right now is to give matchmaker the "computeSS false" argument.
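Here is a minimal sketch of the workaround for someone scripting ChimeraX in Python. The helper `matchmaker_command` is hypothetical and not part of ChimeraX. It only assembles the command text, adding `computeSS false` so that matchmaker keeps the existing secondary structure assignments instead of rerunning DSSP. Inside ChimeraX, the resulting string could then be executed with `chimerax.core.commands.run(session, cmd)`.

```python
def matchmaker_command(match_spec: str, ref_spec: str, compute_ss: bool = True) -> str:
    """Build matchmaker command text (hypothetical helper, not a ChimeraX API).

    With compute_ss=False the text ends in 'computeSS false', which skips
    the DSSP recalculation that dominates the time on very large references.
    """
    cmd = f"matchmaker {match_spec} to {ref_spec}"
    if not compute_ss:
        cmd += " computeSS false"
    return cmd


# The command from this report, with DSSP recomputation turned off.
print(matchmaker_command("#2/A", "#1/1", compute_ss=False))

# Assumed usage inside a ChimeraX Python shell or script:
#   from chimerax.core.commands import run
#   run(session, matchmaker_command("#2/A", "#1/1", compute_ss=False))
```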
I did add code to prune the chain matching down to chains that actually contain requested residues, but that only sped things up by about a second and a half in my tests. Nonetheless, I will commit that change on the develop branch.
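The pruning idea can be pictured with the toy sketch below. It assumes a structure can be modeled as a mapping from chain ID to the residue numbers present in that chain. It keeps only chains that contain at least one requested residue. The function name and data layout are hypothetical; the real matchmaker code works on ChimeraX chain and residue objects, which are not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def prune_chains(chains: Dict[str, Set[int]],
                 requested: Iterable[Tuple[str, int]]) -> List[str]:
    """Return IDs of chains that contain at least one requested residue.

    chains maps chain ID -> residue numbers present in that chain;
    requested holds (chain ID, residue number) pairs from the atom spec.
    """
    wanted: Dict[str, Set[int]] = {}
    for chain_id, res_num in requested:
        wanted.setdefault(chain_id, set()).add(res_num)
    # Preserve the structure's chain order; drop chains with no requested residue.
    return [cid for cid, residues in chains.items()
            if residues & wanted.get(cid, set())]


# Toy data, not real 6u0v residue numbering.
toy = {"1": {1, 2, 3}, "2": {1, 2, 3}, "A": {5, 6}}
print(prune_chains(toy, [("1", 2), ("A", 9)]))  # ['1']
```

As the comment above notes, pruning only saves the chain-pairing work. It does nothing about the whole-structure DSSP run, which is why the measured gain was small.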
comment:4 by , 4 years ago
So matchmaker is going to run DSSP on the structure by default even if the target already has secondary structure records? That seems like a recipe for making matchmaker very slow whenever the target is a large structure (e.g. a ribosome). Does it cache the DSSP results? It seems like this would really hurt the performance of alphafold match, which uses matchmaker on each chain of a large structure to align each AlphaFold model. If the target is a structure not from the PDB that had no secondary structure metadata, so that ChimeraX ran DSSP when it was opened, does matchmaker know that, or does it run DSSP again on the whole structure every time a match is done?
comment:5 by , 4 years ago
This was based on experiments run for our original publication on the algorithm, which found that superpositions were better when we recalculated secondary structure for PDB entries, even though they already had assignments in their files. Apparently the assignments already in the files tended to be quite inconsistent, having come from a variety of methods.
comment:6 by , 4 years ago
As per the Matchmaker paper, the resulting matching is measurably better if the secondary structure determination in both structures is consistent. Yes, the results are cached. DSSP will not be run until secondary structure information is requested. So for large structures that don't initially display ribbons, DSSP has most likely not been run before matchmaker executes. That said, the caching only persists within one execution of a match_maker command. I could look to improve that, but I think the first priority is to make DSSP faster.
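The toy sketch below contrasts the current per-command caching with a cache that outlives a single command. `slow_dssp` is a hypothetical stand-in for the whole-structure DSSP run, and it counts calls instead of taking time. Neither function is ChimeraX's implementation. The scenario assumes one matchmaker command per chain of the 60-chain 6u0v capsid.

```python
from typing import Callable, Dict, List

dssp_runs = 0


def slow_dssp(structure: str) -> str:
    """Hypothetical stand-in for an expensive whole-structure DSSP run."""
    global dssp_runs
    dssp_runs += 1
    return f"ss-for-{structure}"


def match_with_command_cache(structure: str, chain: str) -> str:
    # Fresh cache for each command: every command pays for one DSSP run.
    cache: Dict[str, str] = {}
    if structure not in cache:
        cache[structure] = slow_dssp(structure)
    return f"{chain}: {cache[structure]}"


_session_cache: Dict[str, str] = {}


def match_with_session_cache(structure: str, chain: str) -> str:
    # Cache outlives the command, so DSSP runs once per structure.
    if structure not in _session_cache:
        _session_cache[structure] = slow_dssp(structure)
    return f"{chain}: {_session_cache[structure]}"


def count_runs(match: Callable[[str, str], str], structure: str,
               chains: List[str]) -> int:
    """Simulate one matchmaker command per chain; return the DSSP run count."""
    global dssp_runs
    dssp_runs = 0
    for chain in chains:
        match(structure, chain)
    return dssp_runs


chains = [f"chain{i}" for i in range(60)]  # 6u0v has 60 capsid chains
print(count_runs(match_with_command_cache, "6u0v", chains))  # 60
print(count_runs(match_with_session_cache, "6u0v", chains))  # 1
```

A real persistent cache would also have to be invalidated whenever the structure's coordinates or residues change. That bookkeeping is omitted here.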
comment:7 by , 4 years ago
Ok. The alphafold match command uses matchmaker to align an AlphaFold database model to every chain in the target structure. So I guess that would take 45 minutes of matching on this 6u0v structure (60 chains at about 45 seconds each), while based on my single-chain test (0.1 second to do the match) it would take 6 seconds. (Mercifully, the AlphaFold database does not have a match for this structure, but it probably will in the future.) Performance is unexpectedly bad. Caching DSSP would reduce the time from 45 minutes to about 45 seconds. Maybe 6u0v (249,000 protein atoms) is an especially slow DSSP case. I tested the speed of alphafold match on the yeast 80S ribosome dimer 4v7r (79,000 protein atoms, 310,000 total atoms), matching 104 chain models with AlphaFold database models. The first run, when all structures had to be fetched from the AlphaFold database, took 550 seconds; a second run using cached AlphaFold models took 523 seconds; and a run with computeSS false in matchmaker took 113 seconds. So alphafold match takes about 9 minutes with computeSS true versus about 2 minutes with computeSS false.
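The estimates in this comment are simple arithmetic, reproduced below. The only inputs are numbers reported in this ticket: 60 chains in 6u0v, about 45 s per match with DSSP, 0.1 s per match without it, and the 4v7r timings. The cached case assumes one DSSP-bound match followed by fast matches. That comes to roughly 51 s, in line with the "about 45 seconds" figure because the single DSSP run dominates.

```python
# Numbers reported in this ticket.
chains_6u0v = 60          # chains 1-8, A-Z, a-z
secs_with_dssp = 45.0     # per-chain matchmaker time when DSSP is recomputed
secs_without_dssp = 0.1   # per-chain match time from the single-chain test

uncached_min = chains_6u0v * secs_with_dssp / 60   # DSSP rerun for every chain
no_dssp_secs = chains_6u0v * secs_without_dssp     # computeSS false
# DSSP cached: one slow match, then 59 fast ones.
cached_secs = secs_with_dssp + (chains_6u0v - 1) * secs_without_dssp

# 4v7r: alphafold match over 104 chain models.
ribosome_secs = {"first run": 550, "cached models": 523, "computeSS false": 113}

print(f"6u0v uncached: {uncached_min:.0f} min")
print(f"6u0v without DSSP: {no_dssp_secs:.0f} s")
print(f"6u0v with DSSP cached: {cached_secs:.1f} s")
for label, secs in ribosome_secs.items():
    print(f"4v7r {label}: {secs / 60:.1f} min")
```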
comment:8 by , 4 years ago
For "alphafold match", where the sequences are identical or extremely similar except for missing segments, you shouldn't bother with secondary structure matching at all. Use the "ssFraction false" option to matchmaker.
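The toy per-residue score below makes the difference between ssFraction and computeSS concrete. It assumes the weighted form described in the matchmaker documentation: ssFraction is the share of the score contributed by the secondary structure matrix, and the remainder comes from the residue similarity matrix. The sketch combines a few BLOSUM-62 entries with the SS matrix printed in the log above. It leaves out gap penalties and the alignment itself, and `pair_score` is a hypothetical name, not ChimeraX code.

```python
from typing import Dict, Tuple

# A few BLOSUM-62 entries (symmetric); the full matrix covers 20 amino acids.
BLOSUM62: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("L", "I"): 2, ("A", "L"): -1, ("A", "I"): -1,
}

# SS matrix from the matchmaker log: H = helix, S = strand, O = other.
SS_MATRIX: Dict[Tuple[str, str], int] = {
    ("H", "H"): 6, ("H", "S"): -9, ("H", "O"): -6,
    ("S", "S"): 6, ("S", "O"): -6, ("O", "O"): 4,
}


def lookup(table: Dict[Tuple[str, str], int], a: str, b: str) -> int:
    """Symmetric lookup: try (a, b), then (b, a)."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def pair_score(res1: str, ss1: str, res2: str, ss2: str,
               ss_fraction: float = 0.3) -> float:
    """Weighted residue-pair score (assumed form, not ChimeraX's code).

    ss_fraction = 0 mimics 'ssFraction false': sequence similarity only.
    """
    seq = lookup(BLOSUM62, res1, res2)
    ss = lookup(SS_MATRIX, ss1, ss2)
    return (1 - ss_fraction) * seq + ss_fraction * ss


print(pair_score("L", "H", "I", "H"))       # helix vs helix: rewarded
print(pair_score("L", "H", "I", "S"))       # helix vs strand: penalized
print(pair_score("L", "H", "I", "S", 0.0))  # ssFraction false: SS ignored
```

With ss_fraction set to 0, the L/I pair scores the same whatever its secondary structure. That is harmless for near-identical sequences but discards useful signal for distant hits. By contrast, "computeSS false" keeps the SS term and only skips recomputing the assignments.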
comment:9 by , 4 years ago
AlphaFold match can fetch very dissimilar sequences since it will do a BLAT sequence search and find the closest hit. I am thinking computeSS false may be a sensible compromise. But for small structures I'd prefer to use the default matchmaker settings. So I am undecided what to do.
comment:10 by , 4 years ago
I think that if AlphaFold comes up with a different fold for a dissimilar sequence, the SS matching may produce worse results. Up to you, of course.
comment:11 by , 4 years ago
Besides AlphaFold, there is the question of what matchmaker options the Blast Protein tool should use. This bug report was the result of running Blast and having it take 45 seconds to load and align a small protein to a virus capsid monomer. Other large complexes like ribosomes will also be very slow for loading Blast hits because the matchmaker alignment step runs DSSP. Should Blast also use the matchmaker option "ssFraction false"?
comment:12 by , 4 years ago
Not in my opinion. For Blast PDB you would want to use secondary structure to align structurally similar but sequence-distant hits effectively. I rather suspect your HUGE complexes are going to be a minuscule proportion of use cases. However, if you really can't stand the defaults, I would vote for "computeSS false", which avoids recalculating secondary structure, rather than discarding it altogether.
comment:13 by , 4 years ago
Agree with Elaine.
In the long view, you do nothing, because DSSP will get faster. In the short view, you could do "computeSS false" and live with the fact that the secondary structure assignments may be slightly off. Using secondary structure is important for structurally aligning similar regions, though, so you don't want to do "ssFraction false".
comment:14 by , 4 years ago
I'm also skeptical of discarding secondary structure entirely when matching AlphaFold database entries. I've spent a lot of time on comparisons with known structures, and when there is some known similar structure (e.g. basically all situations where there is a query structure to match with, unless the person has generated it on their own with some other modeling), the secondary structure is in high agreement with that structure. AlphaFold modeling is not like first-principles, physics-based de novo modeling; from what I've seen, it hews significantly to existing PDB structures as templates.
comment:15 by , 4 years ago
Summary: | single chain matching → matchmaker takes 45 seconds to align one small chain to virus capsid due to DSSP calculation |
---|
Made the alphafold match command use matchmaker computeSS false to speed it up when matching large structures. The alphafold fetch and predict commands use the default computeSS true, since those only match one chain. Changed both the 1.3 and 1.4 builds.