Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#7346 closed enhancement (fixed)

Make alphafold fetch and pae commands work with version 3 of the AlphaFold database

Reported by: Tristan Croll Owned by: Tom Goddard
Priority: normal Milestone:
Component: Structure Prediction Version:
Keywords: Cc: Elaine Meng, Zach Pearson
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Windows-10-10.0.19041
ChimeraX Version: 1.4 (2022-06-03 23:39:42 UTC)
Description
Brings the `alphafold fetch` and `alphafold pae` commands up to date with yesterday's surprise changes to the AlphaFold DB. Tested on a few random proteins - to my surprise, their (baffling) decision to round all PAE values to the nearest integer Angstrom actually doesn't seem to affect the domain-splitting algorithm anywhere near as much as I expected... Not sure how much it will affect ISOLDE's confidence-based distance restraint weighting, but will play around with it.

As I discussed with Tom yesterday, it would be a massive favour to me if we could find a way to retrofit this back into ChimeraX 1.4 - AlphaFold models play a huge role in this release of ISOLDE, and it would be a bad look if 199 out of every 200 attempts to pull a model from the database fail. Realistically, I'm not even going to be able to look at updating ISOLDE to ChimeraX 1.5 until October at the earliest - starting the new job in mid-Aug, travelling for the first half of Sep, and we of course need to settle on a new Linux build environment.

OpenGL version: 3.3.0 NVIDIA 497.29
OpenGL renderer: NVIDIA GeForce RTX 2080/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation

Python: 3.9.11
Locale: en_GB.cp1252
Qt version: PyQt6 6.3.0, Qt 6.3.0
Qt runtime version: 6.3.0
Qt platform: windows

Manufacturer: Notebook
Model: P7xxTM1
OS: Microsoft Windows 10 Education (Build 19041)
Memory: 68,654,501,888
MaxProcessMemory: 137,438,953,344
CPU: 16 Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
OSLanguage: en-GB

Installed Packages:
    -: imerax-isolde
    -himerax-clipper: 0.18.0
    -himerax-isolde: 1.4b2
    -imerax-isolde: 1.4b2
    absl-py: 1.0.0
    alabaster: 0.7.12
    appdirs: 1.4.4
    Babel: 2.10.1
    backcall: 0.2.0
    blockdiag: 3.0.0
    certifi: 2022.5.18.1
    cftime: 1.6.0
    charset-normalizer: 2.0.12
    ChimeraX-AddCharge: 1.2.3
    ChimeraX-AddH: 2.1.11
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2.1
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.4.3
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.2
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.39.1
    ChimeraX-AtomicLibrary: 7.0
    ChimeraX-AtomSearch: 2.0.1
    ChimeraX-AxesPlanes: 2.1
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.1.1
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.7
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.2
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.2.4
    ChimeraX-Clipper: 0.18.0
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5.1
    ChimeraX-CommandLine: 1.2.3
    ChimeraX-ConnectStructure: 2.0.1
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.4
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.3.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2.2
    ChimeraX-Dicom: 1.1
    ChimeraX-DistMonitor: 1.1.5
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.2
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1.2
    ChimeraX-Help: 1.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ISOLDE: 1.4
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1.1
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.5
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 2.0.6
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.7
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.5.5
    ChimeraX-ModelPanel: 1.3.2
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.9
    ChimeraX-PDB: 2.6.6
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0.1
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5.1
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.6
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.8
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-StructMeasure: 1.0.1
    ChimeraX-Struts: 1.0.1
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1.1
    ChimeraX-ToolshedUtils: 1.2.1
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.18.3
    ChimeraX-uniprot: 2.2
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.1.2
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    comtypes: 1.1.10
    cxservices: 1.2
    cycler: 0.11.0
    Cython: 0.29.26
    debugpy: 1.6.0
    decorator: 5.1.1
    docutils: 0.17.1
    entrypoints: 0.4
    filelock: 3.4.2
    fonttools: 4.33.3
    funcparserlib: 1.0.0
    grako: 3.16.5
    h5py: 3.7.0
    html2text: 2020.1.16
    idna: 3.3
    ihm: 0.27
    imagecodecs: 2021.11.20
    imagesize: 1.3.0
    ipykernel: 6.6.1
    ipython: 7.31.1
    ipython-genutils: 0.2.0
    jax: 0.2.27
    jedi: 0.18.1
    Jinja2: 3.0.3
    jupyter-client: 7.1.0
    jupyter-core: 4.10.0
    kiwisolver: 1.4.2
    line-profiler: 3.4.0
    lxml: 4.7.1
    lz4: 3.1.10
    MarkupSafe: 2.1.1
    matplotlib: 3.5.1
    matplotlib-inline: 0.1.3
    mpmath: 1.2.1
    msgpack: 1.0.3
    nest-asyncio: 1.5.5
    netCDF4: 1.5.8
    networkx: 2.6.3
    numexpr: 2.8.1
    numpy: 1.22.1
    OpenMM: 7.7.0
    openvr: 1.16.802
    opt-einsum: 3.3.0
    packaging: 21.3
    ParmEd: 3.4.3
    parso: 0.8.3
    pickleshare: 0.7.5
    Pillow: 9.0.1
    pip: 21.3.1
    pkginfo: 1.8.2
    prompt-toolkit: 3.0.29
    psutil: 5.9.0
    pycollada: 0.7.2
    pydicom: 2.2.2
    Pygments: 2.11.2
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 3.0.9
    PyQt6-commercial: 6.3.0
    PyQt6-Qt6: 6.3.0
    PyQt6-sip: 13.3.1
    PyQt6-WebEngine-commercial: 6.3.0
    PyQt6-WebEngine-Qt6: 6.3.0
    python-dateutil: 2.8.2
    pytz: 2022.1
    pywin32: 303
    pyzmq: 23.1.0
    qtconsole: 5.3.0
    QtPy: 2.1.0
    RandomWords: 0.3.0
    requests: 2.27.1
    scipy: 1.7.3
    setuptools: 59.8.0
    sfftk-rw: 0.7.2
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    Sphinx: 4.3.2
    sphinx-autodoc-typehints: 1.15.2
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 3.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-community: 1.0.0
    sympy: 1.10.1
    tables: 3.7.0
    tifffile: 2021.11.2
    tinyarray: 1.2.4
    tornado: 6.1
    traitlets: 5.1.1
    typing-extensions: 4.0.1
    urllib3: 1.26.9
    versioneer: 0.21
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.37.1
    wheel-filename: 1.3.0
    WMI: 1.5.1
File attachment: alphafold.diff

Attachments (1)

alphafold.diff (3.9 KB) - added by Tristan Croll 3 years ago.
Added by email2trac

Change History (23)

by Tristan Croll, 3 years ago

Attachment: alphafold.diff added

Added by email2trac

comment:1 by Tom Goddard, 3 years ago

Component: Unassigned => Structure Prediction
Owner: set to Tom Goddard
Platform: all
Project: ChimeraX
Status: new => assigned
Summary: ChimeraX bug report submission => Make alphafold fetch and pae commands work with version 3 of the AlphaFold database
Type: defect => enhancement

I think we can find a way to get version 3 of the database working in ChimeraX 1.4, probably by having ISOLDE monkey patch the PAE reader and set the database version to 3, rather than making a 1.4.1 release. I'll be looking at the details this morning. I realize this is a high priority.

One problem we will probably not be able to solve for about a month is getting alphafold sequence search working on version 3. Because version 3 has 200 million sequences while version 2 has 1 million, we are going to need to use different tools, probably mmseqs2 for the search so that it does not take an hour.

I wonder what the typical method will be for ISOLDE users to get their AlphaFold starting model. Maybe they know a UniProt ID and can fetch it. But it seems to me that in many cases their experimental sequence is not going to exactly match a UniProt sequence; it will be a mutant or an isoform. So it seems they will often need to do a sequence search. As noted above, we can't do a fast sequence search, so they will need to use an EBI sequence search. From there, will they download the mmCIF and PAE files, or will they be advised to use ChimeraX commands to fetch the UniProt ID the search finds? Or neither? Given how fast ColabFold predictions are, it would make more sense to get an AlphaFold model that exactly matches the sequence by running an AlphaFold prediction. To best help ISOLDE use AlphaFold, I need more details about how typical users will get AlphaFold models.

comment:2 by Tom Goddard, 3 years ago

In ChimeraX 1.4 there is code to try to handle new versions of the AlphaFold database. If a fetch fails, it queries a UCSF status page that tells ChimeraX the latest version,

https://www.rbvi.ucsf.edu/chimerax/data/status/alphafold_database.json

and it then updates the ChimeraX default settings to use that version.

So it looks like it is trivial to make ChimeraX 1.4 alphafold fetch get version 3 files by changing that JSON file. Unfortunately, EBI changed the PAE format, so if we do that, ChimeraX will get PAE files it cannot read. It was an unfortunate move by EBI to change internal format details.
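A minimal sketch of the fallback described above, for readers who want to see the moving parts. It assumes, hypothetically, that the status file is a JSON object with a `database_version` field; the real schema of alphafold_database.json is not shown in this ticket, and the helper names `latest_version_from_status` and `updated_version` are made up for illustration. The "only move forward" policy is also an assumption.

```python
import json

def latest_version_from_status(status_text):
    """Return the AlphaFold DB version advertised by a status document.

    Assumes (hypothetically) a JSON object such as {"database_version": "3"}.
    """
    status = json.loads(status_text)
    return int(status["database_version"])

def updated_version(current_version, status_text):
    """Return the version to use after a failed fetch.

    Hypothetical policy: adopt the advertised version only if it is newer.
    """
    latest = latest_version_from_status(status_text)
    return max(current_version, latest)

# Example with a locally constructed status document (no network access).
print(updated_version(2, '{"database_version": "3"}'))  # 3
```

As noted above, this alone would not be enough for version 3: the fetched PAE files also need a reader that understands the new layout.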

comment:3 by Tom Goddard, 3 years ago

In fact, ChimeraX checks the online settings file alphafold_database.json once per day whenever AlphaFold fetch is used, even if no fetch error occurs.
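The once-per-day behaviour is a simple throttle. The sketch below is an illustration only: `DailyCheck` is a hypothetical class, and it keeps the last-check time in memory, whereas ChimeraX presumably persists it in its settings so the limit survives restarts. The clock is injectable so the logic can be exercised without waiting a day.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds

class DailyCheck:
    """Run a check function at most once per day (hypothetical helper)."""

    def __init__(self, check, clock=time.time):
        self._check = check          # callable that fetches the status file
        self._clock = clock          # injectable clock, useful for testing
        self._last_check = None      # time of the last check, in memory only

    def maybe_check(self):
        """Call the check if it has never run or the last run is a day old.

        Returns True if the check ran, False if it was skipped.
        """
        now = self._clock()
        if self._last_check is not None and now - self._last_check < ONE_DAY:
            return False
        self._check()
        self._last_check = now
        return True

# Usage with a fake clock: the check runs, is skipped an hour later,
# then runs again once a full day has passed.
calls = []
fake_now = [0.0]
checker = DailyCheck(lambda: calls.append("fetched"), clock=lambda: fake_now[0])
checker.maybe_check()
fake_now[0] = 3600.0
checker.maybe_check()
fake_now[0] = ONE_DAY + 1
checker.maybe_check()
print(len(calls))  # 2
```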

in reply to:  5 comment:4 by Tristan Croll, 3 years ago

Understood. Happy to take that route.

For ISOLDE 1.4 I've added tutorials for both the options you describe: fetching pre-made models from the DB using alphafold match or alphafold fetch, or working with their own models made with their choice of Colab notebook.

Regarding sequence searching: if alphafold match falls back to sequence searching, I could patch in a warning saying that it will only be searching the v2 database, and if it doesn't come up with a good match they should do their own search. I'm not aware that alphafold-ebi provide a web-based sequence search yet, but I guess now they don't really need to - the user can just search UniProt instead. Each UniProt page now has a link to the matching AlphaFold model (if it exists, which it almost always will).
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 29 July 2022 17:49
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #7346: Make alphafold fetch and pae commands work with version 3 of the AlphaFold database (was: ChimeraX bug report submission)

comment:5 by Tom Goddard, 3 years ago

I see several ways for ChimeraX to handle DB version 3.

1) Leave ChimeraX 1.4 using AFDB version 2. Update daily build to work with version 3. ISOLDE 1.4 can monkey patch fixes.
2) Release a ChimeraX 1.4.1 that works with AFDB 3.
3) Update ChimeraX 1.4 to use AFDB version 3. Fetches of PAE will get the file but give an error trying to open it.

I think option 1) is the best. Option 2 is too much work and won't even reach most of the users because they won't update from their current 1.4. Option 3 makes PAE from the database useless and reduces confidence in our software.

So how would ISOLDE work with option 1? I think ISOLDE should set the AlphaFold database version to 3 on startup. It should not save that preference setting, because PAE would then fail when the user runs ChimeraX without starting ISOLDE; it just sets it when ISOLDE is started in a session. Then ISOLDE replaces the read_json_pae_matrix() routine in the ChimeraX alphafold module (file pae.py) so that it handles AFDB version 3 files. I can provide that fixed function, since I will probably put it in the daily build today. That function needs to handle all the flavors of AlphaFold PAE data, since the user may have old PAE files, new ones, or ones from ColabFold.

comment:6 by Tom Goddard, 3 years ago

The schedule for ChimeraX 1.5 was set at yesterday's group meeting: code freeze on October 1, with the release in November or, more likely, December.

Users who want to use AFDB version 3 will be advised to use the ChimeraX daily build.

in reply to:  8 comment:7 by Tristan Croll, 3 years ago

Could also do it in ISOLDE's bundle_init() method so it happens on ChimeraX startup?
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 29 July 2022 18:07
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #7346: Make alphafold fetch and pae commands work with version 3 of the AlphaFold database

comment:8 by Tom Goddard, 3 years ago

Should ISOLDE patch alphafold match to warn when a search does not use version 3 of the database? Maybe. ISOLDE could monkey patch some routine to give this warning. ChimeraX does not know what version of the database the backend server is using, but it does get a UniprotSequence object from chimerax/alphafold/search.py that reports the database version of the file that was found. If that is 2, a warning could be issued that there may be better version 3 files that ChimeraX did not look for.
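A sketch of the version-based warning suggested here, written as a wrapper around a search function. It assumes, hypothetically, that each result has a `database_version` attribute; the real attribute name on UniprotSequence may differ, and `warn_if_old_database` is an illustrative name. In ChimeraX the `log` argument would presumably be something like `session.logger.warning`. Note that the patch ISOLDE actually shipped (comment:18) takes a simpler route and prints a fixed warning whenever the command falls back to web search.

```python
import functools
from types import SimpleNamespace

def warn_if_old_database(search, log, newest_version=3):
    """Wrap `search` so it logs a warning when a hit comes from an older database.

    Hypothetical: assumes each result has a `database_version` attribute.
    """
    @functools.wraps(search)
    def wrapped(*args, **kwargs):
        results = search(*args, **kwargs)
        old = [r for r in results if int(r.database_version) < newest_version]
        if old:
            log(f"Sequence search found AlphaFold database version "
                f"{old[0].database_version} models; version {newest_version} "
                f"may contain better matches that were not searched.")
        return results
    return wrapped

# Usage with a stand-in search function (no network access).
def fake_search(sequence):
    return [SimpleNamespace(uniprot_id="P12345", database_version="2")]

messages = []
patched_search = warn_if_old_database(fake_search, messages.append)
hits = patched_search("MKTAYIAKQR")
print(messages[0])
```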

comment:9 by Tom Goddard, 3 years ago

Yes, you could do the patches in ISOLDE bundle_init(), so anyone who has even installed ISOLDE will get the benefits of AlphaFold DB version 3. But again the monkey patched code of course needs to continue to work with old PAE files -- easy to do.

comment:10 by Tom Goddard, 3 years ago

I checked in fixed code for read_json_pae_matrix() to read the version 3 PAE files (and also all the other flavors). This will be in the daily builds tomorrow:

https://github.com/RBVI/ChimeraX/blob/9720fa40eef09140e629fdbbd39c641181ab4c99/src/bundles/alphafold/src/pae.py#L879

You might want to copy and paste this function into ISOLDE to replace the ChimeraX 1.4 version.
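For reference, the three PAE JSON layouts handled by the fixed reader can be told apart by their top-level shape and keys. The sketch below mirrors the checks in the ISOLDE patch posted in comment:18, but returns plain lists instead of numpy arrays so it runs without dependencies; `read_pae` is an illustrative name, not the ChimeraX function. Version 3 values are whole numbers because the database now rounds PAE to integer Angstroms.

```python
import json

def read_pae(text):
    """Return a square PAE matrix (list of lists of floats) from PAE JSON text."""
    j = json.loads(text)

    # ColabFold 1.3: top-level dict holding the full matrix under "pae".
    if isinstance(j, dict) and "pae" in j:
        return [[float(v) for v in row] for row in j["pae"]]

    if not (isinstance(j, list) and j and isinstance(j[0], dict)):
        raise ValueError("expected a top-level list containing a dictionary")
    d = j[0]

    # AlphaFold DB v1/v2: parallel lists of 1-based residue indices and errors.
    if all(key in d for key in ("residue1", "residue2", "distance")):
        n = max(d["residue1"])
        pae = [[0.0] * n for _ in range(n)]
        for r1, r2, err in zip(d["residue1"], d["residue2"], d["distance"]):
            pae[r1 - 1][r2 - 1] = float(err)
        return pae

    # AlphaFold DB v3: full matrix under "predicted_aligned_error".
    if "predicted_aligned_error" in d:
        return [[float(v) for v in row] for row in d["predicted_aligned_error"]]

    raise ValueError("unrecognized PAE JSON layout")

example_v3 = '[{"predicted_aligned_error": [[0, 3], [4, 0]]}]'
print(read_pae(example_v3))  # [[0.0, 3.0], [4.0, 0.0]]
```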

in reply to:  12 comment:11 by Tristan Croll, 3 years ago

Just checking: did you notice the file attachment to the original report? :)
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 29 July 2022 18:20
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #7346: Make alphafold fetch and pae commands work with version 3 of the AlphaFold database

comment:12 by Tom Goddard, 3 years ago

Oops! Missed your attachment. But the code is trivial, 5 minutes work. And I noticed yesterday that "alphafold pae" isn't using the version option, ugh, so about to fix that.

comment:13 by Tom Goddard, 3 years ago

Fixed alphafold fetch command not using the version option when fetching pae, ticket #7352.
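The essence of this fix is that the structure and its PAE file must be requested from the same database version. A minimal sketch, assuming the public AlphaFold DB file naming (`AF-<UniProt>-F1-model_v<N>.cif` and `AF-<UniProt>-F1-predicted_aligned_error_v<N>.json`); `alphafold_db_urls` is a hypothetical helper, and only fragment F1 is handled.

```python
def alphafold_db_urls(uniprot_id, version=3):
    """Return (model_url, pae_url) for one AlphaFold DB entry.

    Both URLs are built from the same `version`, so the PAE always matches
    the model. The URL pattern is assumed from the AFDB file naming;
    only fragment F1 is handled.
    """
    base = f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1"
    model_url = f"{base}-model_v{version}.cif"
    pae_url = f"{base}-predicted_aligned_error_v{version}.json"
    return model_url, pae_url

print(alphafold_db_urls("P00520", version=3)[1])
# https://alphafold.ebi.ac.uk/files/AF-P00520-F1-predicted_aligned_error_v3.json
```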

comment:14 by Tom Goddard, 3 years ago

Let me know if there is more that needs to be done to help ISOLDE use AlphaFold database version 3.

Although I don't know the best way to use AlphaFold in ISOLDE, I still have the feeling that for refinement of atomic models in a cryoEM map it probably makes sense for the user to be running AlphaFold prediction. First try to run it on their whole complex if the number of residues is manageable (say under 1200), or if too many residues, run predictions on some subcomplexes. Using the AlphaFold database may be slightly faster to get a starting model, possibly good for a tutorial, but given the many hours or days of effort that a refinement will take it makes sense to me that real work would run predictions of multimers in most cases rather than using precomputed AlphaFold database models. Even if that is true, we of course want the AlphaFold database models to be as usable as possible.
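A rough sketch of the size rule suggested above: predict the whole complex if it is under roughly 1200 residues, otherwise split it into subcomplexes. The largest-first greedy packing of chains and the name `plan_predictions` are hypothetical placeholders; in practice subcomplexes should be chosen from known or suspected interfaces, not by size alone.

```python
def plan_predictions(chain_lengths, max_residues=1200):
    """Group chains into prediction jobs of at most `max_residues` residues.

    chain_lengths: dict mapping chain ID -> number of residues.
    If the whole complex fits, one job with every chain is returned.
    Otherwise chains are packed greedily, largest first, into subcomplexes.
    """
    total = sum(chain_lengths.values())
    if total <= max_residues:
        return [sorted(chain_lengths)]
    jobs = []  # each job is [chain_ids, residue_count]
    for chain, n in sorted(chain_lengths.items(), key=lambda kv: -kv[1]):
        for job in jobs:
            if job[1] + n <= max_residues:
                job[0].append(chain)
                job[1] += n
                break
        else:
            # No existing job has room; start a new one (a single chain
            # larger than the limit still gets its own job).
            jobs.append([[chain], n])
    return [sorted(ids) for ids, _ in jobs]

print(plan_predictions({"A": 800, "B": 800, "C": 300}))  # [['A', 'C'], ['B']]
```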

in reply to:  16 comment:15 by Tristan Croll, 3 years ago

Thanks - will see about integrating it tomorrow.

ISOLDE currently has four different interactive tutorials using AlphaFold:

- EMD-9118 (6mhu) - tetramer, builds quite straightforwardly from four AlphaFold-DB chains
- 2rd0 - old, really rough crystallographic model (heterodimer). AlphaFold-DB models are used as references to generate restraints - in this case, it doesn't really matter if the two AlphaFold models pack together well (as long as their conformation is reasonably close) since the restraints are applied on a per-chain basis.
- EMD-30827 (7drt) - heterodimer. Would be really difficult to build from individual AlphaFold-DB monomers, since the chains undergo large conformational changes on binding so they clash horribly when docked to the density. ColabFold-Multimer model is much better (far from perfect, still quite a bit of work needed to fit the map - but an easy starting point).
- 3now - monomer crystal structure, but with a flexible hinge. The tutorial demonstrates taking a molecular replacement solution generated by independently placing two rigid fragments, and using it (and its preliminary map) as a guide to quickly and flexibly fit the complete model. This actually works fine from the AlphaFold-DB model, but that model has a small number of point mutations compared to the crystallised construct, so I ran a custom AlphaFold job and used the result.

Ultimately I think that people will be using both approaches quite heavily. In many cases a practiced user will be able to have the relevant AlphaFold-DB model fitted and well on the way to final in the time it takes to run a new Colab prediction. But there will of course be many important cases where it makes more sense to run multimer predictions first.
________________________________
From: ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
Sent: 29 July 2022 21:14
Cc: goddard@cgl.ucsf.edu <goddard@cgl.ucsf.edu>; Tristan Croll <tic20@cam.ac.uk>
Subject: Re: [ChimeraX] #7346: Make alphafold fetch and pae commands work with version 3 of the AlphaFold database

comment:16 by Tom Goddard, 3 years ago

That's a good selection of tutorials. I agree that running AlphaFold predictions especially of multimers and using the EBI AlphaFold database will both be important for ISOLDE users.

comment:17 by Tom Goddard, 3 years ago

I've made another ticket #7358 to figure out how to update the ChimeraX sequence search capabilities for the new AlphaFold database version.

in reply to:  19 comment:18 by Tristan Croll, 3 years ago

OK - "alphafold fetch" and "alphafold pae" monkeypatches (copied below, called from the ISOLDE bundle's initialize() method) appear to be working correctly. Also patched "alphafold match" to print the following warning whenever it falls back to web-based sequence search:

ChimeraX AlphaFold sequence search currently only searches v2 of the AlphaFold sequence database, covering the original release of 40 reference genomes. To search the complete database, go to https://www.ebi.ac.uk/Tools/sss/fasta/ in your web browser and tick (only) "AlphaFold DB" under "Structures" in the first window. Due to the size of the database, expect that to take a while.

Will do some extra testing, but it looks like I should be good to release on Monday. Thanks very much for jumping on this so quickly!

def alphafold_monkeypatch(session):
    from chimerax.alphafold.database import _alphafold_database_settings
    settings = _alphafold_database_settings(session)
    settings.database_version = 3

    def read_json_pae_matrix(path):
        '''Open an AlphaFold predicted aligned error (PAE) JSON file and return it as a numpy matrix.'''
        import json
        with open(path, 'r') as f:
            j = json.load(f)

        if isinstance(j, dict) and 'pae' in j:
            # ColabFold 1.3 produces a JSON file different from AlphaFold database.
            from numpy import array, float32
            pae = array(j['pae'], float32)
            return pae

        if not isinstance(j, list) or len(j) == 0:
            from chimerax.core.errors import UserError
            raise UserError(f'JSON file "{path}" is not AlphaFold predicted aligned error data, expected a non-empty top level list')
        d = j[0]

        if not isinstance(d, dict):
            from chimerax.core.errors import UserError
            raise UserError(f'JSON file "{path}" is not AlphaFold predicted aligned error data, expected a top level list containing a dictionary')

        if 'residue1' in d and 'residue2' in d and 'distance' in d:
            # AlphaFold Database versions 1 and 2 use this format
            # Read PAE into numpy array
            from numpy import array, zeros, float32, int32
            r1 = array(d['residue1'], dtype=int32)
            r2 = array(d['residue2'], dtype=int32)
            ea = array(d['distance'], dtype=float32)
            # me = d['max_predicted_aligned_error']
            n = r1.max()
            pae = zeros((n,n), float32)
            pae[r1-1,r2-1] = ea
            return pae

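        if 'predicted_aligned_error' in d:
            # AlphaFold Database version 3 uses this format: a dense N x N matrix.
            from numpy import array, float32
            pae = array(d['predicted_aligned_error'], dtype=float32)
            # Version 3 rounds PAE to whole Angstroms, so some entries are exactly 0.
            # Replace them with a small positive value, presumably so downstream code
            # that divides by PAE (e.g. domain clustering weights) stays finite.
            pae[pae == 0] = 0.2
            return pae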

        keys = ', '.join(str(k) for k in d.keys())
        from chimerax.core.errors import UserError
        raise UserError(f'JSON file "{path}" is not AlphaFold predicted aligned error data, expected a dictionary with keys "predicted_aligned_error" or "residue1", "residue2" and "distance", got keys {keys}')

    from chimerax.alphafold import pae
    pae.read_json_pae_matrix = read_json_pae_matrix

    def alphafold_pae(session, structure = None, file = None, uniprot_id = None,
                    version = None, palette = None, range = None, plot = None,
                    divider_lines = None,  # referenced below; not exposed by the registered command
                    color_domains = False, connect_max_pae = 5, cluster = 0.5, min_size = 10):
        '''Load AlphaFold predicted aligned error file and show plot or color domains.'''

        if uniprot_id:
            from chimerax.alphafold.database import alphafold_pae_url
            pae_url = alphafold_pae_url(session, uniprot_id, database_version=version)
            file_name = pae_url.split('/')[-1]
            from chimerax.core.fetch import fetch_file
            file = fetch_file(session, pae_url, 'AlphaFold PAE %s' % uniprot_id,
                            file_name, 'AlphaFold', error_status = False)

        if file:
            from chimerax.alphafold.pae import AlphaFoldPAE
            pae = AlphaFoldPAE(file, structure)
            if structure:
                if structure.num_residues != pae.matrix_size:
                    from chimerax.core.errors import UserError
                    raise UserError('Number of residues in structure "%s" is %d which does not match PAE matrix size %d.'
                                    % (str(structure), structure.num_residues, pae.matrix_size) +
                                    '\n\nThis can happen if the AlphaFold model has been trimmed to match an experimental structure, or if residues have been deleted.  The full-length AlphaFold model must be used to show predicted aligned error.')
                structure.alphafold_pae = pae
        elif structure is None:
            from chimerax.core.errors import UserError
            raise UserError('No structure or PAE file specified.')
        else:
            pae = getattr(structure, 'alphafold_pae', None)
            if pae is None:
                from chimerax.core.errors import UserError
                raise UserError('No predicted aligned error (PAE) data opened for structure #%s'
                                % structure.id_string)

        if plot is None:
            plot = not color_domains    # Plot by default if not coloring domains.

        if plot:
            from chimerax.core.colors import colormap_with_range
            from chimerax.alphafold.pae import AlphaFoldPAEPlot
            colormap = colormap_with_range(palette, range, default_colormap_name = 'pae',
                                        full_range = (0,30))
            p = getattr(structure, '_alphafold_pae_plot', None)
            if p is None or p.closed():
                p = AlphaFoldPAEPlot(session, 'AlphaFold Predicted Aligned Error', pae,
                                    colormap=colormap)
                if structure:
                    structure._alphafold_pae_plot = p
            else:
                p.display(True)
                if palette is not None or range is not None:
                    p.set_colormap(colormap)
                if divider_lines is not None:
                    p.show_chain_dividers(divider_lines)

        pae.set_default_domain_clustering(connect_max_pae, cluster)
        if color_domains:
            if structure is None:
                from chimerax.core.errors import UserError
                raise UserError('Must specify structure to color domains.')
            pae.color_domains(connect_max_pae, cluster, min_size)

    # -----------------------------------------------------------------------------
    #
    def register_alphafold_pae_command(logger):
        from chimerax.core.commands import CmdDesc, register, OpenFileNameArg, ColormapArg, ColormapRangeArg, BoolArg, FloatArg, IntArg
        from chimerax.atomic import AtomicStructureArg, UniProtIdArg
        desc = CmdDesc(
            optional = [('structure', AtomicStructureArg)],
            keyword = [('file', OpenFileNameArg),
                    ('uniprot_id', UniProtIdArg),
                    ('palette', ColormapArg),
                    ('range', ColormapRangeArg),
                    ('plot', BoolArg),
                    ('color_domains', BoolArg),
                    ('connect_max_pae', FloatArg),
                    ('cluster', FloatArg),
                    ('min_size', IntArg),
                    ('version', IntArg)],
            synopsis = 'Show AlphaFold predicted aligned error'
        )

        register('alphafold pae', desc, alphafold_pae, logger=logger)

    from chimerax.alphafold import pae
    pae.alphafold_pae = alphafold_pae
    pae.register_alphafold_pae_command = register_alphafold_pae_command

    def alphafold_fetch(session, uniprot_id, color_confidence=True,
                        align_to=None, trim=True, pae=False, ignore_cache=False,
                        add_to_session=True, version=None, in_file_history=True, **kw):

        uniprot_name = uniprot_id if '_' in uniprot_id else None
        from chimerax.alphafold.fetch import _parse_uniprot_id
        uniprot_id = _parse_uniprot_id(uniprot_id)
        from chimerax.alphafold import database
        url = database.alphafold_model_url(session, uniprot_id, version)
        file_name = url.split('/')[-1]

        from chimerax.core.fetch import fetch_file
        filename = fetch_file(session, url, 'AlphaFold %s' % uniprot_id, file_name, 'AlphaFold',
                            ignore_cache=ignore_cache, error_status = False)

        model_name = 'AlphaFold %s' % (uniprot_name or uniprot_id)
        models, status = session.open_command.open_data(filename, format = 'mmCIF',
                                                        name = model_name,
                                                        in_file_history = in_file_history,
                                                        **kw)
        from chimerax.alphafold.match import _set_alphafold_model_attributes
        _set_alphafold_model_attributes(models, uniprot_id, uniprot_name)

        if color_confidence:
            from chimerax.alphafold.fetch import _color_by_confidence
            for s in models:
                # Set initial style so confidence coloring is not replaced.
                s.apply_auto_styling()
                s._auto_style = False
                _color_by_confidence(s)

        if pae:
            trim = False    # Cannot associate PAE if structure is trimmed

        if align_to is not None:
            from chimerax.alphafold.fetch import _align_and_trim, _log_chain_info
            _align_and_trim(models, align_to, trim)
            _log_chain_info(models, align_to.name)

        if add_to_session:
            session.models.add(models)

        if pae:
            from chimerax.alphafold.pae import alphafold_pae
            # Fetch the PAE file from the same database version as the model.
            alphafold_pae(session, structure = models[0], uniprot_id = uniprot_id, version = version)

        return models, status

    from chimerax.alphafold import fetch
    fetch.alphafold_fetch = alphafold_fetch

    def alphafold_sequence_search(sequences, min_length=20, local=False, log=None):
        '''
        Search all AlphaFold database sequences using blat.
        Return best match uniprot ids.
        '''
        from chimerax.alphafold.search import _search_sequences_local, _search_sequences_web, _plural
        useqs = list(set(seq for seq in sequences if len(seq) >= min_length))
        if len(useqs) == 0:
            return [None] * len(sequences)

        if log is not None:
            log.status('Searching AlphaFold database for %d sequence%s'
                    % (len(useqs), _plural(useqs)))

        if local:
            seq_uniprot_ids = _search_sequences_local(useqs)
        else:
            session.logger.warning('ChimeraX AlphaFold sequence search currently only searches v2 of the '
                'AlphaFold sequence database, covering the original release of 40 reference genomes. '
                'To search the complete database, go to https://www.ebi.ac.uk/Tools/sss/fasta/ in your web browser '
                'and tick (only) "AlphaFold DB" under "Structures" in the first window. Due to the '
                'size of the database, expect that to take a while.')
            seq_uniprot_ids = _search_sequences_web(useqs)
        seq_uids = [seq_uniprot_ids.get(seq) for seq in sequences]

        return seq_uids

    from chimerax.alphafold import search
    search.alphafold_sequence_search = alphafold_sequence_search
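The patched `read_json_pae_matrix` tells the PAE formats apart by their keys: versions 1 and 2 store parallel `residue1`/`residue2`/`distance` lists with 1-based residue numbers, while version 3 stores a dense `predicted_aligned_error` matrix. Below is a dependency-free sketch of the same dispatch, using plain lists instead of numpy and synthetic two-residue JSON samples. The 0.2 floor for zero values mirrors the patch; the thread does not state the reason for that particular value.

```python
import json

def read_pae(text, zero_floor=0.2):
    '''Parse AlphaFold DB PAE JSON text into a list of row lists (sketch).'''
    j = json.loads(text)
    if not isinstance(j, list) or not j or not isinstance(j[0], dict):
        raise ValueError('expected a top level list containing a dictionary')
    d = j[0]
    if 'predicted_aligned_error' in d:
        # Version 3 layout: a dense N x N matrix of whole-Angstrom values.
        rows = d['predicted_aligned_error']
        # Replace exact zeros with a small positive value, as the patch does.
        return [[float(v) if v != 0 else zero_floor for v in row] for row in rows]
    if all(key in d for key in ('residue1', 'residue2', 'distance')):
        # Version 1 and 2 layout: parallel lists indexed by 1-based residue numbers.
        n = max(d['residue1'])
        pae = [[0.0] * n for _ in range(n)]
        for r1, r2, err in zip(d['residue1'], d['residue2'], d['distance']):
            pae[r1 - 1][r2 - 1] = float(err)
        return pae
    raise ValueError('unrecognized keys: ' + ', '.join(str(k) for k in d))

# Synthetic two-residue examples of each layout.
v2_json = json.dumps([{'residue1': [1, 1, 2, 2], 'residue2': [1, 2, 1, 2],
                       'distance': [0.25, 3.5, 4.0, 0.25]}])
v3_json = json.dumps([{'predicted_aligned_error': [[0, 4], [3, 0]]}])
```

Both calls return a 2 x 2 matrix, so code downstream of the parser does not need to know which database version produced the file.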

comment:19 by Tom Goddard, 3 years ago

Cc: Elaine Meng, Zach Pearson added
Resolution: fixed
Status: assigned → closed

Done.

ChimeraX daily builds now fetch from EBI AlphaFold database version 3 (214 million structures) instead of version 2 (1 million structures) by default. Fetching from the version 2 database is still possible with the version option of the "alphafold fetch" or "alphafold search" command. Older ChimeraX daily builds and ChimeraX 1.4 will continue to use database version 2 because the PAE file format changed in version 3 and older ChimeraX cannot read it.
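Switching database versions amounts to changing the version number in the file names that are downloaded. A sketch, assuming the EBI file naming pattern `AF-<UniProt ID>-F1-model_v<N>.cif` and `AF-<UniProt ID>-F1-predicted_aligned_error_v<N>.json` under `https://alphafold.ebi.ac.uk/files/` (only the first fragment, F1, is handled). In ChimeraX itself, `alphafold_model_url` and `alphafold_pae_url` in `chimerax.alphafold.database` build these URLs; `alphafold_urls` here is a hypothetical helper.

```python
ALPHAFOLD_FILES = 'https://alphafold.ebi.ac.uk/files'  # assumed base URL

def alphafold_urls(uniprot_id, version=3):
    '''Return (model_url, pae_url) for one UniProt entry and database version.'''
    stem = f'{ALPHAFOLD_FILES}/AF-{uniprot_id}-F1'
    model_url = f'{stem}-model_v{version}.cif'
    pae_url = f'{stem}-predicted_aligned_error_v{version}.json'
    return model_url, pae_url

model_v3, pae_v3 = alphafold_urls('P29474')            # new default
model_v2, pae_v2 = alphafold_urls('P29474', version=2)  # explicit older version
```

At the command line the version option should make the same choice, e.g. `alphafold fetch P29474 version 2`.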

The "alphafold match" command also uses database version 3 to do sequence searches and now uses the k-mer search method described here

https://www.rbvi.ucsf.edu/chimerax/data/kmer-aug2022/kmer_search.html

instead of using BLAT because BLAT was too slow with the larger version 3 database.
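The thread links to a description of the k-mer method rather than spelling it out. As a rough illustration of the general idea only (not the algorithm on that page, whose choice of k, indexing and scoring may differ), a k-mer search indexes every length-k substring of each database sequence once, then ranks entries by how many distinct k-mers they share with the query. The function names and toy sequences below are hypothetical.

```python
from collections import Counter, defaultdict

def build_kmer_index(database, k=3):
    '''Map each length-k substring to the set of database ids containing it.'''
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index

def best_match(query, index, k=3):
    '''Return (id, shared k-mer count) for the entry sharing most k-mers, or None.'''
    counts = Counter()
    for kmer in {query[i:i + k] for i in range(len(query) - k + 1)}:
        for seq_id in index.get(kmer, ()):  # .get avoids inserting missing keys
            counts[seq_id] += 1
    return counts.most_common(1)[0] if counts else None

database = {
    'A': 'MKTAYIAKQRQISFVKSHFSRQ',
    'B': 'MSDNGPQNQRNAPRITFGGPSD',
}
index = build_kmer_index(database)
hit = best_match('AYIAKQRQIS', index)
```

Lookups are dictionary hits per query k-mer, so the cost grows with query length rather than with a full alignment against every database entry, which is why this style of search scales better than alignment-based tools on a very large database.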

In the future, when the AlphaFold database is updated again, we hope older ChimeraX versions will be able to use the new database without updating ChimeraX. ChimeraX periodically checks the current version of the AlphaFold database and will be able to use a newer version as long as EBI does not change the file formats or the way database files are accessed.
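The thread does not say how ChimeraX performs this periodic check. A hypothetical sketch of one way to do it: cache the last known version, re-query only after an interval, and keep the cached value when the query fails (for example when offline). `DatabaseVersion`, `fetch_current` and `max_age` are illustrative names, not ChimeraX API, and the fake fetch function stands in for a network query.

```python
import time

class DatabaseVersion:
    '''Cache the current database version, refreshing it after max_age seconds (sketch).'''

    def __init__(self, fetch_current, default=3, max_age=7 * 24 * 3600, clock=time.time):
        self._fetch_current = fetch_current  # callable returning an int, or raising OSError
        self.version = default
        self._max_age = max_age
        self._clock = clock
        self._checked = None  # time of the last successful check

    def get(self):
        now = self._clock()
        if self._checked is None or now - self._checked > self._max_age:
            try:
                self.version = int(self._fetch_current())
                self._checked = now
            except (OSError, ValueError):
                pass  # offline or bad reply: keep the last known version
        return self.version

# Simulated clock and a fake "current version" query.
t = [0.0]
calls = []
def fake_fetch():
    calls.append(t[0])
    return 4

db = DatabaseVersion(fake_fetch, clock=lambda: t[0], max_age=100)
first = db.get()    # no check yet, so it queries
t[0] = 50
second = db.get()   # still fresh, no query
t[0] = 200
third = db.get()    # stale, queries again

def offline():
    raise OSError('no network')
fallback = DatabaseVersion(offline, default=3).get()
```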

in reply to:  21 comment:20 by Elaine Meng, 3 years ago

Is the "alphafold search" default also changed to version 3?
Thx
Elaine

in reply to:  22 ; comment:21 by goddard@…, 3 years ago

Yes.  The default AlphaFold database is version 3 for all commands starting in tomorrow's daily build.  Older ChimeraX versions will continue to use version 2 forever.

in reply to:  23 ; comment:22 by goddard@…, 3 years ago

One exception is that ISOLDE for ChimeraX 1.4 may sneakily make ChimeraX 1.4 use the AlphaFold version 3 database.  But I think Tristan can document that in ISOLDE if he likes and it is probably not necessary for us to discuss that in the ChimeraX AlphaFold documentation.  Tristan did that because AlphaFold database models are very important for cryoEM model building and version 3 has many more models than version 2.