Opened 3 years ago

Closed 3 years ago

#7725 closed enhancement (fixed)

Update to AlphaFold database version 4

Reported by: Tom Goddard
Owned by: Tom Goddard
Priority: moderate
Milestone: 1.5
Component: Structure Prediction
Version:
Keywords:
Cc: Zach Pearson, Eric Pettersen
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Begin forwarded message:

From: Sameer Velankar via pdb-l <pdb-l@…>
Subject: pdb-l: AlphaFold Protein Structure Database - releasing updated structures
Date: October 4, 2022 at 10:04:19 AM PDT
To: pdb-l@…
Cc: Tim Green <tfgg@…>
Reply-To: Sameer Velankar <sameer@…>

Dear All,

We plan to release an updated set of AlphaFold structure predictions on AFDB in the coming weeks. This will update a subset of predictions affected by a temporary numerical bug (miscompilation). This temporary issue resulted in low accuracy predictions with correspondingly low pLDDT for ~4% of the total structure predictions available in the database. As these predictions had low pLDDT, we hope that this bug will not have significantly impacted any work.

This does not impact the AlphaFold model or open source code.

We’ve made a full list of impacted structure predictions available at https://ftp.ebi.ac.uk/pub/databases/alphafold/v3_affected_accessions.txt.

This set of structures was part of the recent 200M expansion; the issue didn’t impact any of the previous releases (model proteomes released in V1 and V2, or UniProtKB/Swiss-Prot released in V2).

The new release (V4) will include:
- Updated coordinates for affected structures (~4% of total structures). You can still access all old coordinates as V3 files, and easily compare V3 and V4 coordinates.
- Minor metadata changes in the mmCIF files for the rest of the structures (these files will be released as V4).

We’ll be in touch after releasing updated structures soon.

As always, we value community engagement and feedback and would like to thank Igor Tolstoy (NCBI) for surfacing the large pLDDT differences in highly similar sequences that alerted us to the issue. Please get in touch via alphafold@… if you have any feedback or require support with this change.

Best Wishes,
Sameer Velankar (EMBL-EBI) and Tim Green (DeepMind)

Change History (18)

comment:1 by Tom Goddard, 3 years ago

When the EBI updates the database we will make the necessary changes to use version 4. This should be pretty easy. Ideally we will do it before the 1.5 release. In theory, all the changes can be made by modifying web server files. In past version updates that theory has failed because EBI also changed the file naming, but we will see what they do.

It sounds like they will not change any of the sequences or UniProt identifiers for the entries, so hopefully we will not need to update our AlphaFold DB sequence database for BLAST searches. But I think the BLAST web service code will need to be updated so that the results it returns say it searched the version 4 database.

comment:2 by Tom Goddard, 3 years ago

EBI released version 4 of the AlphaFold database on November 1, 2022 (https://ftp.ebi.ac.uk/pub/databases/alphafold/CHANGELOG.txt).

Unfortunately they deleted about 1000 sequences because they had bad predicted structures (CA-CA distance > 10 Angstroms), so I need to rebuild the BLAST and k-mer search databases. I've started downloading the sequence FASTA file (92 Gbytes) on plato to

/wynton/group/ferrin/databases/mol/AlphaFold/v4/alphafold.fasta

but it is transferring slowly (400 KB/sec) and will take about 3 days if it succeeds. I added instructions on how to rebuild the databases in

/wynton/group/ferrin/databases/mol/AlphaFold/README

and will do that once the sequences download. Then we will need to update to version 4 in the ChimeraX code and the BLAST backend code.
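
As a rough check of the download estimate above, here is a minimal sketch. It assumes decimal units (1 Gbyte = 10^9 bytes, 1 KB = 10^3 bytes) and a constant transfer rate; the function name is only for illustration.

```python
SECONDS_PER_DAY = 86400

def transfer_days(size_bytes, rate_bytes_per_sec):
    """Days needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / SECONDS_PER_DAY

# 92 Gbytes at 400 KB/sec, the figures quoted in this comment.
days = transfer_days(92e9, 400e3)
print(f"{days:.1f} days")  # prints "2.7 days"
```

That is about 230,000 seconds, consistent with the "3 days" estimate.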

I tested fetching a version 4 prediction and PAE file using the alphafold fetch "version 4" option and the PAE GUI, and it all worked, so I don't expect problems with version 4 files.
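
For reference, a hedged sketch of how the file names for a given database version could be built. The base URL and the file name pattern are assumptions based on the public AlphaFold DB download links, not taken from the ChimeraX source, and the helper name and example UniProt accession are only for illustration.

```python
# Assumed base URL for AlphaFold DB downloads (not taken from this ticket).
AFDB_FILES_URL = "https://alphafold.ebi.ac.uk/files"

def afdb_file_urls(uniprot_id, version=4, fragment=1):
    """Return assumed URLs of the model and PAE files for one entry."""
    stem = f"AF-{uniprot_id}-F{fragment}"
    return {
        "model": f"{AFDB_FILES_URL}/{stem}-model_v{version}.cif",
        "pae": f"{AFDB_FILES_URL}/{stem}-predicted_aligned_error_v{version}.json",
    }

urls = afdb_file_urls("P29474")
print(urls["model"])
print(urls["pae"])
```

Keeping the version as a parameter means a future database update only changes the default, provided EBI keeps the same file naming.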

comment:3 by Tom Goddard, 3 years ago

Zach, can you update the BLAST web service back end to use AlphaFold database version 4?

Here is the change you made when you updated the AlphaFold database to version 3

https://github.com/RBVI/cxwebservices/commit/3e11ab174dbe8d590519e5cb546efa54d408f530

Let me know when the update is done and I will test. Thanks.

comment:4 by Zach Pearson, 3 years ago

I bumped the default database to 4 and restarted the test backend.

in reply to:  5 ; comment:5 by goddard@…, 3 years ago

How is the test backend used?

comment:6 by Zach Pearson, 3 years ago

Your question prompted me to make the prereqs/cxservices Makefile a little more ergonomic. First pull the develop branch. Then:

CXSERVICES_DEPLOYMENT_VER=test make app-reinstall

comment:7 by Tom Goddard, 3 years ago

Tested and updated the AlphaFold database k-mer search CGI script to use version 4 of the database. The update is on the preview web site and will be on the production site tomorrow.

comment:8 by Zach Pearson, 3 years ago

Let me know when you're ready for me to move the webservices-test code to webservices.

comment:9 by Tom Goddard, 3 years ago

After reinstalling prereqs/cxservices using

CXSERVICES_DEPLOYMENT_VER=test make app-reinstall

I get the following error when trying to run a BLAST search of the AlphaFold database. I am on the CGL VPN.

alphafold search MIRSKEPHNNLCLLYNQGLMPYLDAHRWQRSLLNERIHDPSLDDVLILLEHPPVYTLGQGSNSDFIKFDIDQGEYDVHRVERGGEVTYHCPGQLVGYPILNLQRYRKDLHWYLRQLEEVIIRVLTVYGLQGERIPAFTGVWLQGRKVAAIGIKVSRWITMHGFALNVCPDMKGFERIVPCGISDKPVGSLAEWIPGITCQEVRFYVAQCFAEVFGVELIESQPQDFFRPE version 4
/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'webservices-test.rbvi.ucsf.edu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/chimerax/core/tasks.py", line 213, in _run_thread
    self.run(*args, **kw)
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/chimerax/webservices/cxservices_job.py", line 123, in run
    reason = json.loads(e.body)['description']
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
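
The traceback ends in json.loads() failing at character 0, which suggests the error body from the server was empty or not JSON (for example an HTML error page). A minimal sketch of a more defensive way to extract the reason is below. The helper name error_reason and the fallback message are hypothetical, and this is not the fix that was actually applied.

```python
import json

def error_reason(body):
    """Return a readable failure reason from an HTTP error body.

    Hypothetical helper: falls back to the raw text when the body
    is empty or is not JSON with a 'description' field.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    text = (body or "").strip()
    try:
        parsed = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return text[:200] if text else "no error description returned by server"
    if isinstance(parsed, dict) and "description" in parsed:
        return str(parsed["description"])
    return text[:200]

print(error_reason('{"description": "unknown database version"}'))
print(error_reason("<html>502 Bad Gateway</html>"))
print(error_reason(""))
```

With this approach a non-JSON reply is reported as a job failure with whatever text the server sent, instead of raising an exception in the job thread.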

comment:10 by Tom Goddard, 3 years ago

Zach, are you looking at the test cxservices problem?

comment:11 by Zach Pearson, 3 years ago

Yes, I have been since your reply. Sorry for the lack of an update; I didn't want to bother you until I had something. I fixed the immediate problem that was giving tracebacks in ChimeraX, but now it waits indefinitely for the job to finish because the Redis workers drop out whenever jobs are enqueued.

in reply to:  12 ; comment:12 by Tom Goddard, 3 years ago

Thanks! Let me know when I can test BLAST with AlphaFold database version 4.

comment:13 by Zach Pearson, 3 years ago

I have no idea why, but the test server is working again -- at least for AlphaFold version 3. Go ahead and test version 4.

comment:14 by Tom Goddard, 3 years ago

In #8002 I tested a BLAST search of version 4 and the backend seemed to work, but the sequence title lines had the wrong format, apparently because makeblastdb mangled the sequence names from the FASTA file, probably because of the -parse_seqids option. We probably need to rebuild the BLAST database without that option.

Yes, the -parse_seqids option munges the title lines, so I have started rebuilding the AlphaFold version 4 database. It takes about 5 hours because of plato's very slow network disks.
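
A minimal sketch of the rebuild command with and without that option. BLAST+ spells it with a single dash, -parse_seqids. The database name alphafold_v4 and the helper function are hypothetical; the actual rebuild steps are the ones in the README mentioned in comment 2.

```python
import shlex

def makeblastdb_command(fasta_path, db_name, parse_seqids=False):
    """Build a makeblastdb command line for a protein database."""
    cmd = ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]
    if parse_seqids:
        # Leaving this off keeps the FASTA title lines unparsed.
        cmd.append("-parse_seqids")
    return cmd

cmd = makeblastdb_command("alphafold.fasta", "alphafold_v4")
print(shlex.join(cmd))
# prints: makeblastdb -in alphafold.fasta -dbtype prot -out alphafold_v4
```

The list form can be passed directly to subprocess.run(cmd, check=True) on a machine where BLAST+ is installed.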

Last edited 3 years ago by Tom Goddard

comment:15 by Tom Goddard, 3 years ago

I tested BLAST search with AlphaFold database version 4 and it is working with the test server. Zach, can you push the server-side change to production to allow using the version 4 database? Thanks! I will update the BLAST client side to use version 4 once the production server has the change.

comment:16 by Tom Goddard, 3 years ago

By the way, using the test server spews an SSL warning 511 times, once each second for the 8.5 minute BLAST search. The SSL warning is not a problem since this is a test server. But querying whether the job is done over 500 times seems excessive, since every AlphaFold database search is going to take over 5 minutes. It might be best to check every 10 or 20 seconds for AlphaFold BLAST, while PDB BLAST should still be checked every second.

open 7t0a

alphafold search /A version 4

/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'webservices-test.rbvi.ucsf.edu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Webservices job id: QQWMX2L5NCC26E3G
/Users/goddard/ucsf/chimerax/ChimeraX.app/Contents/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'webservices-test.rbvi.ucsf.edu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
[Repeated 511 time(s)]
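
The polling suggestion above can be sketched as follows. This is a minimal illustration, not the cxservices client code: the interval table, the function names, and the 15 second value for AlphaFold are assumptions, and a simulated clock stands in for a real 8.5 minute (510 second) job.

```python
import time

# Hypothetical polling intervals in seconds, keyed by database name.
POLL_INTERVAL = {"alphafold": 15.0, "pdb": 1.0}

def wait_for_job(is_done, database, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True; return the number of polls."""
    interval = POLL_INTERVAL.get(database, 1.0)
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)

class FakeClock:
    """Simulated clock so the example runs instantly."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

def simulate(database, job_seconds=510.0):
    """Count the status polls for a job that takes job_seconds."""
    fake = FakeClock()
    return wait_for_job(lambda: fake.now >= job_seconds, database,
                        sleep=fake.sleep, clock=fake.time)

print(simulate("pdb"), simulate("alphafold"))  # prints "511 35"
```

With a one second interval the simulated job is polled 511 times, matching the repeat count in the log above; a 15 second interval cuts that to 35 status requests.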

comment:17 by Zach Pearson, 3 years ago

I opened a new ticket, with you as the reporter, to track improvements to job status polling: https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/8006#ticket

I also updated the backend.

comment:18 by Tom Goddard, 3 years ago

Resolution: fixed
Status: assigned → closed

Fixed in 1.5 and daily build.

I've updated the alphafold and blastprotein bundles to use AlphaFold database version 4 by default. I also updated the web file (/usr/local/projects/chimerax/www/data/status/alphafold_database3.json) on plato, which older ChimeraX versions query, so that they use version 4 as well.

Tested AlphaFold BLAST and k-mer search and the Blast Protein GUI; all are working with version 4.
