Opened 3 years ago
Last modified 3 years ago
#7354 assigned enhancement
AlphaFold database "version" number differs from "release" number causing confusion
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Structure Prediction | Version: | |
| Keywords: | | Cc: | Elaine Meng, Tristan Croll |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
The AlphaFold database has had 4 releases, numbered 1, 2, 3, 4 in its FAQ (https://alphafold.ebi.ac.uk), but these correspond to only three file versions, 1, 2, 3: releases 2 and 3 both use version number 2. The version number appears in the file names and is what is needed to locate database files. Users, however, are most likely to see the release numbers and assume those are the version numbers.
Probably the best we can do is use the version number (since the release number cannot be used to fetch files), document this behavior, and advise EBI to make their version numbers match their release numbers in the future.
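To make the mismatch concrete, here is a minimal sketch of the release-to-version mapping described above, as of July 2022. The names `RELEASE_TO_FILE_VERSION`, `file_version_for_release`, and `model_file_name` are hypothetical and are not part of ChimeraX. The file name pattern is inferred from the single example quoted in this ticket (`AF-A0A4T0DZS4-F1-model_v3.cif`), so treat the `F1` fragment field and the `.cif` suffix as assumptions.

```python
# Hypothetical helper, not ChimeraX code: maps the "release" numbers shown on
# the AlphaFold database web pages to the "version" number used in file names.
# Mapping taken from this ticket: releases 2 and 3 both use file version 2.
RELEASE_TO_FILE_VERSION = {1: 1, 2: 2, 3: 2, 4: 3}


def file_version_for_release(release: int) -> int:
    """Return the file version number for a web-page release number."""
    try:
        return RELEASE_TO_FILE_VERSION[release]
    except KeyError:
        raise ValueError(f"unknown AlphaFold database release: {release}") from None


def model_file_name(uniprot_id: str, version: int, fragment: int = 1) -> str:
    """Build a model file name; pattern assumed from the one example in this ticket."""
    return f"AF-{uniprot_id}-F{fragment}-model_v{version}.cif"


if __name__ == "__main__":
    # A release 4 entry is stored under file version 3.
    print(model_file_name("A0A4T0DZS4", file_version_for_release(4)))
    # prints: AF-A0A4T0DZS4-F1-model_v3.cif
```

The lookup is one-way on purpose: a file version does not identify a unique release (version 2 covers releases 2 and 3), which is one more reason to expose the version number to users rather than the release number.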
From: Elaine Meng
Subject: Re: [ChimeraX] #7352: alphafold fetch command ignores version option when fetching PAE data
Date: July 29, 2022 at 12:06:05 PM PDT
To: Tom Goddard

Thanks -- Might be a little tricky to document but I'll figure something out. Currently "my" manpage gives this list of versions, but then the 200-million-seq one would be version 4, is that right?

The database contains models for protein sequences in UniProt (single-chain models only, not complexes):

- version 1 – predictions for ~360,000 sequences, reference proteomes of 21 species including Homo sapiens; used by ChimeraX 1.3
- version 2 – predictions for ~800,000 sequences, v1 + most of the manually curated entries in UniProt
- version 3 – predictions for ~1 million sequences, v2 + sequences relevant to neglected tropical disease or antimicrobial resistance; used by ChimeraX 1.4+

Also I need to correct the above to say that alphafold match uses version 2, not version 3. I'm assuming version 3 is the default for alphafold fetch/pae unless you tell me otherwise.

Elaine
Begin forwarded message:

From: Tom Goddard
Subject: Re: [ChimeraX] #7352: alphafold fetch command ignores version option when fetching PAE data
Date: July 29, 2022 at 12:15:03 PM PDT
To: Elaine Meng

Hi Elaine,

It is worse than you think! The EBI AlphaFold database has had 4 releases as described in their FAQ "Which proteins are included?" at the bottom of their main web page https://alphafold.ebi.ac.uk. But the version numbers on the files are 1, 2, and 3, with 2 being used for both releases 2 and 3.

Yeah, I see this is totally screwed up. A release 4 AlphaFold DB file (200 million entries) has a file name like AF-A0A4T0DZS4-F1-model_v3.cif that ends in "_v3", i.e. version 3. But a user is only likely to see the "release" number on the web pages. Can we slap EBI a few times?

At any rate I think we need to stick to the "version" number, because the files are found by version number, not by release number. And the current version number in ChimeraX 1.4 and daily builds is version 2 (1 million sequences including releases 2 and 3).

Tom