Opened 3 years ago
Last modified 15 months ago
#7387 reopened enhancement
BLAST of UniRef50, UniRef90, UniRef100 appears to be using old databases from 2012
| Reported by: | Tom Goddard | Owned by: | Zach Pearson |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | Sequence | Version: | |
| Keywords: | | Cc: | Elaine Meng, Eric Pettersen, Greg Couch, Scooter |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Not sure if these are the uniref databases being used by blast protein. They are from 2012. The UniRef100 file is 8 Gbytes, while the current UniRef100 is 83 Gbytes, so these old databases have only about 1/10 of the sequences. They should be updated.
Change History (26)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
| Cc: | added |
|---|---|
comment:4 by , 3 years ago
| Cc: | added; removed |
|---|---|
| Owner: | changed from to |
comment:5 by , 3 years ago
My user account is not allowed to write files in that directory. Additionally, that script doesn't exist(!)
Reassigning to Scooter.
comment:6 by , 3 years ago
The uniref databases can be downloaded by ftp here. uniref100, uniref90 and uniref50 are each a single large gzipped fasta file. Then a simple makeblastdb command makes the database files. The database directory is owned by sacsdb with group sacs and is not writable by group sacs.
$ ls -ld /databases/mol/blast/db_uniref
drwxr-xr-x. 2 sacsdb sacs 42 Jul 28 2012 /databases/mol/blast/db_uniref
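The download-and-build steps described above could be sketched roughly as follows. This is a minimal sketch and not the actual get_uniref_blast script: the UniProt URL pattern, the helper names, and the output layout are all assumptions. Only the standard makeblastdb options `-in`, `-dbtype`, `-out` and `-title` are used.

```python
import subprocess
from pathlib import Path

# Assumed download location; verify against the current UniProt FTP layout.
UNIREF_URL = "https://ftp.uniprot.org/pub/databases/uniprot/uniref/{name}/{name}.fasta.gz"


def uniref_url(name: str) -> str:
    """Return the assumed download URL for uniref50, uniref90 or uniref100."""
    if name not in ("uniref50", "uniref90", "uniref100"):
        raise ValueError(f"unknown UniRef database: {name}")
    return UNIREF_URL.format(name=name)


def makeblastdb_command(fasta: Path, out_base: Path, title: str) -> list:
    """Build the makeblastdb argument list for a protein database."""
    return [
        "makeblastdb",
        "-in", str(fasta),      # uncompressed FASTA input
        "-dbtype", "prot",      # UniRef is a protein database
        "-out", str(out_base),  # base name of the generated database files
        "-title", title,
    ]


def build_database(fasta: Path, out_base: Path, title: str) -> None:
    """Run makeblastdb; raises CalledProcessError if it exits non-zero."""
    subprocess.run(makeblastdb_command(fasta, out_base, title), check=True)
```

The gzipped download would need to be decompressed before being passed to `build_database`, and the process would have to run as a user that can write to the database directory.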
comment:7 by , 3 years ago
| Priority: | moderate → high |
|---|---|
comment:8 by , 3 years ago
| Milestone: | → 1.5 |
|---|---|
comment:9 by , 3 years ago
Newer data, from 16 June 2021, is in /wynton/group/databases/UniProt/uniref/uniref{100,50,90}.
comment:11 by , 3 years ago
| Cc: | removed |
|---|---|
| Owner: | changed from to |
comment:12 by , 3 years ago
I've re-created the missing get_uniref_blast script and am testing it now.
comment:13 by , 3 years ago
| Milestone: | 1.6 |
|---|---|
comment:15 by , 16 months ago
Did these databases ever get updated? Easy for me to say, I guess, but using 2012 versions seems like a pretty bad thing that should be relatively easy to ameliorate.
Elaine
comment:16 by , 15 months ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
As a side effect of all the databases on plato getting nuked, they all got updated and the scripts that update them were fixed.
comment:17 by , 15 months ago
BLAST is not working for uniref, nr, or esmfold and those databases don't appear to be in blast/db_current on plato.
comment:18 by , 15 months ago
| Resolution: | fixed |
|---|---|
| Status: | closed → reopened |
Yes, as Tom G said, I also got failures for uniref(N) and esmfold today. The others may be OK now: I could search pdb and alphafold, and nr is also running now that Scooter has gotten it into place, although I may not wait around for that job to complete.
Tested in UCSF ChimeraX version: 1.9.dev202408060516 (2024-08-06)
comment:19 by , 15 months ago
| Cc: | added |
|---|---|
The error in the logs is:
BLAST Database error: No alias or index file found for protein database [/databases/mol/blast/db/uniref100]
Here's all the files in /databases/mol/blast/db containing the string uniref:
03:39:29 zjp@crick cxservices → ls /databases/mol/blast/db | grep uniref
lrwxrwxrwx. 1 sacsdb sacs 28 Aug 3 17:38 makeblastdb.log -> ../db_uniref/makeblastdb.log
lrwxrwxrwx. 1 sacsdb sacs 28 Aug 3 17:38 uniref100.fasta -> ../db_uniref/uniref100.fasta
lrwxrwxrwx. 1 sacsdb sacs 32 Aug 3 17:38 uniref100.fasta.pdb -> ../db_uniref/uniref100.fasta.pdb
lrwxrwxrwx. 1 sacsdb sacs 37 Aug 3 17:38 uniref100.fasta.pdb-lock -> ../db_uniref/uniref100.fasta.pdb-lock
lrwxrwxrwx. 1 sacsdb sacs 27 Aug 3 17:38 uniref50.fasta -> ../db_uniref/uniref50.fasta
lrwxrwxrwx. 1 sacsdb sacs 31 Aug 3 17:38 uniref50.fasta.pdb -> ../db_uniref/uniref50.fasta.pdb
lrwxrwxrwx. 1 sacsdb sacs 36 Aug 3 17:38 uniref50.fasta.pdb-lock -> ../db_uniref/uniref50.fasta.pdb-lock
lrwxrwxrwx. 1 sacsdb sacs 27 Aug 3 17:38 uniref90.fasta -> ../db_uniref/uniref90.fasta
lrwxrwxrwx. 1 sacsdb sacs 31 Aug 3 17:38 uniref90.fasta.pdb -> ../db_uniref/uniref90.fasta.pdb
lrwxrwxrwx. 1 sacsdb sacs 36 Aug 3 17:38 uniref90.fasta.pdb-lock -> ../db_uniref/uniref90.fasta.pdb-lock
comment:21 by , 15 months ago
Those "pdb" files are the database, supposedly. At least that's what makeblastdb outputs. I'll look into it a bit more.
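One way to narrow this down would be to check which database names actually have an alias or index file next to them, since that is what the error message complains about. The sketch below assumes the usual protein database extensions (`.pal` for an alias file, `.pin` for an index file, with multi-volume databases named like `name.00.pin`). The function names and the list of candidate names are illustrative only and are not part of webservices.

```python
from pathlib import Path
from typing import Optional


def has_protein_db(base: Path) -> bool:
    """Return True if an alias (.pal) or index (.pin) file exists for `base`.

    Multi-volume indexes such as base.00.pin are also accepted. Broken
    symlinks count as missing because Path.exists() follows links.
    """
    parent, name = base.parent, base.name
    if not parent.is_dir():
        return False
    if (parent / f"{name}.pal").exists() or (parent / f"{name}.pin").exists():
        return True
    # Multi-volume databases: name.00.pin, name.01.pin, ...
    return any(parent.glob(f"{name}.[0-9][0-9].pin"))


def find_db_name(db_dir: Path, name: str) -> Optional[str]:
    """Try `name` and `name.fasta`; return the first that looks like a database."""
    for candidate in (name, f"{name}.fasta"):
        if has_protein_db(db_dir / candidate):
            return candidate
    return None
```

With only `.pdb` and `.pdb-lock` files present, as in the listing above, `find_db_name` would return `None` for every uniref name, which matches the "No alias or index file found" error.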
comment:22 by , 15 months ago
Maybe I need to append '.fasta' to the names of those databases in webservices like I do for AlphaFold and ESMFold.
Also, it's not really readily apparent why requests for v0 of ESMFold try to look for it in a folder called ESMFold/v1, so I just symlinked v1 to v0.
comment:23 by , 15 months ago
OK, so the problem seems to be beegfs. I can build the blast database in /tmp, but not directly in /databases. Still looking at it.
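A possible workaround, sketched under assumptions, is to build on local disk and copy the finished files into place afterwards. The `build` callable below is a hypothetical stand-in for whatever runs makeblastdb, and the function name is made up for illustration. Whether copying finished files onto beegfs avoids the problem is untested here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def build_then_install(
    name: str,
    dest_dir: Path,
    build: Callable[[Path], None],
) -> List[Path]:
    """Build database files for `name` in a scratch directory, then copy them.

    `build` receives the base path (scratch_dir / name) and is expected to
    create the database files next to it, as makeblastdb would with -out.
    Returns the list of installed paths in sorted order.
    """
    installed = []
    # TemporaryDirectory uses the system temp location, usually /tmp.
    with tempfile.TemporaryDirectory(prefix="blastdb_") as scratch:
        base = Path(scratch) / name
        build(base)
        for src in sorted(Path(scratch).iterdir()):
            dst = dest_dir / src.name
            shutil.copy2(src, dst)  # copy2 also preserves modification times
            installed.append(dst)
    return installed
```

Copying rather than moving keeps the scratch build intact until every file has been written, so a failed copy does not leave the only complete set of files half-moved.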
comment:25 by , 15 months ago
Thanks, Scooter! I just verified that I can blast esmfold and uniref50 (assuming the other unirefs will be the same). However, only the uniref50 job returned results even though the task manager reports both are finished. Here are the IDs in case they are useful in debugging. Or maybe it's known that esmfold won't really work yet.
blastprotein /A database esmfold
Webservices job id: OPBU9S578S8KRPNX
blastprotein /A database uniref50
Webservices job id: JANU7RB1SUM2ZGUX
As an aside, it might be useful to enhance blastprotein to report something about database version or when it was last updated, but I have no idea of feasibility/difficulty so I'll leave it up to others to ticket or ignore.
Elaine
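On the "when it was last updated" idea, one low-effort approximation would be to report the newest modification time of the database files on the server. The sketch below is only an illustration of that approach: the function name is hypothetical, and it assumes the database files live in one directory and share the database name as a prefix. BLAST+'s `blastdbcmd -info` may be a better source, since it reports a date stored in the database itself.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def database_last_updated(db_dir: Path, name: str) -> Optional[datetime]:
    """Return the newest modification time (UTC) among files named `name.*`.

    Symlinks are followed, so the time reflects the link target. Returns
    None when no matching regular file exists.
    """
    mtimes = [p.stat().st_mtime for p in db_dir.glob(f"{name}.*") if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

A file's modification time only says when it was written on this server, not which UniRef release it contains, so it is a rough indicator at best.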
comment:26 by , 15 months ago
OK, in today's test I was able to blast esmfold and get results, thanks! Can this be closed? Feel free to close it if everything in this ticket is done. Elaine
Forgot to include the path to the 2012 databases on plato