Opened 3 years ago
Last modified 3 years ago
#7388 assigned enhancement
BLAST of NR database uses 180 Gbytes of resident memory on server
| Reported by: | Tom Goddard | Owned by: | Zach Pearson |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Sequence | Version: | |
| Keywords: | | Cc: | chimerax-programmers, Scooter Morris |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
This ticket is about high server memory use by BLAST searches of the NR database. I'm not sure there is anything simple we can do about it. Mostly it is good to know about the problem in case we experience server problems.
A single BLAST sequence search of the NR (non-redundant) database uses 180 Gbytes of resident memory on the server (franklin.cgl.ucsf.edu) and takes 50 minutes to search with a 268 amino acid sequence (PDB 7u0u chain A). That is a large share of the compute resources, about half of the 384 Gbytes of server memory. Here is the output of "top" near the completion of the job.
PID    USER   PR NI VIRT   RES    SHR    S %CPU %MEM TIME+    COMMAND
102779 apache 20 0  247.7g 180.6g 180.6g D 94.9 48.0 49:37.17 blastp
I'm not sure what would happen if a few NR searches ran at the same time. Maybe the server would function poorly with lots of thrashing. BLAST is using 1 thread.
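One detail in the top line above may matter for the thrashing question: RES and SHR are both 180.6g. The sketch below only parses that line and computes how much of the resident memory is not shared, treating top's "g" suffix as GiB. The interpretation is an assumption, not something verified on franklin: if the shared pages are memory-mapped database files, the kernel can evict them under memory pressure, and concurrent searches of the same database would share them rather than each needing its own 180 Gbytes.

```python
def parse_size_gib(field):
    """Convert a top size field such as '180.6g' to GiB.

    A bare number is treated as KiB, which is top's default unit.
    """
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0, "t": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024**2


# Header and row copied from the top output in this ticket.
COLUMNS = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND".split()
TOP_ROW = "102779 apache 20 0 247.7g 180.6g 180.6g D 94.9 48.0 49:37.17 blastp"

row = dict(zip(COLUMNS, TOP_ROW.split()))
res = parse_size_gib(row["RES"])   # resident memory
shr = parse_size_gib(row["SHR"])   # portion of resident memory that is shared
private = res - shr                # resident memory owned only by this process

print(f"resident {res:.1f} GiB, shared {shr:.1f} GiB, private {private:.1f} GiB")
```

At the one-decimal precision top reports, the private part is zero, so whatever blastp allocates for itself is small next to the database pages.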
I've been looking at how to do sequence searches of the AlphaFold database of 214 million sequences (ticket #7358), which is large (93 GB fasta file, 65 GB blast database) but not as large as NR, which has a 179 GB blast database (nr*.psq files in /databases/mol/blast/db_current on plato). I did not find any blastp command-line options that can limit the memory use. I also tested BLAT, which is used by the ChimeraX "alphafold match" command, on plato with the AlphaFold database. It also brought the whole database (93 GB) into resident memory at once, and BLAT also has no options to reduce memory use. I also tested mmseqs2, which used even more memory (280 GB) because its index files are about 7 times larger than the sequence data. mmseqs2 has some options that may limit memory use. I'm noting what I find about those AlphaFold database search methods in ticket #7358.
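As an illustration of the mmseqs2 options mentioned above, here is a minimal sketch that builds an "mmseqs search" command line with the --split-memory-limit option, which asks mmseqs2 to process the target database in pieces that fit a memory budget. The database names, temporary directory, and the 64G value are hypothetical placeholders, and this has not been tried on the AlphaFold database.

```python
import shlex


def mmseqs_search_command(query_db, target_db, result_db, tmp_dir, memory_limit="64G"):
    """Build an 'mmseqs search' command with a per-split memory budget.

    All paths and the default limit are placeholders for illustration.
    """
    return [
        "mmseqs", "search",
        query_db, target_db, result_db, tmp_dir,
        "--split-memory-limit", memory_limit,
    ]


# Hypothetical database names; none of these exist on plato or franklin.
cmd = mmseqs_search_command("queryDB", "alphafoldDB", "resultDB", "tmp")
print(shlex.join(cmd))
```

Splitting the target database normally trades memory for run time, so the search would be expected to take longer under a smaller limit.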
Change History (2)
comment:1 by , 3 years ago
follow-up: 2 comment:2 by , 3 years ago
BLAST is going to use nearly the same amount of memory on every run, dependent on just the size of the database being searched. If you limit its memory it will just kill the process. All that will achieve is that blast will never work. The only viable solution would be if blast has options that make it use less memory.
There may not be a way to limit memory usage in BLAST, but Linux definitely lets us limit a program's memory usage.
Scooter or Greg: is there a viable path towards using a cgroup to limit the memory BLAST uses?
https://unix.stackexchange.com/questions/44985/limit-memory-usage-for-a-single-linux-process
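For reference, a minimal sketch of what the cgroup route could look like, assuming the server runs systemd with cgroup v2. It wraps a command in "systemd-run --scope" with the MemoryMax property, which places the process in a transient cgroup with a hard memory limit. The blastp arguments, the query file name, and the 64G cap are hypothetical, and whether blastp then runs slower or gets killed has not been tested here.

```python
import shlex


def cgroup_limited_command(command, memory_max="64G"):
    """Wrap a command so systemd runs it in a transient scope with a memory cap.

    Assumes a systemd host with cgroup v2; MemoryMax is the hard limit.
    """
    return [
        "systemd-run", "--scope", "--quiet",
        "-p", f"MemoryMax={memory_max}",
    ] + list(command)


# Hypothetical blastp invocation; the query file and database name are placeholders.
blast = ["blastp", "-db", "nr", "-query", "query.fasta", "-num_threads", "1"]
wrapped = cgroup_limited_command(blast, memory_max="64G")
print(shlex.join(wrapped))
```

Running the wrapped command needs permission to create a transient scope (root, or "systemd-run --user" in a user session), which the sketch does not handle.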