Opened 3 years ago

Last modified 3 years ago

#7388 assigned enhancement

BLAST of NR database uses 180 Gbytes of resident memory on server

Reported by: Tom Goddard
Owned by: Zach Pearson
Priority: moderate
Milestone:
Component: Sequence
Version:
Keywords:
Cc: chimerax-programmers, Scooter Morris
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

This ticket is about high server memory use by BLAST searches of the NR database. I'm not sure there is anything simple we can do about it. Mostly it is good to know about the problem in case we run into server problems.

A single BLAST sequence search of the NR (non-redundant) database uses 180 Gbytes of resident memory on the server (franklin.cgl.ucsf.edu) and takes 50 minutes to search with a 268 amino acid query sequence (PDB 7u0u chain A). That is a large share of the machine's resources, about half of the 384 Gbytes of server memory. Here is the output of "top" near the completion of the job.

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
102779 apache    20   0  247.7g 180.6g 180.6g D  94.9 48.0  49:37.17 blastp

I'm not sure what would happen if a few NR searches ran at the same time. Maybe the server would function poorly with lots of thrashing. BLAST is using 1 thread.
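To put rough numbers on the concurrency question, here is a minimal sketch that parses the "top" line quoted above and estimates how many such searches would fit in memory. It assumes top's "g" suffix and the 384 Gbytes figure are the same unit, and it assumes each search needs its own full resident copy. The second assumption may be pessimistic: top reports SHR equal to RES for this process, and if those are shared file-backed pages, concurrent searches of the same database might share them. The helper name parse_top_size is made up for illustration.

```python
import math

def parse_top_size(field: str) -> float:
    """Convert a top memory field such as '180.6g' to GiB."""
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0, "t": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    # A bare number in top's memory columns is in KiB by default.
    return float(field) / 1024**2

# The process line from the top output in this ticket.
top_line = "102779 apache    20   0  247.7g 180.6g 180.6g D  94.9 48.0  49:37.17 blastp"
fields = top_line.split()
virt_gib, res_gib, shr_gib = (parse_top_size(f) for f in fields[4:7])

SERVER_MEMORY_GIB = 384.0  # server memory stated in the ticket

fraction = res_gib / SERVER_MEMORY_GIB
# Worst case: no pages are shared between concurrent searches.
max_concurrent = math.floor(SERVER_MEMORY_GIB / res_gib)

print(f"resident: {res_gib:.1f} GiB, shared: {shr_gib:.1f} GiB")
print(f"fraction of server memory: {fraction:.0%}")
print(f"searches that fit without sharing: {max_concurrent}")
```

Under those assumptions a third simultaneous NR search would not fit in physical memory.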

I've been looking at how to do sequence searches of the AlphaFold database of 214 million sequences (ticket #7358), which is large (93 GB FASTA file and 65 GB BLAST database) but not as large as NR, which has a 179 GB BLAST database (nr*.psq files in /databases/mol/blast/db_current on plato). I did not find any blastp command-line options that can limit the memory use. I also tested BLAT, which is used by the ChimeraX "alphafold match" command, on plato with the AlphaFold database; it also brought the whole database (93 GB) resident in memory at once. BLAT also has no options to reduce memory use. I also tested mmseqs2, which used even more memory (280 GB), with index files about 7 times larger than the sequence data. mmseqs2 has some options that may limit memory use. I'm noting what I find about those AlphaFold database search methods in ticket #7358.
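Since memory use appears to track the size of the database files, a quick way to check a database's footprint is to add up the sizes of its sequence files. The sketch below does that for files matching the nr*.psq pattern mentioned above. The function name database_size_bytes is made up for illustration, and whether the .psq files alone account for the resident memory is an assumption.

```python
from pathlib import Path

def database_size_bytes(db_dir, pattern="nr*.psq"):
    """Total size in bytes of the files in db_dir whose names match pattern."""
    return sum(p.stat().st_size for p in Path(db_dir).glob(pattern))

# Example use with the directory named in this ticket:
#   size_gb = database_size_bytes("/databases/mol/blast/db_current") / 1e9
#   print(f"NR sequence files: {size_gb:.0f} GB")
```

Passing a different pattern gives the same total for another database kept in the same directory.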

Change History (2)

comment:1 by Zach Pearson, 3 years ago

There may not be a way to limit memory usage in BLAST, but Linux definitely lets us limit a program's memory usage.

Scooter or Greg: is there a viable path towards using a cgroup to limit the memory BLAST uses?
https://unix.stackexchange.com/questions/44985/limit-memory-usage-for-a-single-linux-process
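For reference, one way a cgroup limit could be applied is through systemd-run, which starts a command in a transient scope with a MemoryMax property. The sketch below only builds and prints such a command line without running it. It assumes the server uses systemd with cgroup v2 and that the caller has the privileges to create scopes. The 64G cap, the file names, and the capped_command helper are placeholders for illustration, and whether BLAST runs usefully under such a cap is not established here.

```python
import shlex

def capped_command(command, memory_max):
    """Wrap a command so systemd runs it in a scope with a cgroup memory cap."""
    return ["systemd-run", "--scope", "-p", f"MemoryMax={memory_max}", *command]

# Hypothetical single-threaded blastp search of NR; file names are placeholders.
blastp = ["blastp", "-db", "nr", "-query", "query.fasta",
          "-out", "hits.txt", "-num_threads", "1"]

cmd = capped_command(blastp, "64G")  # 64G is an arbitrary example limit
print(shlex.join(cmd))
```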

in reply to:  2 ; comment:2 by goddard@…, 3 years ago

BLAST is going to use nearly the same amount of memory on every run, depending only on the size of the database being searched. If you limit its memory, the process will just be killed. All that will achieve is that BLAST will never work. The only viable solution would be if BLAST has options that make it use less memory.
