Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#7435 closed enhancement (fixed)

Speed up Blast Protein search by using 4 threads

Reported by: Tom Goddard
Owned by: Zach Pearson
Priority: moderate
Milestone:
Component: Sequence
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Have blastp use 4 threads instead of the default 1 thread when running a Blast Protein search of the large databases, NR and AlphaFold v3. It would probably be nice (and simpler) to use 4 threads for all of the databases. The main effect may be parallelizing database file reads on the slow beegfs file system. The plato nodes have 56 cores, and I think it is rare to see more than one blast job running on plato at a time.

This sped up the sequence search by a factor of about 4 in my tests. It requires adding the "-num_threads 4" option (BLAST+ options take a single dash) to the blastp command in run_job() in file
cxwebservices/task_runners/blast.py

I tested a blastp run on watson with ChimeraX defaults on a length 268 sequence (7u0u chain A) searching the AlphaFold version 3 database. It took 19 minutes with 1 thread and 5 minutes with 4 threads.
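
A minimal sketch of the proposed change, for illustration only. It is not the actual run_job() code from cxwebservices/task_runners/blast.py: the helper name build_blastp_command, the file names, and the database name are hypothetical. The blastp options shown (-query, -db, -out, -num_threads) are standard BLAST+ options, which are spelled with a single leading dash.

```python
import shlex


def build_blastp_command(query_path, db_path, out_path, num_threads=4):
    """Build a blastp argument list (hypothetical helper, not the real run_job())."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return [
        "blastp",
        "-query", query_path,
        "-db", db_path,
        "-out", out_path,
        # BLAST+ defaults to 1 thread; the ticket proposes 4.
        "-num_threads", str(num_threads),
    ]


# Placeholder file and database names, assumed for the example.
cmd = build_blastp_command("query.fa", "alphafold_v3", "results.out")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True). Keeping the thread count as a parameter would make it easy to use one value for every database, as suggested above.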

Change History (3)

comment:1 by Zach Pearson, 3 years ago

Resolution: fixed
Status: assigned → closed

I've made the change. :) Glad you found a way to make this faster.

comment:2 by Tom Goddard, 3 years ago

Beautiful! I just tested with 7u0u chain A searching AlphaFold database version 3 and it now takes under 5 minutes, while before it was 19 minutes.

comment:3 by Tom Goddard, 3 years ago

Searching 7u0u chain A against the NR database took 20 minutes and 30 seconds. I believe this took 50 minutes before, so that is a little more than a factor of 2 speedup.
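
As a rough check on the speedup factors, using the timings reported in this ticket (the 50 minute NR baseline is recalled from memory, so that ratio is approximate; the helper name speedup is made up for this example):

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the run became."""
    return before_minutes / after_minutes


# AlphaFold v3 database: 19 minutes with 1 thread, 5 minutes with 4 threads.
alphafold_speedup = speedup(19, 5)
# NR database: about 50 minutes before, 20 minutes 30 seconds after.
nr_speedup = speedup(50, 20.5)

print(f"AlphaFold v3: {alphafold_speedup:.1f}x, NR: {nr_speedup:.1f}x")
# prints "AlphaFold v3: 3.8x, NR: 2.4x"
```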
