Opened 2 years ago

Last modified 17 months ago

#10017 assigned defect

Better separation between BLAST jobs and the GUI

Reported by: Zach Pearson
Owned by: Zach Pearson
Priority: moderate
Component: Infrastructure
Cc: Tom Goddard
Platform: all
Project: ChimeraX

Description

From Tom:

I was trying to BLAST 31 protein sequences today using some ChimeraX Python and analyze which PDB files contained more than one of these proteins - a pretty simple task. But I had to spend an hour hacking down blastprotein code to get something usable -- the current code seems to only be intended for use in the GUI context. It would be nice if sequence searches were easy to do from Python without the GUI. Below is code I slapped together by starting from blastprotein/job.py and deleting. Ideally there would be a Python API that just does a search and gives Python results in ChimeraX.
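The analysis half of this task (finding which PDB entries contain more than one of the query proteins) needs no GUI once per-sequence hit lists exist. Below is a minimal pure-Python sketch. It assumes each hit is a (name, evalue, score, description) tuple like those returned by the blast() method in comment 2, and that hit names have the form PDBID_CHAIN (e.g. 1ABC_A). That name format is an assumption, and pdb_entries_with_multiple_queries is a hypothetical helper, not part of ChimeraX.

```python
from collections import defaultdict


def pdb_entries_with_multiple_queries(hits_by_query, max_evalue=1e-3):
    """Return {PDB ID: sorted query names} for entries hit by 2+ queries.

    hits_by_query maps a query name to a list of
    (name, evalue, score, description) tuples. Hit names are ASSUMED
    to look like "1ABC_A" (PDB ID, underscore, chain ID).
    """
    queries_per_entry = defaultdict(set)
    for query, hits in hits_by_query.items():
        for name, evalue, _score, _description in hits:
            if evalue > max_evalue:
                continue  # skip weak hits above the E-value cutoff
            pdb_id = name.split('_')[0].upper()  # "1abc_C" -> "1ABC"
            queries_per_entry[pdb_id].add(query)
    # Keep only entries that contain hits from more than one query.
    return {pdb_id: sorted(queries)
            for pdb_id, queries in sorted(queries_per_entry.items())
            if len(queries) > 1}


# Made-up example data, for illustration only.
hits_by_query = {
    "protA": [("1ABC_A", 1e-50, 300.0, "example"), ("2XYZ_B", 1e-20, 120.0, "example")],
    "protB": [("1abc_C", 1e-40, 250.0, "example"), ("3DEF_A", 0.5, 20.0, "example")],
    "protC": [("3DEF_B", 1e-10, 60.0, "example")],
}
shared = pdb_entries_with_multiple_queries(hits_by_query)
print(shared)  # {'1ABC': ['protA', 'protB']}
```

The E-value filter duplicates the cutoff BLAST already applies, but it lets the same hit lists be re-analyzed at a stricter threshold without rerunning the searches.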

Change History (2)

comment:1 by Zach Pearson, 17 months ago

Cc: Tom Goddard added

Hi Tom,

Do you think recent changes to blastprotein close this ticket or is there more you'd like to see?

comment:2 by Tom Goddard, 17 months ago

Not sure what recent BlastProtein changes you are referring to. The request in this ticket is that I be able to make a Python call that runs a BLAST search and returns the results in a form that can be used from Python (e.g. a non-GUI class instance). It seems this ticket doesn't have the code I referred to in the description that illustrates the hoops I had to jump through to run a job from Python, so here is that code. I expected to call an existing Python API to run a job and get the results, not to write a page of my own code. So whether this ticket can be closed depends on whether there is now a Python function to run a BLAST job and get back the results. It should be able to block until the job completes, and a nice option would be to accept a callback that is called on completion, for non-blocking use.

from chimerax.core.errors import UserError
from chimerax.webservices.cxservices_job import CxServicesJob


class BlastProtein(CxServicesJob):
    """Run a BLAST search on the ChimeraX web service and return plain Python results."""

    inet_error = "Could not start BLAST job. Please check your internet connection and try again."
    service_name = "blast"

    def __init__(self, session, sequence, database='pdb', cutoff=1e-3, matrix='BLOSUM62',
                 max_seqs=10000, version=None):
        super().__init__(session)

        # Replace '?' with 'X', the one-letter code for an unknown amino acid.
        self.sequence = sequence.replace('?', 'X')        # string
        if self.sequence.count('X') == len(self.sequence):
            raise UserError("Sequence consists entirely of unknown amino acids.")

        self.database = database                          # string
        self.cutoff = cutoff                              # float (E-value cutoff)
        self.matrix = matrix                              # string
        self.max_seqs = max_seqs                          # int
        if version is None:
            # Default to the current version of the chosen database.
            from chimerax.blastprotein.data_model import CurrentDBVersions
            version = CurrentDBVersions[self.database]
        self.version = version                            # DB version

        # Parameters sent to the web service; numbers are passed as strings.
        self.params = {
            "db": self.database,
            "evalue": str(self.cutoff),
            "matrix": self.matrix,
            "blimit": str(self.max_seqs),
            "input_seq": self.sequence,
            "version": self.version,
        }

    def blast(self):
        """Run the search, blocking until it finishes, and return a list of
        (name, evalue, score, description) tuples, excluding the query itself."""
        from urllib3.exceptions import MaxRetryError
        try:
            super().start(self.service_name, self.params, blocking=True)
        except MaxRetryError:
            # The service could not be reached, so there is no job to check or parse.
            raise UserError(self.inet_error)

        if not self.exited_normally():
            raise UserError(f"BLAST job {self.id} failed")

        results = self.get_results()

        # Parse the raw service output with the blastprotein data-model parser.
        from chimerax.blastprotein.data_model import get_database
        blast_results = get_database(self.database)
        blast_results.parse("query", self.sequence, results)
        hits = []
        for m in blast_results.parser.matches:
            name = m.match if m.match else m.name
            if name != 'query':  # skip the entry for the query sequence itself
                hits.append((name, m.evalue, m.score, m.description))
        return hits
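Comment 2 asks for a function that runs a search and returns plain Python results, blocking by default, with an optional completion callback for non-blocking use. The following is a minimal, GUI-free sketch of that calling convention only; it is not an existing ChimeraX API. Hit, run_search, and fake_search are hypothetical names, and fake_search stands in for something like the BlastProtein.blast() method above. In ChimeraX itself the callback would likely need to be delivered on the main thread, and exceptions raised by the search would need handling; this sketch does neither.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Hit:
    """Hypothetical plain record for one search hit (no GUI objects)."""
    name: str
    evalue: float
    score: float
    description: str


def run_search(search: Callable[[str], List[Hit]], sequence: str,
               on_done: Optional[Callable[[List[Hit]], None]] = None
               ) -> Union[List[Hit], threading.Thread]:
    """Hypothetical wrapper around a search function.

    Without on_done: block and return the hits.
    With on_done: run the search in a background thread, call
    on_done(hits) from that thread when it finishes, and return the
    thread so the caller can still join() it.
    """
    if on_done is None:
        return search(sequence)

    def worker():
        on_done(search(sequence))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def fake_search(sequence: str) -> List[Hit]:
    # Stand-in for a real BLAST call; returns one made-up hit.
    return [Hit("1ABC_A", 1e-50, 300.0, f"demo hit for {sequence}")]


# Blocking use: the call returns the hits directly.
hits = run_search(fake_search, "MKTAYIAK")
print(hits[0].name)  # 1ABC_A

# Non-blocking use: hits are passed to the callback when the search finishes.
collected = []
thread = run_search(fake_search, "MKTAYIAK", on_done=collected.extend)
thread.join()
print(len(collected))  # 1
```

Returning the thread in the non-blocking case lets a script that launches several searches at once still wait for all of them with join().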