Opened 2 years ago
Last modified 17 months ago
#10017 assigned defect
Better separation between BLAST jobs and the GUI
Reported by: | Zach Pearson | Owned by: | Zach Pearson |
---|---|---|---|
Priority: | moderate | Milestone: | |
Component: | Infrastructure | Version: | |
Keywords: | | Cc: | Tom Goddard |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
From Tom:
I was trying to BLAST 31 protein sequences today using some ChimeraX Python and analyze which PDB files contained more than one of these proteins - a pretty simple task. But I had to spend an hour hacking down blastprotein code to get something usable -- the current code seems to only be intended for use in the GUI context. It would be nice if sequence searches were easy to do from Python without the GUI. Below is code I slapped together by starting from blastprotein/job.py and deleting. Ideally there would be a Python API that just does a search and gives Python results in ChimeraX.
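Once the hits are plain Python data, the analysis step Tom describes (finding PDB entries that contain more than one of the query proteins) takes only a few lines. The sketch below is illustrative and is not the code Tom refers to. It assumes each query's hits are a list of `(name, evalue, score, description)` tuples like those returned by the `blast()` method in comment:2, and that PDB hit names have the form `1abc_A` (PDB ID, underscore, chain ID). `entries_with_multiple_queries` is a hypothetical helper, not part of ChimeraX.

```python
from collections import defaultdict


def entries_with_multiple_queries(hits_by_query, min_queries=2):
    """Return {pdb_id: sorted query names} for entries hit by >= min_queries queries.

    hits_by_query maps a query name to a list of
    (name, evalue, score, description) tuples.
    ASSUMPTION: hit names look like '<pdb_id>_<chain>', e.g. '1abc_A'.
    """
    queries_per_entry = defaultdict(set)
    for query, hits in hits_by_query.items():
        for name, _evalue, _score, _description in hits:
            # Strip the chain ID and normalize case so '1ABC_C' and '1abc_A' match
            pdb_id = name.split('_', 1)[0].lower()
            queries_per_entry[pdb_id].add(query)
    return {
        pdb_id: sorted(queries)
        for pdb_id, queries in queries_per_entry.items()
        if len(queries) >= min_queries
    }
```

Collecting distinct query names in a set, rather than counting hits, keeps one protein from being counted twice when it matches several chains of the same entry.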
Change History (2)
comment:1 by , 17 months ago
Cc: added
Hi Tom,
Do you think recent changes to blastprotein close this ticket, or is there more you'd like to see?
comment:2 by , 17 months ago
Not sure what recent BlastProtein changes you are referring to. The request in this ticket is that I be able to make a Python call that runs a BLAST search and returns the results in a form (e.g. a non-GUI class instance) that can be used from Python.
I expected to call an existing Python API to run a job and get the results, not to write a page of my own code. So whether this ticket can be closed depends on whether there is now a Python function that runs a BLAST job and returns the results. It should be able to block until the job completes, and a nice option would be a callback to call on completion for non-blocking use.
This ticket doesn't include the code I referred to in the description, which illustrates the hoops I had to jump through to run a job from Python. Here is that code:
    from chimerax.webservices.cxservices_job import CxServicesJob

    class BlastProtein(CxServicesJob):
        inet_error = "Could not start BLAST job. Please check your internet connection and try again."
        service_name = "blast"

        def __init__(self, session, sequence, database='pdb', cutoff=1e-3,
                     matrix='BLOSUM62', max_seqs=10000, version=None):
            super().__init__(session)
            # '?' marks an unknown residue; send it to BLAST as 'X'
            self.sequence = sequence.replace('?', 'X')  # string
            if self.sequence.count('X') == len(self.sequence):
                from chimerax.core.errors import UserError
                raise UserError("Sequence consists entirely of unknown amino acids.")
            self.database = database  # string
            self.cutoff = cutoff      # float, E-value cutoff
            self.matrix = matrix      # string
            self.max_seqs = max_seqs  # int
            if version is None:
                from chimerax.blastprotein.data_model import CurrentDBVersions
                version = CurrentDBVersions[self.database]
            self.version = version    # DB version
            # Parameters sent to the web service
            self.params = {
                "db": self.database,
                "evalue": str(self.cutoff),
                "matrix": self.matrix,
                "blimit": str(self.max_seqs),
                "input_seq": self.sequence,
                "version": self.version,
            }

        def blast(self):
            """Run the search, blocking until it finishes, and return a list of
            (name, evalue, score, description) tuples, one per hit."""
            from urllib3.exceptions import MaxRetryError
            from chimerax.core.errors import UserError
            try:
                super().start(self.service_name, self.params, blocking=True)
            except MaxRetryError:
                self.session.logger.warning(self.inet_error)
                return []  # the job never started, so there is nothing to check
            if not self.exited_normally():
                raise UserError(f"BLAST job {self.id} failed")
            results = self.get_results()
            from chimerax.blastprotein.data_model import get_database
            blast_results = get_database(self.database)
            blast_results.parse("query", self.sequence, results)
            hits = []
            for m in blast_results.parser.matches:
                name = m.match if m.match else m.name
                if name != 'query':  # skip the query sequence itself
                    hits.append((name, m.evalue, m.score, m.description))
            return hits
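Comment:2 asks for a call that can block until the search completes, with an optional completion callback for non-blocking use, and that returns non-GUI objects. The sketch below shows one way such an API could be shaped. It is hypothetical: `run_search`, `SearchHit`, and the `search` argument are illustrative names, not an existing ChimeraX API, and `search` stands in for any zero-argument function that performs the blocking job (for example, the `blast()` method from comment:2 wrapped in a lambda). A plain background thread keeps the sketch self-contained; inside ChimeraX, a callback that touches the session or GUI may need to be scheduled on the main thread, which this sketch does not handle.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SearchHit:
    """Plain, GUI-independent record for one hit (hypothetical type)."""
    name: str
    evalue: float
    score: float
    description: str


def run_search(
    search: Callable[[], list],
    callback: Optional[Callable[[List[SearchHit]], None]] = None,
) -> Optional[List[SearchHit]]:
    """Run `search`, which returns (name, evalue, score, description) tuples.

    Without a callback: block and return the hits as SearchHit objects.
    With a callback: run the search on a daemon thread, call callback(hits)
    when it finishes, and return None immediately. Exceptions raised on the
    background thread are not propagated to the caller in this sketch.
    """
    def collect() -> List[SearchHit]:
        return [SearchHit(*hit) for hit in search()]

    if callback is None:
        return collect()

    threading.Thread(target=lambda: callback(collect()), daemon=True).start()
    return None
```

From the ChimeraX Python shell, usage might look like `hits = run_search(lambda: BlastProtein(session, seq).blast())` for the blocking form, or `run_search(lambda: BlastProtein(session, seq).blast(), callback=show_hits)` for the non-blocking form, where `show_hits` is a user-supplied function.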