Opened 4 years ago

Closed 4 years ago

#6348 closed defect (fixed)

Parsing BlastProtein results failed: 'hits'

Reported by: goddard@…
Owned by: Zach Pearson
Priority: normal
Milestone: 1.4
Component: Sequence
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        macOS-12.2.1-arm64-arm-64bit
ChimeraX Version: 1.4.dev202203071951 (2022-03-07 19:51:09 UTC)
Description
Doing an AlphaFold search from the GUI with 7PUA chain Ug, I get this error parsing the result: 'hits'.

Log:
UCSF ChimeraX version: 1.4.dev202203071951 (2022-03-07)  
© 2016-2022 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open 7pua format mmcif fromDatabase pdb

Summary of feedback from opening 7pua fetched from pdb  
---  
warnings | Unknown polymer entity '88' near line 240142  
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.  
  
7pua title:  
Middle assembly intermediate of the Trypanosoma brucei mitoribosomal small
subunit [more info...]  
  
Chain information for 7pua #1  
---  
Chain | Description | UniProt  
CA | 9S rRNA |  
CB | 9S rRNA |  
CC | uS3m |  
CE | Ribosomal_S5_C domain-containing protein | D0A335_TRYB9  
CF | bS6m | Q38BW5_TRYB2  
CH | 30S ribosomal protein S8, putative | C9ZZU9_TRYB9  
CI | uS9m | Q57W62_TRYB2  
CJ | LysM domain-containing protein | C9ZPU0_TRYB9  
CK | uS11m | Q389T7_TRYB2  
CL | uS12m |  
CN | uS14m | Q580I0_TRYB2  
CO | uS15m | Q4GZ99_TRYB2  
CP | bS16m | Q384N9_TRYB2  
CQ | 30S Ribosomal protein S17, putative | C9ZYU9_TRYB9  
CR | bS18m | D0A3A4_TRYB9  
CS | uS19m | A0A3L6L621_9TRYP  
CU | bS21m | Q580M9_TRYB2  
Ca | mS22 | Q38DR3_TRYB2  
Cb | mS23 | C9ZNU0_TRYB9  
Cd | mS26 | Q38DK6_TRYB2  
Cg | mS29 | Q585C2_TRYB2  
Ci | mS33 | Q57WW0_TRYB2  
Cj | mS34 | Q57UK0_TRYB2  
Ck | mS35 | Q387C7_TRYB2  
Cm | mS37 | Q38C96_TRYB2  
Cn | mS38 | Q57VQ9_TRYB2  
Cp | Protein FYV4, mitochondrial | Q389L3_TRYB2  
DB | mS49 | C9ZJE4_TRYB9  
DC | mS50 | A0A3L6L3Q2_9TRYP  
DD | mS51 | D0A752_TRYB9  
DE | mS52 | Q386Q7_TRYB2  
DF | mS53 | C9ZXX4_TRYB9  
DG | mS54 | C9ZNY4_TRYB9  
DH | mS55 | A0A3L6LGC8_9TRYP  
DI | mS56 | A0A3L6L6C6_9TRYP  
DJ | mS57 | Q584U8_TRYB2  
DK | mS58 | A0A3L6L3U6_9TRYP  
DL | mS59 | D0A232_TRYB9  
DO | mS62 | Q383D1_TRYB2  
DP | mS63 | C9ZXR0_TRYB9  
DR | mS65 | C9ZPP1_TRYB9  
DT | Rhodanese domain-containing protein | Q586G5_TRYB2  
DU | Ubiquitin-like domain-containing protein | C9ZKC1_TRYB9  
DV | mS69 | Q57UZ6_TRYB2  
DW | mS70 | D0A8P6_TRYB9  
DX | mS71 | Q383G5_TRYB2  
DY | mS72 | Q57YD4_TRYB2  
DZ | mS73 | Q587C4_TRYB2  
Da | mS74 | A0A3L6KY67_9TRYP  
F2 | PPR_long domain-containing protein | C9ZIL7_TRYB9  
F3 | mt-SAF3 | Q38E61_TRYB2  
F5 | mt-SAF5 | D0AAF7_TRYB9  
F6 | DUF4460 domain-containing protein | Q38FQ8_TRYB2  
F7 | mt-SAF7 | Q57UW6_TRYB2  
F9 | mt-SAF9 | C9ZSL5_TRYB9  
FM FN | mt-SAF21 | C9ZJW2_TRYB9  
FO | mt-SAF22 | Q389F9_TRYB2  
FP | mt-SAF23 | D0A963_TRYB9  
FW | LMWPc domain-containing protein | Q38FI3_TRYB2  
FX | mt-SAF27 | Q57YK5_TRYB2  
FZ | mt-SAF29 | A0A3L6KWY8_9TRYP  
Fb | mt-SAF31 | Q581U4_TRYB2  
Fc | Acyl carrier protein | Q57WW9_TRYB2  
Fd | DUF4379 domain-containing protein | Q581T4_TRYB2  
Ff | DNA photolyase, putative | D0A9A9_TRYB9  
Fg | Acyl transferase-like protein, putative | C9ZYZ4_TRYB9  
Fh | mt-SAF37 | Q38A63_TRYB2  
Fi | mt-SAF38 | C9ZNX5_TRYB9  
IA | Translation initiation factor IF-2, putative | C9ZSN7_TRYB9  
IB | mt-SAF39 | D0A0V4_TRYB9  
U8 | Unk8 |  
UC | UnkC |  
UD | UnkD |  
UF | UnkF |  
UG | UnkG |  
UI | UnkI |  
UK | UnkK |  
UM UN UQ | Unk |  
UP | UnkP |  
Ua | Unka |  
Ug | Unkg |  
  
Non-standard residues in 7pua #1  
---  
ATP — adenosine-5'-triphosphate  
FDA — dihydroflavine-adenine dinucleotide  
MG — magnesium ion  
PM8 —
S-(2-{[N-(2-hydroxy-4-{[hydroxy(oxido)phosphino]oxy}-3,3-dimethylbutanoyl)-β-alanyl]amino}ethyl)
decanethioate  
PO4 — phosphate ion  
ZN — zinc ion  
  

> ui tool show AlphaFold

> alphafold match #1/Ug

No AlphaFold model with similar sequence for chain Ug  

Opened 0 AlphaFold model  

> alphafold search #1/Ug

Webservices job id:
72VCVYYLUXPSD1GMUU6A105YR17ES8ZQ8H8GQG37T10SPD347CXAGGCTQ0II3C21  
BlastProtein finished.  
Parsing BLAST results.  
Parsing BlastProtein results failed: 'hits'  




OpenGL version: 4.1 Metal - 76.3
OpenGL renderer: Apple M1 Max
OpenGL vendor: Apple

Locale: UTF-8
Qt version: PyQt6 6.2.2, Qt 6.2.2
Qt platform: cocoa
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,2
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB
      System Firmware Version: 7429.81.3
      OS Loader Version: 7429.81.3

Software:

    System Software Overview:

      System Version: macOS 12.2.1 (21D62)
      Kernel Version: Darwin 21.3.0
      Time since boot: 2 days 6:19

Graphics/Displays:

    Apple M1 Max:

      Chipset Model: Apple M1 Max
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 32
      Vendor: Apple (0x106b)
      Metal Family: Supported, Metal GPUFamily Apple 7
      Displays:
        Color LCD:
          Display Type: Built-in Liquid Retina XDR Display
          Resolution: 3456 x 2234 Retina
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: No
          Connection Type: Internal


Installed Packages:
    alabaster: 0.7.12
    appdirs: 1.4.4
    appnope: 0.1.2
    Babel: 2.9.1
    backcall: 0.2.0
    blockdiag: 3.0.0
    certifi: 2021.5.30
    charset-normalizer: 2.0.12
    ChimeraX-AddCharge: 1.2.3
    ChimeraX-AddH: 2.1.11
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2.1
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.2.3
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.1
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.36.2
    ChimeraX-AtomicLibrary: 6.1.1
    ChimeraX-AtomSearch: 2.0.1
    ChimeraX-AxesPlanes: 2.1
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.0
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.6.1
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.2
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.2.2
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5.1
    ChimeraX-CommandLine: 1.2.1
    ChimeraX-ConnectStructure: 2.0.1
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.4.dev202203071951
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.3.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2.2
    ChimeraX-Dicom: 1.0
    ChimeraX-DistMonitor: 1.1.5
    ChimeraX-Dssp: 2.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.1
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1.2
    ChimeraX-Help: 1.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.5
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 2.0.6
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.7
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.5.2
    ChimeraX-ModelPanel: 1.3.2
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.8
    ChimeraX-PDB: 2.6.6
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0.1
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.4.7
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.8
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-StructMeasure: 1.0.1
    ChimeraX-Struts: 1.0.1
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1
    ChimeraX-ToolshedUtils: 1.2.1
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.16.2
    ChimeraX-uniprot: 2.2
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.1.2
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    cxservices: 1.1
    cycler: 0.11.0
    Cython: 0.29.26
    debugpy: 1.5.1
    decorator: 5.1.1
    docutils: 0.17.1
    entrypoints: 0.4
    filelock: 3.4.2
    fonttools: 4.29.1
    funcparserlib: 1.0.0a0
    grako: 3.16.5
    html2text: 2020.1.16
    idna: 3.3
    ihm: 0.26
    imagesize: 1.3.0
    ipykernel: 6.6.1
    ipython: 7.31.1
    ipython-genutils: 0.2.0
    jedi: 0.18.1
    Jinja2: 3.0.3
    jupyter-client: 7.1.0
    jupyter-core: 4.9.2
    kiwisolver: 1.3.2
    line-profiler: 3.4.0
    lxml: 4.7.1
    lz4: 3.1.10
    MarkupSafe: 2.1.0
    matplotlib: 3.5.1
    matplotlib-inline: 0.1.3
    msgpack: 1.0.3
    nest-asyncio: 1.5.4
    networkx: 2.6.3
    numpy: 1.22.1
    openvr: 1.16.802
    packaging: 21.0
    ParmEd: 3.4.3
    parso: 0.8.3
    pexpect: 4.8.0
    pickleshare: 0.7.5
    Pillow: 9.0.0
    pip: 21.3.1
    pkginfo: 1.8.2
    prompt-toolkit: 3.0.28
    psutil: 5.9.0
    ptyprocess: 0.7.0
    pycollada: 0.7.2
    pydicom: 2.2.2
    Pygments: 2.11.2
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 3.0.7
    PyQt6: 6.2.2
    PyQt6-Qt6: 6.2.3
    PyQt6-sip: 13.2.0
    PyQt6-WebEngine: 6.2.1
    PyQt6-WebEngine-Qt6: 6.2.3
    python-dateutil: 2.8.2
    pytz: 2021.3
    pyzmq: 22.3.0
    qtconsole: 5.2.2
    QtPy: 2.0.1
    requests: 2.27.1
    scipy: 1.7.3
    setuptools: 59.8.0
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    Sphinx: 4.3.2
    sphinx-autodoc-typehints: 1.15.2
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 3.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-community: 1.0.0
    tifffile: 2021.11.2
    tinyarray: 1.2.4
    tornado: 6.1
    traitlets: 5.1.1
    urllib3: 1.26.8
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.37.1
    wheel-filename: 1.3.0

Change History (9)

comment:1 by Tom Goddard, 4 years ago

Component: Unassigned → Sequence
Milestone: 1.4
Owner: set to Zach Pearson
Platform: all
Project: ChimeraX
Status: new → assigned
Summary: ChimeraX bug report submission → Parsing BlastProtein results failed: 'hits'

This report is for a daily build. Also ChimeraX 1.3 gives the same error.

comment:2 by Zach Pearson, 4 years ago

I think this traceback occurs because there are no actual results.

If you look at the JSON output from blastp, we get the search query and right underneath:

"message": "Query_1 unnamed protein product: Query_1 unnamed protein product: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options No hits found \u000ANo hits found"

It's certainly a problem we don't handle this edge case; I'm on it.
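
The bare 'hits' in the log is consistent with an uncaught KeyError, since Python renders a KeyError as the repr of the missing key. A minimal sketch of that behavior, assuming the parsed JSON is a plain dict; the `results` contents here are a hypothetical stand-in, not the actual blastp payload:

```python
# Hypothetical stand-in for a parsed result that carries a message
# but has no 'hits' section.
results = {"message": "No hits found"}

try:
    hits = results["hits"]
except KeyError as err:
    # Formatting a KeyError yields the repr of the missing key, quotes included.
    failure = f"Parsing BlastProtein results failed: {err}"

print(failure)
```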

comment:3 by goddard@…, 4 years ago

The warning message should probably say something like “BLAST search for #1/Ug found no matches.”

Also, the messages logged after BLAST returns are currently

BlastProtein finished.
Parsing BLAST results.

It would be better if those went only to the status line and more useful information went to the log, such as

“BLAST search for #1/Ug found 37 matches”

comment:4 by goddard@…, 4 years ago

The theory that the error message is because there are no BLAST hits sounds reasonable.  But other sequences with no BLAST hits, such as 7qnq chain A, do not give that error when searching AlphaFold with Blast Protein, and instead give an unhelpful empty Blast Protein Results table.  It would probably be better not to show the empty table and just log the message “BLAST search of #1/A found 0 matches”, maybe as a warning so it is highlighted in red.
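
One way to picture the suggested logging behavior is a small helper that builds the log line and picks a severity. This is only a sketch: `summarize_blast_search` and the "warning"/"info" level names are hypothetical and are not part of the ChimeraX API.

```python
from typing import Tuple


def summarize_blast_search(chain_spec: str, num_hits: int) -> Tuple[str, str]:
    """Return a (level, message) pair for the log (hypothetical helper)."""
    noun = "match" if num_hits == 1 else "matches"
    message = f"BLAST search for {chain_spec} found {num_hits} {noun}"
    # Zero matches is reported as a warning so it stands out in the log.
    level = "warning" if num_hits == 0 else "info"
    return level, message


print(summarize_blast_search("#1/Ug", 37))
print(summarize_blast_search("#1/A", 0))
```

A caller could then skip opening the results table whenever the level is "warning".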

comment:5 by Zach Pearson, 4 years ago

That's odd. Maybe if it runs and gets no hits we get an empty hits dict, and if it runs and there's an error it gives us the message and nothing else. Can you send me the job ID you got for 7qnq/A so I can compare the results?

I agree about the status messages.

comment:6 by Zach Pearson, 4 years ago

OK, I got ahold of some 7qnq/A results and see that there is an empty 'hits' dict, but apparently if there's an error running BLAST it won't write that, just the search, stat, and message sections. I'll update the blastprotein code to handle that.
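
The two cases described here (an empty 'hits' section versus no 'hits' section at all) could be separated along these lines. This is a hedged sketch, not the actual blastprotein code: `extract_hits` is a hypothetical name, and the flat dict layout with 'search', 'stat', 'hits', and 'message' keys is an assumption based only on the description in this ticket.

```python
from typing import Any, Dict, Optional, Tuple


def extract_hits(results: Dict[str, Any]) -> Tuple[Any, Optional[str]]:
    """Return (hits, error_message) without raising when 'hits' is absent."""
    hits = results.get("hits")
    if hits is None:
        # BLAST reported a problem: surface its message instead of a KeyError.
        return [], results.get("message", "BLAST output has no 'hits' section")
    # BLAST ran normally; the collection may still be empty.
    return hits, None


# Assumed shapes for the two observed outcomes.
no_match = {"search": {}, "stat": {}, "hits": []}
errored = {"search": {}, "stat": {}, "message": "Could not calculate ungapped Karlin-Altschul parameters"}

print(extract_hits(no_match))
print(extract_hits(errored))
```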

comment:7 by Zach Pearson, 4 years ago

Do you think the user will want to know, say, that the error message for the job was (in this case)

Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options

or should we simply say "No hits found"?

comment:8 by goddard@…, 4 years ago

Always good to return the error information and show it to the user.  Otherwise they have no idea how to fix the problem.  What does that error mean though?  Maybe there is an illegal character (X?) in the sequence that ChimeraX needs to warn the user about.

comment:9 by Zach Pearson, 4 years ago

Resolution: fixed
Status: assigned → closed

This commit should fix the problem. We now check if the sequence against which a blastp search has been requested consists of all Xs, and warn the user we won't submit the job if that's the case.

Additionally, the error message above is returned by blastp when a highly repetitive sequence is submitted, for some definition of highly repetitive. If we see a number of reports where this happens I'll inspect the sequences to see how many Xs there were and determine a good threshold for warning the user in some way, e.g. "submitting highly repetitive sequence which may not return good data".
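
The pre-submission check and the possible follow-up threshold might look roughly like the following. This is a sketch under stated assumptions rather than the committed code: the function names are hypothetical, and `WARN_FRACTION` is a placeholder value, since no threshold has been chosen in this ticket.

```python
def is_all_unknown(sequence: str, unknown: str = "X") -> bool:
    """True when a non-empty sequence contains nothing but the unknown code."""
    residues = sequence.strip().upper()
    return bool(residues) and set(residues) == {unknown}


def unknown_fraction(sequence: str, unknown: str = "X") -> float:
    """Fraction of residues equal to the unknown code (0.0 for an empty sequence)."""
    residues = sequence.strip().upper()
    if not residues:
        return 0.0
    return residues.count(unknown) / len(residues)


# Placeholder only; a real threshold would come from inspecting reported sequences.
WARN_FRACTION = 0.5

for seq in ("XXXXXXXX", "MKXXLV"):
    if is_all_unknown(seq):
        print(f"{seq}: not submitting, sequence is all X")
    elif unknown_fraction(seq) >= WARN_FRACTION:
        print(f"{seq}: submitting, but results may be poor")
    else:
        print(f"{seq}: submitting")
```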
