Opened 4 years ago

Closed 4 years ago

#6483 closed defect (fixed)

Blast alphafold search fails: invalid literal for int() with base 10

Reported by: goddard@… Owned by: Zach Pearson
Priority: high Milestone:
Component: Sequence Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        macOS-12.3-arm64-arm-64bit
ChimeraX Version: 1.4.dev202203271748 (2022-03-27 17:48:18 UTC)
Description
AlphaFold search failing parsing results in BlastProtein.
Would like to demo this tomorrow morning.  Frustrating.

Log:
UCSF ChimeraX version: 1.4.dev202203271748 (2022-03-27)  
© 2016-2022 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> ui tool show AlphaFold

> alphafold search
> MQPPPPGPLGDCLRDWEDLQQDFQNIQETHRLYRLKLEELTKLQNNCTSSITRQKKRLQELALALKKCKPSLPAEAEGAAQELENQMKERQGLFFDMEAYLPKKNGLYLSLVLGNVNVTLLSKQAKFAYKDEYEKFKLYLTIILILISFTCRFLLNSRVTDAAFNFLLVWYYCTLTIRESILINNGSRIKGWWVFHHYVSTFLSGVMLTWPDGLMYQKFRNQFLSFSMYQSFVQFLQYYYQSGCLYRLRALGERHTMDLTVEGFQSWMWRGLTFLLPFLFFGHFWQLFNALTLFNLAQDPQCKEWQVLMCGFPFLLLFLGNFFTTLRVVHHKFHSQRHGSKKD

Webservices job id:
FGEISJUY6534B266VO5Q0XRRPXIG4XO9FWY09P9M0KYAVNTOMHI7H9L9SOAWE2JU  

Parsing BlastProtein results failed: invalid literal for int() with base 10:
''  




OpenGL version: 4.1 Metal - 76.3
OpenGL renderer: Apple M1 Max
OpenGL vendor: Apple

Locale: UTF-8
Qt version: PyQt6 6.2.3, Qt 6.2.3
Qt platform: cocoa
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,2
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB
      System Firmware Version: 7459.101.2
      OS Loader Version: 7459.101.2

Software:

    System Software Overview:

      System Version: macOS 12.3 (21E230)
      Kernel Version: Darwin 21.4.0
      Time since boot: 11 days 15:28

Graphics/Displays:

    Apple M1 Max:

      Chipset Model: Apple M1 Max
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 32
      Vendor: Apple (0x106b)
      Metal Family: Supported, Metal GPUFamily Apple 7
      Displays:
        Color LCD:
          Display Type: Built-in Liquid Retina XDR Display
          Resolution: 3456 x 2234 Retina
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: No
          Connection Type: Internal


Installed Packages:
    alabaster: 0.7.12
    appdirs: 1.4.4
    appnope: 0.1.2
    Babel: 2.9.1
    backcall: 0.2.0
    blockdiag: 3.0.0
    certifi: 2021.10.8
    charset-normalizer: 2.0.12
    ChimeraX-AddCharge: 1.2.3
    ChimeraX-AddH: 2.1.11
    ChimeraX-AlignmentAlgorithms: 2.0
    ChimeraX-AlignmentHdrs: 3.2.1
    ChimeraX-AlignmentMatrices: 2.0
    ChimeraX-Alignments: 2.3
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.0.1
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.0
    ChimeraX-Atomic: 1.36.3
    ChimeraX-AtomicLibrary: 6.1.1
    ChimeraX-AtomSearch: 2.0.1
    ChimeraX-AxesPlanes: 2.1
    ChimeraX-BasicActions: 1.1
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.0
    ChimeraX-BondRot: 2.0
    ChimeraX-BugReporter: 1.0
    ChimeraX-BuildStructure: 2.6.1
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.1
    ChimeraX-ButtonPanel: 1.0
    ChimeraX-CageBuilder: 1.0
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.2
    ChimeraX-ChemGroup: 2.0
    ChimeraX-Clashes: 2.2.2
    ChimeraX-ColorActions: 1.0
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5.1
    ChimeraX-CommandLine: 1.2.2
    ChimeraX-ConnectStructure: 2.0.1
    ChimeraX-Contacts: 1.0
    ChimeraX-Core: 1.4.dev202203271748
    ChimeraX-CoreFormats: 1.1
    ChimeraX-coulombic: 1.3.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0
    ChimeraX-DataFormats: 1.2.2
    ChimeraX-Dicom: 1.0
    ChimeraX-DistMonitor: 1.1.5
    ChimeraX-Dssp: 2.0
    ChimeraX-ExperimentalCommands: 1.0
    ChimeraX-FileHistory: 1.0
    ChimeraX-FunctionKey: 1.0
    ChimeraX-Geometry: 1.1
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1
    ChimeraX-Hbonds: 2.1.2
    ChimeraX-Help: 1.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ItemsInspection: 1.0
    ChimeraX-Label: 1.1
    ChimeraX-ListInfo: 1.1.1
    ChimeraX-Log: 1.1.5
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.8.1
    ChimeraX-Map: 1.1
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0
    ChimeraX-MapFilter: 2.0
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1
    ChimeraX-Markers: 1.0
    ChimeraX-Mask: 1.0
    ChimeraX-MatchMaker: 2.0.6
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.1
    ChimeraX-Meeting: 1.0
    ChimeraX-MLP: 1.1
    ChimeraX-mmCIF: 2.7
    ChimeraX-MMTF: 2.1
    ChimeraX-Modeller: 1.5.5
    ChimeraX-ModelPanel: 1.3.2
    ChimeraX-ModelSeries: 1.0
    ChimeraX-Mol2: 2.0
    ChimeraX-Morph: 1.0
    ChimeraX-MouseModes: 1.1
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nucleotides: 2.0.2
    ChimeraX-OpenCommand: 1.8
    ChimeraX-PDB: 2.6.6
    ChimeraX-PDBBio: 1.0
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1
    ChimeraX-RemoteControl: 1.0
    ChimeraX-ResidueFit: 1.0
    ChimeraX-RestServer: 1.1
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 2.0.1
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.5
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0
    ChimeraX-Shortcuts: 1.1
    ChimeraX-ShowAttr: 1.0
    ChimeraX-ShowSequences: 1.0
    ChimeraX-SideView: 1.0
    ChimeraX-Smiles: 2.1
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.8
    ChimeraX-STL: 1.0
    ChimeraX-Storm: 1.0
    ChimeraX-StructMeasure: 1.0.1
    ChimeraX-Struts: 1.0.1
    ChimeraX-Surface: 1.0
    ChimeraX-SwapAA: 2.0
    ChimeraX-SwapRes: 2.1.1
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1
    ChimeraX-ToolshedUtils: 1.2.1
    ChimeraX-Tug: 1.0
    ChimeraX-UI: 1.16.3
    ChimeraX-uniprot: 2.2
    ChimeraX-UnitCell: 1.0
    ChimeraX-ViewDockX: 1.1.2
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0
    ChimeraX-WebServices: 1.0
    ChimeraX-Zone: 1.0
    colorama: 0.4.4
    cxservices: 1.1
    cycler: 0.11.0
    Cython: 0.29.26
    debugpy: 1.5.1
    decorator: 5.1.1
    docutils: 0.17.1
    entrypoints: 0.4
    filelock: 3.4.2
    fonttools: 4.31.2
    funcparserlib: 1.0.0a0
    grako: 3.16.5
    html2text: 2020.1.16
    idna: 3.3
    ihm: 0.27
    imagesize: 1.3.0
    ipykernel: 6.6.1
    ipython: 7.31.1
    ipython-genutils: 0.2.0
    jedi: 0.18.1
    Jinja2: 3.0.3
    jupyter-client: 7.1.0
    jupyter-core: 4.9.2
    kiwisolver: 1.4.0
    line-profiler: 3.4.0
    lxml: 4.7.1
    lz4: 3.1.10
    MarkupSafe: 2.1.1
    matplotlib: 3.5.1
    matplotlib-inline: 0.1.3
    msgpack: 1.0.3
    nest-asyncio: 1.5.4
    networkx: 2.6.3
    numpy: 1.22.1
    openvr: 1.16.802
    packaging: 21.0
    ParmEd: 3.4.3
    parso: 0.8.3
    pexpect: 4.8.0
    pickleshare: 0.7.5
    Pillow: 9.0.1
    pip: 21.3.1
    pkginfo: 1.8.2
    prompt-toolkit: 3.0.28
    psutil: 5.9.0
    ptyprocess: 0.7.0
    pycollada: 0.7.2
    pydicom: 2.2.2
    Pygments: 2.11.2
    PyOpenGL: 3.1.5
    PyOpenGL-accelerate: 3.1.5
    pyparsing: 3.0.7
    PyQt6: 6.2.3
    PyQt6-Qt6: 6.2.4
    PyQt6-sip: 13.2.0
    PyQt6-WebEngine: 6.2.1
    PyQt6-WebEngine-Qt6: 6.2.4
    python-dateutil: 2.8.2
    pytz: 2022.1
    pyzmq: 22.3.0
    qtconsole: 5.2.2
    QtPy: 2.0.1
    requests: 2.27.1
    scipy: 1.7.3
    setuptools: 59.8.0
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    Sphinx: 4.3.2
    sphinx-autodoc-typehints: 1.15.2
    sphinxcontrib-applehelp: 1.0.2
    sphinxcontrib-blockdiag: 3.0.0
    sphinxcontrib-devhelp: 1.0.2
    sphinxcontrib-htmlhelp: 2.0.0
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.3
    sphinxcontrib-serializinghtml: 1.1.5
    suds-community: 1.0.0
    tifffile: 2021.11.2
    tinyarray: 1.2.4
    tornado: 6.1
    traitlets: 5.1.1
    urllib3: 1.26.9
    wcwidth: 0.2.5
    webcolors: 1.11.1
    wheel: 0.37.1
    wheel-filename: 1.3.0

Change History (9)

comment:1 by Tom Goddard, 4 years ago

Component: UnassignedSequence
Owner: set to Zach Pearson
Platform: all
Priority: normalhigh
Project: ChimeraX
Status: newassigned
Summary: ChimeraX bug report submissionBlast alphafold search fails: invalid literal for int() with base 10

comment:2 by Zach Pearson, 4 years ago

Found the line I haven't been parsing correctly. Working to fix now.

comment:3 by Zach Pearson, 4 years ago

Not sure why, but if you inspect the results from your job:

            {
              "num": 27,
              "description": [
                {
                  "id": "gnl|BL_ORD_ID|746227",
                  "accession": "746227",
                  "title": "xx|Q6ETC8|deleted deleted OS=deleted"
                }
              ],
              "len": 354,
              "hsps": [
                {
                  "num": 1,
                  "bit_score": 117.857,
                  "score": 294,
                  "evalue": 1.0104e-28,
                  "identity": 89,
                  "positive": 150,
                  "query_from": 55,
                  "query_to": 343,
                  "hit_from": 41,
                  "hit_to": 347,
                  "align_len": 309,
                  "gaps": 22,
                  "qseq": "KKRLQELALALKKCKPSLPAEAEGAAQELENQMKERQGLFF---DMEAYLP-KKNGLYLSLVLGNVNVTLLSKQAKFAYKDEYEKFK---LYLTIILILISFTCRFLLNSRVTDA----AFNFLLVWYYCTLTIRESILINNGSRIKGWWVFHHYVSTFLSGVMLTW-----PDG
                  "hseq": "RRRATALDADLRRLQASLSTLAPTTLDKVEEEL-ERARVTISDSDVAAFLPSKRNGKFLKTFVGPVNVRVARKEDKLRVKDEYNNYRDRAAYMFLLFPSILLLLRWWIWDGCLPALAVQMYQAWLLFLYTSFALRENVLIVNGSDIRPWWIYHHYLAMLMALVSLTWEIKGQPDC
                  "midline": "++R   L   L++ + SL   A     ++E ++ ER  +     D+ A+LP K+NG +L   +G VNV +  K+ K   KDEY  ++    Y+ ++   I    R+ +      A     +   L++ Y +  +RE++LI NGS I+ WW++HHY++  ++ V LTW
                }
              ]
            },

it's that "xx|Q6ETC8|deleted deleted OS=deleted" line that's causing trouble. Perhaps we can exclude that result if we come across lines like it.

in reply to:  4 ; comment:4 by goddard@…, 4 years ago

Those hits should not be omitted.  They don't have a current uniprot sequence and that is why some fields say deleted.  There are a few thousand of these in the alphafold database.

in reply to:  5 ; comment:5 by goddard@…, 4 years ago

I made those title lines in the AlphaFold blast database and tried to put them in the same format so parsing code would not get tripped up.  The parsing code should be very robust and substitute blanks fields if it does not find what it expects.

comment:6 by Zach Pearson, 4 years ago

Sure, I agree. Just wrapping the offending code in a try-except is enough to get the results to finish and display -- we just have to agree on how to interpret 'deleted' in the description and the species field. If it doesn't have a uniprot sequence, "No sequence"?

in reply to:  7 ; comment:7 by goddard@…, 4 years ago

If we don't know the species I think the best choice is to leave it blank.  Same with other fields where we don't know the value.

comment:8 by Zach Pearson, 4 years ago

OK, got it so that unknown fields are blanked and if we don't have something like 'Q93XX0_ARATH' in the title's third field (split by pipe symbols) we would take the 'Q93XX0' part that appears in the title's second field.

comment:9 by Zach Pearson, 4 years ago

Resolution: fixed
Status: assignedclosed

Should be fixed by this commit.

Note: See TracTickets for help on using tickets.