Opened 2 years ago

Last modified 18 months ago

#9999 assigned defect

BlastProtein returns more results than the 100 maximum requested

Reported by: goddard@… Owned by: Zach Pearson
Priority: normal Milestone:
Component: Sequence Version:
Keywords: Cc:
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        macOS-13.5.2-x86_64-i386-64bit
ChimeraX Version: 1.7.dev202310140743 (2023-10-14 07:43:20 UTC)
Description
Blast protein returns more hits than the maximum specified 100, in the case of PDB 6c0w chain B it returns 1452 hits.

Log:
Could not find tool "Tabbed Toolbar"  
UCSF ChimeraX version: 1.7.dev202310140743 (2023-10-14)  
© 2016-2023 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open 6c0w

6c0w title:  
Cryo-EM structure of human kinetochore protein CENP-N with the centromeric
nucleosome containing CENP-A [more info...]  
  
Chain information for 6c0w #1  
---  
Chain | Description | UniProt  
A E | Histone H3-like centromeric protein A | CENPA_HUMAN 1-140  
B F | Histone H4 | H4_HUMAN 0-101  
C G | Histone H2A | H2A1C_HUMAN 0-129  
D H | Histone H2B | H2B1C_HUMAN 0-125  
I | 147 mer DNA |  
J | 147 mer DNA |  
K | Centromere protein N | CENPN_HUMAN 1-289  
  

> ui tool show "Blast Protein"

> blastprotein /B database pdb cutoff 1e-3 matrix BLOSUM62 maxSeqs 100 version
> None name bp1

Webservices job id: 750O6W4OEP3CV1ZQ  




OpenGL version: 4.1 ATI-4.14.1
OpenGL renderer: AMD Radeon Pro 580 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.

Python: 3.11.2
Locale: UTF-8
Qt version: PyQt6 6.3.1, Qt 6.3.1
Qt runtime version: 6.3.2
Qt platform: cocoa
Hardware:

    Hardware Overview:

      Model Name: iMac
      Model Identifier: iMac18,3
      Processor Name: Quad-Core Intel Core i7
      Processor Speed: 4.2 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 256 KB
      L3 Cache: 8 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 515.0.0.0.0
      OS Loader Version: 577.140.2~15
      SMC Version (system): 2.41f2

Software:

    System Software Overview:

      System Version: macOS 13.5.2 (22G91)
      Kernel Version: Darwin 22.6.0
      Time since boot: 38 days, 17 hours, 31 minutes

Graphics/Displays:

    Radeon Pro 580:

      Chipset Model: Radeon Pro 580
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x16
      VRAM (Total): 8 GB
      Vendor: AMD (0x1002)
      Device ID: 0x67df
      Revision ID: 0x00c0
      ROM Revision: 113-D000AA-931
      VBIOS Version: 113-D0001A1X-025
      EFI Driver Version: 01.00.931
      Metal Support: Metal 2
      Displays:
        iMac:
          Display Type: Built-In Retina LCD
          Resolution: Retina 5K (5120 x 2880)
          Framebuffer Depth: 30-Bit Color (ARGB2101010)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal


Installed Packages:
    alabaster: 0.7.13
    appdirs: 1.4.4
    appnope: 0.1.3
    asttokens: 2.4.0
    Babel: 2.13.0
    backcall: 0.2.0
    beautifulsoup4: 4.11.2
    blockdiag: 3.0.0
    blosc2: 2.0.0
    build: 0.10.0
    certifi: 2022.12.7
    cftime: 1.6.2
    charset-normalizer: 3.3.0
    ChimeraX-AddCharge: 1.5.12
    ChimeraX-AddH: 2.2.5
    ChimeraX-AlignmentAlgorithms: 2.0.1
    ChimeraX-AlignmentHdrs: 3.4.1
    ChimeraX-AlignmentMatrices: 2.1
    ChimeraX-Alignments: 2.12.1
    ChimeraX-AlphaFold: 1.0
    ChimeraX-AltlocExplorer: 1.1.1
    ChimeraX-AmberInfo: 1.0
    ChimeraX-Arrays: 1.1
    ChimeraX-Atomic: 1.49
    ChimeraX-AtomicLibrary: 11.0
    ChimeraX-AtomSearch: 2.0.1
    ChimeraX-AxesPlanes: 2.3.2
    ChimeraX-BasicActions: 1.1.2
    ChimeraX-BILD: 1.0
    ChimeraX-BlastProtein: 2.1.2
    ChimeraX-BondRot: 2.0.4
    ChimeraX-BugReporter: 1.0.1
    ChimeraX-BuildStructure: 2.10.5
    ChimeraX-Bumps: 1.0
    ChimeraX-BundleBuilder: 1.2.2
    ChimeraX-ButtonPanel: 1.0.1
    ChimeraX-CageBuilder: 1.0.1
    ChimeraX-CellPack: 1.0
    ChimeraX-Centroids: 1.3.2
    ChimeraX-ChangeChains: 1.1
    ChimeraX-CheckWaters: 1.3.1
    ChimeraX-ChemGroup: 2.0.1
    ChimeraX-Clashes: 2.2.4
    ChimeraX-ColorActions: 1.0.3
    ChimeraX-ColorGlobe: 1.0
    ChimeraX-ColorKey: 1.5.4
    ChimeraX-CommandLine: 1.2.5
    ChimeraX-ConnectStructure: 2.0.1
    ChimeraX-Contacts: 1.0.1
    ChimeraX-Core: 1.7.dev202310140743
    ChimeraX-CoreFormats: 1.2
    ChimeraX-coulombic: 1.4.2
    ChimeraX-Crosslinks: 1.0
    ChimeraX-Crystal: 1.0
    ChimeraX-CrystalContacts: 1.0.1
    ChimeraX-DataFormats: 1.2.3
    ChimeraX-Dicom: 1.2
    ChimeraX-DistMonitor: 1.4
    ChimeraX-DockPrep: 1.1.2
    ChimeraX-Dssp: 2.0
    ChimeraX-EMDB-SFF: 1.0
    ChimeraX-ESMFold: 1.0
    ChimeraX-FileHistory: 1.0.1
    ChimeraX-FunctionKey: 1.0.1
    ChimeraX-Geometry: 1.3
    ChimeraX-gltf: 1.0
    ChimeraX-Graphics: 1.1.1
    ChimeraX-Hbonds: 2.4
    ChimeraX-Help: 1.2.2
    ChimeraX-HKCage: 1.3
    ChimeraX-IHM: 1.1
    ChimeraX-ImageFormats: 1.2
    ChimeraX-IMOD: 1.0
    ChimeraX-IO: 1.0.1
    ChimeraX-ItemsInspection: 1.0.1
    ChimeraX-IUPAC: 1.0
    ChimeraX-Label: 1.1.8
    ChimeraX-ListInfo: 1.2.1
    ChimeraX-Log: 1.1.6
    ChimeraX-LookingGlass: 1.1
    ChimeraX-Maestro: 1.9.1
    ChimeraX-Map: 1.1.4
    ChimeraX-MapData: 2.0
    ChimeraX-MapEraser: 1.0.1
    ChimeraX-MapFilter: 2.0.1
    ChimeraX-MapFit: 2.0
    ChimeraX-MapSeries: 2.1.1
    ChimeraX-Markers: 1.0.1
    ChimeraX-Mask: 1.0.2
    ChimeraX-MatchMaker: 2.1.2
    ChimeraX-MCopy: 1.0
    ChimeraX-MDcrds: 2.6
    ChimeraX-MedicalToolbar: 1.0.2
    ChimeraX-Meeting: 1.0.1
    ChimeraX-MLP: 1.1.1
    ChimeraX-mmCIF: 2.12.1
    ChimeraX-MMTF: 2.2
    ChimeraX-Modeller: 1.5.12
    ChimeraX-ModelPanel: 1.4
    ChimeraX-ModelSeries: 1.0.1
    ChimeraX-Mol2: 2.0.3
    ChimeraX-Mole: 1.0
    ChimeraX-Morph: 1.0.2
    ChimeraX-MouseModes: 1.2
    ChimeraX-Movie: 1.0
    ChimeraX-Neuron: 1.0
    ChimeraX-Nifti: 1.1
    ChimeraX-NIHPresets: 1.1.9
    ChimeraX-NRRD: 1.1
    ChimeraX-Nucleotides: 2.0.3
    ChimeraX-OpenCommand: 1.13
    ChimeraX-PDB: 2.7.2
    ChimeraX-PDBBio: 1.0.1
    ChimeraX-PDBLibrary: 1.0.2
    ChimeraX-PDBMatrices: 1.0
    ChimeraX-PickBlobs: 1.0.1
    ChimeraX-Positions: 1.0
    ChimeraX-PresetMgr: 1.1
    ChimeraX-PubChem: 2.1
    ChimeraX-ReadPbonds: 1.0.1
    ChimeraX-Registration: 1.1.2
    ChimeraX-RemoteControl: 1.0
    ChimeraX-RenderByAttr: 1.1
    ChimeraX-RenumberResidues: 1.1
    ChimeraX-ResidueFit: 1.0.1
    ChimeraX-RestServer: 1.2
    ChimeraX-RNALayout: 1.0
    ChimeraX-RotamerLibMgr: 4.0
    ChimeraX-RotamerLibsDunbrack: 2.0
    ChimeraX-RotamerLibsDynameomics: 2.0
    ChimeraX-RotamerLibsRichardson: 2.0
    ChimeraX-SaveCommand: 1.5.1
    ChimeraX-SchemeMgr: 1.0
    ChimeraX-SDF: 2.0.1
    ChimeraX-Segger: 1.0
    ChimeraX-Segment: 1.0.1
    ChimeraX-SelInspector: 1.0
    ChimeraX-SeqView: 2.11
    ChimeraX-Shape: 1.0.1
    ChimeraX-Shell: 1.0.1
    ChimeraX-Shortcuts: 1.1.1
    ChimeraX-ShowSequences: 1.0.2
    ChimeraX-SideView: 1.0.1
    ChimeraX-Smiles: 2.1.2
    ChimeraX-SmoothLines: 1.0
    ChimeraX-SpaceNavigator: 1.0
    ChimeraX-StdCommands: 1.12.2
    ChimeraX-STL: 1.0.1
    ChimeraX-Storm: 1.0
    ChimeraX-StructMeasure: 1.1.2
    ChimeraX-Struts: 1.0.1
    ChimeraX-Surface: 1.0.1
    ChimeraX-SwapAA: 2.0.1
    ChimeraX-SwapRes: 2.2.2
    ChimeraX-TapeMeasure: 1.0
    ChimeraX-TaskManager: 1.0
    ChimeraX-Test: 1.0
    ChimeraX-Toolbar: 1.1.2
    ChimeraX-ToolshedUtils: 1.2.4
    ChimeraX-Topography: 1.0
    ChimeraX-ToQuest: 1.0
    ChimeraX-Tug: 1.0.1
    ChimeraX-UI: 1.33.1
    ChimeraX-uniprot: 2.3
    ChimeraX-UnitCell: 1.0.1
    ChimeraX-ViewDockX: 1.3.2
    ChimeraX-VIPERdb: 1.0
    ChimeraX-Vive: 1.1
    ChimeraX-VolumeMenu: 1.0.1
    ChimeraX-VTK: 1.0
    ChimeraX-WavefrontOBJ: 1.0
    ChimeraX-WebCam: 1.0.2
    ChimeraX-WebServices: 1.1.2
    ChimeraX-Zone: 1.0.1
    colorama: 0.4.6
    comm: 0.1.4
    contourpy: 1.1.1
    cxservices: 1.2.2
    cycler: 0.12.1
    Cython: 0.29.33
    debugpy: 1.8.0
    decorator: 5.1.1
    docutils: 0.19
    executing: 2.0.0
    filelock: 3.9.0
    fonttools: 4.43.1
    funcparserlib: 1.0.1
    glfw: 2.6.2
    grako: 3.16.5
    h5py: 3.10.0
    html2text: 2020.1.16
    idna: 3.4
    ihm: 0.38
    imagecodecs: 2023.9.18
    imagesize: 1.4.1
    ipykernel: 6.23.2
    ipython: 8.14.0
    ipython-genutils: 0.2.0
    ipywidgets: 8.1.1
    jedi: 0.18.2
    Jinja2: 3.1.2
    jupyter-client: 8.2.0
    jupyter-core: 5.4.0
    jupyterlab-widgets: 3.0.9
    kiwisolver: 1.4.5
    line-profiler: 4.0.2
    lxml: 4.9.2
    lz4: 4.3.2
    MarkupSafe: 2.1.3
    matplotlib: 3.7.2
    matplotlib-inline: 0.1.6
    msgpack: 1.0.4
    nest-asyncio: 1.5.8
    netCDF4: 1.6.2
    networkx: 3.1
    nibabel: 5.0.1
    nptyping: 2.5.0
    numexpr: 2.8.7
    numpy: 1.25.1
    openvr: 1.23.701
    packaging: 21.3
    ParmEd: 3.4.3
    parso: 0.8.3
    pep517: 0.13.0
    pexpect: 4.8.0
    pickleshare: 0.7.5
    Pillow: 10.0.1
    pip: 23.0
    pkginfo: 1.9.6
    platformdirs: 3.11.0
    prompt-toolkit: 3.0.39
    psutil: 5.9.5
    ptyprocess: 0.7.0
    pure-eval: 0.2.2
    py-cpuinfo: 9.0.0
    pycollada: 0.7.2
    pydicom: 2.3.0
    Pygments: 2.16.1
    pynrrd: 1.0.0
    PyOpenGL: 3.1.7
    PyOpenGL-accelerate: 3.1.7
    pyopenxr: 1.0.2801
    pyparsing: 3.0.9
    pyproject-hooks: 1.0.0
    PyQt6-commercial: 6.3.1
    PyQt6-Qt6: 6.3.2
    PyQt6-sip: 13.4.0
    PyQt6-WebEngine-commercial: 6.3.1
    PyQt6-WebEngine-Qt6: 6.3.2
    python-dateutil: 2.8.2
    pytz: 2023.3.post1
    pyzmq: 25.1.1
    qtconsole: 5.4.3
    QtPy: 2.4.0
    RandomWords: 0.4.0
    requests: 2.31.0
    scipy: 1.11.1
    setuptools: 67.4.0
    setuptools-scm: 7.0.5
    sfftk-rw: 0.7.3
    six: 1.16.0
    snowballstemmer: 2.2.0
    sortedcontainers: 2.4.0
    soupsieve: 2.5
    sphinx: 6.1.3
    sphinx-autodoc-typehints: 1.22
    sphinxcontrib-applehelp: 1.0.7
    sphinxcontrib-blockdiag: 3.0.0
    sphinxcontrib-devhelp: 1.0.5
    sphinxcontrib-htmlhelp: 2.0.4
    sphinxcontrib-jsmath: 1.0.1
    sphinxcontrib-qthelp: 1.0.6
    sphinxcontrib-serializinghtml: 1.1.9
    stack-data: 0.6.3
    superqt: 0.5.0
    tables: 3.8.0
    tcia-utils: 1.5.1
    tifffile: 2023.7.18
    tinyarray: 1.2.4
    tomli: 2.0.1
    tornado: 6.3.3
    traitlets: 5.9.0
    typing-extensions: 4.8.0
    tzdata: 2023.3
    urllib3: 2.0.6
    wcwidth: 0.2.8
    webcolors: 1.12
    wheel: 0.38.4
    wheel-filename: 1.4.1
    widgetsnbextension: 4.0.9

Change History (6)

comment:1 by Tom Goddard, 2 years ago

Component: UnassignedSequence
Owner: set to Zach Pearson
Platform: all
Project: ChimeraX
Status: newassigned
Summary: ChimeraX bug report submissionBlastProtein returns more results than the 100 maximum requested

comment:2 by Zach Pearson, 23 months ago

Looks like this only affects the command. When I run a BLAST job through the GUI, I get 100 results back. Interesting, and a good lead!

comment:3 by Zach Pearson, 23 months ago

I was too optimistic and only got 100 back by luck on some other query.

comment:4 by Tom Goddard, 23 months ago

The example given this ticket, PDB 6c0w, chain B, gives 1480 results back. Maybe it is limited to 100 PDB entries, but it doesn't seem like that since that would mean on average every PDB had 14 sequence matches. It may be the identical sequences in different PDB entries count as only one hit. I suspect it is probably working correctly, only we don't understand what the maximum sequence option in blastp is doing. The goal may be to find what that option actually does and document it.

comment:5 by Zach Pearson, 18 months ago

I'm not sure I understand what max_target_seqs does. I did a BLAST of 6c0w/B limited to 100 entries, and got the same number of results you did. When I looked on the server, however, the BLAST JSON results file has 85 "hit" objects, and each hit list several different PDB IDs. For example, the first "hit" object has 7XVL_{B,F,L,P,V,Z,f,j}, 7XVM_{B,F,L,P}, 7XX5_{B,F,L,P}, 7XX6_{B,F,L,P,V,Z,f,j}, and 7XX7_{B,F,L,P} -- that's 24 matches! They have identical taxids (9606), and then the "hsps" list gives them all the same number, bit_score, score, evalue, etc.

comment:6 by Tom Goddard, 18 months ago

My guess his that max_target_seqs means the maximum number of unique sequences. Probably many PDB chains have exactly the same sequence so all those contribute just 1 sequence toward the limit. Especially with multiple chains in the same PDB entry (e.g. chain ids A,B,C...) for multimeric structures with many copies of the same protein, there will be separate match for each chain but just one sequence hit. We should try to figure out if that is the way it counts them and document it. No need to offer a different maximum that would strictly count chains I think -- best to stick what BLAST itself does.

Note: See TracTickets for help on using tickets.