Opened 2 years ago
Last modified 18 months ago
#9999 assigned defect
BlastProtein returns more results than the 100 maximum requested
Reported by: | | Owned by: | Zach Pearson |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Sequence | Version: | |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
The following bug report has been submitted:

Platform: macOS-13.5.2-x86_64-i386-64bit
ChimeraX Version: 1.7.dev202310140743 (2023-10-14 07:43:20 UTC)

Description
Blast protein returns more hits than the maximum specified 100, in the case of PDB 6c0w chain B it returns 1452 hits.

Log:
Could not find tool "Tabbed Toolbar"
UCSF ChimeraX version: 1.7.dev202310140743 (2023-10-14)
© 2016-2023 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX

> open 6c0w

6c0w title:
Cryo-EM structure of human kinetochore protein CENP-N with the centromeric nucleosome containing CENP-A [more info...]

Chain information for 6c0w #1

Chain | Description | UniProt |
---|---|---|
A E | Histone H3-like centromeric protein A | CENPA_HUMAN 1-140 |
B F | Histone H4 | H4_HUMAN 0-101 |
C G | Histone H2A | H2A1C_HUMAN 0-129 |
D H | Histone H2B | H2B1C_HUMAN 0-125 |
I | 147 mer DNA | |
J | 147 mer DNA | |
K | Centromere protein N | CENPN_HUMAN 1-289 |

> ui tool show "Blast Protein"

> blastprotein /B database pdb cutoff 1e-3 matrix BLOSUM62 maxSeqs 100 version None name bp1

Webservices job id: 750O6W4OEP3CV1ZQ

OpenGL version: 4.1 ATI-4.14.1
OpenGL renderer: AMD Radeon Pro 580 OpenGL Engine
OpenGL vendor: ATI Technologies Inc.
Python: 3.11.2 Locale: UTF-8 Qt version: PyQt6 6.3.1, Qt 6.3.1 Qt runtime version: 6.3.2 Qt platform: cocoa Hardware: Hardware Overview: Model Name: iMac Model Identifier: iMac18,3 Processor Name: Quad-Core Intel Core i7 Processor Speed: 4.2 GHz Number of Processors: 1 Total Number of Cores: 4 L2 Cache (per Core): 256 KB L3 Cache: 8 MB Hyper-Threading Technology: Enabled Memory: 32 GB System Firmware Version: 515.0.0.0.0 OS Loader Version: 577.140.2~15 SMC Version (system): 2.41f2 Software: System Software Overview: System Version: macOS 13.5.2 (22G91) Kernel Version: Darwin 22.6.0 Time since boot: 38 days, 17 hours, 31 minutes Graphics/Displays: Radeon Pro 580: Chipset Model: Radeon Pro 580 Type: GPU Bus: PCIe PCIe Lane Width: x16 VRAM (Total): 8 GB Vendor: AMD (0x1002) Device ID: 0x67df Revision ID: 0x00c0 ROM Revision: 113-D000AA-931 VBIOS Version: 113-D0001A1X-025 EFI Driver Version: 01.00.931 Metal Support: Metal 2 Displays: iMac: Display Type: Built-In Retina LCD Resolution: Retina 5K (5120 x 2880) Framebuffer Depth: 30-Bit Color (ARGB2101010) Main Display: Yes Mirror: Off Online: Yes Automatically Adjust Brightness: Yes Connection Type: Internal Installed Packages: alabaster: 0.7.13 appdirs: 1.4.4 appnope: 0.1.3 asttokens: 2.4.0 Babel: 2.13.0 backcall: 0.2.0 beautifulsoup4: 4.11.2 blockdiag: 3.0.0 blosc2: 2.0.0 build: 0.10.0 certifi: 2022.12.7 cftime: 1.6.2 charset-normalizer: 3.3.0 ChimeraX-AddCharge: 1.5.12 ChimeraX-AddH: 2.2.5 ChimeraX-AlignmentAlgorithms: 2.0.1 ChimeraX-AlignmentHdrs: 3.4.1 ChimeraX-AlignmentMatrices: 2.1 ChimeraX-Alignments: 2.12.1 ChimeraX-AlphaFold: 1.0 ChimeraX-AltlocExplorer: 1.1.1 ChimeraX-AmberInfo: 1.0 ChimeraX-Arrays: 1.1 ChimeraX-Atomic: 1.49 ChimeraX-AtomicLibrary: 11.0 ChimeraX-AtomSearch: 2.0.1 ChimeraX-AxesPlanes: 2.3.2 ChimeraX-BasicActions: 1.1.2 ChimeraX-BILD: 1.0 ChimeraX-BlastProtein: 2.1.2 ChimeraX-BondRot: 2.0.4 ChimeraX-BugReporter: 1.0.1 ChimeraX-BuildStructure: 2.10.5 ChimeraX-Bumps: 1.0 ChimeraX-BundleBuilder: 
1.2.2 ChimeraX-ButtonPanel: 1.0.1 ChimeraX-CageBuilder: 1.0.1 ChimeraX-CellPack: 1.0 ChimeraX-Centroids: 1.3.2 ChimeraX-ChangeChains: 1.1 ChimeraX-CheckWaters: 1.3.1 ChimeraX-ChemGroup: 2.0.1 ChimeraX-Clashes: 2.2.4 ChimeraX-ColorActions: 1.0.3 ChimeraX-ColorGlobe: 1.0 ChimeraX-ColorKey: 1.5.4 ChimeraX-CommandLine: 1.2.5 ChimeraX-ConnectStructure: 2.0.1 ChimeraX-Contacts: 1.0.1 ChimeraX-Core: 1.7.dev202310140743 ChimeraX-CoreFormats: 1.2 ChimeraX-coulombic: 1.4.2 ChimeraX-Crosslinks: 1.0 ChimeraX-Crystal: 1.0 ChimeraX-CrystalContacts: 1.0.1 ChimeraX-DataFormats: 1.2.3 ChimeraX-Dicom: 1.2 ChimeraX-DistMonitor: 1.4 ChimeraX-DockPrep: 1.1.2 ChimeraX-Dssp: 2.0 ChimeraX-EMDB-SFF: 1.0 ChimeraX-ESMFold: 1.0 ChimeraX-FileHistory: 1.0.1 ChimeraX-FunctionKey: 1.0.1 ChimeraX-Geometry: 1.3 ChimeraX-gltf: 1.0 ChimeraX-Graphics: 1.1.1 ChimeraX-Hbonds: 2.4 ChimeraX-Help: 1.2.2 ChimeraX-HKCage: 1.3 ChimeraX-IHM: 1.1 ChimeraX-ImageFormats: 1.2 ChimeraX-IMOD: 1.0 ChimeraX-IO: 1.0.1 ChimeraX-ItemsInspection: 1.0.1 ChimeraX-IUPAC: 1.0 ChimeraX-Label: 1.1.8 ChimeraX-ListInfo: 1.2.1 ChimeraX-Log: 1.1.6 ChimeraX-LookingGlass: 1.1 ChimeraX-Maestro: 1.9.1 ChimeraX-Map: 1.1.4 ChimeraX-MapData: 2.0 ChimeraX-MapEraser: 1.0.1 ChimeraX-MapFilter: 2.0.1 ChimeraX-MapFit: 2.0 ChimeraX-MapSeries: 2.1.1 ChimeraX-Markers: 1.0.1 ChimeraX-Mask: 1.0.2 ChimeraX-MatchMaker: 2.1.2 ChimeraX-MCopy: 1.0 ChimeraX-MDcrds: 2.6 ChimeraX-MedicalToolbar: 1.0.2 ChimeraX-Meeting: 1.0.1 ChimeraX-MLP: 1.1.1 ChimeraX-mmCIF: 2.12.1 ChimeraX-MMTF: 2.2 ChimeraX-Modeller: 1.5.12 ChimeraX-ModelPanel: 1.4 ChimeraX-ModelSeries: 1.0.1 ChimeraX-Mol2: 2.0.3 ChimeraX-Mole: 1.0 ChimeraX-Morph: 1.0.2 ChimeraX-MouseModes: 1.2 ChimeraX-Movie: 1.0 ChimeraX-Neuron: 1.0 ChimeraX-Nifti: 1.1 ChimeraX-NIHPresets: 1.1.9 ChimeraX-NRRD: 1.1 ChimeraX-Nucleotides: 2.0.3 ChimeraX-OpenCommand: 1.13 ChimeraX-PDB: 2.7.2 ChimeraX-PDBBio: 1.0.1 ChimeraX-PDBLibrary: 1.0.2 ChimeraX-PDBMatrices: 1.0 ChimeraX-PickBlobs: 1.0.1 ChimeraX-Positions: 1.0 
ChimeraX-PresetMgr: 1.1 ChimeraX-PubChem: 2.1 ChimeraX-ReadPbonds: 1.0.1 ChimeraX-Registration: 1.1.2 ChimeraX-RemoteControl: 1.0 ChimeraX-RenderByAttr: 1.1 ChimeraX-RenumberResidues: 1.1 ChimeraX-ResidueFit: 1.0.1 ChimeraX-RestServer: 1.2 ChimeraX-RNALayout: 1.0 ChimeraX-RotamerLibMgr: 4.0 ChimeraX-RotamerLibsDunbrack: 2.0 ChimeraX-RotamerLibsDynameomics: 2.0 ChimeraX-RotamerLibsRichardson: 2.0 ChimeraX-SaveCommand: 1.5.1 ChimeraX-SchemeMgr: 1.0 ChimeraX-SDF: 2.0.1 ChimeraX-Segger: 1.0 ChimeraX-Segment: 1.0.1 ChimeraX-SelInspector: 1.0 ChimeraX-SeqView: 2.11 ChimeraX-Shape: 1.0.1 ChimeraX-Shell: 1.0.1 ChimeraX-Shortcuts: 1.1.1 ChimeraX-ShowSequences: 1.0.2 ChimeraX-SideView: 1.0.1 ChimeraX-Smiles: 2.1.2 ChimeraX-SmoothLines: 1.0 ChimeraX-SpaceNavigator: 1.0 ChimeraX-StdCommands: 1.12.2 ChimeraX-STL: 1.0.1 ChimeraX-Storm: 1.0 ChimeraX-StructMeasure: 1.1.2 ChimeraX-Struts: 1.0.1 ChimeraX-Surface: 1.0.1 ChimeraX-SwapAA: 2.0.1 ChimeraX-SwapRes: 2.2.2 ChimeraX-TapeMeasure: 1.0 ChimeraX-TaskManager: 1.0 ChimeraX-Test: 1.0 ChimeraX-Toolbar: 1.1.2 ChimeraX-ToolshedUtils: 1.2.4 ChimeraX-Topography: 1.0 ChimeraX-ToQuest: 1.0 ChimeraX-Tug: 1.0.1 ChimeraX-UI: 1.33.1 ChimeraX-uniprot: 2.3 ChimeraX-UnitCell: 1.0.1 ChimeraX-ViewDockX: 1.3.2 ChimeraX-VIPERdb: 1.0 ChimeraX-Vive: 1.1 ChimeraX-VolumeMenu: 1.0.1 ChimeraX-VTK: 1.0 ChimeraX-WavefrontOBJ: 1.0 ChimeraX-WebCam: 1.0.2 ChimeraX-WebServices: 1.1.2 ChimeraX-Zone: 1.0.1 colorama: 0.4.6 comm: 0.1.4 contourpy: 1.1.1 cxservices: 1.2.2 cycler: 0.12.1 Cython: 0.29.33 debugpy: 1.8.0 decorator: 5.1.1 docutils: 0.19 executing: 2.0.0 filelock: 3.9.0 fonttools: 4.43.1 funcparserlib: 1.0.1 glfw: 2.6.2 grako: 3.16.5 h5py: 3.10.0 html2text: 2020.1.16 idna: 3.4 ihm: 0.38 imagecodecs: 2023.9.18 imagesize: 1.4.1 ipykernel: 6.23.2 ipython: 8.14.0 ipython-genutils: 0.2.0 ipywidgets: 8.1.1 jedi: 0.18.2 Jinja2: 3.1.2 jupyter-client: 8.2.0 jupyter-core: 5.4.0 jupyterlab-widgets: 3.0.9 kiwisolver: 1.4.5 line-profiler: 4.0.2 lxml: 4.9.2 lz4: 4.3.2 
MarkupSafe: 2.1.3 matplotlib: 3.7.2 matplotlib-inline: 0.1.6 msgpack: 1.0.4 nest-asyncio: 1.5.8 netCDF4: 1.6.2 networkx: 3.1 nibabel: 5.0.1 nptyping: 2.5.0 numexpr: 2.8.7 numpy: 1.25.1 openvr: 1.23.701 packaging: 21.3 ParmEd: 3.4.3 parso: 0.8.3 pep517: 0.13.0 pexpect: 4.8.0 pickleshare: 0.7.5 Pillow: 10.0.1 pip: 23.0 pkginfo: 1.9.6 platformdirs: 3.11.0 prompt-toolkit: 3.0.39 psutil: 5.9.5 ptyprocess: 0.7.0 pure-eval: 0.2.2 py-cpuinfo: 9.0.0 pycollada: 0.7.2 pydicom: 2.3.0 Pygments: 2.16.1 pynrrd: 1.0.0 PyOpenGL: 3.1.7 PyOpenGL-accelerate: 3.1.7 pyopenxr: 1.0.2801 pyparsing: 3.0.9 pyproject-hooks: 1.0.0 PyQt6-commercial: 6.3.1 PyQt6-Qt6: 6.3.2 PyQt6-sip: 13.4.0 PyQt6-WebEngine-commercial: 6.3.1 PyQt6-WebEngine-Qt6: 6.3.2 python-dateutil: 2.8.2 pytz: 2023.3.post1 pyzmq: 25.1.1 qtconsole: 5.4.3 QtPy: 2.4.0 RandomWords: 0.4.0 requests: 2.31.0 scipy: 1.11.1 setuptools: 67.4.0 setuptools-scm: 7.0.5 sfftk-rw: 0.7.3 six: 1.16.0 snowballstemmer: 2.2.0 sortedcontainers: 2.4.0 soupsieve: 2.5 sphinx: 6.1.3 sphinx-autodoc-typehints: 1.22 sphinxcontrib-applehelp: 1.0.7 sphinxcontrib-blockdiag: 3.0.0 sphinxcontrib-devhelp: 1.0.5 sphinxcontrib-htmlhelp: 2.0.4 sphinxcontrib-jsmath: 1.0.1 sphinxcontrib-qthelp: 1.0.6 sphinxcontrib-serializinghtml: 1.1.9 stack-data: 0.6.3 superqt: 0.5.0 tables: 3.8.0 tcia-utils: 1.5.1 tifffile: 2023.7.18 tinyarray: 1.2.4 tomli: 2.0.1 tornado: 6.3.3 traitlets: 5.9.0 typing-extensions: 4.8.0 tzdata: 2023.3 urllib3: 2.0.6 wcwidth: 0.2.8 webcolors: 1.12 wheel: 0.38.4 wheel-filename: 1.4.1 widgetsnbextension: 4.0.9
Change History (6)
comment:1 by , 2 years ago
Component: | Unassigned → Sequence |
---|---|
Owner: | set to |
Platform: | → all |
Project: | → ChimeraX |
Status: | new → assigned |
Summary: | ChimeraX bug report submission → BlastProtein returns more results than the 100 maximum requested |
comment:2 by , 23 months ago
comment:3 by , 23 months ago
I was too optimistic and only got 100 back by luck on some other query.
comment:4 by , 23 months ago
The example given in this ticket, PDB 6c0w, chain B, gives 1480 results back. Maybe it is limited to 100 PDB entries, but that seems unlikely, since it would mean that on average every PDB entry had 14 sequence matches. It may be that identical sequences in different PDB entries count as only one hit. I suspect it is working correctly and we just don't understand what the maximum sequence option in blastp is doing. The goal may be to find out what that option actually does and document it.
comment:5 by , 18 months ago
I'm not sure I understand what max_target_seqs does. I did a BLAST of 6c0w/B limited to 100 entries, and got the same number of results you did. When I looked on the server, however, the BLAST JSON results file has 85 "hit" objects, and each hit lists several different PDB IDs. For example, the first "hit" object has 7XVL_{B,F,L,P,V,Z,f,j}, 7XVM_{B,F,L,P}, 7XX5_{B,F,L,P}, 7XX6_{B,F,L,P,V,Z,f,j}, and 7XX7_{B,F,L,P} -- that's 24 matches! They have identical taxids (9606), and then the "hsps" list gives them all the same number, bit_score, score, evalue, etc.
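To make the hit-versus-chain distinction concrete, here is a minimal sketch of the kind of count described above. The `hits` list is a small hand-made stand-in, not the real results file for this job, and it assumes each hit object carries a `description` list with one entry per matching chain. The helper name `count_hits_and_chains` is made up for illustration.

```python
# Hypothetical, hand-made stand-in for the "hits" list of a BLAST JSON results file.
# Assumption: each hit has a "description" list with one entry per matching chain.
hits = [
    {"num": 1, "description": [{"id": "7XVL_B"}, {"id": "7XVL_F"}, {"id": "7XVM_B"}]},
    {"num": 2, "description": [{"id": "6C0W_B"}]},
]


def count_hits_and_chains(hits):
    """Return (number of hit objects, total number of per-chain descriptions)."""
    n_hits = len(hits)
    n_chains = sum(len(hit.get("description", [])) for hit in hits)
    return n_hits, n_chains


n_hits, n_chains = count_hits_and_chains(hits)
print(f"{n_hits} hit objects expand to {n_chains} chain matches")
```

Running the same count over the real results file would show whether the 85 hit objects account for the full number of rows shown in the results table.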
comment:6 by , 18 months ago
My guess is that max_target_seqs means the maximum number of unique sequences. Probably many PDB chains have exactly the same sequence, so all of those contribute just 1 sequence toward the limit. Especially with multiple chains in the same PDB entry (e.g. chain ids A,B,C...) for multimeric structures with many copies of the same protein, there will be a separate match for each chain but just one sequence hit. We should try to figure out if that is the way it counts them and document it. No need to offer a different maximum that would strictly count chains I think -- best to stick with what BLAST itself does.
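A small sketch of that guess, to show how a limit on unique sequences could still yield many more chain matches than the limit. This only illustrates the hypothesis and is not how BLAST is known to implement max_target_seqs. The function name `limit_by_unique_sequence`, the chain IDs, and the sequences are all invented for the example.

```python
def limit_by_unique_sequence(chain_matches, max_seqs):
    """Group ranked (chain_id, sequence) pairs by sequence.

    Keeps every chain whose sequence is among the first `max_seqs`
    distinct sequences seen. Chains with a new sequence are dropped
    once the limit has been reached.
    """
    groups = {}  # sequence -> list of chain ids, in first-seen order
    for chain_id, seq in chain_matches:
        if seq not in groups:
            if len(groups) >= max_seqs:
                continue  # limit on distinct sequences already reached
            groups[seq] = []
        groups[seq].append(chain_id)
    return groups


# Invented data: three chains share one sequence, two others are unique.
matches = [
    ("1abc_A", "MSGRGK"),
    ("1abc_B", "MSGRGK"),
    ("2xyz_A", "MSGRGK"),
    ("3def_A", "MSGRAK"),
    ("4ghi_A", "MTGRGK"),
]

groups = limit_by_unique_sequence(matches, max_seqs=2)
n_chains = sum(len(chains) for chains in groups.values())
print(f"{len(groups)} unique sequences, {n_chains} chain matches")
```

With a limit of 2 the toy data keeps 2 sequences but 4 chains, the same pattern as 85 hit objects expanding to well over 100 rows.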
Looks like this only affects the command. When I run a BLAST job through the GUI, I get 100 results back. Interesting, and a good lead!