Opened 5 years ago
Closed 5 years ago
#3528 closed defect (fixed)
Obsolete Blast hits
| Reported by: | Tristan Croll | Owned by: | Eric Pettersen |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | Sequence | Version: | |
| Keywords: | | Cc: | Conrad |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
The following bug report has been submitted:
Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
ChimeraX Version: 1.0 (2020-06-04 23:15:07 UTC)
Description
A search using the Blast Protein tool picked up 4jcb, which has been obsoleted since 2014. Would be a good idea to regularly weed out obsolete entries from your database - they're usually removed for pretty good reason, and almost always replaced with improved models.
Log:
UCSF ChimeraX version: 1.0 (2020-06-04)
© 2016-2020 Regents of the University of California. All rights reserved.
How to cite UCSF ChimeraX
> open final.cif
Summary of feedback from opening final.cif
---
warnings | Unknown polymer entity '1' near line 599
Unknown polymer entity '2' near line 10916
Unknown polymer entity '3' near line 11350
Unknown polymer entity '4' near line 21311
Unknown polymer entity '5' near line 23726
8 messages similar to the above omitted
Atom C1 is not in the residue template for GPC /AV:101
Atom C1 is not in the residue template for GPC /BA:101
Atom C1 is not in the residue template for GPC /BB:101
Atom C1 is not in the residue template for GPC /BC:101
Atom C1 is not in the residue template for GPC /BD:101
36 messages similar to the above omitted
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.
Chain information for final.cif #1
---
Chain | Description
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX | ?
BA BC BF BG BH BJ BK BL BM BN BO BP BQ BR BS BT BU BX ba bb bc bd be bf bg bh
bi bj bk bl bm bo bp | ?
BB BD BE BI BV BW bn | ?
C | ?
H1 | ?
H2 | ?
L | ?
M | ?
UA | ?
UB | ?
UC | ?
aa | ?
ab ac ad ae af ag ah ai aj ak al am an ao ap | ?
> addh
Summary of feedback from adding hydrogens to final.cif #1
---
warnings | Not adding hydrogens to /UA UNK 2 CB because it is missing heavy-
atom bond partners
Not adding hydrogens to /UA UNK 3 CB because it is missing heavy-atom bond
partners
Not adding hydrogens to /UA UNK 4 CB because it is missing heavy-atom bond
partners
Not adding hydrogens to /UA UNK 5 CB because it is missing heavy-atom bond
partners
Not adding hydrogens to /UA UNK 6 CB because it is missing heavy-atom bond
partners
59 messages similar to the above omitted
notes | No usable SEQRES records for final.cif (#1) chain AA; guessing termini
instead
No usable SEQRES records for final.cif (#1) chain AB; guessing termini instead
No usable SEQRES records for final.cif (#1) chain AC; guessing termini instead
No usable SEQRES records for final.cif (#1) chain AD; guessing termini instead
No usable SEQRES records for final.cif (#1) chain AE; guessing termini instead
83 messages similar to the above omitted
Chain-initial residues that are actual N termini: /AA HIS 2, /AB HIS 2, /AC
HIS 2, /AD HIS 2, /AE HIS 2, /AF HIS 2, /AG HIS 2, /AH HIS 2, /AI HIS 2, /AJ
HIS 2, /AK HIS 2, /AL HIS 2, /AM HIS 2, /AN HIS 2, /AO HIS 2, /AP HIS 2, /AQ
HIS 2, /AR HIS 2, /AS HIS 2, /AT HIS 2, /AU HIS 2, /AV HIS 2, /AW HIS 2, /AX
HIS 2, /BA GLY 6, /BB GLY 5, /BC GLY 6, /BD GLY 5, /BE GLY 5, /BF GLY 6, /BG
GLY 6, /BH GLY 6, /BI GLY 5, /BJ GLY 6, /BK GLY 6, /BL GLY 6, /BM GLY 6, /BN
GLY 6, /BO GLY 6, /BP GLY 6, /BQ GLY 6, /BR GLY 6, /BS GLY 6, /BT GLY 6, /BU
GLY 6, /BV GLY 5, /BW GLY 5, /BX GLY 6, /C ALA 15, /H1 MET 1, /H2 SER 1, /L
ALA 1, /M MET 1, /UA UNK 2, /UB UNK 18, /UC UNK 1, /aa HIS 2, /ab MET 1, /ac
MET 1, /ad MET 1, /ae MET 1, /af MET 1, /ag MET 1, /ah MET 1, /ai MET 1, /aj
MET 1, /ak MET 1, /al MET 1, /am MET 1, /an MET 1, /ao MET 1, /ap MET 1, /ba
GLY 6, /bb GLY 6, /bc GLY 6, /bd GLY 6, /be GLY 6, /bf GLY 6, /bg GLY 6, /bh
GLY 6, /bi GLY 6, /bj GLY 6, /bk GLY 6, /bl GLY 6, /bm GLY 6, /bn GLY 5, /bo
GLY 6, /bp GLY 6
Chain-initial residues that are not actual N termini:
Chain-final residues that are actual C termini: /BA PHE 44, /BB PHE 44, /BC
PHE 44, /BD PHE 44, /BE PHE 44, /BF PHE 44, /BG PHE 44, /BH PHE 44, /BI PHE
44, /BJ PHE 44, /BK PHE 44, /BL PHE 44, /BM PHE 44, /BN PHE 44, /BO PHE 44,
/BP PHE 44, /BQ PHE 44, /BR PHE 44, /BS PHE 44, /BT PHE 44, /BU PHE 44, /BV
PHE 44, /BW PHE 44, /BX PHE 44, /C ARG 302, /H2 ILE 181, /L LYS 273, /ba PHE
44, /bb PHE 44, /bc PHE 44, /bd PHE 44, /be PHE 44, /bf PHE 44, /bg PHE 44,
/bh PHE 44, /bi PHE 44, /bj PHE 44, /bk PHE 44, /bl PHE 44, /bm PHE 44, /bn
PHE 44, /bo PHE 44, /bp PHE 44
Chain-final residues that are not actual C termini: /AA TYR 46, /AB TYR 46,
/AC TYR 46, /AD TYR 46, /AE TYR 46, /AF TYR 46, /AG TYR 46, /AH TYR 46, /AI
TYR 46, /AJ TYR 46, /AK TYR 46, /AL TYR 46, /AM TYR 46, /AN TYR 46, /AO TYR
46, /AP TYR 46, /AQ TYR 46, /AR TYR 46, /AS TYR 46, /AT TYR 46, /AU TYR 46,
/AV TYR 46, /AW TYR 46, /AX TYR 46, /H1 LYS 53, /M TYR 324, /UA UNK 33, /UB
UNK 17, /UC UNK 14, /aa ALA 60, /ab ALA 60, /ac ALA 60, /ad ALA 60, /ae ALA
60, /af ALA 60, /ag ALA 60, /ah ALA 60, /ai ALA 60, /aj ALA 60, /ak ALA 60,
/al ALA 60, /am ALA 60, /an ALA 60, /ao ALA 60, /ap ALA 60
4988 hydrogen bonds
/AA TYR 46 is not terminus, removing H atom from 'C'
/AB TYR 46 is not terminus, removing H atom from 'C'
/AC TYR 46 is not terminus, removing H atom from 'C'
/AD TYR 46 is not terminus, removing H atom from 'C'
/AE TYR 46 is not terminus, removing H atom from 'C'
40 messages similar to the above omitted
46120 hydrogens added
> open /run/media/tic20/storage/structure_dump/pu_qian/gprclh1-338aleft.mrc
Opened gprclh1-338aleft.mrc, grid size 300,300,300, pixel 1.05, shown at level
0.0201, step 2, values float32
> volume gaussian #2 bfactor 40
> clipper associate #2,3 toModel #1
Chain information for final.cif
---
Chain | Description
1.2/AA 1.2/AB 1.2/AC 1.2/AD 1.2/AE 1.2/AF 1.2/AG 1.2/AH 1.2/AI 1.2/AJ 1.2/AK
1.2/AL 1.2/AM 1.2/AN 1.2/AO 1.2/AP 1.2/AQ 1.2/AR 1.2/AS 1.2/AT 1.2/AU 1.2/AV
1.2/AW 1.2/AX | ?
1.2/BA 1.2/BC 1.2/BF 1.2/BG 1.2/BH 1.2/BJ 1.2/BK 1.2/BL 1.2/BM 1.2/BN 1.2/BO
1.2/BP 1.2/BQ 1.2/BR 1.2/BS 1.2/BT 1.2/BU 1.2/BX 1.2/ba 1.2/bb 1.2/bc 1.2/bd
1.2/be 1.2/bf 1.2/bg 1.2/bh 1.2/bi 1.2/bj 1.2/bk 1.2/bl 1.2/bm 1.2/bo 1.2/bp |
?
1.2/BB 1.2/BD 1.2/BE 1.2/BI 1.2/BV 1.2/BW 1.2/bn | ?
1.2/C | ?
1.2/H1 | ?
1.2/H2 | ?
1.2/L | ?
1.2/M | ?
1.2/UA | ?
1.2/UB | ?
1.2/UC | ?
1.2/aa | ?
1.2/ab 1.2/ac 1.2/ad 1.2/ae 1.2/af 1.2/ag 1.2/ah 1.2/ai 1.2/aj 1.2/ak 1.2/al
1.2/am 1.2/an 1.2/ao 1.2/ap | ?
> isolde start
> set selectionWidth 4
Done loading forcefield
> ui tool show Shell
/opt/UCSF/ChimeraX/lib/python3.7/site-packages/IPython/core/history.py:226:
UserWarning: IPython History requires SQLite, your history will not be saved
warn("IPython History requires SQLite, your history will not be saved")
Failed to add
/run/media/tic20/storage/structure_dump/pu_qian/new_2020_05/GP1.xml: Residue
template USER_GP1 with the same override level 0 already exists.
Failed to add
/run/media/tic20/storage/structure_dump/pu_qian/new_2020_05/GPC.xml: Residue
template USER_GPC with the same override level 0 already exists.
> set bgColor white
> show sel
> delete sel
> clipper set contourSensitivity 0.25
> select clear
> select /UA
226 atoms, 225 bonds, 1 model selected
> view sel
> ui tool show "Blast Protein"
> blastprotein /ae database pdb cutoff 1e-3 matrix BLOSUM62 maxSeqs 100 name bp1
Web Service: BlastProtein2 is a Python wrapper that calls blastp to search nr
or pdb for sequences similar to the given protein sequence
Opal service URL:
http://webservices.rbvi.ucsf.edu/opal2/services/BlastProtein2Service
Opal job id: appBlastProtein2Service15950736967141427397413
Opal status URL prefix:
http://webservices.rbvi.ucsf.edu/appBlastProtein2Service15950736967141427397413
stdout.txt = standard output
stderr.txt = standard error
BlastProtein finished.
> open 1xrd
Summary of feedback from opening 1xrd fetched from pdb
---
warning | Atom H1 is not in the residue template for MET /A:1
note | Fetching compressed mmCIF 1xrd from
http://files.rcsb.org/download/1xrd.cif
1xrd title:
Light-Harvesting Complex 1 Alfa Subunit from Wild-Type Rhodospirillum rubrum
[more info...]
Chain information for 1xrd
---
Chain | Description
2.1/A 2.2/A 2.3/A 2.4/A 2.5/A 2.6/A 2.7/A 2.8/A 2.9/A 2.10/A | Light-harvesting protein B-880, α chain
> matchmaker #2/A to #1/ae
Parameters
---
Chain pairing | bb
Alignment algorithm | Needleman-Wunsch
Similarity matrix | BLOSUM-62
SS fraction | 0.3
Gap open (HH/SS/other) | 18/18/6
Gap extend | 1
SS matrix | | | H | S | O
---|---|---|---
H | 6 | -9 | -6
S | | 6 | -6
O | | | 4
Iteration cutoff | 2
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.1), sequence
alignment score = 166.3
RMSD between 21 pruned atom pairs is 0.910 angstroms; (across all 52 pairs:
9.822)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.2), sequence
alignment score = 169.3
RMSD between 26 pruned atom pairs is 1.041 angstroms; (across all 52 pairs:
8.892)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.3), sequence
alignment score = 169.3
RMSD between 23 pruned atom pairs is 0.901 angstroms; (across all 52 pairs:
8.994)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.4), sequence
alignment score = 160.8
RMSD between 21 pruned atom pairs is 0.966 angstroms; (across all 52 pairs:
9.853)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.5), sequence
alignment score = 159.7
RMSD between 22 pruned atom pairs is 1.120 angstroms; (across all 52 pairs:
9.871)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.6), sequence
alignment score = 157.8
RMSD between 27 pruned atom pairs is 0.862 angstroms; (across all 52 pairs:
8.182)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.7), sequence
alignment score = 162.7
RMSD between 22 pruned atom pairs is 1.030 angstroms; (across all 52 pairs:
9.215)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.8), sequence
alignment score = 157.3
RMSD between 20 pruned atom pairs is 1.037 angstroms; (across all 52 pairs:
8.139)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.9), sequence
alignment score = 165.7
RMSD between 23 pruned atom pairs is 0.988 angstroms; (across all 52 pairs:
10.126)
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.10), sequence
alignment score = 151.2
RMSD between 25 pruned atom pairs is 0.983 angstroms; (across all 52 pairs:
7.352)
> close #2
> open 4jc9
Summary of feedback from opening 4jc9 fetched from pdb
---
warning | PDB entry 4JC9 has been replaced by 4V9G
notes | Fetching compressed mmCIF 4jc9 from
http://files.rcsb.org/download/4jc9.cif
Fetching CCD SPO from http://ligand-expo.rcsb.org/reports/S/SPO/SPO.cif
4jc9 title:
RC-LH1-PufX dimer complex from Rhodobacter sphaeroides [more info...]
Chain information for 4jc9 #2
---
Chain | Description
1 2 3 5 7 D F J N P T V X Z | Light-harvesting protein B-875 α chain
4 6 8 9 E G I K O Q S U W Y | Light-harvesting protein B-875 β chain
B | Intrinsic membrane protein PufX
H | Reaction center protein H chain
L | Reaction center protein L chain
M | Reaction center protein M chain
Non-standard residues in 4jc9 #2
---
BCL — bacteriochlorophyll A
BPH — bacteriopheophytin A
FE2 — Fe (II) ion
PO4 — phosphate ion
SPO — spheroidene
U10 — ubiquinone-10 (Coenzyme Q10)
> matchmaker #2/1 to #1/ae
Parameters
---
Chain pairing | bb
Alignment algorithm | Needleman-Wunsch
Similarity matrix | BLOSUM-62
SS fraction | 0.3
Gap open (HH/SS/other) | 18/18/6
Gap extend | 1
SS matrix | | | H | S | O
---|---|---|---
H | 6 | -9 | -6
S | | 6 | -6
O | | | 4
Iteration cutoff | 2
Matchmaker final.cif, chain ae (#1.2) with 4jc9, chain 1 (#2), sequence
alignment score = 57.4
RMSD between 24 pruned atom pairs is 1.059 angstroms; (across all 42 pairs:
4.207)
> close #2
OpenGL version: 3.3.0 NVIDIA 450.51.05
OpenGL renderer: TITAN Xp/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Dell Inc.
Model: Precision T5600
OS: CentOS Linux 7 Core
Architecture: 64bit ELF
CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Cache Size: 20480 KB
Memory:
total used free shared buff/cache available
Mem: 62G 8.1G 45G 238M 9.0G 53G
Swap: 4.9G 0B 4.9G
Graphics:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:11df]
Kernel driver in use: nvidia
PyQt version: 5.12.3
Compiled Qt version: 5.12.4
Runtime Qt version: 5.12.8
Change History (8)
comment:1 by , 5 years ago
| Cc: | added |
|---|---|
| Component: | Unassigned → Sequence |
| Owner: | set to |
| Platform: | → all |
| Project: | → ChimeraX |
| Status: | new → accepted |
| Summary: | ChimeraX bug report submission → Obsolete Blast hits |
comment:2 by , 5 years ago
| Priority: | normal → minor |
|---|
comment:3 by , 5 years ago
Well... that might explain a bit. Use of obsolete entries as templates was quite an issue in the last CASP round. I thought it was mostly people using (and not properly updating) their own bespoke databases - and that’s certainly true for some of the servers. But if the official NCBI database is also at fault... well, crap.
comment:4 by , 5 years ago
If you were feeling enthusiastic, the up-to-date list of obsolete entries (and their replacements, if any) is kept at http://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat. Rather than filtering them out of the database, it might be easier to just filter the search results against that list before formatting them for display?
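A minimal sketch of that result-side filter, in Python. It is not the ChimeraX or BlastProtein implementation. It assumes each record in obsolete.dat is a whitespace-separated line of the form `OBSLTE <date> <obsolete ID> [<replacement ID> ...]`, and that Blast hits arrive as strings starting with a four-character PDB ID. The function names `parse_obsolete` and `split_hits` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_obsolete(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each obsolete PDB ID to its replacement IDs (possibly none).

    Assumed record layout: OBSLTE <date> <obsolete ID> [<replacement ID> ...]
    """
    obsolete: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        # Skip the header line and anything that is not an OBSLTE record.
        if len(fields) < 3 or fields[0] != "OBSLTE":
            continue
        obsolete[fields[2].upper()] = [f.upper() for f in fields[3:]]
    return obsolete


def split_hits(
    hit_ids: Iterable[str], obsolete: Dict[str, List[str]]
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Separate current hits from obsolete ones, keeping any replacements.

    Only the first four characters of each hit are compared, on the
    assumption that a hit may carry a chain suffix after the PDB ID.
    """
    current: List[str] = []
    dropped: List[Tuple[str, List[str]]] = []
    for hit in hit_ids:
        key = hit[:4].upper()
        if key in obsolete:
            dropped.append((hit, obsolete[key]))
        else:
            current.append(hit)
    return current, dropped


# Illustrative records only: the dates are placeholders, and 9ZZZ is a
# made-up ID showing an entry with no replacement.
sample = [
    " LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS",
    "OBSLTE    01-JAN-00 4JC9     4V9G",
    "OBSLTE    01-JAN-00 9ZZZ",
]
obsolete = parse_obsolete(sample)
current, dropped = split_hits(["1xrd", "4jc9"], obsolete)
```

With the real file, the lines could be read from a periodically refreshed local copy rather than fetched on every search. Keeping the replacement IDs would let the results table point at the superseding entry instead of silently dropping the hit.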
comment:6 by , 5 years ago
I have contacted NLM about the obsolete entries.
Case #: CAS-595013-C6T1M0
comment:7 by , 5 years ago
The response from NLM:
Hello,
Thank you for the notice. I'll notify the blast developers and Structure group and we'll see what we can do.
Best regards,
Wayne
-=-=-=-=-=-=-=-=-=-=-=-
Wayne Matten, PhD
NCBI Customer Services
NCBI | NLM | NIH
comment:8 by , 5 years ago
| Resolution: | → fixed |
|---|---|
| Status: | accepted → closed |
It looks like NLM has removed the obsolete structures from the blast database.