Opened 5 years ago

Closed 5 years ago

#3528 closed defect (fixed)

Obsolete Blast hits

Reported by: Tristan Croll
Owned by: pett
Priority: minor
Milestone:
Component: Sequence
Version:
Keywords:
Cc: Conrad
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The following bug report has been submitted:
Platform:        Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
ChimeraX Version: 1.0 (2020-06-04 23:15:07 UTC)
Description
A search using the Blast Protein tool picked up 4jcb, which has been obsoleted since 2014. Would be a good idea to regularly weed out obsolete entries from your database - they're usually removed for pretty good reason, and almost always replaced with improved models.

Log:
UCSF ChimeraX version: 1.0 (2020-06-04)  
© 2016-2020 Regents of the University of California. All rights reserved.  
How to cite UCSF ChimeraX  

> open final.cif

Summary of feedback from opening final.cif  
---  
warnings | Unknown polymer entity '1' near line 599  
Unknown polymer entity '2' near line 10916  
Unknown polymer entity '3' near line 11350  
Unknown polymer entity '4' near line 21311  
Unknown polymer entity '5' near line 23726  
8 messages similar to the above omitted  
Atom C1 is not in the residue template for GPC /AV:101  
Atom C1 is not in the residue template for GPC /BA:101  
Atom C1 is not in the residue template for GPC /BB:101  
Atom C1 is not in the residue template for GPC /BC:101  
Atom C1 is not in the residue template for GPC /BD:101  
36 messages similar to the above omitted  
Missing or incomplete entity_poly_seq table. Inferred polymer connectivity.  
  
Chain information for final.cif #1  
---  
Chain | Description  
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX | ?  
BA BC BF BG BH BJ BK BL BM BN BO BP BQ BR BS BT BU BX ba bb bc bd be bf bg bh
bi bj bk bl bm bo bp | ?  
BB BD BE BI BV BW bn | ?  
C | ?  
H1 | ?  
H2 | ?  
L | ?  
M | ?  
UA | ?  
UB | ?  
UC | ?  
aa | ?  
ab ac ad ae af ag ah ai aj ak al am an ao ap | ?  
  

> addh

Summary of feedback from adding hydrogens to final.cif #1  
---  
warnings | Not adding hydrogens to /UA UNK 2 CB because it is missing heavy-
atom bond partners  
Not adding hydrogens to /UA UNK 3 CB because it is missing heavy-atom bond
partners  
Not adding hydrogens to /UA UNK 4 CB because it is missing heavy-atom bond
partners  
Not adding hydrogens to /UA UNK 5 CB because it is missing heavy-atom bond
partners  
Not adding hydrogens to /UA UNK 6 CB because it is missing heavy-atom bond
partners  
59 messages similar to the above omitted  
notes | No usable SEQRES records for final.cif (#1) chain AA; guessing termini
instead  
No usable SEQRES records for final.cif (#1) chain AB; guessing termini instead  
No usable SEQRES records for final.cif (#1) chain AC; guessing termini instead  
No usable SEQRES records for final.cif (#1) chain AD; guessing termini instead  
No usable SEQRES records for final.cif (#1) chain AE; guessing termini instead  
83 messages similar to the above omitted  
Chain-initial residues that are actual N termini: /AA HIS 2, /AB HIS 2, /AC
HIS 2, /AD HIS 2, /AE HIS 2, /AF HIS 2, /AG HIS 2, /AH HIS 2, /AI HIS 2, /AJ
HIS 2, /AK HIS 2, /AL HIS 2, /AM HIS 2, /AN HIS 2, /AO HIS 2, /AP HIS 2, /AQ
HIS 2, /AR HIS 2, /AS HIS 2, /AT HIS 2, /AU HIS 2, /AV HIS 2, /AW HIS 2, /AX
HIS 2, /BA GLY 6, /BB GLY 5, /BC GLY 6, /BD GLY 5, /BE GLY 5, /BF GLY 6, /BG
GLY 6, /BH GLY 6, /BI GLY 5, /BJ GLY 6, /BK GLY 6, /BL GLY 6, /BM GLY 6, /BN
GLY 6, /BO GLY 6, /BP GLY 6, /BQ GLY 6, /BR GLY 6, /BS GLY 6, /BT GLY 6, /BU
GLY 6, /BV GLY 5, /BW GLY 5, /BX GLY 6, /C ALA 15, /H1 MET 1, /H2 SER 1, /L
ALA 1, /M MET 1, /UA UNK 2, /UB UNK 18, /UC UNK 1, /aa HIS 2, /ab MET 1, /ac
MET 1, /ad MET 1, /ae MET 1, /af MET 1, /ag MET 1, /ah MET 1, /ai MET 1, /aj
MET 1, /ak MET 1, /al MET 1, /am MET 1, /an MET 1, /ao MET 1, /ap MET 1, /ba
GLY 6, /bb GLY 6, /bc GLY 6, /bd GLY 6, /be GLY 6, /bf GLY 6, /bg GLY 6, /bh
GLY 6, /bi GLY 6, /bj GLY 6, /bk GLY 6, /bl GLY 6, /bm GLY 6, /bn GLY 5, /bo
GLY 6, /bp GLY 6  
Chain-initial residues that are not actual N termini:  
Chain-final residues that are actual C termini: /BA PHE 44, /BB PHE 44, /BC
PHE 44, /BD PHE 44, /BE PHE 44, /BF PHE 44, /BG PHE 44, /BH PHE 44, /BI PHE
44, /BJ PHE 44, /BK PHE 44, /BL PHE 44, /BM PHE 44, /BN PHE 44, /BO PHE 44,
/BP PHE 44, /BQ PHE 44, /BR PHE 44, /BS PHE 44, /BT PHE 44, /BU PHE 44, /BV
PHE 44, /BW PHE 44, /BX PHE 44, /C ARG 302, /H2 ILE 181, /L LYS 273, /ba PHE
44, /bb PHE 44, /bc PHE 44, /bd PHE 44, /be PHE 44, /bf PHE 44, /bg PHE 44,
/bh PHE 44, /bi PHE 44, /bj PHE 44, /bk PHE 44, /bl PHE 44, /bm PHE 44, /bn
PHE 44, /bo PHE 44, /bp PHE 44  
Chain-final residues that are not actual C termini: /AA TYR 46, /AB TYR 46,
/AC TYR 46, /AD TYR 46, /AE TYR 46, /AF TYR 46, /AG TYR 46, /AH TYR 46, /AI
TYR 46, /AJ TYR 46, /AK TYR 46, /AL TYR 46, /AM TYR 46, /AN TYR 46, /AO TYR
46, /AP TYR 46, /AQ TYR 46, /AR TYR 46, /AS TYR 46, /AT TYR 46, /AU TYR 46,
/AV TYR 46, /AW TYR 46, /AX TYR 46, /H1 LYS 53, /M TYR 324, /UA UNK 33, /UB
UNK 17, /UC UNK 14, /aa ALA 60, /ab ALA 60, /ac ALA 60, /ad ALA 60, /ae ALA
60, /af ALA 60, /ag ALA 60, /ah ALA 60, /ai ALA 60, /aj ALA 60, /ak ALA 60,
/al ALA 60, /am ALA 60, /an ALA 60, /ao ALA 60, /ap ALA 60  
4988 hydrogen bonds  
/AA TYR 46 is not terminus, removing H atom from 'C'  
/AB TYR 46 is not terminus, removing H atom from 'C'  
/AC TYR 46 is not terminus, removing H atom from 'C'  
/AD TYR 46 is not terminus, removing H atom from 'C'  
/AE TYR 46 is not terminus, removing H atom from 'C'  
40 messages similar to the above omitted  
46120 hydrogens added  
  

> open /run/media/tic20/storage/structure_dump/pu_qian/gprclh1-338aleft.mrc

Opened gprclh1-338aleft.mrc, grid size 300,300,300, pixel 1.05, shown at level
0.0201, step 2, values float32  

> volume gaussian #2 bfactor 40

> clipper associate #2,3 toModel #1

Chain information for final.cif  
---  
Chain | Description  
1.2/AA 1.2/AB 1.2/AC 1.2/AD 1.2/AE 1.2/AF 1.2/AG 1.2/AH 1.2/AI 1.2/AJ 1.2/AK
1.2/AL 1.2/AM 1.2/AN 1.2/AO 1.2/AP 1.2/AQ 1.2/AR 1.2/AS 1.2/AT 1.2/AU 1.2/AV
1.2/AW 1.2/AX | ?  
1.2/BA 1.2/BC 1.2/BF 1.2/BG 1.2/BH 1.2/BJ 1.2/BK 1.2/BL 1.2/BM 1.2/BN 1.2/BO
1.2/BP 1.2/BQ 1.2/BR 1.2/BS 1.2/BT 1.2/BU 1.2/BX 1.2/ba 1.2/bb 1.2/bc 1.2/bd
1.2/be 1.2/bf 1.2/bg 1.2/bh 1.2/bi 1.2/bj 1.2/bk 1.2/bl 1.2/bm 1.2/bo 1.2/bp |
?  
1.2/BB 1.2/BD 1.2/BE 1.2/BI 1.2/BV 1.2/BW 1.2/bn | ?  
1.2/C | ?  
1.2/H1 | ?  
1.2/H2 | ?  
1.2/L | ?  
1.2/M | ?  
1.2/UA | ?  
1.2/UB | ?  
1.2/UC | ?  
1.2/aa | ?  
1.2/ab 1.2/ac 1.2/ad 1.2/ae 1.2/af 1.2/ag 1.2/ah 1.2/ai 1.2/aj 1.2/ak 1.2/al
1.2/am 1.2/an 1.2/ao 1.2/ap | ?  
  

> isolde start

> set selectionWidth 4

Done loading forcefield  

> ui tool show Shell

/opt/UCSF/ChimeraX/lib/python3.7/site-packages/IPython/core/history.py:226:
UserWarning: IPython History requires SQLite, your history will not be saved  
warn("IPython History requires SQLite, your history will not be saved")  

Failed to add
/run/media/tic20/storage/structure_dump/pu_qian/new_2020_05/GP1.xml: Residue
template USER_GP1 with the same override level 0 already exists.  

Failed to add
/run/media/tic20/storage/structure_dump/pu_qian/new_2020_05/GPC.xml: Residue
template USER_GPC with the same override level 0 already exists.  

> set bgColor white

> show sel

> delete sel

> clipper set contourSensitivity 0.25

> select clear

> select /UA

226 atoms, 225 bonds, 1 model selected  

> view sel

> ui tool show "Blast Protein"

> blastprotein /ae database pdb cutoff 1e-3 matrix BLOSUM62 maxSeqs 100 name bp1

Web Service: BlastProtein2 is a Python wrapper that calls blastp to search nr
or pdb for sequences similar to the given protein sequence  
Opal service URL:
http://webservices.rbvi.ucsf.edu/opal2/services/BlastProtein2Service  
Opal job id: appBlastProtein2Service15950736967141427397413  
Opal status URL prefix:
http://webservices.rbvi.ucsf.edu/appBlastProtein2Service15950736967141427397413  
stdout.txt = standard output  
stderr.txt = standard error  
BlastProtein finished.  

> open 1xrd

Summary of feedback from opening 1xrd fetched from pdb  
---  
warning | Atom H1 is not in the residue template for MET /A:1  
note | Fetching compressed mmCIF 1xrd from
http://files.rcsb.org/download/1xrd.cif  
  
1xrd title:  
Light-Harvesting Complex 1 Alfa Subunit from Wild-Type Rhodospirillum rubrum
[more info...]  
  
Chain information for 1xrd  
---  
Chain | Description  
2.1/A 2.2/A 2.3/A 2.4/A 2.5/A 2.6/A 2.7/A 2.8/A 2.9/A 2.10/A | Light-
harvesting protein B-880, α chain  
  

> matchmaker #2/A to #1/ae

Parameters  
---  
Chain pairing | bb  
Alignment algorithm | Needleman-Wunsch  
Similarity matrix | BLOSUM-62  
SS fraction | 0.3  
Gap open (HH/SS/other) | 18/18/6  
Gap extend | 1  
SS matrix |  
  | H | S | O  
---|---|---|---  
H | 6 | -9 | -6  
S |  | 6 | -6  
O |  |  | 4  
Iteration cutoff | 2  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.1), sequence
alignment score = 166.3  
RMSD between 21 pruned atom pairs is 0.910 angstroms; (across all 52 pairs:
9.822)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.2), sequence
alignment score = 169.3  
RMSD between 26 pruned atom pairs is 1.041 angstroms; (across all 52 pairs:
8.892)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.3), sequence
alignment score = 169.3  
RMSD between 23 pruned atom pairs is 0.901 angstroms; (across all 52 pairs:
8.994)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.4), sequence
alignment score = 160.8  
RMSD between 21 pruned atom pairs is 0.966 angstroms; (across all 52 pairs:
9.853)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.5), sequence
alignment score = 159.7  
RMSD between 22 pruned atom pairs is 1.120 angstroms; (across all 52 pairs:
9.871)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.6), sequence
alignment score = 157.8  
RMSD between 27 pruned atom pairs is 0.862 angstroms; (across all 52 pairs:
8.182)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.7), sequence
alignment score = 162.7  
RMSD between 22 pruned atom pairs is 1.030 angstroms; (across all 52 pairs:
9.215)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.8), sequence
alignment score = 157.3  
RMSD between 20 pruned atom pairs is 1.037 angstroms; (across all 52 pairs:
8.139)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.9), sequence
alignment score = 165.7  
RMSD between 23 pruned atom pairs is 0.988 angstroms; (across all 52 pairs:
10.126)  
  
Matchmaker final.cif, chain ae (#1.2) with 1xrd, chain A (#2.10), sequence
alignment score = 151.2  
RMSD between 25 pruned atom pairs is 0.983 angstroms; (across all 52 pairs:
7.352)  
  

> close #2

> open 4jc9

Summary of feedback from opening 4jc9 fetched from pdb  
---  
warning | PDB entry 4JC9 has been replaced by 4V9G  
notes | Fetching compressed mmCIF 4jc9 from
http://files.rcsb.org/download/4jc9.cif  
Fetching CCD SPO from http://ligand-expo.rcsb.org/reports/S/SPO/SPO.cif  
  
4jc9 title:  
RC-LH1-PufX dimer complex from Rhodobacter sphaeroides [more info...]  
  
Chain information for 4jc9 #2  
---  
Chain | Description  
1 2 3 5 7 D F J N P T V X Z | Light-harvesting protein B-875 α chain  
4 6 8 9 E G I K O Q S U W Y | Light-harvesting protein B-875 β chain  
B | Intrinsic membrane protein PufX  
H | Reaction center protein H chain  
L | Reaction center protein L chain  
M | Reaction center protein M chain  
  
Non-standard residues in 4jc9 #2  
---  
BCL — bacteriochlorophyll A  
BPH — bacteriopheophytin A  
FE2 — Fe (II) ion  
PO4 — phosphate ion  
SPO — spheroidene  
U10 — ubiquinone-10 (Coenzyme Q10)  
  

> matchmaker #2/1 to #1/ae

Parameters  
---  
Chain pairing | bb  
Alignment algorithm | Needleman-Wunsch  
Similarity matrix | BLOSUM-62  
SS fraction | 0.3  
Gap open (HH/SS/other) | 18/18/6  
Gap extend | 1  
SS matrix |  
  | H | S | O  
---|---|---|---  
H | 6 | -9 | -6  
S |  | 6 | -6  
O |  |  | 4  
Iteration cutoff | 2  
  
Matchmaker final.cif, chain ae (#1.2) with 4jc9, chain 1 (#2), sequence
alignment score = 57.4  
RMSD between 24 pruned atom pairs is 1.059 angstroms; (across all 42 pairs:
4.207)  
  

> close #2




OpenGL version: 3.3.0 NVIDIA 450.51.05
OpenGL renderer: TITAN Xp/PCIe/SSE2
OpenGL vendor: NVIDIA Corporation
Manufacturer: Dell Inc.
Model: Precision T5600
OS: CentOS Linux 7 Core
Architecture: 64bit ELF
CPU: 32 Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Cache Size: 20480 KB
Memory:
	              total        used        free      shared  buff/cache   available
	Mem:            62G        8.1G         45G        238M        9.0G         53G
	Swap:          4.9G          0B        4.9G

Graphics:
	03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)	
	Subsystem: NVIDIA Corporation Device [10de:11df]	
	Kernel driver in use: nvidia
PyQt version: 5.12.3
Compiled Qt version: 5.12.4
Runtime Qt version: 5.12.8

Change History (8)

comment:1 by pett, 5 years ago

Cc: Conrad added
Component: Unassigned → Sequence
Owner: set to pett
Platform: all
Project: ChimeraX
Status: new → accepted
Summary: ChimeraX bug report submission → Obsolete Blast hits

comment:2 by pett, 5 years ago

Priority: normal → minor

Hi Tristan,

Well, they're not our databases per se; we simply copy them from the NCBI on a weekly basis. Reading through that database, finding which entries are obsolete, and either deleting or updating them would be quite a task. I agree it would be good to do in theory, but it's unlikely to happen. I will try complaining to the NCBI at some point.

--Eric

comment:3 by Tristan Croll, 5 years ago

Well... that might explain a bit. Use of obsolete entries as templates was quite an issue in the last CASP round. I thought it was mostly people using (and not properly updating) their own bespoke databases - and that’s certainly true for some of the servers. But if the official NCBI database is also at fault... well, crap.


comment:4 by Tristan Croll, 5 years ago

If you were feeling enthusiastic, the up-to-date list of obsolete entries (and their replacements, if any) is kept at http://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat. Rather than filtering them out of the database, it might be easier to just filter the search results against that list before formatting for display?
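
A minimal sketch of that filtering idea, in Python. It is not how the Blast Protein tool is implemented. It assumes that each record in obsolete.dat is a whitespace-separated line of the form `OBSLTE <date> <obsolete ID> [<replacement ID> ...]`, preceded by a header line, and that hits arrive as plain four-character PDB IDs. The function names `parse_obsolete` and `split_hits` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_obsolete(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each obsolete PDB ID to the IDs that replaced it (possibly none)."""
    obsolete: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        # Assumed layout: OBSLTE <date> <obsolete ID> [<replacement ID> ...]
        if len(fields) < 3 or fields[0] != "OBSLTE":
            continue  # header line or anything unexpected
        obsolete[fields[2].upper()] = [f.upper() for f in fields[3:]]
    return obsolete


def split_hits(
    hit_ids: Iterable[str], obsolete: Dict[str, List[str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (current hits, {obsolete hit: replacements}), preserving hit order."""
    current: List[str] = []
    flagged: Dict[str, List[str]] = {}
    for hit in hit_ids:
        key = hit.upper()
        if key in obsolete:
            flagged[key] = obsolete[key]
        else:
            current.append(hit)
    return current, flagged


# Illustrative input only: the header text and date field are placeholders,
# and the 4JC9 -> 4V9G pairing is the one reported in the log above.
sample = [
    " LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS",
    "OBSLTE    DD-MMM-YY 4JC9     4V9G",
]
current, flagged = split_hits(["1xrd", "4jc9"], parse_obsolete(sample))
print(current)  # ['1xrd']
print(flagged)  # {'4JC9': ['4V9G']}
```

Keeping the replacement IDs alongside the flagged hits would let a results table point the user to the superseding entry instead of silently dropping the hit.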


comment:5 by pett, 5 years ago

Good to know. Maybe.

comment:6 by pett, 5 years ago

I have contacted NLM about the obsolete entries.

Case #: CAS-595013-C6T1M0 

comment:7 by pett, 5 years ago

The response from NLM:

Hello,

Thank you for the notice. I'll notify the blast developers and Structure group and we'll see what we can do.

Best regards,
Wayne

-=-=-=-=-=-=-=-=-=-=-=-
Wayne Matten, PhD
NCBI Customer Services
NCBI | NLM | NIH

comment:8 by pett, 5 years ago

Resolution: fixed
Status: accepted → closed

It looks like NLM has removed the obsolete structures from the blast database.
