Opened 5 months ago

Last modified 5 months ago

#17792 accepted defect

Edge case PDB file freezes ChimeraX

Reported by: Roden Deng Luo
Owned by: Eric Pettersen
Priority: normal
Milestone:
Component: Input/Output
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Hi,

I downloaded the attached file from https://zenodo.org/records/14271435.
Trying to open it just freezes ChimeraX version 1.9 (2024-12-11) on Ubuntu.
In debug mode in PyCharm, I could not pause the program to see the
potential problem. The major difference between it and the official 1KBH is
that all the models/states are concatenated and there is no header or
metadata. An error message would be good.

Best,
Roden



Attachments (3)

difference_summary.txt (1.5 KB ) - added by Roden Deng Luo 5 months ago.
Added by email2trac
SKEMPI_v2.0_1KBH.pdb (2.6 MB ) - added by Roden Deng Luo 5 months ago.
Added by email2trac
filter_pdbs.py (1.4 KB ) - added by Roden Deng Luo 5 months ago.
Added by email2trac

Change History (5)

by Roden Deng Luo, 5 months ago

Attachment: difference_summary.txt added

Added by email2trac

by Roden Deng Luo, 5 months ago

Attachment: SKEMPI_v2.0_1KBH.pdb added

Added by email2trac

comment:1 by Eric Pettersen, 5 months ago

Component: Unassigned → Input/Output
Owner: set to Eric Pettersen
Platform: all
Project: ChimeraX
Status: new → accepted

Why would they "flatten" a file like this? It makes no sense. It saves a trivial amount of storage space relative to that taken by the coordinate records.

comment:2 by Roden Deng Luo, 5 months ago

My guess is this may be some kind of "common" practice in the deep learning
(DL) world for protein structures.

I have seen quite a few single-state cases like this before (not causing
any problems), from different DL projects. For this specific case, the direct
reference, PPB-Affinity (https://www.nature.com/articles/s41597-024-03997-4),
reports a curated dataset, potentially intended for DL use. The previously
attached data point originated from SKEMPI 2.0
(https://academic.oup.com/bioinformatics/article/35/3/462/5055583). That paper
has a "Processed PDB files" section, but it is not clear to me why the
intentions mentioned there would result in such an end file.

I used the attached Python script to detect multiple occurrences of
chain/residue-number pairs; if a file has any, I drop that data point from my
current pipeline. There are about 200 such cases in PPB-Affinity.
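
For readers without the attachment, here is a minimal sketch of this kind of check. It is not the attached filter_pdbs.py; the function name and the exact duplicate criterion are assumptions made for illustration. It assumes standard fixed-column ATOM/HETATM records and treats a file as "flattened" when the same chain, residue number, insertion code, atom name, and alternate-location combination appears more than once without an intervening MODEL record.

```python
from collections import Counter


def find_repeated_atoms(lines):
    """Return atom keys seen more than once within a single model.

    Hypothetical helper, not the attached script. Keys are
    (chain ID, residue number, insertion code, atom name, altLoc),
    read from the fixed PDB columns of ATOM/HETATM records.
    """
    seen = Counter()
    repeated = set()
    for raw in lines:
        # Pad short lines so the fixed-column slices below never fail.
        line = raw.rstrip("\n").ljust(27)
        record = line[:6].strip()
        if record == "MODEL":
            # A properly delimited multi-model file repeats atoms across
            # models by design, so start counting afresh for each model.
            seen.clear()
            continue
        if record not in ("ATOM", "HETATM"):
            continue
        key = (
            line[21],              # chain ID (column 22)
            line[22:26].strip(),   # residue sequence number (columns 23-26)
            line[26],              # insertion code (column 27)
            line[12:16].strip(),   # atom name (columns 13-16)
            line[16],              # alternate location indicator (column 17)
        )
        seen[key] += 1
        if seen[key] > 1:
            repeated.add(key)
    return repeated
```

Calling it as `find_repeated_atoms(open(path))` and discarding any file for which the result is non-empty would mirror the filtering step described above, under the stated assumptions.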

On Tue, May 27, 2025 at 10:08 PM ChimeraX <ChimeraX-bugs-admin@cgl.ucsf.edu>
wrote:



by Roden Deng Luo, 5 months ago

Attachment: filter_pdbs.py added

Added by email2trac
