Opened 3 years ago

Closed 3 years ago

#7561 closed enhancement (fixed)

Make ChimeraX fetch 8 character PDB identifiers

Reported by: Tom Goddard
Owned by: pett
Priority: moderate
Milestone:
Component: Input/Output
Version:
Keywords:
Cc: Greg Couch
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

The PDB announced they will start using 8-character PDB identifiers. These identifiers are already in all PDB mmCIF files from August 2021 on. Once they run out of 4-character identifiers, new entries will only be available under 8-character identifiers, and they say those entries will only be provided in mmCIF format.

https://www.wwpdb.org/news/news?year=2022#630fee4cebdf34532a949c34

Our current fetch command "open 1a0m" looks specifically for 4-character identifiers and will need to handle 8-character identifiers too. It should probably also check that the identifier is alphanumeric with no "." to avoid mistaking 8-character filenames for identifiers. The code may also need a change to the fetch URL, since the PDB seems to be putting "pdb_" in front of the 8-character identifier in most places.
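A minimal sketch of the identifier check described above, not the ChimeraX implementation. The function name `looks_like_pdb_id` is hypothetical, and accepting the "pdb_" prefix only on the 8-character form is an assumption.

```python
import re

# Assumption: an identifier is either 4 alphanumeric characters, or
# 8 alphanumeric characters optionally preceded by "pdb_".
# A "." never matches, so names like "data.pdb" are rejected.
_PDB_ID = re.compile(r"(?:pdb_)?[0-9a-z]{8}|[0-9a-z]{4}", re.IGNORECASE)

def looks_like_pdb_id(text):
    """Return True if text has the shape of a 4- or 8-character PDB ID."""
    return _PDB_ID.fullmatch(text) is not None

print(looks_like_pdb_id("1a0m"))          # 4-character ID
print(looks_like_pdb_id("pdb_00001a0m"))  # prefixed 8-character ID
print(looks_like_pdb_id("data.pdb"))      # 8 characters, but contains "."
```

An 8-character filename with no extension would still pass this check, so the caller would probably want to test for an existing file first.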

Attachments (2)

pdb_00017fgz-extended_PDB_CCD_codes-model.cif (280.5 KB ) - added by pett 3 years ago.
pdb_00017xsv-extended_PDB_CCD_codes-model.cif (453.5 KB ) - added by pett 3 years ago.


Change History (8)

comment:1 by pett, 3 years ago

Status: assigned → accepted

comment:2 by pett, 3 years ago

Status: accepted → feedback

Well, I have added support for 8-character IDs (https://github.com/RBVI/ChimeraX/commit/91db3471cdaf07fa1f11788fc9edab06d670e9db) but am mystified as to how to test it, since there are no actual 8-character ID entries. The site we currently fetch from (http://files.rcsb.org/download/%s.cif) works for 4-character IDs but not for the equivalent 8-character IDs (e.g. with "0000" in front), even with "pdb_" prepended. Using the DOI brings you to the entry's page; it does not download a file.

Open to testing ideas. Or should I prod the PDB somehow?
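For reference, the three name forms tried above can be generated as in the sketch below. Only the URL template comes from this comment; the helper names are hypothetical, and the code builds the URLs without fetching anything.

```python
# URL template quoted in this comment; everything else is illustrative.
FETCH_URL_TEMPLATE = "http://files.rcsb.org/download/%s.cif"

def candidate_fetch_names(pdb_id):
    """Return the name forms tried for a 4-character ID (hypothetical helper)."""
    if len(pdb_id) != 4:
        raise ValueError("expected a 4-character ID, got %r" % pdb_id)
    padded = "0000" + pdb_id  # the "equivalent" 8-character ID
    return [pdb_id, padded, "pdb_" + padded]

def candidate_fetch_urls(pdb_id):
    """Fill the fetch template with each candidate name."""
    return [FETCH_URL_TEMPLATE % name for name in candidate_fetch_names(pdb_id)]

for url in candidate_fetch_urls("1a0m"):
    print(url)
```

At the time of this comment only the first of these URLs returned a file.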


comment:3 by Tom Goddard, 3 years ago

I guess we need to ask the PDB whether they have files online with names like pdb_00001xyz.cif that would allow a fetch with the 8-character ID. I didn't see any online, so they probably don't exist yet. You should probably ask the PDB whether those files are available and tell them we want to test ChimeraX fetching them.

comment:4 by pett, 3 years ago

Status: feedback → accepted

Have shot off mail to the PDB asking about how to test 8-character IDs.

comment:5 by pett, 3 years ago

Cc: Greg Couch added

The RCSB has pledged to support 8-character-ID downloads. I tried the example entries referenced in the link in the ticket description, and ChimeraX fails to read them with "styling lost" a few thousand lines into the file, at the point where the 5-character CCD entry is encountered in the ATOM/HETATM records. I don't know if the actual final form of these entries will behave the same way or if they will allow 5 columns for the residue name throughout. For convenience, I have attached the example files to this ticket.


comment:6 by pett, 3 years ago

Resolution: fixed
Status: accepted → closed

I think there is already another ticket for lost styling. Can't do anything more on 8-character IDs until the RCSB provides a way to fetch them.
