Opened 3 years ago
Closed 3 years ago
#7561 closed enhancement (fixed)
Make ChimeraX fetch 8 character PDB identifiers
| Reported by: | Tom Goddard | Owned by: | pett |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | Input/Output | Version: | |
| Keywords: | | Cc: | Greg Couch |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
The PDB announced they will start using 8-character PDB identifiers. These identifiers are already in all PDB mmCIF files from August 2021 onward. Once they run out of 4-character identifiers, any new entries will only be available using 8-character identifiers, and they say those entries will only be provided in mmCIF format.
https://www.wwpdb.org/news/news?year=2022#630fee4cebdf34532a949c34
Our current fetch command "open 1a0m" looks specifically for 4-character identifiers and will need to handle 8-character identifiers too. It should probably also check that the identifier is alphanumeric with no "." to avoid mistaking 8-character filenames for identifiers. The code may also need a change to the fetch URL, since the PDB seems to be putting "pdb_" in front of the 8-character identifier in most places.
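A minimal sketch of the check described above, not the actual ChimeraX code. The function name looks_like_pdb_id and both patterns are hypothetical, and treating the "pdb_" prefix as optional and case-insensitive is an assumption:

```python
import re

# Hypothetical patterns: 4 alphanumeric characters (classic ID), or
# 8 alphanumeric characters with an optional "pdb_" prefix (extended ID).
_CLASSIC_ID = re.compile(r"[A-Za-z0-9]{4}")
_EXTENDED_ID = re.compile(r"(?:pdb_)?[A-Za-z0-9]{8}", re.IGNORECASE)


def looks_like_pdb_id(text):
    """Return True if text has the shape of a 4- or 8-character PDB ID.

    Requiring every character to be alphanumeric rejects 8-character
    file names such as "1a0m.cif", which contain a ".".
    """
    return bool(_CLASSIC_ID.fullmatch(text) or _EXTENDED_ID.fullmatch(text))
```

With this check, "1a0m", "00001a0m" and "pdb_00001a0m" are accepted, while "1a0m.cif" is rejected even though it is 8 characters long.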
Attachments (2)
Change History (8)
comment:1 by , 3 years ago
Status: | assigned → accepted |
---|
comment:3 by , 3 years ago
I guess we need to ask the PDB whether they have files online with names like pdb_00001xyz.cif that would allow a fetch with the 8-character ID. I didn't see any online, so they probably don't exist yet. You should probably ask the PDB whether those files are available and tell them we want to test ChimeraX fetching them.
comment:4 by , 3 years ago
Status: | feedback → accepted |
---|
Have shot off mail to the PDB asking about how to test 8-character IDs.
comment:5 by , 3 years ago
Cc: | added |
---|
The RCSB has pledged to support 8-character-ID downloads. I tried the example entries referenced in the link in the ticket description, and ChimeraX fails to read them with "styling lost" a few thousand lines into the file, at the point where the 5-character CCD entry is encountered in the ATOM/HETATM records. I don't know if the actual final form of these entries will behave the same way or if they will allow 5 columns for the residue name throughout. For convenience, I have attached the example files to this ticket.
by , 3 years ago
Attachment: | pdb_00017fgz-extended_PDB_CCD_codes-model.cif added |
---|
by , 3 years ago
Attachment: | pdb_00017xsv-extended_PDB_CCD_codes-model.cif added |
---|
comment:6 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
There is already another ticket for lost styling, I think. Can't do anything more on 8-character IDs until the RCSB provides a way to fetch them.
Well, I have added support for 8-character IDs (https://github.com/RBVI/ChimeraX/commit/91db3471cdaf07fa1f11788fc9edab06d670e9db) but am mystified as to how to test it, since there are no actual 8-character ID entries. Using the site we currently fetch from (http://files.rcsb.org/download/%s.cif) works for 4-character IDs but not the equivalent (e.g. "0000" in front) 8-character IDs, even with "pdb_" prepended. Using the DOI brings you to the entry's page -- it does not download a file.
Open to testing ideas. Or should I prod the PDB somehow?
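For reference, a small sketch of the URL forms tried above, useful for scripting a retest. The helper names extended_id and candidate_urls are hypothetical, and lowercasing the ID and padding with "0000" are assumptions based on the example in this comment. Only the URL template comes from the ticket, and at the time of this comment only the 4-character form actually downloaded:

```python
# URL template currently used for fetching (quoted from the ticket).
URL_TEMPLATE = "http://files.rcsb.org/download/%s.cif"


def extended_id(pdb_id):
    """Return the 8-character form of an ID, without the "pdb_" prefix.

    Assumption: a 4-character ID maps to its 8-character equivalent
    by putting "0000" in front.
    """
    bare = pdb_id.lower()
    if bare.startswith("pdb_"):
        bare = bare[4:]
    if len(bare) == 4:
        return "0000" + bare
    if len(bare) == 8:
        return bare
    raise ValueError("expected a 4- or 8-character PDB ID: %r" % pdb_id)


def candidate_urls(pdb_id):
    """List the download URLs worth trying for an ID, shortest name first."""
    ext = extended_id(pdb_id)
    names = []
    if ext.startswith("0000"):
        names.append(ext[4:])  # classic 4-character form
    names += [ext, "pdb_" + ext]
    return [URL_TEMPLATE % name for name in names]
```

For "1a0m" this yields the 1a0m.cif, 00001a0m.cif and pdb_00001a0m.cif URLs in that order, so a test loop can report which of the three forms the server accepts.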