Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#7564 closed enhancement (not a bug)

Handle 5-character residue names from Protein Databank

Reported by: Tom Goddard Owned by: Greg Couch
Priority: moderate Milestone:
Component: Input/Output Version:
Keywords: Cc: Eric Pettersen
Blocked By: Blocking:
Notify when closed: Platform: all
Project: ChimeraX

Description

The PDB announced they will switch from 3 character to 5 character residue names by 2024, possibly sooner because they will run out of 3 character names.

https://www.wwpdb.org/news/news?year=2022#630fee4cebdf34532a949c34

I'm not sure if any of our code will need fixing for this. I think we handle residue names of any length. But I am not sure whether our Chemical Components Dictionary (CCD) fetching code makes any assumptions about 3 character names. It might also be worth checking whether the Chimera mmCIF reader handles the 5 character residue names.

Change History (3)

comment:1 by Eric Pettersen, 3 years ago

Cc: Eric Pettersen added; Greg Couch removed
Owner: changed from Eric Pettersen to Greg Couch

Residue names are unlimited in length. Reassigning to Greg to check that CCD fetching and mmCIF reading (doesn't it use "fixed format" for speed?) are unaffected by this change.

comment:2 by Greg Couch, 3 years ago

Resolution: not a bug
Status: assigned → closed

The mmCIF reading code uses fixed width columns for speed, but the column widths are determined dynamically from how the data is presented. Residue names are atomstruct::ResName values, so there are no limits on size.
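
For illustration only, here is a minimal Python sketch of the general idea of fixed width parsing with dynamically determined widths. It is not the ChimeraX reader (which is not shown in this ticket); the function names, the sample rows, and the residue name A1XYZ are all made up for the example. Column starts are taken from the first data row, later rows are sliced at those offsets, and a row that does not fit the layout falls back to ordinary whitespace tokenizing.

```python
def column_starts(row):
    """Return the offset at which each whitespace-separated field starts."""
    starts = []
    prev_space = True
    for i, ch in enumerate(row):
        if not ch.isspace() and prev_space:
            starts.append(i)
        prev_space = ch.isspace()
    return starts


def parse_fixed_width(rows):
    """Parse rows by slicing at column offsets taken from the first row.

    Hypothetical helper: the widths are not hard-coded, so a wider
    residue name simply produces a wider column.
    """
    starts = column_starts(rows[0])
    ends = starts[1:] + [None]  # each column runs up to the next start
    table = []
    for row in rows:
        fields = [row[s:e].strip() for s, e in zip(starts, ends)]
        # Crude layout check: every slice must hold exactly one token.
        # Otherwise fall back to the slower whitespace split.
        if any(len(f.split()) != 1 for f in fields):
            fields = row.split()
        table.append(fields)
    return table


# Made-up rows; the writer is assumed to pad columns to the widest value.
rows = [
    "ATOM   1 N  ALA   A 1",
    "HETATM 2 C1 A1XYZ A 2",
    "ATOM 3 O A1XYZ A 3",  # misaligned on purpose, triggers the fallback
]
table = parse_fixed_width(rows)
print([fields[3] for fields in table])
```

Nothing in this approach depends on the residue name being 3 characters wide, which is the point made in the comment above. The layout check is deliberately simple and would not catch every misaligned row.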

in reply to:  3 ; comment:3 by goddard@…, 3 years ago

I'm not sure if the PDB offers test files with 5 character residue names.  It would seem highly desirable to actually test such a file if available.
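
Until a real file is available, one option is a hand-made test file. The sketch below is only an assumption about what such a file could look like: it writes a minimal mmCIF atom_site loop using a made-up 5 character residue name (A1XYZ) and reads the label_comp_id column back as a sanity check. The helper names are hypothetical, and whether a file this minimal is accepted by the ChimeraX mmCIF reader has not been tested here.

```python
ATOM_SITE_COLUMNS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_alt_id",
    "label_comp_id", "label_asym_id", "label_entity_id", "label_seq_id",
    "Cartn_x", "Cartn_y", "Cartn_z", "occupancy", "B_iso_or_equiv",
    "auth_seq_id", "auth_asym_id", "pdbx_PDB_model_num",
]


def make_test_mmcif(resname="A1XYZ"):
    """Return minimal mmCIF text with two HETATM records named resname."""
    atoms = [("C", "C1", 0.0, 0.0, 0.0), ("O", "O1", 1.2, 0.0, 0.0)]
    lines = ["data_test", "#", "loop_"]
    lines += ["_atom_site." + c for c in ATOM_SITE_COLUMNS]
    for i, (elem, name, x, y, z) in enumerate(atoms, start=1):
        lines.append(
            f"HETATM {i} {elem} {name} . {resname} A 1 . "
            f"{x:.3f} {y:.3f} {z:.3f} 1.00 20.00 1 A 1"
        )
    lines.append("#")
    return "\n".join(lines) + "\n"


def comp_ids(text):
    """Read back the label_comp_id value of every atom record."""
    lines = text.splitlines()
    names = [l.split(".", 1)[1] for l in lines if l.startswith("_atom_site.")]
    col = names.index("label_comp_id")
    return [l.split()[col] for l in lines if l.startswith(("ATOM", "HETATM"))]


text = make_test_mmcif("A1XYZ")
print(comp_ids(text))
```

The returned text can be written to a .cif file and opened by hand to see how the residue name is displayed.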