Opened 4 years ago
Closed 3 years ago
#6067 closed defect (fixed)
Update RCSB biounit fetch to new URLs if needed
Reported by: | Tom Goddard | Owned by: | Tom Goddard |
---|---|---|---|
Priority: | moderate | Milestone: | 1.4 |
Component: | Core | Version: | |
Keywords: | | Cc: | pett |
Blocked By: | | Blocking: | |
Notify when closed: | | Platform: | all |
Project: | ChimeraX | | |
Description
The RCSB is updating their mmCIF assembly files and changing the ftp site for fetching them in May 2022. The assembly files will also contain multiple chains instead of multiple models.
We need to verify that ChimeraX (and Chimera) will still work with the URL changes.
We need to make sure the NIAID scripts that use fetched biounit files still work.
Begin forwarded message:
From: Jasmine Young <jasmin@…>
Subject: pdb-l: Distributing PDBx/mmCIF Formatted Assembly Files
Date: February 2, 2022 at 6:59:30 AM PST
To: pdb-l@…
Dear PDB users,
Starting May 3, 2022, the PDB archive will distribute assembly files in PDBx/mmCIF format, allowing direct access and visualization of the curated assemblies for all PDB entries.
Currently, PDBx/mmCIF formatted assembly files are provided for structures that are non-PDB compliant; however, the coordinates use model numbers to differentiate alternate symmetry copies of PDB chain IDs. This method is not ideal, nor necessary, for the current archive PDBx/mmCIF format and has led to limited use of these files in community software tools. In response to this issue and recommendations by the wwPDB advisory committee, we are implementing updated, standardized practices for generation of assembly files for all PDB entries.
These updated PDBx/mmCIF format assembly files will have improved organization of assembly data to support usage by the community. These files will include all symmetry generated copies of each chain within a single model, with distinct chain IDs (_atom_site.auth_asym_id and _atom_site.label_asym_id) assigned to each. Generation of distinct chain IDs in assembly files is based upon the following rules:
# Chain IDs of the original chains from the atomic coordinate file will be retained (e.g., A)
# Assign unique chain ID (atom_site.label_asym_id and atom_site.auth_asym_id) for each symmetry copy within a single model. Rules of chain ID assignments:
- The applied index of the symmetry operator (pdbx_struct_oper_list.id) will be appended to the original chain ID separated by a dash (e.g., A-2, A-3, etc.)
- If more than one type of symmetry operator is applied to generate a symmetry copy, a dash will be used between the operators (e.g., A-12-60, A-60-88, etc.)
In addition, entity ID and chain ID mapping categories will be provided: _pdbx_entity_remapping and _pdbx_chain_remapping.
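The chain ID rules above can be sketched in a few lines of Python. This is only an illustration of the naming scheme as described in the announcement, not wwPDB code; the function name `assembly_chain_id` and the convention that an empty operator list means "the original chain" are assumptions made for the example.

```python
def assembly_chain_id(original_chain_id, operator_ids=()):
    """Build a chain ID for an assembly file following the announced rules.

    original_chain_id: chain ID from the atomic coordinate file, e.g. "A".
    operator_ids: indices of the applied symmetry operators, in order.
        An empty sequence is treated here (by assumption) as the
        original, untransformed chain, which keeps its ID unchanged.
    """
    parts = [original_chain_id] + [str(op) for op in operator_ids]
    # Operator indices are appended to the original ID, separated by dashes.
    return "-".join(parts)


print(assembly_chain_id("A"))            # A
print(assembly_chain_id("A", [2]))       # A-2
print(assembly_chain_id("A", [12, 60]))  # A-12-60
```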
A new directory (ftp.wwpdb.org/pub/pdb/data/assemblies/mmCIF/) will be created for the distribution of these updated assembly files. The directory containing the existing assembly mmCIF files for large entries will be removed (ftp.wwpdb.org/pub/pdb/data/biounit/mmCIF/ <https://ftp.wwpdb.org/pub/pdb/data/biounit/mmCIF/>).
wwPDB asks all PDB users and software developers to review code and address any limitations related to PDB assemblies. Sample files are made available for testing purposes and to support community adoption at GitHub.com/wwpdb/assembly-mmcif-examples (https://github.com/wwpdb/assembly-mmcif-examples).
If you plan to use these assembly files for graphical viewing, check if your visualization software (e.g., PyMol, ChimeraX, etc.) supports instantiation of assemblies directly from atomic coordinate files (_struct_assembly related categories). If it does, do so for improved efficiency.
For any further information please email info@….
--
Regards,
Jasmine
===========================================================
Jasmine Young, Ph.D.
Biocuration Team Lead
RCSB Protein Data Bank
Research Professor
Institute for Quantitative Biomedicine
Rutgers, The State University of New Jersey
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
Email:jasmine@rcsb.rutgers.edu
Phone: (848)445-0103 ext 4920
Fax: (732)445-4320
===========================================================
Change History (11)
comment:2 by , 4 years ago
I tried opening the new assembly file 4sbv from https://github.com/wwpdb/assembly-mmcif-examples. It displays correctly, but the chain identifiers are A, A-2, A-3, ..., A-60, and identifiers containing "-" do not work in ChimeraX commands: "select /A-10" selects nothing, probably because the "-" is interpreted as a range from chain A through chain 10. I was able to use "sel chain_id=A-10" and also "sel /A*" to select the specified chains. It would have been better if the PDB had used underscores, but it is probably too late to complain.
comment:3 by , 4 years ago
Ouch. I wonder how many other programs will break on these fun new arbitrary length chain IDs.
follow-up: 4 comment:4 by , 4 years ago
It can also happen that there is more than one set of symmetry operators, which is said to produce chain identifiers like A-3-25, so they can get longish. They use the original PDB chain IDs as the prefix, so for a model with chain 1 we may also see chain IDs like 1-2, 1-3, 1-4, 1-27. At any rate, if you know the tricks you can still get by in ChimeraX with these files.
But some things in addition to hand-typed atom specs are broken. For instance, menu Select / Chains / A-10 does not select anything; it produces the command "select #2/A-10". Other GUI-generated commands are likely to fail unless we spruce up the Python code that produces atom specifier strings so it looks for "-" in chain identifiers. The mmCIF spec seems to allow not only "-" but also "/", ":", ",", "&", "|", ">", ... basically any punctuation in the chain identifier. If anyone starts using these in mmCIF files, ChimeraX is going to be pretty upset about it.
https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_asym_id.html
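As a rough illustration of the kind of check that spec-generating code would need, the sketch below flags chain IDs containing anything other than ASCII letters and digits. It is not the ChimeraX implementation; the function name `unsafe_spec_characters` is hypothetical, and treating only letters and digits as safe is a deliberately conservative assumption.

```python
def unsafe_spec_characters(chain_id):
    """Return the distinct characters in chain_id that may confuse an atom spec.

    Assumption: only ASCII letters and digits are safe to place directly
    after "/" in a specifier; everything else (such as "-") is reported.
    """
    flagged = {c for c in chain_id if not (c.isascii() and c.isalnum())}
    return sorted(flagged)


for cid in ["A", "A-10", "A-12-60", "B|2"]:
    bad = unsafe_spec_characters(cid)
    status = "ok" if not bad else "needs special handling: " + " ".join(bad)
    print(cid, status)
```

A caller that builds selection commands could use a non-empty result as the signal to switch to whatever escaping or attribute-based form the command parser supports.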
comment:5 by , 4 years ago
Here's the PDB announcement from Feb 8, 2022 about the new assembly files.
comment:6 by , 4 years ago
I asked the PDB by email if the URLs used by ChimeraX and Chimera will continue to work and whether they will deliver different files after May 3, 2022:
From: Tom Goddard
Subject: New PDB assembly files
Date: April 19, 2022 at 6:15:55 PM PDT
To: info@wwpdb.org
I have a few questions about the new PDB assembly files that are to be released May 3, 2022 as announced by the PDB here
https://www.rcsb.org/news/feature/61f839908f40f9265109d399
I develop ChimeraX which can directly fetch the assembly files and want to make sure this will continue to work after May 3. We currently fetch those files using the following URLs.
https://files.rcsb.org/download/7rqe-assembly1.cif
https://www.ebi.ac.uk/pdbe/static/entry/download/4sbv-assembly-1.cif.gz
Will these URLs continue to work? And will they give the new files after May 3, 2022 where an assembly consists of one model with additional chains instead of the old style with multiple models?
Also our older Chimera program that is still heavily used fetches PDB assemblies in PDB format using this URL
ftp://fttp.wwwpdb.org/pub/pdb/data/biounit/coordinates/all/2bbv.pdb1.gz
Will that URL continue to work? Will it continue to provide the old multimodel PDB format assembly files for entries where it is possible?
Thanks!
Tom Goddard
ChimeraX developer
comment:7 by , 4 years ago
Pushed changes so that automatically generated atom specs are more bulletproof against these kinds of chain IDs. The Select menu and the chain table now generate selection commands that work.
comment:8 by , 3 years ago
PDBe reports that the pdbe_bio fetch URL will continue to work and will be updated to use the new files, but at a later date because they want to include additional metadata.
From: "David Armstrong via RT" <pdbehelp@ebi.ac.uk>
Subject: [PDBe #580837] New bioassembly files
Date: April 25, 2022 at 3:50:00 AM PDT
To: goddard@sonic.net
Reply-To: pdbehelp@ebi.ac.uk
<URL: http://helpdesk.ebi.ac.uk/Ticket/Display.html?id=580837 >
Dear Tom,
Thank you for reaching out to us regarding the assembly files we will soon be providing through the wwPDB FTP.
Firstly, I would just like to confirm that these new assembly files generated from the wwPDB are based upon the assembly files we already generate at PDBe and the ones that you are accessing. Neither of these files are multi-model - like our existing files these are single models in mmCIF format. There are examples of these new assembly files in github: https://github.com/wwpdb/assembly-mmcif-examples.
In terms of us replacing our PDBe assembly files with the new wwPDB assembly files, we will do this, however not immediately. We are looking to derive some additional assembly specific data based on the wwPDB files and make it accessible via our API, so it will take some time to finalise this pipeline so that we can replace the existing PDBe assembly files.
Hopefully, due to the reasoning in my comments above, this should not affect your usage of these files anyway, but please do let us know if you have any specific concerns.
Kind Regards,
David Armstrong
comment:9 by , 3 years ago
RCSB reports that the rcsb_bio fetch URL will continue to work and will switch over to delivering the new assembly files.
From: Rachel Kramer Green <rachel.green@rcsb.org>
Subject: Re: New PDB assembly files (help-18605)
Date: April 22, 2022 at 11:59:52 AM PDT
To: Tom Goddard <goddard@sonic.net>
Cc: Info <info@rcsb.org>
Reply-To: Info <info@rcsb.org>
Dear Tom,
https://files.rcsb.org/download/7rqe-assembly1.cif
This URL will continue to work and will deliver assembly files in a new format. The new format will incorporate all of the symmetry generated copies within a single model assigning distinct chain IDs (_atom_site.auth_asym_id and _atom_site.label_asym_id) to each copy.
https://www.ebi.ac.uk/pdbe/static/entry/download/4sbv-assembly-1.cif.gz
Please contact PDBe pdbdep@ebi.ac.uk to ask them about this link.
ftp://fttp.wwwpdb.org/pub/pdb/data/biounit/coordinates/all/2bbv.pdb1.gz
This link or its content will remain unchanged.
Please let us know if you have any additional questions.
Best regards,
Rachel
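For reference, the two mmCIF assembly URL patterns quoted in this ticket can be written as small Python helpers. This is a sketch based only on the example URLs above (7rqe assembly 1 at RCSB, 4sbv assembly 1 at PDBe); the function names are hypothetical, and lower-casing the PDB ID is an assumption drawn from those examples rather than a documented requirement.

```python
def rcsb_assembly_url(pdb_id, assembly=1):
    """URL pattern matching the RCSB example quoted in this ticket."""
    # Assumption: the entry ID is lower-case, as in the quoted example.
    return f"https://files.rcsb.org/download/{pdb_id.lower()}-assembly{assembly}.cif"


def pdbe_assembly_url(pdb_id, assembly=1):
    """URL pattern matching the PDBe example quoted in this ticket."""
    # Note the extra dash before the assembly number and the .gz suffix.
    return (
        "https://www.ebi.ac.uk/pdbe/static/entry/download/"
        f"{pdb_id.lower()}-assembly-{assembly}.cif.gz"
    )


print(rcsb_assembly_url("7rqe"))
print(pdbe_assembly_url("4sbv"))
```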
comment:10 by , 3 years ago
Also the previous RCSB email says that the Chimera fetch of assemblies in PDB format will continue to provide the old assembly files.
comment:11 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Looks like the assembly file fetch should continue to work.
Begin forwarded message:
From: Tom Goddard
Subject: Re: pdb-l: Distributing PDBx/mmCIF Formatted Assembly Files
Date: February 2, 2022 at 10:57:21 AM PST
To: Eric Pettersen
We'll have to look into this. ChimeraX fetches biounit mmcif files from the RCSB now but does not use the ftp site they mention in this email (that will be shut down). Instead we use urls like
which are not mentioned in this email. Hopefully those continue to work.
Chimera fetches PDB format biounit files from
and the email also does not say if that is going away.