Opened 20 months ago

Last modified 19 months ago

#14713 accepted defect

DSSP output in ChimeraX

Reported by: r.joosten@…
Owned by: pett
Priority: normal
Milestone:
Component: Structure Analysis
Version:
Keywords:
Cc: Elaine Meng, Tom Goddard, m.hekkelman@…, a.perrakis@…, Greg Couch
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

Dear ChimeraX developers,

We are the current developers of DSSP and during a group meeting the 
DSSP implementation in ChimeraX came up.

From the documentation I gather that you are using a reimplementation 
of the 1983 algorithm. There have been a number of updates to that 
algorithm since: for instance, there was an update for alpha-bulges and, 
more recently, we added detection of polyproline type II helices (a.k.a. 
kappa-helices).

To incorporate the latter, we have made backward compatible changes to 
the DSSP format (see https://pdb-redo.eu/dssp/about#DSSP). At the same 
time we also added direct annotation of mmCIF files (see 
https://github.com/PDB-REDO/dssp/blob/trunk/libdssp/mmcif_pdbx/dssp-extension.dic 
for the dictionary extension). The current version of DSSP is available 
as stand-alone executable, library code and an API (see 
https://pdb-redo.eu/dssp/download). The code is under a BSD 2-clause 
license.

Would you be interested in incorporating the current version of DSSP in 
ChimeraX, reading the mmCIF output and/or executing the algorithm? We 
would be happy to discuss by mail or in a short teleconference.


On a related note, we also develop the PDB-REDO databank and 
webserver/service. It would be great if PDB-REDO entries (i.e. rerefined 
and rebuilt versions of PDB entries, frequently better models than the 
original; see pdb-redo.eu) could be loaded directly into ChimeraX, in 
the same way as PDB and EMDB entries.

We host both models and map coefficients in MTZ format. For example, 
PDB(-REDO) entry '1cbs' is hosted as 
https://pdb-redo.eu/db/1cbs/1cbs_final.cif for the model and 
https://pdb-redo.eu/db/1cbs/1cbs_final.mtz for the electron density maps 
(normal, difference and, if available, anomalous).
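The single 1cbs example above suggests a simple URL pattern for fetching. The sketch below assumes, based only on that one example, that every PDB-REDO entry follows the same `<id>/<id>_final.cif` and `<id>_final.mtz` layout under https://pdb-redo.eu/db/ and that IDs are classic four-character codes written in lowercase. `pdb_redo_urls` is a hypothetical helper for illustration, not part of ChimeraX or PDB-REDO.

```python
# Hypothetical helper: builds PDB-REDO download URLs by generalizing the
# 1cbs example URLs. The assumption that every entry uses the same
# <id>/<id>_final.<ext> layout is ours, inferred from a single example.


def pdb_redo_urls(pdb_id: str) -> dict:
    """Return model (mmCIF) and map-coefficient (MTZ) URLs for an entry."""
    entry = pdb_id.strip().lower()
    # Assumption: only classic 4-character PDB IDs are handled here.
    if len(entry) != 4 or not entry.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    base = f"https://pdb-redo.eu/db/{entry}/{entry}_final"
    return {"model": base + ".cif", "maps": base + ".mtz"}


urls = pdb_redo_urls("1CBS")
print(urls["model"])  # https://pdb-redo.eu/db/1cbs/1cbs_final.cif
print(urls["maps"])   # https://pdb-redo.eu/db/1cbs/1cbs_final.mtz
```

Lowercasing the ID lets users type either case, matching the lowercase form in the example URLs.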

Similar options of fetching PDB-REDO data are available in Coot, 
Moorhen, YASARA and CCP4mg and we believe this could also be very useful 
for ChimeraX users.

Best wishes,
Robbie Joosten


Change History (8)

comment:1 by pett, 20 months ago

Cc: Elaine Meng, Tom Goddard, m.hekkelman@…, a.perrakis@… added
Component: Unassigned → Structure Analysis
Owner: set to pett
Platform: all
Project: ChimeraX
Status: new → accepted

Dear Robbie,

Yes, you are correct that we are using a reimplementation of the 1983 algorithm (namely https://github.com/RBVI/ChimeraX/blob/develop/src/bundles/atomic_lib/atomic_cpp/atomstruct_cpp/CompSS.cpp). Though ChimeraX is capable of logging a detailed account of the helix and strand positions/types the algorithm finds, internally ChimeraX only keeps track of which residues are helix/strand/coil (since that's all it needs for rendering) and not the types of helix/strand/coil.
So the improvements you have made for finding additional secondary structure elements (alpha bulges; kappa helices) would be of interest to us, either through incorporation/translation into our own code, or by using your library implementation [FYI, the API link you sent (https://pdb-redo.eu/dssp/download) produces a "Not found" if you click on the "DSSP API" link on that page]. An issue with the latter approach is that we are currently using C++11 compilers, not C++17. Either way, once incorporated we would add your NAR paper to the list of publications to cite for DSSP.
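Because ChimeraX keeps only helix/strand/coil per residue, richer DSSP assignments would have to be collapsed to three states before use. The sketch below uses one common reduction (H, G, I → helix; E, B → strand; everything else → coil). That grouping, the function name `to_three_state`, and the assumption that recent DSSP versions mark polyproline II (kappa) helices with the code `P` are illustrative assumptions, not the logic of CompSS.cpp.

```python
# One common convention for collapsing DSSP's one-letter codes into three
# states. This grouping is an assumption for illustration, not the mapping
# ChimeraX's CompSS.cpp uses.

HELIX_CODES = {"H", "G", "I"}   # alpha, 3-10 and pi helices
STRAND_CODES = {"E", "B"}       # extended strand and isolated beta bridge
# Everything else (turns "T", bends "S", blanks, and the code assumed to be
# "P" for polyproline II / kappa helices) falls through to coil.


def to_three_state(dssp_codes: str) -> str:
    """Map a string of DSSP codes to 'H', 'E' or 'C', one per residue."""
    out = []
    for code in dssp_codes:
        if code in HELIX_CODES:
            out.append("H")
        elif code in STRAND_CODES:
            out.append("E")
        else:
            out.append("C")
    return "".join(out)


print(to_three_state(" HHHHGGGTT EEEE SPPP"))  # CHHHHHHHCCCEEEECCCCC
```

Whether kappa helices should instead be drawn as helix is a design decision; here they are treated as coil.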
I don't know if this covers everything you wanted to suggest re DSSP in ChimeraX. If not, don't hesitate to elaborate.
We have had one request for fetching PDB-REDO structures/maps, and I don't believe it would be terribly difficult to add. I could probably add it in the not-too-distant future. If you and/or your collaborators would like to be added to the ticket we have open for that request, so that you will be updated as progress is made, let me know.

Sincerely, Eric

Eric Pettersen
UCSF Computer Graphics Lab

comment:2 by r.joosten@…, 19 months ago

Dear Eric,

Thanks for picking this up. We will have a look at the API 
documentation. The API works (the form on the main page uses it) but, 
yeah, we forgot to document it properly. Anyway, I'm happy to hear that 
you are interested in incorporating the current DSSP one way or another. 
We are happy to help if needed.
With respect to supporting PDB-REDO, please add me to the ticket.

Best wishes,
Robbie

On 08/03/2024 01:38, ChimeraX wrote:

comment:3 by pett, 19 months ago

Cc: Greg Couch added

Hi Robbie,

I added you to the PDB-REDO ticket. One thing I wanted to mention is that ChimeraX uses some highly specialized code tuned to the exact format used by the PDB for their mmCIF files to read them as fast as possible. If it encounters files that don't match that format, it can still read them but not as fast and it issues a warning to the ChimeraX log.
The PDB-REDO mmCIFs don't quite match the PDB format because in their atom information the numerical fields are right justified instead of left justified. So right now ChimeraX issues a warning when reading them and uses the slightly slower parsing. I don't know if you are interested in changing this in your files or not. If you are, I would just wait for the REDO files to get updated. Otherwise, I can suppress the warning so that users don't see it all the time...

--Eric

comment:4 by Greg Couch, 19 months ago

Slightly slower means up to 3X slower.

comment:5 by pett, 19 months ago

That 3x applies only to the parse time, though: the time to create the molecular data and draw the structure is unaffected, so the overall time from opening the file to displaying the structure does increase, but by less than 3x.

comment:6 by r.joosten@…, 19 months ago

Hi Eric,

I didn't realise an mmCIF parser could be sensitive to that. I don't think we ever benchmarked the performance of libcifpp (our CIF library) on different formatting, mostly because the (mm)CIF format doesn't prescribe any particular white-space organisation. Now I'm curious how you handle files from Phenix, which use a single space between fields and have no column formatting.
Since right justification is easiest to read (and I'm afraid I still have to read these files by eye more often than one would like), I don't think we are going to change our formatting anytime soon. I think that removing the warning makes sense.
That said, if you find errors (as in dictionary non-compliance) in our mmCIF files, please scream bloody murder and tell us.

Cheers,
Robbie




comment:7 by Greg Couch, 19 months ago

See https://readcif.readthedocs.io/en/latest/compare.html for 
benchmarking results. In the files from the PDB, the columns are 
left-justified. That is a requirement for reliably figuring out the 
column offsets from the first row of a table. For PDB-REDO, if you 
leave out the audit_conform table from the generated mmCIF file, then it 
wouldn't look like it comes from the PDB, and ChimeraX would automatically 
use the slower parsing.

     -- Greg
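Why left justification matters for inferring column offsets from the first row can be shown with a small sketch. It assumes a simplified loop with no quoted values or multi-line fields, and the functions `field_starts`, `fits_first_row_layout` and `slice_fields` are hypothetical illustrations, not the readcif API or ChimeraX's actual fast path.

```python
# Hypothetical sketch of a "stylized" fast path for one mmCIF loop:
# field start offsets come from the first row, and later rows are cut at
# those offsets. Quoted values and multi-line fields are ignored.


def field_starts(row: str) -> list[int]:
    """Offsets where each whitespace-separated field begins in a row."""
    return [i for i, ch in enumerate(row)
            if not ch.isspace() and (i == 0 or row[i - 1].isspace())]


def fits_first_row_layout(rows: list[str]) -> bool:
    """True if every later row's fields start at the first row's offsets."""
    starts = field_starts(rows[0])
    return all(field_starts(row) == starts for row in rows[1:])


def slice_fields(row: str, starts: list[int]) -> list[str]:
    """Fast path: cut a row at precomputed offsets instead of tokenizing."""
    bounds = starts[1:] + [len(row)]
    return [row[a:b].strip() for a, b in zip(starts, bounds)]


# Left-justified values: every field starts in the same column.
left = ["ATOM 1   N   9.5",
        "ATOM 10  CA  10.25"]
# Right-justified values: field ends line up, but starts do not.
right = ["ATOM  1  N   9.5",
         "ATOM 10 CA 10.25"]

print(fits_first_row_layout(left))    # True
print(fits_first_row_layout(right))   # False
print(slice_fields(left[1], field_starts(left[0])))
# ['ATOM', '10', 'CA', '10.25']
print(slice_fields(right[1], field_starts(right[0])))
# ['ATOM 1', '0 C', 'A 10', '.25']
```

A real reader would run a check like `fits_first_row_layout` (or detect a mismatch while slicing) and fall back to ordinary tokenizing, which is the slower path described in this thread.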


comment:8 by r.joosten@…, 19 months ago

Hi Greg,

Thank you for the link. The stylized read is a nice performance trick if you can rely on consistent formatting for an entire block. I tried that for reading reflection data from the PDB (back in the day) and found that I couldn't rely on consistent formatting, particularly when line wrapping was involved. But nowadays PDB entries are very consistent.

The audit_conform block describes the dictionary (and its version) and is important when you hold on to files for a long time and want to make data as FAIR as you can. It is not PDB-specific per se, but we seem to be among the few people writing these records. We cannot drop them, but I'll try to think of another record that distinguishes PDB from PDB-REDO data.

Cheers,
Robbie
