Opened 23 months ago
Closed 13 months ago
#10270 closed defect (fixed)
need way to set full sequence of a chain
| Reported by: | Greg Couch | Owned by: | Eric Pettersen |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | Structure Editing | Version: | |
| Keywords: | | Cc: | Tristan Croll, Tom Goddard, matthias.vorlaender@…, guillaume.gaullier@…, Elaine Meng |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
It would be useful to be able to set the correct sequence to use for a chain. This need can arise when the input PDB/mmCIF file had no SEQRES/entity_poly_seq information but the full sequence is known (for example, with the intermediate output from a Phenix refinement). The structure could then be saved with the sequence for subsequent use, either by ChimeraX or some other program.
Change History (16)
comment:1 by , 23 months ago
| Status: | assigned → accepted |
|---|
comment:2 by , 23 months ago
| Cc: | added |
|---|
comment:3 by , 23 months ago
Very much so!
comment:4 by , 23 months ago
Tristan, it seems likely that this problem mentioned by Guillaume came about because he refined an AlphaFoldDB .cif model with ISOLDE and ISOLDE copied it and lost the atomic structure metadata containing the entity_poly_seq table. Not sure. But it would be good to make sure ISOLDE preserves the full sequence when writing out mmCIF from ChimeraX.
comment:5 by , 23 months ago
Just replied to that email thread. ISOLDE doesn’t make a copy of the model when the user selects it, just “transplants” it into the Clipper arrangement. If other models are merged *into* the working model, then I’m quite sure all their metadata are lost.
comment:6 by , 23 months ago
Makes sense that ISOLDE is of course going to preserve the sequence info. If swapaa loses the sequence info, I wonder if it should be improved to just update the sequence info.
comment:7 by , 23 months ago
I'm not sure how swapaa got dragged into this, but swapaa *does* update the sequence.
comment:8 by , 23 months ago
Tristan suggested in an email that Guillaume's written-out mmCIF was missing the full sequence because of something more than deleting residues, maybe mutating a residue with swapaa or inserting a new residue. We don't know how the full sequence was lost from the mmCIF file in Guillaume's case; if we did, fixing that loss of sequence info might more directly address the problem that led to this ticket. Still, the ability to set the sequence as requested in this ticket would be useful.
comment:9 by , 23 months ago
It sounded like Guillaume edited the file, then wrote it out, and repeated that process. If he had used the bestGuess option the first time the mmCIF was written (and the bug was fixed then), then the sequence would have carried through.
comment:10 by , 23 months ago
Greg, you asked me a few days ago whether these AlphaFold predictions had sequence info; I looked and saw that all the PDB prediction files had just ATOM records and no SEQRES. But then Guillaume said he actually started with an AlphaFold Database file, and I guess they cleaned those up, because those are in mmCIF and do have the entity_poly_seq table.
However the sequence info was lost, it seems unlikely that any user will know to use the "bestGuess true" option when saving. Like Guillaume, they then end up with a refined model with many residues deleted (not observed in the cryoEM/X-ray data), and by then it is too late to fix except by using the capability requested in this ticket: adding the full sequence back and resaving the refined model as mmCIF.
comment:11 by , 23 months ago
I wasn't clear enough in my description. While swapaa does update the sequence, using it appears to mark the sequence as non-authoritative, so after using it on any amino acid the simple command "save xxx.cif #{model id}" yields the warning "Not saving entity_poly_seq for non-authoritative sequences". I was unaware of the bestGuess option myself. It might be worth adding something like "To override this, use the option 'bestGuess true' in the save command" to the warning message.
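To make the suggested warning concrete, here is a minimal Python sketch of the save-time decision as described in this comment. It is not ChimeraX's code: the function name `entity_poly_seq_for_save`, its parameters, and the assumption that "bestGuess true" falls back to the sequence of the residues actually present are all hypothetical and used only for illustration.

```python
import warnings

# Warning text quoted in the comment, plus the proposed hint.
WARNING_TEXT = "Not saving entity_poly_seq for non-authoritative sequences"
HINT = "To override this, use the option 'bestGuess true' in the save command"


def entity_poly_seq_for_save(observed_seq, full_seq, authoritative, best_guess=False):
    """Return the sequence to write as entity_poly_seq, or None to skip it.

    Hypothetical model of the behaviour discussed in the ticket, not the real
    ChimeraX mmCIF writer.
    """
    if authoritative:
        # The full sequence is trusted, so write it out.
        return full_seq
    if best_guess:
        # Assumption: the best guess is the sequence of the residues present.
        return observed_seq
    # Neither authoritative nor overridden: warn and tell the user how to override.
    warnings.warn(f"{WARNING_TEXT}. {HINT}")
    return None
```

With a hint like this in the message, a user who hits the warning after swapaa would learn about the override without having to search the documentation.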
comment:12 by , 23 months ago
In the daily build, I have renamed from_seqres to full_sequence_known in the Python layer to clarify how the variable is actually used. Retained from_seqres as deprecated for backwards compatibility. Changed swapaa to not set it to False.
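For readers unfamiliar with this kind of rename, the following Python sketch shows one common way to keep an old attribute name working as a deprecated alias. The class `ChainSketch` is a hypothetical stand-in, and whether ChimeraX emits a deprecation warning (or implements the alias this way at all) is an assumption; only the two attribute names come from this comment.

```python
import warnings


class ChainSketch:
    """Hypothetical stand-in for a chain object; not ChimeraX's real class."""

    def __init__(self, full_sequence_known=False):
        # New, clearer name for the flag.
        self.full_sequence_known = full_sequence_known

    @property
    def from_seqres(self):
        # Deprecated alias: reads are forwarded to the new attribute.
        warnings.warn("from_seqres is deprecated; use full_sequence_known",
                      DeprecationWarning, stacklevel=2)
        return self.full_sequence_known

    @from_seqres.setter
    def from_seqres(self, value):
        # Deprecated alias: writes are forwarded to the new attribute.
        warnings.warn("from_seqres is deprecated; use full_sequence_known",
                      DeprecationWarning, stacklevel=2)
        self.full_sequence_known = value
```

Existing scripts that read or set `from_seqres` keep working under this pattern, while new code can use `full_sequence_known` directly.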
comment:13 by , 22 months ago
| Cc: | added |
|---|
I think the plan is to add two commands: one to copy sequence information from one or more chains to an equal number of recipient chains, and one to copy sequence information from a sequence in an alignment to associated chains. The sequence viewer could offer some kind of GUI access to the second command. The first command would first try a dead simple matching of the copied sequence to the chain's residues and, if that didn't exactly match, fall back to creating an under-the-hood alignment and then basically doing what the second command does. In any case, a mismatch would prevent the copy from occurring.
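The two-stage matching described in this plan could be sketched roughly as follows. This is an illustrative Python sketch only: the function `place_residues`, the `(residue_number, one_letter_code)` representation, and the constant-offset interpretation of "dead simple matching" are assumptions, and the fallback is a greedy in-order subsequence match standing in for a real alignment.

```python
def place_residues(full_seq, residues):
    """Map each existing residue to an index in full_seq, or return None.

    residues: list of (residue_number, one_letter_code) in chain order.
    Returning None models "a mismatch prevents the copy".
    """
    if not residues:
        return None

    # Stage 1: simple match. Assume residue numbers differ from positions in
    # the full sequence by a single constant offset and try every offset.
    first_number = residues[0][0]
    for start in range(len(full_seq)):
        offset = start - first_number
        indices = [number + offset for number, _ in residues]
        if all(0 <= i < len(full_seq) and full_seq[i] == code
               for i, (_, code) in zip(indices, residues)):
            return indices

    # Stage 2: fallback. Ignore numbering and require only that the residues
    # occur in order within full_seq (a crude stand-in for an alignment).
    indices, position = [], 0
    for _, code in residues:
        position = full_seq.find(code, position)
        if position < 0:
            return None  # mismatch: refuse to copy the sequence
        indices.append(position)
        position += 1
    return indices
```

A caller would copy the full sequence onto the chain only when the function returns a placement, and leave the chain untouched when it returns None.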
comment:14 by , 22 months ago
| Cc: | added |
|---|
comment:15 by , 22 months ago
Thank you for cc-ing me to this ticket.
For the record, I did indeed use swapaa during the modeling that led to my question on the email list. I forgot to mention it because at the time I suspected something was up with saving, and did not think that swapaa could have made the sequence behave differently.
Guillaume
comment:16 by , 13 months ago
| Resolution: | → fixed |
|---|---|
| Status: | accepted → closed |
The "sequence update" command, added on Jun 13th, does the second part (transfer sequence info from an alignment sequence into an associated chain). It's not clear to me if the direct chain-to-chain transfer is really needed. I can imagine scenarios where it might be useful, but even in those it's just one extra step to create a private alignment and then use "sequence update".
If someone feels strongly that direct chain-to-chain update is really needed, I will reopen this.
CCing Tristan and Tom G. since they'd probably be interested in when this gets implemented.