Opened 4 years ago

Closed 4 years ago

#5305 closed defect (nonchimerax)

loop modeling dimer, one loop goes crazy

Reported by: Elaine Meng
Owned by: pett
Priority: moderate
Milestone:
Component: Sequence
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Notify when closed:
Platform: all
Project: ChimeraX

Description

(1) open 1qo7

(2) click in Log to open corresponding UniProt sequence (Q9UR30_ASPNG), both chains of dimer automatically associate.

(3) tried Model Loops, internal missing only, otherwise default settings

One chain's loop modeling looks fine, the other has a persistent long bond in all 5 copies. Session attached.

Attachments (3)

dimerloops.cxs (1.7 MB) - added by Elaine Meng 4 years ago.
flex5.cxs (1.5 MB) - added by Elaine Meng 4 years ago.
self-clashing models created with 5 flexible residues adjacent to internal missing segment, standard protocol
seqresflex5.cxs (1.5 MB) - added by Elaine Meng 4 years ago.

Change History (9)

by Elaine Meng, 4 years ago

Attachment: dimerloops.cxs added

comment:1 by Elaine Meng, 4 years ago

It's weird, the scores remain very good, and the loop of one chain is modeled fine while the other one always has a crazy long bond.
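A "crazy long bond" like this can also be flagged numerically rather than by eye. The following is a minimal sketch, not part of ChimeraX or Modeller: it assumes you have already extracted the C-alpha coordinates of one chain in sequence order, and it relies on the fact that consecutive C-alpha atoms joined by a trans peptide bond sit about 3.8 Å apart. The function name `find_long_links`, the 4.2 Å cutoff, and the coordinates are all illustrative assumptions.

```python
import math

# Consecutive C-alpha atoms in a trans peptide are about 3.8 Å apart,
# so a clearly larger distance suggests a stretched or broken link.
CA_CA_CUTOFF = 4.2  # Å; illustrative threshold, an assumption of this sketch


def find_long_links(ca_coords, cutoff=CA_CA_CUTOFF):
    """Return (index, distance) for each consecutive CA pair farther apart than cutoff.

    ca_coords is a list of (x, y, z) tuples in sequence order; index i refers
    to the link between entries i and i + 1.
    """
    long_links = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if d > cutoff:
            long_links.append((i, d))
    return long_links


# Made-up coordinates: the third link is stretched to roughly 10 Å.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (17.6, 0.0, 0.0)]
for index, distance in find_long_links(trace):
    print(f"link {index}-{index + 1}: {distance:.1f} Å")
```

Running the same check on each of the 5 models would show whether the stretched link always falls at the same position in the same chain.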

I tried the DOPE protocol, which did not help with 1 flexible residue, and I got similarly poor results with the standard protocol and 0-3 flexible residues.

With 4 or 5 flexible residues (standard protocol), however, both chains got loops without a crazy long bond. But I noticed that the loops do not avoid the rest of the protein. Is that expected? They still have good scores, but I had a faint memory that the scores were based only on the modeled part, so they wouldn't reflect self-clashes. I'll attach another session from using the standard protocol and 5 flexible residues.

by Elaine Meng, 4 years ago

Attachment: flex5.cxs added

self-clashing models created with 5 flexible residues adjacent to internal missing segment, standard protocol

comment:2 by Elaine Meng, 4 years ago

Another experiment is making Model Loops use the seqres sequence (opened by clicking in the second "Description" column of the table shown when 1qo7 is opened, epoxide hydrolase) instead of the UniProt one (from clicking in the third column of the table). In that case I got good-looking loops even with adjacentFlexible=1! Both sequences automatically associate with both chains, though... it's not like I had to manually associate with the UniProt one, like for that other protein I was complaining about earlier.

comment:3 by Elaine Meng, 4 years ago

Well, not so fast... it didn't make long bonds using the seqres sequence, but I still got self-intersections.  Again, there were very few clashes in one chain but then the other chain was crappy.  Seems like the refinement is only paying attention to one of the modeled loops and skipping the other one after building a (bad) starting model.

In the attached seqresflex5.cxs, model #2.3 chain B has only 2 self-clashes, whereas its chain A has 96 self-clashes. E.g. count the chain B self-clashes with:

clashes #2.3/b restrict both log true

by Elaine Meng, 4 years ago

Attachment: seqresflex5.cxs added

comment:4 by pett, 4 years ago

Status: assigned → accepted

comment:5 by pett, 4 years ago

The long-bond problem is fixed. The self-clash problem is not.

comment:6 by pett, 4 years ago

Resolution: nonchimerax
Status: accepted → closed

Modeling long loops is unreliable, as per Ben:

On Oct 4, 2021, at 11:00 AM, Modeller Caretaker <modeller-care@…> wrote:

On 9/29/21 4:41 PM, Eric Pettersen wrote:

Modeling missing structure -- allowing perhaps one residue on each side of the missing area to be remodeled -- works fairly well. However, if several more residues to each side are allowed to be remodeled (in the example I'm providing, 6) then there are significant self-clashes in the resulting model(s). I am wondering if there is something wrong with the input I am providing, or is this an expected behavior, or is it a bug?

The first two. Looks like you are trying to model 2 20-residue loops, which is asking a lot. Back in 2000 the effective limit was 12 residues or so. You can probably manage a lot more now with modern hardware but the search space does expand rather rapidly as you extend the loops, so 40 residues is probably still out of reach. And even if you could manage it you'd have to build an ensemble of many, many models to hope to get anything approximating the "real" solution (you are essentially doing a mini-folding problem after all; even in the era of AlphaFold modeling without a template is tricky). Looks like you're only building a single model.

See also Fig.8 at https://salilab.org/pdf/Fiser_ProteinSci_2000.pdf
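The point in the reply above about the search space expanding rapidly with loop length can be made concrete with a back-of-the-envelope count. This is only an illustrative sketch: it assumes each loop residue independently adopts one of a fixed number of backbone conformations, and the value of 3 states per residue is a hypothetical choice, not something taken from Modeller or the cited paper.

```python
# Hypothetical assumption: each loop residue can adopt a fixed number of
# backbone conformations, independently of its neighbors.
STATES_PER_RESIDUE = 3  # illustrative value, not taken from Modeller


def conformations(loop_length, states=STATES_PER_RESIDUE):
    """Number of distinct loop conformations under the independence assumption."""
    return states ** loop_length


# Loop lengths mentioned in the reply: the ~12-residue limit from 2000,
# one 20-residue loop, and two 20-residue loops combined (40 residues).
for n in (12, 20, 40):
    print(f"{n:2d} residues: {conformations(n):.2e} conformations")
```

Under these assumptions, going from 12 to 20 residues multiplies the count by 3^8 (several thousand), and going to 40 residues squares the 20-residue count, which is consistent with the advice that a single model is unlikely to land near the real solution.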
