[Chimera-users] Seeking guidance on Volume-volume fitting

Wed Apr 24 12:33:03 PDT 2013

Hi Ingvar,

   I experimented with fitting the two very similar 30S ribosomal 
subunit structures EMD-5500 and EMD-5501 to understand why you needed 
5000 search positions to locate the best fit, when in most cases I've 
tried in the past it takes fewer than 100 search positions.  Here's 
what I found.

   The basic trouble is that the two maps have background density levels 
that are not at zero.  They are at about -4.1 and -3.9, while the 
interesting contour level is around -2.5.  Shifting the background level 
to zero and using the default "overlap" fitting metric makes the fit search 
find the best fit in about 20 trial positions.  (I did the shifts with 
Chimera Volume Filter, type Scale.)  Shifts in the density values have a 
strong effect on the fitting when using the overlap or correlation (not 
about mean) metrics.  It is a bad idea to use either of those two 
metrics if the background density is not zero in both maps.  Test 
fits with those two metrics produced horrible results with hardly any 
overlap between the maps, and also seemed to have hundreds of different 
locally optimal fits.  The correlation about mean metric subtracts the mean 
density values, so shifts have no effect.  But that metric also 
produces a huge number of locally optimal solutions, which is the reason you 
had to try 5000 starting positions.  The reason that "overlap" is the 
default metric is that it tends to have many fewer local best fits, so 
you can start quite far from the best solution and still converge to it 
as the local optimum.
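
   To illustrate the effect of a background shift on the three metrics, 
here is a small self-contained Python sketch.  It is not Chimera's code; 
the 1-D toy "maps", the -4 shift and all function names are made up for 
illustration.  Overlap is taken as the sum of products of density values, 
correlation as that sum divided by the two norms, and correlation about 
mean as the same thing after subtracting each map's mean.

```python
import math

def overlap(a, b):
    # Sum of products of the two density arrays.
    return sum(x * y for x, y in zip(a, b))

def correlation(a, b):
    # Overlap normalized by the norms of both arrays (not about mean).
    return overlap(a, b) / math.sqrt(overlap(a, a) * overlap(b, b))

def correlation_about_mean(a, b):
    # Same as correlation, after subtracting each array's mean.
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return correlation([x - ma for x in a], [y - mb for y in b])

def shifted(values, offset):
    # Add a constant to every density value.
    return [v + offset for v in values]

# Toy 1-D "maps": a density bump sitting on a zero background.
particle = [0, 0, 1, 5, 6, 5, 1, 0, 0, 0, 0, 0]
misplaced = [0, 0, 0, 0, 0, 0, 0, 1, 5, 6, 5, 1]  # bump does not touch particle's bump

# The same two maps with the background moved to -4 (hypothetical shift).
particle_shifted = shifted(particle, -4)
misplaced_shifted = shifted(misplaced, -4)

for name, metric in [("overlap", overlap),
                     ("correlation", correlation),
                     ("cam", correlation_about_mean)]:
    print(name,
          "zero background:", round(metric(particle, misplaced), 3),
          "shifted background:", round(metric(particle_shifted, misplaced_shifted), 3))
```

   With a zero background the misplaced copy scores 0 for overlap and 
correlation.  After the shift it gets a positive score that comes purely 
from background multiplying background, while correlation about mean is 
unchanged.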

   So my advice is to make sure the background density is 0, and use the 
default "overlap" fitting metric if you want the widest radius of 
convergence and fewest local best fits.  How do you make the background 
density 0?  One idea is to subtract the mean of all density values.  
That may be reasonable if the structure occupies only a small part of 
the map box, but becomes bad if the structure occupies a large fraction 
of the box.  In the latter case trickier methods would have to be 
applied.  If you trust the "suggested contour level" you might 
average only the values below that level.  The real solution here is that EMDB 
should insist that deposited single-particle maps state the 
background level, i.e. the level outside the particles, or insist that it is 
0.  If the maps can have any unspecified shift, then automated 
processing is not going to work.  The same problem existed a few years 
ago when EMDB didn't have the "suggested contour level".  It isn't 
possible to reliably guess a good contour level.
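
   The two background estimates mentioned above (mean of all values, or 
mean of only the values below a trusted contour level) can be sketched in 
plain Python as follows.  This is only an illustration on a flat list of 
numbers: the function name, the toy densities and the threshold of -1.0 are 
invented, and a real map would need to be read with a proper file reader.

```python
def shift_background_to_zero(values, contour_level=None):
    """Return (shifted_values, background_estimate).

    With contour_level=None the background is the mean of all values.
    Otherwise it is the mean of the values below contour_level only.
    """
    if contour_level is None:
        sample = list(values)
    else:
        sample = [v for v in values if v < contour_level]
    if not sample:
        raise ValueError("no density values available to estimate background")
    background = sum(sample) / len(sample)
    return [v - background for v in values], background

# Toy map: background at -4.0, particle at 3.0 filling half of the box.
densities = [-4.0] * 6 + [3.0] * 6

by_mean, bg_mean = shift_background_to_zero(densities)
by_level, bg_level = shift_background_to_zero(densities, contour_level=-1.0)

print("mean of all values:", bg_mean, "-> background becomes", by_mean[0])
print("mean below contour:", bg_level, "-> background becomes", by_level[0])
```

   Because the particle fills half the box in this toy case, the plain 
mean leaves the background far from zero, while averaging only the values 
below the contour level puts it at zero.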

   If you are considering searches of a user-supplied map against all 
EMDB maps I think you need it to be fast, and will need to sacrifice 
reliability to some extent.  So your earlier suggestion about aligning 
principal axes and optimizing that fit might be good, except for nearly 
spherical particles like viruses.  Even with viruses it would give a 
pretty random initial configuration, but that might suffice since the fit 
would be likely to align symmetry axes.  The main problem would be that 
if the user-supplied map is a subpart of an EMDB map (half a ribosome, 
half a virus, etc.) it won't find the good fit if you just align 
principal axes.  For that to work, the maps will have to represent the 
same object.
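
   As a rough picture of the principal-axes idea and of why it breaks 
down for nearly spherical particles, here is a 2-D toy version in plain 
Python.  It is a simplification, not what Chimera does: real maps are 3-D 
and need a 3x3 eigen-decomposition, and the function name and point sets 
are invented.  The axis of largest spread is found from the weighted 
second moments; it is only defined up to sign, so the fit still has to be 
optimized (and flips tried) after aligning.

```python
import math

def principal_axis_2d(points, weights):
    """Return (angle, major_variance, minor_variance) of a weighted 2-D point set.

    angle is the direction (radians) of largest spread about the
    weighted center of mass.
    """
    total = sum(weights)
    cx = sum(w * x for (x, y), w in zip(points, weights)) / total
    cy = sum(w * y for (x, y), w in zip(points, weights)) / total
    sxx = sum(w * (x - cx) ** 2 for (x, y), w in zip(points, weights)) / total
    syy = sum(w * (y - cy) ** 2 for (x, y), w in zip(points, weights)) / total
    sxy = sum(w * (x - cx) * (y - cy) for (x, y), w in zip(points, weights)) / total
    # Closed-form eigen-decomposition of the 2x2 second-moment matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    spread = math.hypot(0.5 * (sxx - syy), sxy)
    middle = 0.5 * (sxx + syy)
    return angle, middle + spread, middle - spread

# Elongated toy "particle": points along a line tilted by 30 degrees.
tilt = math.radians(30.0)
rod = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(-3, 4)]
rod_angle, rod_major, rod_minor = principal_axis_2d(rod, [1.0] * len(rod))
print("rod axis (degrees):", round(math.degrees(rod_angle), 3))

# Symmetric toy "particle": the corners of a square.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
sq_angle, sq_major, sq_minor = principal_axis_2d(square, [1.0] * 4)
print("square variances:", sq_major, sq_minor)
```

   For the rod the recovered axis matches the tilt, so two copies could 
be rotated onto each other.  For the square both variances are equal, so 
the axis direction is arbitrary, which is the 2-D analogue of the nearly 
spherical virus case.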

     Tom

-------- Original Message --------
Subject: Re: [Chimera-users] Seeking guidance on Volume-volume fitting
From: ingvar
To: Tom Goddard
Date: 4/24/13 5:10 AM
> Hi Tom,
>
> Thank you for explaining the inner workings of the volume-volume 
> fitting in Chimera.
>
> The long term goal is to set up a searchable database with what 
> volumes fit in each other, not necessarily limited to EM, and also 
> provide a service where a user can upload a volume and see if it fits 
> in any existing volume in the archive.
>
> It sounds like I am using a good set of options.  I will try some other 
> examples to see if I have better luck in finding the best fit there, 
> maybe a set of 70S ribosomes.
>
> /Ingvar
>
>
> On 2013-04-24 06:09, Tom Goddard wrote:
>> Hi Ingvar,
>>
>>   It sounds like you want to automate the fitting of one EMDB
>> ribosome map to another, and probably other structures too. Probably
>> you are aiming to have it find the best fit as often as possible
>> without taking too long.  It is not easy to guarantee you have the
>> best fit; even an exhaustive search can miss it unless you take
>> very fine rotational and translational steps.  Here are answers to some
>> of your questions.
>>
>>   Should one map be resampled before fitting in the other map? No, I
>> don't see any advantage to doing that.  The fit is done on the grid
>> points of the first map within the displayed contour level, and the
>> second map is interpolated at those grid points.
>>
>>   Could aligning principal inertia axes help get the best fit?  Yes,
>> in some cases that will work.  And if you only optimize that initial
>> alignment it will be very fast (less than a second typically).  But unless you
>> search lots of other possible orientations you will miss the best fit
>> in some percentage of cases.  So it doesn't seem particularly useful
>> if reliability of automated fitting is your goal.
>>
>>   Does it matter if grid sizes are different?  No.  Trilinear
>> interpolation is being used, so there is no advantage to the grid
>> sizes or spacings matching.
>>
>>   Should any correction be done for fitting different resolution maps
>> to each other?   I don't think there would be much advantage to, say,
>> smoothing the higher resolution map to match the lower resolution map.
>> I don't think that will improve convergence to the best fit. And I
>> don't think it will change the best fit -- although maybe it is
>> theoretically possible.
>>
>>   If you are using the correlation about mean metric then shifting
>> the density values will have no effect, because the mean is subtracted
>> from the density of both maps.  But if you are using correlation or
>> overlap a shift may be needed.
>>
>>   I am unpleasantly surprised by your low success rate getting the
>> best fit in 5000 or 50000 searched positions.  My experience has been
>> that it is found in a few hundred positions.  I will need to try your
>> examples.  I'll report in a later email what I find.
>>
>>   Tom
>>
>>
>> On Apr 22, 2013, at 3:37 AM, ingvar  wrote:
>>
>>> Dear Chimera,
>>>
>>> I am seeking guidance on what would be best practice in doing 
>>> volume-volume fitting, and what expectations I should have on the number 
>>> of trial conformations.  For instance, would it be beneficial to 
>>> resample one of the volumes to the same grid as the other volume?  
>>> What about volumes with different resolutions, does this require any 
>>> special handling?
>>> In the case below it would probably be more efficient to just align 
>>> the principal moments of inertia of the envelopes above the contour 
>>> level and then do a search from there, though this will only work 
>>> well if the volumes are very similar.
>>>
>>> In an initial study I used 3 EM volumes from EMDB (EMD-5500, 
>>> EMD-5501, and EMD-2017).  They are all 30S E. coli ribosomes of 
>>> similar resolution (12.9, 14.0, 13.5 Å).  I thought that this would 
>>> be a relatively easy starting point, with the intent to then go on 
>>> to fit 30S subunit in some of the many E. coli 70S EM volumes that 
>>> are available.
>>> The volumes EMD-5500 and EMD-5501 are in similar orientation, while 
>>> EMD-2017 is in a completely different orientation.
>>> The grids are similar but not identical, 125^3 vs 128^3.  I adjusted 
>>> the contour levels from the EMDB recommended values to make the 
>>> enclosed volumes more similar in size.
>>>
>>> Contour levels used:
>>> EMD-5500 -2.8 -> -2.5
>>> EMD-5501 -2.8
>>> EMD-2017 39 -> 32
>>>
>>> I am using scripts like the one shown here:
>>>
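>>> from chimera import runCommand as rc
>>> from chimera.tkgui import saveReplyLog
>>> # Open the reference map and set its contour level.
>>> rc("open data/EMD-5500.map")
>>> rc("volume #0 level -2.5 transparency 0.5")
>>> # Open the map to be fitted and set its contour level.
>>> rc("open data/EMD-5501.map")
>>> rc("volume #1 level -2.8 transparency 0.5")
>>> # Global search from 50000 placements using correlation about mean.
>>> rc("fitmap #1 #0 search 50000 metric cam envelope true inside 0.2")
>>> # Save the reply log containing the fit results.
>>> saveReplyLog(r'/Users/ingvar/chimera/log/fit5500_5501_loc.txt')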
>>>
>>> It seems correlation about mean is the best metric for volume-volume 
>>> fitting, and that it is best to only use points inside the envelope.
>>>
>>> The maps EMD-5500, and EMD-5501 have the unusual feature that the 
>>> density range is shifted downwards so that the average is well below 
>>> 0.  I was concerned about that and moved the density range, but with 
>>> the fitmap parameters above that did not seem to have any 
>>> significant impact (with other sets of parameters the density range 
>>> appears to be an issue).
>>>
>>> In this case, it is relatively easy to see when you have found the 
>>> "good" fit, all other fits have clearly worse statistics.
>>>
>>> What I was surprised about was the number of trial conformations 
>>> needed to be reasonably certain to find the "good" fit.
>>>
>>> 5500 in 5501 7 times in 5000
>>> 5501 in 5500 2 times in 5000
>>> 5500 in 2017 0 times in 5000
>>> 5500 in 2017 8 times in 50000
>>> 2017 in 5501 0 times in 5000
>>>
>>> Many Thanks,
>>> Ingvar Lagerstedt
>>>
>>>
>>>
>>> _______________________________________________
>>> Chimera-users mailing list
>>> Chimera-users at cgl.ucsf.edu
>>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
>>>