[Chimera-users] Seeking guidance on Volume-volume fitting

Thu Apr 25 04:13:27 PDT 2013

Hi Tom,

Thank you for the suggestions.  I shifted the density range and 
changed to the "overlap" metric.  This worked well: I found the best fit 
in 28 of 500 trials for the 5500/5501 case and 25 of 200 for the 
5500/2017 case.  This is much more in line with what I hoped for.
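
A small sketch may help show why the shift matters.  It is plain Python, 
not Chimera code, and the sample values are made up.  The three functions 
follow the metric definitions given in the Chimera fitmap documentation 
(overlap as a sum of products, correlation as the normalised overlap, 
correlation about mean with each map's mean subtracted first).

```python
import math

def overlap(u, v):
    """Sum of products of the two maps' values at the same points."""
    return sum(a * b for a, b in zip(u, v))

def correlation(u, v):
    """Overlap normalised by the two vector norms (no mean subtraction)."""
    return overlap(u, v) / math.sqrt(overlap(u, u) * overlap(v, v))

def correlation_about_mean(u, v):
    """Correlation after subtracting each map's own mean value."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return correlation([a - mean_u for a in u], [b - mean_v for b in v])

# Made-up density values sampled at the same six points in two maps,
# with the background already at zero.
u = [0.0, 0.1, 2.5, 3.0, 0.2, 0.0]
v = [0.1, 0.0, 2.0, 3.5, 0.0, 0.3]

# The same values with constant background offsets like those reported
# for EMD-5500 and EMD-5501 (about -4.1 and -3.9).
u_shifted = [a - 4.1 for a in u]
v_shifted = [b - 3.9 for b in v]

for name, metric in [("overlap", overlap),
                     ("correlation", correlation),
                     ("correlation about mean", correlation_about_mean)]:
    print("%-24s %8.3f -> %8.3f" % (name, metric(u, v),
                                    metric(u_shifted, v_shifted)))
```

Only the correlation about mean line should print the same value twice; 
the other two change because the shifted background points now contribute 
large products of their own.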

It used to be that almost all single particle EM maps had the 
background scattering centred at 0, but more recently a number of 
entries have the background peak away from zero.  I think you are right 
that EMDB ought to either enforce background at zero or at least collect 
the value if it is non-zero, for non-tomogram volumes.  I will pass this 
suggestion on.  Some maps in the EMDB are masked; in those cases there 
is no background scattering.
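
The two background estimates Tom mentions below (the mean of all values, 
and the mean of only the values under the suggested contour level) can be 
compared with a minimal sketch.  It is plain Python on a made-up list of 
voxel values; the function names are hypothetical, and a real map would 
first have to be read into such a list.

```python
def mean_below(values, level):
    """Mean of the values strictly below `level` (a crude background estimate)."""
    below = [x for x in values if x < level]
    if not below:
        raise ValueError("no values below the given level")
    return sum(below) / len(below)

def shift_background_to_zero(values, level):
    """Return a copy of `values` with the estimated background subtracted."""
    background = mean_below(values, level)
    return [x - background for x in values]

# Made-up voxel values: background near -4.0, particle density above 2.5.
voxels = [-4.2, -4.0, -3.9, -4.1, -3.8, 2.6, 3.0, 4.5]
contour_level = 2.5  # stands in for the EMDB "suggested contour level"

global_mean = sum(voxels) / len(voxels)
background = mean_below(voxels, contour_level)
shifted = shift_background_to_zero(voxels, contour_level)

print("mean of all values:      %.3f" % global_mean)
print("mean below the contour:  %.3f" % background)
print("shifted values:", ["%.1f" % x for x in shifted])
```

In this toy case the particle fills three of eight voxels, so the mean of 
all values is pulled well above the true background, which is the 
situation where subtracting the global mean becomes a poor choice.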

Selecting a suitable contour level is challenging.  The recommended 
levels provided by the authors in EMDB are often useful, but at least in 
some cases clearly incorrect, to the point that values outside the 
density range have been suggested.  In theory it should be possible to 
select a level that corresponds to the expected volume of a sample.  If 
the weight and density are known, then calculating the volume is trivial. 
In practice this is not always so straightforward.  I have added a 
graph of the estimated enclosed volume as a function of density on the 
PDBe's EM analysis pages, e.g., pdbe.org/EMD-5500/analysis, which can be 
used as additional guidance alongside the recommended contour levels.  The 
analysis pages also have a density distribution plot, similar to the one 
Chimera shows in the Volume Viewer interface.  I think I use more bins, 
which makes masking and/or background scattering stand out more.
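
As a rough illustration of the volume-based approach, the sketch below 
converts a mass and density into an expected volume and then picks the 
density level whose enclosed volume matches a target.  It is plain Python 
with hypothetical function names and a made-up toy map, and it is not how 
the PDBe analysis pages are implemented.  The density is an input the 
user has to supply; a value of about 1.35 g/cm^3 is often quoted for 
protein, and a particle containing RNA will differ.

```python
def expected_volume(mass_da, density_g_cm3):
    """Volume in cubic angstroms of a particle of given mass and density."""
    # 1 Da = 1.66054e-24 g and 1 cm^3 = 1e24 A^3, so the factors combine
    # to 1.66054 A^3 per Da at a density of 1 g/cm^3.
    return mass_da * 1.66054 / density_g_cm3

def enclosed_volume(values, level, voxel_size):
    """Total volume (A^3) of the voxels with a value at or above `level`."""
    count = sum(1 for x in values if x >= level)
    return count * voxel_size ** 3

def level_for_volume(values, target_volume, voxel_size):
    """Density level that encloses approximately `target_volume`."""
    n_voxels = int(round(target_volume / voxel_size ** 3))
    n_voxels = max(1, min(n_voxels, len(values)))
    # The n-th largest value is the level that encloses n voxels.
    return sorted(values, reverse=True)[n_voxels - 1]

# Hypothetical particle: 100 kDa at an assumed density of 1.35 g/cm^3.
print("expected volume: %.0f A^3" % expected_volume(100000.0, 1.35))

# Made-up toy map: 90 background voxels and 10 voxels of rising density,
# on a grid with 2 A spacing (8 A^3 per voxel).
toy_map = [0.0] * 90 + [float(i) for i in range(1, 11)]
voxel_size = 2.0
target = 40.0  # A^3, chosen so that exactly five voxels are enclosed

level = level_for_volume(toy_map, target, voxel_size)
print("level:", level)
print("enclosed volume:", enclosed_volume(toy_map, level, voxel_size))
```

Evaluating enclosed_volume over a range of levels gives the kind of 
volume-versus-density curve described above.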

With regard to aligning the two volumes along their principal axes of 
inertia, that would have to be a user-specified option that should be 
used only when the two objects are known to be more or less the same 
sample.
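
For reference, a minimal sketch of how a principal axis can be found from 
the points above the contour level.  It is plain Python with hypothetical 
function names and made-up coordinates, not the Chimera implementation, 
and it finds only the longest axis (by power iteration on the second 
moment matrix).  The direction of largest spread is the principal axis 
with the smallest moment of inertia.

```python
import math

def centre_of_mass(points, weights):
    """Weighted mean position of the points."""
    total = sum(weights)
    return [sum(w * p[i] for p, w in zip(points, weights)) / total
            for i in range(3)]

def second_moment_matrix(points, weights):
    """3x3 weighted covariance of the points about their centre of mass."""
    c = centre_of_mass(points, weights)
    total = sum(weights)
    m = [[0.0] * 3 for _ in range(3)]
    for p, w in zip(points, weights):
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                m[i][j] += w * d[i] * d[j] / total
    return m

def longest_axis(matrix, iterations=200):
    """Dominant eigenvector of a symmetric 3x3 matrix by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Made-up grid points inside a contour, elongated along z, equal weights.
points_a = [(0, 0, -3), (0, 0, -1), (0, 0, 1), (0, 0, 3),
            (1, 0, 0), (-1, 0, 0), (0, 0.5, 0), (0, -0.5, 0)]
# The same shape with x and z swapped, i.e. elongated along x.
points_b = [(z, y, x) for (x, y, z) in points_a]
weights = [1.0] * len(points_a)

axis_a = longest_axis(second_moment_matrix(points_a, weights))
axis_b = longest_axis(second_moment_matrix(points_b, weights))
cos_angle = abs(sum(a * b for a, b in zip(axis_a, axis_b)))
angle = math.degrees(math.acos(min(1.0, cos_angle)))
print("angle between long axes: %.1f degrees" % angle)
```

The sign of each axis is arbitrary, so an alignment built this way still 
has to try the flipped orientations before optimizing the fit.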

Many Thanks,
Ingvar

On 2013-04-24 20:33, Tom Goddard wrote:
> Hi Ingvar,
> 
>   I experimented with fitting the two very similar 30S ribosomal
> subunit structures EMD 5500 and 5501 to understand why you were
> needing 5000 search positions to locate the best fit while in most
> cases I've played with in the past it takes less than 100 search
> positions.  Here's what I found out.
> 
>   The basic trouble is that the two maps have background density
> levels that are not at zero.  They are at about -4.1 and -3.9 while
> the interesting contour level is around 2.5.  Shifting the background
> level to zero and using the default "overlap" fitting metric makes fit
> search find the best fit in about 20 trial positions.  (I did the
> shifts with Chimera Volume Filter, type Scale.)  Shifts in the density
> values have a strong effect on the fitting when using the overlap or
> correlation (not about mean) metrics.  It is a bad idea to use either
> of those two metrics if the background density is not zero in both
> maps.  Tests fitting with those two methods produced horrible fits
> with hardly any overlap between maps and also seemed to have hundreds
> of different locally optimal fits.  The correlation about mean metric
> subtracts mean density values and so shifts have no effect.  But that
> metric also produces a huge number of locally optimal solutions -- the
> reason you had to try 5000 starting positions.  The reason that
> "overlap" is the default metric is that it tends to have many fewer
> local best fits.  So you can be quite far from the best solution and
> still find it as the local optimum.
> 
>   So my advice is to make sure the background density is 0, and use
> the default "overlap" fitting metric if you want the widest radius of
> convergence and fewest local best fits.  How do you make the
> background density 0?  One idea is to subtract the mean of all density
> values.  That may be reasonable if the structure occupies only a small
> part of the map box, but becomes bad if the structure occupies a large
> fraction of the box.  In the latter case trickier methods would have
> to be applied.  If you trusted the "suggested contour level" you might
> only average values below that level. The real solution here is that
> EMDB should insist that deposited single-particle maps state the
> background level, i.e. the level outside the particles, or insist it
> processing is not going to work.  The same problem existed a few years
> ago when EMDB didn't have the "suggested contour level".  It isn't
> possible to reliably guess a good contour level.
> 
>   If you are considering searches of a user supplied map against all
> EMDB maps I think you need it to be fast, and will need to sacrifice
> reliability to some extent.  So your earlier suggestion about aligning
> principal axes and optimizing that fit might be good, except for
> nearly spherical particles like viruses.  Even with viruses it would
> give a pretty random initial configuration but that might suffice
> since the fit would be likely to align symmetry axes.  The main
> problem would be that if the user supplied map is a subpart of an EMDB
> map (half a ribosome, half a virus, etc...) it won't find the good fit
> if you just align principal axes.  For that to work, the maps will
> have to represent the same object.
> 
>     Tom
> 
> 
> -------- Original Message --------
> Subject: Re: [Chimera-users] Seeking guidance on Volume-volume fitting
> From: ingvar
> To: Tom Goddard
> Date: 4/24/13 5:10 AM
>> Hi Tom,
>> 
>> Thank you for explaining the inner workings of the volume-volume 
>> fitting in Chimera.
>> 
>> The long term goal is to set up a searchable database with what 
>> volumes fit in each other, not necessarily limited to EM, and also 
>> provide a service where a user can upload a volume and see if it fits 
>> in any existing volume in the archive.
>> 
>> It sounds like I am using a good set of options.  I will try some 
>> other examples to see if I have better luck in finding the best fit 
>> there, maybe a set of 70S ribosomes.
>> 
>> /Ingvar
>> 
>> 
>> On 2013-04-24 06:09, Tom Goddard wrote:
>>> Hi Ingvar,
>>> 
>>>   It sounds like you want to automate the fitting of one EMDB
>>> ribosome map to another, and probably other structures too. Probably
>>> you are aiming to have it find the best fit as often as possible
>>> without taking too long.  It is not easy to guarantee you have the
>>> best fit; even an exhaustive search can miss it unless you take
>>> very fine rotational and translational steps.  Here are answers to
>>> some of your questions.
>>> 
>>>   Should one map be resampled before fitting in the other map? No, I
>>> don't see any advantage to doing that.  The fit is done on the grid
>>> points of the first map within the displayed contour level, and the
>>> second map is interpolated at those grid points.
>>> 
>>>   Could aligning principal inertia axes help get the best fit? Yes,
>>> in some cases that will work.  And if you only optimize that initial
>>> alignment it will be very fast (less than a second typically).  But
>>> unless you search lots of other possible orientations you will miss
>>> the best fit in some percentage of cases.  So it doesn't seem
>>> particularly useful if reliability of automated fitting is your goal.
>>> 
>>>   Does it matter if grid sizes are different?  No.  Trilinear
>>> interpolation is being used, so there is no advantage of the grid
>>> sizes or spacings matching.
>>> 
>>>   Should any correction be done for fitting different resolution
>>> maps to each other?  I don't think there would be much advantage to,
>>> say, smoothing the higher resolution map to match the lower
>>> resolution map.  I don't think that will improve convergence to the
>>> best fit.  And I don't think it will change the best fit -- although
>>> maybe it is theoretically possible.
>>> 
>>>   If you are using the correlation about mean metric then shifting
>>> the density values will have no effect, because the mean is
>>> subtracted from the density of both maps.  But if you are using
>>> correlation or overlap a shift may be needed.
>>> 
>>>   I am unpleasantly surprised by your low success rate getting the
>>> best fit in 5000 or 50000 searched positions.  My experience has
>>> been that it is found in a few hundred positions.  I will need to
>>> try your examples.  I'll report in a later email what I find.
>>> 
>>>   Tom
>>> 
>>> 
>>> On Apr 22, 2013, at 3:37 AM, ingvar  wrote:
>>> 
>>>> Dear Chimera,
>>>> 
>>>> I am seeking guidance on what would be best practice in doing 
>>>> volume-volume fitting, and what expectations I should have for the 
>>>> number of trial conformations.  For instance, would it be beneficial 
>>>> to resample one of the volumes to the same grid as the other volume?  
>>>> What about volumes with different resolutions, does this require any 
>>>> special handling?
>>>> In the case below it would probably be more efficient to just align 
>>>> the principal moments of inertia of the envelopes above the contour 
>>>> level and then do a search from there, though this will only work 
>>>> well if the volumes are very similar.
>>>> 
>>>> In an initial study I used 3 EM volumes from EMDB, (EMD-5500, 
>>>> EMD-5501, and EMD-2017).  They are all 30S E. coli ribosomes of 
>>>> similar resolution (12.9, 14.0, 13.5 Å).  I thought that this would 
>>>> be a relatively easy starting point, with the intent to then go on 
>>>> to fit 30S subunit in some of the many E. coli 70S EM volumes that 
>>>> are available.
>>>> The volumes EMD-5500 and EMD-5501 are in similar orientation, while 
>>>> EMD-2017 is in a completely different orientation.
>>>> The grids are similar but not identical, 125^3 vs 128^3.  I 
>>>> adjusted the contour levels from the EMDB recommended values to make 
>>>> the enclosed volumes more similar in size.
>>>> 
>>>> Contour levels used:
>>>> EMD-5500 -2.8 -> -2.5
>>>> EMD-5501 -2.8
>>>> EMD-2017 39 -> 32
>>>> 
>>>> I am using scripts like the one shown here:
>>>> 
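>>>> from chimera import runCommand as rc
>>>> from chimera.tkgui import saveReplyLog
>>>> # Reference map (#0), shown at its adjusted contour level
>>>> rc("open data/EMD-5500.map")
>>>> rc("volume #0 level -2.5 transparency 0.5")
>>>> # Map to be fitted (#1)
>>>> rc("open data/EMD-5501.map")
>>>> rc("volume #1 level -2.8 transparency 0.5")
>>>> # Fit #1 in #0 from 50000 random placements, scoring by correlation
>>>> # about mean (cam) on the points inside the #1 contour only
>>>> rc("fitmap #1 #0 search 50000 metric cam envelope true inside 0.2")
>>>> # Write the fit results from the reply log to a file
>>>> saveReplyLog(r'/Users/ingvar/chimera/log/fit5500_5501_loc.txt')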
>>>> 
>>>> It seems correlation about mean is the best metric for 
>>>> volume-volume fitting, and that it is best to only use points inside 
>>>> the envelope.
>>>> 
>>>> The maps EMD-5500, and EMD-5501 have the unusual feature that the 
>>>> density range is shifted downwards so that the average is well below 
>>>> 0.  I was concerned about that and moved the density range, but with 
>>>> the fitmap parameters above that did not seem to have any 
>>>> significant impact (with other sets of parameters the density range 
>>>> appears to be an issue).
>>>> 
>>>> In this case, it is relatively easy to see when you have found the 
>>>> "good" fit: all other fits have clearly worse statistics.
>>>> 
>>>> What I was surprised about was the number of trial conformations 
>>>> needed to be reasonably certain to find the "good" fit.
>>>> 
>>>> 5500 in 5501 7 times in 5000
>>>> 5501 in 5500 2 times in 5000
>>>> 5500 in 2017 0 times in 5000
>>>> 5500 in 2017 8 times in 50000
>>>> 2017 in 5501 0 times in 5000
>>>> 
>>>> Many Thanks,
>>>> Ingvar Lagerstedt
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Chimera-users mailing list
>>>> Chimera-users at cgl.ucsf.edu
>>>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
>>>> 