[Chimera-users] R-factor

Thu May 12 18:19:09 PDT 2011

Hi Tom,

Thank you for your response.
I'm sorry, I'm not sure I understand exactly what you mean. If my interpretation is correct, are you saying that you disapprove of multi-domain fitting or flexible fitting of atomic models into EM maps?

In my case, two crystal structures have already been reported, and inter-domain movement among 4 sub-domains was indeed observed. However, since neither crystal structure could be fitted well into my EM map, I divided each into 4 sub-domains, fitted each sub-domain separately and rigidly into my EM map, and finally got a good fit (using the Fit in Map tool with CCC). I fitted the two crystal structures independently and got the same result, suggesting that my model might be close to the truth. In fact, each of the 15 helix densities, which can be clearly and separately observed in my EM map, fits well with the corresponding helix of the atomic model.

In your words, my EM map clearly separates the densities of the 4 sub-domains, so my map has at least 4 bits of information. Furthermore, since my EM map clearly separates the densities of 15 helices, my map has 15 bits of information in this sense. Thus, we can choose the number of parameters from 1 to 15: 1 means a rigid-body fit of the whole atomic model, and 15 means dividing the atomic model into 15 helices and fitting each one separately and rigidly into the EM map. My question is: can fits with 1 to 15 parameters simply be compared by the sum of squared residuals or the CCC, given that increasing the number of parameters always decreases the sum of squared residuals?
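A toy illustration of the behaviour described in the question, in plain Python rather than Chimera. The hypothetical function `rss_piecewise_constant` fits pure random noise with k equal segments, each replaced by its own constant, so splitting into more independently fitted pieces loosely mimics splitting a model into more rigid bodies. Because every finer (nested) split can reproduce the coarser fit, the sum of squared residuals never goes up, even though the data contain no real structure.

```python
import random

def rss_piecewise_constant(y, k):
    """Fit y with k equal-length segments, each replaced by its own mean
    (the least-squares constant), and return the sum of squared residuals."""
    n = len(y)
    total = 0.0
    for s in range(k):
        seg = y[s * n // k:(s + 1) * n // k]
        mean = sum(seg) / len(seg)
        total += sum((v - mean) ** 2 for v in seg)
    return total

random.seed(0)
# Pure noise: there is no real structure for extra segments to find.
y = [random.gauss(0.0, 1.0) for _ in range(64)]
ks = (1, 2, 4, 8, 16, 32, 64)  # each split refines the previous one
rss_values = [rss_piecewise_constant(y, k) for k in ks]
for k, rss in zip(ks, rss_values):
    print(k, round(rss, 3))
```

With 64 segments every point gets its own constant and the residual is exactly zero, which is plainly over-fitting rather than a better model.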

Ryo

On 2011/05/13, at 2:29, Tom Goddard wrote:

> Hi Ryo,
> 
>  Adding more parameters to your atomic model will improve the fit to the map, but it may give an answer further from the truth.  I don't have suggestions about how to evaluate this.  At a crude level, if my map only has N bits of information, I probably should not fit with more than N bits of parameters.  Perhaps there is a way to evaluate that.  But I don't think over-fitting is the major source of wrong models.  I suspect the main problem is that the parameters you choose (e.g. your 4 domains) may be bad parameters -- the molecule doesn't really have the flexibility defined by those parameters.
> 
>    Tom
> 
>> Hi Tom,
>> 
>> Thank you for your quick response.
>> As you explained, maximizing the CCC is an effective way to minimize the sum of squared residuals.
>> Simple rigid-body fitting works well when evaluated by CCC.
>> 
>> Recent improvements in EM map resolution have allowed us to further refine fitted atomic models (crystal structures) by dividing them into several domains or by flexible MD fitting.
>> Now I have tried to fit a crystal structure into my EM map, but simple rigid-body fitting of one monomer does not work well.  So I divided the monomer into four sub-domains, which seem to fit better, as shown by the increased cross-correlation.
>> However, does the four-domain fit really fit better than the rigid-body monomer fit?
>> Because adding parameters never increases the sum of squared residuals, the fit with four parameters (four-domain fitting) will have a sum of squared residuals at least as low as the fit with one parameter (rigid-body monomer fitting).
>> I therefore think that this type of model fitting may need some other criterion or correction to avoid over-fitting the model (e.g. Rfree in crystallography, or AIC, Akaike's Information Criterion).
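A minimal sketch of the AIC idea mentioned above, assuming independent Gaussian residuals with unknown variance, in which case AIC = n ln(RSS/n) + 2k up to an additive constant (lower is better). Two loud assumptions: each rigid body is counted as 6 parameters (3 rotations + 3 translations), and `n_eff`, the number of independent map samples, is a hypothetical input, because neighbouring EM map voxels are strongly correlated and the raw voxel count would overstate n. The function names are illustrative, and the RSS values in the usage line are made-up numbers, not results.

```python
import math

def aic_least_squares(rss, n_eff, n_params):
    """AIC for a least-squares fit with independent Gaussian residuals
    (additive constants dropped): n*ln(RSS/n) + 2k."""
    return n_eff * math.log(rss / n_eff) + 2 * n_params

def compare_rigid_body_fits(rss_by_bodies, n_eff):
    """rss_by_bodies: {number of rigid bodies: sum of squared residuals}.
    Assumes 6 degrees of freedom (3 rotations, 3 translations) per body.
    n_eff: assumed number of independent map samples (hypothetical)."""
    scores = {bodies: aic_least_squares(rss, n_eff, 6 * bodies)
              for bodies, rss in rss_by_bodies.items()}
    best = min(scores, key=scores.get)
    return scores, best

# Made-up numbers for illustration only.
scores, best = compare_rigid_body_fits({1: 120.0, 4: 80.0, 15: 78.0}, n_eff=500)
print(best)  # 4 for these made-up numbers
```

Here the 15-body fit has the lowest RSS but pays for its 90 parameters, so the 4-body fit scores best. Rescaling the map multiplies every RSS by the same factor, which shifts all AIC values equally and does not change the ranking.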
>> 
>> Do you have any opinion about that?
>> 
>> Ryo
>> 
>> On 2011/05/12, at 1:39, Tom Goddard wrote:
>> 
>>> Hi Ryo,
>>> 
>>>  Chimera does not calculate the real-space R-factor.  The real space R-factor defined for crystallographic maps in
>>> 
>>>    Branden C. and Jones A., Nature 343 687-689 (1990)
>>> 
>>> is
>>> 
>>>    RSRF = sum(|d_o - d_c|) / sum(|d_o + d_c|)
>>> 
>>> where
>>> 
>>>    d_o is the observed (experimental) density
>>>    d_c is the calculated density from the atomic model.
>>> 
>>> and the sum is over the grid points of the d_c map, probably using d_o values interpolated at those same points.  They also compute RSRF per residue, and I'm not clear what grid points they use in that case -- maybe just the atom center positions.
>>> 
>>>  This has some immediate problems when applied to EM maps and fit models.  First you need the observed and calculated density maps to have the same normalization.  If the experimental density values range from -5000 to 10000 and the calculated ones from 0.001 to 0.01 then obviously you get nonsense.  The next problem is that the experimental density values from single-particle EM reconstructions are often negative in parts of the map.  You can see from the formula above that can cause havoc.  If experimental density at just one grid point is close to being the negative of the calculated density it will make a huge contribution to RSRF.  X-ray maps also have many negative density values, but their magnitudes seem to be less.
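A minimal sketch of the Branden and Jones formula in plain Python, operating on lists of densities already sampled at the same grid points (the interpolation step is assumed to happen elsewhere; `rsrf` is an illustrative name, not a Chimera function). The made-up values reproduce the two problems described above: the same shape on a different scale gives an RSRF near 1, and a single negated density point hurts far more than a point with no density at all.

```python
def rsrf(d_obs, d_calc):
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|).
    d_obs and d_calc must be sampled at the same grid points."""
    if len(d_obs) != len(d_calc):
        raise ValueError("maps must be sampled at the same grid points")
    num = sum(abs(o - c) for o, c in zip(d_obs, d_calc))
    den = sum(abs(o + c) for o, c in zip(d_obs, d_calc))
    return num / den

calc = [0.2, 0.5, 1.0, 0.5, 0.2]                   # made-up calculated density
perfect = rsrf(calc, calc)                         # identical maps
rescaled = rsrf([1000 * v for v in calc], calc)    # same shape, different units
missing = rsrf([0.2, 0.5, 0.0, 0.5, 0.2], calc)    # no density at one point
negated = rsrf([0.2, 0.5, -1.0, 0.5, 0.2], calc)   # negated density at one point
print(perfect, rescaled, missing, negated)
```

The rescaled map scores 999/1001 (about 0.998) despite a perfect shape match, and the negated point gives 2/2.8 (about 0.71) versus 1/3.8 (about 0.26) for the missing one, because it adds to the numerator while cancelling its own term in the denominator.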
>>> 
>>>  The idea behind RSRF is to judge the fit by looking at the size of the difference-map values d_o - d_c relative to the size of the values in the observed and calculated maps.  The standard cross-correlation coefficient does something very similar and, I think, does it better.  Here's how.  Consider the sum of the squares of the residuals over all the grid points and normalize by the square roots of the sums of squares of the densities in the experimental and calculated maps
>>> 
>>>    E = sum((d_o - d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
>>> 
>>> This has the same problem described above that the maps may have different normalizations.  So put a scale factor f in front of the calculated map d_c
>>> 
>>>    E = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))
>>> 
>>> and choose the scale factor f so that E is minimized.  In other words, we scale the calculated map to minimize the error between experimental and calculated maps.  It is easy to show that
>>> 
>>>    f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2))
>>> 
>>> and then
>>> 
>>>    E = 2 * ( 1 - CCC )
>>> 
>>> where
>>> 
>>>    CCC = sum(d_o * d_c) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
>>> 
>>> is just the normal cross-correlation coefficient (without mean values being subtracted).
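The step described as easy to show can be written out as follows. The shorthand A, B, C is introduced here only for brevity, sums run over the same grid points, and f > 0 is assumed.

```latex
A = \sum d_o^2, \qquad B = \sum d_c^2, \qquad C = \sum d_o d_c, \qquad
\mathrm{CCC} = \frac{C}{\sqrt{A}\,\sqrt{B}}

E(f) = \frac{\sum (d_o - f d_c)^2}{\sqrt{A}\,\sqrt{f^2 B}}
     = \frac{A - 2fC + f^2 B}{f\sqrt{A}\,\sqrt{B}}
     = \frac{\sqrt{A}}{f\sqrt{B}} - 2\,\mathrm{CCC} + \frac{f\sqrt{B}}{\sqrt{A}}

\frac{dE}{df} = -\frac{\sqrt{A}}{f^2\sqrt{B}} + \frac{\sqrt{B}}{\sqrt{A}} = 0
\;\Longrightarrow\; f = \frac{\sqrt{A}}{\sqrt{B}},
\qquad \frac{d^2E}{df^2} = \frac{2\sqrt{A}}{f^3\sqrt{B}} > 0

E\!\left(\frac{\sqrt{A}}{\sqrt{B}}\right) = 1 - 2\,\mathrm{CCC} + 1 = 2\,(1 - \mathrm{CCC})
```

Since the minimized E decreases as CCC increases, minimizing this scaled residual and maximizing CCC select the same fit.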
>>> 
>>>  So the standard cross-correlation coefficient is a direct and sensible measure of residual error.
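A quick numerical check of that identity in plain Python, using made-up density values on wildly different scales (one negative experimental value included). At f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2)) the scaled residual E equals 2*(1 - CCC), and nudging f either way makes E larger. The function names are illustrative, not Chimera functions.

```python
import math

def normalized_residual(d_o, d_c, f):
    """E(f) = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))"""
    num = sum((o - f * c) ** 2 for o, c in zip(d_o, d_c))
    den = math.sqrt(sum(o * o for o in d_o)) * math.sqrt(sum((f * c) ** 2 for c in d_c))
    return num / den

def ccc(d_o, d_c):
    """Cross-correlation coefficient without mean subtraction."""
    num = sum(o * c for o, c in zip(d_o, d_c))
    return num / (math.sqrt(sum(o * o for o in d_o)) * math.sqrt(sum(c * c for c in d_c)))

d_o = [-3.0, 10.0, 250.0, 900.0, 410.0, 20.0]     # made-up "experimental" values
d_c = [0.0, 0.001, 0.004, 0.010, 0.005, 0.0005]   # made-up "calculated" values

f_best = math.sqrt(sum(o * o for o in d_o)) / math.sqrt(sum(c * c for c in d_c))
e_best = normalized_residual(d_o, d_c, f_best)
print(e_best, 2 * (1 - ccc(d_o, d_c)))            # the two numbers agree
print(normalized_residual(d_o, d_c, 0.9 * f_best) > e_best)  # True
print(normalized_residual(d_o, d_c, 1.1 * f_best) > e_best)  # True
```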
>>> 
>>>  If you can give me a sound reason why another measure of residual error is useful, I'll be happy to add it to Chimera.
>>> 
>>>    Tom
>>> 
>>> 
>>>> Hi Chimera staff,
>>>> 
>>>> To evaluate the validity of the fit of an atomic model into an EM map, can I calculate the real-space R-factor with Chimera?
>>>> 
>>>> Ryo
>>>> _______________________________________________
>>>> Chimera-users mailing list
>>>> Chimera-users at cgl.ucsf.edu
>>>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
>>>> 
>>> 
>> 