[Chimera-users] R-factor
Tom Goddard
goddard at sonic.net
Thu May 12 10:29:45 PDT 2011
Hi Ryo,
Adding more parameters to your atomic model will improve the fit to
the map, but it may give an answer further from the truth. I don't have
suggestions about how to evaluate this. At a crude level, if my map only
has N bits of information, I probably should not fit with more than N
bits of parameters. Perhaps there is a way to evaluate that. But I
don't think over-fitting is the major source of wrong models. I suspect
the main problem is that the parameters you choose (e.g. your 4 domains)
may be bad parameters: the molecule doesn't really have the
flexibility defined by those parameters.
Tom
> Hi Tom,
>
> Thank you for your quick response.
> As you suggested, maximizing the CCC is an efficient way to minimize the sum of squared residuals.
> Simple rigid-body fitting works well when evaluated by CCC.
>
> Recent improvements in EM map resolution have allowed us to further refine a fitted atomic model (crystal structure) by dividing it into several domains or by flexible MD fitting.
> I have now tried to fit the crystal structure to the EM map, but simple rigid-body fitting of one monomer does not work well. So I divided the monomer into four sub-domains, which appear to fit better, as judged by the increased cross-correlation.
> However, does the four-domain fit really fit better than rigid-body fitting of the whole monomer?
> Because adding parameters can only decrease (never increase) the sum of squared residuals, fitting four rigid bodies (four-domain fitting) must give a sum of squared residuals no larger than fitting one rigid body (rigid-body monomer fitting).
> I therefore think that this type of model fitting needs some other criterion or correction to avoid over-fitting the model (e.g. Rfree in crystallography, or AIC (Akaike's Information Criterion)).
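Ryo's AIC suggestion can be sketched for least-squares fitting, but only under assumptions this thread does not establish. The sketch assumes the residuals at different sample points are independent and Gaussian, in which case AIC = n*ln(RSS/n) + 2k up to an additive constant, and it counts k as 6 rigid-body degrees of freedom (3 rotations, 3 translations) per domain. In a real EM map neighbouring voxels are strongly correlated, so n should be an effective number of independent samples rather than the voxel count. The values of n_eff and both RSS values below are hypothetical placeholders, not measurements, and the function names are illustrative only.

```python
import math


def rigid_body_params(n_bodies: int) -> int:
    """Degrees of freedom for n independent rigid bodies (3 rotations + 3 translations each)."""
    return 6 * n_bodies


def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit, ASSUMING independent Gaussian residuals.

    Returns n*ln(RSS/n) + 2k, i.e. AIC up to an additive constant that is the
    same for every model fitted to the same n samples.
    """
    return n * math.log(rss / n) + 2 * k


# HYPOTHETICAL inputs, for illustration only.
n_eff = 500       # assumed number of independent density samples
rss_one = 40.0    # assumed RSS for one rigid body (whole monomer)
rss_four = 36.0   # assumed RSS for four rigid domains

aic_one = aic_least_squares(rss_one, n_eff, rigid_body_params(1))
aic_four = aic_least_squares(rss_four, n_eff, rigid_body_params(4))

# The model with the lower AIC is preferred; the 2k term penalizes
# the 18 extra parameters of the four-domain fit.
print(f"AIC one body: {aic_one:.1f}   AIC four domains: {aic_four:.1f}")
```

With these made-up numbers the four-domain fit has the lower AIC; a smaller drop in RSS would let the 2k penalty win instead.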
>
> Do you have any opinion about that?
>
> Ryo
>
> On 2011/05/12, at 1:39, Tom Goddard wrote:
>
>> Hi Ryo,
>>
>> Chimera does not calculate the real-space R-factor. The real space R-factor defined for crystallographic maps in
>>
>> Branden C. and Jones A., Nature 343 687-689 (1990)
>>
>> is
>>
>> RSRF = sum(|d_o - d_c|) / sum(|d_o + d_c|)
>>
>> where
>>
>> d_o is the observed (experimental) density
>> d_c is the calculated density from the atomic model.
>>
>> and the sum is over the grid points of the d_c map, probably using d_o values interpolated at exactly those points. They also compute RSRF per residue, and I'm not clear which grid points they use in that case; maybe just the atom center positions.
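For concreteness, the RSRF formula above can be written as a short Python function. This is a sketch, not Chimera code (Chimera does not compute this measure); it assumes the two maps have already been sampled at the same points, e.g. d_o interpolated at the d_c grid points, and the toy values are made up for illustration.

```python
from typing import Sequence


def rsrf(d_o: Sequence[float], d_c: Sequence[float]) -> float:
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|).

    d_o and d_c must hold density values sampled at the same points
    (e.g. observed density interpolated at the calculated map's grid points).
    """
    if len(d_o) != len(d_c):
        raise ValueError("d_o and d_c must be sampled at the same points")
    num = sum(abs(o - c) for o, c in zip(d_o, d_c))
    den = sum(abs(o + c) for o, c in zip(d_o, d_c))
    return num / den


# Hypothetical toy densities, not real map data.
observed = [0.0, 1.0, 2.0, 1.0, 0.0]
calculated = [0.1, 0.9, 1.8, 1.1, 0.0]
print(rsrf(observed, calculated))
```

Identical inputs give 0, and the value grows as the two maps diverge.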
>>
>> This has some immediate problems when applied to EM maps and fit models. First, you need the observed and calculated density maps to have the same normalization. If the experimental density values range from -5000 to 10000 and the calculated ones from 0.001 to 0.01, then obviously you get nonsense. The next problem is that the experimental density values from single-particle EM reconstructions are often negative in parts of the map. You can see from the formula above that this can cause havoc: if the experimental density at just one grid point is close to the negative of the calculated density, that point adds a large term to the numerator but almost nothing to the denominator, so it makes a huge contribution to RSRF. X-ray maps also have many negative density values, but their magnitudes seem to be smaller.
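Both problems are easy to reproduce with toy numbers. The sketch below uses hypothetical values, not real map data, and the same RSRF formula: scaling a perfectly shaped model map by 1e-3 drives RSRF from 0 to nearly 1, and changing one observed value so that d_o is close to -d_c raises RSRF from 0 to about 0.33 on a five-point map.

```python
def rsrf(d_o, d_c):
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|)."""
    num = sum(abs(o - c) for o, c in zip(d_o, d_c))
    den = sum(abs(o + c) for o, c in zip(d_o, d_c))
    return num / den


# Hypothetical toy densities (not real data): the model shape matches exactly.
observed = [0.0, 1.0, 2.0, 1.0, 0.0]
calculated = [0.0, 1.0, 2.0, 1.0, 0.0]

# 1. Normalization: same shape, different scale.
perfect = rsrf(observed, calculated)
rescaled = rsrf(observed, [1e-3 * c for c in calculated])
print(perfect, rescaled)  # rescaled is close to 1 despite an identical shape

# 2. Negative experimental density: one point where d_o is close to -d_c.
observed_neg = [0.0, 1.0, 2.0, -0.99, 0.0]
one_bad_point = rsrf(observed_neg, calculated)
print(one_bad_point)
```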
>>
>> The idea behind RSRF is to judge the fit by looking at the size of the difference map values d_o - d_c relative to the size of the values in the observed and calculated maps. The standard cross-correlation coefficient does something very similar and, I think, does it better. Here's how. Consider the sum of the squares of the residuals over all the grid points, and normalize it by the square roots of the sums of squares of the densities in the experimental and calculated maps
>>
>> E = sum((d_o - d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
>>
>> This has the same problem described above that the maps may have different normalizations. So put a scale factor f in front of the calculated map d_c
>>
>> E = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))
>>
>> and choose the scale factor f so that E is minimized. In other words, we scale the calculated map to minimize the error between experimental and calculated maps. It is easy to show that
>>
>> f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2))
>>
>> and then
>>
>> E = 2 * ( 1 - CCC )
>>
>> where
>>
>> CCC = sum(d_o * d_c) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
>>
>> is just the normal cross-correlation coefficient (without mean values being subtracted).
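The "easy to show" step can be filled in as follows, writing A, B and C for the three sums (shorthand introduced here, not in the original message) and taking f > 0.

```latex
Let $A = \sum d_o^2$, $B = \sum d_c^2$, $C = \sum d_o d_c$, and $f > 0$. Then
\begin{align*}
E(f) &= \frac{\sum (d_o - f d_c)^2}{\sqrt{A}\, f\sqrt{B}}
      = \frac{A - 2fC + f^2 B}{f\sqrt{AB}}
      = \frac{A/f + fB - 2C}{\sqrt{AB}}, \\
\frac{dE}{df} &= \frac{-A/f^2 + B}{\sqrt{AB}} = 0
  \quad\Longrightarrow\quad
  f = \sqrt{A/B} = \frac{\sqrt{\sum d_o^2}}{\sqrt{\sum d_c^2}}, \\
E\bigl(\sqrt{A/B}\bigr) &= \frac{\sqrt{AB} + \sqrt{AB} - 2C}{\sqrt{AB}}
  = 2\Bigl(1 - \frac{C}{\sqrt{AB}}\Bigr) = 2\,(1 - \mathrm{CCC}).
\end{align*}
Since $d^2E/df^2 = 2A/(f^3\sqrt{AB}) > 0$, this stationary point is the minimum.
```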
>>
>> So the standard cross-correlation coefficient is a direct and sensible measure of residual error.
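A quick numerical check of that result, using made-up toy values rather than real maps: the scale factor from the formula above should give E = 2(1 - CCC), and any other f should give a larger E. The function names are illustrative only, not Chimera API.

```python
import math


def sum_sq(values):
    """Sum of squares of a sequence of densities."""
    return sum(v * v for v in values)


def error_e(d_o, d_c, f):
    """E(f) = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))."""
    num = sum((o - f * c) ** 2 for o, c in zip(d_o, d_c))
    den = math.sqrt(sum_sq(d_o)) * math.sqrt(sum_sq([f * c for c in d_c]))
    return num / den


def best_scale(d_o, d_c):
    """Scale factor f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2)) that minimizes E."""
    return math.sqrt(sum_sq(d_o)) / math.sqrt(sum_sq(d_c))


def ccc(d_o, d_c):
    """Cross-correlation coefficient without mean subtraction."""
    cross = sum(o * c for o, c in zip(d_o, d_c))
    return cross / (math.sqrt(sum_sq(d_o)) * math.sqrt(sum_sq(d_c)))


# Hypothetical toy maps on very different scales; observed includes negative values.
observed = [-0.2, 0.5, 3.0, 4.0, 1.5, -0.1]
calculated = [0.0, 0.001, 0.004, 0.005, 0.001, 0.0]

f = best_scale(observed, calculated)
e_min = error_e(observed, calculated, f)
print(f"f = {f:.1f}  E(f) = {e_min:.6f}  2*(1 - CCC) = {2 * (1 - ccc(observed, calculated)):.6f}")

# Moving f away from the optimum in either direction increases E.
for k in (0.5, 0.9, 1.1, 2.0):
    print(f"E({k} * f) = {error_e(observed, calculated, k * f):.6f}")
```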
>>
>> If you can give me a sound reason why another measure of residual error is useful, I'll be happy to add it to Chimera.
>>
>> Tom
>>
>>
>>> Hi Chimera staff,
>>>
>>> To evaluate the validity of fitting the atomic model into the EM map, can I calculate the real-space R-factor with Chimera?
>>>
>>> Ryo
>>> _______________________________________________
>>> Chimera-users mailing list
>>> Chimera-users at cgl.ucsf.edu
>>> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users
>>>
>>
>