[Chimera-users] R-factor

Wed May 11 09:39:21 PDT 2011

Hi Ryo,

   Chimera does not calculate the real-space R-factor.  The real-space
R-factor defined for crystallographic maps in

     Branden C. and Jones A., Nature 343 687-689 (1990)

is

     RSRF = sum(|d_o - d_c|) / sum(|d_o + d_c|)

where

     d_o is the observed (experimental) density
     d_c is the calculated density from the atomic model.

The sum is over grid points in the d_c map, probably using values of
d_o interpolated at those same points.  They also compute RSRF per
residue, and I'm not clear what grid points they use in that case --
maybe just the atom center positions.
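
A minimal Python sketch of the RSRF formula above, assuming both maps have already been sampled at the same grid points and flattened into equal-length sequences. The function name rsrf and the sample values are made up for illustration; this is not a Chimera function.

```python
def rsrf(d_o, d_c):
    """Real-space R-factor for two equal-length sequences of density samples.

    d_o: observed density values, d_c: calculated density values,
    both taken at the same grid points (an assumption of this sketch).
    """
    if len(d_o) != len(d_c):
        raise ValueError("maps must be sampled at the same grid points")
    numerator = sum(abs(o - c) for o, c in zip(d_o, d_c))
    denominator = sum(abs(o + c) for o, c in zip(d_o, d_c))
    if denominator == 0:
        raise ZeroDivisionError("sum(|d_o + d_c|) is zero")
    return numerator / denominator


# Made-up sample values, only to show the call.
observed = [0.9, 2.1, 3.0, 1.8]
calculated = [1.0, 2.0, 3.0, 2.0]
print(rsrf(observed, calculated))
```

Identical maps give 0, and larger values mean a worse fit.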

   This has some immediate problems when applied to EM maps and fit
models.  First, the observed and calculated density maps need to have
the same normalization.  If the experimental density values range from
-5000 to 10000 and the calculated ones from 0.001 to 0.01 then obviously
you get nonsense.  The next problem is that the experimental density
values from single-particle EM reconstructions are often negative in
parts of the map.  You can see from the formula above that this can
cause havoc.  If the experimental density at just one grid point is
close to the negative of the calculated density, that point adds about
2*|d_c| to the numerator and almost nothing to the denominator, so it
makes an outsized contribution to RSRF.  X-ray maps also have many
negative density values, but their magnitudes seem to be smaller.
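
Both problems can be seen with a few made-up numbers. The sketch below repeats a hypothetical rsrf helper so it runs on its own; the density values are invented for illustration and do not come from any real map.

```python
def rsrf(d_o, d_c):
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|)."""
    numerator = sum(abs(o - c) for o, c in zip(d_o, d_c))
    denominator = sum(abs(o + c) for o, c in zip(d_o, d_c))
    return numerator / denominator


d_c = [1.0, 2.0, 3.0, 2.0, 1.0]          # made-up calculated densities

# Problem 1: normalization.  d_scaled is exactly proportional to d_c
# (a perfect fit up to scale), yet RSRF is close to 1.
d_scaled = [1000.0 * c for c in d_c]
r_scaled = rsrf(d_scaled, d_c)

# Problem 2: negative density.  Both maps below differ from d_c by 6 at
# the middle grid point, but in one the observed value is -d_c there.
d_over = [1.0, 2.0, 9.0, 2.0, 1.0]       # error of +6, same sign as d_c
d_flip = [1.0, 2.0, -3.0, 2.0, 1.0]      # error of -6, observed = -d_c
r_over = rsrf(d_over, d_c)
r_flip = rsrf(d_flip, d_c)

print(r_scaled, r_over, r_flip)
```

In this example the sign-flipped point doubles RSRF compared with an error of the same size in the positive direction, because that point contributes nothing to the denominator.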

   The idea behind RSRF is to judge the fit by looking at the size of
the difference map values d_o - d_c relative to the size of the values
in the observed and calculated maps.  The standard cross-correlation
coefficient does something very similar and does it better, I think.
Here's how.  Consider the sum of the squares of the residuals over all
the grid points, normalized by the product of the root-sum-squares of
the densities in the experimental and calculated maps

     E = sum((d_o - d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))

This has the same problem described above that the maps may have 
different normalizations.  So put a scale factor f in front of the 
calculated map d_c

     E = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))

and choose the scale factor f > 0 so that E is minimized.  In other
words, we scale the calculated map to minimize the error between
experimental and calculated maps.  Expanding the numerator and setting
dE/df = 0 shows that

     f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2))

and then

     E = 2 * ( 1 - CCC )

where

     CCC = sum(d_o * d_c) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))

is just the normal cross-correlation coefficient (without mean values 
being subtracted).
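
The relation E = 2 * (1 - CCC) can be checked numerically. The sketch below is a plain-Python illustration with made-up density values on very different scales; the function names ccc, scaled_error and optimal_scale are hypothetical and are not part of Chimera.

```python
import math


def ccc(d_o, d_c):
    """Cross-correlation coefficient without mean subtraction."""
    cross = sum(o * c for o, c in zip(d_o, d_c))
    norm_o = math.sqrt(sum(o * o for o in d_o))
    norm_c = math.sqrt(sum(c * c for c in d_c))
    return cross / (norm_o * norm_c)


def scaled_error(d_o, d_c, f):
    """Normalized squared residual E with scale factor f applied to d_c."""
    residual = sum((o - f * c) ** 2 for o, c in zip(d_o, d_c))
    norm_o = math.sqrt(sum(o * o for o in d_o))
    norm_fc = math.sqrt(sum((f * c) ** 2 for c in d_c))
    return residual / (norm_o * norm_fc)


def optimal_scale(d_o, d_c):
    """Scale factor f that minimizes E over f > 0."""
    return math.sqrt(sum(o * o for o in d_o)) / math.sqrt(sum(c * c for c in d_c))


# Made-up values: observed map on a scale of hundreds with some negative
# points, calculated map on a scale of thousandths.
d_o = [-120.0, 340.0, 910.0, 450.0, -60.0]
d_c = [0.001, 0.004, 0.009, 0.005, 0.002]

f = optimal_scale(d_o, d_c)
e_min = scaled_error(d_o, d_c, f)

print(e_min, 2.0 * (1.0 - ccc(d_o, d_c)))   # the two values should agree
print(scaled_error(d_o, d_c, 0.5 * f))      # larger than e_min
print(scaled_error(d_o, d_c, 2.0 * f))      # larger than e_min
```

Note that CCC itself does not change when d_c is multiplied by a positive constant, so the scale factor never has to be computed in practice.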

   So the standard cross-correlation coefficient is a direct and 
sensible measure of residual error.

   If you can give me a sound reason why another measure of residual 
error is useful, I'll be happy to add it to Chimera.

     Tom

> Hi Chimera staff,
>
> To evaluate how well the atomic model fits the EM map, can I calculate the real-space R-factor with Chimera?
>
> Ryo