Distribution of proportion and error

In a random sample of size n from a categorical population with probability pi of success, the number of successes, x, has a binomial distribution,

$$x \sim \text{Binomial}(n, \pi)$$

The sample proportion, p = x / n, has a distribution with the same shape as that of x; it is simply rescaled by the factor 1/n. Since x has mean $n\pi$ and standard deviation $\sqrt{n\pi(1-\pi)}$, dividing by n gives p the mean and standard deviation

$$\text{mean}(p) = \pi$$

$$\text{sd}(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$$
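
The two formulas can be checked numerically by building the exact distribution of p from the binomial probabilities and computing its mean and standard deviation directly. This is a minimal sketch: the helper names and the values n = 36, pi = 0.3 are illustrative choices, not from the text.

```python
from math import comb, sqrt

def proportion_distribution(n, pi):
    """Exact distribution of p = x/n when x ~ Binomial(n, pi), as (p, probability) pairs."""
    return [(x / n, comb(n, x) * pi**x * (1 - pi) ** (n - x)) for x in range(n + 1)]

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution given as (value, prob) pairs."""
    mean = sum(v * prob for v, prob in dist)
    var = sum((v - mean) ** 2 * prob for v, prob in dist)
    return mean, sqrt(var)

n, pi = 36, 0.3  # illustrative values (assumption)
mean_p, sd_p = mean_and_sd(proportion_distribution(n, pi))
print(mean_p, sd_p)                    # computed from the binomial probabilities
print(pi, sqrt(pi * (1 - pi) / n))     # mean(p) = pi, sd(p) = sqrt(pi(1 - pi)/n)
```

Both lines print the same pair of numbers, up to floating-point rounding.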

Bias and standard error

When the proportion p is used to estimate pi, the estimation error is p - pi. The error distribution has the same shape as that of p, but is shifted to have mean zero. The bias and standard error of the sample proportion are therefore

$$\text{bias}(p) = \text{mean}(p - \pi) = 0$$

$$\text{se}(p) = \text{sd}(p - \pi) = \sqrt{\frac{\pi(1-\pi)}{n}}$$
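
A simulation gives another way to see these two results. Draw many samples of size n, record the error p - pi for each one, and then compare the average error with 0 and the spread of the errors with the formula. This is a sketch under assumed settings: the function name, the values n = 36 and pi = 0.3, the number of repetitions and the seed are all illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def simulate_errors(n, pi, reps, seed=1):
    """Simulate the estimation error p - pi over many random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    errors = []
    for _ in range(reps):
        x = sum(rng.random() < pi for _ in range(n))  # successes in one sample
        errors.append(x / n - pi)
    return errors

n, pi = 36, 0.3  # illustrative values (assumption)
errors = simulate_errors(n, pi, reps=20_000)
print(fmean(errors))                            # close to the bias, 0
print(pstdev(errors), sqrt(pi * (1 - pi) / n))  # simulated vs theoretical standard error
```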

Standard error from data

Unfortunately, the standard error of p involves pi, and this is unknown in practical problems. To get a numerical value for the standard error, we must therefore replace pi with our best estimate of its value, p.

$$\text{se}(p) \approx \sqrt{\frac{p(1-p)}{n}}$$
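
In code, the estimated standard error needs only the count of successes and the sample size. The function name below is hypothetical. The counts 17 out of 36 are taken from the rice survey in the next section.

```python
from math import sqrt

def estimated_se(x, n):
    """Standard error of p = x/n, with the unknown pi replaced by p itself."""
    p = x / n
    return sqrt(p * (1 - p) / n)

print(round(estimated_se(17, 36), 4))  # → 0.0832
```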

Rice survey

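In the rice survey, a proportion p = 17/36 = 0.472 of the n = 36 farmers used 'Old' varieties. The number of these farmers using 'Old' varieties, x, should have a binomial distribution,

$$x \sim \text{Binomial}(36, \pi)$$

where pi is the proportion of all farmers in the region who use 'Old' varieties.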

The diagram below initially shows this distribution with pi replaced by our best estimate, p = 0.472.

Use the pop-up menu to display the (approximate) distributions of the sample proportion, p, and the estimation error. All three distributions have the same basic shape; only the scale on the axis changes.
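
The shared shape comes from the fact that each value of x carries one probability, and that same probability attaches to p = x/n and to the error p - pi. The sketch below tabulates the three scales side by side for the rice survey, with pi replaced by p = 17/36 as in the text. The helper name and the printed range of x are illustrative choices.

```python
from math import comb

def binomial_pmf(n, pi):
    """P(X = x) for x = 0..n when X ~ Binomial(n, pi)."""
    return [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

n = 36
p_hat = 17 / 36  # best estimate, used in place of the unknown pi
probs = binomial_pmf(n, p_hat)

# Same probability for x, for p = x/n (rescaled by 1/n) and for the error (shifted by -pi)
rows = [(x, x / n, x / n - p_hat, prob) for x, prob in enumerate(probs)]
for x, p, err, prob in rows[14:21]:
    print(f"x={x:2d}  p={p:.3f}  error={err:+.3f}  prob={prob:.4f}")
```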

We estimated that a proportion 0.472 of farmers in the region use 'Old' varieties. The error distribution shows that this estimate is unlikely to be in error by more than 0.2. (The estimated standard error is about 0.083, so an error of 0.2 would be roughly 2.4 standard errors.)
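
The word "unlikely" can be made concrete by adding up the binomial probabilities of all outcomes whose error exceeds 0.2 in size. This is a sketch that assumes, as the text does, that pi can be replaced by the estimate 17/36. Since the true pi is unknown, the result is itself only an approximation.

```python
from math import comb

n = 36
p_hat = 17 / 36  # estimate standing in for the unknown pi

# P(|x/n - p_hat| > 0.2) under Binomial(n, p_hat)
prob_big_error = sum(
    comb(n, x) * p_hat**x * (1 - p_hat) ** (n - x)
    for x in range(n + 1)
    if abs(x / n - p_hat) > 0.2
)
print(round(prob_big_error, 4))
```

Only the outcomes x ≤ 9 and x ≥ 25 contribute, since these are the counts whose proportion lies more than 0.2 from 17/36.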

Normal approximation to the error distribution

If the sample size, n, is large enough, the binomial distribution is approximately normal, so we have the approximation

$$p - \pi \;\approx\; \text{normal}\left(\text{mean} = 0,\ \text{sd} = \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$

You will see later that it is often easier to use this normal approximation than the binomial distribution.
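
As a small illustration of why the normal approximation is convenient, the chance of an error larger than 0.2 in the rice survey can be approximated with two normal CDF evaluations instead of a sum over binomial terms. This sketch uses Python's `statistics.NormalDist` and again assumes that pi is replaced by the estimate 17/36 when computing the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

n = 36
p_hat = 17 / 36
se = sqrt(p_hat * (1 - p_hat) / n)  # sd of the approximate normal error distribution

error_dist = NormalDist(mu=0, sigma=se)
# P(error < -0.2) + P(error > 0.2)
approx = error_dist.cdf(-0.2) + (1 - error_dist.cdf(0.2))
print(round(approx, 4))
```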

Closeness of the normal approximation

The diagram below shows the error distribution obtained from the binomial distribution with probability pi of success (red), together with its normal approximation (grey).

Use the sliders to verify that

  • The normal approximation improves as n increases, whatever the value of pi.
  • The normal approximation is best when pi is close to 0.5.

The normal approximation to the error distribution is therefore adequate provided the sample size is reasonably large and pi is not close to 0 or 1. (We will give better guidelines later.)
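
The slider observations can also be checked numerically. The sketch below measures closeness as the largest gap between the binomial CDF and the normal CDF, with the normal CDF evaluated at x + 0.5 (a continuity correction). This measure is an assumption; it is one convenient choice, not the one used in the diagram. The gap is the same on the x, p and error scales, because these differ only by rescaling and shifting. The function name and the grid of n and pi values are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, pi):
    """Largest gap between the Binomial(n, pi) CDF and its continuity-corrected normal approximation."""
    normal = NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi)))
    cdf, gap = 0.0, 0.0
    for x in range(n + 1):
        cdf += comb(n, x) * pi**x * (1 - pi) ** (n - x)
        gap = max(gap, abs(cdf - normal.cdf(x + 0.5)))
    return gap

for pi in (0.05, 0.2, 0.5):
    print(pi, [round(max_cdf_gap(n, pi), 4) for n in (10, 40, 160)])
```

Reading across each row shows the gap shrinking as n grows, and reading down a column shows it is smallest for pi = 0.5.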