Distribution of proportion and error
In a random sample of size n from a categorical population with probability π of success, the number of successes, x, has a binomial distribution,
$$x \sim \mathrm{binomial}(n, \pi)$$
The sample proportion, p = x / n, has a distribution with the same shape; it is simply rescaled by a factor 1/n. From the properties of the binomial distribution, p has mean and standard deviation
$$\mu_p = \pi, \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
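The mean π and standard deviation sqrt(π(1 - π)/n) of the sample proportion can be checked numerically. The following is a minimal sketch that builds the distribution of p directly from the binomial probabilities; the function names and the values n = 20, π = 0.3 are illustrative assumptions, not taken from the text.

```python
from math import comb, sqrt

def proportion_distribution(n, pi):
    """Return the possible values of p = x / n and their binomial probabilities."""
    values = [x / n for x in range(n + 1)]
    probs = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]
    return values, probs

def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete distribution."""
    mean = sum(v * q for v, q in zip(values, probs))
    var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, sqrt(var)

n, pi = 20, 0.3  # illustrative values only (assumption)
values, probs = proportion_distribution(n, pi)
mean_p, sd_p = mean_and_sd(values, probs)

print("mean of p:", mean_p, "formula:", pi)
print("sd of p:  ", sd_p, "formula:", sqrt(pi * (1 - pi) / n))
```

The two printed pairs should agree up to floating-point rounding, whatever values of n and π are tried.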
Bias and standard error
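When the proportion p is used to estimate π, the estimation error is p - π. The error distribution therefore has the same shape as that of p, but is shifted to have mean zero. The bias and standard error of the sample proportion are therefore
$$\text{bias} = \mu_{\text{error}} = 0, \qquad \text{standard error} = \sigma_{\text{error}} = \sqrt{\frac{\pi(1-\pi)}{n}}$$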
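A small simulation can illustrate that the errors p - π average out to zero (no bias) and have standard deviation sqrt(π(1 - π)/n). This is only a sketch: the values n = 50 and π = 0.4, the number of simulated samples, and the helper name `simulate_errors` are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_errors(n, pi, n_samples, seed=0):
    """Simulate estimation errors p - pi from repeated random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(n_samples):
        x = sum(rng.random() < pi for _ in range(n))  # number of successes
        errors.append(x / n - pi)
    return errors

n, pi = 50, 0.4  # illustrative values only (assumption)
errors = simulate_errors(n, pi, n_samples=20000)

bias_estimate = mean(errors)        # should be close to 0
se_estimate = pstdev(errors)        # should be close to the formula below
se_theory = sqrt(pi * (1 - pi) / n)

print("simulated bias:", bias_estimate)
print("simulated standard error:", se_estimate, "formula:", se_theory)
```

The simulated values will not match the theory exactly, but the gap should shrink as the number of simulated samples grows.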
Standard error from data
Unfortunately, the standard error of p involves π, and this is unknown in practical problems. To get a numerical value for the standard error, we must therefore replace π with our best estimate of its value, p.
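As a sketch of this plug-in calculation, the function below replaces π by p in the standard error formula. The function name is hypothetical; the counts used (17 successes out of 36) are those of the rice survey discussed in this section.

```python
from math import sqrt

def estimated_standard_error(x, n):
    """Standard error of the sample proportion with pi replaced by p = x / n."""
    p = x / n
    return sqrt(p * (1 - p) / n)

# Rice survey: 17 of the 36 farmers used 'Old' varieties.
se = estimated_standard_error(17, 36)
print(f"estimated standard error: {se:.3f}")  # about 0.083
```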
In the rice survey, a proportion p = 17/36 = 0.472 of the n = 36 farmers used 'Old' varieties. The number using 'Old' varieties should have a binomial distribution,
$$x \sim \mathrm{binomial}(n = 36, \pi)$$
The diagram below initially shows this distribution with π replaced by our best estimate, p = 0.472.
Use the pop-up menu to display the (approximate) distributions of the sample proportion, p, and the estimation error. Observe that all three distributions have the same basic shape -- only the scale on the axis changes.
We estimated that a proportion 0.472 of farmers in the region use 'Old' varieties. From the error distribution, it is unlikely that this estimate will be in error by more than 0.2.
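The claim that an error larger than 0.2 is unlikely can be checked by adding up binomial probabilities. The sketch below assumes, as in the text, that π is replaced by the estimate p = 17/36; the function name `prob_error_exceeds` is hypothetical.

```python
from math import comb

def prob_error_exceeds(n, pi, limit):
    """P(|x/n - pi| > limit) when x has a binomial(n, pi) distribution."""
    total = 0.0
    for x in range(n + 1):
        if abs(x / n - pi) > limit:
            total += comb(n, x) * pi**x * (1 - pi)**(n - x)
    return total

n, x_observed = 36, 17
p_hat = x_observed / n  # best estimate of pi, about 0.472

prob_large_error = prob_error_exceeds(n, p_hat, limit=0.2)
print(f"P(|error| > 0.2) is roughly {prob_large_error:.3f}")
```

With these numbers, an error beyond 0.2 corresponds to 9 or fewer, or 25 or more, of the 36 farmers using 'Old' varieties, and the resulting probability is small.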
Normal approximation to the error distribution
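If the sample size, n, is large enough, the binomial distribution is approximately normal, so the error is approximately normal with mean 0 and standard deviation equal to the standard error:
$$\text{error} = p - \pi \;\overset{\text{approx}}{\sim}\; \mathrm{normal}\left(0,\ \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$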
You will see later that it is often easier to use this normal approximation than the binomial distribution.
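To illustrate how the normal approximation is used, the sketch below computes the chance of an error larger than 0.2 in the rice survey in two ways: from a normal distribution with mean 0 and the estimated standard error, and from the binomial distribution directly. It assumes π is replaced by p = 17/36 and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from math import comb, sqrt
from statistics import NormalDist

n, x_observed = 36, 17
p_hat = x_observed / n                 # stands in for the unknown pi
se = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error

# Normal approximation: error ~ normal(0, se), two tails beyond +/- 0.2.
error_dist = NormalDist(mu=0.0, sigma=se)
normal_prob = 2 * error_dist.cdf(-0.2)

# Binomial calculation for comparison.
exact_prob = sum(
    comb(n, x) * p_hat**x * (1 - p_hat)**(n - x)
    for x in range(n + 1)
    if abs(x / n - p_hat) > 0.2
)

print(f"normal approximation: {normal_prob:.3f}")
print(f"binomial calculation: {exact_prob:.3f}")
```

The two answers are not identical, since this version applies no continuity correction, but both indicate that such a large error is unlikely.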
Closeness of the normal approximation
The diagram below shows the binomial distribution for the errors in simulations with probability π of success (red) and its normal approximation (grey).
Use the sliders to verify that
- The normal approximation improves as n increases, whatever the value of π.
- The normal approximation is best when π is close to 0.5.
The normal approximation to the error distribution is therefore reasonable provided the sample size is reasonably large and π is not close to 0 or 1. (We will give better guidelines later.)
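The same two observations can be explored without the sliders. The sketch below measures the largest gap between the binomial cumulative probabilities and their normal approximation. The function `max_cdf_gap`, the use of a continuity correction, and the chosen values of n and π are assumptions made for illustration. The gap is computed for the count x; rescaling to the error p - π does not change it.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, pi):
    """Largest gap between the binomial CDF and its normal approximation.

    Uses a continuity correction: P(X <= x) is compared with the normal
    probability below x + 0.5.
    """
    normal = NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi)))
    gap, cumulative = 0.0, 0.0
    for x in range(n + 1):
        cumulative += comb(n, x) * pi**x * (1 - pi)**(n - x)
        gap = max(gap, abs(cumulative - normal.cdf(x + 0.5)))
    return gap

# Illustrative settings only (assumption).
for n, pi in [(20, 0.1), (200, 0.1), (20, 0.5)]:
    print(f"n = {n:3d}, pi = {pi}: largest gap = {max_cdf_gap(n, pi):.4f}")
```

For these settings the gap shrinks when n grows from 20 to 200 with π = 0.1, and at n = 20 it is smaller for π = 0.5 than for π = 0.1.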