Distribution of proportion and error

In a random sample of size *n* from a categorical population with probability π
of success, the **number** of successes, *x*,
has a binomial
distribution,

$$x \sim \text{binomial}(n, \pi)$$

The **sample proportion**, *p* = *x* / *n*,
has a distribution with the same shape; only the scale changes, by a factor of 1/*n*.
From the properties
of the binomial distribution, the distribution of *p* has mean and standard deviation

$$\mu_p = \pi, \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
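The mean and standard deviation of the sample proportion can be checked numerically. The sketch below sums over the exact binomial probabilities and compares the result with the success probability (written π here) and with sqrt(π(1 - π)/n). The values `n = 36` and `pi = 0.4` and the helper names are illustrative assumptions, not taken from the text.

```python
from math import comb, sqrt

def binomial_pmf(k, n, pi):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def proportion_mean_sd(n, pi):
    """Mean and standard deviation of p = x / n, computed from the binomial pmf."""
    values = [k / n for k in range(n + 1)]
    probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    variance = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, sqrt(variance)

# Hypothetical values chosen only for illustration.
n, pi = 36, 0.4
mean, sd = proportion_mean_sd(n, pi)
formula_sd = sqrt(pi * (1 - pi) / n)

print(f"mean of p = {mean:.6f} (pi = {pi})")
print(f"sd of p   = {sd:.6f} (formula gives {formula_sd:.6f})")
```

The two printed pairs should agree up to floating-point rounding.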

Bias and standard error

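When the proportion *p* is used to estimate π,
the estimation error is *p* - π.
The error distribution therefore has the same shape as that of *p*,
but is shifted to have mean zero. The bias and standard error of the sample
proportion are therefore

$$\text{bias} = \mu_{\text{error}} = 0, \qquad \text{standard error} = \sigma_{\text{error}} = \sqrt{\frac{\pi(1-\pi)}{n}}$$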

Standard error from data

Unfortunately, the standard error of *p*
involves π,
and this is unknown in practical problems. To get a numerical value for the
standard error, we must therefore replace π
with our best estimate of its value, *p*.
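As a minimal sketch of this substitution, the function below plugs the sample proportion into the standard error formula in place of the unknown probability. The function name is a hypothetical choice; the counts 17 and 36 are those of the rice survey described in the next section.

```python
from math import sqrt

def estimated_standard_error(x, n):
    """Standard error of the sample proportion, with p = x / n used in place of pi."""
    p = x / n
    return sqrt(p * (1 - p) / n)

# 17 of the 36 farmers in the rice survey used 'Old' varieties.
se = estimated_standard_error(17, 36)
print(f"estimated standard error = {se:.3f}")  # about 0.083
```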

Rice survey

In the rice survey, a proportion *p* = 17/36 = 0.472
of the *n* = 36 farmers used 'Old' varieties. The number
using 'Old' varieties should have a binomial distribution,

$$x \sim \text{binomial}(n = 36, \pi)$$

The diagram below initially shows this distribution with π
replaced by our best estimate, *p* = 0.472.

Use the pop-up menu to display the (approximate) distributions of the sample
proportion, *p*, and the estimation error. Observe that all three
distributions have the same basic shape -- only the scale on the axis changes.

We estimated that a proportion 0.472 of farmers
in the region use 'Old' varieties. From the error distribution, it is
unlikely that this estimate is in error by more than 0.2.
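The claim about an error of more than 0.2 can be checked directly. The sketch below assumes, as above, that the unknown probability is replaced by the estimate 17/36, and adds up the binomial probabilities of all counts whose sample proportion lies more than 0.2 from that value. The variable and function names are illustrative.

```python
from math import comb

n = 36
p_hat = 17 / 36  # best estimate of the unknown probability of success

def binomial_pmf(k, n, pi):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

# Total probability of counts k whose proportion k / n is more than 0.2 from p_hat.
prob_large_error = sum(
    binomial_pmf(k, n, p_hat)
    for k in range(n + 1)
    if abs(k / n - p_hat) > 0.2
)
print(f"P(|error| > 0.2) = {prob_large_error:.4f}")
```

The resulting probability is small, which is what "unlikely" means here.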

Normal approximation to the error distribution

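If the sample size, *n*, is large enough, the
binomial distribution is approximately normal, so we have the approximation

$$\text{error} = p - \pi \;\sim\; \text{approximately normal}\left(0, \; \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$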

You will see later that it is often easier to
use this normal approximation than the binomial distribution.
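To illustrate why the normal approximation is convenient, the sketch below uses it to approximate the chance that the rice survey estimate is in error by more than 0.2. It assumes the error is normal with mean zero and standard deviation equal to the standard error estimated from the data (17 successes in 36); `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

n = 36
p_hat = 17 / 36

# Standard error with the unknown probability replaced by p_hat.
se = sqrt(p_hat * (1 - p_hat) / n)

# Approximate error distribution: normal with mean 0 and standard deviation se.
error_dist = NormalDist(mu=0.0, sigma=se)

# Probability that the error is below -0.2 or above +0.2.
prob_large_error = error_dist.cdf(-0.2) + (1 - error_dist.cdf(0.2))
print(f"approximate P(|error| > 0.2) = {prob_large_error:.4f}")
```

No summation over binomial probabilities is needed; two evaluations of the normal cumulative distribution function are enough.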

Closeness of the normal approximation

The diagram below shows the binomial distribution for the errors in simulations with probability π of success (red) and its normal approximation (grey).

Use the sliders to verify that

- The normal approximation improves as *n* increases, whatever the value of π.
- The normal approximation is best when π is close to 0.5.

The normal approximation to the error distribution is therefore reasonable provided the sample size is reasonably large and π is not close to 0 or 1. (We will give better guidelines later.)
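The two observations can also be explored numerically. The sketch below measures the largest gap between the binomial cumulative probabilities and their normal approximation for a few sample sizes and success probabilities. The continuity correction of half a count, the particular grid of values, and the function name `max_cdf_gap` are assumptions made for this illustration and do not come from the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, pi):
    """Largest gap between the binomial CDF and its normal approximation.

    The comparison is made on the scale of the count x. The error
    p - pi is a linear rescaling of x, so the gap is the same for it.
    """
    normal = NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi)))
    cumulative = 0.0
    worst = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * pi**k * (1 - pi) ** (n - k)
        # Continuity correction: compare P(x <= k) with the normal CDF at k + 0.5.
        worst = max(worst, abs(cumulative - normal.cdf(k + 0.5)))
    return worst

# Illustrative grid of sample sizes and success probabilities.
for pi in (0.05, 0.2, 0.5):
    gaps = ", ".join(f"n={n}: {max_cdf_gap(n, pi):.4f}" for n in (10, 50, 400))
    print(f"pi = {pi}: {gaps}")
```

Smaller gaps indicate a closer approximation; under these assumptions the gaps should shrink as *n* grows and be smallest for probabilities near 0.5.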