Saturday, June 18, 2011

Standard errors and why we are here!

Link to Youtube

The whole point of statistics is to use information from a sample to estimate a parameter in a population. Intuitively, wouldn't you agree that the larger the sample, the more accurate the estimate of the parameter? If we have a sample where n>=30, we can make use of the Central Limit Theorem to state that the sampling distribution of our favourite statistic, xbar, follows an approximately normal distribution (the website I just linked to has a very nice simulation).
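If you'd rather see the Central Limit Theorem at work on your own machine, here is a minimal simulation sketch. It is not the simulation from the linked website. The population (an exponential distribution with mean 1, chosen because it is clearly skewed), the number of repeated samples and the seed are all my own assumptions for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable (arbitrary choice)

SAMPLE_SIZE = 30       # n, matching the n >= 30 rule of thumb
NUM_SAMPLES = 10_000   # how many samples we draw (arbitrary choice)


def draw_sample_mean(n):
    """Draw n values from a skewed population (exponential, mean 1) and return xbar."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# One xbar per sample: together they approximate the sampling distribution of xbar.
sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# In a normal distribution roughly 68% of values lie within one
# standard deviation of the centre, so we check how close we get.
share_within_one_sd = (
    sum(abs(m - centre) <= spread for m in sample_means) / NUM_SAMPLES
)

print(f"mean of the sample means:        {centre:.3f}")
print(f"share within one std. deviation: {share_within_one_sd:.3f}")
```

Even though the individual values come from a lopsided population, the sample means should pile up around the population mean of 1, with a share within one standard deviation close to the 68% you would expect from a normal curve.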

The sampling distribution of xbar has its own mean and standard deviation. To avoid mixing up the standard deviation of the population and the standard deviation of the sampling distribution of xbar, we call the latter one the standard error. We find it by dividing the population standard deviation (sigma) by the square root of the sample size (n), like this:

SE = sigma / sqrt(n)


This is an incredibly important little equation which we'll see lots of times. Can you see that as the sample size increases, the SE (standard error) decreases? 
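To watch the standard error shrink as the sample grows, here is a small sketch. The function name `standard_error` and the list of sample sizes are my own choices for illustration, and sigma = 4000 is simply the population standard deviation from the worked example in this post.

```python
import math


def standard_error(sigma, n):
    """Standard error of xbar: population standard deviation over sqrt(n)."""
    return sigma / math.sqrt(n)


SIGMA = 4000  # population standard deviation used in this post's example

# Each time n is multiplied by 4, the standard error is cut in half.
for n in (30, 100, 400, 1600):
    print(f"n = {n:>4}  SE = {standard_error(SIGMA, n):8.2f}")
```

Notice the square root at work: you need four times the sample size to halve the standard error.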

All this has a highly practical use for estimating a population mean, which is what we are trying to do in the first place. To start off, let's just imagine that we happen to know the population mean, and it is $51800 (this is from the EAI dataset). The population standard deviation is $4000. We draw a sample of size n = 30. The standard error is

SE = 4000 / sqrt(30) = 730.3 (approximately)


Let's say xbar, the sample mean from our sample of 30, is $52300. What is the probability that the sample mean is within $500 of the population mean? We can use Excel for this: recall the function =normdist?
We want within $500 of the population mean, so that is from 51800 - 500 = 51300 to 51800 + 500 = 52300. Draw a little sketch of the normal curve, showing that we are looking for the area of the bit shaded in red. This will be the probability that the sample mean is within plus or minus $500 of the population mean.


We need to do two calculations in Excel, and then subtract one from the other to find the area in between, which is also the probability (of course!). Go

=normdist(52300,51800,730.3,true) and =normdist(51300,51800,730.3,true). You should end up with 0.7532 - 0.2468 = 0.5064. The meaning of this result: there is roughly a 50/50 chance (about 51%) that the sample mean is within $500 of the population mean. Not too hopeful, is it?
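If you don't have Excel handy, the same two-step calculation can be sketched in Python. This is just one possible equivalent, using `NormalDist` from the standard library (Python 3.8 or later) in place of =normdist. The variable names are my own. The code uses the unrounded standard error rather than 730.3, so the last decimal place may differ slightly from the Excel figures.

```python
import math
from statistics import NormalDist

POP_MEAN = 51800   # population mean from the example
POP_SD = 4000      # population standard deviation
N = 30             # sample size
MARGIN = 500       # "within $500" of the population mean

se = POP_SD / math.sqrt(N)  # standard error, about 730.3

# Sampling distribution of xbar: normal, centred on the population
# mean, with the standard error as its standard deviation.
sampling_dist = NormalDist(mu=POP_MEAN, sigma=se)

# Cumulative probabilities, like =normdist(x, mean, se, true) in Excel.
upper = sampling_dist.cdf(POP_MEAN + MARGIN)
lower = sampling_dist.cdf(POP_MEAN - MARGIN)

# The area between the two cut-offs is the probability we want.
prob_within = upper - lower

print(f"upper = {upper:.4f}, lower = {lower:.4f}")
print(f"P(within ${MARGIN} of the mean) = {prob_within:.4f}")
```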

Now try the same thing all over again, BUT increase the sample size to n = 100. What happens? Think through what exactly is going on here. Here's my output...but please do it yourself! And take a look at the Youtube on this:
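Once you have had a go yourself, a short script is a handy way to check your answer and to try other sample sizes. This is only a sketch: the function name `prob_within_margin` and the extra sample sizes beyond 30 and 100 are my own additions, and it again relies on `NormalDist` from Python's standard library instead of Excel.

```python
import math
from statistics import NormalDist


def prob_within_margin(n, pop_mean=51800, pop_sd=4000, margin=500):
    """Probability that xbar from a sample of size n lands within
    `margin` of the population mean, assuming xbar is normally distributed."""
    se = pop_sd / math.sqrt(n)              # standard error for this n
    dist = NormalDist(mu=pop_mean, sigma=se)
    return dist.cdf(pop_mean + margin) - dist.cdf(pop_mean - margin)


# Compare the original n = 30 with larger samples.
for n in (30, 100, 400):
    print(f"n = {n:>3}  P(within $500) = {prob_within_margin(n):.4f}")
```

Compare the line for n = 100 with your own Excel result, and think about why the probability moves the way it does when the standard error changes.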


                 

                 
