
Lecture 4
Sample Size

In this cyberlecture, I'd like to outline a few of the important concepts relating to sample size. Generally, larger samples are good, and this is the case for a number of reasons. So, I'm going to try to show this in several different ways.

Bigger is Better
1. The first reason to understand why a large sample size is beneficial is simple. Larger samples more closely approximate the population. Because the primary goal of inferential statistics is to generalize from a sample to a population, it is less of an inference if the sample size is large.

2. A second reason is kind of the opposite. Small samples are bad. Why? If we pick a small sample, we run a greater risk of the small sample being unusual just by chance. Choosing 5 people to represent the entire U.S., even if they are chosen completely at random, will often result in a sample that is very unrepresentative of the population. Imagine how easy it would be to, just by chance, select 5 Republicans and no Democrats, for instance.

Let's take this point a little further. If there is an increased probability of one small sample being unusual, that means that if we were to draw many small samples, as when a sampling distribution is created (see the second lecture), unusual samples are more frequent. Consequently, there is greater sampling variability with small samples. This figure is another way to illustrate this:

[Figure: two sampling distributions of the mean, one for small size samples and one for large size samples]

Note: this is a dramatization to illustrate the effect of sample sizes. The curves depicted here are fictitious, in order to protect the innocent, and may or may not represent real statistical sampling curves. A more realistic depiction can be found on p. 163.

In the curve with the "small size samples," notice that there are fewer samples with means around the middle value, and more samples with means out at the extremes. Both the right and left tails of the distribution are "fatter." In the curve with the "large size samples," notice that there are more samples with means around the middle (and therefore closer to the population value), and fewer with sample means at the extremes. The differences in the curves represent differences in the standard deviation of the sampling distribution--smaller samples tend to have larger standard errors and larger samples tend to have smaller standard errors.
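You can watch those fatter tails appear with a quick simulation. This is just a sketch in Python, with a made-up population (mean 50, standard deviation 10): it draws many samples of size 5 and many of size 100, and compares how much the sample means bounce around in each case.

```python
import random
import statistics

random.seed(1)  # make the simulation repeatable

def sd_of_sample_means(n, reps=2000, mu=50, sigma=10):
    """Draw `reps` samples of size n from a normal population and
    return the standard deviation of their means (an estimate of
    the standard error of the mean)."""
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

small = sd_of_sample_means(5)    # roughly 10 / sqrt(5), about 4.5
large = sd_of_sample_means(100)  # roughly 10 / sqrt(100), about 1.0
print(small, large)
```

The means of the size-5 samples spread out several times more than the means of the size-100 samples, which is exactly the difference between the two curves in the figure.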

3. This point about standard errors can be illustrated a different way. One statistical test is designed to see if a single sample mean is different from a population mean. A version of this test is the t-test for a single mean. The purpose of this t-test is to see if there is a significant difference between the sample mean and the population mean. The t-test formula looks like this:

t = (x̄ − μ) / s_x̄

The t-test formula (also found on p. 161 of the Daniel text) has two main components. First, it takes into account how large the difference between the sample and the population mean is by finding the difference between them (x̄ − μ). When the sample mean is far from the population mean, the difference will be large. Second, the t-test formula divides this quantity by the standard error (symbolized by s_x̄). By dividing by the standard error, we are taking into account sampling variability. Only if the difference between the sample and population means is large relative to the amount of sampling variability will we consider the difference to be "statistically significant". When sampling variability is high (i.e., the standard error is large), the difference between the sample mean and the population mean may not seem so big.

Concept                                                      Mathematical Representation
-----------------------------------------------------------  ---------------------------
distance of the sample mean from the population mean         x̄ − μ
representation of sampling variability                       s_x̄
ratio of the distance from the population mean relative
to the sampling variability                                  t

Now, back to sample size... As we saw in the figure with the curves above, the standard error (which represents the amount of sampling variability) is larger when the sample size is small and smaller when the sample size is large. So, when the sample size is small, it can be difficult to see a difference between the sample mean and the population mean, because there is too much sampling variability messing things up. If the sample size is large, it is easier to see a difference between the sample mean and population mean because the sampling variability is not obscuring the difference. (Kinda nifty how we get from an abstract concept to a formula, huh? I took years of math, but until I took a statistics course, I didn't realize the numbers and symbols in formulas really signified anything.)
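Here is the formula put to work as a minimal Python sketch (the data values and the population mean of 4.8 are made up for illustration). Notice what happens when we feed in more data with the same mean and spread: the standard error shrinks, so t grows.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (sample mean - population mean) / standard error."""
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # s / sqrt(n)
    return (xbar - mu) / se

data = [5.1, 4.9, 5.3, 5.0, 5.2]    # hypothetical measurements, mean 5.1
print(one_sample_t(data, 4.8))      # n = 5
print(one_sample_t(data * 4, 4.8))  # same values repeated, n = 20: larger t
```

Same distance from the population mean, same spread in the data, but the larger sample produces a bigger t, just because the sampling variability in the denominator went down.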

4. Another reason why bigger is better is that the value of the standard error is directly dependent on the sample size. This is really the same reason given in #2 above, but I'll show it a different way. To calculate the standard error, we divide the standard deviation by the sample size (actually, there is a square root in there).

s_x̄ = s / √n

In this equation, s_x̄ is the standard error, s is the standard deviation, and n is the sample size. If we were to plug in different values for n (try some hypothetical numbers if you want!), using just one value for s, the standard error would be smaller for larger values of n, and the standard error would be larger for smaller values of n.

5. There is a rule that someone came up with (someone who had a vastly superior brain to the population average) that states that if sample sizes are large enough, a sampling distribution will be normally distributed (remember that a normal distribution has special characteristics; see p. 107 in the Daniel text; an approximately normally distributed curve is also depicted by the large sample size curve in the figure above). This is called the central limit theorem. If we know that the sampling distribution is normally distributed, we can make better inferences about the population from the sample. The sampling distribution will be normal, given sufficient sample size, regardless of the shape of the population distribution.
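The central limit theorem is easy to watch in action. This sketch (with a made-up setup) draws sample means from an exponential population, which is strongly right-skewed and nothing like a normal curve; the sample means nevertheless pile up symmetrically around the population mean, with a spread close to σ/√n.

```python
import random
import statistics

random.seed(42)  # repeatable simulation

# Population: exponential with mean 1 -- heavily right-skewed, not normal.
n, reps = 100, 2000
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(statistics.mean(means))   # close to the population mean of 1
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 1 / 10
```

Plot a histogram of `means` and you get a nice bell shape, even though a histogram of the raw population values would be a steep slide down from zero.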

6. Finally, the last reason I can think of right now why bigger is better is that larger sample sizes give us more power. Remember that in the previous lecture power was defined as the probability of retaining the alternative hypothesis when the alternative hypothesis is actually true in the population. That is, if we can increase our chances of correctly choosing the alternative hypothesis in our sample, we have more power. If the sample size is large, we will have a smaller standard error, and as described in #3 and #4 above, we are more likely to find significance with a lower standard error.

Do I seem like I am repeating myself? Probably. Part of the reason is that it is important to try to explain these concepts in several different ways, but it is also because, in statistics, everything is interrelated.

How Big Should My Sample Be?
This is a good question to ask, and it is frequently asked. Unfortunately, there is not a really simple answer. It depends on the type of statistical test one is conducting. It also depends on how precise your measures are and how well designed your study is. So, it just depends. I often hear a general recommendation that there be about 15 or more participants in each group when conducting a t-test or ANOVA. Don't worry, we'll return to this question later.

