A Bit More on Survey Sample Size Design
Professor Ron Fricker
Naval Postgraduate School
Monterey, California
1/28/13
Some Basic Considerations
• Previously gave sample size expressions for means, totals, and proportions
• Survey usually a combination of Likert and binary (yes/no) questions
  – If so, use proportions calculation to estimate required sample size
  – Most conservative (i.e., gives the largest sample size)
• If sample size expression involves standard deviations, also estimate conservatively
  – I.e., use larger estimates vice smaller
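A minimal sketch of why the proportions calculation is the conservative choice. It assumes the standard SRS expression n = N·p·q / ((N − 1)·B²/4 + p·q), with B the bound on the error of estimation; that expression is referred to but not reproduced on these slides, and the function name and example numbers are hypothetical.

```python
import math


def sample_size_proportion(N, B, p=0.5):
    """SRS sample size needed to estimate a proportion to within bound B.

    Assumes n = N*p*q / ((N - 1)*B^2/4 + p*q). The default p = 0.5
    maximizes p*q and therefore gives the largest (most conservative) n.
    """
    D = B ** 2 / 4.0
    pq = p * (1.0 - p)
    return math.ceil(N * pq / ((N - 1) * D + pq))


# Hypothetical population of 10,000 and a bound of 5 percentage points.
conservative = sample_size_proportion(N=10_000, B=0.05)       # p = 0.5
less_variable = sample_size_proportion(N=10_000, B=0.05, p=0.2)
print(conservative, less_variable)
```

Any prior guess for p other than 0.5 yields a smaller required n, which is why defaulting to 0.5 is safe when nothing is known about the answers.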
For Clustered Designs
• Clustering is almost always used because of budget restrictions
  – Implicit sampling frame → area sampling → face-to-face survey mode → expensive
• Basic approach: Within budget constraints, maximize the number of clusters (PSU, SSU, etc.)
  – Spread the sample size across as many clusters as possible
  – That is, you're trying to minimize the effect of within-cluster correlation as much as possible
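To illustrate why spreading a fixed sample over more clusters helps, here is a rough sketch. It relies on Kish's approximation deff ≈ 1 + (cluster size − 1)·ρ, which is not given on these slides; equal-sized clusters, the intraclass correlation ρ = 0.05, and the function names are all assumptions made for illustration.

```python
def cluster_design_effect(cluster_size, rho):
    """Kish's approximate design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * rho


def effective_n(total_n, n_clusters, rho):
    """Equivalent SRS size when total_n is split evenly over n_clusters."""
    cluster_size = total_n / n_clusters
    return total_n / cluster_design_effect(cluster_size, rho)


# Same total sample of 200, spread over more and more clusters.
for k in (5, 10, 20, 40):
    print(k, round(effective_n(200, k, rho=0.05), 1))
```

Under these assumptions the effective sample size rises as the number of clusters grows, approaching the nominal 200 as clusters shrink toward a single respondent each.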
For Stratified Designs
• Two broad reasons to use stratification:
  – Requirement that results for subpopulations have a particular margin of error
      • Generally related to oversampling proportionally smaller or rare groups
  – Homogeneity on the measures of interest within subpopulations
      • Stratification will then result in more precise population estimates
Sample Size Determination & Allocation
• Compared to SRS, calculations more complicated as need to
  – Choose overall sample size $n$, and
  – Allocate sample to strata, $n_1 + n_2 + \dots + n_L = n$
• “Best” allocation depends on
  – Purpose of the stratification
  – Number of sampling units in each stratum
  – Variability of sampling units within each stratum
  – Cost of surveying each sampling unit from each stratum
2/1/13
Sample Size Calculation for Estimating the Mean
• Begin as with SRS, setting
\[ B = 2\sqrt{\mathrm{Var}(\bar{y}_{st})} \]
• Now, define $a_i$ as the proportion of the sample size to allocate to stratum $i$: $n_i = n \times a_i$
• Then,
\[ n = \frac{\sum_{i=1}^{L} N_i^2 \sigma_i^2 / a_i}{N^2 B^2 / 4 + \sum_{i=1}^{L} N_i \sigma_i^2} \]
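One way to see where the expression for n comes from is sketched below. It assumes the stratum variances σ_i² are known and that the variance of the stratified mean carries the usual finite population correction, the same form used in the design effect formula later in these slides.

```latex
\begin{align*}
\mathrm{Var}(\bar{y}_{st})
  &= \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\sigma_i^2}{n_i}
   = \frac{1}{N^2}\left[\frac{1}{n}\sum_{i=1}^{L}\frac{N_i^2\sigma_i^2}{a_i}
     -\sum_{i=1}^{L} N_i\sigma_i^2\right]
   \qquad \text{using } n_i = n\,a_i \\[4pt]
\frac{B^2}{4} = \mathrm{Var}(\bar{y}_{st})
  &\;\Longrightarrow\;
  \frac{N^2B^2}{4}+\sum_{i=1}^{L} N_i\sigma_i^2
   = \frac{1}{n}\sum_{i=1}^{L}\frac{N_i^2\sigma_i^2}{a_i} \\[4pt]
  &\;\Longrightarrow\;
  n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2B^2/4+\sum_{i=1}^{L} N_i\sigma_i^2}
\end{align*}
```

Note that the stratum sizes enter the numerator as N_i² but the denominator as N_i, because the finite population correction term N_i²·(n_i/N_i)·σ_i²/n_i simplifies to N_i·σ_i².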
Proportionate Allocation to Strata
• Sample size within each stratum is proportional to stratum size in the population
• If $N$ is the population size and $n$ is the total sample size, then
\[ n_i / n = N_i / N \]
where
  – $N_i$ is the population size of stratum $i$
  – $n_i$ is the sample size for stratum $i$
• Then, with $a_i = N_i / N$,
\[
n = \frac{\sum_{i=1}^{L} N_i^2 \sigma_i^2 / a_i}{N^2 B^2 / 4 + \sum_{i=1}^{L} N_i \sigma_i^2}
  = \frac{\sum_{i=1}^{L} N_i^2 \sigma_i^2 / (N_i / N)}{N^2 B^2 / 4 + \sum_{i=1}^{L} N_i \sigma_i^2}
  = \frac{\sum_{i=1}^{L} N_i \sigma_i^2}{N B^2 / 4 + \frac{1}{N} \sum_{i=1}^{L} N_i \sigma_i^2}
\]
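The proportionate-allocation calculation can be sketched numerically as follows. The stratum sizes, standard deviations, and bound B are made-up illustration values, the function names are hypothetical, and the general formula used is n = Σ(N_i²σ_i²/a_i) / (N²B²/4 + ΣN_iσ_i²).

```python
import math


def stratified_n(N_i, sigma_i, a_i, B):
    """Total sample size for estimating a mean with allocation fractions a_i."""
    N = sum(N_i)
    numerator = sum(Ni ** 2 * s ** 2 / a for Ni, s, a in zip(N_i, sigma_i, a_i))
    denominator = N ** 2 * B ** 2 / 4.0 + sum(Ni * s ** 2 for Ni, s in zip(N_i, sigma_i))
    return numerator / denominator


def proportional_allocation(N_i):
    """Allocation fractions a_i = N_i / N."""
    N = sum(N_i)
    return [Ni / N for Ni in N_i]


# Hypothetical strata: sizes, guessed standard deviations, and bound B.
N_i = [155, 62, 93]
sigma_i = [5.0, 15.0, 10.0]
B = 2.0

a_i = proportional_allocation(N_i)
n_exact = stratified_n(N_i, sigma_i, a_i, B)
n = math.ceil(n_exact)

# Simplified form that holds only under proportionate allocation.
N = sum(N_i)
S = sum(Ni * s ** 2 for Ni, s in zip(N_i, sigma_i))
n_simplified = S / (N * B ** 2 / 4.0 + S / N)

# Round each stratum up, so the allocated total can slightly exceed n.
n_strata = [math.ceil(n * a) for a in a_i]
print(n, n_strata)
```

Rounding each stratum up is one conservative choice among several; it keeps every stratum at or above its target at the cost of a slightly larger total.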
Other Allocation Schemes
• Rather than allocating sample proportional to strata size, can
  – Allocate according to variability of the strata
      • Idea is to allocate more of the sample to strata that are more variable
      • Done right, can provide most precise population estimates
  – Allocate according to the cost of collection per stratum
      • Idea is to allocate more of the sample to strata that cost less
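The two ideas above can be sketched with the standard textbook allocation fractions, which these slides do not spell out: Neyman allocation takes a_i proportional to N_i·σ_i, and the cost-adjusted version takes a_i proportional to N_i·σ_i/√c_i. The function name, the strata, and the per-unit costs below are hypothetical.

```python
import math


def allocation_fractions(N_i, sigma_i, cost_i=None):
    """Allocation fractions a_i proportional to N_i * sigma_i / sqrt(c_i).

    With cost_i omitted (equal costs) this reduces to Neyman allocation.
    """
    if cost_i is None:
        cost_i = [1.0] * len(N_i)
    weights = [Ni * s / math.sqrt(c) for Ni, s, c in zip(N_i, sigma_i, cost_i)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical strata; the third stratum is the most expensive to survey.
N_i = [155, 62, 93]
sigma_i = [5.0, 15.0, 10.0]
cost_i = [9.0, 9.0, 16.0]

neyman = allocation_fractions(N_i, sigma_i)
with_cost = allocation_fractions(N_i, sigma_i, cost_i)
print([round(a, 3) for a in neyman])
print([round(a, 3) for a in with_cost])
```

In this example the second stratum holds 20% of the population but receives a larger share of the sample because it is the most variable, and the third stratum's share drops once its higher cost is taken into account.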
Design Effect
• The design effect (aka deff) measures how the variance under a complex sampling design, in this case stratified sampling, compares to the variance under SRS
\[
d^2 = \frac{\mathrm{Var}(\bar{y}_{st})}{\mathrm{Var}(\bar{y}_{SRS})}
    = \frac{\frac{1}{N^2} \sum_{i=1}^{L} N_i^2 \left(1 - n_i / N_i\right) s_i^2 / n_i}{\left(1 - n / N\right) s^2 / n}
\]
• Design effect can be greater or less than 1
  – But with reasonably homogeneous strata, almost always get decrease in variance
  – I.e., a deff < 1
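A small sketch of the design effect calculation for a stratified sample. It assumes the ratio of the stratified-mean variance, (1/N²)·Σ N_i²(1 − n_i/N_i)s_i²/n_i, to the SRS variance, (1 − n/N)s²/n; the function names and all numbers are hypothetical.

```python
def var_stratified_mean(N_i, n_i, s2_i):
    """Estimated variance of the stratified sample mean."""
    N = sum(N_i)
    total = sum(
        Ni ** 2 * (1 - ni / Ni) * s2 / ni
        for Ni, ni, s2 in zip(N_i, n_i, s2_i)
    )
    return total / N ** 2


def var_srs_mean(N, n, s2):
    """Estimated variance of the mean under simple random sampling."""
    return (1 - n / N) * s2 / n


def design_effect(N_i, n_i, s2_i, s2_overall):
    """Ratio of stratified variance to SRS variance for the same total n."""
    return var_stratified_mean(N_i, n_i, s2_i) / var_srs_mean(
        sum(N_i), sum(n_i), s2_overall
    )


# Hypothetical strata that are more homogeneous than the population overall.
N_i = [155, 62, 93]
n_i = [35, 14, 21]
s2_i = [25.0, 225.0, 100.0]
s2_overall = 150.0

deff = design_effect(N_i, n_i, s2_i, s2_overall)
print(round(deff, 3))
```

With a single stratum whose variance equals the overall variance, the ratio is exactly 1, which is a quick sanity check on the two variance functions.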
Effective Sample Size
• I like to think about design effects in terms of effective sample size
  – What size SRS would give the same precision as the complex sample design?
• Consider a simple clustered example, with $m$ clusters, $\sum_i n_i = 200$, and $d^2 = 3.13$
  – The effective sample size is $n_{eff} = 200 / 3.13 \approx 64$
  – So we could have done an SRS of size 64 and achieved the same precision
  – Would have meant going to 64/m times as many sites, perhaps unaffordable
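The arithmetic in this example is simply the nominal sample size divided by the design effect. A minimal sketch, with a hypothetical function name:

```python
def effective_sample_size(n, deff):
    """Size of the SRS that would give the same precision as the complex design."""
    return n / deff


n_eff = effective_sample_size(200, 3.13)
print(round(n_eff))
```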