Confidence Intervals
A confidence interval is an interval whose purpose is to estimate a parameter (a number that could, in
theory, be calculated from the population, if measurements were available for the whole population).
A confidence interval has three elements. First there is the interval itself, something like (123, 456).
Second is the confidence level, something like 95%. Third there is the parameter being estimated,
something like the population mean, µ, or the population proportion, p. In order to have a meaningful
statement, you need all three elements: (123, 456) is a 95% confidence interval for µ.
Formulas:
General formula for confidence intervals: estimate ± margin of error
z* is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence
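The z* values above are the points that leave half of the remaining probability in each tail of the standard normal distribution. As a minimal sketch (the helper name `z_star` is made up for illustration), they can be recovered with Python's standard library:

```python
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The t* values used later depend on the degrees of freedom as well, so they are taken from a table or the calculator rather than from this helper.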
CI for a population mean (σ is known, and n > 30 or the variable is normally distributed in the
population):
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
(TI-83: STAT TESTS 7:ZInterval)
CI for a population mean (σ is unknown, and n > 30 or the variable is normally distributed in the
population):
$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
(TI-83: STAT TESTS 8:TInterval)
CI for a population proportion (when $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$):
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
(TI-83: STAT TESTS A:1-PropZInt)
If you don’t know $\hat{p}$, use $\hat{p} = \frac{1}{2}$ (conservative approach).
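The choice of one-half is conservative because the factor p(1 − p) in the proportion formulas is largest there, so the margin of error (and any sample size based on it) cannot be understated. A small sketch that checks this on a grid of candidate values; the function name `variance_term` is only illustrative:

```python
def variance_term(p):
    """The factor p(1 - p) that appears in the proportion formulas."""
    return p * (1 - p)

# Evaluate the factor at 0.00, 0.01, ..., 1.00 and keep the largest.
grid = [k / 100 for k in range(101)]
worst_p = max(grid, key=variance_term)

print(worst_p, variance_term(worst_p))
```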
Minimum required sample size for a desired margin of error m and confidence level:
When it is a mean problem:
$n = \left(\frac{z^* \sigma}{m}\right)^2$
When it is a proportion problem:
$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p})$
Examples:
1. You wish to estimate, with 95% confidence, the proportion of computers that need repairs
or have problems by the time the product is three years old. Your estimate must be
accurate within 3% of the true proportion.
a. If no preliminary estimate is available, find the minimum sample size required.
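If no preliminary estimate is available, use the conservative choice: $\hat{p} = 0.5$.
m = 3% = 0.03
$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{2}{0.03}\right)^2 (0.5)(1-0.5) = 1111.11$
Thus we need at least 1112 computers to sample. (Remember: ALWAYS round up!)
Note that this calculation rounds z* to 2. With z* = 1.96 the formula gives 1067.11, so 1068
computers, which matches Example 5(c).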
b. Now suppose a prior study involving fewer than 100 computers found that 19% of these
computers needed repairs or had problems by the time the product was three years old. Find
the minimum sample size needed.
Now $\hat{p} = 0.19$.
$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{2}{0.03}\right)^2 (0.19)(1-0.19) = 684$
This is a whole number, thus the minimum sample size we need is 684. (Here z* is rounded to 2;
with z* = 1.96 the formula gives 656.9, so 657.)
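When a result lands exactly on a whole number, as it does in part (b), floating-point arithmetic can overshoot by a hair and push the round-up one unit too far. The sketch below, assuming the rounded value z* = 2 used in this example, does the arithmetic with exact fractions; `min_sample_size` is a hypothetical helper name.

```python
from fractions import Fraction
from math import ceil

def min_sample_size(z, m, p_hat):
    """Smallest whole n with n >= (z/m)^2 * p_hat * (1 - p_hat), computed exactly."""
    z, m, p_hat = Fraction(z), Fraction(m), Fraction(p_hat)
    return ceil((z / m) ** 2 * p_hat * (1 - p_hat))

# Strings keep the decimals exact: Fraction("0.03") is 3/100.
n_no_estimate = min_sample_size("2", "0.03", "0.5")     # part (a)
n_prior_study = min_sample_size("2", "0.03", "0.19")    # part (b)
print(n_no_estimate, n_prior_study)
```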
2. A college administrator would like to determine how much time students spend on
homework assignments during a typical week. A questionnaire is sent to a sample of n =
100 students, and their responses indicate a mean of 7.4 hours per week and a standard
deviation of 3 hours.
(a) What is the point estimate of the mean amount of homework for the entire student
population (i.e., what is the point estimate for µ, the unknown population mean)?
The point estimate for the population mean is the sample mean. In this case it’s 7.4 hours.
(b) Now make an interval estimate of the population mean so that you are 95% confident that
the “true” mean is in your interval (i.e., compute the 95% confidence interval).
Conditions: random sample? We don’t really know. n > 30, so we can assume by the CLT
that the shape of the sampling distribution of the sample means is approximately normal.
$\bar{x} = 7.4$ hours and s = 3 hours. The population s.d. is unknown (we only know the sample
s.d.), so we need to use the t-interval.
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.987) or the calculator: 8: TInterval
The 95% confidence interval is (6.8, 8.0).
That means, we are 95% confident that the mean time ALL students spend on homework
assignments during a typical week is between 6.8 hours and 8.0 hours.
(c) Now compute the 99% confidence interval.
Repeating part b with t* = 2.632, we get (6.6, 8.2). That means, we are 99% confident that
the mean time ALL students spend on homework assignments during a typical week is
between 6.6 hours and 8.2 hours.
(d) Compare your answers to “b” and “c”. Which confidence interval is wider, and why? How
is the width of the confidence interval related to the percentage/degree of confidence?
The 99% confidence interval is wider, because a higher confidence level uses a larger t*
(2.632 instead of 1.987) and therefore a larger margin of error. If you widen the interval of
plausible values, you're more sure that the real parameter is in there somewhere.
(e) Now compute the 95% confidence interval again, but assume that n = 50.
Since n is still larger than 30, we can use the t-interval again. (t* = 2.014)
The 95% confidence interval with n = 50 is (6.5, 8.3).
(f) Compare your answers to “b” and “e”. Which confidence interval is wider, and why? How
is the width of the confidence interval related to the size of the sample?
The sample of size 100 gives a narrower confidence interval than the sample of size 50. The larger
your sample size, the more sure you can be that the answers truly reflect the population. This
indicates that for a given confidence level, the larger your sample size, the narrower your confidence
interval. However, the relationship is not linear: the margin of error is proportional to 1/√n, so
doubling the sample size does not halve the width of the confidence interval. Quadrupling the sample
size (times 4) would roughly halve it.
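The three intervals in this example, and the square-root relationship between sample size and width, can be reproduced with a few lines of Python. This is only a sketch: the t* values are copied from the text rather than computed, and `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(mean, s, n, t_star):
    """Return (low, high) for mean +/- t* * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return mean - margin, mean + margin

# t* values are the ones quoted in the text, not computed here.
cases = {
    "95%, n=100": t_interval(7.4, 3, 100, 1.987),
    "99%, n=100": t_interval(7.4, 3, 100, 2.632),
    "95%, n=50": t_interval(7.4, 3, 50, 2.014),
}
for label, (low, high) in cases.items():
    print(f"{label}: ({low:.1f}, {high:.1f}), width {high - low:.2f}")

# With the critical value held fixed, the margin scales like 1/sqrt(n).
ratio_double = sqrt(100) / sqrt(200)      # margin(n=200) / margin(n=100)
ratio_quadruple = sqrt(100) / sqrt(400)   # margin(n=400) / margin(n=100)
print(round(ratio_double, 3), ratio_quadruple)
```

In practice t* also shrinks slightly as n grows, so quadrupling the sample gives a bit more than a halving of the width.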
3. In Roosevelt National Forest, the rangers took random samples of live aspen trees and
measured the base circumference of each tree. Assume that the circumferences of the trees are
normally distributed.
a. The first sample had 30 trees with a mean circumference of 15.71 inches, and standard deviation
of 4.63 inches. Find a 95% confidence interval for the mean circumference of aspen trees from
this data.
Conditions: random sample, checked. σ is unknown, n = 30, and the circumferences are
normally distributed, so we can use the t-interval.
$\bar{x} = 15.71$, s = 4.63, n = 30
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 2.045) or the calculator: 8: TInterval
The 95% t-interval is (13.98, 17.44).
This means that we are 95% confident that the mean circumference of ALL live aspen trees in
Roosevelt National Forest is between 13.98 inches and 17.44 inches. That is, based on this
sample, if we could measure the circumference of ALL of the live aspen trees there, then we are
95% confident that the mean of all the measurements would be between 13.98 inches and 17.44
inches.
Also, it means that if we took many, many samples of size 30 of live aspen trees and
calculated a 95% confidence interval for each sample, about 95% of them would contain the real,
actual mean circumference and about 5% would miss it. But, of course, we don’t know which
5% would miss it.
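The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples of size 30 from a normal population whose mean and standard deviation are made-up values (they are assumptions, not the forest data), builds a t-interval from each with the t* = 2.045 quoted above, and counts how often the interval captures the true mean.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                      # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 15.0, 4.5      # hypothetical population, not from the text
N, T_STAR = 30, 2.045               # t* for n = 30 as quoted in the text
TRIALS = 4000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    margin = T_STAR * stdev(sample) / sqrt(N)
    if abs(mean(sample) - TRUE_MEAN) <= margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f} of the intervals contain the true mean")
```

The fraction should come out close to 0.95; in a real study only one sample is available, so there is no way to tell whether its interval is one of the hits.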
b. The next sample had 100 trees with a mean of 15.58 inches. Again find a 95% confidence
interval for the mean circumference of aspen trees from these data.
Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can
use the t-interval. The problem gives no standard deviation for this sample, so the calculation
reuses s = 4.63 from part (a).
$\bar{x} = 15.58$, s = 4.63, n = 100
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.984) or the calculator: 8: TInterval
The 95% t-interval is (14.66, 16.50).
This means that we are 95% confident that the mean circumference of ALL live aspen trees in
Roosevelt National Forest is between 14.66 inches and 16.50 inches. That is, based on this
sample, if we could measure the circumference of ALL the live aspen trees there, then we are
95% confident that the mean of all the measurements would be between 14.66 inches and 16.50
inches.
c. The last sample had 300 trees with a mean of 15.59 inches. Find a 95% confidence interval from
these data.
Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can
use the t-interval. Again the calculation reuses s = 4.63.
$\bar{x} = 15.59$, s = 4.63, n = 300
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* ≈ 1.968; tables that stop at the z row give 1.96) or the calculator: 8: TInterval
The 95% t-interval is (15.06, 16.12).
This means that we are 95% confident that the mean circumference of ALL live aspen trees in
Roosevelt National Forest is between 15.06 inches and 16.12 inches. That is, based on this
sample, if we could measure the circumference of ALL the live aspen trees there, then we are
95% confident that the mean of all the measurements would be between 15.06 inches and 16.12
inches.
d. Find the length of each interval of parts (a), (b) and (c). Comment on how these lengths change as
the sample size increases.
The length of the CI with n = 30 is 17.44 – 13.98 = 3.46.
The length of the CI with n = 100 is 16.50 – 14.66 = 1.84.
The length of the CI with n = 300 is 16.12 – 15.06 = 1.06.
The length of the interval gets smaller as the sample size increases.
4. In an article exploring blood serum levels of vitamins and lung cancer risks (The New England
Journal of Medicine), the mean serum level of vitamin E in the control group was 11.9 mg/liter.
There were 196 patients in the control group. (These patients were free of all cancer, except
possible skin cancer, in the subsequent 8 years). Assume that the standard deviation σ = 4.30
mg/liter.
a. Find a 95% confidence interval for the mean serum level of vitamin E in all persons similar to
the control group.
Conditions: Random sample? We don’t really know, but let’s assume they picked the subjects
randomly. σ is known, so we can use the z-interval.
$\bar{x} = 11.9$, σ = 4.30, n = 196
Using either $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ (z* = 1.96) or the calculator: 7: ZInterval
The 95% z-interval is (11.3, 12.5).
This means that we are 95% confident that the mean serum level of vitamin E in ALL cancer-free
patients is between 11.3 mg/liter and 12.5 mg/liter. That is, based on this sample, if we could
measure the serum level of vitamin E in ALL cancer-free patients (except possible skin cancer
in the subsequent 8 years), then we are 95% confident that the mean of all the measurements would
be between 11.3 mg/liter and 12.5 mg/liter.
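For readers without the calculator, the same z-interval can be sketched in Python. The function name `z_interval` and its default z* = 1.96 are illustrative choices, not part of the original solution.

```python
from math import sqrt

def z_interval(mean, sigma, n, z_star=1.96):
    """Return (low, high) for mean +/- z* * sigma / sqrt(n), with sigma known."""
    margin = z_star * sigma / sqrt(n)
    return mean - margin, mean + margin

low, high = z_interval(11.9, 4.30, 196)
print(f"({low:.1f}, {high:.1f}) mg/liter")
```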
b. If you wanted to estimate the mean serum level of vitamin E, with 90% confidence, and a margin
of error of no more than 0.25 mg/liter, how large a sample would you need?
For the minimum sample size we can use the formula:
$n = \left(\frac{z^* \sigma}{m}\right)^2 = \left(\frac{1.645 \times 4.30}{0.25}\right)^2 = 800.55$
Thus, we would need at least 801 cancer-free patients in our sample.
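The sample-size calculation for a mean is a one-liner once the round-up is made explicit. A minimal sketch, with `min_n_for_mean` as a hypothetical helper name:

```python
from math import ceil

def min_n_for_mean(sigma, m, z_star):
    """Smallest whole n whose margin z* * sigma / sqrt(n) is at most m."""
    return ceil((z_star * sigma / m) ** 2)

n = min_n_for_mean(sigma=4.30, m=0.25, z_star=1.645)
print(n)
```

Because m is squared in the denominator, asking for half the margin of error requires roughly four times as many patients.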
5. Suppose in a state with a large number of voters that 56 out of 100 randomly surveyed voters
favored Proposition 1. This is just a small sample of all the voters. Do you think Proposition 1
passed?
YES, but I am not very sure, I would like more information.
a. Give a range of plausible values for the proportion of all voters who favored Proposition 1. (That
is, find a 95% confidence interval)
Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p).
In our sample, 56 out of 100 favored the proposition, that is $\hat{p}$ = 56/100 = 0.56 = 56%.
x = 56, n = 100, $\hat{p}$ = 0.56
Checking conditions for CI: random sample, $n\hat{p} = 56 > 10$ and
$n(1-\hat{p}) = 100(1-0.56) = 44 > 10$.
Conditions are satisfied. We use: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Thus, using the formula above (with z* = 1.96), or using the A:1-PropZInt menu on the calculator, we get
(0.463, 0.657).
That is, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is
between 46.3% and 65.7%.
Other samples of 100 voters would yield other 95% confidence intervals. Most of these
confidence intervals (about 95% of them) would capture p, but a few of them (about 5%) would
not.
b. The 95% confidence interval we just computed is rather wide and does not pinpoint p to any
great extent. (In fact, we cannot even tell whether a majority voted for Proposition 1.)
Our next example shows that we can obtain a narrower confidence interval by taking a larger
sample.
Suppose in a state with a large number of voters that 560 out of 1000 randomly surveyed voters
favored Proposition 1. Give a range of plausible values for the proportion of all voters who
favored Proposition 1.
Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p).
In our sample, 560 out of 1000 favored the proposition, that is $\hat{p}$ = 560/1000 = 0.56 = 56%.
x = 560, n = 1000, $\hat{p}$ = 0.56
Checking conditions for CI: random sample, $n\hat{p} = 560 > 10$ and
$n(1-\hat{p}) = 1000(1-0.56) = 440 > 10$.
Conditions are satisfied. We use: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Thus, using the formula above (with z* = 1.96), or using the A:1-PropZInt menu on the calculator, we get
(0.529, 0.591).
That is, based on the results from our sample of size 1000, we are 95% confident that the
proportion of ALL voters who favored Proposition 1 is between 52.9% and 59.1%.
Notice that the sample size of 1000 gives a much narrower confidence interval than the sample
size of 100. In fact, with the larger sample, we can be quite confident (about 95% of the time
anyway) that a majority of the voters favored Proposition 1, since the smaller endpoint of the
sample's 95% confidence interval, 0.529, is greater than one-half. Bear in mind, however, that the
larger sample may be more costly and time consuming than the smaller one.
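The comparison between the two surveys can be sketched as follows. The helper `prop_interval` is an illustrative name; it applies the same large-sample conditions as the text and uses z* = 1.96 by default.

```python
from math import sqrt

def prop_interval(x, n, z_star=1.96):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    # Large-sample conditions used in the text.
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("sample too small for the normal approximation")
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

small = prop_interval(56, 100)
large = prop_interval(560, 1000)
for label, (low, high) in (("n = 100", small), ("n = 1000", large)):
    print(f"{label}: ({low:.3f}, {high:.3f}), entirely above 0.5: {low > 0.5}")
```

Both samples have the same point estimate, 0.56; only the larger one yields an interval that lies entirely above one-half.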
Now, how confident are you that Proposition 1 passed or failed?
I’d bet a small amount of money that I am right.
c. Forget the previous parts now. Assume that you haven’t taken any samples yet. What sample size
do you need if you want the margin of error to be at most 3% with 95% confidence but you
have no estimate of p?
Because you don’t have an estimate of p, use $\hat{p}$ = 0.5. We want the margin of error to be at most
3%, that is m = 0.03.
$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.03}\right)^2 (0.5)(1-0.5) = 1067.111$
Thus, to get a margin of error of at most 3%, we need at least 1068 voters in our sample.
d. Now let’s assume you did a pilot sample, in which 56 out of 100 voters said they favor
Proposition 1. What sample size do you need if you want the margin of error to be at most
3% with 95% confidence now?
Now we have an estimate of p from the pilot study, so we use $\hat{p}$ = 0.56. We want the margin of
error to be at most 3%, that is m = 0.03.
$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.03}\right)^2 (0.56)(1-0.56) = 1051.74$
Thus, to get a margin of error of at most 3%, we need at least 1052 voters in our sample.
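A quick way to see why the answer is always rounded up is to plug the sample size back into the margin-of-error formula: the rounded-up n meets the 3% target, while one voter fewer does not. A sketch, with `margin_of_error` as an illustrative helper name:

```python
from math import sqrt

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n) for a proportion."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

M = 0.03  # target margin of error
checks = []
for p_hat, n in ((0.5, 1068), (0.56, 1052)):
    just_enough = margin_of_error(p_hat, n) <= M
    one_fewer = margin_of_error(p_hat, n - 1) <= M
    checks.append((just_enough, one_fewer))
    print(p_hat, n, just_enough, one_fewer)
```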
6. Sometimes a 95% confidence interval is not enough. For example, in testing new medical drugs
or procedures, a 99% confidence interval may be required before the new drug or procedure is
approved for general use. For example, a new drug for migraines might induce insomnia
(difficulty of falling asleep) in some patients. If this side effect happens in too many patients, the
drug might not be approved. More precisely, if it could happen in more than 5% of all the
patients, it won’t be approved. In a random sample of 632 migraine patients who took the new
pill, 19 of them experienced insomnia. Based on this sample result, what would be your
recommendation, should the new drug be approved or not?
We want to estimate the proportion of ALL migraine patients who would experience insomnia.
The sample proportion, $\hat{p}$, is 19/632 ≈ 0.03 = 3%.
We want to calculate the 99% confidence interval based on this sample result. Let’s check the
conditions first:
Random sample, $n\hat{p} = 19 > 10$ and $n(1-\hat{p}) = 613 > 10$.
Conditions are satisfied. We use: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Thus, using the formula above (with z* = 2.576), or using the A:1-PropZInt menu on the calculator, we
get (0.0126, 0.0476).
Thus, based on this sample result, we are 99% confident that if we could test every migraine
patient who would take this pill, the proportion of them who would experience insomnia would
be between about 1.26% and 4.76%. Because even the upper endpoint, 4.76%, is below the 5% limit,
we can recommend the approval of the new drug.
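The approval decision comes down to comparing the upper endpoint of the 99% interval with the 5% limit. A minimal sketch of that rule; the names `prop_interval`, `LIMIT` and `approve` are made up for illustration:

```python
from math import sqrt

def prop_interval(x, n, z_star):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

LIMIT = 0.05                                   # maximum tolerated insomnia rate
low, high = prop_interval(19, 632, z_star=2.576)
approve = high < LIMIT                         # every plausible value is below the limit
print(f"({low:.4f}, {high:.4f}) -> approve: {approve}")
```

Using the upper endpoint rather than the point estimate guards against approving a drug whose true rate could plausibly exceed the limit.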
7. The Gallup Poll survey organization conducted telephone interviews with a randomly selected
national sample of 1,003 adults, 18 years and older, on Mar. 3-5, 2003. In the survey they found that
281 adults said that the nation’s energy situation is “very serious”. Find 95% and 99% confidence
intervals for the unknown proportion of Americans who felt that the nation’s energy situation is very
serious.
This is a proportion problem. $\hat{p} = \frac{x}{n} = \frac{281}{1003}$
Conditions: random sample, checked; $n\hat{p} = 1003 \cdot \frac{281}{1003} = 281 > 10$ and
$n(1-\hat{p}) = 1003\left(1 - \frac{281}{1003}\right) = 722 > 10$.
95% confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (z* = 1.96)
Or using the calculator: STAT TESTS A:1-PropZInt, x = 281, n = 1003, C-level: 0.95
The 95% confidence interval is: (0.252, 0.308)
We are 95% confident that the proportion of ALL adults in the U.S. who feel that the nation’s energy
situation is very serious is somewhere between 25.2% and 30.8%. That is, if we could ask EVERY
adult in the U.S. what they think about the nation’s energy situation, we are 95%
confident that 25.2%-30.8% of them would think that the energy situation is very serious.
99% confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (z* = 2.576)
Or using the calculator: STAT TESTS A:1-PropZInt, x = 281, n = 1003, C-level: 0.99
The 99% confidence interval is: (0.244, 0.317)
We are 99% confident that the proportion of ALL adults in the U.S. who feel that the nation’s energy
situation is very serious is somewhere between 24.4% and 31.7%. That is, if we could ask EVERY
adult in the U.S. what they think about the nation’s energy situation, we are 99%
confident that 24.4%-31.7% of them would think that the energy situation is very serious.
Again, as it should be, the 99% confidence interval is wider.
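Both intervals can be produced from one function that takes the confidence level as an argument. The sketch below derives z* from the standard normal distribution instead of a table; `prop_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def prop_interval(x, n, confidence):
    """Large-sample confidence interval for a proportion at the given level."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci95 = prop_interval(281, 1003, 0.95)
ci99 = prop_interval(281, 1003, 0.99)
for level, (low, high) in ((95, ci95), (99, ci99)):
    print(f"{level}%: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```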
8. The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of body
temperature, along with the gender of each individual and his or her heart rate. MINITAB provides
the following information:
Descriptive Statistics
Variable     N     Mean   Median  Tr Mean   StDev  SE Mean
TEMP       130   98.249   98.300   98.253   0.733    0.064
Variable       Min      Max       Q1       Q3
TEMP        96.300  100.800   97.800   98.700
Based on these results, construct and interpret a 95% confidence interval for the mean body
temperature. According to these results, is the usual assumed normal body temperature of 98.6
degrees Fahrenheit within the 95% confidence interval for the mean?
This is a mean problem.
Conditions: random sample: we don’t know. No information about that. n > 30.
Since we don’t know sigma, the population’s standard deviation, we need to use the t-interval.
The sample mean is 98.249, and the sample standard deviation is 0.733 (both are provided above).
Use t* = 1.984
The 95% confidence interval:
$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 98.249 \pm 1.984 \cdot \frac{0.733}{\sqrt{130}} = (98.121, 98.377)$
Or using the calculator: STAT TESTS 8: TInterval: highlight Stat, and enter 98.249 for the mean, 0.733
for Sx, and 130 for n.
We are 95% confident that the mean body temperature for ALL people is between 98.121 and 98.377
degrees Fahrenheit. The usual assumed normal body temperature of 98.6 degrees Fahrenheit is not within
the 95% confidence interval for the mean.
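The same calculation can be done directly from the summary statistics in the MINITAB output. This sketch uses the table value t* = 1.984 from the text (it is not computed here) and also reproduces the "SE Mean" column as a sanity check; the variable names are illustrative.

```python
from math import sqrt

N, MEAN, STDEV = 130, 98.249, 0.733   # from the MINITAB output
T_STAR = 1.984                        # table value used in the text

se_mean = STDEV / sqrt(N)             # should match the "SE Mean" column
low, high = MEAN - T_STAR * se_mean, MEAN + T_STAR * se_mean
contains_986 = low <= 98.6 <= high

print(f"SE = {se_mean:.3f}, CI = ({low:.3f}, {high:.3f}), contains 98.6: {contains_986}")
```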