Confidence Intervals

A confidence interval is an interval whose purpose is to estimate a parameter (a number that could, in

theory, be calculated from the population, if measurements were available for the whole population).

A confidence interval has three elements. First there is the interval itself, something like (123, 456).

Second is the confidence level, something like 95%. Third there is the parameter being estimated,

something like the population mean, µ or the population proportion, p. In order to have a meaningful

statement, you need all three elements: (123, 456) is a 95% confidence interval for µ .

Formulas:

General formula for confidence intervals: estimate ± margin of error

z* is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence

CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the population):

$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

(TI-83: STAT → TESTS → 7:ZInterval)

CI for a population mean (σ is unknown and n > 30 or the variable is normally distributed in the population):

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$

(TI-83: STAT → TESTS → 8:TInterval)

CI for a population proportion (when $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$):

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

(TI-83: STAT → TESTS → A:1-PropZInt)
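The three interval formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the calculator menus: the function names `mean_interval` and `proportion_interval` are made up for this sketch, and the critical value (z* or t*) is passed in by hand, exactly as you would read it from a table.

```python
import math

def mean_interval(xbar, sd, n, crit):
    """Interval xbar ± crit * sd / sqrt(n).

    Use sd = sigma with crit = z* (sigma known),
    or sd = s with crit = t* (sigma unknown).
    """
    margin = crit * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

def proportion_interval(successes, n, z_star):
    """Interval p_hat ± z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# z-interval with x-bar = 11.9, sigma = 4.30, n = 196 (the numbers of Example 4)
low, high = mean_interval(11.9, 4.30, 196, 1.96)
print(round(low, 1), round(high, 1))  # 11.3 12.5
```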

Minimum required sample size for a desired margin of error m and confidence level:

When it is a mean problem:

$n = \left(\frac{z^* \cdot \sigma}{m}\right)^2$

When it is a proportion problem:

$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p})$

If you don’t know $\hat{p}$, use $\hat{p} = 0.5$ (conservative approach).
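The sample-size formulas, including the "always round up" rule, can be sketched as follows. The function names are hypothetical, and the inputs are passed as decimal strings so that `fractions.Fraction` does the arithmetic exactly; with ordinary floats, a result that is exactly a whole number could be pushed up by one through rounding error.

```python
import math
from fractions import Fraction

def n_for_mean(z_star, sigma, m):
    """Smallest n with z* * sigma / sqrt(n) <= m."""
    z, s, m = Fraction(z_star), Fraction(sigma), Fraction(m)
    return math.ceil((z * s / m) ** 2)

def n_for_proportion(z_star, m, p_hat="0.5"):
    """Smallest n with z* * sqrt(p_hat * (1 - p_hat) / n) <= m.

    p_hat defaults to the conservative choice 0.5.
    """
    z, m, p = Fraction(z_star), Fraction(m), Fraction(p_hat)
    return math.ceil((z / m) ** 2 * p * (1 - p))

print(n_for_mean("1.645", "4.30", "0.25"))       # 801
print(n_for_proportion("1.96", "0.03"))          # 1068
print(n_for_proportion("1.96", "0.03", "0.56"))  # 1052
```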

Examples:

1. You wish to estimate, with 95% confidence, the proportion of computers that need repairs

or have problems by the time the product is three years old. Your estimate must be

accurate within 3% of the true proportion.

a. If no preliminary estimate is available, find the minimum sample size required.

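If no preliminary estimate is available, use the conservative choice: $\hat{p} = 0.5$.

m = 3% = 0.03

$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{2}{0.03}\right)^2 \cdot 0.5(1 - 0.5) = 1111.11$

(The result shown corresponds to the rounded value z* = 2 for 95% confidence. With z* = 1.96 the result is 1067.11, which rounds up to 1068.)

Thus we need to sample at least 1112 computers. (Remember: ALWAYS round up!)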

b. Now suppose a prior study involving fewer than 100 computers found that 19% of these

computers needed repairs or had problems by the time the product was three years old. Find

the minimum sample size needed.

Now $\hat{p} = 0.19$, and again z* = 2 and m = 0.03.

$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{2}{0.03}\right)^2 \cdot 0.19(1 - 0.19) = 684$

This is a whole number, thus the minimum sample size we need is 684.
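Why is $\hat{p} = 0.5$ the conservative choice? The product $\hat{p}(1 - \hat{p})$ is largest at 0.5, so any other estimate gives a smaller required sample. The sketch below tabulates the required n for several values of $\hat{p}$. The function name `required_n` is made up for this illustration, and z* = 2 is an assumption chosen to match the results 1112 and 684 in this example.

```python
import math
from fractions import Fraction

def required_n(p_hat, z_star="2", m="0.03"):
    """Round-up of (z*/m)^2 * p_hat * (1 - p_hat), computed exactly."""
    p = Fraction(p_hat)
    return math.ceil((Fraction(z_star) / Fraction(m)) ** 2 * p * (1 - p))

# The required sample size peaks at p_hat = 0.5 and is symmetric around it.
for p_hat in ["0.1", "0.19", "0.3", "0.5", "0.7", "0.9"]:
    print(p_hat, required_n(p_hat))
```

The largest value in the table is the one for $\hat{p} = 0.5$ (1112), so planning with 0.5 guarantees the desired margin of error whatever the true proportion turns out to be.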

2. A college administrator would like to determine how much time students spend on

homework assignments during a typical week. A questionnaire is sent to a sample of n = 100 students, and their responses indicate a mean of 7.4 hours per week and a standard deviation of 3 hours.

(a) What is the point estimate of the mean amount of homework for the entire student population (i.e., what is the point estimate for µ, the unknown population mean)?

The point estimate for the population mean is the sample mean. In this case it’s 7.4 hours.

(b) Now make an interval estimate of the population mean so that you are 95% confident that

the “true” mean is in your interval (i.e., compute the 95% confidence interval).

Conditions: random sample? We don’t really know. n > 30, so we can assume by the CLT that the shape of the sampling distribution of the sample means is approximately normal.

$\bar{x} = 7.4$ hours, and s = 3 hours. The population s.d. is unknown; we only know the sample s.d., so we need to use the t-interval.

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.987) or the calculator: 8: TInterval

The 95% confidence interval is (6.8, 8.0).

That means, we are 95% confident that the mean time ALL students spend on homework

assignments during a typical week is between 6.8 hours and 8.0 hours.

(c) Repeating part (b) at the 99% confidence level, with t* = 2.632, we get (6.6, 8.2). That means we are 99% confident that the mean time ALL students spend on homework assignments during a typical week is between 6.6 hours and 8.2 hours.

(d) Compare your answer to “b” and “c”. Which confidence interval is wider, and why? How

is the width of the confidence interval related to the percentage/degree of confidence?

The 99% confidence interval is wider. A higher confidence level requires a larger critical value t*, and therefore a larger margin of error: if you widen the interval of plausible values, you're more sure that the real parameter is in there somewhere.

(e) Now compute the 95% confidence interval again, but assume that n = 50.

Since n is still larger than 30, we can use the t-interval again. (t* = 2.014)

The 95% confidence interval with n = 50 is (6.5, 8.3).

(f) Compare your answer to “b” and “e”. Which confidence interval is wider, and why? How

is the width of the confidence interval related to the size of the sample?

The sample size of 100 gives a narrower confidence interval than the sample of size 50. The larger your sample size, the more sure you can be that the answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the narrower your confidence interval. However, the relationship is not linear: doubling the sample size does not halve the width of the confidence interval. Because the margin of error is proportional to $1/\sqrt{n}$, it is quadrupling the sample size (times 4) that approximately halves the width.
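The effect of sample size on the width can be checked numerically. The sketch below uses s = 3 and the t* values quoted in parts (b) and (e); the function name `margin_of_error` is made up for this illustration. Holding t* fixed in the last step is a simplifying assumption that isolates the effect of n (in practice t* also shrinks slightly as n grows).

```python
import math

def margin_of_error(s, n, t_star):
    """Half-width of the t-interval: t* * s / sqrt(n)."""
    return t_star * s / math.sqrt(n)

s = 3.0
m_100 = margin_of_error(s, 100, 1.987)  # part (b): n = 100
m_50 = margin_of_error(s, 50, 2.014)    # part (e): n = 50
print(round(m_100, 3), round(m_50, 3))

# With t* held fixed, quadrupling n halves the margin,
# while doubling n only divides it by sqrt(2).
fixed_t = 1.987
ratio_4x = margin_of_error(s, 100, fixed_t) / margin_of_error(s, 400, fixed_t)
ratio_2x = margin_of_error(s, 100, fixed_t) / margin_of_error(s, 200, fixed_t)
print(ratio_4x, ratio_2x)
```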

3. In Roosevelt National Forest, the rangers took random samples of live aspen trees and

measured the base circumference of each tree. Assume that the circumferences of the trees are

normally distributed.

a. The first sample had 30 trees with a mean circumference of 15.71 inches, and standard deviation

of 4.63 inches. Find a 95% confidence interval for the mean circumference of aspen trees from

this data.

Conditions: random sample checked, σ is unknown, n = 30, and the circumferences are normally distributed, so we can use the t-interval.

$\bar{x} = 15.71$, s = 4.63, n = 30

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 2.045) or the calculator: 8: TInterval

The 95% t-interval is (13.98, 17.44).

This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 13.98 inches and 17.44 inches. That is, based on this sample, if we could measure the circumference of ALL of the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 13.98 inches and 17.44 inches.

Also, it means that if we would take many, many samples of size 30 of live aspen trees and

calculate a 95% confidence interval for each sample, about 95% of them would contain the real,

actual mean circumference and about 5% would miss it. But, of course, we don’t know which

5% would miss it.
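The "many, many samples" interpretation can be illustrated with a small simulation. Everything about the population here is hypothetical: the true mean and standard deviation are invented for the sketch, since in practice they are unknown. Only n = 30 and t* = 2.045 come from part (a).

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 15.0, 4.5     # hypothetical population (normal)
N, T_STAR, REPS = 30, 2.045, 2000  # sample size and t* as in part (a)

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.mean(sample)
    margin = T_STAR * statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's interval capture the true mean?
    if xbar - margin <= TRUE_MEAN <= xbar + margin:
        hits += 1

coverage = hits / REPS
print(coverage)  # should land close to 0.95
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds over many repetitions.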

b. The next sample had 100 trees with a mean of 15.58 inches. Again find a 95% confidence

interval for the mean circumference of aspen trees from these data.

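Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can use the t-interval.

$\bar{x} = 15.71$, s = 4.63, n = 100

(Note: the problem states a sample mean of 15.58 inches and gives no standard deviation for this sample; the values used here are carried over from the first sample. With $\bar{x} = 15.58$ the interval has the same width but is centered at 15.58, about (14.66, 16.50).)

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.984) or the calculator: 8: TInterval

The 95% t-interval is (14.79, 16.63).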

This means, that we are 95% confident that the mean circumference of ALL live aspen trees in

Roosevelt National Forest is between 14.79 inches and 16.63 inches. That is, based on this

sample, if we could measure the circumference of ALL the live aspen trees there, then we are

95% confident that the mean of all the measurements would be between 14.79 inches and 16.63

inches.

c. The last sample had 300 trees with a mean of 15.59 inches. Find a 95% confidence interval from

these data.

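Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can use the t-interval.

$\bar{x} = 15.71$, s = 4.63, n = 300

(Note: the problem states a sample mean of 15.59 inches and gives no standard deviation for this sample; the values used here are carried over from the first sample. With $\bar{x} = 15.59$ the interval has the same width but is centered at 15.59, about (15.06, 16.12).)

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.96) or the calculator: 8: TInterval

The 95% t-interval is (15.18, 16.24).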

This means, that we are 95% confident that the mean circumference of ALL live aspen trees in

Roosevelt National Forest is between 15.18 inches and 16.24 inches. That is, based on this

sample, if we could measure the circumference of ALL the live aspen trees there, then we are

95% confident that the mean of all the measurements would be between 15.18 inches and 16.24

inches.

d. Find the length of each interval of parts (a), (b) and (c). Comment on how these lengths change as

the sample size increases.

The length of the CI with n = 30 is 17.44 – 13.98 = 3.46

The length of the CI with n = 100 is 16.63 – 14.79 = 1.84

The length of the CI with n = 300 is 16.24 – 15.18 = 1.06.

The length of the interval gets smaller as the sample size increases.

4. In an article exploring blood serum levels of vitamins and lung cancer risks (The New England

Journal of Medicine), the mean serum level of vitamin E in the control group was 11.9 mg/liter.

There were 196 patients in the control group. (These patients were free of all cancer, except

possible skin cancer, in the subsequent 8 years). Assume that the standard deviation σ = 4.30

mg/liter.

a. Find a 95% confidence interval for the mean serum level of vitamin E in all persons similar to

the control group.

Conditions: Random sample? We don’t really know, but let’s assume they picked the subjects randomly. σ is known, so we can use the z-interval.

$\bar{x} = 11.9$, σ = 4.30, n = 196

Using either $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ (z* = 1.96) or the calculator: 7: ZInterval

The 95% z-interval is (11.3, 12.5).

This means that we are 95% confident that the mean serum level of vitamin E in ALL cancer-free patients is between 11.3 mg/liter and 12.5 mg/liter. That is, based on this sample, if we could measure the serum level of vitamin E in ALL cancer-free patients (except possible skin cancer in the subsequent 8 years), then we are 95% confident that the mean of all the measurements would be between 11.3 mg/liter and 12.5 mg/liter.

b. If you wanted to estimate the mean serum level of vitamin E, with 90% confidence, and a margin

of error of no more than 0.25 mg/liter, how large a sample would you need?

For the minimum sample size, we can use the formula:

$n = \left(\frac{z^* \cdot \sigma}{m}\right)^2 = \left(\frac{1.645 \cdot 4.30}{0.25}\right)^2 = 800.55$

Thus, we would need at least 801 cancer-free patients in our sample.
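The sample-size formula is simply the margin-of-error condition solved for n. A short derivation, written out with the numbers of this problem (m is the desired margin of error, as in the formula list):

```latex
\[
z^{*}\frac{\sigma}{\sqrt{n}} \le m
\;\Longrightarrow\;
\sqrt{n} \ge \frac{z^{*}\sigma}{m}
\;\Longrightarrow\;
n \ge \left(\frac{z^{*}\sigma}{m}\right)^{2}
= \left(\frac{1.645 \cdot 4.30}{0.25}\right)^{2} \approx 800.55
\]
```

Because n must be a whole number that satisfies the inequality, 800.55 is rounded up to 801; rounding down to 800 would leave the margin of error slightly above 0.25 mg/liter.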

5. Suppose in a state with a large number of voters that 56 out of 100 randomly surveyed voters

favored Proposition 1. This is just a small sample of all the voters. Do you think Proposition 1

passed?

YES, but I am not very sure, I would like more information.

a. Give a range of plausible values for the proportion of all voters who favored Proposition 1. (That

is, find a 95% confidence interval)

Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p).

In our sample, 56 out of 100 favored the proposition, that is $\hat{p}$ = 56/100 = 0.56 = 56%.

x = 56, n = 100, $\hat{p}$ = 0.56

Checking conditions for CI: random sample, $n\hat{p} = 56 \ge 10$ and $n(1 - \hat{p}) = 100(1 - 0.56) = 44 \ge 10$.

Conditions are satisfied. We use:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Thus, using the formula above (with z* = 1.96), or using the A:1-PropZInt menu on the calculator, we get (0.463, 0.657).

That is, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is between 46.3% and 65.7%.

Other samples of 100 voters would yield other 95% confidence intervals. Most of these

confidence intervals (about 95% of them) would capture p, but a few of them (about 5%) would

not.

b. The 95% confidence interval we just computed is rather wide and does not pinpoint p to any great extent. (In fact, we cannot even tell whether a majority voted for Proposition 1.)

Our next example shows that we can obtain a narrower confidence interval by taking a larger

sample.

Suppose in a state with a large number of voters that 560 out of 1000 randomly surveyed voters

favored Proposition 1. Give a range of plausible values for the proportion of all voters who

favored Proposition 1.

Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p).

In our sample, 560 out of 1000 favored the proposition, that is $\hat{p}$ = 560/1000 = 0.56 = 56%.

x = 560, n = 1000, $\hat{p}$ = 0.56

Checking conditions for CI: random sample, $n\hat{p} = 560 \ge 10$ and $n(1 - \hat{p}) = 1000(1 - 0.56) = 440 \ge 10$.

Conditions are satisfied. We use:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Thus, using the formula above (with z* = 1.96), or using the A:1-PropZInt menu on the calculator, we get (0.529, 0.591).

That is, based on the results from our sample of size 1000, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is between 52.9% and 59.1%.

Notice that the sample size of 1000 gives a much narrower confidence interval than the sample size of 100. In fact, with the larger sample, we can be quite confident (about 95% of the time anyway) that a majority of the voters favored Proposition 1, since the smaller endpoint of the sample’s 95% confidence interval, 0.529, is greater than one-half. Bear in mind, however, that the larger sample may be more costly and time consuming than the smaller one.
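The comparison between the two polls can be reproduced with a short script. This is a sketch under stated assumptions: the function name `prop_ci` is made up, and z* is computed with the standard library's `statistics.NormalDist` rather than read from a table, so it is about 1.95996 instead of the rounded 1.96.

```python
import math
from statistics import NormalDist

def prop_ci(x, n, level=0.95):
    """One-proportion z-interval for x successes out of n."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Same sample proportion (0.56), two different sample sizes.
for x, n in [(56, 100), (560, 1000)]:
    low, high = prop_ci(x, n)
    # A majority is supported only if the whole interval lies above 0.5.
    print(n, round(low, 3), round(high, 3), "majority supported:", low > 0.5)
```

Only the interval from the sample of 1000 lies entirely above one-half, which is the point made in the paragraph above.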

Now, how confident are you that Proposition 1 passed or failed?

I’d bet a small amount of money that I am right.

c. Forget the previous parts now. Assume that you haven’t taken any samples yet. What sample size do you need if you want the margin of error to be at most 3% with 95% confidence but you have no estimate of p?

Because you don’t have an estimate of p, use $\hat{p}$ = 0.5. We want the margin of error to be at most 3%, that is m = 0.03.

$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{1.96}{0.03}\right)^2 \cdot 0.5(1 - 0.5) = 1067.11$

Thus, to get a margin of error of at most 3%, we need at least 1068 voters in our sample.

d. Now let’s assume you did a pilot sample, in which 56 out of 100 voters said they favor Proposition 1. What sample size do you need now if you want the margin of error to be at most 3% with 95% confidence?

Now we have an estimate of p from the pilot study, so we use $\hat{p}$ = 0.56. We want the margin of error to be at most 3%, that is m = 0.03.

$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{1.96}{0.03}\right)^2 \cdot 0.56(1 - 0.56) = 1051.74$

Thus, to get a margin of error of at most 3%, we need at least 1052 voters in our sample.

6. Sometimes a 95% confidence interval is not enough. For example, in testing new medical drugs

or procedures, a 99% confidence interval may be required before the new drug or procedure is

approved for general use. For example, a new drug for migraines might induce insomnia

(difficulty of falling asleep) in some patients. If this side effect happens in too many patients, the

drug might not be approved. More precisely, if it could happen in more than 5% of all the

patients, it won’t be approved. In a random sample of 632 migraine patients who took the new

pill, 19 of them experienced insomnia. Based on this sample result, what would be your

recommendation, should the new drug be approved or not?

We want to estimate the proportion of ALL migraine patients who would experience insomnia.

The sample proportion, $\hat{p}$, is 19/632 ≈ 0.03 = 3%.

We want to calculate the 99% confidence interval based on this sample result. Let’s check the conditions first:

Random sample, $n\hat{p} = 19 \ge 10$ and $n(1 - \hat{p}) = 613 \ge 10$.

Conditions are satisfied. We use:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Thus, using the formula above (with z* = 2.575), or using the A:1-PropZInt menu on the calculator, we get (0.0126, 0.0476).

Thus, based on this sample result, we are 99% confident that if we could test every migraine patient who would take this pill, the proportion of them who would experience insomnia would be between about 1.26% and 4.76%. Since the entire interval lies below the 5% limit, we can recommend the approval of the new drug.
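The whole decision process (check the conditions, build the interval, compare with the 5% limit) can be sketched as one small script. The function name `checked_interval` and the constant `THRESHOLD` are made up for this illustration; the rule "approve only if the entire 99% interval is below 5%" is the reading of the problem used in the solution above.

```python
import math

def checked_interval(x, n, z_star):
    """One-proportion z-interval, refusing to run if the conditions fail."""
    # Conditions: n * p_hat >= 10 and n * (1 - p_hat) >= 10,
    # i.e. at least 10 successes and at least 10 failures.
    if x < 10 or n - x < 10:
        raise ValueError("success/failure counts are too small for the z-interval")
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

THRESHOLD = 0.05  # approval limit stated in the problem

low, high = checked_interval(19, 632, 2.575)
print(round(low, 4), round(high, 4), "approve:", high < THRESHOLD)
# prints: 0.0126 0.0476 approve: True
```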

7. The Gallup Poll survey organization conducted telephone interviews with a randomly selected

national sample of 1,003 adults, 18 years and older, on Mar. 3-5, 2003. In the survey they found that

281 adults said that the nation’s energy situation is “very serious”. Find 95% and 99% confidence intervals for the unknown proportion of Americans who felt that the nation’s energy situation is very serious.

This is a proportion problem. $\hat{p} = \frac{281}{1003} \approx 0.28$

Conditions: random sample, checked; $n\hat{p} = 1003 \cdot \frac{281}{1003} = 281 > 10$ and $n(1 - \hat{p}) = 1003\left(1 - \frac{281}{1003}\right) = 722 > 10$.

95% confidence interval:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ (z* = 1.96)

Or using the calculator: STAT → TESTS → A:1-PropZInt, x = 281, n = 1003, C-level: 0.95

The 95% confidence interval is: (0.252, 0.308)

We are 95% confident that the proportion of ALL adults in the U.S. who feel that the nation’s energy situation is very serious is somewhere between 25.2% and 30.8%. That is, if we could ask EVERY adult in the U.S. what they think about the nation’s energy situation, we are 95% confident that between 25.2% and 30.8% of them would say that the energy situation is very serious.

99% confidence interval:

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ (z* = 2.575)

Or using the calculator: STAT → TESTS → A:1-PropZInt, x = 281, n = 1003, C-level: 0.99

The 99% confidence interval is: (0.244, 0.317)

We are 99% confident that the proportion of ALL adults in the U.S. who feel that the nation’s energy situation is very serious is somewhere between 24.4% and 31.7%. That is, if we could ask EVERY adult in the U.S. what they think about the nation’s energy situation, we are 99% confident that between 24.4% and 31.7% of them would say that the energy situation is very serious.

Again, as it should be, the 99% confidence interval is wider.

8. The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of body

temperature, along with the gender of each individual and his or her heart rate. MINITAB provides

the following information:

Descriptive Statistics

Variable N Mean Median Tr Mean StDev SE Mean

TEMP 130 98.249 98.300 98.253 0.733 0.064

Variable Min Max Q1 Q3

TEMP 96.300 100.800 97.800 98.700

Based on these results, construct and interpret a 95% confidence interval for the mean body

temperature. According to these results, is the usual assumed normal body temperature of 98.6

degrees Fahrenheit within the 95% confidence interval for the mean?

This is a mean problem.

Conditions: random sample: we don’t know. No information about that. n > 30.

Since we don’t know sigma, the population’s standard deviation, we need to use the t-interval.

The sample mean is 98.249, and the sample standard deviation is 0.733 (both are provided above).

Use t* = 1.984

The 95% confidence interval:

$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 98.249 \pm 1.984 \cdot \frac{0.733}{\sqrt{130}} = (98.121, 98.377)$

Or using the calculator: STAT → TESTS → 8: TInterval: highlight Stats, and enter 98.249 for the mean, 0.733 for Sx, and 130 for n.

We are 95% confident that the mean body temperature for ALL people is between 98.121 and 98.377 degrees Fahrenheit. The usual assumed normal body temperature of 98.6 degrees Fahrenheit is not within the 95% confidence interval for the mean.
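The same interval can be computed directly from the MINITAB summary values. The variable names below are made up for this sketch, and t* = 1.984 is the table value used in the solution rather than an exact value for 129 degrees of freedom.

```python
import math

n, mean, sd = 130, 98.249, 0.733  # N, Mean and StDev from the MINITAB output
t_star = 1.984                    # table value used in the solution above

se = sd / math.sqrt(n)            # standard error of the mean
low, high = mean - t_star * se, mean + t_star * se

print(round(se, 3))                   # 0.064, matching "SE Mean" in the output
print(round(low, 3), round(high, 3))  # 98.121 98.377
print("98.6 inside interval:", low <= 98.6 <= high)  # False
```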