Lecture Notes: Inferences for Population Proportions

I. Introduction to Proportions

Often, we are interested in the proportion (or percentage) of individuals in a population who possess a certain characteristic or fall into a specific category. For example:

The proportion of voters supporting a particular candidate.
The percentage of manufactured items that are defective.
The proportion of patients responding positively to a new treatment.

A. Key Definitions

Population Proportion (p): The true, often unknown, proportion of the entire population that has the specified attribute. This is a population parameter.
Sample Proportion (\(\hat{p}\)): The proportion of a sample drawn from the population that has the specified attribute. This is a sample statistic, calculated from our data, and serves as our best estimate of the unknown population proportion \(p\).

B. Calculating the Sample Proportion

The sample proportion is calculated using the formula:

\( \hat{p} = \frac{x}{n} \)

Where:

\(x\) = the number of individuals in the sample who have the attribute.
\(n\) = the total sample size.

Note: \(\hat{p}\) will always be a value between 0 and 1 (inclusive).

II. The Sampling Distribution of \(\hat{p}\)

Imagine taking many random samples of the same size \(n\) from a population with proportion \(p\). If we calculate \(\hat{p}\) for each sample, the distribution of these \(\hat{p}\) values (the sampling distribution) has important properties:

Mean: The mean of the sampling distribution of \(\hat{p}\) is equal to the true population proportion, \(p\). ( \( \mu_{\hat{p}} = p \) )
Standard Deviation (Standard Error): The standard deviation of the sampling distribution of \(\hat{p}\), often called the standard error of the proportion, is given by:
\( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)

Since \(p\) is usually unknown, we estimate the standard error using \(\hat{p}\) when constructing confidence intervals: \( SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \).
Shape: If the sample size \(n\) is sufficiently large, the sampling distribution of \(\hat{p}\) is approximately Normal. This is justified by the Central Limit Theorem.
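
The three properties above can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: the function name `simulate_sample_proportions` and the values \(p = 0.3\), \(n = 100\) are illustrative assumptions. It draws many samples, computes \(\hat{p}\) for each, and compares the mean and standard deviation of those values with \(p\) and \(\sqrt{p(1-p)/n}\).

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each individual has the attribute with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


# Illustrative values only (assumed, not taken from the notes).
p, n = 0.3, 100
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.fmean(p_hats)
simulated_sd = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of p-hat values: {simulated_mean:.4f} (theory: {p})")
print(f"SD of p-hat values:   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

With a few thousand samples, the simulated mean and standard deviation should land close to the theoretical values, and a histogram of `p_hats` would look roughly bell-shaped.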

Condition for Normality

We can treat the sampling distribution of \(\hat{p}\) as approximately Normal when the expected numbers of successes and failures are both sufficiently large. These notes use a threshold of 10:

For Confidence Intervals: Check if \( n\hat{p} \ge 10 \) AND \( n(1-\hat{p}) \ge 10 \).
For Hypothesis Tests: Check if \( np_0 \ge 10 \) AND \( n(1-p_0) \ge 10 \), using the hypothesized value \(p_0\).

(Note: Some sources use 5 or 15; check your course requirements. Using 10 is common.)
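
The check can be written as a small helper. This is a sketch only: the name `normality_condition_met` and the `threshold` parameter are assumptions introduced for illustration. Pass \(\hat{p}\) for a confidence interval or \(p_0\) for a hypothesis test.

```python
def normality_condition_met(n, p, threshold=10):
    """Return True if n*p and n*(1-p) both reach the threshold.

    Use p = p-hat for confidence intervals and p = p0 for hypothesis tests.
    The default threshold of 10 follows these notes; some courses use 5 or 15.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# n = 90 with p0 = 0.80: expected counts are 72 and 18, so the condition holds.
print(normality_condition_met(90, 0.80))

# Illustrative failing case (assumed values): n = 30, p = 0.10 gives
# only 3 expected successes.
print(normality_condition_met(30, 0.10))
```

Because the comparison uses floating-point products, a count that lands exactly on the threshold may be affected by rounding; in that borderline case it is safer to check the integer counts \(x\) and \(n - x\) directly.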

III. Common Critical Values (z*) from Standard Normal Distribution

Quick Reference: Critical Values (z*)

Critical values (\(z^*\)) are essential for constructing confidence intervals and performing hypothesis tests using the critical value approach. They represent the number of standard deviations away from the mean needed to capture a certain area under the standard normal curve.

Confidence Level (C)	Significance Level (\(\alpha = 1-C\))	Two-Tailed \(z^*\) (for CI & \(H_a: \neq\))	Left-Tailed \(z^*\) (for \(H_a: <\))	Right-Tailed \(z^*\) (for \(H_a: >\))
90%	0.10	\(\pm 1.645\)	\(-1.282\)	\(+1.282\)
95%	0.05	\(\pm 1.96\)	\(-1.645\)	\(+1.645\)
99%	0.01	\(\pm 2.576\)	\(-2.326\)	\(+2.326\)

Note: Left- and right-tailed values place the entire \(\alpha\) in a single tail and are used for one-sided hypothesis tests. Two-tailed values place \(\alpha/2\) in each tail and are used for confidence intervals and two-tailed hypothesis tests.
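
The table entries can be reproduced from the inverse of the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `two_tailed_z_star` and `right_tailed_z_star` are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_tailed_z_star(confidence):
    """Critical value with alpha/2 in each tail (used for CIs and two-tailed tests)."""
    alpha = 1 - confidence
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def right_tailed_z_star(alpha):
    """Critical value with all of alpha in the right tail.

    The left-tailed value is the negative of this, by symmetry.
    """
    return STD_NORMAL.inv_cdf(1 - alpha)


for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(
        f"C = {confidence:.2f}: two-tailed +/-{two_tailed_z_star(confidence):.3f}, "
        f"one-tailed {right_tailed_z_star(alpha):.3f}"
    )
```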

IV. Confidence Intervals for One Population Proportion

A confidence interval provides a range of plausible values for the unknown population proportion \(p\), based on sample data.

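A. Purpose and Interpretation

"We are C% confident that the true population proportion \(p\) lies within the interval (Lower Bound, Upper Bound)."

Here "C% confident" describes the method: if we repeated the sampling many times, about C% of the intervals built this way would contain the true \(p\).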

B. Conditions/Assumptions for the One-Proportion z-Interval

Random Sample
Normality Condition: \(x = n\hat{p} \ge 10\) and \(n-x = n(1-\hat{p}) \ge 10\) (at least 10 observed successes and 10 observed failures).
Independence Condition: \(N \ge 10n\) if sampling without replacement.

C. Formula for the One-Proportion z-Interval

Point Estimate \(\pm\) Margin of Error
\( \hat{p} \pm E \)

Where the Margin of Error (E) is:

\( E = z^* \times SE(\hat{p}) = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

\(z^*\) is the critical value from the table above for the desired confidence level (use the two-tailed value).

Example: Drug Test Poll (Q3)

\(n=21039\), \(\hat{p} = 0.63\), 99% CI.

Conditions met (using threshold 10): \(n\hat{p} \approx 13255\) and \(n(1-\hat{p}) \approx 7784\), both well above 10.
\(z^* \approx 2.576\) (from table for 99% confidence).
\( E = 2.576 \sqrt{\frac{0.63(0.37)}{21039}} \approx 0.0086 \)
CI: \( 0.63 \pm 0.0086 \Rightarrow (0.6214, 0.6386) \)
Interpretation: We are 99% confident...
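
The interval arithmetic in this example can be reproduced with a short function. This is a hedged sketch: the name `one_proportion_z_interval` is an assumption, and the code takes for granted that the conditions in section B have already been checked.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence):
    """Return (lower, upper, margin) for a one-proportion z-interval.

    Assumes the random-sample, normality, and independence conditions hold.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin, margin


# Values from the example above: n = 21039, p-hat = 0.63, 99% confidence.
lower, upper, margin = one_proportion_z_interval(0.63, 21039, 0.99)
print(f"E = {margin:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

The function computes \(z^*\) directly rather than reading it from the table, so results may differ from hand calculations in the last decimal place.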

D. Determining Required Sample Size

\( n = \hat{p}_{guess}(1-\hat{p}_{guess}) \left( \frac{z^*}{E} \right)^2 \)

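Use \(\hat{p}_{guess} = 0.5\) when no prior estimate is available: \(\hat{p}(1-\hat{p})\) is largest at 0.5, so this choice gives the most conservative (largest) \(n\). Always round \(n\) up.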

Example: Sample Size Calculation

\(E = 0.03\), 95% CI (\(z^* \approx 1.96\)).

Use \(\hat{p}_{guess} = 0.5\).
\( n = 0.5(0.5) \left( \frac{1.96}{0.03} \right)^2 \approx 1067.11 \)
Round Up: \(n = 1068\).
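
The same calculation as a function, sketched under the assumption that the one-proportion z-interval formula above applies. The name `required_sample_size` and its default `p_guess=0.5` are illustrative choices, not part of the original notes.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`.

    p_guess = 0.5 is the conservative default when no prior estimate exists.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = p_guess * (1 - p_guess) * (z_star / margin) ** 2
    return math.ceil(n_exact)  # always round up


# Example above: E = 0.03 at 95% confidence with the conservative guess.
print(required_sample_size(0.03, 0.95))

# Illustrative (assumed) prior guess of 0.2 needs a smaller sample.
print(required_sample_size(0.03, 0.95, p_guess=0.2))
```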

V. Hypothesis Testing for One Population Proportion

A formal procedure to decide between two claims about \(p\).

A. Purpose

Assess whether sample evidence contradicts the null claim \(H_0: p = p_0\), where \(p_0\) is the hypothesized value of \(p\).

B. The Steps of Hypothesis Testing (One-Proportion z-Test)

State Hypotheses: \(H_0: p = p_0\) vs. \(H_a\) (\(\neq, <, >\)).
Check Conditions: Random Sample, Normality (\(np_0 \ge 10\) and \(n(1-p_0) \ge 10\)), Independence.
Set Significance Level (\(\alpha\)): Usually 0.05.
Calculate Test Statistic (z):
\( z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)
Determine the P-value or find the critical value(s) (use the \(z^*\) table above for the critical value approach).
Make Decision: Reject \(H_0\) if P-value \(\le \alpha\) or if \(z\) is in the rejection region.
State Conclusion in Context.
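
The computational steps (test statistic and P-value) can be sketched as follows. The function name `one_proportion_z_test`, the `alternative` labels, and the sample data are assumptions made for this illustration; stating hypotheses, checking conditions, and writing the conclusion in context remain up to the analyst.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative):
    """Return (z, p_value) for H0: p = p0.

    alternative: "less" (Ha: p < p0), "greater" (Ha: p > p0),
    or "two-sided" (Ha: p != p0).
    """
    p_hat = x / n
    # The standard deviation uses p0 because the test assumes H0 is true.
    standard_deviation = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_deviation

    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Hypothetical data (not from the notes): 45 successes in 100 trials, H0: p = 0.50.
z, p_value = one_proportion_z_test(45, 100, 0.50, "two-sided")
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

For this hypothetical sample the P-value is well above 0.05, so \(H_0\) would not be rejected.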

C. Examples of Setting up Hypotheses

Example: India Births (Q1)

\(p_0 = 0.517\). Has proportion changed? \(H_0: p = 0.517\), \(H_a: p \neq 0.517\).

Example: Home Field Advantage (Q2)

\(p_0 = 0.50\). Is win rate greater? \(H_0: p = 0.50\), \(H_a: p > 0.50\).

D. Example Calculation: Driving Test (Q4)

Example: Driving Test Hypothesis Test (Q4)

\(p_0 = 0.80\), \(n=90\), \(x=61\). Test if rate is lower at \(\alpha = 0.05\).

Hypotheses: \(H_0: p = 0.80\), \(H_a: p < 0.80\).
Conditions: \(np_0=72\ge 10\), \(n(1-p_0)=18\ge 10\). Met.
\(\alpha = 0.05\).
Test Statistic: \(\hat{p} = 61/90 \approx 0.6778\). \( z = \frac{61/90 - 0.80}{\sqrt{\frac{0.80(0.20)}{90}}} \approx -2.899 \)
P-value (Left-tailed): \(P(Z \le -2.899) \approx 0.0019\).
Decision: Reject \(H_0\) (since \(0.0019 \le 0.05\)).
Conclusion: At \(\alpha = 0.05\), there is significant evidence that the rate is lower than 80%.

Visualization (Left-Tailed Test)

Observed Z ≈ -2.90 | Critical Z (α=0.05, left-tail) ≈ -1.645. The observed value lies well below the critical value, inside the rejection region.

E. Example Calculation: Union Membership (Q5)

Example: Union Membership Hypothesis Test (Q5)

\(p_0 = 0.135\), \(n=2000\), \(x=240\). Test if rate is different at \(\alpha = 0.05\).

Hypotheses: \(H_0: p = 0.135\), \(H_a: p \neq 0.135\).
Conditions: \(np_0=270\ge 10\), \(n(1-p_0)=1730\ge 10\). Met.
\(\alpha = 0.05\).
Test Statistic: \(\hat{p} = 240/2000 = 0.12\). \( z = \frac{0.12 - 0.135}{\sqrt{\frac{0.135(0.865)}{2000}}} \approx -1.963 \)
P-value (Two-tailed): \(2 \times P(Z \le -1.963) \approx 0.0496\).
Decision: Reject \(H_0\) (since \(0.0496 \le 0.05\)).
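Conclusion: At \(\alpha = 0.05\), there is significant evidence that the city rate differs from the national rate, although the P-value is only marginally below \(\alpha\).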

Visualization (Two-Tailed Test)

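Observed Z ≈ -1.963 | Critical Z (α=0.05, two-tail) ≈ ±1.960. The observed value falls just beyond the lower critical value, so it lies in the rejection region.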
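
The critical value approach for the two worked examples (Q4 and Q5) can be sketched in code. The helper names `z_statistic` and `in_rejection_region` are assumptions for this illustration; the data are the values given in the examples.

```python
import math
from statistics import NormalDist


def z_statistic(x, n, p0):
    """One-proportion z test statistic under H0: p = p0."""
    return (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)


def in_rejection_region(z, alpha, alternative):
    """Compare z with the critical value(s) for the given alternative."""
    std_normal = NormalDist()
    if alternative == "less":
        return z <= std_normal.inv_cdf(alpha)
    if alternative == "greater":
        return z >= std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Q4 (driving test): p0 = 0.80, n = 90, x = 61, left-tailed.
z_driving = z_statistic(61, 90, 0.80)
reject_driving = in_rejection_region(z_driving, 0.05, "less")

# Q5 (union membership): p0 = 0.135, n = 2000, x = 240, two-tailed.
z_union = z_statistic(240, 2000, 0.135)
reject_union = in_rejection_region(z_union, 0.05, "two-sided")

print(f"Q4: z = {z_driving:.3f}, reject H0: {reject_driving}")
print(f"Q5: z = {z_union:.3f}, reject H0: {reject_union}")
```

The P-value approach and the critical value approach always lead to the same decision; Q5 is a useful reminder that a result can sit very close to the boundary.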

VI. Inferences for Two Population Proportions

Comparing proportions (\(p_1, p_2\)) from two independent groups.

A. Notation

Group 1: \(p_1, n_1, x_1, \hat{p}_1\). Group 2: \(p_2, n_2, x_2, \hat{p}_2\).

B. Sampling Distribution of \(\hat{p}_1 - \hat{p}_2\)

Mean: \( p_1 - p_2 \)
SE: \( \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \)
Shape: Approximately Normal if the Normality condition is met for both samples.

C. Confidence Interval for \(p_1 - p_2\)

Conditions: Independent Random Samples; Normality (\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) all \(\ge 10\)); Independence.
Formula:
\( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)
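Interpretation: We are C% confident that the true difference \(p_1 - p_2\) lies in the interval. If the interval contains 0, the data are consistent with no difference between the two proportions at that confidence level.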
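
A minimal sketch of the two-proportion z-interval follows. The function name `two_proportion_z_interval` and the counts used are hypothetical, chosen only to illustrate the formula; they do not come from the notes.

```python
import math
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error.

    Assumes independent random samples and that the normality and
    independence conditions have been checked for both groups.
    """
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )
    margin = z_star * standard_error
    difference = p_hat1 - p_hat2
    return difference - margin, difference + margin


# Hypothetical data: 60 of 100 in group 1 and 45 of 100 in group 2.
lower, upper = two_proportion_z_interval(60, 100, 45, 100, 0.95)
contains_zero = lower <= 0 <= upper
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f}); contains 0: {contains_zero}")
```

For these hypothetical counts the interval lies entirely above 0, which suggests \(p_1 > p_2\) at the 95% confidence level.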

D. Hypothesis Test for \(p_1 - p_2\)

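Hypotheses: Usually \(H_0: p_1 = p_2\) vs. \(H_a: p_1 \neq p_2\), \(p_1 < p_2\), or \(p_1 > p_2\).
Conditions: Independent Random Samples; Normality (using the pooled proportion \(\hat{p}_p\): \(n_1\hat{p}_p\), \(n_1(1-\hat{p}_p)\), \(n_2\hat{p}_p\), and \(n_2(1-\hat{p}_p)\) all \(\ge 10\)); Independence.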
Significance Level \(\alpha\).
Pooled Proportion (\(\hat{p}_p\)):
\( \hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} \)
Test Statistic (z):
\( z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \)
P-value or Critical Value.
Decision.
Conclusion.
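
The pooled test can be sketched as below. The function name `two_proportion_z_test` and the counts are hypothetical and serve only to illustrate the formulas above; this version handles the two-sided alternative only.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two_sided_p_value) for H0: p1 = p2 using the pooled proportion."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Under H0 both groups share one proportion, so the samples are pooled.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Hypothetical data: 60 of 100 in group 1 and 45 of 100 in group 2.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

For a one-sided alternative, the P-value would instead be the area in the single tail indicated by \(H_a\).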

VII. Formula Summary

Concept	Formula	Notes
Sample Proportion	\( \hat{p} = \frac{x}{n} \)	Estimate of \(p\).
Standard Error (1-Prop CI)	\( SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)	Used for 1-prop CI.
Standard Deviation (1-Prop Test)	\( SD(\hat{p}) = \sqrt{\frac{p_0(1-p_0)}{n}} \)	Used for 1-prop HT (under \(H_0\)).
Margin of Error (1-Prop CI)	\( E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)	Half-width of the 1-prop CI.
One-Proportion z-Interval	\( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)	Interval estimate for \(p\).
One-Proportion z-Test Statistic	\( z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)	Tests \(H_0: p=p_0\).
Sample Size (1-Prop CI, guess)	\( n = \hat{p}_{guess}(1-\hat{p}_{guess}) \left( \frac{z^*}{E} \right)^2 \)	Round up.
Sample Size (1-Prop CI, conservative)	\( n = 0.25 \left( \frac{z^*}{E} \right)^2 \)	Use \(\hat{p}_{guess}=0.5\). Round up.
Pooled Sample Proportion	\( \hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} \)	Used for 2-prop z-test (under \(H_0\)).
Standard Error (2-Prop CI, unpooled)	\( SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)	Used for 2-prop CI.
Standard Error (2-Prop Test, pooled)	\( SE_p = \sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \)	Used for 2-prop z-test (under \(H_0\)).
Two-Proportion z-Interval	\( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)	Interval estimate for \(p_1 - p_2\).
Two-Proportion z-Test Statistic	\( z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \)	Tests \(H_0: p_1 = p_2\).

VIII. Final Conclusion

Inferences for population proportions involve estimating unknown proportions using confidence intervals or testing claims using hypothesis tests. These procedures rely on sample proportions and the properties of their sampling distributions, particularly the normal approximation under certain conditions.

Confidence Intervals provide a range of plausible values for \(p\) or \(p_1 - p_2\).
Hypothesis Tests assess evidence against a null claim about \(p\) or the equality of \(p_1\) and \(p_2\).

Careful checking of conditions (Randomness, Normality, Independence) is essential before applying these z-procedures for either one or two proportions.