Lecture Notes: Inferences for Population Proportions

I. Introduction to Proportions

Often, we are interested in the proportion (or percentage) of individuals in a population who possess a certain characteristic or fall into a specific category. For example:

A. Key Definitions

B. Calculating the Sample Proportion

The sample proportion is calculated using the formula:

\( \hat{p} = \frac{x}{n} \)

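Where \(x\) is the number of individuals in the sample who have the characteristic (the "successes") and \(n\) is the sample size.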

Note: \(\hat{p}\) will always be a value between 0 and 1 (inclusive).

II. The Sampling Distribution of \(\hat{p}\)

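Imagine taking many random samples of the same size \(n\) from a population with proportion \(p\). If we calculate \(\hat{p}\) for each sample, the distribution of these \(\hat{p}\) values (the sampling distribution) has important properties:

  1. Center: the mean of the \(\hat{p}\) values equals the population proportion \(p\).
  2. Spread: the standard deviation of the \(\hat{p}\) values is \(\sqrt{\frac{p(1-p)}{n}}\).
  3. Shape: the distribution is approximately normal when the normality condition is met.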
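
The repeated-sampling idea can be made concrete with a small simulation. The sketch below is illustrative only: the population proportion (0.30), sample size (100), number of samples (5000), and the function name are assumptions chosen for the demonstration, not values from these notes. It compares the simulated mean and standard deviation of \(\hat{p}\) with the standard theoretical values \(p\) and \(\sqrt{p(1-p)/n}\).

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each individual has the characteristic with probability p.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats


p, n = 0.30, 100  # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.stdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of p-hat: {simulated_mean:.4f} (theory: {p})")
print(f"SD of p-hat:   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

With this many samples, the simulated values should land close to the theoretical ones, though they will not match exactly.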

Condition for Normality

We can treat the sampling distribution of \(\hat{p}\) as approximately normal if the expected numbers of successes and failures are both sufficiently large. A common check, using 10 as the threshold, is \(np \ge 10\) and \(n(1-p) \ge 10\).

(Note: Some sources use 5 or 15; check your course requirements. Using 10 is common.)
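
The check described above is easy to automate. This is a minimal sketch; the function name and the `threshold` parameter are hypothetical, and the threshold defaults to 10 only because that is the value these notes use.

```python
def normality_condition_met(n, p, threshold=10):
    """Return True if expected successes and failures both reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# n = 90, p = 0.80: expected counts are 72 and 18, so the condition holds.
print(normality_condition_met(90, 0.80))
# n = 40, p = 0.80: expected failures are only 8, so the condition fails.
print(normality_condition_met(40, 0.80))
```

Passing `threshold=5` or `threshold=15` adapts the check to courses that use a different cutoff.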

III. Common Critical Values (z*) from Standard Normal Distribution

Quick Reference: Critical Values (z*)

Critical values (\(z^*\)) are essential for constructing confidence intervals and performing hypothesis tests using the critical value approach. They represent the number of standard deviations away from the mean needed to capture a certain area under the standard normal curve.

Confidence Level (C) | Significance Level (\(\alpha = 1-C\)) | Two-Tailed \(z^*\) (for CI & \(H_a: \neq\)) | Left-Tailed \(z^*\) (for \(H_a: <\)) | Right-Tailed \(z^*\) (for \(H_a: >\))
90% | 0.10 | \(\pm 1.645\) | \(-1.282\) | \(+1.282\)
95% | 0.05 | \(\pm 1.96\) | \(-1.645\) | \(+1.645\)
99% | 0.01 | \(\pm 2.576\) | \(-2.326\) | \(+2.326\)

Note: Left/Right tailed values correspond to the critical value for a hypothesis test with the specified \(\alpha\). Two-tailed values are used for confidence intervals (where \(\alpha/2\) is in each tail) and two-tailed hypothesis tests.
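
The tabulated critical values can be reproduced from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the function name `critical_values` is an assumption made for illustration.

```python
from statistics import NormalDist


def critical_values(confidence_level):
    """Return (two-tailed z*, one-tailed z*) for alpha = 1 - confidence_level."""
    alpha = 1 - confidence_level
    std_normal = NormalDist()  # mean 0, standard deviation 1
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    return two_tailed, one_tailed


for c in (0.90, 0.95, 0.99):
    two, one = critical_values(c)
    print(f"{c:.0%}: two-tailed ±{two:.3f}, left-tailed {-one:.3f}, right-tailed +{one:.3f}")
```

The left-tailed value is the negative of the right-tailed value because the standard normal curve is symmetric about 0.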

IV. Confidence Intervals for One Population Proportion

A confidence interval provides a range of plausible values for the unknown population proportion \(p\), based on sample data.

A. Purpose and Interpretation

"We are C% confident that the true population proportion \(p\) lies within the interval (Lower Bound, Upper Bound)."

B. Conditions/Assumptions for the One-Proportion z-Interval

  1. Random Sample
  2. Normality Condition: \(x = n\hat{p} \ge 10\) and \(n-x = n(1-\hat{p}) \ge 10\). (Using 10)
  3. Independence Condition: \(N \ge 10n\) if sampling without replacement.

C. Formula for the One-Proportion z-Interval

Point Estimate \(\pm\) Margin of Error
\( \hat{p} \pm E \)

Where the Margin of Error (E) is:

\( E = z^* \times SE(\hat{p}) = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

\(z^*\) is the critical value from the table above for the desired confidence level (use the two-tailed value).

Example: Drug Test Poll (Q3)

\(n=21039\), \(\hat{p} = 0.63\), 99% CI.

  1. Conditions met (using threshold 10).
  2. \(z^* \approx 2.576\) (from table for 99% confidence).
  3. \( E = 2.576 \sqrt{\frac{0.63(0.37)}{21039}} \approx 0.0086 \)
  4. CI: \( 0.63 \pm 0.0086 \Rightarrow (0.6214, 0.6386) \)
  5. Interpretation: We are 99% confident...
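
The interval arithmetic in this example can be checked with a few lines of Python. This is a sketch, not a reference implementation: the function name is hypothetical, and it computes \(z^*\) from the standard normal distribution rather than reading it from the table, so later decimal places may differ slightly from hand calculations.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence_level):
    """Return (lower, upper, margin_of_error) for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin, margin


# Values from the Drug Test Poll example: n = 21039, p-hat = 0.63, 99% confidence.
lower, upper, margin = one_proportion_z_interval(0.63, 21039, 0.99)
print(f"E = {margin:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```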

D. Determining Required Sample Size

\( n = \hat{p}_{guess}(1-\hat{p}_{guess}) \left( \frac{z^*}{E} \right)^2 \)

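If no prior estimate of the proportion is available, use the conservative choice \(\hat{p}_{guess} = 0.5\), which gives the largest required \(n\). Always round \(n\) up.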
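
The sample size formula follows from solving the margin-of-error expression for \(n\). Starting from

\( E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

square both sides:

\( E^2 = (z^*)^2 \, \frac{\hat{p}(1-\hat{p})}{n} \)

and isolate \(n\):

\( n = \hat{p}(1-\hat{p}) \left( \frac{z^*}{E} \right)^2 \)

Because \(\hat{p}\) is not known before the sample is taken, it is replaced by \(\hat{p}_{guess}\). The product \(\hat{p}_{guess}(1-\hat{p}_{guess})\) reaches its maximum value, 0.25, at \(\hat{p}_{guess} = 0.5\), so that choice can never underestimate the required \(n\).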

Example: Sample Size Calculation

\(E = 0.03\), 95% CI (\(z^* \approx 1.96\)).

  1. Use \(\hat{p}_{guess} = 0.5\).
  2. \( n = 0.5(0.5) \left( \frac{1.96}{0.03} \right)^2 \approx 1067.11 \)
  3. Round Up: \(n = 1068\).
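
The same calculation as a short Python sketch. The function name and the default `p_guess=0.5` are assumptions for illustration. Note that the code computes \(z^*\) to more decimal places than the table value 1.96, so the unrounded result differs slightly from 1067.11, but it rounds up to the same sample size.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence_level, p_guess=0.5):
    """Smallest n whose margin of error is at most margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = p_guess * (1 - p_guess) * (z_star / margin_of_error) ** 2
    return math.ceil(n)  # always round up, never to the nearest integer


print(required_sample_size(0.03, 0.95))  # conservative guess of 0.5
```

Supplying a prior estimate other than 0.5 as `p_guess` yields a smaller required sample size.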

V. Hypothesis Testing for One Population Proportion

A formal procedure to decide between two claims about \(p\).

A. Purpose

Assess whether the sample evidence contradicts a claim (\(H_0\)) that \(p\) equals a hypothesized value \(p_0\).

B. The Steps of Hypothesis Testing (One-Proportion z-Test)

  1. State Hypotheses: \(H_0: p = p_0\) vs. \(H_a\) (\(\neq, <, >\)).
  2. Check Conditions: Random Sample, Normality (\(np_0, n(1-p_0) \ge 10\)), Independence. (Using 10)
  3. Set Significance Level (\(\alpha\)): Usually 0.05.
  4. Calculate Test Statistic (z):
    \( z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)
  5. Determine the P-value OR find the critical value(s). (Use the \(z^*\) table above for the critical value approach.)
  6. Make Decision: Reject \(H_0\) if P-value \(\le \alpha\) or if \(z\) is in the rejection region.
  7. State Conclusion in Context.

C. Examples of Setting up Hypotheses

Example: India Births (Q1)

\(p_0 = 0.517\). Has proportion changed? \(H_0: p = 0.517\), \(H_a: p \neq 0.517\).

Example: Home Field Advantage (Q2)

\(p_0 = 0.50\). Is win rate greater? \(H_0: p = 0.50\), \(H_a: p > 0.50\).

D. Example Calculation: Driving Test (Q4)

Example: Driving Test Hypothesis Test (Q4)

\(p_0 = 0.80\), \(n=90\), \(x=61\). Test if rate is lower at \(\alpha = 0.05\).

  1. Hypotheses: \(H_0: p = 0.80\), \(H_a: p < 0.80\).
  2. Conditions: \(np_0=72\ge 10\), \(n(1-p_0)=18\ge 10\). Met.
  3. \(\alpha = 0.05\).
  4. Test Statistic: \(\hat{p} = 61/90 \approx 0.6778\). \( z = \frac{61/90 - 0.80}{\sqrt{\frac{0.80(0.20)}{90}}} \approx -2.8988 \)
  5. P-value (Left-tailed): \(P(Z \le -2.8988) \approx 0.0019\).
  6. Decision: Reject \(H_0\) (since \(0.0019 \le 0.05\)).
  7. Conclusion: Significant evidence rate is lower than 80%.

Visualization (Left-Tailed Test)

Observed Z ≈ -2.90 | Critical Z (α=0.05, left-tail) ≈ -1.645
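
The P-value approach used in this example can be sketched in Python as follows. The function name and the `alternative` labels ('less', 'greater', 'two-sided') are hypothetical choices for illustration. Run on the Driving Test numbers, it should give \(z\) of about \(-2.90\) and a P-value near 0.0019.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative):
    """Return (p_hat, z, p_value) for a one-proportion z-test.

    alternative: 'less' (Ha: p < p0), 'greater' (Ha: p > p0),
    or 'two-sided' (Ha: p != p0).
    """
    p_hat = x / n
    sd_under_null = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat
    z = (p_hat - p0) / sd_under_null
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return p_hat, z, p_value


alpha = 0.05
p_hat, z, p_value = one_proportion_z_test(x=61, n=90, p0=0.80, alternative="less")
print(f"p-hat = {p_hat:.4f}, z = {z:.4f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```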

E. Example Calculation: Union Membership (Q5)

Example: Union Membership Hypothesis Test (Q5)

\(p_0 = 0.135\), \(n=2000\), \(x=240\). Test if rate is different at \(\alpha = 0.05\).

  1. Hypotheses: \(H_0: p = 0.135\), \(H_a: p \neq 0.135\).
  2. Conditions: \(np_0=270\ge 10\), \(n(1-p_0)=1730\ge 10\). Met.
  3. \(\alpha = 0.05\).
  4. Test Statistic: \(\hat{p} = 0.12\). \( z \approx -1.963 \)
  5. P-value (Two-tailed): \(2 \times P(Z \le -1.963) \approx 0.0496\).
  6. Decision: Reject \(H_0\) (since \(0.0496 \le 0.05\)).
  7. Conclusion: Significant evidence city rate differs from national rate.

Visualization (Two-Tailed Test)

Observed Z ≈ -1.963 | Critical Z (α=0.05, two-tail) ≈ ±1.960
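
This example is a close call, which makes it a useful illustration of the critical value approach: the observed statistic falls only just inside the rejection region. The sketch below recomputes \(z\) for the Union Membership data and compares it with the critical value; the function name and the `alternative` labels are assumptions for illustration.

```python
import math
from statistics import NormalDist


def in_rejection_region(z, alpha, alternative):
    """Return True if z falls in the rejection region for the given alternative."""
    std_normal = NormalDist()
    if alternative == "less":
        return z <= std_normal.inv_cdf(alpha)
    if alternative == "greater":
        return z >= std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Union Membership example: p0 = 0.135, n = 2000, x = 240, alpha = 0.05.
x, n, p0, alpha = 240, 2000, 0.135, 0.05
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
reject = in_rejection_region(z, alpha, "two-sided")
print(f"z = {z:.3f}, reject H0: {reject}")
```

Because the margin is this thin, rounding \(z\) to two decimal places before comparing would hide the fact that it falls beyond the critical value.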

VI. Inferences for Two Population Proportions

Comparing proportions (\(p_1, p_2\)) from two independent groups.

A. Notation

Group 1: \(p_1, n_1, x_1, \hat{p}_1\). Group 2: \(p_2, n_2, x_2, \hat{p}_2\).

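B. Sampling Distribution of \(\hat{p}_1 - \hat{p}_2\)

For independent random samples, \(\hat{p}_1 - \hat{p}_2\) has mean \(p_1 - p_2\) and standard deviation \(\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\). The distribution is approximately normal when both samples satisfy the normality condition.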

C. Confidence Interval for \(p_1 - p_2\)

  1. Conditions: Indep. Random Samples; Normality (\(n_i\hat{p}_i \ge 10\) and \(n_i(1-\hat{p}_i) \ge 10\) for both groups); Independence.
  2. Formula:
    \( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)
  3. Interpretation: C% confident difference is in interval. Check if 0 is included.
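
A minimal sketch of the two-proportion z-interval follows. These notes give no data for the two-sample case, so the counts used here (60 of 100 in group 1, 45 of 100 in group 2) are invented purely for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence_level):
    """Return (lower, upper) for a confidence interval for p1 - p2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Unpooled standard error: each group keeps its own sample proportion.
    standard_error = math.sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )
    difference = p_hat1 - p_hat2
    margin = z_star * standard_error
    return difference - margin, difference + margin


# Hypothetical data, not from the notes.
lower, upper = two_proportion_z_interval(60, 100, 45, 100, 0.95)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
print("0 is inside the interval" if lower <= 0 <= upper else "0 is not inside the interval")
```

For these made-up counts the interval lies entirely above 0, which would suggest that \(p_1\) exceeds \(p_2\).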

D. Hypothesis Test for \(p_1 - p_2\)

  1. Hypotheses: Usually \(H_0: p_1 = p_2\) vs. \(H_a\).
  2. Conditions: Indep. Random Samples; Normality (use \(\hat{p}_p\): \(n_i\hat{p}_p \ge 10\) and \(n_i(1-\hat{p}_p) \ge 10\) for both groups); Independence.
  3. Significance Level \(\alpha\).
  4. Pooled Proportion (\(\hat{p}_p\)):
    \( \hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} \)
  5. Test Statistic (z):
    \( z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \)
  6. P-value or Critical Value.
  7. Decision.
  8. Conclusion.
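
The steps above can be sketched as a short Python function. As with any illustration lacking source data, the counts (60 of 100 versus 45 of 100) are invented, and the function name and `alternative` labels are hypothetical. The key difference from the confidence interval is that the standard error uses the pooled proportion, because \(H_0\) assumes \(p_1 = p_2\).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (pooled proportion, z, p_value) for H0: p1 = p2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error, valid under H0: p1 = p2.
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / se_pooled
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return p_pooled, z, p_value


# Hypothetical data, not from the notes.
p_pooled, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"pooled p-hat = {p_pooled:.3f}, z = {z:.3f}, P-value = {p_value:.4f}")
```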

VII. Formula Summary

Concept | Formula | Notes
Sample Proportion | \( \hat{p} = \frac{x}{n} \) | Estimate of \(p\).
Standard Error (1-Prop CI) | \( SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \) | Used for 1-prop CI.
Standard Deviation (1-Prop Test) | \( SD(\hat{p}) = \sqrt{\frac{p_0(1-p_0)}{n}} \) | Used for 1-prop HT (under \(H_0\)).
Margin of Error (1-Prop CI) | \( E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \) | Width determinant for 1-prop CI.
One-Proportion z-Interval | \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \) | Interval estimate for \(p\).
One-Proportion z-Test Statistic | \( z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \) | Tests \(H_0: p=p_0\).
Sample Size (1-Prop CI, guess) | \( n = \hat{p}_{guess}(1-\hat{p}_{guess}) \left( \frac{z^*}{E} \right)^2 \) | Round up.
Sample Size (1-Prop CI, conservative) | \( n = 0.25 \left( \frac{z^*}{E} \right)^2 \) | Use \(\hat{p}_{guess}=0.5\). Round up.
Pooled Sample Proportion | \( \hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} \) | Used for 2-prop z-test (under \(H_0\)).
Standard Error (2-Prop CI, unpooled) | \( SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \) | Used for 2-prop CI.
Standard Error (2-Prop Test, pooled) | \( SE_p = \sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \) | Used for 2-prop z-test (under \(H_0\)).
Two-Proportion z-Interval | \( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \) | Interval estimate for \(p_1 - p_2\).
Two-Proportion z-Test Statistic | \( z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_p(1-\hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \) | Tests \(H_0: p_1 = p_2\).

VIII. Final Conclusion

Inferences for population proportions involve estimating unknown proportions using confidence intervals or testing claims using hypothesis tests. These procedures rely on sample proportions and the properties of their sampling distributions, particularly the normal approximation under certain conditions.

Careful checking of conditions (Randomness, Normality, Independence) is essential before applying these z-procedures for either one or two proportions.