Vinay Kanth Rao Kodipelly — https://kanth-vinay.github.io/

Introductory Hypothesis Testing in R (R Code Examples)

1a. Z‑Test (Using Known \(\sigma\))

Raw Data Example:

# Data and parameters:
data <- c(54, 57, 53, 59, 58)
x_bar <- mean(data)         # Sample mean
sigma <- 5                  # Known population standard deviation (sigma)
n <- length(data)           # Sample size
mu_0 <- 55                  # Hypothesized population mean

# Compute the z-statistic:
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# For a left-tailed test:
p_left <- pnorm(z)
critical_value_left <- qnorm(0.05)  # Example with alpha = 0.05

# For a right-tailed test:
p_right <- 1 - pnorm(z)
critical_value_right <- qnorm(0.95)

# For a two-tailed test:
p_two <- 2 * (1 - pnorm(abs(z)))
critical_value_two_upper <- qnorm(0.975)  # Example with alpha = 0.05
critical_value_two_lower <- qnorm(0.025)

# Compute Confidence Interval (Two-tailed):
# CI = x_bar ± (critical_value_two_upper)*(sigma/sqrt(n))
CI_lower <- x_bar - critical_value_two_upper*(sigma/sqrt(n))
CI_upper <- x_bar + critical_value_two_upper*(sigma/sqrt(n))

# Output the z-statistic, p-values, and Confidence Interval:
z
p_left
critical_value_left
p_right
critical_value_right
p_two
critical_value_two_lower
critical_value_two_upper
CI_lower
CI_upper

Summary Statistics Example:

# Use this block instead of the raw-data block when only summary statistics are given.
x_bar <- 56.2  # Sample mean (given)
sigma <- 5
n <- 5
mu_0 <- 55
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# Left-tailed test:
p_left <- pnorm(z)
critical_value_left <- qnorm(0.05)

# Right-tailed test:
p_right <- 1 - pnorm(z)
critical_value_right <- qnorm(0.95)

# Two-tailed test:
p_two <- 2 * (1 - pnorm(abs(z)))
critical_value_two_upper <- qnorm(0.975)
critical_value_two_lower <- qnorm(0.025)

# Compute Confidence Interval (Two-tailed):
CI_lower <- x_bar - critical_value_two_upper*(sigma/sqrt(n))
CI_upper <- x_bar + critical_value_two_upper*(sigma/sqrt(n))

# Output the z-statistic, p-values, and Confidence Interval:
z
p_left
critical_value_left
p_right
critical_value_right
p_two
critical_value_two_lower
critical_value_two_upper
CI_lower
CI_upper

Formula: \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \)
Confidence Interval: \( \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \)
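
The steps above can be bundled into a small function. The sketch below is one possible way to do it: `z_test` and its argument names are hypothetical (base R does not ship a z-test function), and the rejection rule assumes a fixed significance level `alpha`.

```r
# Hypothetical helper, not part of base R: one-sample z-test from summary statistics.
z_test <- function(x_bar, mu_0, sigma, n,
                   alternative = c("two.sided", "less", "greater"),
                   alpha = 0.05) {
  alternative <- match.arg(alternative)
  se <- sigma / sqrt(n)            # Standard error of the mean
  z  <- (x_bar - mu_0) / se        # z-statistic
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    less      = pnorm(z),
    greater   = pnorm(z, lower.tail = FALSE)
  )
  z_crit <- qnorm(1 - alpha / 2)   # Two-sided critical value for the CI
  list(
    z        = z,
    p_value  = p_value,
    reject   = p_value < alpha,    # TRUE means reject H0 at level alpha
    conf_int = c(x_bar - z_crit * se, x_bar + z_crit * se)
  )
}

# Usage with the summary statistics from the example above:
res <- z_test(x_bar = 56.2, mu_0 = 55, sigma = 5, n = 5)
res$z
res$p_value
res$reject
res$conf_int
```

Comparing `p_value` with `alpha` and comparing `z` with the matching critical value lead to the same decision, so either rule can be reported.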

1b. One-Sample T‑Test

# Data and parameters:
data <- c(54, 57, 53, 59, 58)
x_bar <- mean(data)          # Sample mean
s <- sd(data)                # Sample standard deviation (s)
n <- length(data)            # Sample size
mu_0 <- 55                   # Hypothesized population mean

# Compute the t-statistic:
t <- (x_bar - mu_0) / (s / sqrt(n))

# Degrees of freedom:
df <- n - 1

# Two-tailed test:
p_two <- 2 * (1 - pt(abs(t), df = df))
critical_value_two_upper <- qt(0.975, df = df)
critical_value_two_lower <- qt(0.025, df = df)

# Left-tailed test:
p_left <- pt(t, df = df)
critical_value_left <- qt(0.05, df = df)

# Right-tailed test:
p_right <- 1 - pt(t, df = df)
critical_value_right <- qt(0.95, df = df)

# Compute Confidence Interval (Two-tailed):
CI_lower <- x_bar - critical_value_two_upper*(s/sqrt(n))
CI_upper <- x_bar + critical_value_two_upper*(s/sqrt(n))

# Output the t-statistic, degrees of freedom, p-values, and Confidence Interval:
t
df
p_two
critical_value_two_lower
critical_value_two_upper
p_left
critical_value_left
p_right
critical_value_right
CI_lower
CI_upper

Formula: \( t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n-1 \)
Confidence Interval: \( \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \)
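
As a cross-check, base R's `t.test()` should reproduce the hand-computed values above. This is a minimal sketch assuming the same `data` and `mu_0`; the `res_*` names are only illustrative.

```r
data <- c(54, 57, 53, 59, 58)
mu_0 <- 55

# One call per alternative hypothesis:
res_two   <- t.test(data, mu = mu_0, alternative = "two.sided", conf.level = 0.95)
res_left  <- t.test(data, mu = mu_0, alternative = "less")
res_right <- t.test(data, mu = mu_0, alternative = "greater")

res_two$statistic   # t-statistic
res_two$parameter   # Degrees of freedom (n - 1)
res_two$p.value     # Two-tailed p-value
res_two$conf.int    # Two-sided 95% confidence interval for the mean
res_left$p.value    # Left-tailed p-value
res_right$p.value   # Right-tailed p-value
```

Note that for the one-sided alternatives `t.test()` reports a one-sided confidence bound, so the two-sided interval should be read from the `"two.sided"` call.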

2a. Pooled t‑Test (Equal Variances)

# Data for two independent groups:
group1 <- c(10, 12, 11, 13, 12)
group2 <- c(9, 11, 10, 12, 11)

# Sample sizes:
n1 <- length(group1)
n2 <- length(group2)

# Sample means:
x_bar1 <- mean(group1)
x_bar2 <- mean(group2)

# Sample standard deviations:
s1 <- sd(group1)
s2 <- sd(group2)

# Calculate the pooled variance (S_p^2):
S_p2 <- (((n1 - 1) * s1^2) + ((n2 - 1) * s2^2)) / (n1 + n2 - 2)

# Compute the t-statistic:
t <- (x_bar1 - x_bar2) / sqrt(S_p2 * (1/n1 + 1/n2))

# Degrees of freedom:
df <- n1 + n2 - 2

# Two-tailed test:
p_two <- 2 * (1 - pt(abs(t), df = df))
critical_value_two_upper <- qt(0.975, df = df)
critical_value_two_lower <- qt(0.025, df = df)

# Left-tailed test:
p_left <- pt(t, df = df)
critical_value_left <- qt(0.05, df = df)

# Right-tailed test:
p_right <- 1 - pt(t, df = df)
critical_value_right <- qt(0.95, df = df)

# Compute Confidence Interval (Two-tailed):
CI_lower <- (x_bar1 - x_bar2) - critical_value_two_upper * sqrt(S_p2*(1/n1 + 1/n2))
CI_upper <- (x_bar1 - x_bar2) + critical_value_two_upper * sqrt(S_p2*(1/n1 + 1/n2))

# Output the t-statistic, pooled variance, degrees of freedom, p-values, and Confidence Interval:
t
S_p2
df
p_two
critical_value_two_lower
critical_value_two_upper
p_left
critical_value_left
p_right
critical_value_right
CI_lower
CI_upper

Formula:
\( t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \quad S_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}, \quad df = n_1+n_2-2 \)
Confidence Interval: \( (\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \)
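
The pooled calculation can be checked against `t.test()` with `var.equal = TRUE`. This is a sketch assuming the same two groups as above; `res_pooled` is an illustrative name.

```r
group1 <- c(10, 12, 11, 13, 12)
group2 <- c(9, 11, 10, 12, 11)

# var.equal = TRUE requests the pooled-variance (equal variances) t-test:
res_pooled <- t.test(group1, group2, var.equal = TRUE,
                     alternative = "two.sided", conf.level = 0.95)

res_pooled$statistic   # t-statistic
res_pooled$parameter   # Degrees of freedom (n1 + n2 - 2)
res_pooled$p.value     # Two-tailed p-value
res_pooled$conf.int    # 95% CI for mean(group1) - mean(group2)
```

Setting `alternative = "less"` or `"greater"` gives the one-tailed versions, with the difference taken as group 1 minus group 2.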

2b. Unpooled (Welch's) t‑Test (Unequal Variances)

# Data for two independent groups:
group1 <- c(10, 12, 11, 13, 12)
group2 <- c(9, 11, 10, 12, 11)

# Sample sizes:
n1 <- length(group1)
n2 <- length(group2)

# Sample means:
x_bar1 <- mean(group1)
x_bar2 <- mean(group2)

# Sample standard deviations:
s1 <- sd(group1)
s2 <- sd(group2)

# Compute the t-statistic:
t <- (x_bar1 - x_bar2) / sqrt(s1^2/n1 + s2^2/n2)

# Calculate degrees of freedom using the Welch-Satterthwaite formula:
df <- ( (s1^2/n1 + s2^2/n2)^2 ) / ( ((s1^2/n1)^2)/(n1 - 1) + ((s2^2/n2)^2)/(n2 - 1) )

# Two-tailed test:
p_two <- 2 * (1 - pt(abs(t), df = df))
critical_value_two_upper <- qt(0.975, df = df)
critical_value_two_lower <- qt(0.025, df = df)

# Left-tailed test:
p_left <- pt(t, df = df)
critical_value_left <- qt(0.05, df = df)

# Right-tailed test:
p_right <- 1 - pt(t, df = df)
critical_value_right <- qt(0.95, df = df)

# Compute Confidence Interval (Two-tailed):
CI_lower <- (x_bar1 - x_bar2) - critical_value_two_upper * sqrt(s1^2/n1 + s2^2/n2)
CI_upper <- (x_bar1 - x_bar2) + critical_value_two_upper * sqrt(s1^2/n1 + s2^2/n2)

# Output the t-statistic, degrees of freedom, p-values, and Confidence Interval:
t
df
p_two
critical_value_two_lower
critical_value_two_upper
p_left
critical_value_left
p_right
critical_value_right
CI_lower
CI_upper

Formula:
\( t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}},\quad df = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}} \)
Confidence Interval: \( (\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \)

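Note: When reading critical values from a t-table, round the degrees of freedom down to the nearest integer. R's pt() and qt() accept fractional degrees of freedom, so the code above uses the unrounded value; apply floor(df) first to match a table-based answer.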
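
The sketch below cross-checks the Welch calculation with `t.test()`, whose default for two samples is `var.equal = FALSE`, and then illustrates the effect of rounding the degrees of freedom down. The value `df_example <- 12.3` is made up purely for illustration and does not come from the data above.

```r
group1 <- c(10, 12, 11, 13, 12)
group2 <- c(9, 11, 10, 12, 11)

# Welch's t-test (unequal variances) is the default for two samples:
res_welch <- t.test(group1, group2, alternative = "two.sided")

res_welch$statistic   # t-statistic
res_welch$parameter   # Welch-Satterthwaite degrees of freedom (not rounded)
res_welch$p.value
res_welch$conf.int

# Effect of rounding df down, using a hypothetical fractional value:
df_example   <- 12.3
crit_exact   <- qt(0.975, df = df_example)         # Fractional df, as R uses
crit_floored <- qt(0.975, df = floor(df_example))  # Table-style df = 12
crit_exact
crit_floored
```

Rounding down gives a slightly larger critical value, so the table-based test is a little more conservative than the unrounded one.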

2c. Paired t‑Test

# Paired data (e.g., before and after intervention):
before <- c(200, 195, 210, 205, 198)
after <- c(190, 185, 205, 200, 192)

# Calculate the differences:
diff <- before - after

# Summary statistics for differences:
mean_diff <- mean(diff)
s_diff <- sd(diff)
n <- length(diff)

# Compute the t-statistic:
t <- mean_diff / (s_diff / sqrt(n))

# Degrees of freedom:
df <- n - 1

# Two-tailed test:
p_two <- 2 * (1 - pt(abs(t), df = df))
critical_value_two_upper <- qt(0.975, df = df)
critical_value_two_lower <- qt(0.025, df = df)

# Left-tailed test:
p_left <- pt(t, df = df)
critical_value_left <- qt(0.05, df = df)

# Right-tailed test:
p_right <- 1 - pt(t, df = df)
critical_value_right <- qt(0.95, df = df)

# Compute Confidence Interval (Two-tailed):
CI_lower <- mean_diff - critical_value_two_upper * (s_diff/sqrt(n))
CI_upper <- mean_diff + critical_value_two_upper * (s_diff/sqrt(n))

# Output the t-statistic, mean difference, standard deviation of differences,
# degrees of freedom, p-values, and Confidence Interval:
t
mean_diff
s_diff
df
p_two
critical_value_two_lower
critical_value_two_upper
p_left
critical_value_left
p_right
critical_value_right
CI_lower
CI_upper

Formula: \( t = \frac{\bar{d}}{s_d/\sqrt{n}}, \quad df = n-1 \)
Confidence Interval: \( \bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}} \) (where \( d \) are the paired differences)
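
A paired t-test is a one-sample t-test on the differences, which the following sketch tries to make concrete by running both forms through `t.test()`. It assumes the same `before` and `after` vectors; `d`, `res_paired`, and `res_diff` are illustrative names.

```r
before <- c(200, 195, 210, 205, 198)
after  <- c(190, 185, 205, 200, 192)

# Paired test on the two vectors:
res_paired <- t.test(before, after, paired = TRUE, alternative = "two.sided")

# Equivalent one-sample test on the differences against 0:
d <- before - after
res_diff <- t.test(d, mu = 0, alternative = "two.sided")

res_paired$statistic   # t-statistic
res_paired$parameter   # Degrees of freedom (n - 1)
res_paired$p.value
res_paired$conf.int    # 95% CI for the mean difference
res_diff$p.value       # Same p-value as the paired call
```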

Mann‑Whitney U Test (Wilcoxon Rank Sum Test)

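# Two independent groups (the samples share tied values, so R uses a normal
# approximation and warns that it cannot compute an exact p-value):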
group1 <- c(15, 18, 16, 17, 19)
group2 <- c(14, 16, 15, 13, 17)

# Two-tailed test:
result_two <- wilcox.test(group1, group2, alternative = "two.sided")
result_two

# Left-tailed test:
result_left <- wilcox.test(group1, group2, alternative = "less")
result_left

# Right-tailed test:
result_right <- wilcox.test(group1, group2, alternative = "greater")
result_right

Note: This is a nonparametric test based on ranks. No simple closed-form confidence interval exists, but wilcox.test(..., conf.int = TRUE) reports an interval for the location shift between the groups.
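
To show what "based on ranks" means, the sketch below rebuilds the statistic `W` that `wilcox.test()` reports: the rank sum of group 1 in the combined sample, minus its smallest possible value. The names `combined`, `ranks`, and `W_manual` are illustrative, and `exact = FALSE` is passed only because these samples contain ties.

```r
group1 <- c(15, 18, 16, 17, 19)
group2 <- c(14, 16, 15, 13, 17)
n1 <- length(group1)

# Rank all observations together; tied values receive the average rank:
combined <- c(group1, group2)
ranks <- rank(combined)

# Rank sum of group 1 minus its minimum possible value n1 * (n1 + 1) / 2:
W_manual <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2
W_manual   # 20.5

# The same statistic from wilcox.test (normal approximation because of ties):
res <- wilcox.test(group1, group2, alternative = "two.sided", exact = FALSE)
res$statistic
res$p.value
```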

Wilcoxon Signed Rank Test

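# One-sample data (or paired differences) compared to a hypothesized median.
# Note: one value equals mu0 and the absolute differences contain ties, so R drops
# the zero, uses a normal approximation, and warns that no exact p-value is available.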
data <- c(48, 52, 47, 53, 50)
mu0 <- 50

# Two-tailed test:
result_two <- wilcox.test(data, mu = mu0, alternative = "two.sided")
result_two

# Left-tailed test:
result_left <- wilcox.test(data, mu = mu0, alternative = "less")
result_left

# Right-tailed test:
result_right <- wilcox.test(data, mu = mu0, alternative = "greater")
result_right

Note: Nonparametric test for a one-sample median (or the median of paired differences). It assumes the distribution is symmetric about the median.
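
The statistic `V` reported by `wilcox.test()` in the one-sample case can be rebuilt by hand, as the sketch below attempts to show: drop zero differences, rank the absolute differences, and sum the ranks of the positive ones. The names `d`, `r`, and `V_manual` are illustrative, and `exact = FALSE` is passed only because this sample contains a zero difference and ties.

```r
data <- c(48, 52, 47, 53, 50)
mu0 <- 50

# Differences from the hypothesized median, with zeros removed:
d <- data - mu0
d <- d[d != 0]

# Rank the absolute differences; ties receive the average rank:
r <- rank(abs(d))

# V is the sum of the ranks belonging to positive differences:
V_manual <- sum(r[d > 0])
V_manual   # 5

# The same statistic from wilcox.test:
res <- wilcox.test(data, mu = mu0, alternative = "two.sided", exact = FALSE)
res$statistic
res$p.value
```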

Variables & Definitions

Variable | Definition
\(\bar{x}\) | Sample mean (one‑sample test)
\(\sigma\) | Known population standard deviation
\(s\) | Sample standard deviation (one‑sample test)
\(n\) | Sample size (number of pairs in the paired test)
\(\mu_0\) | Hypothesized population mean
\(\bar{x}_1\) | Sample mean of Group 1 (two‑sample test)
\(\bar{x}_2\) | Sample mean of Group 2 (two‑sample test)
\(s_1\) | Sample standard deviation of Group 1
\(s_2\) | Sample standard deviation of Group 2
\(n_1\), \(n_2\) | Sample sizes of Group 1 and Group 2
\(S_p^2\) | Pooled variance
\(df\) | Degrees of freedom
\(d\) | Paired differences (paired test)
\(\bar{d}\) | Mean of the paired differences
\(s_d\) | Standard deviation of paired differences