# What is the pooled estimate of the population variance?

## Pooled variance t-procedure

The pooled variance t-procedure uses a pooled variance to compare two means. It assumes that the two population variances are equal, which is rarely known in practice, and therefore the pooled variance t-procedure is not commonly applied.

## When to use the pooled variance t-procedure?

The advantage of the pooled variance t-procedure is that, when its assumptions hold, its test statistic follows an exact t-distribution. Its disadvantage is that the population variances are assumed to be equal. As we do not know the population variances, neither do we know whether they really are equal.

If we expect the population variances to be equal, the pooled variance t-procedure might be adequate. To check this informally, we can calculate the ratio of the sample standard deviations, $s_1/s_2$.

The closer the ratio is to 1, the more alike the sample standard deviations are, and the more plausible it is that the pooled variance t-procedure is adequate. A common guideline is that the pooled variance procedure can be applied when the ratio $s_1/s_2$ lies between 0.5 and 2, which means that neither standard deviation is more than twice the other.
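The rule of thumb above can be sketched as a small Python check. The function name `ratio_rule_ok` and the example standard deviations are hypothetical, and the bounds are only the informal 0.5 to 2 guideline from the text, not a formal test of equal variances.

```python
def ratio_rule_ok(s1: float, s2: float, low: float = 0.5, high: float = 2.0) -> bool:
    """Rule of thumb: pooling is considered acceptable when low <= s1/s2 <= high."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    return low <= s1 / s2 <= high


# Hypothetical sample standard deviations
print(ratio_rule_ok(12.0, 9.0))   # ratio about 1.33, inside the guideline
print(ratio_rule_ok(20.0, 8.0))   # ratio 2.5, outside the guideline
```

Because the check is symmetric in spirit (0.5 and 2 are reciprocals), it gives the same verdict whichever sample is labelled 1.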

One of the alternatives to the pooled variance t-procedure is the Welch unpooled t-procedure. The advantage of this procedure is that it does not assume equal population variances. The disadvantage is that its test statistic follows a t-distribution only approximately, not exactly.

## Assumptions for the pooled variance t-procedure

• Independent simple random samples, or randomized experiments
• The two populations follow a normal distribution
• Population variances are equal

## Main purposes

The main purposes of estimating and testing with the pooled variance t-procedure are:

• Estimating a confidence interval for the difference between mean-1 and mean-2 returns a range of values within which we can be (for example) 95% confident that the true difference of means lies. If this interval doesn't include zero, we can be 95% confident that the true population mean difference is different from zero, and thereby that the two means differ. This is expressed formally with a hypothesis test:
• Typically, we test whether there is evidence that mean-1 is different from mean-2: is there a significant difference between the populations? The p-value is the probability of getting a result at least as extreme as the one we get from our sample, assuming that there is no difference between the two means.

## Pooled sample variance

As we assume the two population variances to be equal, we can pool the two sample variances by taking a weighted average of them, with each sample variance weighted by its degrees of freedom:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

As can be deduced from the formula, the pooled variance ends up somewhere between the two sample variances, closer to the one from the larger sample.

The pooled sample variance $s_p^2$ is the estimator of the common population variance ($\sigma^2$).
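The pooled variance formula translates directly into a small Python function. This is a sketch: `pooled_variance` is a hypothetical helper name, and the example numbers are made up to show that the result is pulled towards the variance of the larger sample.

```python
def pooled_variance(var1: float, n1: int, var2: float, n2: int) -> float:
    """Weighted average of two sample variances, weighted by n_i - 1."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Hypothetical: variance 10 from 5 observations, variance 20 from 30 observations.
# The pooled value (620/33, about 18.8) lies between 10 and 20, close to 20.
print(pooled_variance(10.0, 5, 20.0, 30))
```

With equal sample sizes the weights are equal, so the pooled variance is simply the average of the two sample variances.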

## The standard error (SE) of the difference

The standard error (SE) of the difference in sample means is used when comparing the two means through confidence intervals and hypothesis testing. The SE formula:

$$SE = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

SE is the estimator of the standard deviation of the sampling distribution of the difference.
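A minimal Python sketch of the SE formula, assuming the pooled variance has already been computed. The function name `se_difference` and the example values are hypothetical.

```python
import math


def se_difference(pooled_var: float, n1: int, n2: int) -> float:
    """Standard error of xbar1 - xbar2 under the equal-variance assumption."""
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical: pooled variance 50 with two samples of 10 gives sqrt(50 * 0.2) = sqrt(10)
print(se_difference(50.0, 10, 10))
# Quadrupling both sample sizes halves the standard error
print(se_difference(50.0, 40, 40))
```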

## Confidence interval for the difference

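Having the pooled sample variance and the SE, we can complete the formula for the confidence interval of the difference:

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \cdot SE$$

where $t^{*}$ is the critical value from the t-distribution with $n_1 + n_2 - 2$ degrees of freedom for the chosen confidence level.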
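Putting the three formulas together gives a short Python sketch of the confidence interval. As in the text, the critical value is looked up in a t-table and passed in, rather than computed. The function name and the example data (12 observations per group, so 22 degrees of freedom and a 95% table value of 2.074) are hypothetical.

```python
import math


def pooled_confidence_interval(mean1, mean2, var1, n1, var2, n2, t_crit):
    """CI for mu1 - mu2 using the pooled variance.

    t_crit is the two-sided critical value for n1 + n2 - 2 degrees of
    freedom, looked up in a t-table (it is an input, not computed here).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # SE of the difference
    diff = mean1 - mean2
    margin = t_crit * se
    return diff - margin, diff + margin


# Hypothetical data: 12 observations per group, df = 22, t* = 2.074 for 95%
low, high = pooled_confidence_interval(50.0, 45.0, 20.0, 12, 30.0, 12, t_crit=2.074)
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```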

## Hypothesis test for the difference

A hypothesis test for the difference tests whether there is evidence to support a rejection of H0, as we know it from hypothesis testing in other statistical procedures. Usually, the null hypothesis states that the difference is equal to zero, and the alternative hypothesis that it is different from zero, as we have seen in Comparing two means and Comparing two proportions.

The p-value expresses how likely we are to get a result at least as extreme as the one we got from our sample, assuming that there is no difference in the means.

Typically, the hypotheses are expressed as follows:

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$

As we are usually interested in knowing whether there is a difference or not, we look at both the lower and the upper tail of the sampling distribution, which makes it a two-tailed test.

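The test statistic is calculated similarly to what we know from other statistical procedures: we compare the observed difference to the SE of the difference:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}$$

Under the null hypothesis, this statistic follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.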
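A minimal Python sketch of the test statistic and its degrees of freedom. The function name `pooled_t_statistic`, its `null_diff` parameter and the example data are hypothetical; with the default `null_diff=0.0` it implements the formula above.

```python
import math


def pooled_t_statistic(mean1, mean2, var1, n1, var2, n2, null_diff=0.0):
    """Return (t, df) for H0: mu1 - mu2 = null_diff using the pooled variance."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # SE of the difference
    t = ((mean1 - mean2) - null_diff) / se
    return t, df


# Hypothetical data: means 50 and 45, variances 20 and 30, 12 observations each
t, df = pooled_t_statistic(50.0, 45.0, 20.0, 12, 30.0, 12)
print(f"t = {t:.3f} with {df} degrees of freedom")
```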

Concluding on the hypothesis test: as in other kinds of hypothesis tests, a very low p-value expresses that there is very little chance of getting a result as extreme as the one we got from our sample, assuming that the means are equal.

Therefore, a very low p-value gives very strong evidence against the null hypothesis and thus against the claim that the two means are equal. We reject the null hypothesis when the p-value is lower than the pre-set significance level (α).
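The two-tailed p-value and the decision rule can be sketched in Python. The standard library has no t-distribution function, so this sketch assumes that integrating the t density numerically with Simpson's rule is accurate enough; in practice a t-table or a statistics package would be used instead. All function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) for T ~ t(df), via Simpson's rule on [0, |t|]."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2          # Simpson weights 4, 2, 4, ...
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3            # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is below the pre-set significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Usage with a hypothetical test statistic
p = two_sided_p_value(2.5, df=20)
print(round(p, 4), decide(p))
```

The symmetry of the t-distribution is what lets the sketch integrate only from 0 to |t| and double the tail area.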

## Worked example

In the following, we will run through an example that makes inference through a confidence interval and a hypothesis test.

### The story

Say that a curious teacher who works in a large international education organization wishes to test the difference in test scores between students who attend courses run during the morning hours and students who attend courses run in the afternoon.

### The sample

She takes two random samples, one from the population of morning students and one from the population of afternoon students. Each sample consists of 10 students, and she assumes that the population variances are equal and therefore runs a pooled variance t-procedure.

The morning students have a mean score of 72.8 with a sample standard deviation of 15.43. The afternoon students have a mean score of 64.7 with a sample standard deviation of 12.29.

### Confidence interval

To make inference on this difference of (72.8 - 64.7 =) 8.1, she computes a 95% confidence interval. First, she calculates the pooled variance, which is then used to calculate the standard error of the difference (SE), and the SE finally serves to calculate the confidence interval. The critical value for a 95% confidence interval with 10 + 10 - 2 = 18 degrees of freedom can be looked up in a t-table or in statistical software. It is 2.10.

The 95% confidence interval spans from -5 to 21. This means that, based on her samples of 10 each, she can be 95% confident that the true population mean difference lies between -5 and 21.

This confidence interval includes 0, which means that a null hypothesis stating that there is no difference in means cannot be rejected at the 5% significance level.
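The confidence interval can be reproduced with a short Python sketch. It treats 15.43 and 12.29 as sample standard deviations, since that is the interpretation that reproduces the interval of -5 to 21 stated above, and it uses the table value 2.10 from the text. Variable names are illustrative.

```python
import math

# Worked-example figures; 15.43 and 12.29 are treated as sample standard deviations
n1 = n2 = 10
mean1, mean2 = 72.8, 64.7          # morning group, second group
s1, s2 = 15.43, 12.29
t_crit = 2.10                      # t-table value for 95%, df = 18

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))            # SE of the difference
diff = mean1 - mean2
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"pooled variance = {sp2:.2f}, SE = {se:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Because both samples have 10 students, the pooled variance here is simply the average of the two squared standard deviations.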

### Hypothesis test

The teacher wishes to express formally what her confidence interval already tells her: that there is no evidence of a difference. She conducts a two-tailed hypothesis test at α = 0.05, where $\mu_1$ is the mean score of the morning students:

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$

With an SE of about 6.24 and 18 degrees of freedom, the test statistic is $t = 8.1 / 6.24 \approx 1.30$.

The test statistic is not in the rejection region, as it lies between the critical values of -2.10 and 2.10. The teacher therefore fails to reject the null hypothesis: there is not enough evidence to conclude that there is a difference between the test scores of the two groups.

The p-value is 0.211, which means that there is a 21.1% chance that she would get a result at least this extreme, assuming that there is no difference in the means. That is a fairly large chance, and she therefore cannot reject the null hypothesis.

### New and larger samples

The teacher, who really feels that her intuition must be right, suspects that the sample sizes have simply been too small. She now takes two new samples, each of size 28. For the sake of this exercise, say she gets almost the same result (sample mean difference = 9.0 and a pooled sample variance of 215.5). This time she rejects the null hypothesis, as the test statistic of 2.29 exceeds the critical value of 2.00 (α = 0.05, 54 degrees of freedom).

The p-value is 0.026, which means that there is a 2.6% probability that she would get a result at least this extreme, assuming that there is no difference in the means. At a significance level of α = 0.05, she rejects the null hypothesis: a 2.6% probability is too small to be consistent with the two means being equal.
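Both hypothesis tests in this example can be checked with a self-contained Python sketch. It treats 15.43 and 12.29 as sample standard deviations (the reading that matches the text's SE and p-value), takes the pooled variance of 215.5 for the larger samples as given, and assumes Simpson's-rule integration of the t density is accurate enough for the p-values. Function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# First study: n = 10 per group, standard deviations 15.43 and 12.29
n1 = n2 = 10
sp2_small = ((n1 - 1) * 15.43**2 + (n2 - 1) * 12.29**2) / (n1 + n2 - 2)
se_small = math.sqrt(sp2_small * (1 / n1 + 1 / n2))
t_small = (72.8 - 64.7) / se_small
p_small = two_sided_p_value(t_small, n1 + n2 - 2)

# Second study: n = 28 per group, mean difference 9.0, pooled variance 215.5 (as given)
m = 28
se_large = math.sqrt(215.5 * (1 / m + 1 / m))
t_large = 9.0 / se_large
p_large = two_sided_p_value(t_large, 2 * m - 2)

for label, t, p in [("n = 10", t_small, p_small), ("n = 28", t_large, p_large)]:
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: t = {t:.2f}, p = {p:.3f} -> {verdict}")
```

The comparison shows the point of the story: with a similar difference in means, the larger samples shrink the SE from about 6.2 to about 3.9, which pushes the test statistic past the critical value.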

We recall that this t-procedure assumes that the population variances are equal, which we didn't know in this example. Therefore, the pooled variance t-procedure might not be the most adequate for the situation.

## Pooled variance t-procedure in MS Excel

In Excel, we run a t-Test: Two-Sample Assuming Equal Variances from the Data >> Data Analysis menu. This tool does not calculate the confidence interval, which has to be calculated separately.

Coming

## Learnings on pooled variance t-procedure

• Penn State Eberly College of Science:
• Text page: Short step-by-step examples: Comparing Two Independent Means Unpooled and Pooled
• Text page: Pooled Variances
• JBstatistics: Video (11:03): Pooled variance T tests and confidence intervals: Introduction

#### Carsten Grube

Freelance Data Analyst
