Pooled variance t-procedure
The pooled variance t-procedure uses a pooled variance to compare two means. It assumes that the two population variances are equal, which is rarely known in practice. Therefore, the pooled variance t-procedure is not commonly applied in statistics.
When to use the pooled variance t-procedure?
The advantage of the pooled variance t-procedure is that its test statistic follows an exact t-distribution (when its assumptions hold). Its disadvantage is that the population variances are assumed to be equal. As we do not know the population variances, neither do we know whether they really are equal. If we have reason to expect the population variances to be equal, the pooled variance t-procedure might be adequate.
We can calculate the ratio of the sample standard deviations, s1/s2. The closer the ratio is to 1, the more alike the two sample standard deviations are, and the more likely it is that the pooled variance t-procedure is adequate. A general guideline is that the pooled variance procedure can be applied when the ratio s1/s2 lies between 0.5 and 2, which means that neither standard deviation is more than twice the other.
One of the alternatives to the pooled variance t-procedure is the Welch unpooled t-procedure. The advantage of this procedure is that it does not assume equal population variances. The disadvantage is that its test statistic only approximately follows a t-distribution.
Assumption for the pooled variance t-procedure
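The rule of thumb above can be expressed as a short Python check. This is a minimal sketch: the function name `sd_ratio_ok` is hypothetical, its inputs are the two sample standard deviations, and the default bounds of 0.5 and 2 are simply the guideline quoted above, not a formal test of equal variances.

```python
def sd_ratio_ok(s1: float, s2: float, low: float = 0.5, high: float = 2.0) -> bool:
    """Rule-of-thumb check that the ratio s1/s2 lies between low and high (inclusive)."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sample standard deviations must be positive")
    ratio = s1 / s2
    return low <= ratio <= high


# Illustrative sample standard deviations
print(sd_ratio_ok(15.43, 12.29))  # True: ratio about 1.26
print(sd_ratio_ok(30.0, 12.0))    # False: ratio 2.5
```

Because the check is symmetric in spirit (0.5 is the reciprocal of 2), swapping s1 and s2 gives the same verdict.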
Main purposes
The main purposes of estimating and testing with the pooled variance t-procedure are:
Pooled sample variance
As we assume the two population variances to be equal, we can pool the two sample variances by taking a weighted average of them. The result ends up somewhere between the two sample variances, closer to the one with the larger sample size, as can be deduced from the formula:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The pooled sample variance is the estimator of the common population variance (σ²).
The standard error (SE) of the difference
The standard error (SE) of the difference in sample means is applied when comparing the two means through confidence intervals and hypothesis tests. The SE formula:
$$SE = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
SE is the estimator of the standard deviation of the sampling distribution of the difference.
Confidence interval for the difference
Having the pooled sample variance and the SE, we can complete the formula for the confidence interval of the difference:
$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \cdot SE$$
where t* is the critical value of the t-distribution with n1 + n2 - 2 degrees of freedom.
Hypothesis test for the difference
A hypothesis test for the difference tests whether there is evidence to reject H0, as we know it from hypothesis testing in other statistical procedures. Usually, the null hypothesis states that the difference is equal to zero and the alternative hypothesis that it is different from zero, as we have seen in Comparing two means and Comparing two proportions. The p-value expresses how likely we are to get a result at least as extreme as the one we got from our sample, assuming that there is no difference in the means. Typically, the hypotheses are expressed as follows:
$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$
As we are usually interested in knowing whether there is a difference or not, we look at both the lower and the upper tail of the distribution, which makes it a two-tailed test.
The test statistic is calculated similarly to what we know from other statistical procedures. We compare the difference to the SE of the difference:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}$$
Concluding on the hypothesis test
As in other kinds of hypothesis tests, a very low p-value expresses that there is very little chance of getting as extreme a result as the one we got from our sample, assuming that the means are equal.
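The formulas above can be collected into a small Python sketch. Python's standard library has no t-distribution, so this sketch computes two-sided p-values and critical values through the regularized incomplete beta function, using the continued-fraction (modified Lentz) method described in Numerical Recipes. All function names here (`pooled_variance`, `standard_error`, `two_sided_p`, `t_critical`, `pooled_t_test`) are hypothetical; in practice, `scipy.stats.ttest_ind(a, b, equal_var=True)` or `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` does the same job.

```python
import math


def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Weighted average of s1^2 and s2^2 with weights n_i - 1 (s1, s2 are std devs)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def standard_error(sp2: float, n1: int, n2: int) -> float:
    """SE of the difference in sample means, based on the pooled variance."""
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny, eps = 1e-300, 1e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """P(|T| >= |t|) for a t-distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_critical(alpha: float, df: int) -> float:
    """Two-sided critical value t* with P(|T| >= t*) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if two_sided_p(mid, df) > alpha:
            lo = mid  # tail area still too large: t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2.0


def pooled_t_test(diff: float, sp2: float, n1: int, n2: int, alpha: float = 0.05):
    """Return (t statistic, df, two-sided p-value, (CI lower, CI upper))."""
    df = n1 + n2 - 2
    se = standard_error(sp2, n1, n2)
    t_stat = diff / se
    t_star = t_critical(alpha, df)
    return t_stat, df, two_sided_p(t_stat, df), (diff - t_star * se, diff + t_star * se)


# Usage: difference 9.0, pooled variance 215.5, 28 observations per group
t_stat, df, p, ci = pooled_t_test(9.0, 215.5, 28, 28)
print(f"t = {t_stat:.2f}, df = {df}, p = {p:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With the larger-sample numbers from the worked example (difference 9.0, pooled variance 215.5, 28 students per group), this should give t ≈ 2.29 with 54 degrees of freedom and p ≈ 0.026, and `t_critical(0.05, 54)` comes out at about 2.005, the 2.00 quoted in the text. Note also how `pooled_variance(2.0, 4, 4.0, 10)` returns 13.0, between 4 and 16 but closer to the variance of the larger sample.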
Therefore, a very low p-value gives very strong evidence against the null hypothesis and thus against the claim that the two means are equal. We reject the null hypothesis when the p-value is lower than the preset significance level (α).
Worked example
In the following, we will run through an example, making inference through a confidence interval and a hypothesis test.
The story
Say that a curious teacher who works in a large international education organization wishes to test the difference in test scores between students who attend courses run during morning hours and those who attend courses run in the evening.
The sample
She takes two randomly selected samples, one from the population of morning students and one from the population of evening students. Each sample consists of 10 students, and she assumes that the population variances are equal and therefore runs a pooled variance t-procedure. The morning students have a mean score of 72.8 with a sample standard deviation of 15.43. The evening students have a mean score of 64.7 with a sample standard deviation of 12.29.
Confidence interval
To make inference on this difference of (72.8 - 64.7 =) 8.1, she constructs a 95% confidence interval. First, she calculates the pooled variance (about 194.6), which is then used to calculate the standard error of the difference (SE ≈ 6.24), and the SE finally serves to calculate the confidence interval. The critical value for a 95% confidence interval with 18 degrees of freedom can be looked up in a t-table or in statistical software. It is 2.10. The 95% confidence interval spans from -5 to 21. This means that, based on her samples of 10 each, she can be 95% confident that the true difference in population means lies between -5 and 21. This confidence interval includes 0, which means that a null hypothesis stating that there is no difference in means could not be rejected.
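The confidence interval in the worked example can be reproduced with a few lines of Python. This sketch assumes that 15.43 and 12.29 are the sample standard deviations of the two groups (squaring them is what reproduces the interval of about -5 to 21), and it takes the critical value 2.10 for 18 degrees of freedom as given rather than computing it.

```python
import math

# Worked-example inputs; 15.43 and 12.29 are treated as sample standard deviations
n1, mean1, s1 = 10, 72.8, 15.43   # morning students
n2, mean2, s2 = 10, 64.7, 12.29   # evening students
t_crit = 2.10                      # 95% critical value for df = n1 + n2 - 2 = 18

diff = mean1 - mean2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                      # SE of the difference
lower, upper = diff - t_crit * se, diff + t_crit * se         # 95% confidence interval
t_stat = diff / se                                            # test statistic under H0: difference = 0

print(f"difference = {diff:.1f}")
print(f"pooled variance = {sp2:.2f}")
print(f"SE = {se:.2f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
print(f"t = {t_stat:.2f}")
```

Running it gives a pooled variance of about 194.56, an SE of about 6.24, an interval of roughly (-5.0, 21.2) that contains 0, and a test statistic of about 1.30, which lies well inside the critical values of ±2.10.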
Hypothesis test
The teacher wishes to state formally what her confidence interval above already tells her: that there is no evidence of a difference. She conducts a hypothesis test:
$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$
$$t = \frac{8.1}{SE} \approx 1.30$$
The test statistic is not in the rejection region, as it lies between the critical values of -2.10 and 2.10. The teacher therefore fails to reject the null hypothesis. There is no evidence to conclude that there is a difference between the test scores of the two groups. The p-value is 0.211, which means that there is a 21.1% chance that she would get a result at least this extreme, assuming that there is no difference in the means. That is a fairly large chance, and she can therefore not reject the null hypothesis.
New and larger samples
The teacher, who really feels that her intuition must be right, suspects that the sample size has simply been too small. She now takes two samples, each of size 28. For the sake of this exercise, say she gets almost the same result (sample mean difference = 9.0 and a pooled sample variance of 215.5). This gives a test statistic of 2.29, which exceeds the critical value of 2.00 (α = 0.05, 54 degrees of freedom), so she rejects the null hypothesis. The p-value is 0.026, which means that there is a 2.6% probability that she would get a result at least this extreme, assuming that there is no difference in the means. At a significance level of α = 0.05, she rejects the null hypothesis, as a result this extreme would be too unlikely if the two means were equal.
We recall that this t-procedure assumes that the population variances are equal, which we didn't know in this example. Therefore, the pooled variance t-procedure might not be the most adequate for this situation.
Pooled variance t-procedure in MS Excel
In Excel, we run a t-Test: Two-Sample Assuming Equal Variances from the Data >> Data Analysis menu.
This tool does not calculate the confidence interval, which I'm calculating separately.
Pooled variance t-procedure in R statistical programming
Coming learnings on pooled variance t-procedure
Carsten Grube, Freelance Data Analyst