Pooled variance t-procedure

The pooled variance t-procedure uses a pooled variance to compare two means. It assumes that the two population variances are equal, which is rarely known, and therefore the pooled variance t-procedure is not commonly applied in statistics.
When to use pooled variance t-procedure?

The advantage of the pooled variance t-procedure is that it follows an exact t-distribution. Its disadvantage is that the population variances are assumed to be equal. As we do not know the population variances, we do not know whether they really are equal either. If we expect the population variances to be equal, the pooled variance t-procedure might be adequate.

We can calculate the ratio of the sample standard deviations, $s_1/s_2$. The closer the ratio is to 1, the more alike the sample standard deviations are and the more plausible it is that the pooled variance t-procedure is adequate. A general guideline is that the pooled variance procedure can be applied when the ratio $s_1/s_2$ is between 0.5 and 2, which means that neither standard deviation is more than twice the other.

One of the alternatives to the pooled variance t-procedure is the Welch unpooled t-procedure. The advantage of this procedure is that it does not assume equal population variances. The disadvantage is that it does not follow an exact t-distribution.

Assumption for the pooled variance t-procedure
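The ratio guideline for checking the equal-variance assumption can be sketched as a small Python helper. This is only an illustration: the function name `sd_ratio_ok` and the two standard deviations passed to it are hypothetical and do not come from the article.

```python
def sd_ratio_ok(s1, s2, low=0.5, high=2.0):
    """Return the ratio s1/s2 and whether it lies between low and high."""
    ratio = s1 / s2
    return ratio, low <= ratio <= high


# Hypothetical sample standard deviations, chosen only for illustration.
ratio, ok = sd_ratio_ok(4.0, 3.0)
print(f"s1/s2 = {ratio:.2f}, within guideline: {ok}")
```

A ratio inside the range does not prove that the population variances are equal; it only indicates that the samples give no strong sign against it.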
Main purposes

The main purposes of estimating and testing with the pooled variance t-procedure are:
Pooled sample variance

As we assume the two population variances to be equal, we can pool the two sample variances by taking a weighted average of the two. The result ends up somewhere between the two sample variances, tending to be closer to the one with the larger sample size, as can be deduced from the formula:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled sample variance is the estimator of the common population variance ($\sigma^2$).

The standard error (SE) of the difference

The standard error (SE) of the difference in sample means is applied when comparing the two means through confidence intervals and hypothesis testing. The SE formula:

$$SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

SE is the estimator of the standard deviation of the sampling distribution of the difference.

Confidence interval for the difference

Having the pooled sample variance and the SE, we can complete the formula for the confidence interval of the difference:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot SE$$

where $t^*$ is the critical value of the t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

Hypothesis test for the difference

A hypothesis test for the difference tests whether there is evidence to support a rejection of H0, as we know it from hypothesis testing in other statistical procedures. Usually, the null hypothesis is that the difference is equal to zero and the alternative is that it is different from zero, as we have seen in Comparing two means and Comparing two proportions. The p-value expresses how likely we are to get a result at least as extreme as the one we got from our sample, assuming that there is no difference in the means. Typically, the hypotheses are expressed as follows:

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$

As we are usually interested in knowing whether there is a difference or not, we look at both the lower and the upper tail of the distribution, which makes it a two-tailed test. The test statistic is calculated similarly to what we know from other statistical procedures. We compare the difference to the SE of the difference:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

Concluding on the hypothesis test: As in other kinds of hypothesis tests, a very low p-value expresses that there is very little chance of getting a result as extreme as the one we got from our sample, assuming that the means are equal.
Therefore, a very low p-value gives very strong evidence against the null hypothesis and thus against the claim that the two means are equal. We reject the null hypothesis when the p-value is lower than the pre-set significance level (α).

Worked example

In the following, we will run through an example, making inference through a confidence interval and a hypothesis test.

The story

Say that a curious teacher who works in a large international education organization wishes to test the difference in test scores between students who attend courses run during morning hours and those who attend courses run in the afternoon.

The sample

She takes two randomly selected samples, one from the population of morning students and one from the evening students. Each sample consists of 10 students, and she assumes that the population variances are equal and therefore runs a pooled variance t-procedure. The sample outcome for the morning students is a mean score of 72.8 with a sample standard deviation of 15.43. The sample outcome for the evening students is a mean score of 64.7 with a sample standard deviation of 12.29.

Confidence interval

To make inference on this difference of (72.8 - 64.7 =) 8.1, she calculates a 95% confidence interval. First, she calculates a pooled variance, which is then applied to calculate the standard error of the difference (SE), and the SE finally serves to calculate the confidence interval. The critical value for a 95% confidence interval can be looked up in a t-table or in statistical software. For 10 + 10 - 2 = 18 degrees of freedom, it is 2.10. The 95% confidence interval spans from -5 to 21. This means that, based on her samples of 10 each, she can be 95% confident that the true difference in population means lies between -5 and 21. This confidence interval includes 0, which means that a null hypothesis stating that there is no difference in means cannot be rejected at the 5% significance level.
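The calculation chain described above (pooled variance, then SE, then confidence interval and test statistic) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's own calculation: it treats 15.43 and 12.29 as sample standard deviations, because that reading reproduces the interval of about -5 to 21 quoted above; it takes the critical value 2.10 from the text; and the function names and the numerical integration used for the p-value are illustrative choices.

```python
import math


def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, with weights n - 1."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def standard_error(sp2, n1, n2):
    """Standard error of the difference in sample means."""
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the t density on [-|t|, |t|].

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# Worked example: 15.43 and 12.29 are ASSUMED to be standard deviations.
n1 = n2 = 10
mean1, s1 = 72.8, 15.43
mean2, s2 = 64.7, 12.29
t_crit = 2.10  # critical value quoted in the text (95%, 18 df)

sp2 = pooled_variance(s1, n1, s2, n2)
se = standard_error(sp2, n1, n2)
diff = mean1 - mean2
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
t_stat = diff / se
p_value = two_sided_p(t_stat, n1 + n2 - 2)

print(f"pooled variance = {sp2:.2f}, SE = {se:.2f}")
print(f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```

With these inputs the interval comes out at roughly -5 to 21, in line with the figure quoted above. In practice a statistics library would normally supply the critical value and p-value; the hand-rolled integration is used here only to keep the sketch free of dependencies.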
Hypothesis test

The teacher wishes to state a mathematical expression for what her confidence interval, above, already tells her: that there is no evidence of a difference. She conducts a hypothesis test. The test statistic is not in the rejection area, as it lies between the critical values of -2.10 and 2.10. The teacher therefore fails to reject the null hypothesis. There is no evidence to conclude that there is a difference between the test scores of the two groups. The p-value is 0.211, which means that there is a 21.1% chance that she would get a result at least this extreme, assuming that there is no difference in the means. That is a fairly large chance, and she therefore cannot reject the null hypothesis.

New and larger samples

The teacher, who really feels that she must be right in her intuition, suspects that the sample size has simply been too small. She now takes two samples, each of size 28. For the sake of this exercise, say she gets almost the same result (a sample mean difference of 9.0 and a pooled sample variance of 215.5). She therefore rejects the null hypothesis, with a test statistic of 2.29 compared to a critical value (at α = 0.05) of 2.00. The p-value is 0.026, which means that there is a 2.6% probability that she would get a result at least this extreme, assuming that there is no difference in the means. At a significance level of α = 0.05, she rejects the null hypothesis, considering a 2.6% probability too small to be consistent with equal means.

We recall that this t-procedure assumes that the population variances are equal, which we did not know in this example. Therefore, the pooled variance t-procedure might not be the most adequate for the situation.

Pooled variance t-procedure in MS Excel

In Excel we run a t-Test: Two-Sample Assuming Equal Variances from the Data >> Data Analysis menu.
This does not calculate the confidence interval which I'm calculating by

Pooled variance t-procedure in R statistical programming

Coming

Learnings on pooled variance t-procedure
Carsten Grube, Freelance Data Analyst