BUS308 – Week 5 Discussion Assignment

A Different View

Expected Outcomes

After reading this lecture, the student should be familiar with:

1. What a confidence interval for a statistic is.
2. What a confidence interval for differences is.
3. The difference between statistical and practical significance.
4. The meaning of an Effect Size measure.

Overview

Years ago, a comedy show used to introduce new skits with the phrase “and now for something completely different.” That seems appropriate for this week’s material.

This week we will look at evaluating our data results in somewhat different ways. One criticism of the hypothesis testing procedure is that it tests only one value, when it is reasonably clear that a number of other values would also cause us to reject, or not reject, a null hypothesis of no difference. Many managers and researchers would like to see what these values could be and, in particular, what the extreme values are, as an aid in making decisions. Confidence intervals will help us here.

The other criticism of the hypothesis testing procedure is that we can “manage” the results, that is, ensure that we will reject the null, by manipulating the sample size. For example, if we have a difference in customer preference between two products of only 1%, is this a big deal? Given the uncertainty contained in sample results, we might tend to think we can safely ignore it. However, if we were to use a sample of, say, 10,000, we would find that this difference is statistically significant. This, for many, seems to fly in the face of reasonableness. To help here, we will look at a measure of “practical significance,” called the effect size, which gauges whether a difference is worth paying any attention to.

Confidence Intervals

A confidence interval is a range of values that, based upon the sample results, most likely contains the actual population parameter. The “most likely” element is the level of confidence attached to the interval: a 95% confidence interval, a 90% confidence interval, a 99% confidence interval, etc. Intervals can be created at any time, with or without performing a statistical test such as the t-test.

A confidence interval may be expressed as a range (45 to 51% of the town’s population support the proposal) or as a mean or proportion with a margin of error (48% of the town supports the proposal, with a margin of error of 3%). The latter format is frequently seen with opinion poll results, and simply means that you add and subtract the margin of error from the reported proportion to obtain the range. With either format, the confidence percent should also be provided.
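The arithmetic behind the poll example can be sketched in a few lines of Python. The conversion between the two formats is just addition and subtraction; the second call uses a hypothetical sample size of 1,000 (the lecture does not state the poll's size) to show where a margin of error comes from:

```python
import math

def proportion_interval(p_hat, margin):
    """Turn a reported proportion and margin of error into a range."""
    return (p_hat - margin, p_hat + margin)

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion;
    z = 1.96 is the normal critical value for 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The poll example from the text: 48% support with a 3% margin of error.
low, high = proportion_interval(0.48, 0.03)
print(f"{low:.0%} to {high:.0%}")  # 45% to 51%

# A hypothetical poll of n = 1,000 respondents at the same 48%.
print(f"margin of error: {margin_of_error(0.48, 1000):.3f}")
```

Note how the margin of error shrinks as the assumed sample size grows, a point that matters again when we discuss practical significance below.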

Confidence intervals for a single mean (or proportion) are fairly straightforward to understand and relate directly to t-test outcomes. Details on how to construct the interval will be given in this week’s second lecture; in this discussion, we want to understand how to interpret them.

In Week 2, we looked at how to test sample means against a constant, and we found that the female compa-ratio mean was not equal to or less than 1.0. The related confidence interval for the female compa-ratio mean would be 1.0397 to 1.0977, or 1.0687 +/- 0.0290 (all values rounded to 4 decimal places). This result relates directly to possible t-test outcomes. If, again in the one-sample situation, the standard (constant) we are comparing our sample result against falls within this range, then we would NOT reject the null hypothesis of no difference. If the standard falls outside of this range, as with our test against 1.00 in Week 2, then we reject the null and say we have a significant difference. It is clear in this case that the female mean is not even close to the midpoint value of 1.0 that we looked at.
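A minimal Python sketch of this one-sample interval. The formula is the standard mean +/- t * (sd / sqrt(n)); the sample standard deviation here (about 0.0703) is an assumed value chosen so the numbers line up with the lecture's example, and the critical t-value of 2.064 is the two-sided 95% value for df = 24 from a standard t table:

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Confidence interval for a mean: mean +/- t_crit * (sd / sqrt(n))."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs matching the lecture's female compa-ratio example:
# n = 25, assumed sd = 0.0703, t(two-sided 95%, df = 24) = 2.064.
low, high = t_interval(1.0687, 0.0703, 25, 2.064)
print(f"{low:.4f} to {high:.4f}")  # 1.0397 to 1.0977
```

Since the test standard of 1.00 falls below the lower endpoint, the interval tells us at a glance that the null hypothesis of "mean = 1.00" would be rejected.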

Confidence intervals allow us to make some informed “gut level” decisions when a more precise measure may not be needed. For example, if the means of two variables are fairly close, the variable with the wider confidence interval has more variation within its data and is less consistent. We could test this formally with the F test for variances that we covered in Week 2. And while a hypothesis result of “reject the null hypothesis” or “do not reject the null hypothesis” at an alpha of 0.05 is definite, it does not convey the “strength” of the rejection. Comparing the interval’s endpoints against the standard used in our one-sample t-test gives a sense of how “close” we came to making the other decision.

Confidence intervals can also be used to examine the difference between means. The most direct way is by constructing a confidence interval for the difference. Again, the details on how to develop one of these will be presented in the second lecture for this week. This result is very similar to the intervals we constructed while doing the ANOVA comparisons. While we use a different calculation formula when comparing only two means (rather than comparing two means at a time within the ANOVA setup), the interpretation is the same. If the range contains 0, then the population means could be identical, and we would not reject the null hypothesis of no difference.
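The "does the range contain 0" check can be sketched as follows. All of the summary inputs here are hypothetical stand-ins (the male mean, both standard deviations, and the critical t-value of 2.01, roughly the two-sided 95% value for df near 48, are illustrative, not taken from the class data set):

```python
import math

def diff_interval(m1, s1, n1, m2, s2, n2, t_crit):
    """Confidence interval for (m1 - m2) using the two-sample standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_crit * se
    diff = m1 - m2
    return diff - margin, diff + margin

# Hypothetical female vs. male compa-ratio summary values.
low, high = diff_interval(1.0687, 0.0703, 25, 1.0564, 0.0850, 25, 2.01)
contains_zero = low <= 0 <= high
print(f"({low:.4f}, {high:.4f}) contains 0: {contains_zero}")
```

With these illustrative inputs the interval straddles 0, so we would not reject the null hypothesis of no difference, consistent with the compa-ratio finding discussed below.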

If we have two single-mean confidence intervals, for example intervals for the male and female compa-ratios, using them to determine whether the means are significantly different is a bit trickier than simply seeing if they contain a common value within their ranges. If the top ¼ of one interval and the bottom ¼ of the other overlap, then we have a significant difference at the alpha = 0.05 level. If the endpoints barely overlap, we have a significant difference at around the alpha = 0.01 level.

The natural question at this point is why an overlap shows a significant difference when comparing two means, when it does not when comparing a mean against a constant. The answer lies in how the intervals are constructed. Without getting too technical, the intervals use a t-value to establish the level of confidence, and as the sample size gets larger, the corresponding t-value gets smaller for any specific alpha level. So, in our example of comparing compa-ratio means, we had samples of 25 when constructing the individual intervals and used a slightly larger t-value than we would with our overall sample of 50 when comparing the two groups together. This means the individual intervals are a bit longer than the larger-sample result, which is why some overlap can still show a significant difference, rather than the more “logical” interpretation that only no overlap at all means they differ.
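The shrinking t-value can be shown numerically. This sketch uses two-sided 95% critical values taken from a standard t table (2.064 for df = 24, 2.011 for df = 48) and a hypothetical standard deviation of 0.07 for both calculations:

```python
import math

# Same assumed sd; only the sample size (and hence the t-value) changes.
sd = 0.07
for n, t_crit in [(25, 2.064), (50, 2.011)]:
    margin = t_crit * sd / math.sqrt(n)
    print(f"n={n}: t={t_crit}, margin = +/-{margin:.4f}")
```

Both a smaller t-value and a larger sqrt(n) shrink the margin, so the combined-sample comparison works with tighter bounds than either individual interval.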

Effect Size – Practical Importance

A popular saying a few years ago was “if you torture data long enough, it will confess to anything.” 😊 Unfortunately, many regard statistical analysis this same way. Some think that if you do not get the rejection of the null hypothesis that you want, you can simply repeat the sampling with a larger group; at some sample size, virtually all differences will be found to be statistically significant. Note that this is somewhat unethical for professional researchers; however, those who feel that proving their point is more important than following professional guidelines have been known to do this.

But does statistical significance mean the findings should be used in decision making? If, for example, we typically round salary to the nearest thousand dollars when making decisions, does a significant difference based on a $500 gap have any practical importance? Probably not, even if we could find a sample size large enough to make this difference statistically significant.

So, how do we decide the practical importance of a statistically significant difference? Once, and this is important, we have rejected the null hypothesis – and only if we have rejected the null hypothesis – we calculate a new statistic called the effect size.

The name comes from the effect that changing a variable’s value would have on the outcome. To understand this idea, let’s look at the male and female compa-ratios. We found in Week 2 that the male and female compa-ratio means were not significantly different. So, the “effect” of changing from male to female in an analysis of the compa-ratio mean would not be very big. However, if we switched from the male to the female average salary, we would expect to see a large effect, or difference, in the outcome, since their salaries were so different.

The effect size measure – however it is calculated for different statistical tests – can be interpreted in a similar fashion. Effect sizes generally have their value translated into a “large,” “moderate,” or “small” label. If we have a large effect, then we know that the variable interaction caused the rejection of the null hypothesis, and that our results have strong practical significance. If, however, we have a small effect, then we can be fairly sure that the sample size caused the rejection of the null hypothesis, and the results have little to no practical significance for decision making or research. A moderate outcome is less clear, and we might want to redo the analysis with a different sample. (Note: since we have already rejected the null, repeating the experiment with a different sample in this case is not manipulating the findings, but rather studying the effect of the variables in more detail. This is done in research all the time, providing evidence that the findings are replicable and correct.) Examples of different effect size measures, and how to determine what is large, medium, and small, are presented in the third lecture for this week.
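As one concrete sketch, Cohen's d, a common effect size for comparing two means, divides the mean difference by the pooled standard deviation; the 0.2/0.5/0.8 cutoffs for small/moderate/large are widely used rules of thumb (the lecture does not specify which measure it uses, so treat this as an illustration). The salary summary values below are hypothetical, not taken from the class data set:

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def label(d):
    """Common rule-of-thumb cutoffs: 0.2 small, 0.5 moderate, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"
    return "negligible"

# Hypothetical male vs. female salary summaries (means in $1,000s).
d = cohens_d(52.0, 8.0, 25, 38.0, 7.0, 25)
print(f"d = {d:.2f} ({label(d)})")  # d = 1.86 (large)
```

With a gap this big relative to the spread of the data, the effect is labeled large, meaning the difference matters practically and was not merely a by-product of sample size.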

Summary

Some are concerned about statistical outcomes because many different values can produce a statistically significant result, so they are not really sure what the outcome means. This concern is frequently addressed with a confidence interval: a range of values that could be the population parameter being estimated, given the sample’s uncertainty. Remember that since we have only a sample, we know the resulting estimates are a bit off, but they are generally considered close enough to use for decisions. The confidence interval gives us a range of values that can be used in decision-making to see whether outcomes might differ at the more extreme possibilities.

Just as we cannot use two single-sample t-test outcomes to determine whether sample means differ, we cannot simply use two individual sample confidence intervals to make decisions about differences between two samples. A confidence interval for the difference needs to be constructed for this purpose.

A second criticism about statistical significance outcomes involves the impact that the sample size has on an outcome. Some think that statistically significant results are not always significant for real world practical decisions and situations. This is due to the impact of sample sizes on the decision to reject the null. Almost any difference can be found to be statistically significant if the sample size is large enough. So, we can have statistical significance of a difference that has no practical importance at all.

This concern led to the development of the effect size measure. Used only when the null hypothesis is rejected (meaning we have found a statistically significant difference), the effect size tells us whether the rejection was due to the variables changing (a large effect size and an important difference) or due more to merely having a large sample size (a small effect size and an unimportant difference for all practical purposes). For example, what impact would changing from male to female have on mean salary? In this case, a big impact. The impact of the same change on compa-ratio, however, is relatively small. Different statistical tools have different calculations for this measure.

If you have any questions on this material, please ask your instructor.

After finishing with this lecture, please go to the first discussion for the week, and engage in a discussion with others in the class over the first couple of days before reading the second lecture.
