I have to appeal to you guys. I have been reading a lot of material, and it tells me that I should use the Z distribution for hypothesis testing if:
– Population Standard Deviation is KNOWN, whether the sample size is less than or more than 30
– Population Standard Deviation is UNKNOWN and the sample size is 30 or more
I would use the T distribution for hypothesis testing if:
– Population Standard Deviation is UNKNOWN and the sample size is less than 30
Is this correct? I am asking because there seems to be a misunderstanding in my group. Some think that even if the Population Standard Deviation is known, the T distribution should be used as long as the sample size is less than 30.
Which is correct? Please help. Thanks.
I have seen the T distribution used in scenarios where there is an estimate of the population standard deviation and the goal is inferential statistics on small confirmatory samples. With a sample size of less than 30, a conservative analyst would use the T rather than the Z, since the wider tails of the T add some buffer. My recommendation would be to run both distributions and see how comparable the results are. Using the Z instead of the T when sigma is estimated results in a higher likelihood of committing a Type I error: at the same level of significance, the critical value for the Z is always smaller than the corresponding T critical value, and the difference only becomes small for large samples (30+). Another factor to consider is the alternative hypothesis (nondirectional vs. directional, i.e. two-tailed vs. one-tailed), which will also have a large impact on the results of your analysis.
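To make the critical-value point concrete, here is a rough Python sketch (standard library only) that prints Z and t critical values at alpha = 0.05 for two-tailed (nondirectional) and one-tailed (directional) tests. Python's standard library has no t distribution, so the t quantile is computed from the regularized incomplete beta function (continued-fraction method) and inverted by bisection; the helper names are my own, and scipy.stats.t.ppf should give the same numbers if SciPy is available.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t with df degrees of freedom."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t > 0 else tail


def t_ppf(p, df, lo=-1e3, hi=1e3):
    """Inverse of t_cdf by bisection (slow but simple)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def print_critical_values(alpha=0.05, dfs=(1, 5, 10, 29, 30, 100)):
    """Print z and t critical values for two- and one-tailed tests."""
    for tails in (2, 1):
        p = 1.0 - alpha / tails
        print(f"{tails}-tailed, alpha = {alpha}: z = {NormalDist().inv_cdf(p):.3f}")
        for df in dfs:
            print(f"    df = {df:>3}: t = {t_ppf(p, df):.3f}")


if __name__ == "__main__":
    print_critical_values()
```

The t value is larger than z for every df, which is the "added buffer" mentioned above; the gap is large at df = 1 or 5, shrinks to under 0.1 around df = 30, and is about 0.02 at df = 100.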
The rule is simple:
If the population standard deviation is known, use the z-distribution.
If the population standard deviation is estimated using the sample standard deviation, use the t-distribution.
It so happens that the t-distribution looks more and more like the normal distribution as the degrees of freedom (n-1) grow beyond 30 or so, which is why some users treat n ≥ 30 as a shortcut.
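A minimal Python sketch of that rule for a one-sample test of a mean, assuming a hypothesized mean mu0 and, optionally, a known population sigma (the function name and the example data are hypothetical). The only things that change are whether the denominator uses sigma or the sample standard deviation s, and whether the statistic is referred to Z or to t with n - 1 degrees of freedom.

```python
import math
import statistics


def one_sample_statistic(data, mu0, sigma=None):
    """Test statistic for H0: population mean == mu0.

    Returns (kind, statistic, df). If the population standard deviation
    sigma is known, the statistic is a z score and df is None. Otherwise
    sigma is estimated by the sample standard deviation s, and the
    statistic follows a t distribution with n - 1 degrees of freedom.
    The sample size plays no part in choosing between the two.
    """
    n = len(data)
    xbar = statistics.mean(data)
    if sigma is not None:
        return "z", (xbar - mu0) / (sigma / math.sqrt(n)), None
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return "t", (xbar - mu0) / (s / math.sqrt(n)), n - 1


if __name__ == "__main__":
    sample = [10.2, 9.8, 10.5, 10.1, 9.9]  # hypothetical measurements
    print(one_sample_statistic(sample, mu0=10.0))              # sigma estimated -> t
    print(one_sample_statistic(sample, mu0=10.0, sigma=0.25))  # sigma known -> z
```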
Your question can be answered in two ways.
It depends on which statistic you are trying to examine. Determining what % of products would have a value of X or above is a question about the distribution of individual values (shown as Z), and it is difficult to do reliably. Note that the t distributions require at least df = 1, meaning a mean computed from at least 2 items. I would use a Z distribution if I wanted to know what % of items would be above some value, assuming I had enough data (samples > 30) to approximate the population.
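For the "% of items above a value" question, a sketch along these lines uses Python's statistics.NormalDist, assuming the individual values are roughly normal. The function names, the 31-value minimum, and the numbers in the example are illustrative only.

```python
from statistics import NormalDist


def fraction_above(x, mu, sigma):
    """Expected fraction of individual items at or above x, assuming
    individual values are normal with mean mu and standard deviation sigma."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


def fraction_above_from_data(data, x, min_n=31):
    """Same estimate with mu and sigma estimated from the data.

    min_n = 31 mirrors the 'samples > 30' rule of thumb in the post; it is
    an illustrative cut-off, not a statistical requirement.
    """
    if len(data) < min_n:
        raise ValueError(f"need at least {min_n} values, got {len(data)}")
    return 1.0 - NormalDist.from_samples(data).cdf(x)


if __name__ == "__main__":
    print(fraction_above(54, mu=50, sigma=2))  # hypothetical process values
```

With a hypothetical mean of 50 and standard deviation of 2, fraction_above(54, 50, 2) is about 0.023, i.e. roughly 2.3% of items at or above 54.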
If you are comparing means and need to decide between the Z and t distributions, it does not matter whether sigma (σ) is known if the sample in each group is large enough (greater than 30): you can use either Z or t, because the two become very similar past n = 30.
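To illustrate the two-group case, the sketch below computes the pooled two-sample statistic from summary statistics and gets a two-sided p-value from both the t distribution (n1 + n2 - 2 df) and the standard normal. As in the earlier sketch, the t tail probability comes from a hand-rolled incomplete beta function, since Python's standard library has no t distribution; all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def pooled_two_sample_stat(m1, s1, n1, m2, s2, n2):
    """Pooled two-sample statistic for H0: mu1 == mu2, from summary stats."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df


def two_sided_p_t(stat, df):
    """Two-sided p-value from Student's t: P(|T| >= |stat|) = I_x(df/2, 1/2)."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + stat * stat))


def two_sided_p_z(stat):
    """Two-sided p-value treating the same statistic as standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(stat)))


if __name__ == "__main__":
    # Hypothetical groups with the same means and SDs, small vs. large n.
    for n in (8, 40):
        stat, df = pooled_two_sample_stat(10.5, 1.0, n, 10.0, 1.0, n)
        print(f"n = {n} per group: stat = {stat:.3f}, "
              f"p(t, df={df}) = {two_sided_p_t(stat, df):.4f}, "
              f"p(z) = {two_sided_p_z(stat):.4f}")
```

For the same statistic, the t p-value is always somewhat larger than the Z p-value, and the difference shrinks as the group sizes grow.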
To quote “Introduction to Business Statistics” by Kvanli, Guynes, and Pavur (4th edition), p. 250: “Remember, however, that a more accurate confidence interval is always obtained using the t table when the sample standard deviation (s) is used in construction of this interval.”
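As a sketch of the book's point, the function below builds a 95% confidence interval for a mean from a sample, using s with the t critical value (n - 1 df) and, for comparison, with the Z critical value. It repeats the same hand-rolled t helpers (Python's standard library has no t distribution); the function names and the data in the example are made up.

```python
import math
import statistics
from statistics import NormalDist


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t with df degrees of freedom."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t > 0 else tail


def t_ppf(p, df, lo=-1e3, hi=1e3):
    """Inverse of t_cdf by bisection (slow but simple)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_ci(data, conf=0.95):
    """Return (t_interval, z_interval) for the mean, both built with s."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    p = 1.0 - (1.0 - conf) / 2.0                 # upper-tail quantile
    half_t = t_ppf(p, n - 1) * se
    half_z = NormalDist().inv_cdf(p) * se
    return (xbar - half_t, xbar + half_t), (xbar - half_z, xbar + half_z)


if __name__ == "__main__":
    sample = [10.2, 9.8, 10.5, 10.1, 9.9]  # hypothetical measurements
    ci_t, ci_z = mean_ci(sample)
    print(f"t interval: {ci_t[0]:.3f} to {ci_t[1]:.3f}")
    print(f"z interval: {ci_z[0]:.3f} to {ci_z[1]:.3f}")
```

With the five hypothetical values in the example (mean 10.1, s ≈ 0.274), the t interval is about 9.76 to 10.44, while the Z interval is narrower (about ±0.24 instead of ±0.34), so the Z interval understates the extra uncertainty that comes from estimating sigma with s.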
I hope this helps.