Hypothesis Testing – Exponential Distribution
- This topic has 18 replies, 9 voices, and was last updated 17 years, 8 months ago by
Jonathon Andell.
November 30, 2004 at 2:38 am #37700
I am a new BB and have a process (wait time) that is exponentially distributed. Improvements to the process have been made, and we are now trying to complete a hypothesis test of the two means (original process and post-improvement process) to determine whether a difference exists. An F-test was completed, and it indicated the variances are not equal. Minitab suggests a nonparametric test; however, all the nonparametric tests offered indicate that the variances are to be considered “equal”. Can you offer a recommended approach for this situation?
November 30, 2004 at 12:08 pm #111472
Doug,
You should be able to use either Mood’s Median or Kruskal-Wallis tests. Neither requires homogeneity of variance (HOV), but both need the samples to be independent and the population distributions to have the same shape.
Best Regards,
Bob J
November 30, 2004 at 2:53 pm #111492
Since you know the underlying distribution, you can find confidence intervals for the existing distribution. If your improved process falls within these confidence intervals, you have not proved an improvement; if it is outside (and in the desired direction), you have proved a difference. That’s all a hypothesis test is anyway.
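A minimal sketch of this confidence-interval check, assuming the wait times really are exponential and the sample is large. It uses the standard large-sample approximation that the log of the sample mean is roughly normal with standard deviation 1/sqrt(n). The function name `exp_mean_ci` and the simulated wait times are made up for illustration; they are not the poster's data.

```python
import math
import random
from statistics import NormalDist, fmean


def exp_mean_ci(waits, conf=0.95):
    """Approximate confidence interval for the mean of exponential data.

    Assumption: the data are exponential and n is large, so log(sample mean)
    is roughly normal with standard deviation 1/sqrt(n).
    """
    n = len(waits)
    mean = fmean(waits)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z / math.sqrt(n)
    return mean * math.exp(-half_width), mean * math.exp(half_width)


# Simulated wait times in minutes (hypothetical, for illustration only).
rng = random.Random(0)
before = [rng.expovariate(1 / 10) for _ in range(150)]  # true mean 10
after = [rng.expovariate(1 / 7) for _ in range(150)]    # true mean 7

low, high = exp_mean_ci(before)
after_mean = fmean(after)
print(f"95% CI for original mean: ({low:.2f}, {high:.2f})")
print(f"improved-process mean: {after_mean:.2f}")
print("outside the interval:", not (low <= after_mean <= high))
```

The interval is symmetric on the log scale, so it never goes below zero, which suits wait times.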
When going to nonparametrics you potentially lose a lot of statistical power if the underlying distribution is known.
Go read the help menus again; not all of the nonparametrics require equal variances. The advice the other poster gave was correct.
November 30, 2004 at 3:13 pm #111496
JorgeStat
hi
Mood’s Median Test (nonparametric)
or, better, Kruskal-Wallis (nonparametric).
Those methods are for unequal variances and non-normal data.
Regards,
JorgeA
November 30, 2004 at 9:41 pm #111525
Stan –
You are basically correct as usual, but not in the particulars.
Confidence intervals for statistics from populations that differ significantly may in fact overlap by a considerable amount. Reference: “Statistical Rules of Thumb”, van Belle, Wiley 2002 (section 2.5). He shows that for 95% CIs on means there could be a 29% overlap. The proposed rule of thumb is to do further work if as much as 25% overlap is observed.
If the CIs do not overlap, significance can be assured. If they do, another test may be needed to prove the difference is truly non-significant, or the alpha can be increased until there is no overlap.
I too often state that comparing CIs and hypothesis testing are equivalent, but with this important caveat.
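The caveat can be illustrated with a small numeric sketch. It assumes two independent sample means with known standard errors and uses a normal approximation; the function name `overlap_and_p` and the numbers 10, 13 and 1 are invented for illustration.

```python
import math
from statistics import NormalDist


def overlap_and_p(mean1, se1, mean2, se2, alpha=0.05):
    """Return (do the two CIs overlap?, two-sided p-value for the difference)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci1 = (mean1 - z_crit * se1, mean1 + z_crit * se1)
    ci2 = (mean2 - z_crit * se2, mean2 + z_crit * se2)
    overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
    # The test on the difference uses the combined standard error,
    # sqrt(se1^2 + se2^2), which is smaller than se1 + se2.
    z = (mean2 - mean1) / math.hypot(se1, se2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return overlap, p_value


overlap, p_value = overlap_and_p(10.0, 1.0, 13.0, 1.0)
print("95% CIs overlap:", overlap)
print(f"p-value for the difference: {p_value:.4f}")
```

With these made-up numbers the two 95% intervals overlap, yet the test on the difference is significant at the 5% level, because overlap is judged against se1 + se2 while the test uses the smaller sqrt(se1^2 + se2^2).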
December 1, 2004 at 2:18 am #111528
Stan/All:
This is great feedback, and I plan to follow your suggestion and compare the confidence intervals. This seems to be the most straightforward solution to the problem. Since this is non-normal data, I believe that I should use the “one sample sign test” to obtain each distribution’s confidence interval, correct?
By the way, I did go back and re-read the help screens within Minitab as suggested. Try the following yourself: open the program, then click 1. Help (from the top toolbar), 2. StatGuide, 3. Help topics, 4. Nonparametric, and then 5. any of the following tests (Mann-Whitney, Kruskal-Wallis, Mood’s Median, and Friedman). You will find that the Summary screen for each of these tests assumes that the two distributions have the “same shape” and “equal variances”.
I contacted Minitab tech support and they are consulting with their Ph.D. statistician on the issue. The person I spoke with did, however, pass along some information from one of their references that I thought might be worth sharing. The reference was the “Handbook of Parametric and Nonparametric Statistical Procedures, Third Edition” by David J. Sheskin (pg. 757-758):
“Various sources (e.g., Conover (1980, 1999), Daniel (1990), and Marascuilo and McSweeney (1977)) note that the Kruskal-Wallis one-way analysis of variance by ranks is based on the following assumptions: a) Each sample has been randomly selected from the population it represents; b) The k samples are independent of one another; c) The dependent variable (which is subsequently ranked) is a continuous random variable. In truth, this assumption, which is common to many nonparametric tests, is often not adhered to, in that such tests are often employed with a dependent variable which represents a discrete random variable; and d) The underlying distributions from which the samples are derived are identical in shape. The shapes of the underlying population distributions, however, do not have to be normal. Maxwell and Delaney (1990) point out that the assumption of identically shaped distributions implies equal dispersion of data within each distribution. Because of this, they note that, like the single-factor between-subjects analysis of variance, the Kruskal-Wallis one-way analysis of variance by ranks assumes homogeneity of variance with respect to the underlying population distributions. Because the latter assumption is not generally acknowledged for the Kruskal-Wallis one-way analysis of variance by ranks, it is not uncommon for the sources to state that violation of the homogeneity of variance assumption justifies use of the Kruskal-Wallis one-way analysis of variance by ranks in lieu of the single-factor between-subjects analysis of variance. It should be pointed out, however, that there is some empirical research which suggests that the sampling distribution for the Kruskal-Wallis test statistic is not as affected by violation of the homogeneity of variance assumption as is the F distribution (which is the sampling distribution for the single-factor between-subjects analysis of variance).”
The section on the Mann-Whitney U test contains the exact same paragraph as above (except that it compares the Mann-Whitney to the t test for two independent samples). (pg. 423-424)
The section covering the Friedman two-way analysis of variance by ranks test assumptions said nothing about variance. (pg. 845-846)
In short, the reference indicates that the Kruskal-Wallis one-way analysis of variance by ranks does assume homogeneity of variance with respect to the underlying population distributions. However, it also notes that it is not uncommon for sources to state that violation of this HOV assumption justifies using the Kruskal-Wallis test in lieu of the single-factor between-subjects analysis of variance.
By the way, technical support is also going to look at why their help screens vary on this subject.
Thanks to all for your help on this one.
December 1, 2004 at 3:32 am #111529
You are absolutely correct. The right test is the likelihood ratio test when you know the distribution family. Nonparametric tools are not recommended when you know your distribution.
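A hedged sketch of a likelihood ratio test for two exponential samples. It assumes both samples are exponential and large enough for the usual asymptotic result, under which the statistic follows a chi-square distribution with 1 degree of freedom when the two means are equal. The function name `exp_lrt` and the data are hypothetical.

```python
import math
from statistics import fmean


def exp_lrt(x, y):
    """Likelihood ratio test of equal means for two exponential samples.

    Returns (statistic, approximate p-value). The p-value relies on the
    asymptotic chi-square(1) distribution of the statistic.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = fmean(x), fmean(y)
    pooled = (n1 * m1 + n2 * m2) / (n1 + n2)  # MLE of the common mean under H0
    # 2 * (max log-likelihood with separate means - max with a common mean)
    stat = 2 * ((n1 + n2) * math.log(pooled) - n1 * math.log(m1) - n2 * math.log(m2))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # For chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical wait times in minutes; zeros are allowed as long as the mean is positive.
before = [0.0, 2.5, 14.1, 7.3, 21.0, 3.2, 9.8, 30.4, 1.1, 12.6]
after = [0.4, 1.9, 6.2, 3.3, 8.7, 0.0, 2.8, 11.5, 5.0, 4.1]
stat, p_value = exp_lrt(before, after)
print(f"LR statistic = {stat:.3f}, approximate p-value = {p_value:.4f}")
```

Only the sample sizes and sample means enter the statistic, because the sample mean is sufficient for the exponential mean.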
December 1, 2004 at 4:26 am #111531
John Noguera
Doug,
Kruskal-Wallis is based on ranks, so intuitively unequal variance is not going to be a problem.
To quote Sheskin:
It should be pointed out, however, that there is some empirical research which suggests that the sampling distribution for the Kruskal-Wallis test statistic is not as affected by violation of the homogeneity of variance assumption as is the F distribution (which is the sampling distribution for the single-factor between-subjects analysis of variance). One reason cited by various sources for employing the Kruskal-Wallis one-way analysis of variance by ranks, is that by virtue of ranking interval/ratio data a researcher can reduce or eliminate the impact of outliers.
December 1, 2004 at 4:33 am #111532
John Noguera
Doug,
Minitab has a built-in hypothesis test for the exponential distribution. It is found in:
Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis. Choose Exponential. Click Test. You can select scale and shape.
Note that the data does not have to be cencored to use this tool.
December 1, 2004 at 4:35 am #111533
John Noguera
Sorry, I meant “censored”.
December 1, 2004 at 4:41 am #111535
John:
Thank you for the clarifications and the information on the exponential testing in Minitab. As you can tell I am still in the learning stage.
Doug
December 1, 2004 at 3:13 pm #111561
Marc Thys
Doug,
I am going to stick my neck out here and give advice that goes against everyone else’s so far.
You should be able to run a 2-sample t-test that does not assume equal variances (uncheck the option in Minitab). This is a valid test as long as your sample sizes are big enough – to ensure robustness against the non-normality you have (based on the Central Limit Theorem).
Before you ask me – I am not 100% sure what would be “big enough” as a sample size in this case but I guess you should have at least 30 in each sample.
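A minimal sketch of the unequal-variance (Welch) two-sample t-test described above, using only the Python standard library. Because the standard library has no t distribution, the p-value here uses a normal approximation, which is an assumption that is only reasonable for samples well above 30 per group. The function name `welch_t` and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_t(x, y):
    """Welch's t statistic, Welch-Satterthwaite df, and a large-sample p-value."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of mean(x)
    v2 = variance(y) / n2  # squared standard error of mean(y)
    t = (fmean(x) - fmean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal approximation to the t distribution; assumes df is large.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Tiny made-up samples, just to show the call; real use needs far more data.
t, df, p_approx = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}, approximate p = {p_approx:.3f}")
```

Neither equal sample sizes nor equal variances are required, and zeros in the data cause no difficulty.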
If it really is the difference in means that you are interested in, not medians, then you should rule out the non-parametric tests anyway because they test the medians. They are also not very powerful.
By the way, your F-test is not valid in this case because it really does need normality to be reliable! Use Levene’s test instead (the test for comparing variances in Minitab does both).
December 1, 2004 at 6:46 pm #111576
Mark:
If there is a more direct way to accomplish this task, I am certainly interested; however, I did try this and ran into a couple of challenges. First, Minitab does not allow zeros in the data (which mine has), and second, my sample sizes are not equal. I have between 150 and 200 data points for each. How would one decide which to eliminate to make them equal? They were both taken over a period of a week and represent Monday-Friday by design, to ensure unique conditions present on some days were included. Thoughts?
December 1, 2004 at 6:50 pm #111577
Mark, sorry, the previous message was intended for John Noguera.
December 1, 2004 at 6:54 pm #111579
John:
Please see message 60090 which was in response to your suggestion.
Thanks
December 2, 2004 at 4:39 am #111603
John Noguera
Doug,
You are welcome.
The unequal sample size is not a problem.
You could simply add a shift, or use the three parameter Weibull which includes a shift parameter. The exponential distribution is a special case of the Weibull (Shape=1). Note that with your negative values you will get an error message saying the variance-covariance matrix is singular, but the confidence intervals on the scale and shape ratios will work.
I am assuming that you have a theoretical basis for using the exponential; otherwise the use of more basic tools discussed earlier, such as nonparametrics or shift + Box-Cox, would be appropriate.
December 2, 2004 at 8:24 am #111608
Marc Thys
OK.
Still try the 2-sample t-test though!!! Again, it is a valid test regardless of the underlying distribution, as long as your sample sizes are large enough – and over 100 is definitely OK! And you don’t have to worry about unequal sample sizes nor zeroes!
Why make it complex if simple will do?
Cheers
December 2, 2004 at 10:25 am #111614
John Noguera
Marc,
I agree with you. Simplicity is best. But be careful – extremely skewed data is probably better handled by transformation than assuming CLT will work for you.
In my previous posts I am assuming that there is a theoretical basis for the exponential distribution.
December 7, 2004 at 2:16 am #111854
Jonathon Andell
It might be worth visualizing the data on a control chart, possibly with control limits computed separately for the two samples. This allows us to view the overall groupings of data, and it enables us to see if the data display any time-related patterns of variation. If there are such special causes, you may find that a stable process would yield a different distribution.
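One way to sketch this is an individuals (I) chart with limits computed separately for each sample, using the usual constant 2.66 (3 divided by d2 = 1.128) times the average moving range. The helper names and data below are hypothetical, and with strongly skewed data such as wait times the limits are only a rough screening aid (the lower limit can be negative).

```python
from statistics import fmean


def individuals_limits(values):
    """Return (LCL, center line, UCL) for an individuals control chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = fmean(moving_ranges)
    center = fmean(values)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(values):
    """Indices of points outside the limits computed from the same sample."""
    lcl, _, ucl = individuals_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Made-up wait times in time order, one list per sample.
before = [4.0, 6.5, 3.2, 5.1, 4.8, 30.0, 5.5, 4.1, 6.0, 3.9]
after = [2.1, 3.0, 2.6, 1.8, 2.9, 3.3, 2.2, 2.7, 2.4, 3.1]
for name, sample in (("before", before), ("after", after)):
    lcl, center, ucl = individuals_limits(sample)
    print(f"{name}: LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}, "
          f"flagged={out_of_control(sample)}")
```

Keeping the data in time order matters here, because the moving ranges are taken between consecutive observations.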