NonNormal Data
 This topic has 9 replies, 5 voices, and was last updated 12 years, 1 month ago by MBBinWI.

AuthorPosts

June 23, 2010 at 8:59 pm #53491
Hello, I've been reading some topics about non-normal data, but I still have some doubts.
1) If the distribution is non-normal, is it always necessary to transform the data to get a normal distribution?
2) If the answer is no, can I continue applying the methodology with non-normal data? What does that depend on?
Thank you
June 24, 2010 at 12:05 pm #190368
Robert Butler (Participant, @rbutler)
Your question is far too general. The answer is yes, no, maybe. If you can tell us more about what you are doing and what tests/methods you are interested in using then perhaps I or someone else could offer some suggestions.
Just to give you some understanding of why it is too general consider the following:
1. Data: neither X nor Y has to be normal for regression. However, for the tests of significance to be accurate, the residuals must be normally distributed.
2. The t-test and ANOVA are robust with respect to non-normality. However, you can have situations where the distributions are so extreme that these tests will declare non-significance when, in fact, the reverse might be true. How non-normal? Good question, with no really good answers. The best defense: when in doubt, run the Wilcoxon-Mann-Whitney or the Kruskal-Wallis test.
etc.
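As an illustration of point 2, here is a minimal sketch using only the Python standard library. The two samples are made up for this example, and the helper names `welch_t_statistic` and `mann_whitney_u` are hypothetical. The rank test uses the large-sample normal approximation and assumes no ties, so treat its p-value as rough at this sample size; in practice a statistics package (for example `scipy.stats.mannwhitneyu` or `scipy.stats.kruskal`) would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def welch_t_statistic(a, b):
    """Welch t statistic for the difference in means of two samples."""
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value).

    Uses the normal approximation and assumes there are no tied values.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = sum(
        rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0
    )
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mu) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Made-up data: group A runs lower than group B except for one extreme value.
group_a = [1.1, 1.2, 1.25, 1.3, 1.35, 1.4, 1.5, 40.0]
group_b = [1.6, 1.7, 1.75, 1.8, 1.85, 1.9, 2.0, 2.1]

t_stat = welch_t_statistic(group_a, group_b)
u_stat, p_value = mann_whitney_u(group_a, group_b)

# The single extreme value inflates the variance of group A, so |t| is small.
print(f"Welch t statistic: {t_stat:.2f}")
# Rank sum for A is 44, so U = 44 - 36 = 8; the ranks still show the shift.
print(f"Mann-Whitney U: {u_stat:.0f}, approximate p-value: {p_value:.3f}")
```

With these numbers the t statistic stays well below the usual cutoff of about 2, while the rank-based test flags a difference, which is the situation described in point 2.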
June 24, 2010 at 1:25 pm #190369
Robert, thank you for your help. You have answered part of my doubts.
At the moment I don't have this kind of problem, but I have seen in other projects, for example when you use a regression as you mentioned above, that the most important thing is that the residuals are normal.
I think this kind of explanation is important for practitioners like me, because when we find non-normal data we usually want to transform it immediately, and there are some cases, as you explained above, where that is not necessary.
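A small simulation can make the regression point concrete. This is only a sketch under stated assumptions: the data are simulated (exponential X, normal noise), the helper names `skewness` and `fit_line` are hypothetical, and skewness checks only one aspect of normality. A normal probability plot or a formal test such as Shapiro-Wilk would normally be used as well.

```python
import random
from statistics import mean


def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: X is strongly right-skewed, so Y is skewed too,
# but the noise around the line is normal.
x = [rng.expovariate(1.0) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of X:         {skewness(x):.2f}")
print(f"skewness of Y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

Here X and Y are clearly non-normal while the residuals are close to symmetric, so nothing in this data set calls for a transformation before fitting the regression.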
I have another question. Maybe it is too general.
When you are in the Analyze stage and you are doing hypothesis testing, it is important to explain the data with a test (ANOVA, regression, t-test, etc.).
The question is:
Do you have to demonstrate or show conclusions with a statistical analysis? I ask because I have seen people explain some kinds of data using only a Pareto. Is that correct?
Thank you
June 24, 2010 at 2:12 pm #190370
Robert Butler (Participant, @rbutler)
Statistical analysis is not limited to formal statistical tests. Properly executed graphics and/or summary tables are a form of analysis and may be all that you need.
For example, if you are trying to explain your choice of problems to address, then the Pareto is one of the tools of choice. It is a clean graphic, it allows everyone to see your data in some kind of rank order of importance, and it is a statistical summary.
You could also use a Pareto to illustrate the effects of your actions. In this case it would be two Paretos: a before and an after.
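The before/after comparison can be sketched as a pair of Pareto tables. The defect categories and counts below are invented for illustration, and `pareto_table` is a hypothetical helper name. Each row holds the category, its count, and the cumulative percentage.

```python
from collections import Counter


def pareto_table(defects):
    """Rows of (category, count, cumulative %) sorted by descending count."""
    counts = Counter(defects).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Invented defect logs: one entry per observed defect.
before = (["scratch"] * 42 + ["dent"] * 25 + ["misalignment"] * 18
          + ["stain"] * 10 + ["other"] * 5)
after = (["scratch"] * 9 + ["dent"] * 22 + ["misalignment"] * 16
         + ["stain"] * 8 + ["other"] * 5)

before_table = pareto_table(before)
after_table = pareto_table(after)

for label, table in (("Before", before_table), ("After", after_table)):
    print(label)
    for category, count, cumulative in table:
        print(f"  {category:<14}{count:>4}{cumulative:>8.1f}%")
```

In this made-up case the action targeted scratches, so the "after" table shows a lower total and a different top category, which is exactly what the two charts side by side would make visible.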
If we consider the world of graphs in general, there are cases on record where the graph was the primary analysis and was used, in conjunction with summary tables, to solve a problem. A good example is Snow's graphical summary of the cholera epidemic in the Broad Street area of London back in 1854. His graph, coupled with his summary of failed attempts to find any other meaningful relationship between those who died and those who survived in that area, forced all of the governments of Europe to take steps to ensure water quality (remember, this was 30+ years before the germ theory of disease transmission was even put forward).
If you want to explore this a bit more I'd recommend reading the three books by Tufte:
1. The Visual Display of Quantitative Information
2. Visual Explanations
3. Envisioning Information
June 24, 2010 at 2:25 pm #190371
Robert, thank you for your explanation. You have clarified all my doubts.
June 24, 2010 at 6:17 pm #190375
Several months ago, Breyfogle and Pyzdek had quite an interesting debate on this very subject on Quality Digest's website. I suggest you search the term "transform data" at: http://www.qualitydigest.com/
June 24, 2010 at 8:46 pm #190376
OK, thank you, I will check.
June 24, 2010 at 11:22 pm #190377
JHJ, interesting article.
In your opinion, what would you do? Transform or not?
June 28, 2010 at 6:04 pm #190381
To Zackaris on your original question.
1) The answer is no. It is anything but necessary to transform the data. There are very limited instances where transforming the data is even appropriate. Why transform the data when you expect a different distribution and that is what you find? If you are smart enough to know how to answer that question, then you are smart enough to apply the assumptions of that distribution and make a decision.
2) Yes, you can apply the method, which is DMAIC, not a specific tool. It doesn't depend on anything. DMAIC is basically a thought process of understanding the behavior of a process before trying to fix it. Tools are just an aid in this. Use your brain and don't just check boxes.
By the way, that was Wheeler and Breyfogle, and Wheeler is right. Breyfogle is just an academic practitioner who does not actually touch processes. Breyfogle is mumbo jumbo, Wheeler is pretty straight.
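To make point 1 concrete, here is a small sketch of working with the expected distribution directly instead of transforming toward normality. Everything in it is an assumption for illustration: the times between failures are invented, the process is assumed to be exponential, and the 12-hour threshold is arbitrary.

```python
from math import exp
from statistics import NormalDist, mean, stdev

# Invented times between failures, in hours. For this kind of data an
# exponential distribution is often the expected model (an assumption here).
times = [0.3, 0.7, 1.1, 1.6, 2.2, 3.0, 4.1, 5.6, 8.2, 13.2]
threshold = 12.0  # arbitrary value of interest

average = mean(times)

# Exponential model: P(T > t) = exp(-t / mean)
p_exceed_exponential = exp(-threshold / average)

# Forcing a normal model onto the same raw data, for comparison only.
normal_model = NormalDist(average, stdev(times))
p_exceed_normal = 1 - normal_model.cdf(threshold)
p_negative_normal = normal_model.cdf(0.0)  # impossible for a time

print(f"mean time: {average:.1f} h")
print(f"P(T > {threshold:.0f} h), exponential model: {p_exceed_exponential:.3f}")
print(f"P(T > {threshold:.0f} h), normal model:      {p_exceed_normal:.3f}")
print(f"P(T < 0 h), normal model:       {p_negative_normal:.3f}")
```

The normal model understates the chance of a long gap and assigns noticeable probability to negative times, while the exponential model answers the question directly with no transformation step.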
June 30, 2010 at 6:09 pm #190386
MBBinWI (Participant, @MBBinWI)
Robert: As always, spot on. You may also wish to plot a run chart, or a control chart with 2 (or more) groups, to graphically show changes. And Tufte is the grandmaster of visual data representation. Back in school when we were studying the Napoleonic campaign to Russia, the graph Tufte features (Minard's map of the campaign) was a very powerful graphic. Those reading this who have no idea of what I'm talking about should Google Tufte and Napoleon. Truly amazing.
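The grouped control chart suggestion can be sketched as separate individuals (XmR) chart limits for each group. The measurements are invented and `xmr_limits` is a hypothetical helper name. The constant 2.66 is the standard individuals-chart factor, 3 / d2 with d2 = 1.128 for moving ranges of two consecutive points.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = mean(moving_ranges)
    half_width = 2.66 * average_moving_range  # 2.66 = 3 / 1.128
    return center - half_width, center, center + half_width


# Invented measurements in time order, split at the process change.
before = [52, 55, 51, 54, 53, 56, 52, 55]
after = [44, 46, 43, 45, 44, 47, 45, 46]

before_limits = xmr_limits(before)
after_limits = xmr_limits(after)

for label, (lower, center, upper) in (("Before", before_limits),
                                      ("After", after_limits)):
    print(f"{label}: LCL={lower:.2f}  CL={center:.2f}  UCL={upper:.2f}")
```

Plotting both groups on one time axis, each with its own center line and limits, shows the shift graphically: in this made-up case the upper limit after the change sits below the center line before it.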