Non Normal Data
 This topic has 16 replies, 13 voices, and was last updated 17 years, 3 months ago by Leon.


February 10, 2005 at 8:26 pm #38374
I went through Black Belt training about a year ago and am struggling with analyzing data because the tools I learned in training were mostly for normal data. My job is service and transaction related and I have not come across a data set that is normal yet.
Do you know of any good resources for analyzing non normal data? A book? A course? I have searched this site and have found some information but need more in-depth help. Thank you.
February 10, 2005 at 9:15 pm #114729
DrSeuss (Participant)
Sally,
Your dilemma is not a new one. Believe it or not, in many cases the non normal tools are easier to apply than the ones for normal data. You really need to take a course in nonparametric statistics and distributions. If your schedule is like mine, attending a course that lasts longer than one or two days at the most is almost an impossibility. If you are really comfortable with the parametric tools, especially hypothesis testing, then here is my shortcut to nonparametric analysis.
Get yourself a copy of the Minitab Handbook from Amazon or Minitab. Look at all the tools listed in the nonparametric tools section. Most are easy to understand and use (approx 24 evenings work). The help in Minitab is excellent.
Next, go to the distributions section of that book and read about how to use the probability density, cumulative probability, and inverse cumulative probability calculators. These three calculator features will allow you to calculate probabilities (areas under the curve above or below a point) for many known distributions. The normal happens to be just one of many important distributions that quality improvement folks typically encounter. You also need to become familiar with the distribution identification features found in the Reliability/Survival tools in Minitab. To keep your learning curve even shorter, use the probability distribution tool under the Graph menu; it is simpler to start with than the Reliability/Survival menu.
Last but not least, I was in your situation many years ago, so if you still have questions, please contact me at [email protected]. I have several other Excel macros and links that would facilitate your learning. Hope this helps.
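For readers without Minitab, the three calculators described above (density, cumulative probability, inverse cumulative probability) can be sketched in a few lines of Python. This is only an illustration, not Minitab's implementation: the Weibull distribution, the shape and scale values, and the function names are all assumptions chosen for the example.

```python
import math

def weibull_pdf(x, shape, scale):
    """Probability density at x: the height of the curve, not a probability.
    Intended for x > 0 (x == 0 with shape < 1 is not handled here)."""
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_cdf(x, shape, scale):
    """Cumulative probability: area under the curve at or below x."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_inv_cdf(p, shape, scale):
    """Inverse cumulative probability: the x that has area p at or below it."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Hypothetical example: handling times in minutes, ASSUMED to follow
# a Weibull distribution with shape 1.5 and scale 10.
SHAPE, SCALE = 1.5, 10.0

p_over_20 = 1.0 - weibull_cdf(20.0, SHAPE, SCALE)  # area above 20 minutes
median = weibull_inv_cdf(0.50, SHAPE, SCALE)       # 50th percentile
p95 = weibull_inv_cdf(0.95, SHAPE, SCALE)          # 95th percentile

print(f"P(time > 20 min) = {p_over_20:.4f}")
print(f"median = {median:.2f} min, 95th percentile = {p95:.2f} min")
```

The point is the same as with the Minitab calculators: once a plausible distribution has been identified, "area above a point" is one minus the cumulative probability, and percentiles come from the inverse.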
February 10, 2005 at 10:00 pm #114732
I recommend you read Understanding Statistical Process Control by Wheeler & Chambers, chapter 4. It gives clear examples of different types of charts and compares them to one another. Not to let the cat out of the bag (I really suggest you read this chapter and make up your own mind), but the synopsis is just make a control chart: “they will work, and they will work well…”
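As a rough illustration of "just make a control chart", here is a minimal sketch of an individuals (XmR) chart calculation in Python. The cycle-time values are invented for the example, and the function names are hypothetical. The 2.66 multiplier is the usual XmR scaling factor (3 divided by d2 = 1.128 for moving ranges of two points); nothing in the calculation assumes the data is normal.

```python
from statistics import mean

def xmr_limits(values):
    """Return (center, lower limit, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive points
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    avg_mr = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of size 2
    return center, center - 2.66 * avg_mr, center + 2.66 * avg_mr

def out_of_limits(values):
    """List (index, value) pairs that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical transaction cycle times (hours), with one unusually long case
cycle_times = [3.1, 2.4, 4.0, 2.9, 3.6, 2.2, 5.1,
               3.3, 2.7, 14.8, 3.0, 2.5, 3.9, 2.8]

center, lower, upper = xmr_limits(cycle_times)
print(f"center = {center:.2f}, limits = ({lower:.2f}, {upper:.2f})")
print("signals:", out_of_limits(cycle_times))
```

For times that cannot be negative, a lower limit below zero is simply ignored in practice; the signal worth chasing here is the point above the upper limit.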
February 11, 2005 at 3:18 am #114739
Jason Bourne (Participant)
Sally,
The book I would like to recommend is “Rapid Statistical Calculation” by M.H. Quenouille. It’s an old book, I think published in 1972. A Calculation of Distribution-Free and Easy Methods for Estimation and Testing.
Regards,
Arvin
February 17, 2005 at 7:54 am #114997
Nonnormal data is commonly encountered. It can often be managed with a Box-Cox transformation (which makes nonnormal data approximately normal).
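A minimal sketch of what a Box-Cox transformation does, in plain Python. The data is simulated (lognormal, seeded) purely for illustration, the lambda grid from -2 to 2 is an assumption, and the function names are hypothetical. Lambda is chosen by maximizing the usual Box-Cox profile log-likelihood over the grid; statistical packages typically do this with a finer search.

```python
import math
import random

def boxcox(xs, lam):
    """Box-Cox transform: log(x) when lam == 0, else (x**lam - 1) / lam."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]

def boxcox_loglik(xs, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data."""
    ys = boxcox(xs, lam)
    n = len(ys)
    m = sum(ys) / n
    var = sum((y - m) ** 2 for y in ys) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)

def best_lambda(xs, grid=None):
    """Grid search for the lam with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # assumed search range
    return max(grid, key=lambda lam: boxcox_loglik(xs, lam))

def skewness(xs):
    """Moment-based sample skewness (0 for a symmetric sample)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Simulated right-skewed data (illustrative only)
rng = random.Random(42)
data = [rng.lognormvariate(1.0, 0.6) for _ in range(500)]

lam = best_lambda(data)
transformed = boxcox(data, lam)
print(f"chosen lambda = {lam}")
print(f"skewness before = {skewness(data):.2f}, after = {skewness(transformed):.2f}")
```

Any specification limits have to be transformed with the same lambda before capability is calculated on the transformed scale.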
Regards, Arnold
February 17, 2005 at 8:32 am #114998
Shereen A. Mosallam (Member)
Hi,
There is an excellent resource for your situation. It is:
“Handling Non-Normal Data” by Shree Padnis
https://www.isixsigma.com/library/content/c020121a.asp
Read this one and you will find it really good.
Shereen A. Mosallam
Six Sigma Master Black Belt
February 17, 2005 at 11:38 am #115004
I would agree with FTSBB – control charts are the way to go (your note suggests you may not have considered them). Identify whether you have special cause variation and work on that.
Another tack would be to look at stratifying the data – think about all possible variables (location, shift, line of business, etc.) and then prepare histograms and control charts under each variable.
My guess is that one of these two might help you!
Good fortune.
February 17, 2005 at 2:57 pm #115017
ALEK DE (Participant)
Sally,
One way to handle this situation is to make subgroups (maybe by week, month, person, etc.), take the average of each, and then use the subgroup Xbars as your data. In at least some cases I could get rid of the non-normality problem with this technique. Thanks to the CLT!
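The subgroup-averaging idea can be checked with a small simulation. This is a sketch under stated assumptions: the raw data is simulated from an exponential distribution (mean 4), the subgroup size of 25 is arbitrary, and the function names are hypothetical. Skewness is used as a simple yardstick of how far each data set is from symmetric.

```python
import random
from statistics import mean

def skewness(xs):
    """Moment-based sample skewness (0 for a symmetric sample)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def subgroup_means(xs, size):
    """Average consecutive, non-overlapping subgroups; leftover values are dropped."""
    return [mean(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

# Simulated, strongly right-skewed individual values (illustrative only)
rng = random.Random(7)
raw = [rng.expovariate(1 / 4.0) for _ in range(5000)]

xbars = subgroup_means(raw, 25)

print(f"individual values: n = {len(raw)}, skewness = {skewness(raw):.2f}")
print(f"subgroup means:    n = {len(xbars)}, skewness = {skewness(xbars):.2f}")
```

Note that only the averages move toward normality. The individual transactions are as skewed as before, so conclusions drawn from the Xbars apply to subgroup averages, not to single transactions.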
Alek
February 17, 2005 at 9:04 pm #115050
BMG has a good training course called Tool Master. They teach you how to deal with non normal data. I would recommend taking that training because it also teaches you DOE stuff to reduce variation. I work in a chemical plant and I deal with that every day.
February 17, 2005 at 9:22 pm #115052
BMG’s stuff is trivial.
Do they teach analysis of distribution vs what should be expected? That is where the greatest gains are and I only know of two places it is taught these days. Neither is BMG.
February 18, 2005 at 8:26 am #115069
Ritesh Chatterjee (Member)
In the case of transactional data in the services industry, it is always useful to convert it into defect data. If it is continuous data, check for normality; if it is non normal, calculate product capability (treat it as discrete data). If it is discrete data, then calculate product capability directly.
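One way to read "convert it into defect data" is sketched below: count how many transactions miss a limit, then turn the defect rate into DPMO and a Z value. The turnaround times and the 5-day upper limit are invented, the function name is hypothetical, and the 1.5 shift added for the short-term Z is the common Six Sigma convention rather than something stated in this thread.

```python
from statistics import NormalDist

def defect_capability(values, usl, shift=1.5):
    """Treat continuous data as pass/fail against an upper limit (one opportunity per unit)."""
    n = len(values)
    defects = sum(1 for v in values if v > usl)
    dpmo = defects * 1_000_000 / n
    if defects == 0 or defects == n:
        # Z is undefined when the observed defect rate is exactly 0 or 1
        z_long = None
    else:
        z_long = NormalDist().inv_cdf(1 - defects / n)
    z_short = None if z_long is None else z_long + shift  # conventional 1.5 shift
    return {"defects": defects, "dpmo": dpmo,
            "z_long_term": z_long, "z_short_term": z_short}

# Hypothetical turnaround times in days; anything above 5 days counts as a defect
turnaround_days = [1.2, 3.4, 2.2, 6.1, 0.8, 4.9, 2.7, 1.5, 7.3, 3.0,
                   2.1, 0.9, 4.2, 1.1, 5.6, 2.6, 3.8, 1.9, 0.7, 2.4]

result = defect_capability(turnaround_days, usl=5.0)
print(result)
```

The normal distribution is used here only to translate a defect rate into a Z value; the turnaround times themselves are never assumed to be normal.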
The more interesting part comes during hypothesis testing in the Analyse phase. In my experience, one-way ANOVA, multi-vari analysis, the chi-square test and binary logistic regression suit the purpose.
In Improve, a 2-sample t-test will do the trick.
February 18, 2005 at 9:19 am #115071
In case of nonnormal data there are some general guidelines to follow to go ahead with the analysis. Consider the list below as a starting point:
1. Check if you can group your data in rational subgroups. In that case you don’t need to force normality to calculate the process capability.
2. Consider whether you have sufficient resolution; you could have a problem due to rounding.
3. Investigate outliers.
4. Be cautious of small sample sizes: a small sample from a normal population could test as nonnormal.
5. Check whether you have a multimodal distribution (get help with a probability plot). In that case you could segment the data and get a normal distribution for each group.
6. Check if your data fits another typical distribution using tools like Crystal Ball. In that case you can determine the process capability with reference to this distribution.
7. Transform your data (log, sqrt, Box-Cox, etc.), but only after you have made sure that you don’t have outliers, small (negligible) departures from normality, etc.
8. Use percentiles for calculating the process capability (for example 0.135% and 99.865%, corresponding to ±3 sigma in a normal distribution).
9. Treat your data as discrete data for process capability purposes and use parametric tests for hypothesis testing.
I would always start the analysis with a run chart and a normal probability plot.
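Item 8 in the list above can be sketched as follows. The idea is to replace the normal-theory 6-sigma spread with the distance between the 0.135th and 99.865th percentiles, and the mean with the median. In this sketch the percentiles are read directly from a large simulated sample, which is an assumption made for simplicity; with realistic sample sizes those extreme percentiles would normally come from a fitted distribution instead. The spec limits, the data and the function names are all hypothetical.

```python
import math
import random

def percentile(sorted_xs, p):
    """Percentile by linear interpolation between order statistics, p in [0, 1]."""
    k = p * (len(sorted_xs) - 1)
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (k - lo)

def percentile_capability(xs, lsl, usl):
    """Pp and Ppk analogues based on the 0.135%, 50% and 99.865% percentiles."""
    s = sorted(xs)
    p_low = percentile(s, 0.00135)
    med = percentile(s, 0.5)
    p_high = percentile(s, 0.99865)
    pp = (usl - lsl) / (p_high - p_low)
    ppk = min((usl - med) / (p_high - med),
              (med - lsl) / (med - p_low))
    return pp, ppk

# Simulated right-skewed handling times (illustrative only)
rng = random.Random(11)
times = [rng.lognormvariate(1.0, 0.4) for _ in range(20000)]

# Hypothetical specification limits
pp, ppk = percentile_capability(times, lsl=0.5, usl=10.0)
print(f"Pp = {pp:.2f}, Ppk = {ppk:.2f}")
```

For data that really is normal, these reduce to the familiar indices, because the two extreme percentiles sit about 3 sigma either side of the median.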
February 18, 2005 at 10:14 am #115074
fer –
Not a bad list, except for number 4. That is absolute crap.
I see this statement on this site occasionally. Where did you learn this? Who is teaching this nonsense?
If you will spend 10 or 15 minutes trying some simulations in Minitab or whatever software you prefer, you might get rid of this false notion.
In fact, the converse is true. It is possible to have too much data and fail a restrictive test like Anderson-Darling when in fact the underlying population is, for practical purposes, normal.
February 18, 2005 at 11:08 am #115080
My suggestion is to be cautious in case of small samples. Whatever test you are performing, the confidence is small, so whether my sample looks normal or non normal I’d be careful before making any decision.
Of course if you run a simulation with Minitab or Crystal Ball you get almost always normal data, but it is true also the converse. Try to run a simulation with 10 samples assuming a non normal distribution and you’ll see that your small sample looks almost always normal. Would you conclude that you have normal data?
So, to conclude: I wouldn’t assume that my data is not normal because the Anderson Darling tells me that on a sample size of 7 or 10.
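The simulations both posters refer to are easy to run outside Minitab. Below is a minimal sketch in plain Python that draws many samples of size 10 from a normal population and from an exponential population, and counts how often an Anderson-Darling normality test rejects at the 5% level. The assumptions: the test uses the small-sample adjusted statistic A2 * (1 + 0.75/n + 2.25/n^2) compared against the tabulated 5% critical value of 0.752 for the case where mean and standard deviation are estimated from the data; the exponential alternative, the seed, the number of trials and the function names are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev

AD_CRITICAL_5PCT = 0.752  # adjusted statistic, mean and sd estimated from the sample

def ad_statistic(xs):
    """Adjusted Anderson-Darling statistic for normality."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    # Standardize, sort, and clamp the CDF away from 0 and 1 to keep log() finite
    cdf = [min(max(std_normal.cdf((x - m) / s), 1e-12), 1 - 1e-12)
           for x in sorted(xs)]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

def rejection_rate(draw, n, trials, rng):
    """Fraction of simulated samples of size n that fail the test at 5%."""
    rejections = sum(
        ad_statistic([draw(rng) for _ in range(n)]) > AD_CRITICAL_5PCT
        for _ in range(trials)
    )
    return rejections / trials

rng = random.Random(2005)
normal_rate = rejection_rate(lambda r: r.gauss(0.0, 1.0), 10, 2000, rng)
expo_rate = rejection_rate(lambda r: r.expovariate(1.0), 10, 2000, rng)

print(f"n = 10, normal population:      rejected {normal_rate:.1%} of samples")
print(f"n = 10, exponential population: rejected {expo_rate:.1%} of samples")
```

The first number estimates how often a truly normal sample of 10 "tests as nonnormal", which by construction of the test should sit near the chosen alpha. The second estimates how often a clearly non-normal population is caught at that sample size.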
February 18, 2005 at 12:35 pm #115085
fer –
Previously you wrote “Be cautious of small sample sizes: a small sample from a normal population could test as nonnormal”
Now you throw in:
“Of course if you run a simulation with Minitab or Crystal Ball you get almost always normal data, but it is true also the converse. Try to run a simulation with 10 samples assuming a non normal distribution and you’ll see that your small sample looks almost always normal. Would you conclude that you have normal data?”
Are you reversing yourself? Previously you claimed that small samples could show nonnormality out of a normal population; now you say “Of course if you run a simulation with Minitab or Crystal Ball you get almost always normal data, but it is true also the converse.” What do you really believe? It seems you are of two minds on this.
To answer the question “Would you conclude that you have normal data?”: NO. One can never prove, i.e. accept, the null hypothesis.
You then state:
So, to conclude: I wouldn’t assume that my data is not normal because the Anderson Darling tells me that on a sample size of 7 or 10
Well, I and anyone else who understands hypothesis testing WOULD reject the null if the observed significance level (the p value) were below alpha, and I would do so with the confidence reflected by the p value. The sample size is irrelevant; it is built into the reference distribution.
I don’t mean to start a war with you. I usually let what I believe to be myths just go by in this forum, but this idea that small samples from a normal universe will test as nonnormal is, based on all my previous training – just a crock. Could you provide one numerical example of where you have seen this? Can anyone?
February 18, 2005 at 1:10 pm #115087
Your assumption is that the extraction is done in a perfectly random way. In that case (easy to simulate with CB or Minitab) I said you are right.
Unfortunately, especially when dealing with historical data whose source and collection method are uncertain, it is not unlikely to have a nonnormal distribution even if the population must be normal by its nature. In that case I don’t rely on what any statistical test may tell me. I simply know that my data should be normal and therefore I investigate why it is not. Outliers? Rounding problems? Not “first attempt data” in a machining process? Data invented by operators to avoid scrap? I simply don’t know, but I want to investigate further before starting any further analysis (for example process capability).
This is my approach, much more based on physics than on statistics. Maybe it’s wrong, but since I’m an engineer and not a statistician I prefer relying more on physics than on statistics based on a small sample.
February 18, 2005 at 2:37 pm #115096
I guess we are looking for basic non normal data analysis. Why do we need to complicate it all?
Keep it simple; that is one of the good things about using data.
This is my own procedure, using Minitab:
1. Stat > Basic Statistics > Graphical Summary.
2. If the data is not normal, find the best-fitting distribution shape.
3. I will transform the data if the ratio between max and min is > 1.4.
4. Then I can calculate the Z score and Cpk.
5. Finally, an individuals trend control chart. By the way, for non normal data I will use the median.
I have a flow chart that I can share on how to perform the data analysis and which method or tools to use.
Stay away from small sample sizes. You cannot make a decision on a small sample; it is like trying to know the composition of sea water from one sample. YES, small samples have the tendency to look normal or to have a granular distribution, but it is not the correct way to do statistics, or even to suggest something, based on a small sample size.