# Non Normal Data

Six Sigma – iSixSigma Forums Old Forums General Non Normal Data

Viewing 17 posts - 1 through 17 (of 17 total)
• Author
Posts
• #38374

Whitehill
Member

I went through Black Belt training about a year ago and am struggling with analyzing data, because the tools I learned in training were mostly for normal data.  My job is service and transaction related, and I have not come across a data set that is normal yet.
Do you know of any good resources for analyzing non-normal data?  A book?  A course?  I have searched this site and have found some information, but I need more in-depth help.  Thank you.

#114729

DrSeuss
Participant

Sally,
Your dilemma is not a new one.  Believe it or not, in many cases the non-normal tools are easier to apply than the ones for normal data.  You really need to take a course in non-parametric statistics and distributions.  If your schedule is like mine, attending a course that lasts longer than one or two days at most is nearly an impossibility.  If you are really comfortable with the parametric tools, especially hypothesis testing, then here is my shortcut to non-parametric analysis.
Get yourself a copy of the Minitab Handbook from Amazon or Minitab.  Look at all the tools listed in the non-parametric tools section.  Most are easy to understand and use (approximately 2-4 evenings' work).  The help in Minitab is excellent.
Next, go to the distributions section of that book and read about how to use the probability density function, inverse cumulative probability function, and cumulative probability calculators.  These three calculator features will allow you to calculate probabilities (areas above or below a point) for many known distributions.  The normal happens to be just one of many important distributions that quality improvement folks typically encounter.  You also need to become familiar with the distribution identification features found in the Reliability/Survival tools in Minitab.  To keep your learning curve even shorter, start with the probability distribution tool under the Graph menu; it is simpler to use than going straight to the Reliability/Survival tools.
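The same three calculators exist in most statistics packages, not just Minitab; as a rough sketch of the pdf, cumulative, and inverse-cumulative functions in Python with SciPy (the normal distribution with mean 100 and standard deviation 15 is just an illustrative choice):

```python
from scipy import stats

# Example distribution: normal with mean 100, standard deviation 15
dist = stats.norm(loc=100, scale=15)

# Probability density function: height of the curve at a point
print(dist.pdf(100))    # highest at the mean

# Cumulative probability: area at or below a point
print(dist.cdf(115))    # P(X <= 115), about 0.841 (one sigma above the mean)

# Inverse cumulative probability: the point with a given area below it
print(dist.ppf(0.841))  # roughly 115
```

Swapping `stats.norm` for `stats.expon`, `stats.weibull_min`, etc. gives the same three calculators for other distributions.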
Last but not least, I was in your situation many years ago, so if you still have questions, please contact me at [email protected].  I have several other Excel-developed macros and links that would facilitate your learning.  Hope this helps.

#114732

FTSBB
Participant

I recommend you read Understanding Statistical Process Control by Wheeler & Chambers, chapter 4.  He gives clear examples of different types of charts and compares them to one another.  Not to let the cat out of the bag (I really suggest you read this chapter and make up your own mind), but his synopsis is to just make a control chart: "they will work, and they will work well…"

#114739

Jason Bourne
Participant

Sally,
The book I would like to recommend is "Rapid Statistical Calculations" by M.H. Quenouille.  It's an old book, published in 1972 I think: a collection of distribution-free and easy methods for estimation and testing.
Regards,
Arvin

#114997

Arnold
Participant

Non-normal data is commonly encountered.  It can often be managed with a Box-Cox transformation (which makes non-normal data normal).
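As a hedged sketch of the Box-Cox idea in Python with SciPy (the data here is invented lognormal, so the fitted lambda should land near 0, i.e. close to a plain log transform; note that Box-Cox requires strictly positive data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # right-skewed, all positive

# Box-Cox picks the power-transform lambda by maximum likelihood
transformed, lam = stats.boxcox(skewed)
print(f"fitted lambda = {lam:.2f}")

# Anderson-Darling statistic drops after the transform (smaller = more normal)
print(stats.anderson(skewed).statistic, stats.anderson(transformed).statistic)
```

Remember that any capability or control limits computed on the transformed scale have to be transformed back before reporting.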

regards, Arnold

#114998

Shereen A. Mosallam
Member

Hi
There is an excellent resource for your situation. It is:
https://www.isixsigma.com/library/content/c020121a.asp
Read this one and you will find it really good.
Shereen A. Mosallam
Six Sigma Master Black Belt

#115004

Al
Participant

Would agree with FTSBB: control charts are the way to go (your note suggests you may not have considered them).  Identify whether you have special cause variation and work on that.
Another tack would be to look at stratifying the data: think about all possible variables (location, shift, line of business, etc.) and then prepare histograms and control charts for each variable.
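For the control-chart route with individual values, the limits come from the average moving range; a minimal sketch in Python (the cycle-time numbers are invented, and 2.66 = 3/d2 is the standard constant for moving ranges of two):

```python
import numpy as np

# Hypothetical daily cycle times (any time-ordered individual values)
x = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 18.0, 12.0, 16.0, 13.0, 14.0])

mr = np.abs(np.diff(x))        # moving ranges between successive points
center = x.mean()
mr_bar = mr.mean()

ucl = center + 2.66 * mr_bar   # 2.66 = 3 / d2, with d2 = 1.128 for n = 2
lcl = center - 2.66 * mr_bar

print(f"center = {center:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
out_of_control = (x > ucl) | (x < lcl)   # flag special-cause points
```

This is the individuals (XmR) chart Wheeler argues works well even on non-normal data.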
Good fortune.

#115017

ALEK DE
Participant

Sally,
One way to handle this situation is to make subgroups (maybe by week, month, person, etc.) and take the averages; your data will then be the X-bars of the subgroups.  At least in some cases I could get rid of the non-normality problem with this technique.  Thanks to the CLT!
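The subgrouping idea is easy to check by simulation; a sketch in Python (the exponential transaction times and subgroups of 20 are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Heavily skewed raw data: e.g. 2000 individual transaction times
raw = rng.exponential(scale=5.0, size=2000)

# Subgroup averages of size 20 (e.g. weekly means) are much closer to normal,
# courtesy of the central limit theorem
xbars = raw.reshape(100, 20).mean(axis=1)

print("skewness of raw data:      ", stats.skew(raw))
print("skewness of subgroup means:", stats.skew(xbars))
```

The trade-off is that averaging hides within-subgroup variation, so it suits monitoring subgroup behavior rather than individual transactions.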

Alek

#115050

Leon
Participant

BMG has a good training called Tool Master; they teach you how to deal with non-normal data.  I would recommend taking that training because it also teaches DOE stuff to reduce variation.  I work in a chemical plant and deal with that every day.

#115052

Mikel
Member

BMG’s stuff is trivial.
Do they teach analysis of distribution vs what should be expected? That is where the greatest gains are and I only know of two places it is taught these days. Neither is BMG.

#115069

Ritesh Chatterjee
Member

In the case of transactional data in the services industry, it is always useful to convert it into defect data.  If it is continuous data, check for normality; if non-normal, calculate product capability treating it as discrete data.  If it is discrete data, directly calculate product capability.
The more interesting part comes during hypothesis testing in the Analyse phase.  In my experience, one-way ANOVA, multi-vari analysis, the chi-square test, and binary logistic regression suit the purpose.
In Improve, a 2-sample t-test will do the trick.
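One caveat: the 2-sample t-test leans on rough normality; when the Improve-phase samples stay clearly non-normal, a nonparametric alternative is the Mann-Whitney test.  A sketch in Python with made-up before/after handling times:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.exponential(scale=10.0, size=40)  # old process, skewed times
after = rng.exponential(scale=6.0, size=40)    # improved process, shorter times

# Mann-Whitney U: nonparametric test that 'before' tends to exceed 'after'
u, p = stats.mannwhitneyu(before, after, alternative='greater')
print(f"U = {u:.0f}, p = {p:.4f}")
```

A small p-value supports a real shift between the two processes without assuming any distribution shape.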

#115071

Fer
Participant

In the case of non-normal data there are some general guidelines to follow before going ahead with the analysis.  Consider the list below as a starting point:

1.      Check whether you can group your data into rational subgroups.  In that case you don't need to force normality to calculate the process capability.
2.      Consider whether you have sufficient measurement resolution; you could have a problem due to rounding.
3.      Investigate outliers.
4.      Be cautious of small sample sizes: a small sample from a normal population could test as non-normal.
5.      Check whether you have a multimodal distribution (a probability plot helps here).  In that case you could segment the data and get a normal distribution for each group.
6.      Check whether your data fits another typical distribution, using a tool like Crystal Ball.  In that case you can determine the process capability with reference to that distribution.
7.      Transform your data (log, square root, Box-Cox, etc.), but only if you have already made sure that you don't have outliers, small (negligible) departures from normality, etc.
8.      Use percentiles for calculating the process capability (for example the 0.135% and 99.865% percentiles, corresponding to +/- 3 sigma in a normal distribution).
9.      Treat your data as discrete data for process capability purposes and use non-parametric tests for hypothesis testing.
I would always start the analysis with a run chart and a normal probability plot
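Item 8 above can be sketched numerically: a percentile-based (nonparametric) Ppk uses the empirical 0.135% and 99.865% points in place of the mean +/- 3 sigma.  The data and spec limits below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=2.0, sigma=0.3, size=5000)  # skewed process data

usl, lsl = 25.0, 3.0                  # hypothetical spec limits
median = np.percentile(data, 50)
p_lo = np.percentile(data, 0.135)     # plays the role of -3 sigma
p_hi = np.percentile(data, 99.865)    # plays the role of +3 sigma

# Percentile-based Ppk: distance from the median to each spec limit,
# scaled by the corresponding half-width of the distribution
ppu = (usl - median) / (p_hi - median)
ppl = (median - lsl) / (median - p_lo)
ppk = min(ppu, ppl)
print(f"percentile-based Ppk = {ppk:.2f}")
```

The extreme percentiles are noisy, so this approach needs a reasonably large sample to be trustworthy.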

#115074

DaveS
Participant

fer –

Not a bad list, except for number 4. That is absolute crap.
I see this statement on this site occasionally. Where did you learn this? Who is teaching this nonsense?
If you will spend 10 or 15 minutes trying some simulations in Minitab or whatever software you prefer, you might get rid of this false notion.
In fact, the converse is true.  It is possible to have too much data and fail a restrictive test like Anderson-Darling when in fact the underlying population is normal.
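The claim about small samples is easy to check by simulation; a sketch in Python with SciPy's Anderson-Darling test (2000 normal samples of size 10; the rejection rate should sit near the nominal 5%, not above it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n_sims, n = 2000, 10
rejections = 0

for _ in range(n_sims):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    result = stats.anderson(sample, dist='norm')
    crit_5pct = result.critical_values[2]  # index 2 is the 5% significance level
    if result.statistic > crit_5pct:
        rejections += 1

rate = rejections / n_sims
print(f"rejection rate at alpha = 0.05: {rate:.3f}")
```

A rate close to 0.05 means small normal samples fail the test no more often than the chosen alpha dictates.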

#115080

Fer
Participant

My suggestion is to be cautious in the case of small samples.  Whatever test you are performing, the confidence is small, so whether my sample looks normal or non-normal, I'd be careful before making any decision.
Of course if you run a simulation with Minitab or Crystal Ball you almost always get normal data, but the converse is also true.  Run a simulation drawing 10 samples from a non-normal distribution and you'll see that your small sample almost always looks normal.  Would you conclude that you have normal data?
So, to conclude: I wouldn't assume that my data is non-normal just because Anderson-Darling tells me so on a sample size of 7 or 10.

#115085

DaveS
Participant

fer –
Previously you wrote: "Be cautious of small sample sizes: a small sample from a normal population could test as non-normal"
Then: "Of course if you run a simulation with Minitab or Crystal Ball you almost always get normal data, but the converse is also true.  Run a simulation drawing 10 samples from a non-normal distribution and you'll see that your small sample almost always looks normal.  Would you conclude that you have normal data?"
Are you reversing yourself?  Previously you claimed that small samples could show non-normality out of a normal population; now you say small samples almost always look normal.  What do you really believe?  It seems you are of two minds on this.
To answer the question "Would you conclude that you have normal data?": NO.  One can never prove, i.e., accept, the null hypothesis.
You then state:
"So, to conclude: I wouldn't assume that my data is non-normal just because Anderson-Darling tells me so on a sample size of 7 or 10."
Well, I and anyone else who understands hypothesis testing WOULD reject the null if the significance level were below the alpha, and I would do so with the confidence reflected by the p-value.  The sample size is irrelevant; it is built into the reference distribution.
I don't mean to start a war with you.  I usually let what I believe to be myths just go by in this forum, but this idea that small samples from a normal universe will test as non-normal is, based on all my previous training, just a crock.  Could you provide one numerical example of where you have seen this?  Can anyone?

#115087

Fer
Participant

Your assumption is that the sampling is done in a perfectly random way.  In that case (easy to simulate with CB or Minitab) I said you are right.
Unfortunately, especially when dealing with historical data whose source and collection method are uncertain, it is not unlikely to have a non-normal distribution even if the population must be normal by its nature.  In that case, regardless of what any statistical test may tell me, I simply know that my data should be normal, and therefore I investigate why it is not.  Outliers?  Rounding problems?  Not "first attempt" data in a machining process?  Data invented by operators to avoid scrap?  I simply don't know, but I want to investigate further before starting any analysis (for example, process capability).  This is my approach, much more based on physics than on statistics.  Maybe it's wrong, but since I'm an engineer and not a statistician, I prefer relying more on physics than on statistics based on a small sample.

#115096

Leon
Participant

I guess we are looking for basic non-normal data analysis.  Why do we need to complicate it all?  Keep it simple; that is one of the good things about using data.
This is my own procedure, using Minitab:
1. Basic Stat > Graphical Summary.
2. If the data is not normal, find the best distribution shape.
3. Transform the data if the ratio between max and min is > 1.4.
4. Calculate the Z-score and Cpk.
5. Finally, an individuals trend control chart; by the way, with non-normal data I will use the median.
I have a flow chart that I can share on how to perform the data analysis and which methods or tools to use.
Stay away from small sample sizes; you cannot make decisions on a small sample size.  It is like trying to know the composition of sea water from one sample.  Yes, small samples have a tendency to look normal or granular, but this is not the correct way to do statistics, or even to suggest something based on a small sample size.

