Dealing with Non-normal Data
This topic contains 16 replies, has 6 voices, and was last updated by Fos 1 month, 3 weeks ago.
Hi,
I have collected data from one of my company's processes. The normality test gives a p-value of 0.002, so the data is non-normal. How should I deal with this? I tried a Box-Cox transformation, but the transformed data was still not normal. When I ran the data through the StatAssist software, it indicated that the data approximately follows a Burr distribution.
The data is the hardness of a powder metallurgy part, where there is excessive within-part variation because of pores in the part, which is natural. Can anyone please suggest how to handle this?
Regards,
Bala
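The check described in the question (a normality test, then a Box-Cox transformation, then a second normality test) can be sketched as follows. This is only an illustration: the `hardness` array is simulated stand-in data, not the poster's measurements, and the Shapiro-Wilk test is an assumption, since the post does not say which normality test was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in for the hardness readings: right-skewed and strictly positive.
hardness = rng.lognormal(mean=6.4, sigma=0.25, size=100)

def normality_p(x):
    """Return the Shapiro-Wilk p-value for sample x (assumed test, see lead-in)."""
    return stats.shapiro(x).pvalue

p_raw = normality_p(hardness)

# Box-Cox needs strictly positive data; lambda is estimated by maximum likelihood.
transformed, lam = stats.boxcox(hardness)
p_bc = normality_p(transformed)

print(f"raw p = {p_raw:.4f}, Box-Cox lambda = {lam:.2f}, transformed p = {p_bc:.4f}")
```

A low p-value after the transformation, as reported in the question, suggests the shape is not one that a power transformation can fix, for example a mixture of two populations.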
You will have to tell us what it is that you are trying to do with your data before anyone can offer much in the way of advice.
Control charting? – non-normality is not an issue.
Process capability? – there are methods (Chapter 8 in Measuring Process Capability, Bothe) to handle this situation.
t-test or ANOVA? – no problem – the t-test is robust to non-normal data, as is ANOVA.
Regression? – no issues here – approximate normality is an issue with residuals only.
Concern that the process is out of control? – Maybe, maybe not – hardness data will be non-normal even when everything is in control because it is from a process that is bounded. Besides, there is no connection between data being normally distributed and whether or not the process is in control.
Can you post a picture of the histogram for the data?
What kind of a sample size are we looking at? If that is a small sample then the fact that there is a gap between what looks to be about 625-650 probably isn’t much to worry about and the data looks to be normal enough to just press on with a capability measure calculation.
On the other hand, if that histogram represents hundreds of samples then there are some interesting possibilities. You could have a bi-modal process with some curious process behavior around 625.
Nice gathering of data…the first step to success. :)
concur with @rbutler.
@b1a5l9a2 – OK. We’re getting closer. Can you observe, or ask the workers, if there is adjustment going on to bring the value back to nominal? It looks to me that the lower side is happening randomly, but when the values get to the upper side, an adjustment is made to get the value back to target.
If this is the case, then a fundamental premise is being violated in evaluating normality – that of outside adjustment of the data.
Looking at your probability plot, we use something called the “fat pencil test.” Back when these graphs were created by hand, one would take the pencil used and lay it over the data. If the pencil covered the data points, you could be fairly confident of normality. Now, with statistical tests able to calculate probabilities, we tend to rely on them. However, those tests are sensitive to individual points, and can reject normality for data that a visual examination would call “close enough.”
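A rough numerical companion to the fat pencil test is sketched below: it builds the normal probability plot coordinates with `scipy.stats.probplot` and reports how tightly the points follow the fitted line. The `sample` array is simulated for illustration, and `max_gap` is just a convenience measure introduced here, not a standard statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=640.0, scale=15.0, size=100)  # hypothetical hardness data

# osm: theoretical normal quantiles, osr: ordered sample values,
# r: correlation of the points with the least-squares line through them.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# Largest vertical distance of any point from the line, in sample standard deviations.
fitted = slope * osm + intercept
max_gap = np.max(np.abs(osr - fitted)) / sample.std(ddof=1)

print(f"probability-plot correlation r = {r:.4f}, largest gap = {max_gap:.2f} sd")
```

An `r` close to 1 with only a small gap at the extremes is roughly what "covered by the pencil" looks like; the plot itself is still worth drawing, since one number can hide a cluster or a gap.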
As @rbutler states, the question as to normality depends on the use of the data. Many statistical tests are robust to non-normality, particularly when the data is similar to what you have presented.
If I were mentoring you as one of my belts, I would have you check on the adjustment. If that’s happening, then I would go on and accept normality based on the histogram and prob plot. If not, then I would check on the sensitivity of the stat test that I’m looking to apply and see if it is robust to non-normality, and if so, then proceed. If it is sensitive to normality, then I would take some more data to ensure I have a full and complete picture. Even at 100 data points, you may have only captured one side of the distribution and over more time/data it may fill out.
Hope this helps.
Don’t forget to ask whether the measurements are giving enough precision; use an MSA to check.
However, use your process map and gather data on X’s in the process and see if there’s an explanation/confirmation of the second grouping of data on the right.
When looking at non-normal data, rule out a few things before trying to calculate the process capability (and these are good general guidelines which can be easily forgotten if the data just so happens to be normally distributed):
1. Are you looking at different levels of an X being captured? Run a dot plot, and SPC, and a few other graphs, research the process. With 100 samples, and the data looking bi-modal, you’ll want to rule this out first. I almost got “burned” by this once. I just wanted the probability of exceeding the specification limit and was so eager for “just the answer” that I initially missed that there were two separate behaviors going on in my data: normal conditions and when there were special events going on at the company.
2. Perhaps the process is unstable? Stability is a requirement of most distributions. An SPC chart can help you determine if you are seeing random noise or potentially special cause noise (remember, the data needs to be in time sequence, if not you can only use Test 1).
3. Data is not truly continuous. Histograms are a tough tool to use to detect this specific measurement issue. Run a dot plot. If the data stacks in nice bins, then this may be some form of attribute data.
4. Perhaps the data is just naturally non-normally distributed? But I would check on the other 3 conditions first to be safe. If you still feel it is naturally non-normal data, there is a wide variety of distributions other than the normal, as well as transformations, that may be suitable models.
As a last resort, I’ve seen some people convert the data to pass/fail data and treat it as a binomial process capability. This is not my favorite and there are arguments for and against this approach. But, I’d rule the other conditions out and truly understand why the data was non-normally distributed before considering this strategy.
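For reference, the pass/fail conversion could look like the sketch below. The specification limits and data are hypothetical, the interval is the exact Clopper-Pearson interval, and expressing the result as an equivalent Z value is one common convention rather than something stated in this thread.

```python
import numpy as np
from scipy import stats

def binomial_capability(values, lsl, usl, conf=0.95):
    """Treat each reading as pass/fail against the spec limits.

    Returns (fraction failing, exact confidence interval, equivalent Z).
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    fails = int(np.sum((values < lsl) | (values > usl)))
    p_hat = fails / n
    alpha = 1.0 - conf
    # Clopper-Pearson interval from beta quantiles, with the edge cases handled.
    lower = stats.beta.ppf(alpha / 2, fails, n - fails + 1) if fails > 0 else 0.0
    upper = stats.beta.ppf(1 - alpha / 2, fails + 1, n - fails) if fails < n else 1.0
    z = stats.norm.ppf(1.0 - p_hat)  # infinite when no failures are observed
    return p_hat, (lower, upper), z

# Hypothetical data: 96 readings inside the limits and 4 outside.
values = np.concatenate([np.full(96, 640.0), [590.0, 595.0, 700.0, 705.0]])
p_hat, (lower, upper), z = binomial_capability(values, lsl=600.0, usl=690.0)
print(f"p = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), Z = {z:.2f}")
```

The width of the interval shows the main cost of this approach: pass/fail data carries much less information per part than the original measurement.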
If you’re using Minitab, I’d recommend trying the following tool.
Stat > Quality Tools > Individual distribution identification
This tool will check your data against 14 distributions and 2 transformations. It might provide some insights.
it’s a great tool! I still remember when it showed up and I’m like….how did people live without this cool tool
You can also use a distribution fitting tool in CrystalBall.
But sometimes this tool is not sufficient, and no distribution or transformation has a p-value > alpha.
In that case you need to discretize the data and do the calculation by hand; luckily, you can find Excel tables for this on the internet.
And if more than one distribution or transformation has a p-value > alpha, use the one with the lowest AD (Anderson-Darling) statistic.
But this tool is only used for calculating the capability of non-normal data.
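Outside Minitab or CrystalBall, the "fit several distributions and prefer the lowest AD" idea can be approximated with SciPy. The sketch below is not a reimplementation of either tool: the candidate list, the choice to fix the location at zero for the positive-only distributions, and the simulated `sample` are all assumptions, and it ranks by the Anderson-Darling statistic only, without computing p-values.

```python
import numpy as np
from scipy import stats

def anderson_darling(x, dist, params):
    """Anderson-Darling A-squared of sample x against a fitted distribution."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cdf = np.clip(dist.cdf(x, *params), 1e-12, 1 - 1e-12)  # guard against log(0)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) * (np.log(cdf) + np.log(1 - cdf[::-1]))) / n

def rank_fits(x, candidates):
    """Fit each candidate by maximum likelihood and sort by AD, lowest first."""
    results = []
    for name, (dist, fit_kwargs) in candidates.items():
        params = dist.fit(x, **fit_kwargs)
        results.append((anderson_darling(x, dist, params), name, params))
    return sorted(results, key=lambda item: item[0])

# Assumed candidate set; floc=0 pins the lower bound of the positive-only models at zero.
candidates = {
    "normal": (stats.norm, {}),
    "lognormal": (stats.lognorm, {"floc": 0}),
    "Weibull": (stats.weibull_min, {"floc": 0}),
    "gamma": (stats.gamma, {"floc": 0}),
}

rng = np.random.default_rng(2)
sample = rng.lognormal(mean=6.4, sigma=0.25, size=200)  # hypothetical data

ranked = rank_fits(sample, candidates)
for ad, name, _ in ranked:
    print(f"{name:10s} AD = {ad:.3f}")
```

Other SciPy distributions, such as `stats.burr12`, can be added to `candidates` in the same way. A lowest AD value only says which candidate fits best among those tried, so the probability plot for the chosen distribution should still be inspected.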
© Copyright iSixSigma 2000-2017. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.