# Strange data

Six Sigma – iSixSigma Forums Old Forums General Strange data

Viewing 24 posts - 1 through 24 (of 24 total)
• #46011

None
Participant

I have 84 data points recorded to 4 decimal places. The distribution looks roughly normal, but the p-values indicate that it is not normal.
I noticed that the last two decimal places are always 05, 10, 15, 20, etc., as in 6.2105 or 6.2115.
The data doesn’t transform to normal, so I’m considering using a Weibull distribution to analyze the capability. Any risks or thoughts on this?
Thanks.

#151443

None
Participant

Thanks for the input. I would expect a normal distribution. It seems like the data really isn’t continuous but is “bucketed” into steps of .0005, .0010, .0015, .0020.
I have to meet with whoever collected the data to ask, but I suspect they rounded, or perhaps don’t have a gauge with proper discrimination.

#151444

Iain Hastings
Participant

It may be that you have a quantization error that is producing a low p-value. This could well be the case if the data is within a narrow range (i.e. all at 6.xxxx).
Do as Adrian suggests. Plot on normal probability paper and do the fat pencil test. Also look to see if there are little vertical “stacks” of dots straddling or close to the normal line. That is, although the underlying quantity may be continuous, the data looks discrete because of the quantization.
This can be enough to push the p-value low enough that the data appears non-normal.
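To make the effect of quantization on a normality test concrete, here is a minimal sketch using only the Python standard library. It draws 84 values from a normal distribution and computes the Anderson-Darling statistic before and after rounding to the 0.0005 increment mentioned in the thread. The mean, the standard deviation, and the seed are assumptions chosen for illustration; they are not the poster's data.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return the A-squared statistic for normality (mean and sd estimated from the data)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    # Standardise and sort; tied values created by rounding simply sit side by side.
    cdf = [std_normal.cdf((x - m) / s) for x in sorted(data)]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n


random.seed(1)      # arbitrary seed, for repeatability
STEP = 0.0005       # reading increment reported in the thread
# Assumed mean and sd (hypothetical); the draws themselves are exactly normal.
raw = [random.gauss(6.3550, 0.0004) for _ in range(84)]
rounded = [round(round(x / STEP) * STEP, 4) for x in raw]

a2_raw = anderson_darling_normal(raw)
a2_rounded = anderson_darling_normal(rounded)
print(f"distinct rounded values: {len(set(rounded))}")
print(f"A-squared, raw data:     {a2_raw:.3f}")
print(f"A-squared, rounded data: {a2_rounded:.3f}")
```

A larger A-squared means stronger evidence against normality, so the rounded copy of the same normal sample should look "less normal" to the test even though nothing about the process changed.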

#151439

Participant

Hi,
Couple of thoughts:
1. What is your tolerance? What is the range of your data? Is all your data 6 point something?
2. Forget AD p-values. Do a normal probability plot and use the fat pencil test. If this and the histogram look normal, you should be OK to treat the data as normal. Did you expect the data to be normal? What are you measuring?
3. Don’t use Weibull unless the data matches this distribution. Test first using a probability plot.
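The probability-plot check in the list above can be sketched without any plotting library. The code below pairs sorted readings with expected normal scores, reports where each "stack" of tied readings sits, and uses the correlation of the plot as a rough numeric stand-in for the fat pencil test. The fat pencil test itself is visual; the correlation is a substitute introduced here, and the readings are hypothetical values built to match the range described in the thread.

```python
import math
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted observation with its expected standard-normal score."""
    n = len(data)
    std_normal = NormalDist()
    # Blom plotting positions; other conventions differ slightly.
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(scores, sorted(data)))


def plot_correlation(points):
    """Pearson correlation of scores vs. data: a rough measure of straightness."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def stacks(points):
    """Group points by measured value; each group is one 'stack' on the plot."""
    groups = {}
    for score, value in points:
        groups.setdefault(value, []).append(score)
    return {v: (min(s), max(s)) for v, s in sorted(groups.items())}


# Hypothetical readings: 84 values on a 0.0005 grid (not the poster's data).
readings = [6.3545] * 12 + [6.3550] * 30 + [6.3555] * 30 + [6.3560] * 12
points = normal_plot_points(readings)
r = plot_correlation(points)
print(f"straightness (correlation): {r:.3f}")
for value, (low, high) in stacks(points).items():
    print(f"{value:.4f}: normal scores from {low:+.2f} to {high:+.2f}")
```

If a fitted straight line passes near the middle of each stack's score range, the normal model is plausible despite the quantization.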

#151440

Ashman
Member

Data has no meaning without a process. Without a process it is just numbers.
You could fit a Burr distribution … but fitting distributions serves no purpose … except to pad the pockets of SS consultants who teach it.

#151446

None
Participant

Iain,
that is exactly how I would describe it. All of the data points appear to be stacked in columns instead of continuously distributed. And the spread of the data is very narrow: 6.3545 to 6.3560 or thereabouts.
Thanks

#151449

Participant

You definitely have a problem with the measurement system.
Have you performed a Gage R&R? If you do one, you will almost certainly find that the number of distinct categories is insufficient.
As a rule of thumb, the resolution of your gage should be 1/10th of your tolerance. E.g. if the tolerance is 10 mm, the gage should resolve to 1 mm.
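The rule of thumb is simple arithmetic, sketched below. The function names are made up for this illustration, and the 0.0020 tolerance width in the second call is hypothetical, since the thread never states the actual tolerance.

```python
def distinct_steps(tolerance_width, resolution):
    """Number of gage increments that fit inside the tolerance width."""
    return tolerance_width / resolution


def meets_ten_to_one(tolerance_width, resolution):
    """True when the gage resolves at least 10 increments across the tolerance."""
    # Small slack so floating-point noise does not fail an exact 10:1 ratio.
    return distinct_steps(tolerance_width, resolution) >= 10 - 1e-9


# The 10 mm tolerance / 1 mm resolution example from the post.
print(meets_ten_to_one(10.0, 1.0))
# Hypothetical tolerance width of 0.0020 read in 0.0005 increments.
print(meets_ten_to_one(0.0020, 0.0005))
```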

#151447

SIAM
Member

You tell him he could fit a Burr distribution, then tell him distribution fitting serves no purpose.  Absolutely brilliant – moron.

#151456

None
Participant

So if quantization error is the problem and the data passes the fat-pencil test, is it acceptable to evaluate as normal data?
Thanks very much

#151473

Participant

The problem is that if your gage is not acceptable your capability study will not be correct.
Quantisation of this sort is not a problem if it is small with respect to your spec limits. After all, all continuous data is quantised to some degree. Measurements in millimetres are quantised in steps of 1 millimetre. It doesn’t matter how far down the scale you go (micrometres, nanometres, etc.): your data will always be quantised at your highest resolution. It’s all about the quantisation relative to your spec limits.
If all your data is 6 point something, with quantisation in the third and fourth decimal places, you may or may not have a problem. A gage R&R is the way to go.

#151727

Reshma
Participant

You can identify the distribution of the data with “Individual Distribution Identification” in Minitab, then use the identified distribution in the Process Capability Analysis for non-normal data to determine the process performance.
Regards,
Reshma

#151729

K S Sharma
Participant

I suggest the following.
1. Take 9 samples (30 observations in each sample).
2. Average across these 9 samples, observation by observation (you will have 30 means, each the average of 9 readings). These means will be approximately normal as per the Central Limit Theorem.
3. Then convert the standard error to a standard deviation using the equation below.
Std.Dev = Std.Error × square root of the number of samples (here it is 9)
All the best
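A minimal sketch of these steps, reading the recipe as averaging the 9 samples position by position so that 30 means result. The readings are simulated with an assumed mean, standard deviation, and seed; they are not data from the thread.

```python
import math
import random
from statistics import mean, stdev

random.seed(7)       # arbitrary seed, for repeatability
STEP = 0.0005        # reading increment reported in the thread
TRUE_SD = 0.0004     # assumed process sd, for illustration only

# Step 1: 9 samples of 30 rounded readings each (simulated, hypothetical).
samples = [
    [round(round(random.gauss(6.3550, TRUE_SD) / STEP) * STEP, 4) for _ in range(30)]
    for _ in range(9)
]

# Step 2: average across the 9 samples, position by position -> 30 means.
means = [mean(column) for column in zip(*samples)]

# Step 3: Std.Dev = Std.Error x sqrt(number of samples averaged), here sqrt(9).
std_error = stdev(means)
est_sd = std_error * math.sqrt(len(samples))

print(f"number of means:      {len(means)}")
print(f"standard error:       {std_error:.6f}")
print(f"estimated std. dev.:  {est_sd:.6f}")
```

The estimate includes the extra variation added by rounding, so it should land near, but not exactly at, the assumed process standard deviation.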

#151732

hari
Participant

What is your objective? Why are you worried about calculating the sigma value? Don’t worry too much about the capability calculation: beyond reporting a number, it does not help with your business problem.
What is your business problem? The sigma value will not help you solve it, because it simply depends on the mean and standard deviation. If you want to improve your project CTQ, focus on the central tendency and spread directly.
In this case your data is discrete in nature, like counts in each bucket. You can’t convert it to normal through the CLT or a Box-Cox transformation and proceed. I suggest you check whether the problem lies with the Gauge R&R. If the Gauge R&R is good, forget the capability calculation, set targets for the mean and standard deviation of your data, and proceed.
Don’t go with a Weibull distribution: you can fit a Weibull distribution to any type of data, and the capability calculation is not robust for a Weibull distribution.

#151736

Klerx
Participant

As Anderson-Darling is very sensitive to bucketing (or quantisation), the chi-square goodness-of-fit test is much better for this type of problem, but this method is not provided by Minitab.
Could you please forward your data? I would like to have a better look at it and judge the shape of the histogram.
Rene

#151737

D
Participant

I suggest you wait with further analysis until you have validated your measurement system. Further analysis doesn’t give you anything valid if the measurement error is greater than the process variation.
Good luck.
D

#151741

empirical accrington
Participant

This is one of the most bizarre threads I’ve seen on this website.
Shewhart and Deming must be turning in their graves. To quote Wheeler: ‘The purpose of analysis is insight’. All this malarkey about distributions, etc., has very little to do with process improvement (which generally means analytic studies), and more to do with metaphysics.
“How many angels can you get on the end of a pin?”

#151748

clb1
Participant

“How many angels can you get on the end of a pin?” Surely the answer is obvious.
1. Take 9 flights (30 angels in each flight)
2. Take the mean of all these 9 flights (you will have 30 mean flights). These flights will consist of perfectly normal angel flights as per the Central Limit Theorem.
If you then check the distribution of these flights with the Angel-Darling test, you will find you can stuff any number of them into a bucket which can be transformed to the head of a pin! Therefore you can fit as many angels on the head of a pin as space, time, and transformations will permit.
I hope this helps

#151744

Angelic Voice
Participant

yesterday, we fit 5,000 angels on one pin. we’re becoming leaner up here by the century.

#151746

JB
Participant

Amen.  It feels good to hear a rational voice.
JB

#151753

Andy Sleeper
Participant

This situation, which Wheeler calls “chunky data,” is very common. The discrete bins in the data are caused by the resolution of the measurement system. Gauge R&R does not help this. In many cases, an adequate measurement system with at least 10 discrete values within the tolerance (per AIAG guidelines) will produce very chunky data.
If you want to understand or improve the process, the chunkiness is an irrelevant nuisance. It’s irrelevant because it’s created by an otherwise acceptable measurement system. It’s a nuisance because, typically, goodness-of-fit tests will reject every distribution tested. The chi-squared test could work if the bins match the chunks, but no software automates it for this situation.
On the probability plot, the dots will form a set of short vertical lines at regular intervals. To see which distribution model fits best, choose the one where the diagonal line goes through the middle of each short vertical line. This is similar to the “fat pencil” test others have mentioned. But I’d rather see all the data and think about which model is best, rather than covering up part of the plot.
As others have pointed out, distribution models are optional. You can do many things without a distribution model. However, you cannot make any statements about tail probabilities outside the range of the data. As a result, you cannot distinguish between a 5-sigma and a 6-sigma process with a reasonable sample size, unless you use a distribution model.
If you are going to use a distribution model, normal or otherwise, pick one that fits the data. There is a consequence to using a model that does not fit.
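The remark above about a chi-squared test whose bins match the chunks can be sketched as follows, using only the Python standard library. Each bin is centred on one measurement increment, sparse tail bins are pooled, and the normal parameters are estimated from the rounded data. Taking the degrees of freedom as bins minus 3 is the usual textbook approximation rather than an exact result. The data is simulated with an assumed mean, standard deviation, and seed, and the spread is deliberately wider than in the thread so that enough bins remain after pooling.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, mean, stdev


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:  # standard recurrence from df = k to df = k + 2
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi_square_gof_normal(data, step, min_expected=5.0):
    """Chi-square fit to a normal model with one bin per measurement increment."""
    n = len(data)
    model = NormalDist(mean(data), stdev(data))
    index = [round(x / step) for x in data]       # which increment each reading is on
    counts = Counter(index)
    lo, hi = min(index), max(index)
    observed, expected = [], []
    for k in range(lo, hi + 1):
        left = 0.0 if k == lo else model.cdf((k - 0.5) * step)
        right = 1.0 if k == hi else model.cdf((k + 0.5) * step)
        observed.append(counts.get(k, 0))
        expected.append(n * (right - left))
    # Pool sparse tail bins into their inner neighbour.
    while len(expected) > 1 and expected[0] < min_expected:
        e, o = expected.pop(0), observed.pop(0)
        expected[0] += e
        observed[0] += o
    while len(expected) > 1 and expected[-1] < min_expected:
        e, o = expected.pop(), observed.pop()
        expected[-1] += e
        observed[-1] += o
    df = len(expected) - 1 - 2                    # two estimated parameters
    if df < 1:
        raise ValueError("too few bins left to test a two-parameter model")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df, chi2_sf(stat, df)


random.seed(3)   # arbitrary seed, for repeatability
STEP = 0.0005
# Assumed mean and sd (hypothetical), rounded to the measurement increment.
data = [round(round(random.gauss(6.3550, 0.0010) / STEP) * STEP, 4) for _ in range(84)]
stat, df, p = chi_square_gof_normal(data, STEP)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

With data as narrow as the four or so distinct values described earlier in the thread, too few bins survive for a two-parameter fit, which is why the function raises an error in that case instead of returning a p-value.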

#151792

Jonathon Andell
Participant

Have you plotted it in a control chart format? Are there any obvious special causes? If so, you may have more than one distribution. You also could look for bimodality. Whatever you do, PLOT THE DATA!!!

#151797

accrington
Participant

Great stuff! I look forward to reading more of your learned posts in the future.
Have a nice weekend
