# What if my data look normal except for granularity?


Viewing 14 posts - 1 through 14 (of 14 total)
• #30483

Georgette
Participant

I am mentoring a GB project where the best metric is standard deviation. The GB has thousands of data points, and a full descriptive-statistics summary shows a good normal-looking curve with minimal kurtosis or skew. What we do see is granularity, and that is the only reason we can find for the non-normality.
Are there any rules for what to do with granular data? What is the reason we can't use the standard deviation if every other normality check passes? (This data is from a micrometer reading to the nearest thousandth of an inch.) Our only option at this point is to use the range, or some span across the data. Any advice would be appreciated!

#79455

Robert Butler
Participant

I’d appreciate it if you would give a technical definition of “granular data”.  I am unfamiliar with this term.  I did do a search on the term and it appears to be from data mining.  Unfortunately, while there were numerous cites and sites employing the term, no definitions were forthcoming.  There was enough discussion about various issues on some of the web sites to lead me to believe that perhaps granular data is nothing more than individual data points but I would want to be sure of this before offering anything.
The lack of response to your question suggests that I'm not alone in not understanding the reference. With a proper definition, perhaps I, or someone else, may be able to offer some advice.

#79456

Ed Van Haute
Participant

Without seeing the data or understanding the inputs and your measuring system, my first guess is that your measurement system is not “sensitive” enough, or in other words, lacks the necessary discrimination.

#79460

Ron
Member

Based on your excessive sample size you would of course obtain normal data based on the central limit theorem.
Use the data you have!

#79465

Mike Carnell
Participant

Ron,
Having 1000 data points does not in any way guarantee a normal distribution, and it doesn't have anything to do with the Central Limit Theorem.
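
A minimal Python sketch of this point. The exponential distribution, the sample size, the subgroup size of 25, and the seed are all assumptions chosen only for illustration; none of them come from the thread. A large sample of individual readings from a skewed process stays skewed. The Central Limit Theorem only applies to the averages of subgroups, which move toward a normal shape.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(1)  # fixed seed so the run is repeatable

# 10,000 individual readings from a strongly skewed (exponential) process.
individuals = [rng.expovariate(1.0) for _ in range(10_000)]

# Averages of consecutive subgroups of 25 readings each.
subgroup_size = 25
subgroup_means = [
    statistics.fmean(individuals[i:i + subgroup_size])
    for i in range(0, len(individuals), subgroup_size)
]

skew_individuals = skewness(individuals)
skew_means = skewness(subgroup_means)
print(f"skewness of individuals:    {skew_individuals:.2f}")
print(f"skewness of subgroup means: {skew_means:.2f}")
```

The individual readings keep a skewness near that of the underlying distribution no matter how many are collected, while the subgroup means come out much closer to symmetric.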

#79467

Mike Carnell
Participant

Georgette,
Ed is correct – granularity is typically a lack of discrimination. Look at the intervals. It is probably an issue with use (rounding). If you can, try breaking the data up like you would in a Multi-Vari study (by shift, by day, etc.) and see if the granularity is specific to a particular segment. If it is pretty evenly spread, then it is something common across shifts.
Minitab does a dot plot by factor and lets you stack one under the other. Sometimes it is easier to see some of this stuff if you stack the graphs over each other.
Good luck.

#79511

Georgette
Participant

Thanks, everyone.
The problem is I can't break the data up any further. The measurements were taken with a micrometer, and the data were collected over months.
My guess is the micrometer should resolve one more digit to accommodate the variation. Since this is historical data, I don't have that option.
What I really want to know is, if this granularity is the only problem (the mean and median are very close, the skewness and kurtosis are minimal), can I use normal metrics for the data set, like standard deviation and Cpk?

#79512

James A
Participant

Yes.
If you were to do a capability study on a machine using both a CMM and your micrometer, the numbers would be slightly different (because of resolution), but the overall values would be very similar.
With apologies to all who posted earlier for jumping in and answering this.
James A
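
A rough Python simulation of the claim above, under stated assumptions: the process mean, sigma, spec limits, sample size, and seed are hypothetical and were not given in the thread. The sketch compares the standard deviation and Cpk of finely resolved readings with the same readings rounded to the nearest 0.001 in.

```python
import random
import statistics

def cpk(values, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * s)."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * s)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical process: mean 0.5000 in, sigma 0.0020 in, specs 0.490 to 0.510 in.
true_mean, true_sigma = 0.5000, 0.0020
lsl, usl = 0.490, 0.510
resolution = 0.001  # micrometer reads to the nearest thousandth

fine = [rng.gauss(true_mean, true_sigma) for _ in range(5000)]
coarse = [round(x / resolution) * resolution for x in fine]

s_fine, s_coarse = statistics.stdev(fine), statistics.stdev(coarse)
cpk_fine, cpk_coarse = cpk(fine, lsl, usl), cpk(coarse, lsl, usl)

# Rounding to a step h adds roughly h**2 / 12 to the variance
# (the usual uniform-rounding-error approximation).
s_predicted = (s_fine ** 2 + resolution ** 2 / 12) ** 0.5

print(f"s   fine: {s_fine:.5f}   rounded: {s_coarse:.5f}   predicted: {s_predicted:.5f}")
print(f"Cpk fine: {cpk_fine:.3f}   rounded: {cpk_coarse:.3f}")
```

With sigma at about two measurement steps, rounding changes the standard deviation by only a few percent. The approximation gets worse as the process variation shrinks toward the size of a single step, which is where the discrimination concern raised earlier in the thread starts to matter.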

#79517

C Baucke
Participant

As with other responses to this issue, I think you should consider this data continuous data and go forward, as long as it satisfies the “rule of tens”.
Implementing Six Sigma by Forrest Breyfogle says “…increments of measure must be small relative to both process variability and specification limits. A common rule of thumb is that the increments should be no greater than one-tenth of the smaller of either the process variability or specification limits.”
If you can measure to 1000ths of an inch precision, then the data is useful if and only if the process variation and spec limits are in the 100ths of an inch range.
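
The rule of thumb quoted above can be written as a one-line check. This is only a sketch: the function name `passes_rule_of_tens` and the example numbers are hypothetical, and how "process variability" is quantified (for example, as a six-sigma spread) is an assumption left to the caller, since the quote does not pin it down.

```python
def passes_rule_of_tens(increment, process_spread, spec_width):
    """Return True if the measurement increment is no greater than one-tenth
    of the smaller of the process spread and the specification width."""
    return increment <= min(process_spread, spec_width) / 10

# Hypothetical numbers, all in inches.
increment = 0.001  # micrometer resolution

wide_process = passes_rule_of_tens(increment, process_spread=0.012, spec_width=0.020)
tight_process = passes_rule_of_tens(increment, process_spread=0.004, spec_width=0.020)

print(wide_process)   # increment is small relative to a 0.012 in spread
print(tight_process)  # increment is too coarse for a 0.004 in spread
```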

#79519

abasu
Participant

The most likely reason for this is that the variation in the incoming parts is too small to be differentiated with a micrometer, i.e., the number of distinct categories is less than 6.
If there is any logical way the data can be subgrouped, the averages of the subgroups should be approximately normal.

#79520

Georgette
Participant

Thank you!!
This might just be an option, as we take 30 readings across the part. I could average the readings at spot #1, for instance, and so on.
I will look into this option. Thank you for your help.

#79553

abasu
Participant

Just a comment on selecting a good subgrouping strategy. Subgroups should be selected such that

all allowable (i.e., noise) sources of variation are within the subgroups,
and variations of interest (i.e., factors) are between the subgroups.

In your example, subgrouping based on part location (1, 2, ...) will allow you to study causes of variation between the locations. Subgrouping based on part number (average of all 30 readings) will allow you to determine variation over time and causes of variation between the different parts.
Good luck.
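
A small Python sketch of the two subgrouping choices, using made-up data. The number of parts, the nominal size, the part-to-part and within-part sigmas, and the seed are assumptions for illustration only, and the simulated data contain no real location effect. Readings are stored as `readings[p][k]`, the reading at location `k` on part `p`, rounded to the nearest 0.001 in.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
n_parts, n_locations = 50, 30
resolution = 0.001

# Hypothetical data: a part-to-part shift plus within-part noise, then rounded.
readings = []
for _ in range(n_parts):
    part_shift = rng.gauss(0.0, 0.0010)
    readings.append([
        round((0.2500 + part_shift + rng.gauss(0.0, 0.0008)) / resolution) * resolution
        for _ in range(n_locations)
    ])

# Subgroup = location: one average per location, taken across all parts.
location_means = [
    statistics.fmean(part[k] for part in readings) for k in range(n_locations)
]
# Subgroup = part: one average per part, taken across its 30 locations.
part_means = [statistics.fmean(part) for part in readings]

# Averaging also reduces the granularity: the means take many more distinct
# values than the individual rounded readings do.
distinct_individuals = len({round(x, 3) for part in readings for x in part})
distinct_part_means = len({round(m, 6) for m in part_means})

print(f"distinct individual readings: {distinct_individuals}")
print(f"distinct part means:          {distinct_part_means}")
print(f"stdev of location means: {statistics.stdev(location_means):.5f}")
print(f"stdev of part means:     {statistics.stdev(part_means):.5f}")
```

Grouping by location puts part-to-part differences inside each subgroup, so the location means only differ if the locations themselves differ. Grouping by part puts the within-part noise inside each subgroup, so the part means track part-to-part and over-time changes.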

#79556

Robert Butler
Participant

Abasu has brought up a point that has been bothering me. In her first posting Georgette stated that "the best metric is standard deviation". After some postings from others, myself included, she responded with a post which included the following: "as we take 30 readings across the part. I could average the reading at spot #1". These two comments lead me to believe that what she might be concerned about is part-to-part surface variation.
If this is the case, then examining the parts by grouping measurements made at “spot #1” across all parts or subgroup samples across parts would be a mistake since such a grouping would address variation at spot#1 (and spot#2 etc.) as opposed to the variation of the surface as a whole. Similarly, you wouldn’t want to take the average of 30 readings within a part since this would change the focus of the investigation from that of surface variation to variation of mean surface measurements.
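
A tiny worked example of why averaging within a part changes the question being asked. The two parts and their readings below are invented for illustration. Both parts have the same average reading, so a study of part means cannot tell them apart, even though their surface variation differs by a large factor.

```python
import statistics

# Hypothetical readings (inches) across the surface of two parts.
flat_part = [0.250, 0.250, 0.251, 0.250, 0.249, 0.250]
wavy_part = [0.246, 0.254, 0.247, 0.253, 0.248, 0.252]

mean_flat = statistics.fmean(flat_part)
mean_wavy = statistics.fmean(wavy_part)

# Within-part standard deviation: the surface variation of each part.
s_flat = statistics.stdev(flat_part)
s_wavy = statistics.stdev(wavy_part)

print(f"flat part: mean {mean_flat:.4f}, within-part s {s_flat:.4f}")
print(f"wavy part: mean {mean_wavy:.4f}, within-part s {s_wavy:.4f}")
```

If surface variation is the quantity of interest, the within-part standard deviations are the numbers to study; the part means discard that information.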

#79960

Georgette
Participant

Just wanted to thank everyone for their different viewpoints here. It gives me and my greenbelts a lot of fuel to think this through. I am so glad there is a knowledge bank like this to tap into when we need it. Thanks again!


The forum ‘General’ is closed to new topics and replies.