# Measuring variability of non-normal data


- This topic has 10 replies, 3 voices, and was last updated 16 years, 5 months ago by Statman.

- October 23, 2003 at 1:35 am #33670
A question for the stats enthusiasts… I’m working on a DOE whose purpose is to determine which factors contribute to ineffective envelope sealing on a mailing machine. My response variable is the length of the envelope gum line that is not sealed (as evidenced by stripping the envelopes open and measuring the non-adhesive areas). I took 50-piece samples of each of a variety of envelope sizes and ran the test, recording the data for each mail piece.

Here’s my question: if I want to discriminate between samples according to which one shows the least variability in “unsealed” gum line, what statistical tool or method can provide that information when the data is grossly non-normal? For example, most of the sample pieces will have zeros (completely sealed, i.e. no unsealed length) and the smaller balance will record, say, 0.3 or 1.2 inches, giving me a lopsided histogram piled up at zero. Transforming the data doesn’t appear to help. Can I compare two non-normal samples (in this case, of different envelope sizes) for variability so that I can comfortably (statistically) state that one set of data was used over the other because it was more “reliable” or has less variance?

Any thoughts would be appreciated. Thanks in advance for your suggestions.

Frank

- October 23, 2003 at 1:56 am #91385
Frank,

Just for some clarity:

Are there other factors besides envelope size in the DOE, or are you only looking at envelope size?

- October 23, 2003 at 2:12 am #91386
Statman,

Actually, envelope size is not a factor in this experiment (though one might consider it). The factors include speed, machine, moistening level and feeding mode. The variety of envelopes adds a dimension for analyzing the data across standard envelopes and different-sized envelopes such as catalogs or booklets. I could analyze the effect of each factor on sealing performance for each envelope type separately, or in some combination, to give my study more credibility.

Thanks.

- October 23, 2003 at 2:46 am #91387
For each condition in the DOE, did you try using log(std dev(Xi)+1), where Xi is the length of the unsealed portion of the ith envelope in that condition?

This should work as long as most of the conditions do not have every envelope completely sealed. As this statistic approaches zero, the condition approaches all envelopes sealed with no variation.

Not sure if it will be “normal” but should be close enough.

Give it a try and let me know how it works.

Regards,

Statman

- October 23, 2003 at 4:14 am #91389
Since you are handling non-normal data and you want to study the impact of different envelope sizes (discrete x) on the variance of the length of the gum portion (continuous y), you should use a test for equal variances and look at the Levene’s test p-value to either fail to reject (p > 0.05) or reject (p < 0.05) the null hypothesis that size does not affect the variation in y.
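A minimal Python sketch of the kind of test described here, under some assumptions. It uses the median-centred form of Levene’s test (often called Brown-Forsythe), which is the variant usually recommended for skewed data. To stay within the standard library, the p-value comes from a permutation of the absolute deviations rather than from the F distribution that a statistics package would normally use, so this is a swapped-in approximation and not the exact procedure of any particular package. The function names and the sample lengths are hypothetical.

```python
import random
from statistics import mean, median


def abs_deviations(group):
    """Absolute deviation of each observation from its group median."""
    centre = median(group)
    return [abs(x - centre) for x in group]


def anova_f(groups):
    """One-way ANOVA F statistic; applied to deviations it is Levene's W."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return (n_total - k) / (k - 1) * between / within


def levene_permutation_test(samples, n_perm=2000, seed=0):
    """Return (W, p); p is a permutation p-value for equal spread."""
    rng = random.Random(seed)
    devs = [abs_deviations(s) for s in samples]
    observed = anova_f(devs)
    pooled = [z for d in devs for z in d]
    sizes = [len(d) for d in devs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so floating-point noise does not hide ties.
        if anova_f(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up unsealed lengths (inches) for two envelope types, 50 pieces each.
type_a = [0.0] * 45 + [0.3] * 5
type_b = [0.0] * 25 + [0.5, 1.0, 1.5, 2.0, 2.5] * 5
w_stat, p_value = levene_permutation_test([type_a, type_b])
print(f"W = {w_stat:.2f}, permutation p = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.levene(type_a, type_b, center="median")` gives the same statistic with an F-based p-value.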

regards

dinesh

- October 23, 2003 at 8:28 am #91398
Thanks Statman,

I’m

- October 23, 2003 at 8:28 am #91399
Thanks Statman,

I’m not

- October 23, 2003 at 8:44 am #91400
Thanks Statman,

I’m not sure if I understand how I would use that formula but I’ll give it some thought.

However, on another thought, if I have 16 samples of 50, took the mean and std dev of each of the 16, plotted a histogram of the std devs and fitted a normal distribution, could I then compare it with that of a different envelope type (taking another 16 samples of 50 each) and state whether one has more variance (of the within-sample variance) than the other, so that I would prefer one envelope type’s data over the other (to measure the effects of a factor)? I recognize that the std dev is not really appropriate for a non-normal sample, but it does measure variation, considering that all the samples were measured the same way.

Sorry for the multiple postings; I accidentally hit the tab button and inadvertently posted incomplete messages.

Thanks again for your input…most appreciated.

Frank

- October 23, 2003 at 3:12 pm #91434
Hi Frank,

Let me explain in more detail what I meant by the transformation that I suggested. I think that I can answer your second question as well.

By the description of your response variable, the length of the “unsealed” gum line, the variability will decrease as the mean value decreases. In other words, for a given condition in your DOE, the higher the propensity to completely seal the envelope, the lower the average unsealed length and the lower its standard deviation. If this is the case, there should be a strong correlation between the standard deviation and the average. Therefore it doesn’t matter whether you use the average or the standard deviation in assessing the effects of the factors in the DOE.

Since you have 50 repeat envelopes for each condition, what I am suggesting is to calculate the standard deviation of the fifty for each condition (including the ones that are zero) and then transform that standard deviation with a log transformation to normalize the data, using the formula:

Yj = log(Std.Dev.(Xij)+1)

where Xij is the ith envelope of the jth condition.

The one is added to cover the possibility of a condition having all envelopes completely sealed: the standard deviation is then zero and log(0) is undefined, whereas log(0 + 1) = 0. This metric will approach zero as the number of completely sealed envelopes increases or the variation in unsealed length decreases.
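The transformation can be sketched in a few lines of Python. This assumes the sample (n − 1) standard deviation and the natural log, neither of which is specified in the post; the function name, the condition labels and the data are hypothetical.

```python
import math
from statistics import stdev


def log_sd_response(unsealed_lengths):
    """Y = log(s + 1), with s the standard deviation of one DOE condition."""
    s = stdev(unsealed_lengths)  # sample (n - 1) std dev: an assumption
    return math.log(s + 1.0)     # natural log: the base is an assumption


# Made-up unsealed lengths (inches) for two conditions, 50 envelopes each.
conditions = {
    "condition 1": [0.0] * 50,                     # every envelope sealed
    "condition 2": [0.0] * 40 + [0.3, 1.2] * 5,    # ten partial failures
}
for name, lengths in conditions.items():
    print(name, round(log_sd_response(lengths), 4))
```

A condition with all envelopes sealed gives exactly 0, which is why the +1 is needed.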

You have 4 process factors in the DOE and one envelope size factor. It sounds like this is a 2^4 full factorial of the process factors, as you have 16 conditions. Correct?

You also want to assess the different sizes/types of envelopes. The best way to do this is to include the envelopes as another factor in the DOE and evaluate it as a factor effect. By testing the differences between envelopes separately from the process factors, you may miss some important envelope × process-factor interactions and reach invalid conclusions, depending on the setup of the process.

So if there are 2 types of envelopes, the design becomes a 2^5 full factorial. If there are more than 2 types of envelopes, you will need to analyze it as an ANOVA.
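To make the run counts concrete, here is a small Python sketch that enumerates the design. The four factor names come from earlier in the thread; the level labels (apart from machines #4 and #5) and the `full_factorial` helper are hypothetical.

```python
from itertools import product

# Two levels per process factor; level labels are placeholders.
factors = {
    "speed": ("low", "high"),
    "machine": ("#4", "#5"),
    "moistening": ("low", "high"),
    "feed_mode": ("mode A", "mode B"),
}


def full_factorial(factor_levels):
    """Every combination of levels, one dict per run."""
    names = list(factor_levels)
    return [dict(zip(names, combo)) for combo in product(*factor_levels.values())]


runs_4 = full_factorial(factors)
runs_5 = full_factorial({**factors, "envelope": ("type 1", "type 2")})
print(len(runs_4), len(runs_5))  # 16 32
```

Adding the envelope type as a fifth two-level factor doubles the number of conditions from 16 to 32.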

Frank, I am intrigued by the experiment. If you would like some additional help on this let me know your email and we can correspond off this forum.

Regards,

Statman

- October 24, 2003 at 7:47 am #91498
Statman,

Thanks for the clarification. Yes, when the measurement process is complete (the sealing measurement process is long and tedious, and we are approximately halfway through it), I will have 16 conditions, and therefore 16 Y values from which I can plot a (hopefully) normal distribution and assess the variability relative to the other envelope types (of which there are 15).

Just an overview of the experiment. I view the experiment as two DOEs with the following discernible factors…

Transport Speed

Transport Mode

Machine number (#4 vs #5)

This experiment is repeated after the machines have been upgraded to an improved(?) engineering design level, and measured the same way. I refer to the first experiment (pre-upgrade) as the “baseline” DOE. Sort of like a “before and after”. One could consider this a factor, but we could not randomize this dimension across all the combinations (the identical machines were upgraded in the lab and run immediately following the baseline DOE).

In both cases (upgrade and baseline DOE) we ran samples of 15 media types (or envelopes, as I have been calling them). I could view them as factors as well, except that in each configuration the media types were run in the same order (media #1, #2, etc.). However, the variety of media types tested will allow me to decide which type of data to use to analyze a specific condition or configuration, hence the question of how to measure the variability of non-normal data in relation to each media type.

I appreciate your interest in the experiment and your input. It was certainly most helpful and I will keep you updated as we progress (if that is OK with you).

Kind regards,

Frank

- October 27, 2003 at 3:53 pm #91657
Frank,

I don’t have a problem with treating the pre- and post-upgrade as a factor in the DOE. Yes, you were not able to randomize across this factor, but you are going to have to assume that any associated difference is due to the upgrade whichever way you analyze it. You have a similar issue using machine number as a factor. Probably the proper way to analyze the upgrade factor is as a split-plot design where the upgrade is the main plot and the other factors are the subplots nested within upgrade, but I don’t think that is necessary. My bottom line is that treating it as a factorial with all the factors will provide more power in the analysis.

I didn’t realize that you had 15 levels of media type. That must be quite an undertaking, to measure 50 × 16 × 15 total envelopes. Just out of curiosity, can the 15 levels be considered as pseudo-continuous on a severity-of-processing scale? In other words, can you rank the 15 from easy to process to difficult to process? My thought on this is to use propensity of failure as a metric where the media types are part of the measurement system.
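One possible reading of “propensity of failure”, offered only as an assumption, is the fraction of envelopes in a condition that show any unsealed length at all. The Python sketch below computes that fraction and the total envelope count mentioned above; the function name, the `tolerance` parameter and the data are hypothetical.

```python
def failure_proportion(unsealed_lengths, tolerance=0.0):
    """Fraction of envelopes whose unsealed length exceeds `tolerance`."""
    failures = sum(1 for x in unsealed_lengths if x > tolerance)
    return failures / len(unsealed_lengths)


# 50 envelopes per condition, 16 conditions, 15 media types.
total_envelopes = 50 * 16 * 15

# Made-up condition: 40 fully sealed envelopes and 10 partial failures.
sample = [0.0] * 40 + [0.3, 1.2] * 5
print(total_envelopes, failure_proportion(sample))
```

Unlike the standard deviation, this proportion is bounded between 0 and 1 and does not depend on how long the unsealed portion is, only on whether there is one.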

Anyway, good luck with your analysis. I am very impressed with the sophistication of your work.

Highest Regards,

Statman
