# Normality : To Be or Not To Be ?

Six Sigma – iSixSigma Forums Old Forums General Normality : To Be or Not To Be ?

Viewing 42 posts - 1 through 42 (of 42 total)
#53149

bbusa
Participant

Does control chart data need to be normal? Over the last few weeks we have witnessed some shadow boxing on this subject, and I am sure the last word is yet to be said. Various views and opinions have been presented by both the Defense and the Prosecution! The common practitioner (like me) has been left more confused than ever before. To relieve us of the pain and suffering, can we have the final verdict from the jury? A 'Yes' or 'No' answer, please.

bbusa

#188448

Darth
Participant

We call as our witnesses Dr. Walter A. Shewhart and Dr. Donald Wheeler, both of whom stated, under oath, that the data does not need to be normal.

#188449

MBBinUSA
Participant

Mr. Darth,
Everybody knows the right thing to do is just transform everything. Dr. Shewhart didn't have a nice program to make it easy and correct. What kind of BB training did you have, anyway?

#188450

Allattar
Participant

As Darth says, no, control chart data does not need to be normal.
Just to expand on a few things though.
With an Xbar chart, you rely on the central limit theorem.
With the Range chart or S chart, ranges or standard deviations aren’t expected to be normal.  You will notice that not all tests are applied to these charts.
However the I-chart is the interesting one.
Let me elaborate. Suppose we have a set of data following a Weibull distribution with shape 1 and scale 4 (which is just an exponential distribution). Its mean is 4 and its standard deviation is 4, so the natural comparison is a normal distribution with mean 4 and standard deviation 4.
Now +/- 3 standard deviations around the mean runs from -12 to +16.
A normal distribution would have probability 0.00135 above +3 sd (16); the Weibull has 0.0183 above 16.
The normal has 0.00135 below -3 sd (-12); the Weibull has exactly 0 below -12, since it cannot go negative.
The mean of 4 for the normal distribution sits at the 50th percentile; for the Weibull, 4 sits at the 63.2nd percentile.
For the normal distribution, 9 points in a row below the center line has probability 0.5^9 = 0.00195, and the same above, so 9 points in a row on either side of the line has probability 0.0039.
For the Weibull, 9 points in a row below the line has probability 0.632^9 = 0.016, while 9 in a row above has probability 0.368^9, which is tiny (about 1.2 x 10^-4). So the chance of 9 points in a row for this Weibull is about 0.016.
The chance of breaking those two tests, for this Weibull of shape 1, scale of 4, is higher than in a normal distribution.
Moral of the story… understand why your data looks like it does, and then understand the implications for you :)
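The tail-probability arithmetic above can be reproduced with a few lines of standard-library Python (a sketch; `weibull_sf` and `normal_sf` are helper names of my own choosing, and a Weibull with shape 1 is simply an exponential distribution):

```python
import math

# Survival function of a Weibull(shape=1, scale=4), i.e. an exponential
# distribution with mean 4: P(X > x) = exp(-x/4) for x >= 0.
def weibull_sf(x, shape=1.0, scale=4.0):
    return math.exp(-((x / scale) ** shape)) if x > 0 else 1.0

# Standard normal survival function via the complementary error function.
def normal_sf(z):
    return 0.5 * math.erfc(z / math.sqrt(2))

mean, sd = 4.0, 4.0                       # matched moments for both models

p_norm_upper = normal_sf(3)               # beyond mean + 3 sd, normal
p_weib_upper = weibull_sf(mean + 3 * sd)  # beyond mean + 3 sd, Weibull
p_weib_lower = 0.0                        # mean - 3 sd = -12 < 0: impossible

p_below_center = 1 - weibull_sf(mean)     # Weibull mass below its mean

print(round(p_norm_upper, 5))             # ~0.00135
print(round(p_weib_upper, 4))             # ~0.0183
print(round(p_below_center, 3))           # ~0.632
print(round(p_below_center ** 9, 3))      # ~0.016 (9 in a row below)
```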

#188451

Allattar
Participant

Well, you shouldn't just transform the data.
You should ask: what shape should this data really follow?
Is the data stable over time?
Are there any reasons why it is not normally distributed?
Plenty of mistakes are made just by saying, 'oh, it isn't normal, quick, we must transform it.'
My favourite is data that is roughly normal but where the measurements are very discrete. It looks normal but fails a normality test, no transform helps, and you usually have people running around wondering what to do.
Until you point out that the measurements are all 5.1, 5.2, 5.3, 5.4, etc., and you get told, 'yes, because that's the resolution of the measurement device.'

#188452

Allattar
Participant

Should

#188453

Allattar
Participant

grrr… the forum is winning.
Anyway, I was trying to add what I missed earlier:
non-normal data is only really something to consider for I-charts. See my other post.

#188456

Robert Butler
Participant

For more details see the post below and the related thread.
https://www.isixsigma.com/forum/showmessage.asp?messageID=141008

#188459

Darth
Participant

We could also dredge up the recent thread/battle between Wheeler and Breyfogle. Bottom line:
1. Control charts are robust to non-normality.
2. This has nothing to do with the Central Limit Theorem.
3. I/MR charts with severe departures from normality (e.g., cycles) may need to be transformed, with great care.
4. Read Shewhart's and Wheeler's books.
I recently reviewed some BB curriculum which stated that the data must be normal and the process in control before using a control chart. ARGGGGGGGGGG.

#188460

Allattar
Participant

An Xbar chart has everything to do with the central limit theorem.
You plot averages over time, with control limits based on the standard deviation divided by the square root of the subgroup size.
How can it have nothing to do with the CLT? It's practically a demonstration of it.
I know it's being picky, but you cannot disassociate the ideas of the CLT from an Xbar chart.

#188461

Mikel
Member

Wrong, you don’t know what you are talking about.

#188462

Darth
Participant

I will try to be kinder than my dear friend Stan. First, please consider how the control limits are calculated. Second, a quote from Dr. Wheeler's book. Wheeler ran an experiment in which he computed the three-sigma coverage for 1,143 different non-normal distributions and demonstrated that the three-sigma limits still covered the overwhelming bulk of each distribution. There, he says: "These wide regions where three-sigma limits will filter out the bulk of the routine variation are the reason why we do not need to define a reference distribution in order to use a process behavior chart. They are also the reason why we do not need to test our data for normality before placing them on a chart. And this blanket of 99% or better coverage is why we do not need to invoke the blessing of the central limit theorem by averaging several values together before placing the result on a chart. Three-sigma limits bracket the bulk of the routine variation by brute force. Therefore, without knowing which probability model might approximate your process, you can still be reasonably sure that any point that falls outside your three-sigma limits is more likely to be due to a dominant assignable cause than it is to be part of the routine variation coming from the lesser causes. The objective is to take the right action, rather than to find limits that correspond to a particular probability with high precision."
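Wheeler's brute-force coverage claim is easy to probe with a quick Monte Carlo sketch (the three non-normal distributions here are my own illustration, not Wheeler's original 1,143, and the limits are computed from each sample's own mean and standard deviation):

```python
import random
import statistics

# Monte Carlo sketch: for decidedly non-normal distributions, the interval
# mean +/- 3 standard deviations still covers roughly 98-100% of the
# routine variation.
random.seed(42)
N = 100_000

samples = {
    "exponential": [random.expovariate(1.0) for _ in range(N)],
    "uniform": [random.uniform(0.0, 1.0) for _ in range(N)],
    "lognormal": [random.lognormvariate(0.0, 1.0) for _ in range(N)],
}

coverage = {}
for name, xs in samples.items():
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    inside = sum(mu - 3 * sigma <= x <= mu + 3 * sigma for x in xs)
    coverage[name] = inside / N
    print(f"{name:11s} within 3 sigma: {coverage[name]:.3f}")
```

Even the heavily skewed exponential and lognormal samples keep about 98% of their points inside the limits, which is the practical point of the quote.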

#188473

MBBinWI
Participant

Or, to simplify: set up your control charts as if the data were normally distributed, with control limits at +/- 3 standard deviations. This will filter out the mundane COMMON-cause variation for almost all actual distributions and indicate when SPECIAL causes are affecting the output.
Therefore, the answer to the original question is NO, but set up your control charts as if the data were normally distributed (yes).

#188500

Allattar
Participant

Sorry, but I know better than you.
Of course you can demonstrate that most results will fit within +/- 3 standard deviations, and of course you can do that for means, by the very nature of them being means.
It's a little absurd to say, 'I took these means and demonstrated that they all fall within here, so we don't need to invoke the CLT.'
All the statement you quoted implies, Darth, is that we don't need to invoke the CLT as a justification. That doesn't mean it is not still a present effect on your results, which you could demonstrate by plotting a probability plot of your means.
It's almost like saying you threw a ball at the ground, therefore we don't need gravity.
I was agreeing, but also stating that the implications of a very non-normal distribution are often overlooked for individual data. The essence of the post was: understand your data before panicking.
You lot do make me laugh.

#188501

Mikel
Member

You know better?
Wrong.

#188502

Darth
Participant

Well, you certainly have established your credibility and credentials enough for me to totally discard the work of Shewhart and Wheeler. Or you have proven yourself to be a complete boor and moron. After a scientifically conducted poll, we have established the latter.
And yes, Stan was consulted in the design of the poll questions, so it has validity beyond anything anyone can refute or dispute.

#188503

Darth
Participant

This will reinforce the Boor's contention:
http://manufacture-engineering.suite101.com/article.cfm/central_limit_theorem
And this from Wheeler himself:
"Myth Two: Control charts work because of the central limit theorem. The central limit theorem applies to subgroup averages (e.g., as the subgroup size increases, the histogram of the subgroup averages will, in the limit, become more 'normal,' regardless of how the individual measurements are distributed). Because many statistical techniques utilize the central limit theorem, it's only natural to assume that it's the basis of the control chart. However, this isn't the case. The central limit theorem describes the behavior of subgroup averages, but it doesn't describe the behavior of the measures of dispersion. Moreover, there isn't a need for the finesse of the central limit theorem when working with Shewhart's charts, where three-sigma limits filter out 99 percent to 100 percent of the probable noise, leaving only the potential signals outside the limits. Because of the conservative nature of the three-sigma limits, the central limit theorem is irrelevant to Shewhart's charts. Undoubtedly, this myth has been one of the greatest barriers to the effective use of control charts with management and process-industry data. When data are obtained one-value-per-time-period, it's logical to use subgroups with a size of one. However, if you believe this myth to be true, you'll feel compelled to average something to make use of the central limit theorem. But the rationality of the data analysis will be sacrificed to superstition."
From the Deming Network:
http://deming-network.org/archive/98.02/msg00028.html

#188506

MBBinWI
Participant

don’t forget our friend Robert Butler.

#188507

Darth
Participant

Dr. Butler’s response led us to a previous thread regarding the need for normality of the data. Would like to hear what our esteemed colleague has to say about the relevance of the Central Limit Theorem and the claim that it is the foundation of the Shewhart chart.

#188508

Robert Butler
Participant

I guess I don’t see where you and I disagree Darth.  True there were individuals on that thread who were insisting on normality and CLT but the specific post I referenced is almost identical in content to the posts you have been making to this one.  The only reason for mentioning the entire thread was because I thought some of the posters to this one might want to see that the arguments for and against had already been made.

#188509

Darth
Participant

No, we did not disagree but MBBinWI brought up your name and, as you know, we all value your input. Hope you have the trailer packed and will be heading south in a few days.

#188510

Allattar
Participant

I'm doing badly at making my point.
So I ran a simulation: 100,000 individual results, or 100,000 subgroups. I then counted the tests broken, displayed here as percentages.
The data sets were:
- Normal distribution, mean 0, sd 1, individual results
- Normal distribution, mean 0, sd 1, subgroup size 5
- Weibull distribution, shape 1, scale 4, individual results
- Weibull distribution, shape 1, scale 4, subgroup size 5
- Weibull distribution, shape 1, scale 4, subgroup size 10
For reference, test 2 is 9 points in a row on the same side of the center line, and test 3 is 6 points in a row steadily increasing or decreasing.

| Test | Normal I | Normal Xbar (5) | Weibull I | Weibull Xbar (5) | Weibull Xbar (10) |
|------|----------|-----------------|-----------|------------------|-------------------|
| 1 | 0.28% | 0.27% | 2.58% | 0.95% | 0.67% |
| 2 | 0.37% | 0.42% | 1.62% | 0.56% | 0.47% |
| 3 | 0.04% | 0.05% | 0.03% | 0.03% | 0.04% |
| 4 | 0.29% | 0.29% | 0.30% | 0.25% | 0.35% |
| 5 | 0.21% | 0.18% | 0.77% | 0.31% | 0.29% |
| 6 | 0.50% | 0.52% | 0.25% | 0.36% | 0.43% |
| 7 | 0.29% | 0.35% | 1.04% | 0.47% | 0.37% |
| 8 | 0.01% | 0.01% | 0.00% | 0.01% | 0.01% |

Compare the individuals charts for the normal and the Weibull: there is a big difference in the percentages that break tests 1, 2, 5, and 7.
As you subgroup the data with more points, the differences between the tests drop. That is the effect of the central limit theorem on averages.
That's my point here: it is evident in the data you collect. Clearly a subgroup size of 5 or 10 won't make the distribution of averages normal, but it pushes it towards normality.
There is a very real difference in tests broken on individual data between the normal and this very skewed Weibull distribution. The practical point, however, is that if a point breaks a test, there is a good chance it is a special cause.
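The experiment above is easy to reproduce for test 1 (a point beyond the 3-sigma limits). A minimal sketch, with one caveat: it uses the known process mean and sigma for the limits rather than estimating them from moving ranges as a chart package would, so the individuals figure lands near the theoretical exp(-4), about 1.8%, instead of the 2.58% quoted:

```python
import random
import statistics

# False-alarm rate of "point beyond 3 sigma" for Weibull(shape=1, scale=4)
# data, charted as individuals versus as subgroup means of 5.
random.seed(1)
N = 100_000

def weibull_1_4():
    # Shape 1, scale 4 is just an exponential with mean 4.
    return random.expovariate(1 / 4)

mu, sigma = 4.0, 4.0                      # true moments of Weibull(1, 4)

# Individuals chart: each observation against mu +/- 3 sigma.
ind = [weibull_1_4() for _ in range(N)]
p_ind = sum(not (mu - 3 * sigma <= x <= mu + 3 * sigma) for x in ind) / N

# Xbar chart, subgroup size 5: means against mu +/- 3 sigma / sqrt(5).
n = 5
lim = 3 * sigma / n ** 0.5
xbars = [statistics.fmean(weibull_1_4() for _ in range(n))
         for _ in range(N // n)]
p_xbar = sum(not (mu - lim <= x <= mu + lim) for x in xbars) / len(xbars)

print(f"individuals beyond 3 sigma: {p_ind:.4f}")   # near exp(-4) ~ 0.018
print(f"xbar (n=5) beyond 3 sigma:  {p_xbar:.4f}")  # smaller: CLT at work
```

Subgrouping pulls the false-alarm rate toward the normal-theory value, which is exactly the effect described above.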

#188511

Allattar
Participant

Oh I do!

#188512

Mikel
Member

Probabilities going all the way out to 2.5%? Oh my God! How can we live life with a signal that shows up 1 in 40?
And, as we all know, I'm not smart enough to look at something like a histogram to make sense of what I am seeing!
The sky is falling, the sky is falling!

#188513

Allattar
Participant

Interesting, I'm agreeing with you and you're flaming?

Well everyone needs a hobby I guess.

#188514

Mikel
Member

Honey,
You must not recognize sarcasm when you see it. I'm not agreeing with you; your argument is much ado about nothing. Those who overemphasize normality and waste time on transforms miss the opportunity to understand and improve.

#188521

MBBinWI
Participant

I bet you’d look at a data set of 10,000 items each of A and B showing a statistically valid difference of 0.01 on a mean of 100 and shout the p-value from the rooftops.
Why don’t you try to understand what you have been told – by some of the most experienced practitioners in the industry – and learn something.  This lesson is probably the least expensive you’ll ever get.

#188551

Jonathon Andell
Participant

It's always entertaining reading this forum, especially when the technical debate derails and the name-calling kicks in. For what it's worth, I have found Darth and Robert Butler to have an excellent grasp of what works and what doesn't. I admit that my statistical credentials fall below some on this list, but I also have been in the game for 20+ years. Here's how I would approach a data set:
1. Plot the raw data on a control chart.
2. Use the best computer we will ever possess (the combination of our eyes and our brains) to determine whether blatant special cause exists.
2a. If special cause is detected, the first course of action is to stabilize the process. Debating about transforming unstable process data ranks high among the most profoundly wasteful discussions this forum has entertained.
3. Once the data are stable, you have the option of using a histogram, probability plot, Anderson-Darling, or other ways to decide whether the data appear to follow normality.
4. If the data appear non-normal, use something akin to Minitab's distribution ID utility to get an idea of what distribution might be a good model for the data.
5. For extra credit, read James King's book "Probability Charts for Decision Making." It discusses the kinds of natural phenomena that can give rise to various distributions, which in turn can lead to insights about the process. I hope I don't need to remind folks that these insights are what we are after.
Side comment: I prefer distribution ID over Box-Cox for a few reasons:
- More distributions from which to select
- The aforementioned potential insight into the process
- Once we select a distribution model, Minitab has a sweet utility called "Capability non-normal." It uses the chosen distribution model to estimate the probabilities and computes an "equivalent" Z value. Best of all, it makes a chart with the real data in real units of measure, with no need to back-transform anything, and displays the non-normal distribution curve.
I find that students and managers both appreciate this display. Once you've done all that, you still have the option of calling people morons.
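The distribution-identification step (4) can be imitated in a few lines without Minitab: fit each candidate distribution by maximum likelihood and compare total log-likelihoods. This is a toy sketch of the idea with only two candidates; real utilities compare many more distributions and report goodness-of-fit statistics such as Anderson-Darling alongside:

```python
import math
import random

# Toy stand-in for a "distribution ID" utility: fit candidate models by
# maximum likelihood and keep the one with the highest log-likelihood.

def loglik_normal(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)      # MLE variance
    return -0.5 * len(xs) * (math.log(2 * math.pi * var) + 1)

def loglik_exponential(xs):
    rate = len(xs) / sum(xs)                            # MLE rate = 1/mean
    return len(xs) * math.log(rate) - rate * sum(xs)

random.seed(7)
data = [random.expovariate(0.25) for _ in range(500)]   # skewed sample

scores = {"normal": loglik_normal(data),
          "exponential": loglik_exponential(data)}
best = max(scores, key=scores.get)
print(best)   # the exponential model should win on this skewed sample
```

The winning model can then feed a "capability non-normal" style calculation directly, with no back-transform needed.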

#188552

Darth
Participant

Jonathon,
Always good to hear from you. Thanks for adding to the thread.

#188553

Jonathon Andell
Participant

It just isn't a complete thread without people calling each other morons. Reminds me of some of the playground arguments from my first childhood.
The Breyfogle-Wheeler debate was a lot of fun to watch. An odd mix of contradictory dogmas, selective simulations, and occasional bursts of violent agreement. Reminds me of a decades-old issue of "Quality Engineering" with a debate over Taguchi methods.

#188554

Darth
Participant

Taguchi???? Anyone using his approach has to be a moron….. OK, feel at home now? Are you coming to SoBe for the conference?

#188555

Jonathon Andell
Participant

Travel on my own nickel isn't a very good option these days.
I do like a lot of what Taguchi advocates prior to selecting a matrix, but I rarely use any of his matrices. Did you ever read Schmidt & Launsby's book on DOE? Interesting outlook.
Can I be a moron, too?

#188558

GB
Participant

Shameless, Darth…driving up your post count like that! It’s only an MVP award!
;-P

#188560

Darth
Participant

And your point is…??????? Looking forward to a fun week. I decided to take one of the Master Workshops on Monday and do a site tour on Thursday. Might as well take advantage of the conference plus it gives me time to sign autographs for my adoring fans :-).

#188562

MBBinWI
Participant

Both of them?  Shouldn’t take long.
(I just couldn’t resist)

0
#188563

Darth
Participant

Yes, you and HeeBee. Stan is jealous of my fan base so he won’t even talk to me. Carnell wanted to write a long winded autograph so he is out.

#188564

GB
Participant

Fingers crossed that my injury stays dormant… I want to do the boat tour, but my flight conflicts :-(

#188566

hbgb2
Participant

Now that was funny! (Yours too, MBBGLENNFIDDITCHDREKFANINWI.)
Actually, my vote for post count is for Stevo.

#188568

Mikel
Member

Darth?
Hey, just because you live in the desert doesn't mean you should eat the peyote buttons.

#188569

Mikel
Member

Your whole fan base is at 13th St on SoBe, if you catch my drift.

#188570

Jonathon Andell
Participant

Doesn’t mean I shouldn’t…

#188637

Forrest W. Breyfogle III
Member

If you are not willing to keep an open mind to a paradigm shift, please read no further.

I will be providing links to articles that demonstrate how some of the rules we have been taught, whether by individuals or in classes, have issues.

A transformation that MAKES GOOD PHYSICAL SENSE can be very important, especially when we are trying to describe whether a process is capable of producing a desired response and are also concerned about overreacting to common-cause variability as though it were special cause.

The reasoning behind this statement is described in detail, with simulated and real data, in the following Quality Digest articles (again, if you are not willing to take the time to really read them with an open mind, do not waste your time opening the links):

–  Non-normal data: To Transform or Not to Transform http://www.qualitydigest.com/inside/quality-insider-column/individuals-control-chart-and-data-normality.html

– NOT Transforming the Data Can Be Fatal to Your Analysis: A case study, with real data, describes the need for data transformation.    http://www.qualitydigest.com/inside/six-sigma-column/not-transforming-data-can-be-fatal-your-analysis.html

– Predictive Performance Measurements: Going Beyond Red-Yellow-Green Scorecards http://www.qualitydigest.com/inside/quality-insider-column/predictive-performance-measurements.html