# Normality: To Be or Not To Be?


- This topic has 41 replies, 11 voices, and was last updated 10 years, 6 months ago by Forrest W. Breyfogle III.

- January 18, 2010 at 4:03 am #53149
Does control chart data need to be normal? Over the last few weeks, we have witnessed some shadow boxing on this subject. I am sure the last word is yet to be said. Various views and opinions have been presented by both the Defense and the Prosecution! The common practitioner (like me) has been left more confused than ever before. To relieve us of the pain and suffering, can we have the final verdict from the jury? A 'Yes' or 'No' answer.

bbusa

January 18, 2010 at 7:20 am #188448

We call as our witnesses Dr. Walter A. Shewhart and Dr. Donald Wheeler, both of whom stated, under oath, that the data does not need to be normal.

January 18, 2010 at 11:15 am #188449

MBBinUSA

Mr. Darth, everybody knows the right thing to do is just transform everything. Dr. Shewhart didn't have a nice program to make it easy and correct. What kind of BB training did you have anyway?

January 18, 2010 at 11:18 am #188450

Allattar

As Darth says, no, control charts do not need to be normal.

Just to expand on a few things though.

With an Xbar chart, you rely on the central limit theorem.

With the Range chart or S chart, ranges or standard deviations aren’t expected to be normal. You will notice that not all tests are applied to these charts.

However, the I-chart is the interesting one.

Let me elaborate. Suppose we have a set of data following a Weibull distribution with a shape of 1 and scale of 4; this is an exponential distribution with mean 4 and standard deviation 4, so compare it with a normal distribution with mean 4 and standard deviation 4.

Now +/- 3sd around the mean is -12 to +16.

A normal distribution would have 0.00135 above +3 sd (16)

The Weibull has 0.0183 above 16.

The normal has 0.00135 below -3sd, or -12.

The Weibull sees 0 below -12.

The mean of 4 for a normal distribution is at the 50th percentile. 4 for the Weibull distribution is at the 63.2 percentile.

For a normal distribution, 9 points in a row below the centre line has a probability of 0.5^9 = 0.00195, and the same above. So 9 points in a row on either side of the line has a probability of 0.0039.

With the Weibull distribution, the probability of 9 points in a row below the centre line is 0.632^9 = 0.016, while above it is 0.368^9, which is very small (about 1.2*10^-4). So the chance of 9 points in a row for the Weibull here is roughly 0.016.

The chance of breaking those two tests, for this Weibull of shape 1, scale of 4, is higher than in a normal distribution.
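The figures quoted above are easy to check: a Weibull with shape 1 is an exponential, so all the tail areas have closed forms. A minimal sketch in Python (standard library only):

```python
import math

# Tail areas for the comparison above. A Weibull with shape 1 and
# scale 4 is an exponential with mean 4 and standard deviation 4:
# P(X > x) = exp(-x/4) for x >= 0, and no mass below zero.

# Beyond the upper 3-sigma limit (mean + 3 sd = 16):
normal_upper = 0.5 * math.erfc(3 / math.sqrt(2))  # ~0.00135
weibull_upper = math.exp(-16 / 4)                 # ~0.0183

# Below the lower 3-sigma limit (mean - 3 sd = -12):
weibull_lower = 0.0                               # nothing below zero

# Percentile of the mean (4) under the Weibull:
weibull_at_mean = 1 - math.exp(-4 / 4)            # ~0.632 (vs 0.5 for normal)

# 9 points in a row on one side of the centre line:
normal_run = 0.5 ** 9                             # ~0.00195, either side
weibull_run_below = weibull_at_mean ** 9          # ~0.016
weibull_run_above = (1 - weibull_at_mean) ** 9    # ~1.2e-4, very small

print(normal_upper, weibull_upper, weibull_at_mean)
print(normal_run, weibull_run_below, weibull_run_above)
```

Every number in the post falls out of these few lines, which is the point: for a known distribution the false-alarm rates of the chart tests are simple arithmetic.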

Moral of the story… understand why your data looks like it does, and then understand the implications for you :)

January 18, 2010 at 11:26 am #188451

Allattar

Well, you shouldn't just transform the data.

You should ask, what shape should this data really follow?

Is the data stable over time?

Are there any reasons why it is not normally distributed?

Plenty of mistakes are made just by going, 'Oh, it isn't normal, quick, we must transform it.'

My favourite is when the data is roughly normal, but the measurements are very discrete. It looks normal but fails a normality test; no transform helps there, and you usually have people running round wondering what to do.

Till you point out the measurements are all 5.1, 5.2, 5.3, 5.4, etc., and you get told yes, because that's the resolution of the measurement device.

January 18, 2010 at 12:07 pm #188452

Allattar

Should

January 18, 2010 at 12:09 pm #188453

Allattar

grrr… Forum's winning.

Anyway, I was trying to add what I missed earlier.

Non-normal data is only really something to consider for I-charts. See my other post.

January 18, 2010 at 12:52 pm #188456

Robert Butler

For more details see the post below and the related thread.

https://www.isixsigma.com/forum/showmessage.asp?messageID=141008

January 18, 2010 at 1:28 pm #188459

We could also dredge up the recent thread/battle between Wheeler and Breyfogle. Bottom line:

1. Charts are robust to non-normality.

2. It has nothing to do with the Central Limit Theorem.

3. I/MR charts with severe departures from normality (e.g., cycles) may need to be transformed, with great care.

4. Read Shewhart's and Wheeler's books.

I recently reviewed some BB curriculum which stated that the data must be normal and the process in control before using a control chart. ARGGGGGGGGGG.

January 18, 2010 at 1:46 pm #188460

Allattar

An Xbar chart has everything to do with the central limit theorem.

It is a plot over time of subgroup averages, with control limits based on the standard deviation divided by the square root of the subgroup size.

How can it not have anything to do with the CLT? It's practically a demonstration of it.

I know it's being picky, but you cannot dissociate the idea of the CLT from an Xbar chart.
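The effect being described is easy to see in a small simulation: draw individuals from the same skewed Weibull (shape 1, scale 4, i.e., an exponential), average them in subgroups of 5, and watch the skewness shrink toward zero. A sketch, with illustrative sample sizes chosen here:

```python
import random
import statistics

# Individuals from a heavily skewed distribution (exponential with
# mean 4, equivalent to Weibull shape 1, scale 4) versus the means
# of subgroups of 5 drawn from the same stream.
random.seed(1)

individuals = [random.expovariate(1 / 4) for _ in range(50_000)]
means_of_5 = [statistics.mean(individuals[i:i + 5])
              for i in range(0, len(individuals), 5)]

def skewness(xs):
    """Sample skewness: the third standardized moment."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# The exponential has theoretical skewness 2; averaging n values
# shrinks the skewness of the means by a factor of sqrt(n), pulling
# their distribution toward normality (skewness 0).
print(skewness(individuals))  # close to 2
print(skewness(means_of_5))   # close to 2/sqrt(5), about 0.9
```

Subgroups of 5 do not make the averages normal, but the skewness roughly halves, which is exactly the CLT-on-averages effect the post describes.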

January 18, 2010 at 2:33 pm #188461

Wrong, you don't know what you are talking about.

January 18, 2010 at 3:29 pm #188462

I will try to be kinder than my dear friend Stan. First, please consider how the control limits are calculated. Second, a quote from Dr. Wheeler's book. Wheeler did an experiment in which he computed the three-sigma coverage for 1,143 different non-normal distributions, and demonstrated that the three-sigma limits achieved much greater coverage than that. There, he says: "These wide regions where three-sigma limits will filter out the bulk of the routine variation are the reason why we do not need to define a reference distribution in order to use a process behavior chart. They are also the reason why we do not need to test our data for normality before placing them on a chart. And this blanket of 99% or better coverage is why we do not need to invoke the blessing of the central limit theorem by averaging several values together before placing the result on a chart. Three-sigma limits bracket the bulk of the routine variation by brute force. Therefore, without knowing which probability model might approximate your process, you can still be reasonably sure that any point that falls outside your three-sigma limits is more likely to be due to a dominant assignable cause than it is to be part of the routine variation coming from the lesser causes. The objective is to take the right action, rather than to find limits that correspond to a particular probability with high precision."
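Wheeler's "brute force" claim is straightforward to check by simulation: for a handful of decidedly non-normal distributions (chosen here purely for illustration, not Wheeler's own 1,143), compute the fraction of routine values falling inside mean +/- 3 standard deviations. A sketch:

```python
import random

# Fraction of values inside mean +/- 3 sd for several non-normal
# distributions, estimated by simulation.
random.seed(42)
N = 50_000

distributions = {
    "uniform": [random.uniform(0, 1) for _ in range(N)],
    "exponential": [random.expovariate(1.0) for _ in range(N)],
    "lognormal": [random.lognormvariate(0, 0.5) for _ in range(N)],
    "triangular": [random.triangular(0, 1, 0.2) for _ in range(N)],
}

coverage = {}
for name, xs in distributions.items():
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    coverage[name] = sum(1 for x in xs if m - 3 * s <= x <= m + 3 * s) / len(xs)
    print(f"{name:12s} {coverage[name]:.4f}")   # each is 0.98 or better
```

Even the heavily skewed exponential keeps about 98% of its routine variation inside the three-sigma limits, which is the "blanket of 99% or better coverage" argument in miniature.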

January 18, 2010 at 6:47 pm #188473

MBBinWI

Or, to simplify: set up your control charts as if the data were normally distributed, with control limits at +/- 3 standard deviations. This will filter out the mundane COMMON cause variation for almost all actual distributions and indicate when SPECIAL causes are affecting the output.

Therefore, the answer to the original question is NO, but set up your control charts as if the data were normally distributed (yes).

January 19, 2010 at 10:39 am #188500

Allattar

Sorry, but I know better than you.

Of course you can demonstrate that most results will fit within +/- 3 standard deviations, and of course you can do that for means, by the very nature of them being means.

It's a little absurd to say, 'I took these means and demonstrated that they all fall within here, so we don't need to invoke the CLT.'

The statement you made, Darth, implies we don't need to invoke the CLT as a justification; it doesn't mean the CLT is not still a present effect on your results, which you could demonstrate by plotting a probability plot of your means.

It's almost like saying you threw a ball at the ground, therefore we don't need gravity.

I was agreeing, but also stating that the implications of a very non-normal distribution are often overlooked for individual data. The essence of the post was: understand your data before panicking.

You lot do make me laugh.

January 19, 2010 at 11:05 am #188501

You know better? Wrong.

January 19, 2010 at 11:25 am #188502

Well, you certainly have established your credibility and credentials enough for me to totally discard the work of Shewhart and Wheeler. Or you have proven yourself to be a complete boor and moron. After a scientifically conducted poll, we have established the latter. And yes, Stan was consulted in the design of the poll questions, so it has validity beyond what anyone can refute or dispute.

January 19, 2010 at 11:46 am #188503

This will reinforce the Boor's contention:

http://manufacture-engineering.suite101.com/article.cfm/central_limit_theorem

This from Wheeler himself:

"Myth Two: Control charts work because of the central limit theorem. The central limit theorem applies to subgroup averages (e.g., as the subgroup size increases, the histogram of the subgroup averages will, in the limit, become more 'normal,' regardless of how the individual measurements are distributed). Because many statistical techniques utilize the central limit theorem, it's only natural to assume that it's the basis of the control chart. However, this isn't the case. The central limit theorem describes the behavior of subgroup averages, but it doesn't describe the behavior of the measures of dispersion. Moreover, there isn't a need for the finesse of the central limit theorem when working with Shewhart's charts, where three-sigma limits filter out 99 percent to 100 percent of the probable noise, leaving only the potential signals outside the limits. Because of the conservative nature of the three-sigma limits, the central limit theorem is irrelevant to Shewhart's charts. Undoubtedly, this myth has been one of the greatest barriers to the effective use of control charts with management and process-industry data. When data are obtained one-value-per-time-period, it's logical to use subgroups with a size of one. However, if you believe this myth to be true, you'll feel compelled to average something to make use of the central limit theorem. But the rationality of the data analysis will be sacrificed to superstition."

From the Deming Network:

http://deming-network.org/archive/98.02/msg00028.html

January 19, 2010 at 12:14 pm #188506

MBBinWI

Don't forget our friend Robert Butler.

January 19, 2010 at 12:58 pm #188507

Dr. Butler's response led us to a previous thread regarding the need for normality of the data. I would like to hear what our esteemed colleague has to say about the relevance of the Central Limit Theorem and the claim that it is the foundation of the Shewhart chart.

January 19, 2010 at 2:24 pm #188508

Robert Butler

I guess I don't see where you and I disagree, Darth. True, there were individuals on that thread who were insisting on normality and the CLT, but the specific post I referenced is almost identical in content to the posts you have been making to this one. The only reason for mentioning the entire thread was that I thought some of the posters to this one might want to see that the arguments for and against had already been made.

January 19, 2010 at 2:35 pm #188509

No, we did not disagree, but MBBinWI brought up your name and, as you know, we all value your input. Hope you have the trailer packed and will be heading south in a few days.

January 19, 2010 at 2:40 pm #188510

Allattar

I'm doing badly at making my point.

So I ran a simulation: 100,000 individual results, or 100,000 subgroups. I then counted the tests broken, displayed here as percentages.

The data sets were:

Normal distribution, Mean 0 sd 1 individual results

Normal distribution, Mean 0, sd 1, subgroup size 5

Weibull distribution, Shape 1 scale 4, individual results

Weibull distribution, shape 1, scale 4, subgroup size 5

Weibull distribution, shape 1, scale 4, subgroup size 10.

Test 2 is 9 points in a row on one side of the centre line; test 3 is 6 points in a row, just for reference.

(Not hopeful this will format correctly)

| Test | Normal I | Normal Xbar (n=5) | Weibull I | Weibull Xbar (n=5) | Weibull Xbar (n=10) |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.28% | 0.27% | 2.58% | 0.95% | 0.67% |
| 2 | 0.37% | 0.42% | 1.62% | 0.56% | 0.47% |
| 3 | 0.04% | 0.05% | 0.03% | 0.03% | 0.04% |
| 4 | 0.29% | 0.29% | 0.30% | 0.25% | 0.35% |
| 5 | 0.21% | 0.18% | 0.77% | 0.31% | 0.29% |
| 6 | 0.50% | 0.52% | 0.25% | 0.36% | 0.43% |
| 7 | 0.29% | 0.35% | 1.04% | 0.47% | 0.37% |
| 8 | 0.01% | 0.01% | 0.00% | 0.01% | 0.01% |

Compare the individuals charts for normal and Weibull here. There is a big difference in the percentages that break tests 1, 2, 5 and 7.

As you subgroup the data with more points, the differences between the tests drop: the effect of the central limit theorem on averages.

That's my point: it is evident in the data you collect. Now, clearly, a subgroup size of only 5 or 10 won't make the distribution of averages normal, but it pushes it towards normality.
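A rough re-creation of the individuals-chart part of this simulation, for tests 1 and 2 only. Note that this sketch uses the known distribution parameters for the centre line and limits, rather than limits estimated from the data as a charting package would, so the percentages will not match the table exactly:

```python
import random

# False-alarm rates for test 1 (a point beyond 3 sigma) and test 2
# (9 points in a row on one side of the centre line), for normal
# versus Weibull (shape 1, scale 4) individual data.
random.seed(7)
N = 100_000

def alarm_rates(xs, mean, sd):
    """Fraction of points flagging test 1 and test 2."""
    beyond = sum(1 for x in xs if abs(x - mean) > 3 * sd) / len(xs)
    run_flags = 0
    run_length = 0
    prev_side = None
    for x in xs:
        side = x > mean
        run_length = run_length + 1 if side == prev_side else 1
        prev_side = side
        if run_length >= 9:      # this point completes or extends a run of 9+
            run_flags += 1
    return beyond, run_flags / len(xs)

normal = [random.gauss(0, 1) for _ in range(N)]
weibull = [random.weibullvariate(4, 1) for _ in range(N)]  # scale 4, shape 1

norm_t1, norm_t2 = alarm_rates(normal, 0, 1)
weib_t1, weib_t2 = alarm_rates(weibull, 4, 4)
print(norm_t1, norm_t2)   # roughly 0.003 and 0.004
print(weib_t1, weib_t2)   # noticeably higher, around 0.018 and 0.016
```

The gap between the normal and Weibull rates on tests 1 and 2 mirrors the first two rows of the table: the skewed distribution breaks those tests several times more often with no special cause present.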

There is a very real difference between the tests broken on individual data for the normal and for this very skewed Weibull distribution. However, the point is that, for practical purposes, if it breaks a test there is a good chance it is a special cause.

January 19, 2010 at 2:43 pm #188511

Allattar

Oh I do!

January 19, 2010 at 3:17 pm #188512

Probabilities going all the way out to 2.5%? Oh my God! How can we live life with a signal that shows up 1 in 40? And as we all know, I'm not smart enough to look at something like a histogram to make sense of what I am seeing! The sky is falling, the sky is falling!

January 19, 2010 at 4:07 pm #188513

Allattar

Interesting. I'm agreeing with you, and you're flaming me? Well, everyone needs a hobby, I guess.

January 19, 2010 at 4:48 pm #188514

Honey, you must not recognize sarcasm when you see it. I'm not agreeing with you; your argument is much ado about nothing. Those who overemphasize normality and waste time on transforms miss the opportunity to understand and improve.

January 19, 2010 at 11:46 pm #188521

MBBinWI

I bet you'd look at a data set of 10,000 items each of A and B showing a statistically valid difference of 0.01 on a mean of 100 and shout the p-value from the rooftops.

Why don't you try to understand what you have been told, by some of the most experienced practitioners in the industry, and learn something. This lesson is probably the least expensive you'll ever get.

January 21, 2010 at 2:39 pm #188551

Jonathon Andell

It's always entertaining reading this forum, especially when the technical debate derails and name-calling kicks in. For what it's worth, I have found Darth and Robert Butler to have an excellent grasp of what works and what doesn't. I admit that my statistical credentials fall below some on this list, but I also have been in the game for 20+ years. Here's how I would approach a data set:

1. Plot the raw data on a control chart.

2. Use the best computer we ever will possess, the combination of our eyes and our brains, to determine whether blatant special cause exists.

2a. If special cause is detected, the first course of action is to stabilize the process. Debating about transforming unstable process data ranks high among the most profoundly wasteful discussions this forum has entertained.

3. Once the data are stable, you have the option of using a histogram, probability plot, Anderson-Darling, or other ways to decide whether the data appear to follow normality.

4. If the data appear non-normal, use something akin to Minitab's distribution ID utility to get an idea of what distribution might be a good model for the data.

5. For extra credit, read James King's book "Probability Charts for Decision Making." It discusses the kinds of natural phenomena that can give rise to various distributions, which in turn can lead to insights about the process. I hope I don't need to remind folks that these insights are what we are after.

Side comment: I prefer distribution ID over Box-Cox for a few reasons:

- More distributions from which to select
- The aforementioned potential insight into the process
- Once we select a distribution model, Minitab has a sweet utility called "Capability non-normal." It uses the chosen distribution model to estimate the probabilities and computes an "equivalent" Z value. Best of all, it makes a chart with the real data in real units of measure, no need to back-transform anything, and it displays the non-normal distribution curve. I find that students and managers both appreciate this display.

Once you've done all that, you still have the option of calling people morons.
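For readers without Minitab, the "distribution ID" idea in step 4 can be sketched by hand: fit a few candidate distributions by maximum likelihood and compare their log-likelihoods on the same data. This toy version covers only three candidates with closed-form MLEs; a real utility compares many more and reports goodness-of-fit statistics such as Anderson-Darling:

```python
import math
import random
import statistics

# Toy "distribution ID": fit candidate distributions by maximum
# likelihood and pick the one with the highest log-likelihood.
# The sample here is deliberately drawn from a lognormal.
random.seed(3)
data = [random.lognormvariate(1.0, 0.4) for _ in range(2_000)]

def loglik_normal(xs):
    m, s = statistics.mean(xs), statistics.pstdev(xs)  # MLEs
    return sum(-0.5 * math.log(2 * math.pi * s * s)
               - (x - m) ** 2 / (2 * s * s) for x in xs)

def loglik_exponential(xs):
    rate = 1 / statistics.mean(xs)                     # MLE of the rate
    return sum(math.log(rate) - rate * x for x in xs)

def loglik_lognormal(xs):
    logs = [math.log(x) for x in xs]
    mu, sigma = statistics.mean(logs), statistics.pstdev(logs)
    return sum(-math.log(x) - 0.5 * math.log(2 * math.pi * sigma * sigma)
               - (math.log(x) - mu) ** 2 / (2 * sigma * sigma) for x in xs)

fits = {
    "normal": loglik_normal(data),
    "exponential": loglik_exponential(data),
    "lognormal": loglik_lognormal(data),
}
best = max(fits, key=fits.get)
print(best)   # lognormal, as expected for this sample
```

As the post says, the payoff is not the winning label itself but asking why the process would generate data of that shape.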

January 21, 2010 at 3:08 pm #188552

Jonathon, always good to hear from you. Thanks for adding to the thread.

January 21, 2010 at 3:16 pm #188553

Jonathon AndellParticipant@Jonathon-Andell**Include @Jonathon-Andell in your post and this person will**

be notified via email.It just isn’t a complete thread without people calling each other morons. Reminds me of some of the playground arguments from my first childhood.The Breyfogle-Wheeler debate was a lot of fun to watch. An odd mix of contradictory dogmas, selective simulations, and occasional bursts of violent agreement. Reminds me of a decades-old issue of “Quality Engineering” with a debate over Taguchi methods.

January 21, 2010 at 4:09 pm #188554

Taguchi???? Anyone using his approach has to be a moron….. OK, feel at home now? You coming to SoBe for the conference?

January 21, 2010 at 4:17 pm #188555

Jonathon Andell

Travel on my own nickel isn't a very good option these days. I do like a lot of what Taguchi advocates prior to selecting a matrix, but I rarely use any of his matrices. Did you ever read Schmidt & Launsby's book on DOE? Interesting outlook. Can I be a moron, too?

January 21, 2010 at 7:07 pm #188558

Shameless, Darth… driving up your post count like that! It's only an MVP award! ;-P

January 21, 2010 at 7:28 pm #188560

And your point is…? Looking forward to a fun week. I decided to take one of the Master Workshops on Monday and do a site tour on Thursday. Might as well take advantage of the conference, plus it gives me time to sign autographs for my adoring fans :-).

January 21, 2010 at 7:38 pm #188562

MBBinWI

Both of them? Shouldn't take long. (I just couldn't resist)

January 21, 2010 at 7:45 pm #188563

Yes, you and HeeBee. Stan is jealous of my fan base so he won't even talk to me. Carnell wanted to write a long-winded autograph so he is out.

January 21, 2010 at 7:59 pm #188564

Fingers crossed that my injury stays dormant… I want to do the boat tour, but my flight conflicts :-(

January 21, 2010 at 8:07 pm #188566

Now that was funny! (Yours too, MBBGLENNFIDDITCHDREKFANINWI.) Actually, my vote for post count is for Stevo.

January 21, 2010 at 11:41 pm #188568

Darth? Hey, just because you live in the desert doesn't mean you should eat the peyote buttons.

January 21, 2010 at 11:43 pm #188569

Your whole fan base is at 13th St on SoBe, if you catch my drift.

January 22, 2010 at 12:45 am #188570

Jonathon Andell

Doesn't mean I shouldn't…

January 24, 2010 at 8:20 pm #188637

Forrest W. Breyfogle III

If you are not willing to keep an open mind to a paradigm shift, please read no further.

I will be providing links to articles that demonstrate how some of the rules we have been told, by individuals or in classes, have issues.

A transformation that MAKES GOOD PHYSICAL SENSE can be very important, especially when we are trying to describe whether a process is capable of producing a desired response and are also concerned about overreacting to common-cause variability as though it were special cause.

The reason for this statement is described, in detail with simulated and real data, in the following Quality Digest articles (Again, if you are not willing to take the time to really read the articles with an open mind, do not waste your time opening the links):

– Non-normal data: To Transform or Not to Transform http://www.qualitydigest.com/inside/quality-insider-column/individuals-control-chart-and-data-normality.html

– NOT Transforming the Data Can Be Fatal to Your Analysis: A case study, with real data, describes the need for data transformation. http://www.qualitydigest.com/inside/six-sigma-column/not-transforming-data-can-be-fatal-your-analysis.html

– Predictive Performance Measurements: Going Beyond Red-Yellow-Green Scorecards http://www.qualitydigest.com/inside/quality-insider-column/predictive-performance-measurements.html

– Are Your Business Metrics Measuring the Right Thing? Don’t base your metrics on your organizational chart http://www.qualitydigest.com/inside/quality-insider-article/are-your-business-metrics-looking-right-thing.html

The point was made that an x-bar and R chart is a means to get around this issue because of the central limit theorem. This may be true when, for the specific situation, an x-bar and R chart can be used instead of an individuals chart. However, x-bar and R charts have some fundamental issues too, as described in the published article available through the link http://www.smartersolutions.com/pdfs/online_database/asset.php?documentid=16

Whenever someone suggests, as I have done above, that there are issues with past methodologies, there will be initial resistance. The implications of these points are far greater than just whether to transform or not; e.g., this has a business-system measurement potential that resolves the shortcomings of red-yellow-green scorecards. If anyone wants to discuss these important issues one-on-one, let me know: [email protected]

The forum ‘General’ is closed to new topics and replies.