Normal distribution
January 12, 2007 at 11:12 am #45781
Hi Team,
I want to know why there is so much stress on normality, and why statements like "data should be normally distributed" keep coming up. Going by the word "normality," the following questions arise:
Why is normality discussed so much?
What is "normally distributed data"?
If possible, suggest some examples to help me understand this concept.
I will appreciate it if someone can guide me on this. Thanks, Hardik
January 12, 2007 at 1:39 pm #150370
Robert Butler (@rbutler)
You have asked a very broad question, and the answer would take far more space and time than a forum of this type allows. I'd recommend doing some reading, both at the library and on the web.
In an effort to point you in the right direction with respect to your first question, I've quoted excerpts from pp. 72-76 of the section "Adequacy of Normal Distribution as a Physical Model" from the book cited below.
“The normal is the most widely used of all distributions. For a long time its importance was exaggerated by the misconception that it was the underlying distribution of nature, and that according to the “Theory of Errors” it was supposed to govern all measurements. With the advent of statistical tests about the year 1900, this assumption was shown not to be universally valid.
Instead, the theoretical justification for the role of the normal distribution is the (or more appropriately "a" since there is more than one) central limit theorem, one of the most important results of mathematical statistics. This theorem states that the distribution of the mean of n independent observations from any distribution, or even from up to n different distributions, with finite mean and variance approaches a normal distribution as the number of observations in the sample becomes large – that is, as n approaches infinity. The result holds, irrespective of the distribution of each of the n elements making up the average.
Although the central limit theorem is concerned with large samples, the sample mean tends to be normally distributed even for relatively small n as long as no single element or small group of elements has a dominating variance and the element distributions do not deviate extremely from a normal distribution.
When a random variable represents the total effect of a large number of independent “small” causes, the central limit theorem thus leads us to expect the distribution of that variable to be normal. Furthermore, empirical evidence has indicated that the normal distribution provides a good representation for many physical variables. The normal distribution has the further advantage for many problems that it is tractable mathematically. Consequently, many of the techniques of statistical inference, such as the method known as the “analysis of variance,” have been derived under the assumption that the data come from a normal distribution.
Because of the prominence, and perhaps the name, of the normal distribution, it is sometimes assumed that a random variable is normally distributed unless proven otherwise. Therefore it should be clearly recognized that many random variables cannot be reasonably regarded as the sum of many small effects, and consequently there is no theoretical reason for expecting a normal distribution. This could be the case when one nonnormal effect is predominant.
The errors of incorrectly assuming normality depend upon the use to which this assumption is put. Many statistical methods derived under this assumption remain valid under moderate deviations and are thus said to be robust. The analysis of variance is an example of a method that is robust under deviations from normality. On the other hand, if the normality assumption were used incorrectly in such problems as determining the proportion of manufactured items above or below some extreme design limit at the tail of the distribution, serious errors might result.”
– From Statistical Models in Engineering, Hahn and Shapiro, 1968
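To see the central limit theorem described in the excerpt at work, here is a minimal simulation sketch, assuming Python with numpy and scipy are available (the sample sizes and the exponential parent are arbitrary choices for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# A heavily right-skewed parent distribution: exponential, mean = 1, skewness = 2.
parent = rng.exponential(scale=1.0, size=100_000)
print(f"parent skewness: {stats.skew(parent):.2f}")

# Means of n independent observations: skewness shrinks toward 0 (normal) as n grows.
for n in (2, 5, 30, 100):
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: skewness of 10,000 sample means = {stats.skew(means):.2f}")

Even at n = 30 the sample means are already nearly symmetric, which is why averages and subgroup means behave far more "normally" than the individual readings they are built from.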
January 12, 2007 at 1:46 pm #150371
Ken Feldman (@Darth)
Robert, you're still the MAN going into 2007. I look forward to learning a lot from your posts.
January 12, 2007 at 1:53 pm #150372
Chris Seider (@cseider)
Hardik,
My fellow posters have given you good answers, but let me address your first question. There is so much stress on deciding whether your data come from a normal distribution because the choice of statistical tool depends on the type of distribution you are working with.
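As a concrete illustration of letting the distribution drive the tool choice, a minimal sketch, assuming Python with scipy is available (the lognormal data below is a made-up stand-in for your own measurements), might look like this:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.8, size=60)  # stand-in for real measurements

# Shapiro-Wilk normality test (Anderson-Darling, via stats.anderson, is another common choice).
stat, p = stats.shapiro(data)

if p > 0.05:
    print(f"p = {p:.3f}: no strong evidence against normality; "
          "normal-theory tools (t-tests, ANOVA, standard capability indices) are reasonable.")
else:
    print(f"p = {p:.3f}: the data look non-normal; consider nonparametric tests, "
          "a transformation, or fitting a non-normal distribution before judging capability.")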
January 12, 2007 at 2:36 pm #150373
Hi Robert Butler,
Thanks for the great information. Do you know any good web URLs on basic stats?
Regards,
Rahul
January 12, 2007 at 3:15 pm #150375
i am lazy (@i-am-lazy)
Do some research and Google "normal distribution." You'll be surprised what you come up with.
January 12, 2007 at 3:45 pm #150377
qualityengineer (@qualityengineer)
Dear all,
Can anybody point me to information about the concept of "distributions": what they mean and where they come from?
January 12, 2007 at 4:09 pm #150378
Iain Hastings (@Iain-Hastings)
It took me longer to write this post than it did to find the link. More distributions than you can shake a stick at.
http://en.wikipedia.org/wiki/Probability_distribution
January 12, 2007 at 5:08 pm #150381
Forum:
When I see data that follow a normal distribution, I am, at first, suspicious. I have seen this only once, for the daily output of a nuclear fuel processing facility. It was unusual enough that the Quality Leader had me fly out to the site to investigate. The data had been gathered in a manner that obscured the problem, and the apparent normality was a consequence of the Central Limit Theorem (CLT).

On another project, when I demonstrated deviations from the usual, though non-normal, distributions of engineering cycle time, the same Quality Leader told me that I had finally demonstrated differences in the process that many MBBs had failed to show. We proved that restructuring the business segment (over $1BB/year in revenue) was the best thing to do. The person responsible for the restructuring was later promoted to CEO of a large business unit of GE.

Another application of understanding deviations from the usual, but non-normal, probability distributions was identifying the process drivers during a major fraud/bad-debt project involving on-line transaction processing. Once the parameters of the non-normal distributions were determined, deviations could be flagged, ranked and investigated. The processes are routinely monitored using control charts with transformed data to flag fraudulent transactions in real time.

This philosophy of identifying deviations from well-understood, but non-normal, probability distributions is not uncommon in fraud, risk and credit-scoring applications and is key to understanding what the data is telling you about the process. You learn more about the process from the "outliers" than from anything else.

In general?

Always have a look at your raw data using a run chart. You should be able to see if there are significant events (potential Vital Xs) that make a difference to your process. The naysayers will tell you that these are unusual events and therefore unlikely to recur, so you should ignore the data. Ignore this advice, but note that they have unconsciously handed you a potential Vital X.

This information will allow you to show the behaviour of the process when it is under only common cause variation and when there is an event that results in the process behaving "out of control." Examples include special orders, end-of-month or end-of-quarter effects, expedited orders, unusual requests, etc.

It is quite likely that the process is non-normal because you are looking at the superposition of two processes: the normal process and the "special case" process. Each will have its own characteristics, and the result will look very irregular. Have a look for bimodal distributions – this is a BIG clue that you have two processes.

When you are trying to characterize the DPMO for your baseline performance as part of Measure, use a discrete measurement (over or under specification). It is likely that your DPMO is VERY high. Whether you have discrete or continuous data is unimportant if your baseline DPMO is 900,000 during the Measure phase.

Once you have completed your project, you may have two streams (repeat customers and new customers, for example). NOW you may find that each subgroup is normally distributed (or follows another well-known distribution, such as the exponential or Weibull), and you can use continuous data with the same USL and LSL you used to classify your baseline data into error/non-error in your Measure phase. You may transform your data using a Box-Cox or Johnson transformation to do the calculation.

MINITAB also allows you to calculate process capability directly, assuming a particular distribution, without having to transform the data to begin with (see the sketch after this post).

Cheers, BTDT
P.S. – I haven't read it, but Andy Sleeper has a new book on probability distributions in Six Sigma projects called "Six Sigma Distribution Modeling," McGraw-Hill, 2007.
P.P.S. – I am not Andy Sleeper.
P.P.P.S. – He is not paying me either ;)
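For anyone who wants to try the same idea outside MINITAB, here is a minimal sketch, assuming Python with numpy and scipy; the data, LSL and USL below are made-up placeholders, not values from any project in this thread. It fits a Weibull distribution to right-skewed data and reads the out-of-spec fraction from the fitted tails, then repeats the estimate via a Box-Cox transformation for comparison.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up right-skewed "cycle time" data; replace with your own measurements.
data = rng.weibull(1.5, size=200) * 10.0
LSL, USL = 0.5, 25.0  # placeholder specification limits

# Route 1: fit a non-normal distribution directly and read the tail areas.
shape, loc, scale = stats.weibull_min.fit(data, floc=0)
fitted = stats.weibull_min(shape, loc=loc, scale=scale)
p_out = fitted.cdf(LSL) + fitted.sf(USL)
print(f"Weibull fit: estimated fraction out of spec = {p_out:.4%} "
      f"(~{p_out * 1e6:,.0f} DPMO)")

# Route 2: Box-Cox transform the data (and the spec limits), then use normal theory.
transformed, lam = stats.boxcox(data)
t_lsl, t_usl = stats.boxcox(np.array([LSL, USL]), lmbda=lam)
mu, sigma = transformed.mean(), transformed.std(ddof=1)
p_out_bc = stats.norm.cdf(t_lsl, mu, sigma) + stats.norm.sf(t_usl, mu, sigma)
print(f"Box-Cox (lambda = {lam:.2f}): estimated fraction out of spec = {p_out_bc:.4%}")

How closely the two routes agree depends on how well each model describes the tails; as the Hahn and Shapiro excerpt earlier in the thread points out, the tails are exactly where a wrong distributional assumption does the most damage.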
January 12, 2007 at 8:07 pm #150396
Excellent. The only error you make is the transform. Transformations are not necessary and can make analysis more difficult. I have a lovely example of a time-based histogram for a help desk, bimodal and skewed right, which when transformed becomes bimodal and skewed left (a small simulation below makes the same point).
You would enjoy reading "Normality and the Process Behaviour Chart" by Wheeler. A wonderful little book.
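A quick simulation illustrates this; it is only a sketch with invented stand-in data (Python with numpy and scipy assumed), not the help-desk data mentioned above. A Box-Cox transformation reshapes the histogram and changes the skewness, but the transformed data still fail a normality test, because no monotone transform can merge two distinct modes into one.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented stand-in for bimodal, right-skewed resolution times (minutes):
# a fast "password reset" mode mixed with a slow "escalated ticket" mode.
times = np.concatenate([
    rng.lognormal(mean=1.0, sigma=0.4, size=700),   # fast tickets
    rng.lognormal(mean=3.0, sigma=0.5, size=300),   # slow tickets
])

transformed, lam = stats.boxcox(times)

for label, x in (("raw", times), (f"Box-Cox (lambda = {lam:.2f})", transformed)):
    stat, p = stats.shapiro(x)
    verdict = "rejects normality" if p < 0.05 else "consistent with normality"
    print(f"{label}: skewness = {stats.skew(x):+.2f}, Shapiro-Wilk p = {p:.2e} ({verdict})")

The two modes are two underlying processes; separating the streams, as suggested earlier in the thread, does more for the analysis than any transformation.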
January 12, 2007 at 8:11 pm #150398
The Central Limit Theorem was of interest to Shewhart, but he did not use it in formulating control charts. This has been the source of much misunderstanding, particularly by authors such as Montgomery.
There is no need for the CLT in practical process improvement.
January 16, 2007 at 5:19 am #150612
Thanks, C Seider. I will keep your suggestion in mind while doing data analysis.
Rahul
The forum ‘General’ is closed to new topics and replies.