Probability distributions and control charts.
Six Sigma – iSixSigma › Forums › Old Forums › General › Probability distributions and control charts.
 This topic has 7 replies, 4 voices, and was last updated 18 years ago by lin.


September 11, 2004 at 5:53 pm #36832
For those statistically savvy members: I need an explanation that will help me understand how the binomial probability distribution is associated with P and NP charts. Where in the required calculations for the control limits of a P or NP chart does this equation fit?
$$P(x) = \frac{n!}{(n-x)!\,x!}\, p^{x} (1-p)^{n-x}$$
…if it does at all
I know about the criteria to be used (for defectives, i.e. nonconforming units), but I don’t see the connection between the required control charts and the probability distribution function. Is there one?
The same question applies to the C and U charts, which model data characteristic of the Poisson distribution.
This has been a bug that has been annoying me for some time. Any clarification for this stubborn BB would be appreciated.
Thanks.
Frank
September 11, 2004 at 6:05 pm #107213
seen it somewhere… (Member)
Frank, this might help. If you will pardon the blatant cut-and-paste plagiarism, I had seen this in the past and placed it in my web browser favorites (tempted as I was to paraphrase and appear smarter than I am):
What About Charts for Count Data?
Some data consist of counts rather than measurements. With count data, it has been traditional to use a theoretical approach for constructing control limits rather than the empirical approach used with measurements. The charts obtained by this theoretical approach have traditionally been known as “attribute charts.” There are certain advantages and disadvantages to these charts.
Count data differ from measurement data in two ways. First, count data possess a certain irreducible discreteness that measurement data do not. Second, every count must have a known “area of opportunity” to be well defined.
With measurement data, the discreteness of the values is a matter of choice. This is not the case with count data, which are based on the occurrence of discrete events (the so-called attributes). Count data always consist of integral values. This inherent discreteness is, therefore, a characteristic of the data and can be used in establishing control charts.
The area of opportunity for any given count defines the criteria by which the count must be interpreted. Before two counts may be compared, they must have corresponding (i.e., equally sized) areas of opportunity. If the areas of opportunity are not equally sized, then the counts must be converted into rates before they can be compared effectively. The conversion from counts to rates is accomplished by dividing each count by its own area of opportunity.
These two distinctive characteristics of count data have been used to justify different approaches for calculating the control limits of attribute charts. Hence, four control charts are commonly associated with count data: the np-chart, the p-chart, the c-chart and the u-chart. However, all four charts are for individual values. The only difference between an XmR chart and an np-chart, p-chart, c-chart or u-chart is the way they measure dispersion.
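As a rough illustration of converting counts to rates, the sketch below divides each count by its own area of opportunity and then computes u-chart limits. It assumes the usual Poisson-based u-chart formula, ū ± 3·sqrt(ū / aᵢ) with ū = total count / total area; the helper name `u_chart` and all numbers are hypothetical, chosen only for illustration.

```python
import math


def u_chart(counts, areas):
    """Convert counts to rates and compute u-chart limits (hypothetical helper).

    counts[i] is the number of events seen in an area of opportunity of
    size areas[i] (units inspected, hours observed, square metres, ...).
    """
    if len(counts) != len(areas):
        raise ValueError("counts and areas must have the same length")
    # Each count is divided by its own area of opportunity.
    rates = [c / a for c, a in zip(counts, areas)]
    # Central line: total events divided by total area of opportunity.
    u_bar = sum(counts) / sum(areas)
    limits = []
    for a in areas:
        # Poisson-based sigma for a rate; it shrinks as the area grows,
        # so each point gets its own pair of limits.
        sigma = math.sqrt(u_bar / a)
        limits.append((max(0.0, u_bar - 3 * sigma), u_bar + 3 * sigma))
    return rates, u_bar, limits


# Hypothetical data: four samples with unequal areas of opportunity.
rates, u_bar, limits = u_chart([4, 9, 3, 6], [2.0, 3.0, 1.0, 2.0])
for rate, (lcl, ucl) in zip(rates, limits):
    print(f"rate={rate:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
```

The raw counts 9 and 3 cannot be compared directly because their areas of opportunity differ, but as rates both equal 3.0.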
For any given set of count data, the X chart and the four types of charts mentioned previously will show the same running records and central lines. The only difference between these charts will be the method used to compute the distance from the central line to the control limits.
The np, p, c and u-charts all assume that the dispersion is a function of the location. That is, they assume that SD(X) is a function of MEAN(X). Using this relationship between the parameters of a theoretical probability distribution must be justified by establishing a set of conditions. When the conditions are satisfied, the probability model is likely to approximate the behavior of the counts when the process displays a reasonable degree of statistical control.
Yet, deciding which probability model is appropriate requires judgment that most students of statistics do not possess. For example, the conditions for using a binomial probability model may be stated as:
Binomial Condition 1: The area of opportunity for the count Y must consist of n distinct items.
Binomial Condition 2: Each of the n distinct items must be classified as possessing, or not possessing, some attribute. This attribute is usually a type of nonconformance to specifications.
Binomial Condition 3: Let p denote the probability that an item has the attribute being counted. The value of p must be the same for all n items in any one sample. While the chart checks whether p changes from sample to sample, the value of p must be constant within each sample. Under conditions considered to be a state of statistical control, it must be reasonable to assume that the value of p is the same for every sample.
Binomial Condition 4: The likelihood that an item possesses the attribute must not be affected by whether the preceding item possessed the attribute.
(This implies, for example, that nonconforming items do not naturally occur in clusters, and that counts are independent of each other.)
If these four conditions apply to your data, then you may use the binomial model to compute an estimate of SD(X) directly from your estimate of MEAN(X). Or, you could simply place the counts (or proportions) on an XmR chart and estimate the dispersion from the moving range chart. You will obtain essentially the same chart either way.
Unlike attribute charts, XmR charts assume nothing about the relationship between location and dispersion. The XmR chart measures the location directly with the average, and it measures the dispersion directly with the moving ranges. Thus, while the np, p, c and u-charts use theoretical limits, the XmR chart uses empirical limits. The only advantage of theoretical limits is that they include a larger number of degrees of freedom, which means that they stabilize more quickly.
If the theory is correct and you use an XmR chart, the empirical limits will be similar to the theoretical limits. However, if the theory is wrong, the theoretical limits will be wrong, while the empirical limits will still be correct. You can’t go far wrong using an XmR chart with count data, and it is generally easier to work with empirical limits than to verify the conditions for a theoretical model.
About the author: Donald J. Wheeler is an internationally known consulting statistician and the author of Understanding Variation: The Key to Managing Chaos and Understanding Statistical Process Control, Second Edition.
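A minimal sketch comparing the two approaches Wheeler describes, assuming a constant sample size n: theoretical np-chart limits that derive SD(X) from MEAN(X) via the binomial model, and empirical X-chart limits from an XmR chart (average ± 2.66 × average moving range, where 2.66 = 3/1.128). The function names, the counts and the sample size n = 50 are hypothetical.

```python
import math


def np_chart_limits(counts, n):
    """Theoretical (binomial) limits for an np-chart with constant sample size n."""
    center = sum(counts) / len(counts)          # average count, i.e. n * p_bar
    p_bar = center / n
    sigma = math.sqrt(n * p_bar * (1 - p_bar))  # SD(X) computed from MEAN(X)
    return max(0.0, center - 3 * sigma), center, center + 3 * sigma


def xmr_limits(values):
    """Empirical limits for the X chart of an XmR chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two points.
    spread = 2.66 * mr_bar
    return max(0.0, center - spread), center, center + spread


# Hypothetical defectives found in samples of n = 50 items each.
counts = [5, 7, 4, 6, 8, 5, 6, 7]
print("np-chart:", np_chart_limits(counts, 50))
print("XmR     :", xmr_limits(counts))
```

Both charts share the same central line; only the distance to the limits differs. With real data, how closely the two sets of limits agree gives a rough indication of whether the binomial conditions are plausible.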
September 11, 2004 at 6:17 pm #107214
Chris Butterworth (Participant)
Hi Frank,
The formula you posted gives the exact probability of finding x successes in a sample of size n from a population with a known success rate p. In other words, it gives the height of one bar in the histogram. With p and np charts, however, we are concerned with detecting a process shift, so we are really only looking to see whether the data exceed the ±3 sigma lines. Using the average and standard deviation of the binomial distribution, we have
$$\text{avg} = np$$
$$\text{std dev} = \sqrt{np(1-p)}$$
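One way to see the connection: the central line and sigma of the np chart are simply the mean and standard deviation of the binomial formula Frank posted. This sketch computes both moments directly from P(x) and compares them with np and sqrt(np(1 − p)). The n = 10, p = 0.2 values are the red-bead numbers used later in the thread; `math.comb` needs Python 3.8 or later.

```python
import math


def binomial_pmf(x, n, p):
    """P(x) = n! / ((n - x)! x!) * p**x * (1 - p)**(n - x)"""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_moments(n, p):
    """Mean and standard deviation computed by summing over the pmf."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, math.sqrt(var)


n, p = 10, 0.2
mean, sd = binomial_moments(n, p)
# The moments summed from the pmf should agree with the closed forms.
print(f"mean {mean:.4f} vs np {n * p:.4f}")
print(f"sd   {sd:.4f} vs sqrt(np(1-p)) {math.sqrt(n * p * (1 - p)):.4f}")
```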
Hope this helps
Chris
September 11, 2004 at 7:05 pm #107216
Thanks Chris… so there really is no connection to the formula, except to say that the probability distribution and the P and NP chart elements have the same conditional requirements. That’s my take. Thank you.
September 11, 2004 at 7:10 pm #107217
Thanks, “seen it…”
That certainly helps clarify the notable differences. I’ve cut and pasted it into my library of important articles.
I appreciate the input.
Frank
September 12, 2004 at 6:57 pm #107236
The equation you gave gives the probability of obtaining exactly x successes in a sample of size n from a population with a success probability of p. It is easiest to see this by going back to the excellent red bead experiment. Suppose you have a bowl containing red and white beads, and 20% of the beads are red (this is p). You can use this equation to determine the probability of obtaining anywhere from 0 to n red beads for any sample size. For example, the table below shows the probability of getting 0 to 10 red beads when taking samples of size n = 10 from the bowl.
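| p | Sample size (n) | No. of red beads (x) | Probability |
|---|---|---|---|
| 0.2 | 10 | 0 | 10.7% |
| 0.2 | 10 | 1 | 26.8% |
| 0.2 | 10 | 2 | 30.2% |
| 0.2 | 10 | 3 | 20.1% |
| 0.2 | 10 | 4 | 8.8% |
| 0.2 | 10 | 5 | 2.6% |
| 0.2 | 10 | 6 | 0.6% |
| 0.2 | 10 | 7 | 0.1% |
| 0.2 | 10 | 8 | 0.0% |
| 0.2 | 10 | 9 | 0.0% |
| 0.2 | 10 | 10 | 0.0% |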
You could plot the no. of red beads versus the probability to see the distribution. The shape of the binomial distribution depends on n and p.
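The red-bead table can be regenerated straight from the binomial formula. A minimal sketch (the helper name `binomial_table` is illustrative; `math.comb` needs Python 3.8+):

```python
import math


def binomial_table(n, p):
    """Return (x, probability) pairs for x = 0..n red beads."""
    return [(x, math.comb(n, x) * p**x * (1 - p) ** (n - x)) for x in range(n + 1)]


# Same bowl as above: 20% red beads, samples of 10.
for x, prob in binomial_table(10, 0.2):
    print(f"{x:2d}  {prob:6.1%}")
```

The probabilities sum to 1, since some number of red beads between 0 and n must turn up in every sample.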
The average of the binomial distribution is $np$, so for this example the average is (10)(0.2) = 2. The standard deviation of the binomial distribution is $\sqrt{np(1-p)}$, so for this example the standard deviation is $\sqrt{1.6} \approx 1.26$.
Control limits are usually the average plus or minus three standard deviations of what you are plotting. So, you can see that the control limits for the np chart are directly related to the average and standard deviation of the binomial distribution:
$$UCL = np + 3\sqrt{np(1-p)}$$
$$LCL = np - 3\sqrt{np(1-p)}$$
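Plugging the red-bead numbers into these limits, as a small sketch (the function name is illustrative; the lower limit subtracts three standard deviations from the average):

```python
import math


def np_chart_limits(n, p):
    """Return (LCL, center, UCL) for an np-chart with sample size n and proportion p."""
    center = n * p                        # binomial average
    sigma = math.sqrt(n * p * (1 - p))    # binomial standard deviation
    return center - 3 * sigma, center, center + 3 * sigma


lcl, center, ucl = np_chart_limits(10, 0.2)   # the red bead example
# A count can never be negative, so a negative LCL is usually reported as 0.
print(f"LCL={max(0.0, lcl):.2f}  center={center:.2f}  UCL={ucl:.2f}")
# prints: LCL=0.00  center=2.00  UCL=5.79
```

In practice p is usually not known in advance and is estimated as the average proportion from past samples.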
Note that these control limits begin to fail when np < 5 or n(1 − p) < 5, because under these conditions the binomial distribution is not symmetrical. So, for the example above (np = 2), you could not use these limits. Also, note that there are criteria that must be met to use the np and p charts. The one some people miss is that the value of p must remain constant for each of the n items in the sample. If this is not the case, you can’t use the p or np chart. Use the individuals chart instead.
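To put a number on "begin to fail", this sketch computes the exact binomial probability of a point falling above the 3-sigma UCL for n = 10, p = 0.2, and compares it with the one-sided 0.135% a symmetric normal distribution would give. The helper names are illustrative; `math.comb` needs Python 3.8+.

```python
import math


def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_above(limit, n, p):
    """P(X > limit) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(x, n, p) for x in range(n + 1) if x > limit)


n, p = 10, 0.2
ucl = n * p + 3 * math.sqrt(n * p * (1 - p))       # about 5.79
tail = prob_above(ucl, n, p)                        # P(X >= 6)
normal_tail = 0.5 * math.erfc(3 / math.sqrt(2))     # one-sided 3-sigma normal tail
print(f"P(X > UCL) = {tail:.3%} vs normal {normal_tail:.3%}")
```

Here the upper false-alarm rate is roughly 0.64%, several times the normal-theory figure, while the lower side has none at all because the LCL is negative: the asymmetry Bill describes.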
Hope this helped.
September 17, 2004 at 7:34 am #107506
Bill,
Your example is an excellent way to illustrate the relationship between the distribution curve and the control limits of the P and NP charts. From a very pragmatic point of view, the assumption that p is constant throughout the process really limits their application, and I end up resorting to I-charts. This is the reason I seldom use P and NP charts.
What are your thoughts on C and U charts as they relate to the Poisson distribution, which is characterized by its mean alone rather than by p and n?
I appreciate the insight.
Frank
September 17, 2004 at 2:26 pm #107523
Frank,
I seldom use the p chart and never the np chart. The p chart does have some applications, though, such as a picker in a warehouse monitoring his/her picking accuracy.
For the c chart, the major stumbling block to using it is the requirement that it be related to rare events. The opportunity for defects to occur must be large, but the actual number that occur must be small.
Safety is an easy one to use a c chart on. The opportunity for an accident to happen in a plant, for example, is large but the actual number that occur is small (hopefully).
The control limits on a c chart are $\bar{c} \pm 3\sqrt{\bar{c}}$, because the standard deviation of the Poisson distribution is the square root of its average.
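A minimal c-chart sketch, assuming every count comes from an equally sized area of opportunity. The function name and the counts are hypothetical.

```python
import math


def c_chart_limits(counts):
    """Return (LCL, c_bar, UCL) for a c-chart with equal areas of opportunity."""
    c_bar = sum(counts) / len(counts)
    sigma = math.sqrt(c_bar)            # Poisson: SD = square root of the mean
    lcl = max(0.0, c_bar - 3 * sigma)   # counts cannot go below zero
    return lcl, c_bar, c_bar + 3 * sigma


# Hypothetical counts per period from equally sized areas of opportunity.
counts = [14, 18, 12, 20, 16, 15, 17, 13, 19, 16]
print(c_chart_limits(counts))   # (4.0, 16.0, 28.0)
```

With a small average such as c̄ = 1.33, the same formula gives a negative lower limit that has to be set to zero, so only the upper limit remains.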
If $\bar{c}$ is too small, these limits are not valid either, since the Poisson distribution is no longer symmetrical. In this case, you can switch to a “rate”, such as the time between rare events, and use an individuals chart.
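A sketch of the time-between-events approach: convert hypothetical event dates into gaps in days, then chart the gaps with individuals (XmR) limits of average ± 2.66 × average moving range. The helper names and dates are illustrative assumptions, not data from the thread.

```python
from datetime import date


def days_between(event_dates):
    """Gaps in days between consecutive events."""
    ordered = sorted(event_dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]


def individuals_limits(values):
    """Return (LCL, center, UCL) for an individuals chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * mr_bar              # 2.66 = 3 / 1.128
    return max(0.0, center - spread), center, center + spread


# Hypothetical dates of rare events (e.g. recordable accidents).
events = [date(2004, 1, 5), date(2004, 2, 20), date(2004, 3, 9),
          date(2004, 5, 1), date(2004, 6, 14), date(2004, 8, 2)]
gaps = days_between(events)
print(gaps, individuals_limits(gaps))
```

On this chart a long gap is good news: a point above the UCL suggests the events have become rarer.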
I simply tell people, if you are confused about what chart to use, use the individuals.
Bill
The forum ‘General’ is closed to new topics and replies.