control charts
 This topic has 28 replies, 13 voices, and was last updated 18 years, 10 months ago by R. Ramirez.


July 12, 2002 at 4:21 am #29856
Hi,
We are trying to implement control charts in our organization, but most of our characteristics don’t follow a normal distribution; they follow either a lognormal or a gamma distribution. One way of calculating the control limits is to transform the data to normal, calculate the limits, and then transform the limits back.
The other way, which I read about somewhere, is to fit whatever distribution the data follows and set the control limits at chosen percentiles. We calculated our control limits from the actual and fitted values, with the UCL at the 85th percentile, the LCL at the 15th percentile, and the CL at the 50th percentile.
My question is: which is the better method, and what are the advantages and disadvantages of each?
I will be very grateful if somebody gives some suggestions.
thanks
A.Sridhar

July 12, 2002 at 5:27 am #77183
Mike Carnell (Participant, @MikeCarnell)
Sridhar,
You need to read about a thing called the Central Limit Theorem.

July 12, 2002 at 5:42 am #77184
Scot Shank (Participant, @ScotShank)
A.Sridhar,
Try “Statistical Quality Control” by Eugene Grant and Richard Leavenworth.
Good Luck!
Scot Shank

July 12, 2002 at 7:21 am #77186
Hi Mike,
I know about the central limit theorem, but my problem is somewhat different. Let me give some more information. I am trying to develop a control chart for effort spent per page of a document during inspection. I have only one point for each document, so I can’t form a sample of more than one point, take an average, and rely on the central limit theorem.
Can you say how to handle this situation? The other thing is that I checked in Minitab that the data follows a lognormal distribution.
thanks
sridhar

July 12, 2002 at 8:13 am #77189
Hi Sridhar,
Is it a 100% inspection (instead of sampling)? If so, you do not need a control chart.
As this is an inspection process, what actions do you intend to take should the chart display an out-of-control symptom? Do you really need the control chart to tell you that?
Best Regards,
CT.

July 12, 2002 at 8:52 am #77190
Hi CT,
Thanks for the reply. Yes, I do need control charts; let me explain why. We are inspecting a document, and the number of defects found will definitely depend on the time spent. Suppose our time spent goes out of control while the defect density of the document stays within control; then we can find the root cause so that the same cause does not recur in the next project. I hope I have explained this properly.
thanks
sridhar

July 12, 2002 at 11:30 am #77192
Sridhar:
First of all, I agree with you that you do need to control chart your process, not necessarily for the reason you stated but because it is the tool you need to ensure that your process is stable, to identify special causes when they occur and, ultimately, to gauge improvements when they occur. This is true whether you sample or inspect 100%.
Second, it sounds like you are tracking “time spent inspecting” (if I read correctly). Time is variables data and can be placed on an X-bar R chart. Considering the central limit theorem, find a subgroup size (up to 10) so that your key measure (time), when averaged across the subgroup, becomes approximately normal. Once this is achieved, plot these points on an X-bar R chart. If a subgroup size of 10 or more is needed, I would suggest an X-bar S chart, which measures variability using the standard deviation of the subgroup as opposed to the range.
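As a rough illustration of the X-bar R calculation described above, here is a minimal Python sketch. The inspection times and the function name `xbar_r_limits` are hypothetical; the constants A2, D3 and D4 are the commonly tabulated Shewhart factors for subgroup sizes 2 to 5. It assumes rational subgroups of equal size actually exist, which is the very point in dispute in this thread.

```python
from statistics import mean

# Commonly tabulated Shewhart factors (A2, D3, D4) for subgroup sizes 2..5.
XBAR_R_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}


def xbar_r_limits(subgroups):
    """Return (LCL, CL, UCL) for the X-bar chart and for the R chart."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    a2, d3, d4 = XBAR_R_CONSTANTS[n]
    xbars = [mean(g) for g in subgroups]            # subgroup averages
    ranges = [max(g) - min(g) for g in subgroups]   # subgroup ranges
    xbarbar = mean(xbars)                           # grand average (center line)
    rbar = mean(ranges)                             # average range
    xbar_chart = (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar)
    r_chart = (d3 * rbar, rbar, d4 * rbar)
    return xbar_chart, r_chart


# Hypothetical inspection times (hours per page), grouped four at a time.
subgroups = [
    [0.30, 0.45, 0.25, 0.60],
    [0.35, 0.20, 0.55, 0.40],
    [0.50, 0.30, 0.28, 0.42],
]
xbar_chart, r_chart = xbar_r_limits(subgroups)
print("X-bar chart (LCL, CL, UCL):", xbar_chart)
print("R chart (LCL, CL, UCL):", r_chart)
```

Three subgroups are far too few for real limits; the data only show the arithmetic.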
In either case, if you subgroup correctly, you will achieve approximate normality and will be able to rely on your control limits as being accurate and truly indicative of your process. All of these percentiles and such are just muddying your process, because probably the only people who understand them are you and a few engineers who wander around your facility.
Using a variables chart as above would allow better buy-in from the local work force, who could really benefit from monitoring these charts.

July 12, 2002 at 1:05 pm #77198
Shree Phadnis (Member, @ShreePhadnis)
Dear Sridhar,
I tend to agree with Bob’s comments. However, if you cannot have rational subgroups, then consider using an EWMA chart. This chart is robust to the normality assumption.
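For readers unfamiliar with the EWMA chart, the following Python sketch shows the usual recursion and time-varying limits for individual observations. The smoothing weight `lam=0.2`, the limit width of 3, the target, the sigma and the data are all illustrative assumptions, not values from this thread.

```python
import math


def ewma_chart(values, target, sigma, lam=0.2, width=3.0):
    """Return one (ewma, lcl, ucl, out_of_control) tuple per observation."""
    rows = []
    z = target  # the EWMA statistic starts at the in-control mean
    for i, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Variance factor of the EWMA after i observations.
        factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        half_width = width * sigma * math.sqrt(factor)
        lcl, ucl = target - half_width, target + half_width
        rows.append((z, lcl, ucl, not (lcl <= z <= ucl)))
    return rows


# Hypothetical effort data (hours per page): stable at first, then shifted up.
values = [0.40, 0.40, 0.40, 0.70, 0.70, 0.70, 0.70]
rows = ewma_chart(values, target=0.40, sigma=0.15)
for i, (z, lcl, ucl, flagged) in enumerate(rows, start=1):
    print(f"point {i}: EWMA={z:.3f}  limits=({lcl:.3f}, {ucl:.3f})  out={flagged}")
```

Because each plotted value is a weighted average of past observations, a small `lam` smooths out individual skewed readings, which is the usual argument for the chart's tolerance of non-normal data.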
Shree Phadnis

July 12, 2002 at 1:13 pm #77199
You asked about the advantages and disadvantages of doing a transformation. From what I understand, you are saying that there is no rational subgroup. How often are you taking your samples? If you are taking them once a day or once a shift, then you can rationally subgroup them into days or weeks. Also, have you checked for autocorrelation? If you are taking them at specific times during the day, you can check for autocorrelation and possibly find a rational subgroup this way.
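The autocorrelation check suggested above can be sketched in a few lines of Python. The function name and the sample series are hypothetical; the comparison band of about ±2/sqrt(n) is a common rule of thumb, not an exact test.

```python
import math
from statistics import mean


def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    m = mean(values)
    denom = sum((x - m) ** 2 for x in values)
    num = sum((values[i] - m) * (values[i + lag] - m)
              for i in range(len(values) - lag))
    return num / denom


# Hypothetical series in time order (hours per page).
series = [0.31, 0.35, 0.42, 0.47, 0.52, 0.49, 0.44, 0.38, 0.33, 0.30]
r1 = autocorrelation(series, lag=1)
band = 2.0 / math.sqrt(len(series))  # rough significance band
print(f"lag-1 autocorrelation = {r1:.2f}, rough band = +/-{band:.2f}")
```

A lag-1 value well outside the band suggests that consecutive points are related, which affects both how subgroups should be formed and how far the usual limits can be trusted.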
The problem with using a transformation such as a Box-Cox transformation is that the data is now normal, but does the control chart make sense to the general work force? It probably won’t. If you chart time used for inspection or number of defects found, this will make sense to the local work force, but when you transform it into log data or exponential data, all they are going to see is numbers that they can’t relate to their process.
Hope this helps.
Jamie

July 12, 2002 at 1:33 pm #77200
Hi Bob,
Thanks for the reply. I fully agree with what you mentioned, but let me explain the problem once again. I am working in a software company where we want to introduce control charts for the document inspection process. For each project I get only one document, and it is inspected once. We collected the data for the last year for all the inspected documents and found the distribution: for one particular characteristic the data follows a lognormal distribution. Now, how do I calculate the control limits? Do I need to transform the data to normal and then calculate the control limits? I won’t have any subgroups, because all the software projects differ in one sense or another. What are my options, and which one is better?
1. Transform to normal and then calculate the control limits.
2. Since I know the data follows a lognormal distribution, fit that distribution and calculate the control limits as percentiles based on the fitted and actual values.
I am very sorry to repeat myself, but it is very important for me to know which one to go for.
thanks
sridhar

July 12, 2002 at 1:45 pm #77201
Mike Carnell (Participant, @MikeCarnell)
Sridhar,
If you have a control chart on effort spent per page you have to have a measurement system for effort. That would be the first problem I would see with where you are going.
This may not be the answer you are looking for, but in another one of your posts you said that the number of defects is a function of time spent inspecting (FYI: 100% inspection isn’t inspection, it is sorting). You could do that study between time and number of defects and use Realistic Tolerancing to set up your control limits. The assumption would be that anyone who finds fewer defects isn’t spending enough time and anyone who finds more is spending too much time. You would plot the number of defects. You really wouldn’t be able to classify it as a c chart, since you wouldn’t be using the correct calculation for c or the control limits, but you would be controlling based on the number of defects.
Just a thought.

July 12, 2002 at 2:22 pm #77206
Gabriel (Participant, @Gabriel)
I hope this is the answer that nobody provided:
The control limits are set at ±3 standard deviations (of the averages in an Xbar chart, or of the individuals in an Xi chart). In a normal distribution, this covers 99.7% of the possible points of the stable process, so if you find a point outside the limits it is more logical to think that there was a special cause than that it was just one of the points of the stable process with a 0.3% chance. In this way, the limits sit at roughly the 0.15% and 99.85% percentiles. But you took 15% and 85%. Note that you are leaving 30% of the stable process outside the limits. This means that about 1 out of 3 points will be taken as out-of-control when, in fact, there is no special cause of variation. The idea of a control chart is that an out-of-control signal will very rarely happen by chance. Setting the limits at 15% and 85% makes the chart unusable (as a means of process control). I would set the limits for the non-normal distribution at the same percentiles used for the normal.
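The arithmetic behind this argument can be checked with Python's standard library. This is only a sketch: `false_alarm_rate` is a hypothetical helper, and it assumes the percentiles are those of the true in-control distribution. The exact normal tail beyond 3 standard deviations is about 0.135% per side, which the post rounds to 0.15%.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Fraction of a stable normal process inside +/- 3 standard deviations.
inside_3sigma = std_normal.cdf(3) - std_normal.cdf(-3)
print(f"inside +/-3 sigma: {inside_3sigma:.2%}")              # about 99.73%
print(f"lower-tail percentile: {std_normal.cdf(-3):.3%}")     # about 0.135%


def false_alarm_rate(lower_pct, upper_pct):
    """Chance that a point from a stable process falls outside percentile limits."""
    return lower_pct / 100.0 + (1.0 - upper_pct / 100.0)


for lo, hi in [(0.135, 99.865), (15, 85)]:
    rate = false_alarm_rate(lo, hi)
    print(f"limits at {lo}% / {hi}%: false alarm rate {rate:.2%}, "
          f"about one point in {1 / rate:.0f}")
```

With 3-sigma-equivalent percentiles a stable process gives a false signal roughly once in 370 points; with 15% and 85% it does so roughly once in 3.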
I think that, doing this, it is statistically the same to set the limits using percentiles on the real distribution or to transform it to normal and use the usual limit formulas. So take the method that better fits your needs. As someone posted, transforming will work, but then you have to translate back to the process language when you want to analyze the chart. Note that I have never used either approach (transforming or using percentiles), so try to confirm what I am saying. But I see no reason not to use percentiles. I like it more than the transformation.
Hope this helps.

July 12, 2002 at 2:54 pm #77209
Hi Gabriel,
Thanks for the reply; this is exactly the kind of reply I was looking for. But regarding what you said about control limits: my data follows a lognormal distribution, which is very skewed to the right. Why do we have to go to the 99.8th percentile? I think we are leaving very little scope for out-of-control points. What do you feel?

July 12, 2002 at 3:27 pm #77212
Robert Butler (Participant, @rbutler)
As I read the posts and the replies I’m left with the impression that there are a number of issues here; however, the main thrust of your question seems to focus on the issue of control limits for a lognormal distribution. The way to deal with this is as follows.
1. Log your data.
2. Compute the average (Mln) and standard deviation (Sln) of the logged terms,
i.e. Mln = Sum(log x) / N, Sln = sqrt[ Sum((log x - Mln)^2) / (N - 1) ]
3. For control limits in terms of the original, unlogged terms compute the following:
lower limit = exp(Mln - t(1 - alpha/2) * Sln)
upper limit = exp(Mln + t(1 - alpha/2) * Sln)
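The three steps above can be sketched in Python as follows. Two things here are assumptions rather than part of the post: the standard library has no t quantile, so the normal quantile is used in its place (close for moderately large N), and the center line at exp(Mln), the geometric mean, is an addition. The data and the name `lognormal_limits` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_limits(values, alpha=0.0027):
    """Return (lower, center, upper) limits in the original, unlogged units."""
    logs = [math.log(x) for x in values]          # step 1: log the data
    m_ln = mean(logs)                             # step 2: average of the logs
    s_ln = stdev(logs)                            # sample SD, N - 1 in the denominator
    # Assumption: normal quantile substituted for the t quantile in the post.
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Step 3: back-transform so the limits are in the original units.
    return math.exp(m_ln - q * s_ln), math.exp(m_ln), math.exp(m_ln + q * s_ln)


# Hypothetical effort data (hours per page).
effort = [0.12, 0.18, 0.22, 0.25, 0.31, 0.35, 0.40, 0.48, 0.61, 0.95]
lcl, cl, ucl = lognormal_limits(effort)
print(f"LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f} hr/page")
```

Passing `alpha=0.05` gives 95% limits as in the post; the default `alpha=0.0027` corresponds to the usual 3-sigma coverage.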
These limits will not be symmetric in the unlogged units, but they will include whatever percent of the population you wish to call acceptable, i.e. alpha = .05 will give 95% limits, etc.

July 12, 2002 at 4:02 pm #77214
I’ve seen a number of posts dealing with the issue of trying to set statistical control limits on a process that contains non-normal data. Some I’m comfortable with, some I’d have to really think about before using. However, I guess I’m not the smartest guy in the world, because I keep coming back to the question: “Why would I ever try to do this in the first place?”
If I understand correctly, you are inspecting documents, and the more time you spend inspecting the more defects you find; the less time, the fewer errors. Now you want to set statistical control limits on the time spent inspecting. Why? So you can control the number of defects you FIND? Sounds more like an SOP problem to me. Just give them a clock, tell them how much time they get to inspect and be done with it. I’d be much more concerned with identifying the defects (by type, frequency, severity, etc., in whatever time you give them to inspect), finding the causes of the defects and eliminating those causes.

July 12, 2002 at 5:54 pm #77218
Fabriel (Participant, @Fabriel)
Percentiles 0.15 and 99.85 leave 0.15% of the points above the UCL, 99.7% within the limits and 0.15% below the LCL, regardless of normality, skewness, kurtosis or shape in any way. If you find a point above the UCL (right side), the probability that a point from the assumed distribution lands that far out just by chance is only 0.15%, so it is far more plausible that the point belongs to another distribution (which would mean instability). The same is valid for the left side of the distribution. And all this is valid regardless of the shape of the distribution, because we are using percentiles. That’s why, if the distribution is not normal, you can still use percentiles while you cannot use ±3 sigma (for individuals charts, I mean, where the central limit theorem does not help).
We may agree that a special cause can be “masked” and not appear as an out-of-control point. SPC is like (just “like”) a hypothesis test where the null hypothesis is that the distribution hasn’t changed (from the one assumed when the control limits were defined). This “null hypothesis” is rejected when you find an out-of-control signal. The chance of wrongly finding an out-of-control point when in fact the process was in control (alpha risk, a type I error) is very low, so an out-of-control point is very likely to be a real one. However, the chance of getting an in-control point when in fact there is variation due to a special cause (beta risk, a type II error) is not so low. The only way to keep a low risk of a false out-of-control and, at the same time, reduce the risk of a false in-control is to INCREASE THE SAMPLE SIZE (which is 1 in your case). This is another advantage of the Xbar chart (other than not having to care about normality because of the central limit theorem). That’s why it is said that Xbar charts are more sensitive in detecting out-of-control situations (in terms of hypothesis testing, we would say that Xbar has more power than Xi, and that the power of the Xbar chart increases with subgroup size).
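The effect of subgroup size on power can be illustrated with a short Python sketch. It assumes normally distributed data and 3-sigma limits purely to keep the arithmetic simple (Sridhar's data are not normal), and `detection_probability` is a hypothetical helper name.

```python
from statistics import NormalDist


def detection_probability(shift_in_sigmas, n, width=3.0):
    """Chance that one plotted point falls outside +/-width sigma limits
    after the process mean shifts, for subgroup averages of size n."""
    z = NormalDist()
    # The shift expressed in standard errors of the subgroup average.
    d = shift_in_sigmas * n ** 0.5
    return (1.0 - z.cdf(width - d)) + z.cdf(-width - d)


for n in (1, 4, 9):
    alpha = detection_probability(0.0, n)   # false alarm rate, no shift
    power = detection_probability(1.0, n)   # power against a 1-sigma shift
    print(f"n={n}: alpha={alpha:.4f}, power for 1-sigma shift={power:.3f}")
```

The false alarm rate stays the same for every subgroup size, while the chance of catching a given shift on a single point rises as the subgroup grows.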
What do I feel? Using other percentiles closer to the center (such as 15% and 85%) will improve the power to detect an out-of-control situation, but will also dramatically increase the “false alarms”. An out-of-control situation would always be charted as out-of-control points, but many out-of-control points will not correspond to an out-of-control situation. So you will quickly lose confidence in the out-of-control signals, because you will not be sure whether there is an out-of-control situation behind them, and then the chart becomes useless.
Your Xi chart for a non-normal distribution using those percentiles (0.15% and 99.85%) will have the same performance as an Xi chart for a normal distribution (which is not a great performance anyway).
Now tell me what do you think.
PS: One thing I forgot in my first reply: I agree with you about using the 50% percentile (it is called the “median”) as the central control limit. The average in a skewed distribution leaves more than 50% of the points on one side, so using the average would increase the chance of finding a false “7 in a row on one side of the center line” type of out-of-control signal when the distribution hasn’t really changed.

July 12, 2002 at 5:57 pm #77219
Just out of curiosity, have you tried changing the sample size and checking whether the data is normal or not?
One more thing: if you have taken historical data, have you checked that the data is plotted as a time series? I mean, is it possible that you have randomised the data? Or can you randomise the data and check again for normality with a changed sample size?
I think if you do so you will find the data normal.
This may not be the correct way; it is just an observation.

July 12, 2002 at 5:57 pm #77220
Gabriel (Participant, @Gabriel)
The previous message was also mine (Gabriel, not Fabriel). It seems that I can’t even type my own name without a misspelling.

July 13, 2002 at 4:53 am #77230
Hi Ted,
The control chart on preparation time doesn’t say that you have to inspect within the bands. Yes, you are right that we have to keep control of the defects. One of the reasons for finding fewer or more defects is the time spent on inspection. There may be many other reasons as well: the people who inspected don’t have proper experience, the number of people who inspected, the person who wrote the document is very new, the number of pages, the complexity of the document, etc. So what we are trying to say is that if anything goes out of control, we just look for the root cause and take corrective action. The root cause may be any of the reasons mentioned above.
thanks

July 13, 2002 at 5:05 am #77231
Hi Gabriel,
Once again, thanks for the information. But I need one more clarification. Don’t you think that we also need to look at the actual values in this kind of distribution? Let me explain why. As my distribution is very skewed towards the right end, I am getting the following values:
For the 85th percentile I am getting 0.6 hr per page.
For the 90th percentile, as the frequency gets very low towards the right, I am getting 0.7 hr per page.
Similarly, going further, the 95th percentile is 1.4 hr per page, the 98th percentile is 2 hr per page, and the 99th is 3 hr per page.
Do these actual hr-per-page values need to be considered when setting the control limits, especially in the non-normal case?
thanks
sridhar

July 15, 2002 at 12:01 pm #77242
Gabriel (Participant, @Gabriel)
Sridhar:
Please, before I answer this, send me the following information:
Are the percentiles you are showing from the real data you collected over a year, or from the distribution that you found best fitted your process (lognormal or whatever)?
What are the percentiles (85, 90, 95, 98 and 99) in each case? I have one set of them, but I don’t know which one (the one-year data or the assumed distribution).
How many individuals does your one-year sample have?
Does this sample look coherent? I mean, does it look like one group with one center and one shape, and without outliers? And does it look coherent over time? If you plot all points from the oldest to the latest, do they look free of non-random patterns?
I have some ideas about what may be happening; please send me this information so I can check.

July 15, 2002 at 12:10 pm #77243
Sridhar,
I agree with Ted. I think you are spending too much time on finding the control limits. Control charts are more of an analysis after the event. I think once you have the causes you should look more into preventive measures than into trying to control the defects after the event.

July 15, 2002 at 12:59 pm #77245
Hi Gabriel,
Thanks for the reply. The values I posted last time are the values of the fitted distribution. Also, before fitting the distribution I removed all the outliers using box plots. I used 72 data points. I also checked for non-random patterns; since these values come from different projects, you obviously won’t find any pattern. I hope I have answered all the points you asked about.
thanks
sridhar

July 15, 2002 at 2:19 pm #77250
Gabriel (Participant, @Gabriel)
Sridhar:
Only one point left: the equivalent percentiles for the 72 data points you used. With 72 data points the percentages will not be exact, so let’s say, based on your data only (not assuming any distribution):
With how many hours per page do you cover 58 of the 72 points (approx. 80%), how many hours to cover 61 out of the 72 (approx. 85%), and so on for 65, 68, and 71 (approx. 90, 95 and 99%)? I want to compare how well the assumed distribution matches the sample data in the right tail zone.

July 15, 2002 at 9:56 pm #77259
Mike Carnell (Participant, @MikeCarnell)
Ted,
If you read the original post, the company has decided to implement control charts. You are correct that this type of decision makes little or no sense. Since the number of defects found is a function of time, the corrective action for an out-of-control point is to adjust the inspector’s sorting time. That was why I suggested correlating the number of defects and time. It will speed up the inevitable: the people figuring out that they need to find x number of defects in y amount of time, and then everyone will leave them alone.

July 16, 2002 at 3:16 am #77266
Hi Gabriel,
How many data points are needed in this case?
Percentile        85     90     95     98     99
Actual (hr/pg)    0.61   0.67   0.90   1.10   1.15
Fitted (hr/pg)    0.6    0.70   1.40   2.00   3.00
The above table gives the actual and fitted (lognormal) values. Looking at the table, we decided on the 85th percentile as our UCL. What do you feel the UCL should be?
sridhar

July 16, 2002 at 1:56 pm #77275
Gabriel (Participant, @Gabriel)
Sridhar: Take a seat. It is a bit long.
First of all, I’d like to point out that I am trying to lend you a hand with the technical part, but I agree with others who posted about the usefulness of charting what you are charting. I also agree that SPC is a tool, not a “policy”. But I don’t know your business, so I assume (hope) that you have already taken all that into account.
Now, going back to the numbers, I have several feelings.
72 samples seems too few to me to learn a lot about a distribution, especially in a case like this.
You said you removed several outliers, meaning that the process was not stable in the period of the study. This affects the confidence in the remaining points. You know, when the process is out of control, some (many) points will still be displayed as in-control just by chance (we have already discussed this fact), and you do not remove those points (you can’t, because they look in-control) even when they really do not belong to what would be the in-control distribution. That’s why, in a typical capability study, it is usually accepted to remove only a few out-of-control points, and only if you identified the special cause and removed it to prevent recurrence.
You also said that the distribution does not look normal. A good distribution fit is hard enough with a reasonable sample, and even harder with small samples. That is especially true when you try to fit the tails of the distribution. It is very hard to fit (or to check the fit of) any distribution to any set of data in the tails (several sigmas away from the average), just because you usually have too few samples in that zone (especially if the whole sample is not large) to compare, for example, the shape of a histogram with the shape of the mathematical distribution. In fact, in your case, the lognormal clearly does not fit the actual data above the 90% percentile. You will have to forget the distribution fitting and work with the distribution of the data in the small sample you have as if it were the real process distribution (unless you find another mathematical model that fits better but, because of what I said, I would not insist on that approach).
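The tail mismatch described above can be tabulated directly from the figures Sridhar posted. This Python sketch uses only those numbers; the ratio column is just one convenient way to show the gap.

```python
# Values posted in the thread, in hours per page.
percentiles = [85, 90, 95, 98, 99]
actual = [0.61, 0.67, 0.90, 1.10, 1.15]   # from the 72 observed points
fitted = [0.60, 0.70, 1.40, 2.00, 3.00]   # from the fitted lognormal

ratios = [f / a for a, f in zip(actual, fitted)]

for p, a, f, r in zip(percentiles, actual, fitted, ratios):
    print(f"{p}th percentile: actual {a:.2f}, fitted {f:.2f}, fitted/actual {r:.2f}")
```

Up to the 90th percentile the fitted and actual values agree within a few percent; beyond it the fitted lognormal runs well ahead of the data, reaching more than 2.5 times the observed value at the 99th percentile.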
All this makes the statistical validity of your control chart questionable. With all this in sight, I see two possible scenarios.
1) You want to follow up the process to verify that it is behaving as it was in the one-year study, and to identify and correct the special causes when it is not (when it is out of control): that is SPC. The distribution you chose does not fit the data in the control limits zone. Forget about distribution fitting, percentiles and all this stuff. If you have 72 points that you consider to be normal for your process (not normal as in the normal distribution, but belonging to the process distribution when the process is free of special causes of variation), none of these points should be outside the limits, just because they are not out of control. So why don’t you try to put the limits just below the lowest point (LCL), just above the highest point (UCL) and on the median (CCL)? In no way would I set the UCL at 0.6 hr/pg, because 15% of the points will be above the UCL just by chance, so it is not a good indication that there is a special cause present that has to be corrected. I don’t know what your highest point was, but it has to be about 1.15 hr/pg (in 72 points no point can be higher than the 99% percentile: the 72nd point is the 100% and the 71st is the 98.6%, so the value that includes the 99% will also include the 100%).
2) You don’t care how the process was working in the one-year study. You want the process to behave differently (probably better) this time, and you want the chart to verify that it behaves as you want now: there are processes (like a turning operation) where the average is easily adjustable. In those cases, you usually make the study and then define the width of the control limits. But when you draw the control chart, you arbitrarily set the CCL on the target, regardless of what the process average was in the study. By doing this, you are introducing a special cause of variation (putting the process intentionally out of control) one time by adjusting the process average. The new process distribution will be different (shifted) from the original, but you have now improved the process (you have the average where you want it) and can follow up using the control chart to verify that the “new” process behaves as expected (fitting the shifted distribution). Making an analogy with your process, maybe in your case what can be adjusted within certain limits is not the average but the spread (maybe the average too). In this case, take the actions you have to take to put the process where you want it. Then you may start by setting the limits arbitrarily where you think that more than 99% of the points should fall with the “new” process running free of special causes (regardless of any data, percentile or distribution fitted to the “previous” process). After getting enough data (maybe in one year) you can make a statistical analysis again.
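Scenario 1 and the rank arithmetic behind it can be sketched in Python. The sample values, the `margin` parameter and the name `empirical_limits` are hypothetical; the rank-over-n percentile convention is the one used in the post.

```python
from statistics import median


def empirical_limits(values, margin=0.0):
    """LCL just below the smallest point, CL at the median,
    UCL just above the largest point of a sample judged to be in control."""
    return min(values) - margin, median(values), max(values) + margin


# With 72 points, the two largest ranks sit at these empirical percentiles.
n = 72
for rank in (71, 72):
    print(f"point {rank} of {n} covers {rank / n:.1%} of the sample")

# Hypothetical in-control sample (hours per page).
sample = [0.12, 0.18, 0.22, 0.25, 0.31, 0.35, 0.40, 0.48, 0.61, 1.15]
lcl, cl, ucl = empirical_limits(sample, margin=0.01)
print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f} hr/page")
```

Since the 71st of 72 points covers only 98.6% of the sample, any value that covers 99% must be the largest point itself, which is why the 99th-percentile figure and the maximum coincide.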
That’s my feeling. What do you think?

July 22, 2002 at 9:00 am #77428
R. Ramirez (Participant, @R.Ramirez)
Donald Wheeler’s book UNDERSTANDING STATISTICAL PROCESS CONTROL, section 4.3, WHAT IF THE DATA ARE NOT NORMALLY DISTRIBUTED?: “...This means that even wide departures from normality will have virtually no effect upon the way the control charts function in identifying uncontrolled variation...”
I also suggest section 4.4, Myths about Shewhart’s charts, because “the data do not have to be normally distributed before one can place them on a control chart...” This fact is demonstrated by the EMPIRICAL RULE, page 61 of the book above.
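One way to get a feel for the empirical rule cited above is to compute how much of a skewed distribution lies within its mean plus or minus three standard deviations. The Python sketch below does this for a lognormal and an exponential distribution using their true moments, which is a simplification: limits on a real chart are estimated from the data, typically from moving ranges. The function names and the chosen shape parameter are assumptions.

```python
import math
from statistics import NormalDist


def lognormal_coverage(sigma, k=3.0):
    """Fraction of a lognormal(mu=0, sigma) population within mean +/- k SD."""
    mean = math.exp(sigma ** 2 / 2.0)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    z = NormalDist()
    upper = z.cdf(math.log(mean + k * sd) / sigma)
    low = mean - k * sd
    lower = z.cdf(math.log(low) / sigma) if low > 0 else 0.0  # nothing below zero
    return upper - lower


def exponential_coverage(k=3.0):
    """Fraction of an exponential(rate=1) population within mean +/- k SD."""
    low = max(0.0, 1.0 - k)   # mean = SD = 1, and the variable cannot be negative
    return math.exp(-low) - math.exp(-(1.0 + k))


print(f"lognormal (sigma=1) within 3 SD: {lognormal_coverage(1.0):.1%}")
print(f"exponential within 3 SD:         {exponential_coverage():.1%}")
```

Both strongly skewed examples still keep roughly 98% of their points inside the 3-sigma band, somewhat below the 99.7% of the normal case but far from the 70% left inside 15%/85% percentile limits.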