## John Noguera

@John-Noguera

Member since April 15, 2002


## Forum Replies Created

August 27, 2009 at 12:24 pm #185032

I agree with your recommendations and the possibility of a bimodal distribution, but caution that the gaps may also be due to sampling error given the small sample size of 35.

August 26, 2009 at 8:59 pm #185014

Angie,

Thanks for sharing the data. The problem you are having fitting your data is probably due to its “chunkiness,” likely caused by limited measurement discrimination. Andrew Sleeper’s book “Six Sigma Distributions” has a good discussion of this topic.

SigmaXL found that the best fit was a three-parameter loglogistic, but it unfortunately had an AD p-value of .001, indicating a poor fit.

November 6, 2008 at 2:57 pm #177450

Rene,

One other thing: when you say that you found some transformations that visually look to fit, I assume that the AD p-values were still <.05. Do you see "chunky" data, i.e., vertical lines of repeated values? This takes us back to measurement discrimination. If this is the case, I would go with the transformation that gives you the best fit, recognizing that the capability estimate will be approximate, but still better than using the untransformed raw data.

November 6, 2008 at 2:43 pm #177445

Rene,

Darth’s comment is correct. In fact the reason for the non-normality and inability to transform could simply be that you have a truncated normal distribution due to the inspection.

If this is not the case, then check for the usual suspects: outliers (with special causes), bimodal (a hidden X factor), and limited measurement discrimination (as discussed).

If none of the above apply, then you should consider all of:

- Box-Cox
- Johnson
- Families of distributions
- Clements (Pearson) and Burr
- Response Modeling Methodology (by Haim Shore – not to be confused with Response Surface Methodology)

August 9, 2007 at 2:30 pm #159730

Robert’s recommendation is good. Here are some additional things to consider:

You might be able to use Box-Cox if you add 1 to your data and spec limits.

Johnson is a system of non-linear transformations (logarithmic and inverse hyperbolic sine). See Minitab’s Help on Methods and Formulas.

See also:

Y. Chou, A.M. Polansky, and R.L. Mason (1998). “Transforming Non-Normal Data to Normality in Statistical Process Control,” Journal of Quality Technology, 30, April, pp. 133-141.

Nicholas R. Farnum (1996). “Using Johnson Curves to Describe Non-Normal Process Data,” Quality Engineering, Vol. 9, No. 2, December, 329-336.

Box-Cox and Johnson transformations can be applied to t-tests if the data are non-normal and the sample size is small. The central limit theorem will work if the sample size is greater than 80 for extreme skewness (skew approx. = 2), 50 for moderate skewness (skew approx. = 1), 30 for extreme kurtosis (kurt approx. = 2), and 15 for moderate kurtosis (kurt approx. = 1).

See J.F. Ratcliffe, “The Effect on the t Distribution of Non-Normality in the Sampled Population,” Applied Statistics, Vol. 17, No. 1 (1968), pp. 42-48.

I just happen to be doing research on this topic and will soon publish a paper with an equation that gives a precise:

Nmin for normality = f(Skew, Kurt).
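The rule-of-thumb sample sizes above reduce to a small lookup. A sketch in Python; the exact cutoffs between “moderate” and “extreme” are my own reading of the guideline:

```python
def clt_min_n(skew, kurt):
    """Rule-of-thumb minimum sample size for a t-test to be robust,
    per the thresholds quoted above. `kurt` is excess kurtosis; the
    interpolation between categories is an assumption, not exact."""
    n = 1  # near-normal data: the guideline implies no special minimum
    if abs(skew) >= 2:
        n = max(n, 80)   # extreme skewness
    elif abs(skew) >= 1:
        n = max(n, 50)   # moderate skewness
    if abs(kurt) >= 2:
        n = max(n, 30)   # extreme kurtosis
    elif abs(kurt) >= 1:
        n = max(n, 15)   # moderate kurtosis
    return n
```

For example, `clt_min_n(2, 0)` returns 80 and `clt_min_n(0, 1)` returns 15, matching the guideline.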

September 15, 2006 at 12:14 pm #143318

Holly,

You have to be careful in comparing results from different analyses. One-factor ANOVA assumes that your factor is fixed, and the null hypothesis is equality of means. On the other hand, variance components analysis assumes that your factor(s) are random and the null hypothesis is equality of variances.

Having said that, using your variance component study to determine how to apply control charts, you now know that 70% of your variation is “unexplained”. So are there other factors that you can consider such as temporal, location, operator, equipment, etc? If you are able to reanalyze or redo your SOV study you can include these to determine the largest component and Pareto the variance components. Hans mentioned specialized control charts for variance components, which could then be applied.

If you are just getting started with SPC, I would keep it simple and use classical X-bar & S charts, looking for assignable causes. As you mature in the use of the tool you can look at more advanced techniques such as variance components control charts.

September 2, 2006 at 12:56 pm #142691

Hey Eric,

Love that wry sense of humor!

John

May 5, 2006 at 1:34 pm #137330

Hi Darth,

Thanks for your post. Good comments regarding the scale. Your reference paper does, in the conclusion, allow that the t-test may be somewhat robust.

Ordinal Logistic Regression can also be used for comparisons if you dummy code the predictors.

The challenge is: what does one teach at the Green Belt level?

May 5, 2006 at 12:31 am #137310

As an analogy, the electric vacuum would be the common tool used by a general practitioner. The advanced tools would be akin to those used by a professional carpet cleaner.

See http://www.upa.pdx.edu/IOA/newsom/da1/ho_levels.doc

“Common Practice

Although Likert-type scales are technically ordinal scales, most researchers treat them as continuous variables and use normal theory statistics with them. When there are 5 or more categories there is relatively little harm in doing this (Johnson & Creech, 1983; Zumbo & Zimmerman, 1993). Most researchers probably also use these statistics when there are 4 ordinal categories, although this may be problematic at times. Note that this distinction applies to the dependent variable used in the analysis, not necessarily the response categories used in a survey whenever multiple items are combined (e.g., by computing the mean or sum). Once two or more Likert or ordinal items are combined, the number of possible values for the composite variable begin to increase beyond 5 categories. Thus, it is quite common practice to treat these composite scores as continuous variables.”

May 4, 2006 at 8:45 pm #137292

Likert data is ordinal. The “technically correct” analysis with this type of data is Ordinal Logistic Regression and/or Kendall’s Coefficient of Concordance; these would be the tools of choice for a statistician.

Treating ordinal data as continuous is, however, a reasonable simplifying approximation. Therefore the use of t-tests and ANOVA is not incorrect.

November 30, 2005 at 2:31 am #130483

Let’s try that again:

It depends – what is driving your non-normality: skewness, bimodality, outliers?

If the data is symmetric or moderately skewed and n for each sample is > 15, the central limit theorem will give you approximate validity.

You can also try a box-cox transformation. If you find a suitable transformation, the same transformation must be applied to both samples.

The Mann-Whitney test does assume approximately equal variance but is robust against outliers.

The most robust test that is readily available would be Mood’s Median which will work for two samples.
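For readers working in Python, scipy implements the rank-based and median tests above, as well as a two-sample Kolmogorov-Smirnov test. A sketch with simulated skewed data (the samples are hypothetical, for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two right-skewed (lognormal) samples, simulated for illustration
a = rng.lognormal(0.0, 0.8, 40)
b = rng.lognormal(0.1, 0.8, 40)

mw = stats.mannwhitneyu(a, b)    # rank-based; assumes roughly equal spread
mood = stats.median_test(a, b)   # Mood's median test: robust against outliers
ks = stats.ks_2samp(a, b)        # two-sample Kolmogorov-Smirnov

print(f"Mann-Whitney p = {mw.pvalue:.3f}")
print(f"Mood's median p = {mood.pvalue:.3f}")
print(f"KS two-sample p = {ks.pvalue:.3f}")
```

Note that the KS test is sensitive to any difference between the two distributions (location, spread, or shape), not just a median shift.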

A little-known test that can also be used here, but is not readily available in Six Sigma statistical software, is the Kolmogorov-Smirnov two-sample test.

November 30, 2005 at 2:29 am #130482

It depends – what is driving your non-normality: skewness, bimodality, outliers?

If the data is symmetric or moderately skewed and n for each sample is > 15, the central limit theorem will give you approximate validity.

You can also try a box-cox transformation. If you find a suitable transformation, the same transformation must be applied to both samples.

The Mann-Whitney test does assume approximately equal variance but is robust against outliers.

The most robust test that is readily available would be Mood’s Median which will work for two samples.

A little-known test that can also be used here, but is not readily available in Six Sigma statistical software, is the Kolmogorov-Smirnov two-sample test.

November 29, 2005 at 2:47 pm #130444

Retract – use the two-sample t-test with unequal variance.

November 29, 2005 at 2:45 pm #130443

Use Welch’s ANOVA (assuming your sample size is large enough for the central limit theorem to work). JMP and SigmaXL include this tool.

August 4, 2005 at 8:35 pm #124317

Geckho,

It is an interesting discussion. I do appreciate your answer, but I believe that if you look at the work done by people who have dedicated their careers to best practices in SPC (e.g. Donald Wheeler) you will find a consistent theme: Do not recalculate established limits unless you have deliberately improved/changed the process.

I would challenge you to find any published work (other than Minitab’s help) where auto-recalculation is considered acceptable.

August 4, 2005 at 7:20 pm #124312

Geckho,

Sorry I did not properly answer your 3rd question. It was a moving window but I am not sure of the window size.

August 4, 2005 at 7:13 pm #124311

Geckho,

No problem. See answers below.

Was it an automated SPC system? Yes.

Was there an actual person looking at the data, or was it all recorded, analyzed, and controlled by the equipment? There was an owner of the data, but this person incorrectly assumed that they would be alerted whenever there was a problem.

Were they recalculating using all of the points, or did the calculations only look at the last x number of subgroups? The control limits were automatically recalculated with every new data point.

Did the database constantly refresh, preventing them from looking at the “whole picture”? I’m not sure what you mean by database refresh. The problem was automatic recalculation of control limits led to a false picture that the process was stable.

Which tests were they using to detect out-of-control conditions? Only Test 1. An interesting theoretical question would be whether or not the drift would have been detected had Tests 2-8 or EWMA been used. But again the root cause was the auto-recalculation of limits.

August 4, 2005 at 6:58 pm #124310

Dave,

Thanks for your reply. The issue I have here is the default of automatic recalculation of limits when the “Update Graph Automatically” is turned on. You should set the default to continue using existing limits (and provide an optional auto-recalculate).

This is a matter of poka-yoke, not training. For example, you do this very well in Box-Cox, where you do not even display columns containing values <= 0.

August 4, 2005 at 6:11 pm #124299

Darth is correct. Automatic recalculation of control limits is wrong. I have personally seen a semiconductor supply company use automatic recalculation to their detriment. There was a slow long-term drift over a one-year period. They totally missed it because the SPC alarms failed to trigger due to the automatic recalculation of limits.

I do not like to speak ill of Minitab, but this is one area where they promote bad practice. (Sorry Keith et al.) This is an opportunity for improvement (14.3?).

June 20, 2005 at 6:15 pm #121824

In this case there is a better alternative to Kruskal-Wallis or Mood’s Median: Welch’s ANOVA, currently only available in JMP, but soon to be included in another tool.

June 16, 2005 at 10:06 am #121527

Hi Lynette,

Please send me an e-mail: jnoguera at jgna dot com.

June 15, 2005 at 10:14 am #121437

I am fine. Keeping very busy with MU internal/external and my usual international. Marc B is now with us.

Is Gabrielle a pseudonym for Roberta?

June 15, 2005 at 2:07 am #121426

Sorry, how could I forget – is this Roberta?

June 15, 2005 at 2:05 am #121425

Dawn? Lynette? Julie?

How are you! By the way, it was 2001 – minor correction :-)

May 27, 2005 at 4:12 pm #120356

Neil,

Sorry I did not mean to imply that the Pearson Standardized residuals would satisfy a KS or AD test for normality. The usefulness of these residuals is to examine potential outliers. Leverage is also important to consider.

A good reference is “Regression Models for Categorical and Limited Dependent Variables” by J. Scott Long.

May 26, 2005 at 11:08 pm #120296

Neil,

Try looking at the Pearson standardized residuals for the logistic model. Are they “roughly” normal? Are the goodness-of-fit p-values all > .05? Some software packages also provide McFadden’s pseudo R-squared.

Can the proportion of the total area be considered a rate? If so you may want to consider Poisson regression.

May 14, 2005 at 2:12 pm #119528

Markov chains are used to take into account conditional probabilities. Practical applications in the Six Sigma world include reliability modelling and working out the average run length characteristics of EWMA, CUSUM, or Shewhart control charts with the Western Electric and Nelson rules turned on.

The math is not simple, so in typical practice Monte Carlo simulations are used rather than Markov chains.

April 24, 2005 at 1:14 pm #118330

Blur,

G-square (the likelihood-ratio test statistic) can be approximated by a chi-square distribution if n/df > 5. See “Categorical Data Analysis,” second edition, Alan Agresti, Wiley, 2002.

April 20, 2005 at 7:57 pm #118144

Deep,

In addition to the previous poster’s comments on Likert scales and controlling the environmental variables, it is important that you block by taster, or use a paired t-test, to compare differences within taster and exclude taster-to-taster variability.

Taste test labs also “calibrate” with standard samples. For example, a diet cola test would use three standards: “low,” “mid,” and “high” sweetness. (This may not be feasible in your situation.)

April 17, 2005 at 3:48 pm #117921

To the best of my knowledge, Minitab uses standard methods of optimization for the Optimizer. See:

Derringer, G., & Suich, R. (1980). “Simultaneous Optimization of Several Response Variables,” Journal of Quality Technology, 12, 214-219.

However, newer techniques employing genetic algorithms will give better results. See:

Francisco Ortiz, Jr., James R. Simpson, and Joseph J. Pignatiello, Jr., “A Genetic Algorithm Approach to Multiple-Response Optimization,” Journal of Quality Technology, Vol. 36, No. 4, October 2004, p. 432.

January 26, 2005 at 8:45 pm #114044

Jeff,

You deserve a better answer than given by the previous poster!

This is an interesting question. Adding the WECO and Nelson rules will improve your chart’s sensitivity to small process shifts and, assuming proper corrective action, will reduce the likelihood of a defect. The easiest way to estimate the impact of the rules at the dpm level would be via simulation (otherwise the math requires the use of Markov chains). You also need to consider the increased alpha risk, or probability of a false alarm.

There are several related papers published in the Journal of Quality Technology and Technometrics searchable via ASQ’s web site. Use the key word “Average Run Length”.

If you have an organization that is mature in use of SPC consider the use of the EWMA chart rather than simply adding WECO to Shewhart charts. This gives you the advantage of improved sensitivity without the increased alpha risk.

January 15, 2005 at 11:25 pm #113542

I purchased the tape in 1990 – not sure if it is still available.

“Mike & Dick” Video

Title: Planned Experimentation
M00105
Price was approx. $500.
Xerox Media Center
Webster, NY
585-422-4915

I don’t know about “Fight Back”, but the “Against All Odds: Inside Statistics” series has a program that includes the Energizer Bunny to discuss confidence intervals. See http://www.learner.org/resources/series65.html

December 30, 2004 at 12:42 am #112932

Pitzou:

See JMP’s Stat Graph Guide, page 121.

The original work is by Welch, B.L. (1951), “On the Comparison of Several Mean Values: An Alternative Approach,” Biometrika, 38, 330-336.

December 20, 2004 at 11:53 pm #112714

The RAMS Symposium was 1987.

December 20, 2004 at 11:52 pm #112713

Bill’s original work was in reliability. He developed Six Sigma as a driver to improve product reliability.

Two papers by Bill Smith:

“Integrated Product and Process Design to Achieve High Reliability in Both Early and Useful Life of the Product” IEEE Proceedings Annual Reliability and Maintainability Symposium.

“Six Sigma Design,” IEEE Spectrum, September 1993, pp. 43-47.

December 2, 2004 at 10:25 am #111614

Marc,

I agree with you. Simplicity is best. But be careful – extremely skewed data is probably better handled by transformation than by assuming the CLT will work for you.

In my previous posts I am assuming that there is a theoretical basis for the exponential distribution.

December 2, 2004 at 4:39 am #111603

Doug,

You are welcome.

The unequal sample size is not a problem.

You could simply add a shift, or use the three parameter Weibull which includes a shift parameter. The exponential distribution is a special case of the Weibull (Shape=1). Note that with your negative values you will get an error message saying the variance-covariance matrix is singular, but the confidence intervals on the scale and shape ratios will work.
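In Python, scipy’s `weibull_min` exposes the shift (threshold) as its `loc` parameter, so a three-parameter fit falls out of the generic `fit` method. A sketch with simulated shifted data (the parameter values are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated data shifted below zero: threshold (loc) = -5, so some values
# are negative, as in the situation described above
data = stats.weibull_min.rvs(c=1.5, loc=-5.0, scale=10.0, size=500,
                             random_state=rng)

# Three-parameter MLE fit: scipy's `loc` is the threshold/shift parameter
shape, loc, scale = stats.weibull_min.fit(data)
print(f"shape={shape:.2f}, threshold={loc:.2f}, scale={scale:.2f}")
```

Note that threshold estimation by MLE can be numerically touchy, especially for shape values near or below 1, so the fitted threshold should be sanity-checked against the data minimum.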

I am assuming that you have a theoretical basis for using the exponential; otherwise the more basic tools discussed earlier, such as nonparametrics or shift + Box-Cox, would be appropriate.

December 1, 2004 at 4:35 am #111533

Sorry, I meant “censored”.

December 1, 2004 at 4:33 am #111532

Doug,

Minitab has a built-in hypothesis test for the exponential distribution. It is found in:

Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis. Choose Exponential. Click Test. You can select scale and shape.

Note that the data does not have to be censored to use this tool.

December 1, 2004 at 4:26 am #111531

Doug,

Kruskal-Wallis is based on ranks, so intuitively unequal variance is not going to be a problem.

To quote Sheskin:

It should be pointed out, however, that there is some empirical research which suggests that the sampling distribution for the Kruskal-Wallis test statistic is not as affected by violation of the homogeneity of variance assumption as is the F distribution (which is the sampling distribution for the single-factor between-subjects analysis of variance). One reason cited by various sources for employing the Kruskal-Wallis one-way analysis of variance by ranks, is that by virtue of ranking interval/ratio data a researcher can reduce or eliminate the impact of outliers.

November 11, 2004 at 11:54 am #110582

You need to use Weibayes analysis. This is Weibull analysis with no failures (a combined Weibull and Bayesian approach). Dr. Robert Abernethy’s book “The New Weibull Handbook” is a great resource. The analysis is quite straightforward. See chapter 6, page 6-10 for a case study very similar to your problem.

You will need the scale and shape of your original data. Use Minitab’s Reliability tools for this.

November 2, 2004 at 12:27 pm #110112

Tony,

Your question is a bit unclear. If you are asking about the availability of columns for the Box-Cox transformation, Minitab hides any columns that contain text, or cells with negative or zero values.

October 7, 2004 at 8:26 am #108657

Hi Robert,

Thanks for your comments. I agree with you that the actual settings should be used: evaluate VIF and proceed from there. However, as you know, in practice the designs are often analyzed using the original worksheet settings, resulting in increased experimental error and a reduction in model “usefulness.” Part of the problem lies with software tools not providing easy-to-access and easy-to-interpret design diagnostics.

October 3, 2004 at 4:25 pm #108475

Hi Robert,

Thanks for your note. Good point about the loss of information with varying Xs. Of course you can analyze the data using the actual Xs, but the design is no longer orthogonal. Collinearity turns what started as a DOE into a “historical regression” problem.

October 3, 2004 at 4:13 pm #108474

Hi Andy,

Sorry for the delayed response. I am on vacation in Cairo.

Thanks for the very interesting link. In 1992, I was looking into developing a software product that integrated Prof. Zadeh’s fuzzy logic theory with classical SPC techniques (even had venture capital interest). I did not pursue this because of other consulting & software opportunities.

I will have to dig deeper, but at first glance it appears that the particular strength of fuzzy regression is in outlier detection, as an alternative to robust regression methods.

Regards,
John

October 1, 2004 at 6:49 am #108387

Hi Robert,

Do you know of any software tools that employ maximum likelihood to accommodate errors in X?

September 25, 2004 at 8:34 pm #108006

“Non-normal”

September 25, 2004 at 8:33 pm #108005

Hi Robert,

Thanks for the reply.

By the way, did you see my earlier post with the email from Davis Bothe on extrapolation of Nplots for non-normal capability indices?

September 23, 2004 at 1:46 am #107801

Hi Robert,

Have you seen RSM applied in a transactional environment – discrete event simulation or otherwise?

September 10, 2004 at 6:55 am #107115

To start with, your sample size must be large enough to satisfy np >= 5.

If you want to get an estimate of your sigma level, compute the exact 95% confidence interval based on the inverse Beta distribution.

If you observe zero defects with a sample size of 1 million, then you can claim Six Sigma quality.
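The exact (Clopper-Pearson) interval from the inverse Beta distribution can be sketched as follows; the zero-defects case illustrates the claim above:

```python
from scipy.stats import beta

def exact_binomial_ci(defects, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a proportion,
    computed from the inverse Beta distribution (beta.ppf)."""
    lo = 0.0 if defects == 0 else beta.ppf(alpha / 2, defects, n - defects + 1)
    hi = 1.0 if defects == n else beta.ppf(1 - alpha / 2, defects + 1, n - defects)
    return lo, hi

# Zero defects in one million units: the 95% upper bound on the defect
# rate is about 3.7 per million, in the neighborhood of Six Sigma quality
lo, hi = exact_binomial_ci(0, 1_000_000)
```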

August 9, 2004 at 2:03 pm #105276

Far better to consider the Average Run Length (ARL) = 1/(1-Beta). So in the case of a 1.5-sigma shift, using n=4, the ARL = 2.0. On average you will detect a 1.5-sigma shift within 2 observations.

These numbers are not applicable when you add tests for special causes. Markov chains are required to do those calculations, or estimates can be obtained through simulation.

August 7, 2004 at 2:08 pm #105215

A proper guardband comes from your actual Sigma Measurement, not some pre-set 20% number.

Upper Guardband = USL – 6*Sigma Measurement

Lower Guardband = LSL + 6 * Sigma Measurement

Of course you could go with a less conservative multiplier like 5.15 or even 3, but do not use less than 3.

As mentioned in my previous post, the Reproducibility part of Sigma Measurement should include tester-to-tester and day-to-day variability, not just operator-to-operator. It gets more challenging if you have to take into account other factors like temperature drift, but let’s leave that one for now.
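A sketch of the guardband arithmetic, including a repeat-and-average option; treating Sigma Measurement as pure repeatability when averaging is a simplifying assumption (in practice only the Repeatability component shrinks with averaging):

```python
import math

def guardbands(lsl, usl, sigma_ms, k=6.0, n_repeats=1):
    """Guardbanded test limits from the measurement-system sigma:
    lower = LSL + k*sigma, upper = USL - k*sigma. Averaging n_repeats
    readings shrinks sigma by sqrt(n), here applied to all of sigma_ms
    (a simplification; strictly it applies only to repeatability)."""
    s = sigma_ms / math.sqrt(n_repeats)
    return lsl + k * s, usl - k * s

guardbands(0.0, 100.0, 1.0)               # single reading: (6.0, 94.0)
guardbands(0.0, 100.0, 1.0, n_repeats=4)  # average of 4:   (3.0, 97.0)
```

The trade-off is exactly as described: averaging recovers test window (less yield loss at the guardbands) at the cost of more test time.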

There is one thing you can do in test that would permit a reduction in the size of the guardband: repeat the tests and average. This will reduce the Repeatability component (by S Repeat/SQRT(N)), but of course this is more expensive. Trade off the yield loss versus the cost of test.

August 6, 2004 at 2:43 am #105103

6*Sigma Measurement is the simplistic (conservative) answer, but you have to take into account other factors such as drift over temperature, and ensure that Sigma Measurement includes Reproducibility factors such as lab-to-lab and/or tester-to-tester.

Of course this does not bode well for yield loss, hence the need to have stable, precise and accurate measurement systems.

I developed a technique in 1986 using regression to compensate for temperature drift. Let me know if this would be of interest.

July 16, 2004 at 11:49 am #103564

You cannot “dial up” the ARL characteristics. They are a function of lambda, the stdev multiplier, and sample size (EWMA), or h, k, and sample size (CUSUM). There are also some start-up options for Fast Initial Response.

Search through Journal of Quality Technology and Technometrics for additional information.

July 8, 2004 at 3:20 pm #103127

Hi Robert:

I am attaching an e-mail I received from Davis Bothe on this topic. By the way he has read some of your responses on isixsigma and said that you were “quite knowledgeable and experienced in many statistical methods”!

Regards,
John

The issue you raise about non-normal distributions is an important one, especially when dealing with small sample sizes (throughout the following discussion, I am assuming the data came from a process that is in statistical control because even just one out-of-control reading in a small sample will greatly influence the apparent shape of the distribution).

I don’t think defaulting to the minimum and maximum values of the data set for the .135 and 99.865 percentiles is a good idea. Doing so would produce an artificially high estimate of process capability because the measurement associated with the .135 percentile would be located much farther below the minimum measurement of the data set and the “true” 99.865 percentile would be much greater than the maximum value of the data set.

For example, in a sample of 30, the smallest measurement would be assigned a percentage plot point of around 2 percent (depending on how you assign percentages). This point represents the 2.00 percentile (x2.00), which is associated with a much higher measurement value than that associated with the .135 percentile (x.135).

The largest measurement would be assigned a percentage plot point of somewhere around 98 percent. This point represents the 98.00 percentile (x98.00), which would be associated with a lower measurement than that associated with the 99.865 percentile (x99.865).

The difference between x98.00 and x2.00 will therefore be much smaller than the difference between x99.865 and x.135. Thus, the Equivalent Pp index using the first difference will be much larger than the one computed with the second difference.

Equivalent Pp = (USL – LSL) / (x98.00 – x2.00) is greater than

Equivalent Pp = (USL – LSL) / (x99.865 – x.135)

So I think it would be best to avoid this approach as it produces overly optimistic results.

So what would be better? I recommend fitting a curve through the points and extrapolating out to the .135 and 99.865 percentiles. However, this can be difficult when there are few measurements. First, there should be at least 6 different measurement values in the sample. Often, with a highly capable process and a typical gage (15% R&R), most measurements will be identical. For a sample size of 30, I have seen 6 repeated readings of the smallest reading, 19 of the middle reading, and 5 of the largest reading. With only three distinct measurement values, it is impossible to use curve fitting or NOPP.

Second, with really small sample sizes (less than 20), you must extrapolate quite a distance from the first plotted point down to the .135 percentile and quite a ways from the last plotted point to 99.865 percentile. This presents an opportunity for a sizable error in the accuracy of the extrapolation. However, I believe this error would always be less than that of using the smallest value as an estimate of the .135 percentile. This method is always wrong whereas the extrapolation method might produce the correct value.

You had asked about my preference between using the last two plot points for the extrapolation or relying on a smoothing technique. I would prefer the smoothing technique as this approach takes into consideration all of the plot points. If there is any type of curvature in the line fitted through the plot points (which there will be since we are talking about non-normal distributions), then this method would extend a curved line out to estimate the .135 and 99.865 percentiles.

The “last-two-points” method will always extend a straight line out to the .135 and 99.865 percentiles (through any two points, there exists a straight line). Because we are dealing with non-normal distributions, we know a straight line is not the best way to extrapolate. Thus, the smoothing approach will produce better estimates.

One note of caution: be careful of using a purely automated approach. It’s always best to look at the data for each process rather than relying solely on an automated computerized analysis.
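As context for the percentile discussion above, the plot-point percentages Bothe describes can be sketched in a few lines. This uses Benard’s median-rank approximation as an illustrative choice; as the e-mail notes, the exact values depend on how you assign percentages:

```python
def plotting_positions(n):
    """Percentage plot points for the n ordered measurements
    (Benard's median-rank approximation p_i = (i - 0.3)/(n + 0.4))."""
    return [100.0 * (i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

p = plotting_positions(30)
print(round(p[0], 1), round(p[-1], 1))  # 2.3 97.7
```

So for n = 30 the smallest and largest measurements plot near the 2.3 and 97.7 percentiles, nowhere near the 0.135 and 99.865 percentiles needed for an Equivalent Pp, which is exactly why the min/max shortcut overstates capability.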

July 8, 2004 at 12:08 pm #103109

John Noguera (Participant, @John-Noguera)

Robert,

Thanks for the insight and the Hahn Shapiro reference.

John

July 8, 2004 at 10:02 am #103102

The Anderson Darling test is used in the following places:

SigmaXL > Statistical Tools > Descriptive Statistics

SigmaXL > Graphical Tools > Histograms & Descriptive Statistics

SigmaXL > Statistical Tools > 2 Sample Comparison Tests
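For readers curious what the software is computing, here is a minimal pure-Python sketch of the A-squared statistic for normality, standardizing with the sample mean and the n − 1 standard deviation. The p-value lookup is omitted because, as noted elsewhere in this thread, it requires interpolation on published tables:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(data):
    """Anderson-Darling A-squared statistic for normality,
    using estimated mean and (n - 1) standard deviation."""
    n = len(data)
    x = sorted(data)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    z = [normal_cdf((v - m) / s) for v in x]
    return -n - sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    ) / n

a2 = anderson_darling([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(round(a2, 3))  # small values suggest no evidence against normality
```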

July 8, 2004 at 2:27 am #103082

SigmaXL uses the Anderson Darling test.

July 6, 2004 at 1:51 pm #102989

Thanks Robert.

In looking for an alternative for software automation, what’s your thought on the use of Johnson curves versus Box-Cox?

July 5, 2004 at 12:59 pm #102924

Hi Robert,

I’m pulling up this old thread to ask you a question about your use of Bothe’s technique for handling non-normal data. I am planning to implement a similar approach in a software tool.

When the sample size is small (say < 100), do you extrapolate the curve on the normal probability plot out to the 0.135 and 99.865 percentiles, or do you simply take the min and max values? If you extrapolate, do you use the last two data points and draw the line, or do you use "smoothing"?

Thanks!

May 26, 2004 at 10:30 pm #100804

Folks, whether you like Stan or not is irrelevant. He is correct.

I have seen people use software that automatically recomputes control limits when new data is added (such as Minitab 14 – unless you enter historical mean and stdev – I think that I have persuaded the folks at Minitab to change the defaults on this).

Here is the simple problem you run into: if a slow long term drift occurs in the process – an assignable cause – the recalculation of the limits keeps adjusting, following along with the slow drift, making the detection very difficult.

The company doing this automatic recalculation did not see that they were out of control. By the time the problem was recognized, the required adjustment was severe and caused significant customer grief.

May 8, 2004 at 4:47 pm #99960

James:

Your problem has nothing to do with positive & negative values. Very likely you have small within-subgroup variability relative to between-subgroup variability. If you are using Minitab, try the I-MR-R/S (Between/Within) chart. SigmaXL also supports this chart type.

If this does not resolve your problem, you likely have autocorrelated data. Use Minitab or Excel’s exponential smoothing and chart the residuals.

May 7, 2004 at 8:46 pm #99942

Hi Jonathon,

You are correct. I too had lengthy conversations with Bill. Reliability was a key driver for him. Get the Cpk right in manufacturing, and this will translate to a significant reduction in latent defects.

Bill published (at least) two significant papers on Six Sigma: “Six-Sigma Design”, IEEE Spectrum, September 1993, p.43-47, and an earlier conference paper linking Six Sigma quality and reliability.

March 22, 2004 at 6:03 pm #97198

Padgett, Marcus M.; Vaughn, Lawrence E.; Lester, Joseph; “Statistical Process Control and Machine Capability in Three Dimensions”, Quality Engineering, Vol. 7, No. 4, June 1995, pp. 779-796

February 25, 2004 at 3:01 pm #96037

The theory is somewhat complex, requiring the use of Markov Chains. See: Champ, C.W., and Woodall, W.H. (1987). “Exact Results for Shewhart Control Charts with Supplementary Runs Rules”, Technometrics, 29, 393-399. Much easier to simulate this problem!

The bottom line is that these rules improve the sensitivity to small process changes (i.e. lower the beta risk) while maintaining a reasonable false alarm rate (alpha risk). Better performance can be achieved with the use of more advanced charts like CUSUM or EWMA, but these are difficult to implement on the shop floor.

February 10, 2004 at 4:21 pm #95274

Hi Mark,

Thanks for the response. I personally have not created/analyzed I-optimal designs, but there are some who believe they are superior due to the fact that the predicted variance is being minimized. JMP saw fit to include this in their latest release. I don’t know if Minitab plans to do the same.

I was just curious. Thanks!

February 10, 2004 at 11:07 am #95259

Hi Mark,

Are you planning to incorporate I-optimality soon?

February 9, 2004 at 1:07 pm #95220

Hi Andy,

Thanks for the Motorola info. Have we met? I teach at MU.

I would like to see the sample charts. Could you please e-mail your web-site URL to me at [email protected].

I could not see Mahalanobis used in any of the multivariate charts in Minitab V14. Is this embedded in the algorithm?

February 8, 2004 at 4:13 pm #95195

Minitab V14 now supports Multivariate control charts including Hotelling’s T-Squared and Multivariate EWMA.

These tools are great for analysis, but difficult to implement at a shop floor level. The challenge with Multivariate SPC is determining what action to take when you get an out-of-control signal. Looking at individual charts may not show any instability. For example, the combination of a few variables at, say, + 2.5 sigma would trigger an alarm.

The other challenge you run into is the beta risk when the number of variables being monitored gets large (>10).

Doug Montgomery’s book, “Introduction to Statistical Quality Control”, is a great resource for Multivariate SPC.

I would be very interested to know if anyone out there has successfully implemented shop floor multivariate SPC.

October 1, 2003 at 12:41 pm #90494

Correction, KW is not appropriate. Use Friedman’s test, available in Minitab.

October 1, 2003 at 12:29 pm #90493

Kruskal-Wallis would work well for you. This tool is available in Minitab and SigmaXL software.
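For illustration, the Kruskal-Wallis H statistic is simple to compute by hand. A minimal sketch with hypothetical groups (no tie correction, so it assumes distinct values):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction; assumes
    all observations are distinct)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    rank_sums = [0.0] * len(groups)
    counts = [len(g) for g in groups]
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n = len(pooled)
    return (12.0 / (n * (n + 1))
            * sum(rs ** 2 / c for rs, c in zip(rank_sums, counts))
            - 3.0 * (n + 1))

h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 1))  # 7.2
```

H is referred to a chi-square distribution with k − 1 degrees of freedom; here 7.2 exceeds the 5.99 critical value at alpha = .05 with 2 df, so the groups differ.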

July 6, 2003 at 10:24 pm #87695

Tony,

The reason you get a difference between Excel’s calculation of StDev and Minitab’s Overall StDev (in process capability) is the use of the unbiasing constant C4, i.e. S/C4.
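The constant C4 itself is a short gamma-function calculation; a minimal sketch:

```python
import math

def c4(n):
    """Unbiasing constant for the sample standard deviation:
    c4 = sqrt(2/(n-1)) * gamma(n/2) / gamma((n-1)/2)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

print(f"{c4(5):.4f}")   # 0.9400
print(f"{c4(25):.4f}")  # 0.9896
```

These values match the standard SPC tables, and since c4 < 1 the corrected estimate S/c4 is always slightly larger than S itself, which is exactly the Excel-vs-Minitab gap described above.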

June 22, 2003 at 3:26 pm #87213

My brain is fuzzy this morning – need more coffee!

The within stdevs retain their respective unbiasing constants. It’s the overall that causes the confusion within Minitab.

June 22, 2003 at 2:56 pm #87212

In addition to pooling, Minitab defaults to using the unbiasing constant C4. This makes it difficult to match hand calculations to Minitab.

By way of introduction to the topic, it is easier to teach the short-term standard deviation calculated using Rbar/d2. I have my students select the Rbar option and deselect the unbiasing constant for purposes of comparing hand calculations to Minitab.

June 2, 2003 at 10:21 pm #86598

I agree with your recommendation on Box-Meyer. The Schmidt/Launsby text also has rules of thumb for the number of replications required per run, for a given Beta risk on the standard deviation and number of runs in the experiment.

May 30, 2003 at 11:59 am #86504

Hi Jonathon!

How’s the book coming along?

I wanted to mention that SigmaXL creates pivot charts for Excel ’97.

Also, I wanted to mention that SAS JMP has a similar powerful tool for Discrete X and Discrete Y that they call Mosaic Plots.

April 8, 2003 at 3:00 pm #84643

Hi Robert,

Thanks for your post. I especially agree with your emphasis on asking why there is a significant difference in variances. This can lead one to a “golden nugget” X factor; the discovery of such a factor is likely far more valuable than the unequal means originally being investigated.

April 7, 2003 at 1:52 pm #84589

Classical ANOVA assumes that the variances are equal.

Your choices in Minitab are the following:

1. Apply a variance stabilizing transformation like ln(Y).

2. Use a nonparametric test like Kruskal-Wallis (note that here you are testing medians, not means).

3. Limit yourself to 2 samples and apply t-test for unequal variance.
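Option 3, the unequal-variance (Welch’s) t test, is a short calculation; a minimal sketch with hypothetical samples, returning the statistic and the Satterthwaite degrees of freedom:

```python
import math

def welch_t(x, y):
    """Welch's t statistic and Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sx, sy = vx / nx, vy / ny           # squared standard errors
    t = (mx - my) / math.sqrt(sx + sy)
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df

# Hypothetical samples with clearly unequal variances.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))  # -1.897 5.88
```

Welch’s ANOVA generalizes this same idea to more than two groups.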

If you have access to JMP, they have a procedure called Welch’s ANOVA which does not assume equal variance.

March 3, 2003 at 1:42 pm #83477

Hi Chuck!

Good answer. I will use your example from here on!

By the way, I did some interesting simulation studies on the difference between multiplicative (i.e. FMEA) vs additive scoring. The rank results match closely at the high and low end but vary widely in the middle.

February 14, 2003 at 10:24 am #82994

I concur with Robert, but would add that the VIF score should be included (with a clear explanation linking score to degree of multicollinearity).

Any relation to Brad Jones?

December 29, 2002 at 5:23 pm #81757

John,

Good answer. Just one thing to add – JMP 5 now supports PLS.

November 22, 2002 at 1:11 pm #80963

It looks like you are having problems with rounding. The NORMSINV function is fine. Here is what I get with your numbers:

N = 1, D = 4719, O = 1352200000, dpmo = 3.48986836, Sigma = 5.99450016

N = 1, D = 1413, O = 418900000, dpmo = 3.37312008, Sigma = 6.00173753

November 7, 2002 at 7:29 pm #80407

The only assumptions for Mann-Whitney are: continuous response and both samples random and independent.

Since the test is based on ranks, non-normality and/or unequal variance does not affect the outcome. MW is the preferred test here over t, especially with small sample sizes.
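A minimal sketch of the Mann-Whitney U statistic by direct pairwise comparison (fine for small samples; the usual ranking formula is equivalent):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U by direct pairwise comparison; ties count 0.5."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)
    # Tables expect the smaller of U and its complement.
    return min(wins, len(x) * len(y) - wins)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0 (complete separation)
```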

Note that you are testing for unequal medians, not means.

August 20, 2002 at 10:18 am #78243

Correction!

Time for coffee….

If Cpk is estimated to be 1.5, the lower 95% confidence limit is 1.1 if n=30; if n=5, the lower limit is approximately 0.5!
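For reference, these rounded limits are in the neighborhood of a common normal-approximation bound on Cpk (one of several published approximations, so treat this sketch as illustrative rather than the exact method behind the numbers above):

```python
import math

def cpk_lower_limit(cpk, n, z=1.645):
    """Approximate one-sided 95% lower confidence bound on Cpk
    (normal-approximation formula)."""
    return cpk - z * math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * (n - 1)))

print(round(cpk_lower_limit(1.5, 30), 2))  # 1.16
print(round(cpk_lower_limit(1.5, 5), 2))   # 0.59
```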

August 20, 2002 at 10:15 am #78242

Cpk is a statistic. You are estimating a population capability. Therefore sample size is very important to obtain a reasonable margin of error or confidence interval.

With n=5, and a calculated Cpk=1.5 (i.e. “Six Sigma” quality), your approximate 95% lower confidence limit is Cpk=1.1.

I believe Minitab 14 will be coming out with confidence intervals for Cpk.

August 15, 2002 at 4:58 pm #78152

Hi Ken,

I presented this rationale to Ron Lawson at Motorola about 10 years ago. He was in agreement with this, but the primary driver of 1.5 sigma was the Bender paper and empirical/simulation modelling.

August 11, 2002 at 10:55 pm #78000

John,

Just curious, are you planning to get involved with the rollout of the new GB program at Mot? Send me an e-mail at [email protected]

I just saw the dates for the Lean Six Sigma session. I wish I could attend – I live in Toronto as well, but I will be in Schaumburg then.

July 21, 2002 at 12:00 am #77416

There is no Six Sigma guide; however, Eckes’ book ‘The Six Sigma Revolution’ is available in Palm Reader. There are a few basic statistics programs available (PDA Stats, ZEN Statistical Expert, and ProStats). See the Palm and Handango web sites for further info.

You might want to check with the people who publish pocket guides, like Rath & Strong and GOAL/QPC, to see if they have any plans to port to the Palm.

June 14, 2002 at 10:54 am #76413

Your analysis looks correct, but in this case a simpler approach could be used – the two sample proportions test. The p-values work out to be the same, but it is easier to do.
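The pooled two-proportion z test is a short calculation; a minimal pure-Python sketch with hypothetical counts:

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample proportions z test; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 40 defectives in 100 vs 20 in 100.
z, p_value = two_proportion_z(40, 100, 20, 100)
print(round(z, 2), round(p_value, 4))  # 3.09 0.002
```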

In Minitab: Stat > Basic Statistics > 2 Proportions. Use summarized data. It is recommended that under Options you check the “Use pooled estimate of p for test” option.

June 11, 2002 at 9:15 am #76304

A great reference text, published by the ASQ: Shapiro, Samuel S., “How to Test Normality and Other Distributional Assumptions”, Revised Edition, Volume 3 of the How To Series.

Note that if you want to determine the p-values from A-squared, you will have to use linear interpolation or non-linear regression on the values given in the tables. There is no convenient equivalent to Chi-Square.

June 3, 2002 at 11:47 am #76021

S is the standard deviation of the unexplained variation (i.e. the residuals). Note that the degrees of freedom for this S term will be N – (# terms in model, including the constant). It can also be calculated as the square root of the Adj MS for Residual Error.

Re the lack of fit – you need replicates to get the pure error. Did you run only one center point?

May 30, 2002 at 2:30 am #75921

Mike,

Things have changed for the better since you left. Read the lead article in the May 2002 Six Sigma Forum “Motorola’s Next Generation Six Sigma” by Matt Barney.

This article reflects what MU is doing today.

May 17, 2002 at 2:11 pm #75551

You could lower the individual alphas to obtain an overall alpha of .05. This has the advantage of simplicity but the disadvantage of increased beta. Alternatively, you could apply Hotelling’s T-Squared to the test differences. (MANOVA will not work here.)

May 15, 2002 at 1:45 pm #75463

See http://www.excelstat.com for a helpful presentation on Pareto and multi-vari charts.

May 9, 2002 at 9:16 am #75292

1. Probability is not always intuitive.

2. Black Belts must understand probability to be effective.

May 3, 2002 at 3:08 pm #75124

“So going into the game you have a 1/3 chance of winning.” Yes – and that does not change if you do not switch!

Now after the door is open, if you switch, you are taking advantage of the knowledge of the door opener. You are “buying into” the 2/3 probability associated with both of the other doors.
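The 1/3 vs 2/3 split is easy to verify by simulation; a minimal sketch:

```python
import random

random.seed(0)

def monty_hall(trials, switch):
    """Fraction of wins over many simulated games."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a non-prize door you did not pick, so
        # switching wins exactly when the first pick was wrong.
        wins += (pick != prize) if switch else (pick == prize)
    return wins / trials

print(monty_hall(100_000, switch=True))   # close to 2/3
print(monty_hall(100_000, switch=False))  # close to 1/3
```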

Previously an anonymous poster said that several PhDs were stumped on this one. Those were letters written to Marilyn vos Savant insisting that she was wrong when she stated that the correct probabilities were those posted by Ex-MBB and Calybos. Unfortunately, the PhDs were wrong.

May 3, 2002 at 2:23 pm #75117

Ex-MBB is correct. Switching allows you to take advantage of the knowledge of the person opening the door.

May 1, 2002 at 2:42 pm #75031

Michael Hammer, “Process Management and the Future of Six Sigma,” MIT Sloan Management Review, Winter 2002, Volume 4, No 2, pp 26-32.

Hal Plotkin, “Six Sigma: What It Is and How To Use It” Harvard Management Update Article, 6/1/99

April 22, 2002 at 2:38 pm #74635

This topic has been covered before, but let me summarize. If you consider the use of add-ins, you really do get a lot of power (and statistical precision). Consider the following tools:

StatPlus, which comes with the text “Data Analysis with Microsoft Excel” by Berk & Carey. This includes normal probability plots, multiple histograms, multiple scatterplots, and nonparametric stats.

ExcelStat, see http://www.excelstat.com. This includes Multiple Pareto and Multi-Vari.

SPC XL, by Air Academy Associates, http://www.airacad.com. Excellent SPC.

DOEKISS, DOEPRO, By Air Academy Associates, Excellent DOE tools.

Typically, if one has Minitab or JMP, they would not use the Excel tools. If however you want to provide people with tools that are easy to learn and less expensive, then the Excel add-ins are the way to go. For example, Motorola University uses StatPlus and ExcelStat in their Green Belt training.

April 15, 2002 at 3:56 pm #56158

Hello Yves,

Will you be doing the training in France?