Robert Butler

Forum Replies Created

Viewing 100 posts - 1 through 100 (of 2,532 total)

    Robert Butler

    Is there any way of getting this done? – Yes, start over because, based on your description of what you are doing, you are a very long way from being able to make any kind of decisions concerning employee performance.

    Let’s do a recap of your initial post.

    1. You said, “We convert the survey responses into continuous data (ex.: Liker scale 1 to 5, we could the 4/5 replies and divide by all surveys received to calculate the Customer SAT %).” I think what you meant to say was you have Likert scale data with ranges of 1-5 (presumably these are 1 = very dissatisfied, 2 = dissatisfied, 3 = neither satisfied nor dissatisfied, 4 = satisfied, and 5 = very satisfied) and then you arbitrarily lump the 1, 2, 3 ratings and the 4, 5 ratings into two groups, take the count of the 4 and 5 entries, divide this number by the total number of surveys received, and call that an estimate of customer satisfaction.

    a. You are deliberately throwing away information – you have a 1-5 scale – use it. Lumping the data in this arbitrary manner does not make sense – for example, you are equating extremely dissatisfied with neither satisfied nor dissatisfied.
    b. The whole point of having a 1-5 Likert scale and examining the counts in each category is so you can detect changes or lack of changes in trending either as ratings improve, decline, or remain constant. Converting to binary removes this capability.
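    The information thrown away by that lumping is easy to demonstrate. Below is a minimal Python sketch with hypothetical ratings for two periods: the full Likert distributions differ dramatically, yet the binarized "top-two-box" satisfaction percentage is identical.

    ```python
    from collections import Counter

    # Hypothetical Likert responses for two periods; the distributions shift
    # markedly, yet the lumped 4/5 "Customer SAT %" is identical.
    period_1 = [1, 1, 1, 3, 3, 4, 4, 5, 5, 5]   # many "very dissatisfied"
    period_2 = [3, 3, 3, 3, 3, 4, 4, 5, 5, 5]   # dissatisfaction gone

    def top2box(ratings):
        """Fraction of 4s and 5s -- the lumped satisfaction percentage."""
        return sum(1 for r in ratings if r >= 4) / len(ratings)

    print(top2box(period_1), top2box(period_2))   # both 0.5
    print(Counter(period_1), Counter(period_2))   # very different pictures
    ```

    The binary metric reports "no change" while the underlying category counts show a real improvement, which is exactly the trending information the 1-5 scale exists to capture.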

    2. You said, “Each employee receives different volumes of surveys. And here is where the plot tickens. Even if we extend the surveys received time to a full month, we will have, let’s say an average of 100 surveys received, however, the standard deviations are super high.” And later you said, “…simply because we compare results of employees with 100 surveys with employees that have 3.”
    a. In other words, you have a really bad case of sampling bias. With those kinds of differences in completed responses, the “super high” standard deviations are exactly what you should expect.
    b. The first question you should ask and resolve is why the vast differences in customer response?
    1. Are your employees really getting a random sample of customers? How do you know?
    2. Are your employees really getting a random sample of types of customer problems? How do you know?
    3. How are customer problems classified?
    a. If you don’t have some method for problem classification then you need to sit down with the employees and develop a meaningful way to quantify problem type.
    b. If you do have a way to classify problems then is there any correlation between problem type and survey completion?
    4. Assuming you have one of those 24/7 type contact setups what is the story with respect to day time contacts vs. night time and how are you taking this into account?
    a. different kind of customers day vs. night?
    b. different manning levels at your place day vs. night?
    c. different skill levels of day vs. night workers?
    d. etc.
    5. …and on and on.

    3. You said, “The current approach that I’m taking is the calculate the individual employee margin of error (sample of surveys vs closed cases for the period in the analysis), and if I purge the low margins of error I’ll get the population cut down by 30/50%.”
    a. To begin with – how can a ratio of completed surveys vs closed cases be viewed as a margin of error?
    b. As I understand this you are doing the following: Let’s say an employee gets 3 completed surveys but successfully closes out 100 problems – you toss this person from consideration. On the other hand, if an employee gets 50 completed surveys and successfully closes out 50 projects you keep this person for consideration. If this is what you mean then the question is why are you computing ratios of completed surveys to closed out projects at all? Assuming random problem/problem difficulty and random client across employees the issue is just one of successful project closing.

    You didn’t specifically state this but it sounds like you look at a month’s data, run some calculations and then make a decision concerning employee performance – in other words you are judging people on the basis of a single monthly data point.
    1. You need to recognize each of your employees is a production line.
    2. Since they are production lines you need to look at their trends over time, construct a control chart for each individual and use the results of the analysis of that kind of data to make decisions concerning employee performance.

    If what I have posted is an acceptable summary of what you are doing then, as I said, you need to start over and the first place you need to start is taking the time to really understand your process which means, among other things, understanding why there are such huge differences in completed customer satisfaction forms.


    Robert Butler

    My guess would be your friend did something wrong in Minitab. If you don’t have subgroups then there is no within to use for calculations. Is there any chance the standard deviation it did compute was either just the sample standard deviation of all 60 points or the standard deviation of the mean of the 60 data points?
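    For individuals data, Minitab typically estimates the “within” sigma from the average moving range (MRbar/d2, with d2 = 1.128), which is a different number from both the plain sample standard deviation and the standard deviation of the mean. A rough stdlib sketch with simulated data (the 60 points below are hypothetical, not your friend’s data):

    ```python
    import random
    import statistics

    # Simulated stand-in for 60 individual measurements.
    random.seed(1)
    data = [random.gauss(50, 2) for _ in range(60)]

    # "Within" sigma estimated from the average moving range (MRbar / d2).
    moving_ranges = [abs(data[i] - data[i - 1]) for i in range(1, len(data))]
    sigma_within = statistics.mean(moving_ranges) / 1.128

    # The two candidate mix-ups: overall sample std, and std dev of the mean.
    sigma_overall = statistics.stdev(data)             # n-1 denominator
    sigma_of_mean = sigma_overall / len(data) ** 0.5   # much smaller number

    print(sigma_within, sigma_overall, sigma_of_mean)
    ```

    Comparing the value Minitab reported against these three candidates should tell you which calculation was actually performed.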


    Robert Butler

    A Google search will provide you with what you need.

    Here’s one I found for Cpk


    Robert Butler

    When you say you “have a lot of Attribute data for my analysis (Pass/Fail)” there isn’t much anyone can offer because that statement doesn’t tell anyone what you mean by “a lot.”

    However, if we make the following assumptions:
    1. You take grab samples of a given size from your production process and you inspect each item in the sample and record the count of pass/fail for each sample.
    2. You want to use that data to construct predictive equations for various properties of your product.
    3. You want to use that data for process control.

    Then the situation is this
    1. For each sample you will have a measure of percent defective.
    2. The percentages are variable data
    a. You can build predictive models with this data with the outcome being percent defective (or, if you want, the reverse – percent accepted).
    3. Given that you have data in the form of the very first point under assumptions then the np chart would be a good choice for process control.

    …and now the yeah/buts

    Given that all of the above is true the big question is just what is your pass/fail data? That is, are you just looking at something and saying pass/fail, or are you looking at various attributes of a given sample and making a pass/fail judgement on each of the different attributes? If it is the latter then you are into questions concerning what it is that you want to do. If it is just a question of in or out of control then an aggregate of the pass/fails will work but it won’t be of much value with respect to process improvement.

    Given the above assumptions are all true and given that you want to do more than just look at in or out of control you will need to do some reading concerning np charts. I would recommend borrowing through inter-library loan a copy of Wheeler and Chambers – Understanding Statistical Process Control and reading the chapter Understanding Attribute Data Effectively.
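    Under assumption 1 above (grab samples of constant size with a fail count per sample), a minimal np-chart sketch looks like the following. The sample size and counts are hypothetical placeholders:

    ```python
    import math

    # Hypothetical: k samples of constant size n, fail count recorded per sample.
    n = 100
    fail_counts = [4, 6, 3, 7, 5, 2, 6, 4]

    # Average fraction failing across all inspected items.
    p_bar = sum(fail_counts) / (n * len(fail_counts))

    # np-chart center line and 3-sigma limits.
    center = n * p_bar
    half_width = 3 * math.sqrt(n * p_bar * (1 - p_bar))
    ucl = center + half_width
    lcl = max(0.0, center - half_width)   # a count cannot go below zero

    out_of_control = [c for c in fail_counts if c > ucl or c < lcl]
    print(center, lcl, ucl, out_of_control)
    ```

    This only flags in/out of control; as noted above, it says nothing about which attribute is failing, which is why classifying the failures matters for improvement work.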


    Robert Butler

    Addendum – I played around with the data you posted and I must admit I don’t see how Minitab generated those estimates. The average of the non-missing data is -14.1 and the standard deviation of the non-missing data is 90.9 which means your 2 sigma limits would be (upper) = -14.1 +2*90.9 = 167.7 and the lower would be -195.9.

    If I substitute -14.1 (the average) for all of the missing values I get a mean of -14.1 – as one would expect, and the 2 sigma upper and lower are 79.3 and -107.5 – so I don’t see how you are getting your limits.

    The formula I’m using for the 2 Sigma levels is +/-2*sample standard deviation
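    That arithmetic is simple enough to check directly. A sketch using placeholder values (the originally posted data set is not reproduced here), with missing entries dropped before computing the limits:

    ```python
    import statistics

    # Placeholder data; None marks a missing value.
    raw = [12.0, None, -35.5, 101.2, None, -88.0, 4.3]

    # Two-sigma limits: mean +/- 2 * sample standard deviation,
    # computed on the non-missing values only.
    values = [x for x in raw if x is not None]
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    upper, lower = mean + 2 * std, mean - 2 * std
    print(mean, lower, upper)
    ```

    Whatever limits a program reports should reproduce from exactly this calculation; if they do not, the program is doing something other than a plain two-sigma computation.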


    Robert Butler

    A missing value is a missing value – what you have done is construct an entirely different set of data with a lot of average values in place of the original missing values. Under these circumstances you should expect to get something different because the two data sets are not equivalent.

    I don’t have Minitab but my guess is that Minitab is using the values that are something other than missing and running the calculation using only the non-missing values.


    Robert Butler

    As written, your post doesn’t provide enough information concerning your objective. My guess is that you are looking for tester agreement when running tests on split samples but maybe not. If you are looking for agreement then that is not the same thing as correlation and you will need more than a simple check of correlation to determine agreement.

    Example: all notation (first tester result, second tester result)
    Correlation only: (1,10), (2,20), (3,30), etc.
    Agreement: (1,1.2), (2.2,2), (2.9,3.1), etc.
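    The two example data sets above can be checked numerically: both correlate almost perfectly, but only the second shows agreement (paired values close to one another). A quick sketch (requires Python 3.10+ for `statistics.correlation`):

    ```python
    import statistics

    # Notation as above: (first tester result, second tester result).
    corr_only = [(1, 10), (2, 20), (3, 30), (4, 40)]
    agreement = [(1, 1.2), (2.2, 2), (2.9, 3.1), (4.1, 3.9)]

    def pearson_r(pairs):
        xs, ys = zip(*pairs)
        return statistics.correlation(xs, ys)

    def mean_abs_diff(pairs):
        # Average absolute disagreement between the two testers.
        return statistics.mean(abs(x - y) for x, y in pairs)

    print(pearson_r(corr_only), mean_abs_diff(corr_only))   # r ~ 1, big differences
    print(pearson_r(agreement), mean_abs_diff(agreement))   # r ~ 1, tiny differences
    ```

    Correlation alone cannot distinguish the two cases; only a look at the actual differences (or a formal agreement analysis such as Bland-Altman) can.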

    If you can provide more details perhaps I or someone else can offer some advice.


    Robert Butler

    It looks like Excel is just computing the standard deviation of the sample and then giving you the plus/minus 2 standard deviations of the sample, whereas Minitab looks like it is computing USL and LSL spec limits.


    Robert Butler

    @Darth – things are going well here – how about you and yours?


    Robert Butler

    Given what you have posted, I think you are faced with one of two scenarios.

    (In order to provide some clarity with respect to what follows let’s pretend the treatment has as part of its application changes in temperature and pressure. Given this then there are at least two different versions of “no treatment”.)

    1. The situation for “no treatment” actually incorporates fixed settings of temperature and pressure. Since treatment also encompasses changes in temperature and pressure you would need to include the temperature and pressure settings for “no treatment” in the matrix of temperature and pressure settings for treatment. The analysis would consist of two parts.
    a. Evaluate the effects of changes in temperature and pressure under the treatment condition.
    b. Assess the effects of changes between treatment and no treatment at the temperature and pressure settings associated with no treatment.

    2. The situation of “no treatment” does not involve the variables of temperature and pressure. In this case you would treat the “no treatment” situation as an external target and test the findings using treatment for different settings of temperature and pressure against that target.
    a. Since treatment also requires the addition of other variables to the mix you won’t be able to directly compare “no treatment” and “treatment”.


    Robert Butler

    Darth is correct. Here’s the relevant quote from a standard statistics text.

    Applied Regression Analysis 2nd Edition – Draper and Smith pages 22 and 23

    “[with regard to regression] Up to this point we have made no assumptions at all that involve probability distributions. A number of specified algebraic calculations have been made and that is all. We now make the basic assumptions that for a model of Y = fn(Xi +e) (I can’t write epsilon using this platform so I’m using “e” in its place)

    1. e is a random variable with mean zero and variance sigma**2.
    2. e(i) and e(j) are uncorrelated such that cov(e(i),e(j)) = 0
    3. e(i) is a normally distributed random variable, with mean 0 and variance sigma**2″

    There are no other assumptions/requirements.

    The need for approximate normality in the residuals is because it is the residuals that inform the correctness of the t and F tests used for assessing term significance. In order to address the issue of approximate normality you will need to run a residual analysis – and when you do, please follow the guidelines for running a proper analysis (Chapter 3 of the above book has the details); this means assessing the residuals graphically – not just dumping them into some test for normality.
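    A minimal sketch of the graphical idea, using hypothetical data: fit a straight line to a response that is actually curved, and the residuals, plotted against the predicted values, show a non-random sign pattern (positive at both ends, negative in the middle) instead of random scatter. (Requires Python 3.10+ for `statistics.linear_regression`.)

    ```python
    import statistics

    # Hypothetical data: the true response is curvilinear.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [x ** 2 for x in xs]

    # Fit a (deliberately inadequate) straight line.
    slope, intercept = statistics.linear_regression(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Non-random sign pattern: + at both ends, - in the middle -> curvature
    # the model failed to capture.
    signs = ["+" if r > 0 else "-" for r in residuals]
    print(signs)
    ```

    A normality test run on these residuals might well pass; it is the pattern in the plot, not a single test statistic, that reveals the inadequate fit.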


    Robert Butler

    As for the second link, the following commentary from their site

    “Consider the two regression models, and their residuals plots, shown here:

    The (lower) plots show the residuals for each model (the residuals are the errors between the regression lines and the actual data points). It can be seen that:

    1) The residuals for the ‘good’ regression model are Normally distributed, and random.
    2) The residuals for the ‘bad’ regression model are non-Normal, and have a distinct, non-random pattern.

    Using this knowledge, the validity of a regression model can be assessed by looking at its residuals.”

    isn’t wrong but it is a very poor and misleading “explanation” of the whys and wherefores of residual analysis.

    For starters the choice of the words, in quotes, “good” and “bad” is terrible. The issue isn’t one of “good” or “bad” nor is it one of “validity” – it is one of adequate fit to the data.

    The greatest failing of the text on that site is it doesn’t tell you anything about what non-random residual patterns tell you about the short-comings of your regression effort, nor does it explain how to use those patterns to further analyze your data to resolve those short-comings.


    Robert Butler

    Well, the best I can tell you is what I said in my first post – most of what you quoted is wrong.

    Specifically – to the points made on the first site:

    “There are four assumptions associated with a linear regression model:”

    “Linearity: The relationship between X and the mean of Y is linear. ”

    Not true – see the reference I gave in the first post.

    “Homoscedasticity: The variance of residual is the same for any value of X.”

    True – if you have a fit that accounts for all of the special cause variation – not true otherwise.
    Most importantly – it is not an assumption – it is a result of an adequate fit to the data.

    “Independence: Observations are independent of each other.”


    “Normality: For any fixed value of X, Y is normally distributed.”

    Not true – see the reference I gave in the first post.


    Robert Butler

    It occurred to me your phrase “Homoscedasticity: The variance of residual is the same for any value of X.” could be interpreted as a short verbal summary of the paragraph I wrote concerning what to look for when running the residual analysis. If this is the case then the statement is true but I think it is far too brief and could easily mislead people with respect to what one should do when assessing residuals.


    Robert Butler

    I’m afraid most of what you have stated is wrong.

    My reference is Applied Regression Analysis 2nd Edition – Draper and Smith

    1. There are no restrictions on the distributions for either the X or the Y. The question of normality (or approximate normality) is one that is restricted to just the residuals.

    The variance of the residuals is what it is and there are no caveats concerning that variance as a function of the X’s as far as a requirement for regression is concerned.

    The key points can be found on pages 22 and 23 of the cited reference. The short quoted version is this:

    “[with regard to regression] Up to this point we have made no assumptions at all that involve probability distributions. A number of specified algebraic calculations have been made and that is all. We now make the basic assumptions that for a model of Y = fn(Xi +e) (I can’t write epsilon using this platform so I’m using “e” in its place)

    1. e is a random variable with mean zero and variance sigma**2.
    2. e(i) and e(j) are uncorrelated such that cov(e(i),e(j)) = 0
    3. e(i) is a normally distributed random variable, with mean 0 and variance sigma**2″

    There are no other assumptions/requirements.

    I don’t know what you mean by “Normality of residuals tells us if the regression model is strong.”

    When plotted on normal probability paper, if the residual patterns are not “acceptably” normal (i.e., they fail the fat pencil test), or if a histogram of the residuals indicates bimodal/log normal, or if a plot of the residuals against the predicted results in patterns with significant linear or curvilinear trends or have < or > shapes, then the residuals are telling you there are still things you need to address before accepting a model for a test. Chapter 3 of the same reference covers most of this territory (there are other shapes for plots of residuals against predicted but those mentioned are the ones most often encountered).

    As for non-linear – that is models that are non-linear in the parameters (not models that just happen to have higher orders of the X’s – these are still linear regression models) – the same rules apply.

    The need for acceptable normality in the residuals is because the t and the F tests are the means used to check for term significance.


    Robert Butler

    I guess my first question would be – where did you get a KPI of 120 books/hour? That amounts to a minimum of 1 book every 30 seconds.

    Given your description “The rate is picking 120 units (books) per hour. They use carts, and are guided through the warehouse using tablets. What are some barriers that I should be looking for?” My personal guess would be the KPI came from wishful thinking… one book every 30 seconds and you are pushing(?) driving(?) carts through a warehouse guided by tablets trying to meet this requirement???!!!

    If we assume you have prior data – it should be hourly by operator since that is the unit of your KPI – the first thing you should do is plot the data. A good start would be boxplots (make sure you can include the raw data in the plot) for individuals for some period of time. Overlay the target (a target with, apparently, no tolerances by the way – not good) and see how the boxplots for each of the operators stack up (sorry about that).

    Next I’d get out there on the warehouse floor and really understand what the operators are facing – one book every 30 seconds…
    1. These books are distributed around the warehouse how?
    2. These books are all at the same level or are in tiers of racks running floor to ceiling?
    3. How does the picker pick the books?
    a. easy reach with hands?
    b. need for some kind of mechanical hand to reach books beyond human physical reach?
    c. If mechanical – how easy to use?
    d. if mechanical -availability of the mechanical picker – can anyone get one of these at any instant or do they have to go searching for them?
    4. “Guided through the warehouse” – and the warehouse floor plan is laid out how?
    a. rabbit’s warren – lots of dead ends, random blockage of passage due to various circumstances
    b. if aisles of some sort – same questions…oh yes, can the carts move down the aisle with ease? If two people are on the same aisle can they pass with ease?

    …and so on and so forth –

    Do NOT just take some supervisor’s or some group of supervisors’ word on how things go. These people mean well but the day-to-day gathering of books is not their field of expertise – you have to talk to the people doing the work and you have to spend enough time on the floor asking questions and observing before you try to do anything else.


    Robert Butler

    No, the variability identified by your SPC chart is the ordinary variation of your process. If you try to use control limits that have nothing to do with your process all you will do is add to the process variation and make everything worse – this is known as over control.

    Usually a reference material is some kind of gold standard and you use it to give yourself some sense of how your process is operating relative to the reference. If your ordinary process variation is greater than the gold standard and if the object of your effort is to have a process whose ordinary variation is equal to or less than the gold standard you will need to study your process to identify sources of variation that, when removed, will reduce the accepted ordinary process variation.


    Robert Butler

    Because, the contrast (or contrasts) that make the AxB interaction significant may not be the set of contrasts tested by just focusing on comparisons with a control.

    As for your recommendation – if your design matrix is such that the 0 setting for either A or B really means a complete absence of that factor, and not a case of A or B being at some non-zero level lower than whatever corresponds to the setting of 1, then the recommendation would be to run the combination A1B0, which is better than A0B0. That corresponds to saying that running the process with A present and B absent will be better than running it with both A and B absent – in other words, running with A will be better.

    On the other hand, if the matrix designation of 0 for the low level means something other than complete absence then the optimum would be running the process with A at its maximum level with B set at its minimum.


    Robert Butler

    No, the two methods do not show different results – the issue is you are not making the proper comparisons.

    In the regression you found the AxB interaction to be significant.

    In the test for group differences you chose to run a comparison against the control – this isn’t what you want to do.

    There are 4 combinations that comprise the AxB interaction – A0B0, A1B0, A0B1 and A1B1.  Therefore, when testing the AxB interaction the comparisons you want to make are A0B0 vs (A1B0, A0B1, and A1B1), A1B0 vs (A0B1 and A1B1) and A0B1 vs A1B1.  When you do this you will get the following:


    My guess is if you run your analysis using a Tukey-Kramer adjustment you will probably see the A0B0 vs A1B0 and the A1B0 vs A1B1 contrasts as statistically significant.  If this isn’t the case and all that remains significant after adjustment is the A0B0 vs A1B0 comparison then that tells you the term significance is being driven by a single comparison. The situation where just one contrast is the driver for the significance of the AxB interaction is quite common.
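    The set of contrasts described above is simply all pairwise comparisons among the four cells of the 2x2 layout, after which a multiplicity adjustment such as Tukey-Kramer would be applied. A trivial sketch enumerating them:

    ```python
    from itertools import combinations

    # The four cells of the 2x2 AxB layout.
    cells = ["A0B0", "A1B0", "A0B1", "A1B1"]

    # All pairwise contrasts -- not just comparisons against a control.
    contrasts = list(combinations(cells, 2))
    for a, b in contrasts:
        print(f"{a} vs {b}")
    print(len(contrasts))   # 6 comparisons, hence the need for an adjustment
    ```

    With six comparisons rather than three, the per-comparison alpha must be adjusted, which is where Tukey-Kramer (or a similar method) comes in.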


    Robert Butler

    I was looking at your data set again this evening and it occurred to me if you look at the data for count and count2 there is a major difference between the two columns.  If count and count2 represent the results of a design and a full replication (I doubt this since the total counts are the same for both count and count2 but bear with me) of the two-level, two-variable factorial design then one could put in a third variable for replicate where -1 is the “level” corresponding to the initial run and 1 is the level for the “replicate”.  This would turn the entire experiment into a 3 variable 2 level design with a single response of count which, for the purpose of the analysis, is expressed as percent success.

    Thus the design matrix would be


    If you run an analysis on this matrix where you include the terms A, B, AxB and Norm_Rep where Norm_Rep = replicate “level” then the full model is:


    Thus the variable Norm_Rep accounts for the shift between the original and replicate and when this is done the AxB interaction becomes significant.

    All of the above is based on a lot of assumptions about the meaning of count and count2 and may be nothing more than playing games with numbers but I thought it was worth mentioning.


    Robert Butler

    To your first modification of the 2**2 factorial design.  What you said you were doing and what you did are two entirely different things.  If you are going to set D = (0,0) and C = (1,1)  (my error – C and D should be reversed but it doesn’t make a difference) where the combinations are for A and B for those particular experiments then the actual design is:


    What you have done is this:


    This is just a standard matrix description of a one-variable-at-a-time analysis.  The fact that only the main effects are significant is not surprising – this design cannot check for interactions – indeed, this design can’t check for the 4 factors because you have 0 degrees of freedom for an error term.

    Of bigger concern is your choice of regression analysis.  You said, “we can analyze [this data] using Nominal regression/logistic regression.”  In order for this to be true you would have had to run each one of those 4 experimental combinations in excess of 7000 times because one of the key requirements of ordinary logistic regression is independence of measurements.

    My guess is you ran the experiment once (or maybe twice given that you have a count and a count2 column) and then sampled each one of the runs in excess of 7000 times and determined success/failure on that basis.  This kind of data is repeated measures data and you will have to use repeated measures logistic regression methods for the analysis.

    If you don’t have repeated measures capability then the only way around this is to compute percent success – if you do this with just count or count2 then you cannot check for AxB because you only have 4 experiments.  If count and count2 are the results of a run and an actual replicate then you can put the two together – when you do – nothing is significant.  Given the percent values this isn’t surprising.


    Robert Butler

    @[email protected] – One of my pet gripes is the fact that when the folks were developing their terminology for six sigma they took the word “sigma” and applied it to the calculation indicated in my first post.  That formula is the formula for the sigma level which has nothing to do with the sigma of standard deviation fame.  Unfortunately, a number of programs and a lot of verbiage, both posted and published in the professional journals, insist on shortening the term “sigma level” to just “sigma.”

    With regard to this confusion – on the less than worrisome end are posts like our exchange – what can really get interesting is when someone makes a statement about sigma being negative (what they mean is the sigma level is negative) and when someone else, who knows that sigma (as in standard deviation) is the square root of the variance, jumps in and calls the first poster an idiot for trying to claim a negative value for a square root….and so it goes


    Robert Butler

    You might want to check the newsletter option for this site.  As I understand it that is what the newsletter does.  Try contacting the website for additional details.


    Robert Butler

    …one additional thought.  With measurements in parts per billion/trillion you are sure to run up against round-off error in whatever analysis program you are using.  This will be true even if you have (as most programs do today) double precision.  I would recommend you express the measures in scientific notation, drop the 10’s power and run the analysis on what is left.  At the end you can convert everything back to ppb or ppt.
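    A sketch of the rescaling idea with hypothetical ppb measurements: strip the common power of ten before analysis, then restore it at the end.

    ```python
    # Hypothetical measurements in the low ppb range.
    ppb = [3.2e-9, 4.7e-9, 2.9e-9, 5.1e-9]

    # Factor out the common power of ten and analyze the scaled values.
    scale = 1e-9
    scaled = [x / scale for x in ppb]        # work with 3.2, 4.7, 2.9, 5.1

    scaled_mean = sum(scaled) / len(scaled)  # any analysis goes here
    mean_ppb = scaled_mean * scale           # convert the result back at the end
    print(scaled, mean_ppb)
    ```

    The analysis then runs on numbers of order one instead of order 1e-9, which keeps intermediate sums of squares well away from the round-off floor.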


    Robert Butler

    Cpk = minimum (USL-mean, mean – LSL)/(3*std)

    Therefore, in order for Cpk to be negative your mean will have to either be less than the LSL or greater than the USL.


    Sigma = minimum (USL-mean, mean – LSL)/std

    Therefore, if the mean is greater than USL then (USL – mean) will be the smaller value and it will be negative which means Sigma will be negative.

    If the mean is less than the LSL then the minimum value will be (mean – LSL) which will also be negative which means your Sigma will be negative once again.

    Thus, the results that give you a negative Cpk cannot result in a positive Sigma.
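    The two formulas above are easy to encode and check with a hypothetical off-spec process:

    ```python
    def cpk(mean, std, lsl, usl):
        # Cpk = min(USL - mean, mean - LSL) / (3 * std)
        return min(usl - mean, mean - lsl) / (3 * std)

    def sigma_level(mean, std, lsl, usl):
        # Sigma = min(USL - mean, mean - LSL) / std
        return min(usl - mean, mean - lsl) / std

    # Hypothetical process whose mean sits above its USL.
    m, s, lsl, usl = 12.0, 1.0, 5.0, 10.0
    print(cpk(m, s, lsl, usl), sigma_level(m, s, lsl, usl))  # both negative
    ```

    Since both expressions share the same numerator, min(USL - mean, mean - LSL), they necessarily take the same sign, which is the point of the argument above.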


    Robert Butler

    The short answer to your question is – your concern is really of no concern.

    1. From Agresti Categorical Data Analysis 2nd Edition page 3

    ” Variables are classified as continuous or discrete, according to the number of values they can take. Actual measurements of all variables occurs in a discrete manner, due to precision limitations in measuring instruments. The continuous-discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values. For instance, statisticians often treat discrete interval variables having a large number of values (such as test scores) as continuous, using them in methods for continuous responses.”

    …so, go ahead and treat your measures as continuous.

    2. Standard calculations for capability do require the data to be approximately normally distributed.  For those cases where this is not the case one needs to use the methods for calculating capability when the data is not normal. Chapter 8 “Measuring Capability for Non-Normal Variable Data” in Bothe’s book Measuring Process Capability has the details.


    Robert Butler

    It sounds like you are taking some kind of exam or course or something and trying to match whatever is offered on a multiple choice problem – this is fine but there is something you should keep in mind when you are faced with questions like this in the real world and that is the issue of significant digits.

    It is possible to get measurements out to the 100,000th place, or almost any other place for that matter, but when it comes to calculations the rule is the precision of your final result can be no more precise than the precision of the least precise term used in the calculation.  In this case (if we assume 330 mm has actually been measured to 330.0 mm) then when we combine this information with  10.3 this means the final value for sigma should be 1.3 – anything else is just empty precision.


    Robert Butler

    Obviously the problem is assuming a normal distribution of the data.

    The basic formula for Cp in this case is

    Cp = (USL-LSL)/(6*std)

    So, LSL = 330, USL = 330+10.3 and you want a minimum of 1.33 for Cp – therefore plug in the numbers in the above equation, re-arrange and solve for std.
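    Carrying out that rearrangement numerically:

    ```python
    # Rearranging Cp = (USL - LSL) / (6 * std) to solve for the largest
    # allowable standard deviation.
    lsl = 330.0
    usl = 330.0 + 10.3
    cp_min = 1.33

    std_max = (usl - lsl) / (6 * cp_min)
    print(round(std_max, 2))   # about 1.29
    ```

    So the process standard deviation must be no larger than roughly 1.29 mm for Cp to reach 1.33.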


    Robert Butler

    I agree with @Straydog – the issue is that of statistics.  If we assume you have a mathematical background that includes algebra and if, for whatever reason, you can’t take a course in basic statistics then I would recommend working your way through the following books:

    1. A Cartoon Guide to Statistics – Gonick and Smith – I’ve recommended this book to a number of my engineers over the years.  It is a well written book and it does a good job of highlighting and explaining many of the basic statistical concepts.

    Given that you have an understanding of algebra I would recommend you learn the basic principles of least-squares regression.  The best short description I know can be found on pages 8-30 of Applied Regression Analysis, 2nd Edition – Draper and Smith.  If you can borrow this book through inter-library loan you could copy these pages for future reference.

    Once you understand the basic idea of simple regression I would recommend you get a copy of

    2. Regression Analysis by Example – Chatterjee and Price

    and work your way through that book.

    The third book I would recommend would be

    3. Statistical Methods – Snedecor and Cochran – whatever the latest edition might be

    This book is statistical boilerplate – it is written in the form of if-you-want-to-do-this-then-you-will-have-to-run-that.

    You don’t have to read all of #3 but it is a good basic how-to book for a lot of statistical techniques. It takes the time to give the reader an understanding of what the methods do and why you might want to use them.


    Robert Butler

    Things are fine on my end @cseider – hope all is well with you too.  As for the homework – it’s pretty quiet around these parts and I was feeling generous…    :-)


    Robert Butler

    It’s just the sum of the squares of the deviations from the mean divided by the total count minus 1.

    So, for a single item the average is  (10 + 0)/2 = 5

    The deviations from the mean are

    10 – 5 = 5

    0-5 = -5

    The squares of those deviations are

    5^2 = 25

    (-5)^2 = 25

    there are two observations so  2 – 1 = 1

    Therefore the variance per item is (25 + 25)/1 = 50  and the standard deviation is the square root of 50 = 7.07
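Expressed as a short sketch (standard library only), the calculation above is:

```python
import statistics

ratings = [10, 0]                             # two raters scoring the same item
mean = sum(ratings) / len(ratings)            # (10 + 0) / 2 = 5
sq_devs = [(x - mean) ** 2 for x in ratings]  # [25, 25]
variance = sum(sq_devs) / (len(ratings) - 1)  # (25 + 25) / 1 = 50
std = variance ** 0.5                         # sqrt(50), about 7.07

# statistics.stdev uses the same n - 1 (sample) formula
assert abs(std - statistics.stdev(ratings)) < 1e-12
print(round(std, 2))  # -> 7.07
```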


    If this were a real situation, my first reaction would be: why bother with any calculations – better to find out why two different individuals are coming up with ratings that are the exact opposite of one another. As presented, if the two raters are the system, then there is no calibration whatsoever.


    Robert Butler

    I’m not sure where we’re going with this discussion.  The focus of your initial post was that of coil rejection due to defects.  In the follow-up discussion I was left with the impression defect reduction, and thus increased acceptance of coils, was your main concern.  I was also left with the impression that one factor known to be a contributor to defects was improper degreasing.  It was for this reason I suggested looking at defect types and using tools like Pareto charts to help guide the thinking with respect to identifying defect types and the production practices connected to them.

    Your latest post gives the impression this isn’t your goal.  It seems like you are just concerned about your colleagues wanting a defect rating scale while you just want to run a simple defect yes/no count.  Under these circumstances the rating scale amounts to nothing more than colorful jargon and has no value because defect “level” is irrelevant. A defect is a defect and if there is a defect the product will be rejected.


    Robert Butler

    I’ve been giving your problem some thought and I can’t think of any single statistic that will capture consistency both with respect to stability of system variability and constancy of trending.  The issue you are facing is that of an evolution of both variation and trend across time. As far as I know, if time is involved you will have to take it into account which means you will have to do more than examine simple population statistics.  All of the methods I know for tracking consistency of variation and trend across time require graphs and regression/time series analysis.


    Robert Butler

    Based on your post, not only does there not seem to be any need for a 1-10 rating, it doesn’t sound like your colleague has any idea of how to build a meaningful 1-10 rating scale that would be of any value to you.

    What your post does suggest is you have some understanding of why coils are rejected.  For example, you mentioned that a dark surface on the wire is cause for rejection….and what else?

    Assuming you are interested in reducing rejects then the first order of business is to quantify the reasons for rejection over some predetermined period of time and bean count their frequency of occurrence.  Once you have this information you can analyze it and see what you see.

    For example:

    1. Pareto chart the results to identify the most frequent reasons for rejection.  If all reasons for rejection are created equal then you could use that chart as a driver for examination of the process to find the cause(s) of the top two or three reasons for rejection.

    2. Look at the frequency count of reasons for rejection over time – do any of them trend over time (daily, weekly, monthly, quarterly, etc.)?  (I once ran an analysis of manufactured component failures and found a huge AM to PM delta – the reason: the plant wasn’t air conditioned and the temperature delta between morning and afternoon resulted in far more failures for PM production because part of the component production process turned out to be very temperature sensitive.)

    3. Check your raw material receiving – any trending in rejection reasons that appear to coincide with raw material lot changes/supplier changes, etc.

    …and so on.

    So, short answer – if your post is a reasonable representation of your situation then, at the moment, I don’t see any value in a rating scale.

    Changing subjects:

    You said, “I know that there are plenty of tools to use on continuous data and it is easier to work with continuous data. It also follows normal distribution. And one does not need more data points for the analysis of continuous data compared to attribute data.”

    1. There are plenty of tools to use to analyze attribute data too.

    2. Continuous data will follow whatever the underlying distribution of that data happens to be – it could be normal but it could also be a myriad of other things. Continuity of data does not guarantee normality.

    3. Data quantity can be an issue with respect to analyzing continuous vs attribute data but which will need more data will depend on what it is that you are trying to do and the kind of data you have.


    Robert Butler

    The question you are asking is one of a comparison of two proportions.  Specifically you want to know if the difference between 100% and 82.5% is statistically significant for a sample of 315.

    The key is to recognize that the proportions are 315/315 and 260/315.  In other words, with a baseline of 315 in both cases we have a situation where there are 315 presentations and 315 successes and in the second case we have 260 presentations which means there are 260 successes and 55 “failures”.

    As a result, you have a 2×2 table with 0 failures and 315 successes in the first case and 55 failures and 260 successes in the second case.  If you set this up and run the analysis you will find you have a p-value of < .0001 – so, there is a statistically significant difference between 100% and 82.5% for a base of 315 samples.


    Robert Butler

    The short answer to your second post is: there’s nothing like that out there.

    Let’s take another look at what you are asking:

    You said: “I was looking for some current affairs metric that can tell me if something like the historical average growth rate will continue or if a certain ratio will stay stable in the future.”

    If a measurement such as you are requesting existed, the folks who buy and sell stocks would all be multi-billionaires because they would know with certainty which stock was going to do well forever and conversely. This is the reason you will find the following line in every stock offering prospectus:

    “Past performance is no guarantee of future results.”

    The best you can do is analyze data with an eye to identifying process characteristics that will adversely impact your process average growth/stability and, once you have identified those variables, put in place process elements that will minimize/eliminate their adverse impact.  Such an approach should give you some measure of growth rate maintenance/ratio stability for some period of time but not for all time. This is because there is no guarantee those variables you have identified and are controlling are the only variables (both now and in the future) which have a significant impact on your process.



    Robert Butler

    I might be missing something but, as written, your post appears to be expressing two different requirements.  You say you want to predict variation, specifically predict variation trending, however, the examples you give have nothing to do with prediction.  Both CD and CV are strictly summations of current affairs – they don’t have any predictive capabilities.

    For prediction you would need some kind of equation which would relate your process inputs to variation output.  It is possible to build such an item but the effort/cost involved would be substantial.  What you can do is run an experimental design on your process and then analyze the results using the Box-Meyer method to identify those variables that are significantly impacting your variation.  Armed with the results of that analysis you could construct a predictive equation for variation change.


    Robert Butler

    ….an additional thought.  I was going over the case study and looking at the data set for Case #2 and it occurred to me that, if you want to, you can use that data set as a starting point for what would amount to an excellent self teaching exercise.

    In the attachment I’ve rearranged the data to make the composite nature of the design more apparent.  The rows in yellow are the center point replicates and the rows in light blue/gray are the star points.

    From my earlier post you know what the expressions for the reduced models look like and you know how to go about generating the full models.  So now you are in a position to run a lot of what-if scenarios.  For example:

    1. See what happens to the model coefficients when you reduce the number of center points to just 2.

    2. See what happens to the model form when you fractionate the full 2 level factorial design.

    3. See what happens when you randomly drop one or more of the factorial experiments (this happens all the time in real life – one or more of the experiments won’t run, can’t be run, was not run at the levels indicated, etc.) and see what impact that has on the final model, on the coefficients of the remaining model terms, and how severe the loss has to be (how many horses have to die) before backward elimination and stepwise regression converge to different models.

    4. In every one of the above situations you would want to run a complete regression analysis which would mean, among other things, a thorough graphical assessment of the residuals – I would recommend you borrow Chatterjee and Price’s book Regression Analysis by Example to guide you with respect to the world of regression analysis.

    If you try these things and do some reading in the recommended book you should be able to gain a good understanding of DOE.

    Switching subjects.

    In the presentation you provided the author indicated the minimum and maximum values for the factorial part of the design were going to be coded as -1 and 1.  In real life it is rare to have a situation where, for each experiment in the design, you run at exactly the stated minimum and maximum.  What this could mean is the following:  it may be the situation that the data for Case 2 consists of the measurements of the output of each experiment but only records the ideal settings of the X variables in the design.  If this is the case then that would be another reason for my inability to exactly match the author’s predicted settings for minimum and maximum response output.

    1. Goodrich1.xlsx

    Robert Butler

    The way you would determine the minimum and maximum settings shown on page 28 would be to take the regression models for collapse and burst pressure and put them in an optimizing program and run them.  I don’t know Minitab but I do know it has such an option.

    The problem is the author has not provided the models he generated using the data.  Since he did provide the raw data for Case 2 I’m assuming he expects you to use that data to generate models for collapse and burst pressure (assuming you wish to check the validity of his slide)….and thereby hangs a tale….

    Since you have asked and since you have indicated in your post you have been trying to confirm everything in his presentation, I went ahead and made up an Excel file of the data on the last slide and had a go at model building.

    A test of the data matrix using VIF’s (Variance Inflation Factors) and condition indices indicates the design will support a model consisting of all main effects, all curvilinear effects, and all two way interactions.

    Given the differences in the magnitudes of the three variables of interest it is best if, before you start running regressions, you take the time to scale pressure, weld time, and amplitude to the ranges of -1 to 1.  The reason you want to do this is because significant parameter selection can be influenced by parameter magnitude and all you really want is for parameter selection to be influenced only by the degree of correlation.***

    To this end you would compute the following for each of the X variables

    A = (Max X + Min X)/2

    B = (Max X – Min X)/2

    Scaled X = (X – A)/B
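As a sketch, the scaling computation looks like this (the pressure range of 10 to 30 used here is purely hypothetical – the actual minima and maxima come from the Case 2 data):

```python
def scale_to_unit(x, x_min, x_max):
    """Scale x so that x_min maps to -1 and x_max maps to +1."""
    a = (x_max + x_min) / 2  # midpoint of the range
    b = (x_max - x_min) / 2  # half-width of the range
    return (x - a) / b

# Hypothetical pressure range of 10 to 30
print(scale_to_unit(10, 10, 30))  # -> -1.0
print(scale_to_unit(20, 10, 30))  # ->  0.0
print(scale_to_unit(30, 10, 30))  # ->  1.0
```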

    In order to identify the reduced model (the regression model containing only significant terms  – in this case terms with P < .05) you will want to run both backward elimination and stepwise (forward selection with replacement) regression on the design/response matrix.  The reason you want to do this is because you want to make sure both methods converge to the same reduced model.

    Once you have the reduced model in terms of the scaled X’s you will  take these terms and re-run the reduced model using the raw X’s so your final model will have coefficients which are associated with actual X values.  If you do all of this what you get are the following models:

    Predicted Collapse = pcollapse = -0.01985 + 0.00011785*Amplitude + 0.00138*Pressure + 0.05207*Weld_T -0.00001279*Pressure*Pressure

    Predicted Burst = pburstp = -1715.33689 + 10.20952*Amplitude + 69.08662*Pressure + 3832.92683*Weld_T -0.96918*Pressure*Pressure

    The problem is, when you plug in the optimal values for Amplitude, Pressure, and Weld Time, as indicated in the presentation, you do not get the indicated predicted values for Collapse and Burst Pressure.

    As  check I built the full model just for burst pressure

    fmburstp = 1718.56889 – 20.45806*Amplitude + 21.56008*Pressure – 10670*Weld_T

    + 0.18066*Amplitude*Amplitude – 0.92215*Pressure*Pressure + 18780*Weld_T*Weld_T

    + 0.06500*Amplitude*Pressure + 15.00000*Amplitude*Weld_T + 162.50000*Pressure*Weld_T

    The predicted results for the full model for burst pressure are quite close to the values the author reports but they are not exact.

    (pressure, weld time, amplitude) = (18, 0.26, 72) which gives (pcollapse, pburstp, fmburstp) = (0.022869, 945.85, 892.30)

    (pressure, weld time, amplitude) = (22, 0.29, 78) which gives (pcollapse, pburstp, fmburstp) = (0.028612, 1243.38, 1222.66)
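For anyone wanting to reproduce the check, the two reduced models can be coded directly and evaluated at the first of the settings above:

```python
def pcollapse(amplitude, pressure, weld_t):
    # Reduced model for predicted collapse
    return (-0.01985 + 0.00011785 * amplitude + 0.00138 * pressure
            + 0.05207 * weld_t - 0.00001279 * pressure ** 2)

def pburstp(amplitude, pressure, weld_t):
    # Reduced model for predicted burst pressure
    return (-1715.33689 + 10.20952 * amplitude + 69.08662 * pressure
            + 3832.92683 * weld_t - 0.96918 * pressure ** 2)

# First setting: pressure 18, weld time 0.26, amplitude 72
print(round(pcollapse(72, 18, 0.26), 6))  # -> 0.022869
print(round(pburstp(72, 18, 0.26), 2))    # -> 945.85
```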

    My guess is the author did one or both of the following:

    1. He built the full model and did not run the proper regression analysis to identify the reduced models.

    2. He rounded off the values for the optimum pressure, weld time, and amplitude when he wrote the report.

    If he did use the full model then his analysis is wrong. The whole point of a design is to identify the significant parameters and to use the resulting model (once it is verified) for purposes of prediction and control.

    If he rounded off the predicted optimum values then the best predictions you can generate with those values will not give you the Y values shown in his presentation.

    *** If, in this instance, we run backward elimination and stepwise regression on the raw X values the two methods do not converge to the same model.


    Robert Butler

    I might be missing something but my understanding of the scenario you described is as follows:

    a. Quality team takes a 10% sample of the total output.  With a total output of 1000 this becomes a sample of 100.

    b. Quality team does a 100% inspection of the 100 samples and finds 10 defects in that sample.  Therefore, their claim, based on the examination of the 100 samples, is that the quality score is (100 – 10)/100 = 90%.  This value, as I understand your post, is the SLA.

    The second crew takes the same batch of 100 samples, pulls 10 samples from this group and finds 1 defect.  From your post I understand this to be the ATA.

    1. The ATA defect cannot be viewed as an additional defect to add to the 10 found during the 100% check (the SLA) because the first group did a 100% inspection and found a grand total of 10 defects. Therefore all the second group did was find one of the original defects in their 10 sample sub-sample of the 100 and confirm, via the smaller sample, what was found with the complete inspection of the larger sample.

    What I don’t follow is the reasoning concerning the way the ATA sub-sample is taken and assessed.  After all, there is a chance of pulling 10 samples from the original 100 and finding no defects and there is also the chance of pulling 10 samples and finding you have drawn the original 10 defects.


    Robert Butler

    If you are going to re-sample the internal audit sample then the short answer is the sample size is no longer 100, it is 110, and the score is (110 – 11)/110 = 90%. Another way to look at this is you have a sample of 100 – you find 10 defects in that sample.  You dump everything back in the bin and run a re-sample of the same population, taking 10 samples and finding 1 defect. Given what you found with 100 samples this is not surprising and your re-sampling is just a shorthand confirmation of what you found with the larger sample.

    If the people you are talking to insist on only adding an additional found defect to the original defect count and do not take into account the fact of all of the sample in the re-sample then why stop there?  Let’s see, I take the 100 samples and I re-test the 100 samples 10 times. The chances are I’ll get 10 defects each time I do this – therefore, at the end of 10 sample/re-sample cycles I’ll probably have counted 100 defects which, when added together, will give me a defect rate of 100/100 – that is, 0% accuracy.



    Robert Butler


    This sounds like a homework problem but I’m done with my remote work and I’m not really ready to start back in on my latest book read so…

    7 = Xbar1 = the average over 153 days.


    7 x 153 = the sum of the individual measures for the 153 day period (remember: the sum of individual measures/153 = average = 7)

    12 = the desired average over 153 + 94 days

    Therefore the sum of individual measures for the remaining 94 days = Xbar2*94

    and the grand average which is 12 is expressed as:

    [(7 x 153) + (Xbar2 x 94)]/(153+94) = 12

    Solving for Xbar2 we get 20.1
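The rearrangement can be checked in a couple of lines:

```python
xbar1, n1 = 7, 153    # average over the first 153 days
target, n2 = 12, 94   # desired grand average; remaining days

# [(xbar1 * n1) + (xbar2 * n2)] / (n1 + n2) = target  ->  solve for xbar2
xbar2 = (target * (n1 + n2) - xbar1 * n1) / n2
print(round(xbar2, 1))  # -> 20.1
```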


    Robert Butler

    To do things like this you really need to spend the money, purchase some good books on experimental design, and read them and keep them close by as a reference.

    The two I would recommend are:

    1. Understanding Industrial Designed Experiments by Schmidt and Launsby

    2. The Design and Analysis of Industrial Experiments by Davies

    To your point – all composite designs have as their core construct a two level factorial design. This factorial design can be fractionated, as you have done, so the only remaining questions are:

    1. How many center points?

    2. What is the level of the alpha of the star points?

    From the books we have the following:

    To retain orthogonality the number of center points = Nc = 4*(sqrt(Nf) +1) -2k

    Where Nc = number of center points,

    Nf = number of experiments in the factorial = 16 – since you are running a fractional factorial

    k = number of factors in the design = 5

    Therefore Nc = 4*(sqrt(16) +1) – 2*5 = 10

    The level of alpha for the star points of a rotatable composite is

    alpha = (Nf)**(1/4) = (16)**(1/4) = 2.

    Thus, the star points are located at +-2 for each of the factors which means you will have 10 additional runs where the minimum and maximum alphas are two times the minimum/maximum levels for each of the factors in the fractional factorial.

    So, if you want a perfectly orthogonal rotatable composite design with a 2**(5-1) fractional factorial core you will need 16 + 10 + 10 = 36 design points.  That is a lot of experimentation and if you are pressed for time and/or the cost per experiment is high it may be far more than you can run.
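The two bookkeeping formulas above can be sketched as:

```python
import math

k = 5               # number of factors
nf = 2 ** (k - 1)   # runs in the 2**(5-1) fractional factorial core = 16

nc = 4 * (math.sqrt(nf) + 1) - 2 * k  # center points needed for orthogonality
alpha = nf ** 0.25                    # star-point distance for rotatability
n_star = 2 * k                        # two star points per factor

total = nf + int(nc) + n_star
print(int(nc), alpha, total)  # -> 10 2.0 36
```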

    A possible alternative:

    1. If your situation is that of just starting out then instead of trying to do everything at once (all mains, all two ways, all curvilinear) a better approach from the standpoint of time/money/effort would be to run a near saturated design (5 factors in 8 experiments) with two center points for replication and analyze the results of this design and see what you see.

    This approach buys you a couple of things:

    a. First – it gives you a check to see if there is any point in investigating all 5 factors and if there is any need to spend time looking for curvilinear behavior.  If you run this smaller design and find a) all five factors matter and/or b) there is serious curvilinear behavior present then, you can augment your basic design with additional points and fill out the full composite design.

    b. It provides a reality check with respect to minimum and maximum settings for the design space.  I could bore you senseless with story after story about how everyone KNEW (I mean they REALLY KNEW) we could run the process at the specified minimum and maximum levels for all of the variables of interest in the design. Unfortunately, when we went to do this, Physical Reality raised its ugly head and said, “No way – go back and try again.”  In short, running the smaller design is great way to make sure you can really investigate the things that are of interest.  If the smaller design works with the levels of interest, all well and good and if it doesn’t you haven’t wasted valuable time/money/effort. Also if things don’t work, you can regroup, use what you have learned, and build a design with new minima and maxima for each variable that will permit an investigation.

    2. Perfect orthogonality is a nice idea but, in practice, even with the number of runs listed above, you won’t achieve it for the simple reason that it is highly unlikely you will run each and every design point at the exact specified minimum or maximum settings for every experimental condition.  In those situations where I have actually run a composite design I’ve cut the number of center points to 3-4.  The design is a touch non-orthogonal but the degree of non-orthogonality is such that it doesn’t have any major impact on the results of the analysis.





    Robert Butler

    If your question is how do you get the percentages at the bottom of the page the answer is you need information that is not present on the spreadsheet you provided. In order to get the percentages at the bottom you need to know how the efforts are apportioned to each of the consultants. The best you can do with what you have is compute total utilization percent for each customer with all of the consultants lumped together…and even that calculation has issues since the footnote is far too cryptic and also does not provide sufficient information.


    1. the sum of the efforts for customer A is 402.

    2. According to the footnote   Utilization = sum of all efforts against L1 or L2 consultant/available effort per shift (480)

    ….what exactly does this mean?

    If we assume the above means:

    Utilization = sum of all efforts for all consultants/available effort per shift (480)

    Then – for customer A you have 402/480 x 100 = 83.75% which is somehow divided among 4 consultants in such a way that the total consultant utilization sums to 92% and that the 42% is split among 3 consultants while a single consultant accounts for 50% …and then there is the question concerning the remaining 8% of their collective time.

    3. But….

    a. 480 minutes = 8 hours

    b. If I have 4 consultants for customer A this would mean I actually have 1920 minutes for that shift.

    Therefore – for customer A you have 402/1920 x 100 = 20.94% which is somehow divided among 4 consultants such that the split on the 20.94% is 42% and 50%.
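The two competing interpretations of the utilization calculation can be laid out side by side:

```python
total_effort = 402    # sum of efforts for customer A (minutes)
shift_minutes = 480   # available effort per shift per consultant
n_consultants = 4

# Interpretation 1: all efforts measured against a single 480-minute shift
util_single = total_effort / shift_minutes * 100
# Interpretation 2: 4 consultants -> 4 x 480 = 1920 minutes available
util_pooled = total_effort / (n_consultants * shift_minutes) * 100

print(round(util_single, 2), round(util_pooled, 2))  # -> 83.75 20.94
```

Neither interpretation reconciles with the 42%/50% consultant split in the spreadsheet, which is the point: the footnote does not carry enough information to answer the question.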

    So, the short answer to your question is this – based on what you have provided you cannot answer the question you asked.



    Robert Butler

    @cseider – well, welcome back…question – your elapsed time for your most recent group of responses is 62 minutes, 57 minutes, 14 minutes, 8 minutes, and 7 minutes….what happened between 57 and 14?  Lunch break?  :-)

    Take Care

    P.S. where I work the IT department is really on the ball as far as fighting Covid is concerned.  This morning I logged in to work and shortly after I signed on I happened to sneeze. A split second later the computer started running an anti-virus program.


    Robert Butler

    I’m not sure I understand what you mean by “statistical tools to create a baseline.”  Baseline data is just that – data gathered from a process before you attempt to make changes.  As for the data gathering itself – that is an entirely different matter.

    1. How often – I’ll take everything I can get with as much detail as I can get. This means if it is hourly – I’ll take hourly, if the best anyone can do is daily – I’ll take that.  The thing to remember is this – you can always take whatever data you may have and make it coarser but you can’t do the reverse.  In other words, it is easy to convert hourly into daily or weekly or whatever but you can’t go the other way. To put it another way – if you are given continuous data in any form – keep it – do not record the data as binned data – after you have entered the data you can have the computer bin the data any way you want.

    2. How far back in time – that will depend on a number of things and you will have to get the answers to those questions before you start gathering the data.

    a. Are the measurements for whatever it is that you are measuring the same across time?  If not then there will be a time horizon beyond which it won’t make much sense to gather additional information. For example, for the last two years you have been taking measurements in millimeters/second, however, prior to that you were taking measurements in furlongs/fortnight. In this case your data horizon is two years.

    b. Have there been any major changes to the way the process was run? Changes in measurement methods, switches in sources of major raw material suppliers, major changes/upgrades in machinery used in production, major changes in operation protocol, etc.

    For example, up until last year we ran the process by looking at the settings of three gauges twice a day and made adjustments based on those readings. A year ago we went to hourly readings and adjustments and six months ago we went to automated continuous feedback.  The time horizon is now 6 months.

    Consequently, If you have a situation where both a and b are true then there’s no point in going back any further than 6 months.

    There are other issues but, in my experience, these are the big ones.


    Robert Butler

    In summary:

    1.    [The defects are] not deep enough emboss thickness, not enough stretching, or telescoping or collapsing rolls (and what are telescoping and collapsing – are these other ways of expressing not enough stretching or are they something else?)

    2.    We know that a majority of these defects are caused by incorrect tension during the winding portion of the embossing process.

    3.    One thought I had is to measure tension, unwind and rewind motor outputs, and some other machine outputs (temperatures and pressures)… to determine which machine output has the greatest effect on the stretching of the film


    1.    How do you know what constitutes a majority of the defects?  Have you done a bean count of defect types and summarized your findings with a pareto chart or some other method to make sure you know which defect occurs the most often?

    a. are all defects created equal – that is regardless of the defect the cost in terms of lost time/product/etc. is the same or is there an order to defect severity?

    2.    How do you know this majority is connected to incorrect tension?

    a.    Do you have required settings for tension?

    b.    If so how were these settings determined?

    c.     How well are they controlled?

    3.  Do you have prior production data where you have simultaneous measures of tension and defect type and counts?

    4.  For the third summary point – are the listed variables the ones you wanted to look at in your design or are they something else?

    5.  You say you have a new way to check the tension. Can you run the new and old side by side without stopping production?

    6.  If you are going to gather the data on the variables for the third summary point how do you plan to record the defects – type and frequency so you have some assurance of a connection between the process parameters and the defects?

    If you can provide the answers to the above then perhaps I or someone else might be able to offer some suggestions.


    Robert Butler

    It looks like your plotting routine for the histogram is running some kind of default binning which is giving you a false impression of your data.  If you look at the linear plot of the data the histogram should have vertical bars at the same places -15, -10, -5, 0 , 5, 10, 15 but it doesn’t. Rather the histogram looks like the bars are at (maybe) -15, -10, -5, 0(?), 6, 10, 14.

    If you know the actual data counts for the measures at -15, -10, -5, 0 , 5, 10, 15 just make a two column data set with -15, -10, -5, 0 , 5, 10, 15 in one column and the associated counts in the other and make a histogram using that data set.  Given that the data is very granular it still looks like the shape of the histogram should be acceptably normal.

    After re-reading your initial post I guess the question I have is what is it that you wanted to do with the data?  If you are thinking about process control you could build an individuals control chart using the data but depending on what it is that you are trying to investigate that may not be what you really want to do.

    Your focus on differences from an ideal suggests your real concern might be one of checking for random vs non-random trending in the differences over time.  Since everything is in increments of 5 it might be worth considering examining your data looking at runs above and below the median. The question you would answer with an analysis of this type is – are the differences above and below the median random over time or is there a pattern.  If a pattern occurred it would tell you there is special cause variation present and you need to investigate why.

    Most good basic statistics text books will have the details for assessing runs above and below the median and the topic in the textbook will have the same title.
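As a rough sketch of the bookkeeping involved (a real analysis should follow the textbook treatment, which also supplies the significance tables), here is the run-counting step together with the expected number of runs under randomness:

```python
import statistics

def runs_above_below_median(data):
    """Count runs above/below the median (values equal to the median dropped)."""
    med = statistics.median(data)
    signs = [x > med for x in data if x != med]
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n1, n2 = signs.count(True), signs.count(False)
    expected = 2 * n1 * n2 / (n1 + n2) + 1  # expected runs if the sequence is random
    return runs, expected

# A strongly patterned sequence: far fewer runs than expected suggests trending
data = [-15, -10, -15, -10, 10, 15, 10, 15]
print(runs_above_below_median(data))  # -> (2, 5.0)
```

Two observed runs against an expected five is the kind of gap that would send you looking for a special cause.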



    Robert Butler

    @Straydog is correct – no subgrouping here just the individual data points.  Once you have a histogram built as he recommended (I’d also recommend plotting the data on a normal probability plot) post the graphs here and perhaps I or @Straydog or someone else may be able to offer additional thoughts.


    Robert Butler

    The central limit theorem applies to distributions of means so, the fact that the distribution of your means passes the Anderson-Darling test isn’t too surprising.

    I’m not trying to be nasty or mean spirited here but the bigger issue is a basic mistake you have made – your post suggests you didn’t take the time to really examine your data. As written, all you have done is taken some data, dumped it into a normality test, found the test indicates the data is non-normal, and decided that it is, in fact, non-normal to a degree that really matters with respect to what you are doing.

    The Anderson-Darling test, along with the other normality tests, is extremely sensitive to any deviation from ideal normality.  Indeed it is possible for data taken from a generator of random numbers with an underlying normal distribution to fail one or more of these tests.

    Before you do an analysis you need to examine your data and that means the very first thing to do is plot the data in any way that is meaningful and see what you see. In this case, the minimum you should do is generate a histogram, a normality plot, and a boxplot of the data.  Given what you are doing I’d also run a time plot of the data to see if there is any obvious underlying trending over time.

    Once you have the normality plot you should look at it to see if it deviates from the reference straight line and then apply what is termed “the fat pencil test”.  This amounts to checking to see if the plotted data could be covered by a fat pencil laid along the reference line.

    The various plots will tell you the following:

    1. The histogram will give you an idea of the overall shape of the distribution of individual points.  The questions you would want to investigate with this plot would be:

    a. Is the plot unimodal? – If it is bimodal then you have work to do.

    b. Does the overall plot provide a visual appearance of something that is approximately normal?

    c. Is there any data that gives a visual impression of being outliers?  Are there just a couple or are there a lot (yes, this is a judgement call)?

    d. If there are just a few visual outliers, drop them from consideration and re-run your plots and your statistical tests. What do you see? Does everything change or do things stay much as they were?

    2. The normal probability plot will give you a much better sense of the approximate normality of the data.

    a. If the plot doesn’t approximate a fit to the reference straight line but veers off sharply at either the low or the high end, or exhibits clean breaks in the data with the subsets approximating straight lines that have large differences in slopes, then you have work to do.

    b. On the other hand, if the data is randomly scattered about either side of the line or if it is staying in the relative vicinity of the line but randomly drifting above and below the straight line then it is safe to assume the data approximates normality to an acceptable degree.

    3. The boxplot will not only give you a good sense of the distribution shape it will also clearly characterize the behavior of the data in the tails as well as in the central part of the data distribution.

    4. A time plot will tell you if your process is changing over time.
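    As a rough sketch, the four plots above can be generated in a few lines of Python. The data here is synthetic and seeded so the example is self-contained; substitute your own measurements:

```python
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")          # file output only; no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=100)   # stand-in for your measurements

fig, ax = plt.subplots(2, 2, figsize=(10, 8))
ax[0, 0].hist(data, bins=15)                   # 1. overall shape / modality
ax[0, 0].set_title("Histogram")
(osm, osr), (slope, intercept, r) = stats.probplot(data, plot=ax[0, 1])
ax[0, 1].set_title("Normal probability plot")  # 2. fat pencil test goes here
ax[1, 0].boxplot(data)                         # 3. tails and central behavior
ax[1, 0].set_title("Boxplot")
ax[1, 1].plot(data, marker=".")                # 4. trending over time (run order)
ax[1, 1].set_title("Time plot")
fig.tight_layout()
fig.savefig("diagnostics.png")

# r is the correlation of the points with the reference line -- a rough
# numerical companion to the fat pencil test (closer to 1 = straighter).
```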

    There are many individuals who view plotting data as something only a child should do. They find the idea of plotting data before running an analysis to be somewhat insulting and beneath their dignity. What they fail to understand is that a meaningful graph IS a statistical analysis.

    I’m a statistician and I could bore you to tears with story after story about running an analysis that began and ended with basic plots of the data – in other words – once my engineer/doctor/line manager/scientist/division head/CEO/technician looked at the plots I had generated the project ended because the graphs told us everything we needed to know with respect to the source of the problem and its solution.

    Some recommended reading:

    In order to get some idea of how, non-normal, data generated using a random number generator with an underlying normal distribution can look I would recommend you borrow a copy of Fitting Equations to Data – Daniel and Wood and look at the probability plots in the appendices of Chapter 3.

    In order to get some idea of the power of real graphs I would recommend you borrow all four of Tufte’s books on graphs and graphical methods (The Visual Display of Quantitative Information, Visual Explanations, Envisioning Information and Beautiful Evidence) and “read” them. I put “read” in quotes because, while there is text in his books, the books are graphs, all kinds of graphs, and he provides the reader with a very clear visual understanding of graphical excellence and what a proper graph can do.


    Robert Butler

    I think you are still misunderstanding the issue of sample size. Your statement “…covers 80% of the population with 95% confidence…” has no meaning.

    Let’s pretend you have done all of the usual things you need to do with respect to visually inspecting the distribution of those 100 samples (histogram, data on a normal probability plot, time plot of data, etc.) and what you have observed is a uni-modal distribution which is “acceptably” symmetric (this being a judgement call) and whose time plot is “acceptably” random (no obvious trending over time).

    Given this the next step would be to compute the sample mean of those 100 samples and its associated standard deviation.

    Armed with this information you can address any of the following questions:

    1. How small a sample would I need to compare the mean of a new sample to the mean of the existing sample and be certain with a probability of 95% that the mean of the new sample was not significantly different from the mean of the 100 sample baseline?

    2. How small a sample would I need to compare the standard deviation of a new sample to the standard deviation of the existing sample and be certain with a probability of 95% that the standard deviation of the new sample was not significantly different from the standard deviation of the 100 sample baseline?

    3. Given the 100 sample mean and the 100 sample standard deviation what kind of a spread around the sample mean would encompass 95% of the data – or, in your case, what kind of a spread around the sample mean would encompass 80% of the data where the data is:

    a. existing/future individual samples
    b. the means of samples of size X where X is the count of a new sample whose total is something less than 100
    c. the means of future samples of the same size as my original (100 samples)

    Given what you have posted my guess (and I could be wrong about this) is question 3a is the one you want answered.

    If you want to know the expected range of individual measurements around the mean of the results from your 100 samples then the equation would be:

    #1 Range = sample mean of the 100  +- (1.987*sample standard deviation/sqrt(n))  where n is the sample size. Since you are interested in the estimate of the range for a single sample, n = 1.  The 1.987 is an interpolation of the fractional points of a t distribution, where for 95% the value for 60 samples is 2.00 and the value for 120 samples is 1.98.

    So, the estimate of the range of single values for 80% of the population would be:

    #2 Range = sample mean of the 100  +- (1.291*sample standard deviation/sqrt(n))  where, again, n = 1 and 1.291 is the interpolation of the fractional points of a t distribution where, for 80%,  the value for 60 samples is 1.296 and the value for 120 samples is 1.289.

    Thus, if you took a new measurement from the same process and you wanted to be sure that single measurement could be viewed as coming from the central 80% of the distribution of your process output (as defined by your 100 sample check) you would examine it relative to the #2 Range – if it was inside those limits then you would have no reason to believe it was not a sample which was representative of the central 80% of your process output.
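    If you have Python with scipy available, the interpolated t values in #1 and #2 can be pulled directly instead of read off a table. This is just a sketch; the mean and standard deviation below are placeholders for your own 100-sample estimates:

```python
from scipy import stats

n = 100
df = n - 1

# Two-sided t multipliers: 95% leaves 2.5% in each tail, 80% leaves 10%.
t_95 = stats.t.ppf(0.975, df)   # ~1.984, close to the interpolated 1.987
t_80 = stats.t.ppf(0.90, df)    # ~1.290, close to the interpolated 1.291

def individual_range(mean, sd, t_mult):
    """Range for a single new observation (n = 1 in the denominator)."""
    return mean - t_mult * sd, mean + t_mult * sd

# Hypothetical baseline: mean = 50.0, sd = 2.0 (substitute your own).
low, high = individual_range(mean=50.0, sd=2.0, t_mult=t_80)
```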


    Robert Butler

    As written, the answer to your question is to sample 80% of the data – there’s no estimation involved.

    Sample size questions focus on moments of a distribution (things like means, standard deviations, etc.) of a sample and are phrased in the following manner: How many samples do I need to take in order to be certain that the mean/standard deviation/percent defective/percent response/percent change/etc. of the sample is not significantly different from a target?  Where the target can be things like – customer specified  means or standard deviations, gold standard targets, moment measures of an earlier sample from the process before we made a change, etc.

    So, if you are interested in sample sizes then your question needs to be recast – for example:

    I have 100 batches of process data. How many samples will I need to take to be 80% certain that my sample mean is within plus/minus some range of a target mean?
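    The recast question can be answered numerically. The sketch below assumes a planning standard deviation and margin (both placeholders) and iterates because the t multiplier itself depends on n:

```python
from math import ceil
from scipy import stats

def n_for_margin(s, E, confidence=0.80):
    """Iterate n = (t * s / E)^2 until it stabilizes (t depends on n)."""
    n = 2
    for _ in range(100):
        t = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
        n_new = ceil((t * s / E) ** 2)
        if n_new == n:
            return n
        n = max(n_new, 2)
    return n

# Hypothetical planning values: sd = 2.0, want the mean within +/- 0.5
# of target with 80% confidence.
n = n_for_margin(s=2.0, E=0.5, confidence=0.80)
```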



    Robert Butler

    @KatieBarry Ah Ha! So it IS resurrect old threads month!!!  :-)

    I’m sorry to have to admit this but I don’t think I’ve ever checked the newsletter. Thanks for the heads up.


    Robert Butler

    @KatieBarry – is this resurrect old threads month?  This isn’t a complaint – just wondering.


    Robert Butler

    Minitab is great so I’m sure they appreciate the 2020 tip of the hat in their direction but this thread has been dormant since 2013 so the original poster might be a bit slow in responding.  :-)


    Robert Butler

    There isn’t a simple guideline. Which one of  the design methods you use will depend on what it is that you want to do and the kinds of variables you have.

    For example:

    1. Are the variables of interest continuous (or can they be treated as continuous), are they nominal, or are they a mixture of the two?

    2. Are you trying to assess mixtures?

    3. Are there known combinations of independent variables that either cannot be physically run or if running them would result in something you know will be other than you will need for the final results of an experiment?

    4. What can you afford in terms of time/money/effort?

    5. Are there restrictions with respect to randomizing the run order of your design which will require you to block your design on certain variables?

    6. Is this going to be an initial investigation with a number of variables whose impact is: well defined, not well understood, of some interest because we suspect it might be important, included out of curiosity, or maybe a mix of all of the preceding.

    7. Is this a situation where you are fairly certain of the effect of the variables of interest and are interested in seeing if there is anything better within the current parameter space?

    8. Do you expect the relationship between the variable level and the output to be something other than a simple linear one?  If so do you have the time/money/effort/desire to check for all possible curvilinear behavior or only just a select few?

    9. Do you think there might be some synergistic effects (interactions) between certain variables?  If so which ones?


    Once you have the answers to questions like these then you can press on with a discussion concerning the merits of the various choices for DOE.  So, if you have something in mind please provide the answers to the above questions and either I or someone else will be happy to provide additional information/advice.



    Robert Butler

    If you have multiple responses then presumably you have a predictive equation for each of those responses.  If that is the case then you should have a minimum and a maximum level of desirability for each response.  If this is true then take your equations – put them in a program, assemble the matrix of X’s and run the equations against the matrix.  Next, tell the machine to sort through the predictions with the restrictions on minimum and maximum acceptable criteria for all of the responses (this done simultaneously) and find those that fall within your specifications.

    What you will have will be a matrix consisting of settings and the corresponding predicted responses. Since it is very unlikely you will find a combination of X’s that gives you the best of everything you will have to look over the predictions and decide for yourself which set of X’s results in a group of predicted responses that are the “best”.
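    The "run the equations against the matrix and filter" step might look like the following sketch. The two response equations and the spec windows are made up for illustration; substitute your own fitted models and acceptance limits:

```python
import numpy as np

# Matrix of candidate X settings over the design space.
x1 = np.linspace(0, 10, 21)
x2 = np.linspace(0, 5, 11)
X1, X2 = np.meshgrid(x1, x2)
grid = np.column_stack([X1.ravel(), X2.ravel()])

# Hypothetical predictive equations for the two responses.
y1 = 2.0 + 0.8 * grid[:, 0] - 0.3 * grid[:, 1]
y2 = 10.0 - 0.5 * grid[:, 0] + 1.2 * grid[:, 1]

# Keep only settings meeting BOTH specs simultaneously.
ok = (y1 >= 5.0) & (y1 <= 9.0) & (y2 >= 8.0) & (y2 <= 12.0)

# Columns: x1, x2, predicted y1, predicted y2 -- the table you then
# inspect by hand to pick the "best" compromise.
candidates = np.column_stack([grid[ok], y1[ok], y2[ok]])
```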


    Robert Butler

    So if I’m understanding correctly the question you are asking is how to generate a fit to the data in your plot that will be in the form of a predictive equation.

    If that’s the case then, based on just looking at the plots, my first try would be a simple linear regression using the terms pipe motion, the square of pipe motion, stiffness and the interaction of stiffness and pipe motion and/or the interaction of stiffness and pipe motion squared.

    The other first try option could be a form of one of Hoerl’s special functions with an additional interaction term.

    The linear form would be ln(vessel motion) = function of ln(pipe motion), pipe motion, stiffness and an interaction term of either stiffness with pipe motion or stiffness with ln(pipe motion).

    For every attempt at model building you will need to run a full residual analysis.  The plots of the residuals vs the predicted, independent variables, etc. will tell you what you need to know as far as things like model adequacy,  missed terms, influential data points, goodness of fit, etc.

    Repeating this exercise with other terms that the residuals might suggest should ultimately result in a reasonable predictive model – be aware – any model of this type will have an error of prediction associated with it and your final decision with respect to model accuracy will have to take this into account.
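    A minimal sketch of that first-try model, using synthetic data in place of the real pipe-motion measurements, might look like this (the coefficients and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
pipe = rng.uniform(0.5, 5.0, 60)              # stand-in pipe motion data
stiff = rng.choice([10.0, 50.0, 100.0], 60)   # stand-in stiffness levels
vessel = (1.0 + 0.6 * pipe + 0.05 * pipe**2 + 0.02 * stiff
          + 0.004 * stiff * pipe + rng.normal(0, 0.1, 60))

# Design matrix: intercept, pipe, pipe^2, stiffness, interaction.
X = np.column_stack([np.ones_like(pipe), pipe, pipe**2, stiff, stiff * pipe])
coefs, *_ = np.linalg.lstsq(X, vessel, rcond=None)

predicted = X @ coefs
residuals = vessel - predicted   # plot these vs predicted, vs each X, etc.
```

    The residuals are the starting point for the full residual analysis described above; nothing about the fit is trusted until those plots have been examined.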

    If you don’t know much about regression methods you will need to borrow some books through inter-library loan.

    I’d recommend the following:

    Applied Regression Analysis – Draper and Smith – read, understand, and memorize the first chapter – the second chapter is just the first chapter in matrix notation and may not be of much use to you. You will need to read, thoroughly understand and do everything listed in Chapter 3 – The Examination of the Residuals.

    Regression Analysis by Example – Chatterjee and Price – an excellent companion to the above and, as the book title says – it provides lots of examples.

    Fitting Equations to Data – Daniel and Wood – from your standpoint the most useful pages of the book would probably be pages 19-27 (in the second edition – page numbers might be slightly different in later editions) for the chapter titled “One Independent Variable” – yes, I know, you have 2 variables – but the plots and the methods will help get you where you want to go.

    You will also want to follow what these books have to say about models with two independent variables and interaction terms (also known as cross-product terms).



    Robert Butler

    I’m missing something.

    The idea of an experimental design is that you have little or no idea of the functional relationship (if any) between variables you believe will have an effect on the output and the level(s) of the output itself.  You put together a design which consists of a series of experimental combinations of the independent variables of interest and you then go out into the field/factory/seabed floor/ hospital OR/whatever and physically construct the various experimental combinations, run those combinations, get whatever output you may get from the experimental run, and then use that data to construct a model.

    It sounds like you already have a model which (I assume) is based on physical principles and mirrors what is known about seabed stiffness/pipe motion behavior.  If you already have a model (which your graph suggests is the case) then all that is left is either a situation where you want to go out on the seabed and run some actual experiments with pipe motion to see if the current model matches (within prediction limits) what is actually observed or a situation where you want to run some matrix of combinations of seabed stiffness and pipe motion just to see what the model predicts.

    If it is the former situation then the quickest way to test your model would be to actually run a simple 2×2 design where you have two seabed conditions and two levels of pipe motion and where you try to find settings that are as extreme as you can make them.  If it is the latter then, unless it is very costly to run your model, I don’t see why you wouldn’t just run all of the combinations and see what you see.  The problem with the latter is, if you already have an acceptable model, I don’t see the point of running all of the calculations since it doesn’t sound like you have any “gold standard” (actual experimental data) for purposes of comparison.

    Shifting subjects for a minute.  Let’s just talk about DOE.

    You said, “My question is that: how can I define different levels for each factor (e.g. 6 levels for factor 1, and 11 levels for factor 2) in the DOE method? which method in this theory is beneficial for my project?”

    I have searched around these topics and some guys said me: ‘the “I-Optimal” method of the RSM is helpful for your project. because in this method you can define several levels for each factor and there are not any issues if levels of each factor are not the same as other factor’s levels’.”

    If what you have written above is an accurate summary of what you were told then it is just plain wrong.

    If the graph you have provided really mirrors what is going on then you need to ask yourself the following question:

    Why would I need all of those levels of stiffness and pipe motion?  Consider this – two data points determine a straight line, 3 data points will define a curve that is quadratic in nature, 4 data points will define a cubic shaped curve, 5 data points will define a quartic curve, and 6 will give you a quintic.  The lines in your graph are simple curves – no inflections, no double or triple inflections – in short no reason they couldn’t be adequately described with measurements at three different levels and thus, no reason to bother with more than three levels.
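    The points-determine-a-polynomial observation is easy to verify numerically. In this sketch (arbitrary made-up points) a quadratic fitted through any three points reproduces them exactly, which is why three well-spaced levels suffice for a simple curve:

```python
import numpy as np

# Three arbitrary points: three data points, three quadratic coefficients,
# so the fit is exactly determined and passes through every point.
x = np.array([1.0, 5.0, 10.0])
y = np.array([2.0, 3.5, 9.0])

coeffs = np.polyfit(x, y, deg=2)
recovered = np.polyval(coeffs, x)   # exactly equal to y (to rounding)
```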

    As for the notion concerning I-optimality (or any of the other optimalities) – they are just computer aided designs.  The usual reason one will opt for a computer aided design is because there are combinations of independent variables that are known to be hazardous, impossible to run, or are known to provide results that will be of zero interest (for example – we want to make a solid – we know the combination of interest will only result in a liquid) and if we try to examine the region of interest using one of the standard designs we will wind up with one or more experiments of this type in the DOE matrix.

    Finally, I’ve never heard of any design that requires all variables to have the same levels.  If this was said then my guess is what was meant was when you are running a design it is not necessary for, say, all of the high levels of a particular variable to be at exactly the same level.

    If that was the case then, yes, this is true – but this is true for all designs – basically, as long as the range of “high” values for the “high” setting of a given variable do not cross over into the range of the “low” settings for that variable you will be able to use the data from the design to run your analysis.  Of course, if you do have this situation then you will need to run the analysis with the actual levels used and not with the ideal levels of the proposed design.





    Robert Butler

    Maybe we are just talking past one another but I’m not sure what you mean by “the numerical output from the process creates the shape of the model.”

    A random variable is a function that takes a defined value for every point in sample space. – Statistical Theory and Methodology in Science and Engineering – Brownlee – pp. 21.  Given this I would say the process is the random variable and the output is just a quantification of what the random variable is doing. In other words – the random variable is going to determine the output distribution and all the numerical record is doing is cataloging that fact.


    Robert Butler

    Based on what you have posted the short answer to your question is – no – DOE won’t/can’t work in this situation.

    The factors in an experimental design require either the ability to change the levels of each factor independently of one another or, in the case of mixture designs, to vary ratios of the variables in the mix.  In your situation you do not have the ability to change stiffness and pipe motion independently of one another – indeed it does not sound like you have the ability to change either of these variables in the real world setting.

    The point of a DOE is to provide outputs that are the result of controlled, organized changes in inputs.  The DOE does not predict anything it only acts as a means of data gathering and once that data has been gathered you have to apply analytical tools such as regression to the existing data in order to understand/model the outputs of the experiments that were part of the DOE.


    Robert Butler

    I’m not sure what you mean by the regular rules.  If you mean the issue of control limits based on standard deviations about the mean then you can still use them.  The issue is one of expressing the data in a form that will allow you to do this.  When you have an absolute lower bound of 0 the usual practice is to add a tiny increment to the 0 values, log the data, find the mean and standard deviation in log units and then back transform.  What you will get will be a plot with asymmetric control limits.  Once you have that you can press on as usual.
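    A sketch of that add-an-increment / log / back-transform recipe, with synthetic zero-bounded data and the usual 3-sigma convention (both illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.lognormal(mean=1.0, sigma=0.4, size=50)
data[::10] = 0.0                       # some true zeros at the lower bound

eps = 0.001                            # tiny increment so log(0) is defined
logged = np.log(data + eps)

center_log = logged.mean()
sd_log = logged.std(ddof=1)

# Back-transform: the limits come out asymmetric in original units.
lcl = np.exp(center_log - 3 * sd_log) - eps
ucl = np.exp(center_log + 3 * sd_log) - eps
center = np.exp(center_log) - eps
lcl = max(lcl, 0.0)                    # a zero-bounded value cannot go below 0
```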


    Robert Butler

    That would make sense if the aspect of the supplements you believe most impacts the outcome is the percentage of total nitrogen, otherwise it won’t matter.

    The idea is if total nitrogen was understood to be the critical component of the supplements you would choose the two supplements with the lowest and highest levels of total nitrogen and run the analysis on the basis of nitrogen content. This would allow you to treat the supplements as continuous.

    Another option would be sub-classes of raw materials and supplements.  In that case you might have some raw materials with an underlying continuous variable of one kind and others with a different one.  If you have this case then you would do the same thing as mentioned above.  Of course, this would mean building experiments with combinations of raw materials and supplements.  My guess would be you couldn’t do this with the catalysts so they would have to remain categorical variables.  However, if you can express the raw materials and supplements as functions of critical underlying continuous variables and can mix them together then the number of experiments would be drastically reduced.

    For example, let’s say your raw materials fall into three groups with respect to a critical continuous variable and your supplements can be grouped into 4 separate groups – you could put together an 8 run saturated design with these and then run the saturated design for each one of the separate catalysts. That would give you a 32 run minimum.  If you tossed in a couple of replicates (I’d go for a random draw from each of the 4 saturated designs so I had one replicate for each catalyst) you would have 36 runs and you could get on with the analysis.


    Robert Butler

    If you have a situation where you can only run one raw material with no combinations of other raw materials, and similarly for the catalysts and the supplements, and you have no underlying variables that would allow you to convert them to continuous measures, then you are stuck with running all the possible combinations, which would be 480.  I understand the need for blocking, so what you will want to do is draw up the list of the 480 combinations and put them in order on an Excel spreadsheet.  Go out on the web, find a random number generator that will generate without replacement, and generate a random number sequence for 1-480.  Take that list, enter it in a second column of the Excel spreadsheet so there is a 1-1 match between the two columns, sort on the second column so 1-480 are in order, and use that template to guide your selection of experiments (the first 10, then the second 10, etc.)

    As a check it would be worth randomly adding into the list a few of the experiments (again do a random selection) to permit a check with respect to run-to-run variability.
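    The Excel shuffle described above can also be done directly in a few lines of Python; a sketch with a seeded permutation of the 480 combinations plus a few randomly chosen repeats:

```python
import itertools
import random

raw = [f"A{i}" for i in range(1, 16)]      # 15 raw materials
cat = [f"C{i}" for i in range(1, 5)]       # 4 catalysts
sup = [f"B{i}" for i in range(1, 9)]       # 8 supplements

combos = list(itertools.product(raw, cat, sup))   # 15 * 4 * 8 = 480

random.seed(42)                            # seeded so the order is reproducible
run_order = random.sample(combos, len(combos))    # shuffle without replacement

# A few randomly selected repeats for the run-to-run variability check.
repeats = random.sample(run_order, 5)
full_plan = run_order + repeats
```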




    Robert Butler

    You will have to provide some more information before anyone can offer much of anything in the way of advice.  You said, “I have 15 difference raw materials (A1, A2… A15), 4 catalysts (C1, C2, C3, C4) and 8 supplements (B1, B2…B8). ”


    1. Is this a situation where the final process can have combinations of raw materials, catalysts and supplements or is this a situation where you can only have a raw material, a catalyst and a supplement?

    2. If this is the first case and you have absolutely no prior information concerning the impact of any of the raw materials, catalysts, or supplements on your measured response and you have no sense of any kind of a rank ordering of raw materials, catalysts, or supplements as far as what you think they might do then your best bet would be to construct a D optimal screen design with 27 variables – this would probably be a design with a maximum experiment count of 40 or less.

    a. If you do have prior information and you can toss in any combination from raw materials, catalysts, and supplements then I’d recommend building a smaller screen design using only those items from raw materials, catalysts, and supplements that you suspect will have either a minimum impact or a major impact on the response.

    3. If this is a case where you can only have one from column A, one from column B and one from column C, is there any way to characterize the raw materials, catalysts, and supplements with some underlying physical/chemical variable that will allow you to treat them as continuous variables?

    a. For example – let’s say the raw materials are from different suppliers but they are all the same sort of thing (plastic resin for example) and their primary differences can be summarized by examining say their funnel flow rates, the catalysts are all from different suppliers but they can be characterized by their efficiency at doing what they do and the supplements can also be characterized by whatever aspect of the final product they impact – say viscosity values.  If you have this situation then what you have is a situation with three variables at two levels.  In this case the lowest level for A would be the raw material that had the least of whatever it is and the high level would be the raw material that had the highest level of whatever and so on for catalyst and supplements.  This would be a design of about 10 experiments – a 2**3 with a couple of repeats.

    4. If there isn’t any way to characterize the variables in the three categories and you have to treat everything as a categorical variable then you are stuck with having to run all combinations – 480.


    Robert Butler

    The short answer is the CLT doesn’t have anything to do with the capability analysis.

    You are correct – the CLT refers to distributions of means not to the distribution of any kind of single samples from anything.  The statement from the WI (whatever that is) – “based on the CLT, a minimum sample size of n ≥ 30 (randomly and independently sampled parts) is required for a capability analysis. The stated rationale is that generally speaking, n ≥ 30 is a ‘large enough sample for the CLT to take effect’” – comes from a complete misunderstanding of the CLT.  As for the second part, about the 30 samples rule of thumb – that’s not worth much either.

    If you need a counter-example to help someone understand the absurdity of that statement consider the case where the choices are binary and you take a sample of 30 and get a group of 0’s and a group of 1’s.  If you make a histogram of your measures you will have two bars – one at the value of 0 and the other at the value of 1 and no matter how hard you try that histogram cannot be made to look normal.

    For a capability analysis you have to have some sort of confirmation that the single samples you have gathered exhibit an acceptable degree of normality.  The easiest way to do this is to take the data and plot it on normal probability paper and apply the fat pencil test.  If it passes that test you can use the data to generate an estimate of capability.

    If it doesn’t, in particular if the plot diverges wildly from the perfect normal line and exhibits some kind of extreme curvature, you should consult Chapter 8 in Measuring Process Capability by Bothe. It is titled “Measuring Capability for Non-Normal Variable Data”.  That chapter provides the method for capability calculations when the data is non-normal.


    Robert Butler

    Control charts don’t test for normal distribution and, while the standard calculation for process capability requires approximately normal data, it is also true that process capability can be determined for systems where the natural underlying distribution of the data is non-normal.  Chapter 8 in Bothe’s book Measuring Process Capability titled “Measuring Capability for Non-Normal Variable Data” has the details concerning the calculation.


    Robert Butler

    @andy-parr  it’s a spammer.  They get through from time to time – I’ve had the same person/entity trying for the same thing.  I’ve seen a few in the past as well.  This one is just more persistent.


    Robert Butler

    From what I found after a quick Google search for the definition of TMV, the big issue with that process is exactly what you stated in your second post: TMV focuses on the issue of repeatability and reproducibility.  Since assessment of repeatability and reproducibility is the main issue, and since that is what I thought you were asking about in your first post, your question collapses to one of determining a measure of variability that can be used for test method validation. It is this question I am addressing, and it has nothing to do with skill in using, or knowledge about, TMV.

    The kind of process variability you will need for your TMV will have to reflect only the ordinary variability of the process. Ordinary process variation is bereft of special cause variation.  The variability associated with bi-modal or tri-modal data contains special cause variation.  As a result, should you try to use the variability measure from such data for a TMV your results will be, as I stated, “all over the map.”

    I appreciate your explaining the issue with respect to the testing method – mandatory is mandatory and, I agree, there’s no point in worrying about it. I only questioned the methods since your first post left me with the impression you were just trying various things on your own.

    So, taking your first and second posts together I think this is where you are.

    1.       Your process can be multimodal for a variety of reasons.

    2.       You don’t care about any of this since the spec is off in the west 40 somewhere and no one cares about the humps and bumps.

    3.       For whatever reason(s) identifying the sources of special cause variation impacting your test setup is not permitted.

    4.       When you tried to control for some variables you thought might drive multi-modality you still wound up with a bi-modal distribution in your experimental data.

    5.       For the controlled study the two groups of product were distinctly different and both had a narrow distribution.

    6.       You want to find some way to ignore/bypass the bi-modal nature of the controlled series of builds and come up with some way to use the data from the controlled build to generate an estimate of ordinary variation you can use for your TMV.

    The easiest way to use your test data to attempt to get some kind of estimate of ordinary variation suitable for a TMV would be to go back to the data, identify which data points went with which mode, assign a dummy variable to the data points for each of the modes (say the number 1 for all of the data points associated with the first hump in the bi-modal distribution and number 2 for all of the data points in the second), and then run a regression of the measured properties against the two levels of the dummy variable.

    Take the residuals from this regression and check them to see if they exhibit approximate normality (fat pencil test).  I’d also recommend plotting them against the predicted values and look at these residual plots just to make sure there isn’t some additional odd behavior.

    If the normal probability plot indicates the residuals are acceptably normal, and if the residual plots don’t show anything odd, then you will take the residuals and compute their associated variability and use this variability as an estimate of the ordinary variability of your process.

    The reason you can do this is because by regressing the data against the dummy variable you have removed the variation associated with the existence of the two peaks and what you will have left, in the form of the residuals, is data that has been correctly adjusted for bi-modality.
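    The dummy-variable regression described above can be sketched in a few lines. This is only an illustration with fabricated data standing in for the two humps (19 and 21 points, mimicking the counts in the post); the mode assignments would come from inspecting your own histogram.

```python
import numpy as np

rng = np.random.default_rng(1)
# Fabricated bi-modal data: two narrow humps, as in the controlled build
mode1 = rng.normal(0.87, 0.01, 19)
mode2 = rng.normal(1.15, 0.01, 21)
y = np.concatenate([mode1, mode2])

# Dummy variable: 1 for points in the first hump, 2 for the second
dummy = np.concatenate([np.ones(19), np.full(21, 2.0)])

# Regress the measured property against the dummy (intercept + dummy column)
X = np.column_stack([np.ones_like(dummy), dummy])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# The residual spread is the bimodality-adjusted estimate of ordinary variation
resid_sd = residuals.std(ddof=2)  # two parameters were estimated
print(f"residual SD = {resid_sd:.4f}")
```

The residuals from this fit are what you would put on a normal probability plot and through the fat pencil test before using their variability in the TMV.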

    …and now the caveats

    1.       You only have the results from the one controlled build – you are assuming the result of a second build will not only remain bi-modal but that the spread of the residuals from the analysis of data from the new build will not be significantly different from the first series.

    2.       All you have done with the dummy variable regression is back out the bi-modal aspect – you have no idea if the residuals are hiding other sources of special cause variation, you have no idea how those unknown special causes may have impacted the spread of the two modes, and that means you have no guarantees of what you might see the next time you try to repeat your controlled analysis.

    3.       If there is nothing else you can do then, as a simple matter of protecting yourself, I would recommend you insist on running a series of the same controlled build experiment over a period of time (say at three-month intervals) for at least a year and see what you see. My guess is, even if you manage to somehow only have two modes each time you run the experiment, you are going to see some big changes in the residual variability from test-to-test.

    4.       If you should ever have to face a quality audit armed with only the results based on the above and the auditor is an industrial statistician you will have some explaining to do.


    Robert Butler

    After a Google search I’m guessing that TMV stands for Test Method Validation.  If this is the case then I think you need to back up and re-think your situation.

    What follows is a very long-winded discourse concerning your problem so, first things first. The short answer to your question is – no – running a TMV and pretending the bi-modal nature of your results does not matter is a fantastic way to go wrong with great assurance.

    You said you have what appears to be a multimodal distribution of your output but everything meets customer spec.  Based on this and on some other things you said it sounds like the main question is not one of off-spec but just one of wondering why you get multimodal results.

    1. You said,”We are now repeating TMV for that test, and due to its destructive nature, we must use ANOVA to determine %Tol. We also calculate STD ratio (max/min). 4 operators are required.”

    2. You also said,”For TMV we limited the build process ranges – one temp, one operator etc and we have a distinctly bimodal distribution (19 data points between 0.850 and .894 and 21 data points between 1.135 and 1.1.163) LSL is 0.500. Reduction to a unimodal distribution is not worth the expense from a process standpoint, and we wouldnt know how to do so, since it may be incoming materials causing this distribution (All are validated, verified, suppliers audited etc. Huge headache, huge expense to make changes.)”

    The whole point of test method validation is an assessment of such things as accuracy, precision, reproducibility, sensitivity, specificity, etc.  Since your product is all over the map, and since your (second?) attempt at TMV gave results that were also all over the map for reasons unknown, the data resulting from your attempt at TMV, as noted in #2 above, are of no value.  The data from #2 are not accurate and you cannot use that data to make any statements about test method precision.

    I would go back to #2 and do it again and I would check the following:

    1. Did I really have one temperature?

    2. Was my operator really skilled and did he/she actually follow the test method protocol?

    3. What about my “etc.” were all of those things really under control or if I couldn’t control them did I set up a study to randomize across those elements I thought might impact my results (shift change, in house temperature change, running on different lines, etc.)?

    4. It’s nice to know the suppliers of incoming raw material are “validated, verified, suppliers audited etc.” but that really isn’t the issue. The two main questions are:

    1) What does the lot-to-lot variation of all of those suppliers look like (both within a given supplier and, if two or more suppliers are selling you the “same” thing, across suppliers)?

    2) When you ran the TMV in #2 did you make sure all of the ingredients for the process came from the same lot of material and from the same supplier?

    I’m sure you have a situation where not all ingredients come from a single supplier but the question is this – for the TMV in #2 did you lock down the supplies for the various ingredients so that only one supplier and one specific lot from each of those suppliers was used in the study in #2?

    The reason for asking this is because I’ve seen far too many cases where the suppliers had jumped through all of the hoops but when it came down to looking at the process the “same” material from two different suppliers or even the “same” material from a single supplier was not, in fact, the “same”.  The end result for this lack of “sameness” is often the exact situation you describe – multimodal distributions of final product properties.

    A couple of questions/observations concerning point #1:

    1. I don’t see why you think you need ANOVA to analyze the data – nothing in your description would warrant limiting yourself to just this one method of analysis.

    2. You stated you needed 4 operators yet in #2 you said you were using one operator. As written both #1 and #2 are discussing TMV so why the difference in number of operators?



    Robert Butler

    I would disagree with respect to your view of a histogram for delivery time.  I would insist it is just what you need to start thinking about ways to address your problem.

    Given: Based on your post – just-in-time (JIT) for everything is a non-starter.

    Under these circumstances the issue becomes one of minimization of the time spread for late delivery.

    You know 75% of the parts have a time spread of 0-6 days.  What is the time spread for 80, 90, 95, and 99%?  Since all parts are created equal and all that matters is delivery time, the focus should be on those parts whose delivery time is in excess of say 95% of the other parts.  Once you know which parts are in the upper 5% (we’re assuming the spread in late delivery time for 95% of the parts is the initial target) the issue becomes one of looking for ways to ensure a delivery time for the 5% is less than or equal to whatever the upper limit in late delivery time is for the 95% majority.

    Having dealt with things like this in the past my guess is you will first have to really understand the issues surrounding the delivery times of the upper 5%.  I would also guess once you do this you will find all sorts of things you can change/modify to pull late delivery times for the upper 5% into the late delivery time range of the 95%.

    Once you have the upper 5% in “control” you can choose a new cut point for maximum late delivery time and repeat.  At some point you will most likely find a range of late delivery time which, while not JIT, is such that any attempt at further reduction in late delivery time will not be cost effective.
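    Getting the cut points described above is a one-liner once the delivery data is in hand. The sketch below uses made-up late-delivery data (days late, drawn from a skewed distribution, as most delivery data is) in place of your own.

```python
import numpy as np

rng = np.random.default_rng(7)
days_late = rng.gamma(shape=2.0, scale=2.0, size=1000)  # fabricated, right-skewed

# Time spread at each percentile of interest
for pct in (75, 80, 90, 95, 99):
    print(f"{pct}% of parts arrive within {np.percentile(days_late, pct):.1f} days")

# Parts in the upper 5% are the first improvement targets
cutoff = np.percentile(days_late, 95)
worst = days_late[days_late > cutoff]
print(f"{worst.size} parts exceed the 95th-percentile cutoff of {cutoff:.1f} days")
```

After working the upper 5% down into the main body, you would re-run the same computation with a tighter cut point and repeat.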



    Robert Butler

    Found it.   It’s about 8 months back and it is the same question – the question also has other wrong answers as discussed in the thread.



    Robert Butler

    This is the same garbage question another poster asked about several months ago – the answer was wrong then and it is wrong now.

    Three factors at two levels is a 2**3 design = 8 points.  There is no such thing as a 32 factorial design.

    Since it is evening I’ll rummage around a bit and see if I can find the earlier post.  If I’m remembering correctly there are other wrong answers on that test.



    Robert Butler

    It’s not a matter of calculating the t value and then transferring  it into a p-value.  What you are doing is computing a t statistic and then checking that to see if the value you get meets the test for significance.  Small t-values breed large p-values and conversely.  :-)

    You are doing what I recommended and your plot is telling you what you computed – there is a slight offset between the two groups of data which means that the averages are numerically different but there really is no statistically significant difference.  If you want an assessment of the means then you could turn the analysis around – treat the yes/no as categorical and run a one way ANOVA on the results.  You’ll get the mean values for the two choices and you will also get no significant difference.

    All of the above is based on commenting on the overall plot you have provided.  There are a couple of interesting things about your plot you should investigate. It is a small sample size and it looks like your lack of significance is driven  by 4 points – the two points at -1 that are much lower than the main group and the two points at 1 that are higher than the main group.

    As a check – remove those 4 points and run the analysis again – my guess is you will either have statistical significance (P < .05) or be very close.  If you do get significance, and if it is a situation where you were expecting significance, then you will want to go back to the data to see if there is anything out of the ordinary with respect to those 4 points.  If there is something that physically differentiates those points from their respective populations and if their deletion results in significance then I would recommend the following:

    1. Report out the findings with all of the data.

    2. Include the plot as part of the report.

    3. Emphasize the small size of the sample.

    4. Report the results with the deletion of the 4 points.

    5. Comment on what you have found to be different about the  4 points.

    6. Recommend additional samples be taken with an eye towards controlling for whatever it was that you found to be different about the 4 points and re-run the analysis with an increased sample size to see what you see.
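    The remove-and-recheck step in #4 above amounts to running the test twice, once with all the data and once with the four suspect points set aside. The sketch below uses fabricated data (two small groups with two extreme points appended to each) purely to show the mechanics; it makes no claim about which p-value will be smaller for your data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Fabricated groups; the last two entries of each stand in for the suspect points
no_grp  = np.append(rng.normal(50, 2, 10), [38, 39])
yes_grp = np.append(rng.normal(53, 2, 10), [64, 65])

t_all, p_all = stats.ttest_ind(no_grp, yes_grp)
t_trim, p_trim = stats.ttest_ind(no_grp[:-2], yes_grp[:-2])

print(f"all data:            p = {p_all:.3f}")
print(f"4 points removed:    p = {p_trim:.3f}")
```

If the two p-values straddle .05, that is the signal to go back to the process and look for something physically different about the removed points before reporting anything.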

    • This reply was modified 1 year, 6 months ago by Robert Butler. Reason: typo

    Robert Butler

    For the casual reader who is late to this spectacle and who does not wish to plow through all 94 posts allow me to provide a summary.

    The OP has a pure theory, based on axioms and postulates of his choosing with no demonstrable connection to empirical evidence, which he has used to evaluate various and sundry aspects of MINITAB analysis and works by Montgomery.  His theory generates results that are wildly at odds with tried and tested methods of analysis as delineated by MINITAB and Montgomery.

    Through a series of polite and courteous posts it has been pointed out to the OP that extraordinary claims, such as his, require extraordinary proof and that the burden of providing this proof is on his shoulders and not on the shoulders of those whom he has insisted are incorrect.

    The OP’s response to this well understood tenet of scientific inquiry has been one continuous run of Proof-By-Volume consisting of raw, naked, ugly, disgusting, hateful, ad hominem attacks on anyone questioning his theory and the conclusions he has drawn from it. The OP has made it very clear he not only has no intention of providing such proof but he also does not think proof is necessary.

    His intransigence with respect to the idea of the need to provide extraordinary proof is mirrored in his choice of words. An assessment of his 51 posts (as of 5 March 2020) indicates his favorite words and word assemblies are (in no particular order)

    Wrong – 47 times – applied to everyone but himself

    Theory – 52 times – by far the favorite

    Ignorance, Incompetence and Presumptuousness – all together or at least one occurrence of any of the three words – 41 times  – again – as applied to everyone else

    As the most prolific responder to his posts I’ve tried to politely point out the shortcomings of his theory and why it isn’t viable.  It has been, as I thought it probably would be, a vain effort. The only thing I’ve received in return is a barrage of ad hominem attacks and multiple accusations of waffling (6 times) (to waffle – to be unable to make a decision, to talk a lot without giving any useful information or answers). Waffling – back at you OP.

    There really isn’t anything else to say – the OP has constructed a theory based on postulates and axioms of his choosing. He has reported the results of his theory as though they were correct while the empirical evidence he himself has compiled in this regard says otherwise, he has provided no proof of the correctness of his theory, and he reacts violently to any suggestion that his theory is wrong.  I’m sure the OP will want to have the last word, as well as the one following that – go for it Fausto  – QED (Quite Enough Discussion)


    Robert Butler

    As I noted in my post to your thread yesterday your situation is exactly that described by Abraham Kaplan and you have confirmed this to be the case in your post where you specifically stated:

    “For me THEORY is

    ·         The set of all LOGIC Deductions

    ·         Drawn

    ·         From Axioms and Postulates

    ·         Which allow to go LOGICALLY from Hypotheses to Theses

    ·         Such that ALL Intelligent and Sensible people CAN DRAW the SAME RESUL”


    So, to reiterate – what you have is a pure theory which has no basis in empirical fact and which is, as you are so fond of saying – wrong.  What is particularly interesting about your situation is that you have conclusively proved you are wrong and you have demonstrated that fact time and again.

    In the scientific world when one constructs what they believe to be a new theory the very first thing they do is make sure their theory is capable of accounting for all of the prior known facts in whatever area it is they are working.  Montgomery’s work, the methods and results of T Charts, etc. have been around for a long time and have withstood the close scrutiny and testing by myriads of scientists, engineers, statisticians, health practitioner, etc.  No one has found anything wrong with them.

    If a real scientist constructs a theory and finds it at odds with prior known facts the first thing that individual will do is choose to believe there is something wrong with the theory and spend time either trying to adjust the theory to match the known facts or, if that proves to be impossible, scrap the theory and start over.  In your case, you tested your theory and found it at odds with Montgomery and came to the instant conclusion your theory, based on a set of axioms and postulates of your choosing, was correct and everyone else was wrong.

    As Kaplan noted – the world your theory has deduced will only mirror reality to the extent that the premise mirrors reality. The empirical evidence you have compiled in your uploaded rants about MINITAB and Montgomery’s text repeatedly confirm the fact that your theory (and most certainly the postulates and axioms upon which it is based) does not mirror reality and is therefore wrong.

    Instead of recognizing this very obvious fact you have chosen to spin the failures of your theory as successes, indulge in self-glorification, and shout down and denigrate anyone and anything that challenges your personal view of yourself and your theory. This isn’t science. It is raw, naked, ugly, disgusting, hateful, ad hominem attacks of the lowest order – in other words – Proof By Volume. Mussolini would be so proud.

    As for your demands for my peer reviewed papers all I can say is let’s stay focused on the topic of this thread.  This thread was started by you. It is not about me nor anyone else who has participated. It is about a guy named Fausto Galetto and his pure theory based on axioms and postulates which do not incorporate empirical evidence. Let’s keep it that way.


    Robert Butler

    In all the time I’ve been a participant on the Isixsigma Forums, this has to be the saddest thread I’ve seen or been a part of.

    Whether you wish to acknowledge it or not @fausto.galetto, this is your situation:

    From: The Conduct of Inquiry – Abraham Kaplan pp. 34-35

    “It is in the empirical component that science is differentiated from fantasy. An inner coherence, even strict self-consistency, may mark a delusional system as well as a scientific one. Paraphrasing Archimedes we may each of us declare, “Give me a premise to stand on and I will deduce a world!” But it will be a fantasy world except in so far as the premise gives it a measure of reality. And it is experience alone that gives us realistic premises.”

    You have a premise – your theory – and you deduced a world. And, based on your scornful rejection of @David007 recommendation for simulations as well as the lack of actual data from processes you personally have run and controlled, that is all you have – a theory bereft of empirical evidence.

    You wrote a paper describing the world resulting from your theory and you sent your description of that world to Quality Engineering. They gave your world some careful thought, compared it to the empirical evidence they had (as well as the current world view based on that evidence), found your world description wanting, and rejected it.

    Based on your posts it appears you did the same thing with MINITAB. They too, examined your world, found it did not agree with, nor adequately describe, the empirical evidence they had on hand, and chose to ignore it.

    All through the exchange of posts to this thread you have made it very clear that Fausto Galetto’s world based on pure theory is the only correct one and anyone who does not agree is ignorant, incompetent and presumptuous.  You, of course, have a right to believe these things but everyone else has a right to believe otherwise.

    What makes this exchange of posts particularly depressing is the idea you believe a company like MINITAB would know about a serious flaw in their T Chart routine and do nothing about it. You seem to forget that hospitals and medical facilities routinely use the MINITAB T Chart.  If there was something seriously wrong with that part of the program there would be ample empirical evidence in the form of lots of dead patients attesting to the fact: There aren’t.   If there had ever been anything seriously wrong with the T Chart program the statisticians at MINITAB would have made proper corrections to the program before it was ever offered in their statistics package.

    • This reply was modified 1 year, 6 months ago by Robert Butler. Reason: typo

    Robert Butler

    @fausto.galetto – sorry, I’m with @David007 on this one.  Your latest response(s) to me (and to him) are true to form – invective and shouting.  I know this is just poking the Tar Baby and prolonging a very sad string of posts at the top of the masthead of a very good site, but you really should re-read what you wrote and contrast that with my last post. There is nothing in any of the 8 statements you cited that have anything to do with THEORY – as you are so fond of putting it.

    Given your violent reaction to anything you perceive as contrary to your views, I actually took some time to look over some of the numerous things you have uploaded to that storage site ResearchGate.  I also took some time to check out the reputation of that site. Three things are immediately evident:

    1. The noble concept of “open access” is DOA and you are taking the life of your grant and the status of your professional reputation/career in your hands if you try to use anything on those sites as a basis/starting point for your work.

    2. If the “paper” you sent to  Quality Engineering looked anything like the stuff you have uploaded to that storage site (open access with no apparent oversight) it is not surprising it was rejected.

    3. Based on what you have posted here and on ResearchGate it appears you strongly favor the PBV method of scientific discourse (Proof By Volume).  That method works well in the world of politics and extreme religious movements but does not advance your cause (and hopefully it never will) in the world of engineering and science.


    Robert Butler


    The following statements – (I won’t bother to bold them since you have already)

    1. “There is no “hidden theory”. Joel Smith has a good paper on t charts …Control charts for Nonnormal data are well documented”

    2. “is your issue that the data is neither exponential nor Weibull, but something else?”

    3. “If you are saying the chart fails to detect assignable causes, try simulating some exponential data.”

    4. “But as I’ve repeatedly said with n=20 the model is going to be wrong anyway.”

    5. “You really should play with some simulated exponential data. You’ll be surprised by what you see as “inherent variation””

    6. “Minitab tech support is not needed. Calculate the scale (Exponential) or scale and shape (Weibull) using maximum likelihood. Compute .135 and 99.865 percentiles.”

    7. “I anxiously await to see your rebuttal paper in any of Quality Engineering, Journal of Quality Technology, Technometrics, Journal of Applied Statistics, etc.”

    8. “along with your spam practice”

    are not goads, are not impolite, and they most certainly are not “IGNORANCE, INCOMPETENCE and PRESUMPTUOUSNESS” as those words are generally understood.

    To goad is to provoke someone to stimulate some action or reaction – there is nothing provocative in any of the cited quotes. A rational assessment of the 8 quotes cited above is as follows: 1-6 are just plain statements of fact and/or polite questions concerning your take on the issue – nothing more.  #7 is a very polite and logical response to all of your posts that preceded it.  And #8 is an objective observation concerning your posting practice.

    In your numerous posts to this thread you have made it very clear you interpret anything that goes against your personal views as arising from “IGNORANCE, INCOMPETENCE and PRESUMPTUOUSNESS”. What you choose to ignore is the obvious fact that your personal views are not shared by the people you are addressing on this forum, by the body of practitioners of science/engineering/process control in general, nor by the body of reviewed and published scientific/engineering literature.



    Robert Butler

    @fausto.galetto – In your post to @David007 on February 29 at 4:39 AM you spewed out a lot of hate and you concluded with the statement “Tell me your publications SO THAT I CAN READ them…. and I will come back to you and to your “””LIKERS”””!”

    @David007 has been courteous and polite in every response he has made to your postings.  Your responses to his posts are nothing but shouting and invective. They are, by turns, irrational, ugly, vicious, denigrating, and hateful.  They have no place on a forum of this type.


    Robert Butler

    The problem you are describing has two parts.  The first part isn’t one of correlation rather it is one of agreement.  Things can be highly correlated but not be in agreement.  I would recommend you use the Bland-Altman approach to address the agreement part of the problem.


    1. On a point by point basis take the differences between the two methods.

    2. You would like to be able to compare these differences to a true value but, since that is rarely available, use the grand mean of the differences. Compute the standard deviation of the differences and identify the ±1, 2, and 3 standard deviation bands.

    3. Next compute the averages of the two measures that were used for computing each difference.

    4. Plot the differences (Y) against their respective average values (X).

    5. If you have a computer package that will allow you to run a kernel smoother on this data and draw the fitted line do so.  If you don’t then you can use a simple linear regression and regress Y against X and look at the significance of the slope.  If you can only do the simple linear fit you will want to plot that line on the graph.  The main reason for doing this is to make sure the significance (or lack thereof) isn’t due to a few extreme points.

    To determine agreement you will want to see if the mean of the differences is close to zero. If there is an offset, but no trending then you will have identified a bias in the measures (based on the example you have provided this is to be expected).  If you have a significant trending and it is not being driven by a few data points then you will have a situation where the two methods are not in agreement. If they are not in agreement then you have some major issues to address.
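    Steps 1-5 above can be sketched in a few lines. The data here is synthetic (paired readings with a built-in 0.25 bias, standing in for your two methods), and the trend check uses the simple linear fit from step 5 rather than a kernel smoother.

```python
import numpy as np

rng = np.random.default_rng(5)
true_vals = rng.uniform(10, 13, 30)
method_a = true_vals + rng.normal(0, 0.05, 30)
method_b = true_vals + 0.25 + rng.normal(0, 0.05, 30)  # fabricated 0.25 offset

diffs = method_a - method_b                 # step 1: point-by-point differences
bias = diffs.mean()                         # step 2: grand mean of the differences
sd = diffs.std(ddof=1)
bands = {k: (bias - k * sd, bias + k * sd) for k in (1, 2, 3)}
means = (method_a + method_b) / 2           # step 3: per-pair averages

print(f"bias = {bias:.3f}")
for k, (lo, hi) in bands.items():
    print(f"+-{k} SD band: ({lo:.3f}, {hi:.3f})")

# Step 5 (simple linear version): a nonzero slope of diffs vs means flags trending
slope = np.polyfit(means, diffs, 1)[0]
print(f"trend slope = {slope:.4f}")
```

With this synthetic data you would see a clear offset (the bias) but no trending, i.e., the two methods differ by a constant but otherwise agree.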

    The second part is the issue of the approximate delta of .25 relative to the lower (9.8) and upper (13.8) limits.

    It looks like you have the following possibilities:

    1. The difference is approximately .25 and both processes are inside the 9.8-13.8 range

    2. The difference is approximately .25 and one process is outside either the upper or lower target and the other process isn’t.

    3. The difference is approximately .25 and both processes are outside of either the upper or lower targets.

    Your post gives the impression that a single instance of the approximate delta of .25 being associated with either of the processes outside the targets counts as a fail.  If this is true then the question becomes one of asking what are the probabilities of getting a difference of approximately .25 given that one or both of the processes are outside the target range and building a decision tree based on what you find.


    Robert Butler

    As you noted – given a big enough sample (or a small one for that matter), even from a package that generates random numbers based on an underlying normal distribution, there is an excellent chance you will fail one or more of the statistical tests… and that’s the problem – the tests are extremely sensitive to any deviation from perfect normality (too pointed a peak, too heavy tails, a couple of data points just a little way away from 3 std, etc.).  This is why you should always plot your data on a normal probability plot (histograms are OK but they depend on binning and can be easily fooled) and look at how it behaves relative to the perfect reference line.  Once you have the plot you should do a visual inspection of the data using what is often referred to as “the fat pencil” test – if you can cover the vast majority of the points with a fat pencil placed on top of the reference line (and, yes, it is a judgement call) it is reasonable to assume the data is acceptably normal and can be used for calculating things like the Cpk.

    I would recommend you calibrate your eyeballs by plotting different sized samples from a random number generator with an underlying normal distribution on normal probability paper to get some idea of just how odd this kind of data can look (you should also run a suite of tests on the generated data to see how often the data fails one or more of them).
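    The test-the-generator exercise suggested above is easy to run. This sketch draws samples that are truly normal by construction and counts how often one standard test (Shapiro-Wilk, used here as an arbitrary stand-in for whatever suite you run) rejects them; even perfect data fails at roughly the alpha rate, and real data with minor deviations fails far more often as n grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
alpha, trials, n = 0.05, 1000, 100

# Count rejections of samples drawn from an exactly normal population
rejections = sum(
    stats.shapiro(rng.normal(size=n)).pvalue < alpha for _ in range(trials)
)
print(f"Shapiro-Wilk rejected {rejections} of {trials} truly normal samples")
```

Plotting a handful of these same samples on normal probability paper alongside the test results is what calibrates the eyeball for the fat pencil test.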

    I can’t provide a citation for the hyper-sensitivity of the various tests but to get some idea of just how odd samples from a normal population can look I’d recommend borrowing a copy of Fitting Equations to Data – Daniel and Wood (mine is the 2nd edition) and look at Appendix 3A – Cumulative Distribution Plots of Random Normal Deviates.

    I would also recommend you look at normal probability plots of data from distributions such as the bimodal, exponential, log normal, and any other underlying distribution that might be of interest so that you will have a visual understanding of what normal plots of data from these distributions look like.


    Robert Butler

    I think you will have to provide more information before anyone can offer much in the way of suggestions.  My understanding is you have two populations which have a specified (fixed) difference between them and you want to know if this fixed difference is less than or greater than the difference between the lowest and highest spec limits associated with the two populations.  If this is the case then all you have is an absolute comparison between two fixed numerical differences.  Under these circumstances there’s nothing to test – either the fixed delta is greater or less than the fixed delta between greatest and least spec and things like correlation have nothing to do with it.


    Robert Butler

    If the measurements within an operator for each part are independent measures and not repeated measures then you could take the results across parts for each operator and regress the measurements against part type (you would have to construct a dummy variable for parts to do this).  The variability of the residuals from this regression would be a measure of the variation within each operator.

    If you then reversed the process and took all of the measurements for a single part across operators and did the same thing with operators as the predictor variable the variation of the residuals for that model would be a measure of the variability within a part.

    If you wanted to get an estimate of the process variation controlling for part type and operator you would take all of the data and build a regression with part and operator as the predictor variables.  The variation of the residuals would be an estimate of ordinary process variation with the effect of operators and parts removed.

    In all of these cases you would want to do a thorough graphical analysis of the residuals.  Since you would have the time stamp of each measurement you would want to include a plot of residuals over time as part of the residual analysis effort.  If you’ve never done residual analysis then I would recommend taking the time to read about how to do it.  My personal choice would be Chapter 3 of Draper and Smith’s book Applied Regression Analysis – 2nd edition – you should be able to get this book through inter-library loan.

    The big issue is that of operator measurements on the same part.  You can’t just give an operator a part and have him/her run a sequence of three successive measurements – these would be repeated measures not independent measures and the variation associated with them would be much less than the actual variation within an operator.  You would need to space the measurements of the parts out over some period of time so that each measurement of a given part by each operator has some semblance of independence.
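    The three regressions described above differ only in which dummy columns go into the model. This sketch fabricates measurements for 3 parts by 2 operators (3 spaced readings each) and compares the residual spread from each model; the helper function and effect sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
parts = np.repeat([0, 1, 2], 6)           # part id for each of 18 measurements
opers = np.tile(np.repeat([0, 1], 3), 3)  # operator id
part_effect = np.array([10.0, 10.5, 9.8])[parts]
oper_effect = np.array([0.0, 0.2])[opers]
y = part_effect + oper_effect + rng.normal(0, 0.05, 18)  # fabricated data

def resid_sd(y, *factors):
    """Regress y on dummy columns for each factor; return the residual SD."""
    cols = [np.ones_like(y)]
    for f in factors:
        for level in np.unique(f)[1:]:          # drop first level as reference
            cols.append((f == level).astype(float))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return r.std(ddof=X.shape[1])

print(f"within-operator (parts backed out):    {resid_sd(y, parts):.4f}")
print(f"within-part (operators backed out):    {resid_sd(y, opers):.4f}")
print(f"ordinary variation (both backed out):  {resid_sd(y, parts, opers):.4f}")
```

The residuals from each fit are also what you would plot against time and against predicted values for the residual analysis described above.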


    Robert Butler

    The yes/no is the X variable and the  percentage is the Y  therefore just code no = -1 and yes = 1 and regress the percentage against those two values.  As for the p-value the “traditional” choice for significance of a p-value is < .05 so, using that criterion, a p of .127 says you don’t have a significant correlation between the yes/no and the percentage.  This would argue for the case that whatever is associated with yes/no is not having an impact on the percentages.

    Since correlation does not guarantee causation and causation will not guarantee you will find correlation what you need to do (you should do this in every instance anyway) is put your residuals through a wringer before concluding that nothing is happening.  You would want to plot the residuals against the predicted values as well as against the yes/no response.  If there are other things you know about the data (for instance, you know it was gathered over time and you have a time stamp for each piece of data) you will want to look at the data and the residuals against these variables as well.

    Since you had a good reason for suspecting a relationship a check of the residual patterns will help you find data behavior that might account for your lack of significance.  If the data structure is adversely impacting the regression you may see things like clusters of data, a few extreme data points, trends in yes/no choices that are non-random over time, etc. which are adversely influencing the correlation.

    If you should find such patterns you would want to identify the data points to which they correspond and re-run the regression. If the revised analysis results in statistical significance then you will need to go back to the process and try to identify the source of the influential data points. If it turns out there is something physically wrong with those data points you could justify eliminating them from the analysis and reporting your findings but you will also want to make sure that you clearly discuss this decision in your report.
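For what it’s worth, the whole sequence – coding the yes/no, regressing, and pulling residuals for inspection – can be sketched in a few lines of Python. The data below is fabricated purely for illustration, and scipy’s `linregress` stands in for whatever regression routine you have:

```python
import numpy as np
from scipy import stats

# Fabricated example: 'no' coded -1, 'yes' coded +1, with a noisy percentage outcome.
rng = np.random.default_rng(42)
x = np.repeat([-1, 1], 20)                        # 20 'no' runs, 20 'yes' runs
y = 50 + 2 * x + rng.normal(0, 10, size=40)       # weak effect buried in noise

res = stats.linregress(x, y)
print(f"slope = {res.slope:.2f}, p-value = {res.pvalue:.3f}")

# Residual diagnostics: compute residuals and plot them against the predicted
# values, against x, and against time order if a time stamp is available.
predicted = res.intercept + res.slope * x
residuals = y - predicted
# e.g. plt.scatter(predicted, residuals) -- look for clusters, trends, extremes
```

The scatter-plot step is where the wringer part happens: clusters, a few extreme points, or non-random trends over time are what you are looking for.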



    Robert Butler

    There’s any number of ways you could do this.  You could run a regression with percentage as the Y and yes/no as the X.  You could use a two-sample t-test with the percentages corresponding to a “no” as one group and the percentages corresponding to a “yes” as the other group.  If the distribution of the percentages is crazy non-normal you might want to run the t-test and the Wilcoxon-Mann-Whitney test side by side to see if they agree (both either find or do not find a significant difference).
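If it helps, here is a minimal sketch of running the two tests side by side in Python (the group values are made up for illustration):

```python
import numpy as np
from scipy import stats

# Fabricated percentages grouped by the yes/no flag.
rng = np.random.default_rng(7)
pct_no = rng.normal(60, 8, size=25)
pct_yes = rng.normal(66, 8, size=25)

t_stat, t_p = stats.ttest_ind(pct_no, pct_yes)          # two-sample t-test
u_stat, u_p = stats.mannwhitneyu(pct_no, pct_yes)       # Wilcoxon-Mann-Whitney

# If both tests agree (both significant or both not), the conclusion is
# robust to the normality assumption; if they disagree, look at the
# distribution of the percentages before deciding which to trust.
agree = (t_p < 0.05) == (u_p < 0.05)
```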


    Robert Butler

    The short answer to your question is there isn’t a short answer to your question.  The explanation provided by Wikipedia is a good starting place but you will have to take some time to not only carefully read and understand what is being said but also put pencil to paper and work through some math.

    I’ve spent most of my professional career as an engineering statistician and biostatistician. I always keep paper and pencil handy when I’m reading a technical article and I take the time to convert what I think I have read into mathematical expressions.  I find when I do this I not only gain a better understanding of what I’ve read but, if I’ve misunderstood what was written, I find it is easier to see that I have misunderstood. This, in turn, helps me on my way to a correct understanding.

    Perhaps if you had written out, in mathematical form, the highlighted section of the quote in your initial post you would have realized your question “How can you have several squared deviations of mean? Square it and it is just (mean)(mean) …?” is incorrect. The highlighted segment does not say “squared deviations of mean” it says “several squared deviations FROM that mean.”  In mathematical terms it is saying (x(i) – mean)*(x(i) – mean) where x(i) is an individual measurement. The difference between the previous squared expression and what you wrote is all the difference in the world.
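As a quick numerical illustration of “several squared deviations FROM that mean” (the numbers are chosen arbitrarily):

```python
# Each data point contributes its own (x_i - mean)^2 term -- there are several
# squared deviations, one per measurement, not a single (mean)(mean).
data = [4.0, 7.0, 6.0, 3.0]
mean = sum(data) / len(data)                      # 5.0
sq_devs = [(x - mean) ** 2 for x in data]         # [1.0, 4.0, 1.0, 4.0]
variance = sum(sq_devs) / (len(data) - 1)         # sample variance, ~3.33
```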


    Robert Butler

    The phrasing of several of the questions is terrible…and there is one that is simply wrong.  Now that we have that out of the way my thoughts on the questions are as follows:

    Question 46: I would choose B.  When you lay a t-square over the plot there is no way the minimum at 6 is anywhere near 100. If they are claiming D is the correct answer then I don’t see how that’s possible.  “Between 2-4” means anything between those two numbers so you would look at the values from 2.01 to 3.99 and, depending on which line you choose (regression line, 95% CI on means, 95% CI on individuals) you can get a 20-40 range.

    As an aside- if you are going to ask questions like this you owe it to the people taking the test to provide a graph that has enough detail to permit an assessment of questions such as this one – If I ever made a presentation where the required level of detail with respect to output was 10 units my investigators would be really upset if I gave them a graph with a Y axis in increments of 50.

    Question 49: I would choose D.  If the residuals can be described by a linear regression then the residual pattern is telling you that you have some unknown/uncontrolled variable that is impacting your process in a non-random fashion.  The good books on regression analysis will typically show examples of residual plots where the plot is telling you that you have something else you need to consider.  The three most common examples are residuals forming a straight line, residuals forming a curvilinear line, and residuals forming a < or a > pattern.

    Question 52: Of the group of choices C is correct but the question is terrible – I’ve never had a case where a manager asked me for the average cost of an item; rather, the issue was always the cost per item, which means you calculate the prediction error based on the CI for individuals.  What they are doing is using the CI on means.  Thus, for 3 standard deviations (the industry norm), you would have 4060 + 3*98/sqrt(35) = 4060 + 49.7 = 4109.7.  Your choice is wrong because the issue is not the average cost, rather it is the maximum average cost, and a maximum average of 4200 means you are going to have a fair percentage of the average product costs in excess of 4200 – how much would depend on the distribution of the cost means relative to some grand mean.
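If you want to check the Question 52 arithmetic yourself, a two-line sketch (values taken from the problem statement) shows how different the CI on means is from the CI on individuals:

```python
import math

# Question 52: n = 35, s = 98, mean cost = 4060, 3 standard deviations.
mean_cost, s, n = 4060, 98, 35
upper_on_mean = mean_cost + 3 * s / math.sqrt(n)   # CI on means: ~4109.7
upper_on_individual = mean_cost + 3 * s            # CI on individuals: 4354
```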

    Question 152: I agree, the answer is B.  Please understand that what I’m about to say is not meant to be condescending.  You need to carefully re-read the question – RTY is a function of …..What?

    Question 184:  Your equations shouldn’t have the sqrt of 360 in them – the problem states the process standard deviation is 2.2; therefore that is not s, it is sigma.  The issue is this – for the first equation the Z value is 1 and for the second it is -.681.  If you look at a Z table you will find for Z = 1 the area under the curve coming in from -infinity is .8413. If you subtract this value from 1 you get the estimate for the percentage of the production that is in excess of 8.4, which is 15.87%, and 15.87% of 360 = 57.13, which rounds down to their answer.

    However, there is the issue of the product outside the lower bound. This is determined by your second equation where Z = -.681. The area under the curve coming in from – infinity to -.681 is .2483. This means the total percentage of off spec (above the upper limit and below the lower limit) is .1587 + .2483 = .4070 which means the total off spec is .4070 x 360 = 146.
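For anyone who wants to verify those tail areas without a Z table, a short sketch using scipy’s normal CDF:

```python
from scipy.stats import norm

# Question 184: Z = 1 at the upper spec, Z = -0.681 at the lower spec,
# 360 units produced.
above_upper = 1 - norm.cdf(1.0)         # ~0.1587
below_lower = norm.cdf(-0.681)          # ~0.248
total_off_spec = (above_upper + below_lower) * 360   # ~146 units
```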

    Question 209:  The null hypothesis is that the mean is less than or equal to $4200 therefore the alternate hypothesis is that the mean is greater than $4200.  All of the other verbiage in the problem has nothing to do with the statement of the null and alternative hypothesis so I would agree with E.


    Robert Butler

    The question you are asking is one of binary proportions. Since you have a sample probability for failure (p = r/n where r = number of failures and n = total number of trials) in the one environment you also have a measure of the standard deviation of that proportion  = sqrt([p*(1-p)]/n).  You also have a measure of the proportion of failures in a different environment where you took 40 samples and found 0 failures.  You can use this information to compare the two proportions as well as determine the number of samples needed to make statements concerning the odds that the failure rate in the new environment is zero and the degree of certainty that the failure rate is 0.

    Rather than try to provide an adequate summary of the mathematics needed to do this I would recommend you look up the subject of proportion comparison and sample size calculations for estimates of differences in proportions.  I can’t point to anything on the web but I can recommend two books that cover the subject – you should be able to get both of these through inter-library loan.

    1. An Introduction to Medical Statistics – 3rd Edition – Bland

    2. Statistical Methods for Rates and Proportions – 3rd Edition – Fleiss, Levin, Paik
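As a rough sketch of the formulas mentioned above: the standard error of a proportion is the one quoted in the post, and the zero-failure upper bound shown is the standard exact one-sided bound (included as an illustration, not a substitute for the texts above):

```python
import math

# Standard error of a sample proportion p = r/n, as quoted above.
def se_prop(r, n):
    p = r / n
    return math.sqrt(p * (1 - p) / n)

# Note se_prop(0, 40) is 0 -- useless for the zero-failure case, which is why
# an exact bound is needed. With 0 failures in n trials, the exact one-sided
# upper confidence bound on the true failure rate solves (1 - p_upper)^n = alpha:
def upper_bound_zero_failures(n, alpha=0.05):
    return 1 - alpha ** (1.0 / n)

# For n = 40 the 95% upper bound is ~7.2% -- observing 0 failures is a long
# way from demonstrating the failure rate is 0.
```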


    Robert Butler

    @cseider – …could be… Happy New Year to you too.   :-)


    Robert Butler

    The original focus of this exchange was the issue of the validity of the underpinnings of the T-chart.  Your conjecture is that you have developed a theory, that this theory is correct, and that the theory behind the T-chart, apparently unknown to you, is wrong.  Your approach to “proving” this is to

    1. Ask for a bunch of MBB’s to provide a solution to problem(s) you have posed with the understood assumption that if they don’t exactly reproduce your results then they are wrong.

    2. Demand that the folks at Minitab provide the underlying theory of the T chart and along the way prove you are wrong.

    As I said in an earlier post to this exchange – extraordinary  claims require extraordinary proof and the burden of the proof is on the individual making the claims not on the people against whom the claims are made. I sincerely doubt you will hear anything from Minitab for the simple reason that, to the best of their knowledge, (as well as to the best of the knowledge of anyone who might be using their T-chart program for process control) what they have is just fine.

    It really doesn’t matter what you think about whatever theory you have built nor does it matter that you think you have done whatever it is that you have done correctly.  The issue is this – do you have a “reasonable” amount of actual data where you can conclusively show the following:

    1. Specific instances where the Minitab T-chart control declared a process to be in control only to find that it really wasn’t in control and when your approach was applied to the same data, your approach identified the out-of-control situation.

    2. Specific instances where the Minitab T-chart control declared a process in control, it was found to be in control, and when the data was analyzed with your approach you too found it to be in control.

    3. Instances where your approach declared a process to be out-of-control only to find later it was not out-of-control and, when the data was re-analyzed with the Minitab T-chart, their methods identified the process as in control.

    4. Instances where your approach declared the process to be in control only to find it was out-of-control and when checked with the Minitab approach the out-of-control situation was correctly identified.

    5. …and, once everything is tallied – what kind of results do you have – a meaningful improvement in correct identification of out-of-control situations when dealing with processes needing a T-chart (while simultaneously guarding against increases in false positives) or just some kind of change that, in the long run, is at best, no better than what is currently in use.

    You have made a big point about citing some of Deming’s statements concerning theory.  Although I can’t recall a specific quote, one big point he made was the need for real data before you did anything else.  His book Quality, Productivity, and Competitive Position is essentially a monument to that point.  What you have presented on this forum is a theory bereft of evidence – just claiming everyone else is wrong because it violates your personal theory is not evidence.  The kind of evidence you need is as listed in the four points above.  If you have that kind of evidence and if what you have appears to be a genuine improvement,  then, as I’ve said before, the proper venue for presentation is to a peer reviewed journal that deals in such matters.

    In this light, your rebuttal concerning “thousands” is without merit.  The point of this thread was that the Minitab T-chart method was wrong and nothing more and it was in this context that you made the claim that “thousands” of MBB’s were wrong.  The “proof” you provided in your most recent post is based on a claim that the entire Minitab analysis package is wrong. In addition, in an attempt to inflate the “thousands” estimate, you dragged in poor old Montgomery and declared his book has BIG errors.  Statements such as these are not proof – they are just simple gainsaying.

    Now, what to do about the rest of your last post?

    1.  Your rejoinder concerning the Riemann Hypothesis is standard misdirection boilerplate – the fact remains that if your proof had had any merit it would have made the pages of at least one of the journals on mathematics.

    2. How many people have I seen who have admitted a mistake in a publication?  Quite a few – you can find corrections and outright retractions with respect to papers in peer reviewed journals if you take the time to go looking for them.  One good aspect of the peer review process is that the vast majority of mistakes are found, and corrected, before the paper sees publication.

    I happen to know this is true because I do peer reviewing for the statistics sections of papers submitted to one of the scientific journals. I can’t tell you the number of mistakes in statistical analysis I have found in submitted papers.  As part of the review process what I do is point out the mistake(s), provide sufficient citations/instructions concerning the correct method to be used, and request the authors re-run the analysis in the manner indicated.  The length of the written recommendations varies.  The longest I think I’ve ever written ran to almost two full pages of text which were accompanied by attachments in the form of scans of specific pages in some statistical texts.  In that instance the authors re-ran the analysis as requested, were able to adjust the reported findings (most of the significant findings did not need to be changed, a couple needed to be dropped, and, most importantly, they found a few things they didn’t know they had which made the paper even better), addressed the concerns of the other reviewers and their paper was accepted for publication.

    3.  In my career as a statistician, more than 20 of those years were spent as an engineering statistician in industry supporting industrial research, quality improvement, process control, exploratory research, and much more.  I too did not have time to send articles in for peer review and I too made presentations at conferences.  Some of those presentations were picked up by various peer reviewed journals and, after some re-writing for purposes of condensation/clarification saw the light of day as a published paper. It was because of my experience that I asked about yours.  I thought the question was particularly relevant in light of your position concerning your certainty about the correctness of your efforts.

    4. As for reading your archived paper on peer review – no thanks.  I seem to recall when I checked you had uploaded more than one paper on the subject of peer review to that site and, if memory serves me correctly, all of them had titles suggesting the text was going to be nothing more than a running complaint about the process.

    The peer review process is not perfect and can be very stressful and frustrating – I know this from personal experience.  Sometimes the intransigence of the reviewer is enough to drive a person to thoughts of giving up their profession and becoming an itinerant  beachcomber.  However, based on everything I’ve experienced, the process has far more pluses than minuses and, given the inherent restrictions of time/money/effort of the process, I have yet to see anything that would be a marked improvement.




    Robert Butler

    Oh well, in for a penny in for a pound.

    @fausto.galetto – great – so now you have completely contradicted your denial of the comments I made in my first post.

    I said the following:


    1. A quick look at the file you have attached strongly suggests you are trying to market your consulting services – this is a violation of the TOS of this site.

    2. If you want to challenge the authors of the books/papers you have cited, the proper approach would be for you to submit your efforts to a peer reviewed journal specializing in process/quality control.


    To which you responded:


    Dear Robert Butler,


    I am looking for solutions provided by the Master Black Belts (professionals of SIX SIGMA).

    I do not want to challenge the authors!!!

    THEY did not provide the solution!!!!!!!!!!!!!!

    I repeat:

    I am looking for solutions provided by the Master Black Belts (professionals of SIX SIGMA).


    …and what followed was a series of posts that demonstrated you were indeed challenging authors and you were not looking for solutions since, by your own admission, you had already solved them and you thought they were wrong and you were right.

    So today you post what amounts to an advertisement for your services/training. True, you don’t specifically encourage the reader to purchase anything but the only reference is to you and your publication and the supposed correctness of your approach.

    A couple of asides:

    1. You say “Since then THOUSANDS of MASTER Black Belts have been unable to solve the cases.”

    a. And you determined this count of “Thousands” how?

    b. Why would the fact that “Thousands” of MBB’s have been unable to solve the cases in the way you think they should solve it matter?

    c. Given you are challenging the approach of various authors of books and papers what makes you think your version is correct?

    d. I only ask “c” because one can find on the web your proof of the “so-called Riemann Hypothesis” – your words – submitted to a general non-peer-reviewed archive on 5 October 2018, which was incorrect (in a follow-up article to the same archive you said it was “a very stupid error”). It would appear that, even with this correction, the proof was still wrong because, according to Science News for 24 May 2019, the proof of the Riemann Hypothesis (or conjecture) is still in doubt.  The article states “Ken Ono and colleagues” are working on an approach and a summary of their most recent efforts can be found in the Proceedings of the National Academy of Sciences.

    2. It’s nice that Juran praised one of your papers at a symposium – the big question is this – which peer reviewed journal published the work?


    Robert Butler

    What an IMPRESSIVE retort!!!  @fausto.galetto DO YOU appreciate the INEFFABLE TWADDLE of this entire thread (see, I can yell, insult, and boldface type too) ?

    Let’s do a recap:

    1. OP initial post says he wants a solution for two items in an attached document.

    2. I do a quick skim and offer the comment that I think the correct venue for the OP would be a peer reviewed paper since it certainly looked like a challenge to the authors/papers cited.

    3. The OP responds and assures me this is not the case – all he wants is for some generic six sigma master black belts to provide a solution. Given what follows later it is obvious he doesn’t want a solution – just confirmation of his views.

    4. @Darth offers a suggestion and a recommendation to try running Minitab – and in the following OP post @Darth gets slapped down for his suggestions.

    5. Reluctantly, the OP announces he has downloaded Minitab “BECAUSE NOBODY tird to solve the cases”

    6. The OP then lays into Minitab on this forum because “MInitab does not provide the Theory for T Chart….

    BETTER I DID NOT FIND in Minitab the THEORY!!! IF someone knows it PLEASE provide INFORMATION”

    7. @Darth tries again with a Minitab analysis and provides a graph of the results – the OP slaps him down again.

    8. The OP shifts gears – suddenly, somehow, the OP knows “MINITAB makes a WRONG analysis: the Process is OUT OF CONTROL!!!!!!!!!!!!!!!!!! I asked to Minitab the THEORY of T Charts!!!!!

    9. The OP then slams me for stating he hasn’t tried to solve the cases. OK, fine, I’ll assume he did try. I probably should have said the OP hadn’t tried to do any research with respect to finding the theory of T control charts or really understanding anything about them other than insist Minitab drop everything and get back to him with the demanded information pronto! – but this really doesn’t matter because, as one can see in the other posts, the OP KNOWS Minitab is wrong because he has a proof based on THEORY!  What theory he doesn’t say but apparently it doesn’t matter since this THEORY is better than the THEORY of T Charts – even though, by his own admission, he doesn’t know what that theory is.

    Given the bombast I decided to see what I could find.  I couldn’t find a single peer reviewed paper by the OP listed in either Pubmed or Jstor.  Granted there are other venues but these two happen to be open to the public and are reasonably extensive.

    I did find a list of papers the OP has presented/published at various times and I found some vanity press (SPS) – publish on demand books by the OP – of the group I found the title of his book  “The Six Sigma HOAX versus the Golden Integral Quality Approach LEGACY”  interesting because the title, as it appears on the illustrated book cover on Amazon, is written in exactly the same manner as the OP’s postings.  Given the book title I find it odd that the OP would ask anyone on a Six Sigma site for help. After all, with a title like that one would assume the OP views practitioners of Six Sigma as little more than frauds and con artists.

    In summary – the OP believes he has found Minitab to be in error and he wants some generic master black belts to confirm his belief.  I do know the folks at Minitab really know statistics and process control. I also know many of the statisticians at Minitab have numerous papers in the peer-reviewed press on statistics and process control and have made presentations at more technical meetings than I could list.  Under those circumstances, if the OP really thinks he is right and Minitab is wrong then the OP first needs to take the time to find, and thoroughly understand, the theory behind the T chart – this would require actually researching the topic – books, peer reviewed papers, etc.

    In science, extraordinary claims require extraordinary proof and the burden of the proof is on the challenger, not on those being challenged.  If, after doing some extensive research,  the OP is still convinced he is right then the proper venue for a challenge of this kind is publication in a peer reviewed journal that addresses issues of this sort.


    Robert Butler

    To answer your question:

    There are two equations one for alpha and one for beta.

    What you need to define is what constitutes a critical difference between any two populations (max value, min value).

    What you have is the sample size per population (8) and, for the conditions you have specified, -1.645 (for alpha = .05) and +1.282 (for beta = .1).

    The alpha equation is (Yc – Max value)/(2/sqrt(n)) = -1.645

    The beta equation is (Yc – Min value)/(2/sqrt(n)) = 1.282

    Subtract the two equations and solve for n.

    Obviously you can turn these equations around – plug in n = 8 and solve for the beta value and look it up on a cumulative standardized normal distribution function table and see what you have.
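If it helps, the subtraction can be sketched numerically. Subtracting the beta equation from the alpha equation eliminates Yc and rearranges to n = ((z_alpha + z_beta) * sigma / (Max – Min))^2. This sketch assumes, as in the equations above, a process sigma of 2; the function name is just a placeholder:

```python
import math

# Subtracting the two equations above eliminates Yc:
#   (Min - Max) / (sigma / sqrt(n)) = -1.645 - 1.282
# which rearranges to n = ((z_alpha + z_beta) * sigma / (Max - Min))^2.
def n_required(delta, sigma=2.0, z_alpha=1.645, z_beta=1.282):
    """Samples per population for a critical difference delta = Max - Min."""
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# e.g. a critical difference of 2 units with sigma = 2 requires n_required(2.0)
# samples per population.
```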

    A point of order:  As I mentioned earlier ANOVA is robust with respect to non-normality. Question: with 8 samples how did you determine the samples were really non-normal?  If you used one of the many mathematical tests for non-normality – don’t.  Those tests are so sensitive to non-normality that you can get significant non-normality declarations from them when using data that has been generated by a random number generator with an underlying normal distribution.  What you want to do is plot the values on probability paper and look at the plot – use the “fat pencil test” – a visual check to see how well the data points form a straight line.

    If you are interested in seeing just how odd various sized random samples from a normal distribution look I would recommend you get a copy of Daniel and Wood’s book Fitting Equations to Data through inter-library loan and look at the plots they provide on pages 34-43 (in second edition – page numbers may differ for later editions)
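A quick way to generate your own small-sample probability plots for the fat pencil test is scipy’s `probplot`. The sample here is fabricated from a genuinely normal distribution for illustration – with n = 8 even true normal data can look ragged, which is exactly the point about oversensitive formal tests:

```python
import numpy as np
from scipy import stats

# Draw a small sample from a true normal distribution and build the
# normal probability plot coordinates.
rng = np.random.default_rng(3)
sample = rng.normal(loc=10, scale=2, size=8)

(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
# osm = theoretical normal quantiles, osr = ordered sample values.
# Plot osr against osm: if the points hug a straight line (r near 1),
# the data passes the fat pencil test.
```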


    Robert Butler

    Your post is confusing.  What I think you are saying is the following:

    You have 13 groups and each group has 8 samples.

    The data is non-normal.

    You checked the data using box-plots and didn’t see any outliers.

    You chose the Kruskal-Wallis test because you had more than 2 groups to compare.

    You ran this test and found a significant p-value.

    If the above is correct then there are a few things to consider:

    1. You don’t detect outliers using a boxplot. The term outlier with respect to a boxplot is not the same thing as a data point whose location relative to the main distribution is suspect.

    2. ANOVA is robust to non-normality – run ANOVA and see if you get the same thing – namely a significant p-value.  If you do get a significant p-value then the two tests are telling you the same thing.

    The issue you are left with – regardless of the chosen test is this – which group or perhaps groups are significantly different from the rest. In ANOVA you can run a comparison of everything against everything and use the Tukey-Kramer adjustment to correct for multiple comparisons.  In the case of the K-W you would use Dunn’s test for the multiple comparisons to determine the specific group differences.

    If you don’t have access to Dunn’s test you will have to run a sequence of Wilcoxon-Mann-Whitney tests and use a Bonferroni correction for the multiple comparisons.
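A rough sketch of the whole sequence in Python: 13 fabricated groups of 8 samples, with one group deliberately shifted so there is something to find. scipy does not ship Dunn’s test, so the fallback described above – pairwise Wilcoxon-Mann-Whitney tests with a Bonferroni correction – is what is shown:

```python
import itertools
import numpy as np
from scipy import stats

# Fabricated data: 13 groups of 8 samples, one group clearly shifted.
rng = np.random.default_rng(11)
groups = [rng.normal(50, 5, size=8) for _ in range(13)]
groups[0] = rng.normal(90, 5, size=8)   # the odd group out

h, kw_p = stats.kruskal(*groups)        # Kruskal-Wallis
f, anova_p = stats.f_oneway(*groups)    # ANOVA -- robust to non-normality

# Pairwise Wilcoxon-Mann-Whitney with a Bonferroni correction
# (13 groups -> 78 pairwise comparisons).
pairs = list(itertools.combinations(range(13), 2))
alpha_adj = 0.05 / len(pairs)
flagged = [(i, j) for i, j in pairs
           if stats.mannwhitneyu(groups[i], groups[j]).pvalue < alpha_adj]
# 'flagged' identifies which specific groups differ from which.
```

When the two omnibus tests agree (here both should be significant), the pairwise step is what tells you which group or groups are driving the difference.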


    Robert Butler

    I agree nobody, including yourself, tried to solve the cases.  As I’m sure others will tell you, the usual procedure on this site is for you to try to solve the problem, post a reasonable summary of your efforts on a thread and then ask for help/suggestions.

    @Darth pointed you in a direction and gave you a reference.  Given what he offered, your approach should have been to get a copy of the book referenced, learn about G and T charts, and do the work manually. Agreed, it would have taken more time than if you had a program handy, but by doing things manually (at least once) you would have learned a great deal and also solved your problem.


    Robert Butler

    Based on your post it sounds like you have no idea where your process is or what to expect.  If that is the case then the first thing to do is just run some samples and see what you see – how many to run will be a function of the time/money/effort you are permitted to use for what will amount to a pilot study whose sole purpose will be to let you know where your process is at the moment and, assuming all is well, an initial estimate of the kind of variation you can expect under conditions of ordinary variation.



    Robert Butler


    During the time I provided statistical support to R&D groups my method of support went something like this:

    1. Have the researchers fill me in on what they were trying to do.

    2. Walk me through their initial efforts with respect to quick checks of concepts, ideas – this would usually include the description of the  problem as  well as providing me with their initial data.

    3. Ask them, based on what they had uncovered, what they thought the key variables might be with respect to doing whatever it was they were trying to do.

    4. Take their data, and my notes and look at what they had given me and using graphing and simple regression summaries see if what they had told me was supported in some ways by my analysis of their data.

    5. If everything seemed to be in order I would then recommend building a basic main effects design.  As part of that effort I would ask them for their opinions on combinations of variables they thought would give them what they were looking for.  I would make sure the design space covered the levels of the “sure fire” experiments and, if they were few enough in number (and in vast majority of cases they were) I would include those experiments as additional points in the design matrix.

    6. Once we ran the experiments I would run the analysis with and without the “sure fire” experiments and report out my findings.  Part of the reporting would include using the basic predictive equations I had developed to run “what if” analysis looking for the combination of variables that gave the best trade off with respect to the various desired outcomes (in most cases there were at least 10 outcomes of interest and the odds of hitting the “best” for all of them was very small).  Usually, we would have a meeting with all of the outcome equations in a program and my engineers/scientists would ask for predictions based on a number of independent variable combinations.

    7. If it looked like the desired optimum was inside the design box we would test some of the predictions by running the experimental combinations that had been identified as providing the best trade-off.

    8. If it looked like the desired optimum was outside the design box we would use the predictive equations to identify the directions we should be looking.  In those cases when it looked like what we wanted was outside of where we had looked my engineers/scientists would want to think about the findings and, usually with my involvement, run a small series of one-at-a-time experimental checks.  Often I was able to take these one-at-a-time experiments and combine them with the design effort just to see what happened to the predictive equations.

    9. If they were satisfied that the design really was pointing in the right direction we would usually go back to #5 and try again.

    10.  If the results of the analysis of the DOE didn’t turn up much of anything it was disappointing but, since we had run a design which happened to include the “sure fire” experiments we would discuss what we had done, and if no one had anything to add it meant the engineers/scientists had to go back and re-think things. However, they were going back with the knowledge that the odds were high that something other than what they had included in the design was the key to the problem and that it was their job to try to figure out what it/they were.

    Point #10 isn’t often mentioned but, in my experience it is often a game changer.  The people working on a research problem are good – the company wouldn’t have hired them if they thought otherwise.  Because they are good and because they have had successes in the past the situation will sometimes occur where the research group gets split into factions of the kind we-know-what-we-are-doing-and-you-don’t.  There’s nothing like a near total failure to get everyone back on the same team – especially when all of the “known” “sure-fire” experiments have been run and found wanting.
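As a minimal illustration of the main effects design mentioned in point #5, a two-level full-factorial matrix in coded units is easy to generate (the factor names here are placeholders, not anything from an actual study):

```python
import itertools

# Two-level main-effects design: full factorial in 3 factors, coded -1/+1.
factors = ["temp", "pressure", "time"]          # placeholder names
design = list(itertools.product([-1, 1], repeat=len(factors)))

# Each row is one experimental run. "Sure fire" combinations the researchers
# believe in can simply be appended as extra rows before running the study.
for run in design:
    print(dict(zip(factors, run)))
```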


    Robert Butler

    A quick Google search indicates push, pull, and conwip are all variations on a theme – basically they focus on having work in progress throughout the line (CONWIP = Constant Work in Progress).  The same Google search indicates line balancing is just the practice of dividing the production line tasks into what are called equal portions with the idea that labor idle time is reduced to a minimum. Based on this quick search it would seem the issue is which flavor of production control do you want to emphasize rather than wondering about integrating one with the other because all of them appear to have the same goal – minimum worker idle time and maximum throughput.


    Robert Butler

    All of the cases I’m aware of that have run simulations of the results of an experimental design have had actual response measurements from which to extrapolate.  If the simulation is not grounded in fact then all your simulation is going to be is some very expensive science fiction.

    A few questions/observations and an approach to assessing what you have that might be of some value.

    1.       You said you identified the critical factors using a number of tools.  In order for those tools to work you had to have actual measured responses to go along with them.

    a.       What kind of factors – continuous, categorical, nominal? A mix of all three? Or, what?

    b.       Apparently, you have some sort of measure or types of measures you view as correlating with record accuracy. What kind of measurements are these – simple binary – correct/incorrect? Some kind of ordered scale – correct, incorrect but no worries, incorrect but may or may not matter, incorrect and some worries, seriously incorrect? Or is the accuracy measure some kind of percentage or other measure that would meet the traditional definition of a continuous variable?

    2. You said running an experimental design would entail cost.  All experimental efforts cost money. The question you need to ask is this: given that running a design would cost money, and given that I have taken the time to generate a reasonable estimate of this cost – what is the trade-off?  Specifically, if I were to spend this money, how much could I expect to gain if I identified a way to increase accuracy by X percent?  We are, of course, assuming the accuracy issue is physically/financially meaningful and one where the change in percent would also be physically/financially meaningful.  I make note of this because if you are at some high level of accuracy, like 99%, then you can only hope for some fraction of a percent of change, and the question becomes what a tiny change in that final 1% would translate into with respect to dollars saved/gained.
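    To make that trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical and exists only to illustrate the arithmetic; you would substitute your own record volumes, error-correction costs, and expected accuracy gain.

```python
# Hypothetical break-even sketch: ALL numbers below are made up for
# illustration and should be replaced with your own process figures.
records_per_year = 1_000_000
cost_per_error = 25.0           # assumed dollars to find and fix one bad record
current_accuracy = 0.99
improved_accuracy = 0.992       # a 0.2-point gain inside the final 1%

errors_now = records_per_year * (1 - current_accuracy)      # 10,000 errors
errors_after = records_per_year * (1 - improved_accuracy)   # 8,000 errors
annual_savings = (errors_now - errors_after) * cost_per_error

# This is the figure to weigh against the estimated cost of the design.
print(f"Estimated annual savings: ${annual_savings:,.0f}")
```

If the estimated cost of running the design is well below that figure, the design pays for itself in the first year; if not, the happenstance-data approach below may be all you can justify.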

    If you refuse to run even a simple main effects design (you should note that one does not need to interrupt the process in order to do this) then you are left with the happenstance data you gathered and tested using the tools you mentioned in your initial post.

    In this case you could do the following: take the block of data you gathered and check the matrix of critical factors you have identified (the X’s) for acceptable degrees of independence from one another.  The usual way to do this is to compute eigenvalues and their associated condition indices and run a backward elimination on the X’s: drop the X with the highest condition index, then re-run the analysis on the reduced X matrix.  You would continue this process until you have a group of X’s that are “sufficiently” independent of one another within that block of data. An often-used criterion for this measure of sufficiency is having all remaining X variables with a condition index < 10.  If you are interested, the book Regression Diagnostics by Belsley, Kuh, and Welsch has the details.
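    A minimal NumPy sketch of that pruning loop follows. The condition indices and the < 10 cutoff are as described above (and in Belsley, Kuh, and Welsch); the greedy drop rule and the function names are my own choices for illustration, not a prescribed implementation.

```python
import numpy as np

def condition_index(X):
    # Scale each column to unit length (standard practice before computing
    # condition indices), then use singular values of the scaled matrix.
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s[0] / s  # the largest entry is the condition number

def prune_collinear(X, names, threshold=10.0):
    # Greedy backward elimination: while the condition number is at or above
    # the threshold, drop the column whose removal shrinks it the most.
    X = np.asarray(X, dtype=float).copy()
    names = list(names)
    while X.shape[1] > 1 and condition_index(X).max() >= threshold:
        worst = min(range(X.shape[1]),
                    key=lambda j: condition_index(np.delete(X, j, 1)).max())
        X = np.delete(X, worst, 1)
        names.pop(worst)
    return X, names
```

Applied to a data block with one nearly duplicated predictor, this drops one member of the collinear pair and keeps the rest, leaving all remaining condition indices below 10.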

    What this buys you is the following: you will know which X’s WITHIN the block of data you are using are sufficiently independent of one another.  What you WON’T KNOW, and can never know, is the confounding of these variables with any process variables not included in the block of data, and their confounding with unknown variables that might have been changing when the data you are using were gathered.  You will also need to remember your reduced list of critical factors will likely fail to include variables you know to be important to your process.  This failure will probably be due to the fact that variables known to be important to a process are being controlled, which means the variability they could contribute to the block of data you have gathered is not significant – in short, you will have a case of causation with no correlation.

    Keeping in mind these caveats, you can take your existing block of data and build a multivariable model using those critical factors you have identified as being sufficiently independent of one another. Run a backward elimination regression on the outcome measures using this subgroup of factors to generate your reduced model. Test the reduced model in the usual fashion (residual analysis, goodness of fit – and make sure you do this by examining residual plots – Regression Analysis by Example by Chatterjee and Price has the details). Take your reduced model, apply it to your process and see what you get.  Before you attempt to run the process using your equation, you will want to make sure the signs of the coefficients make physical sense and you will need to make sure everyone understands the model is based on happenstance data and failure of the model to actually identify better practices is to be expected.
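    The backward elimination regression step above can be sketched in plain NumPy. This is only one common variant of the procedure (drop the least significant term until every remaining term clears a rough |t| >= 2 cutoff, roughly p < 0.05); the function names are mine, and a full analysis would still require the residual checks described above.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares; assumes X has full column rank.
    # Returns coefficients and their t-statistics.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

def backward_eliminate(X, y, names, t_crit=2.0):
    # Repeatedly drop the predictor with the smallest |t| until all
    # remaining predictors are (roughly) significant.
    X = np.asarray(X, dtype=float).copy()
    names = list(names)
    while X.shape[1] > 1:
        _, t = ols(X, y)
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= t_crit:
            break
        X = np.delete(X, worst, 1)
        names.pop(worst)
    beta, _ = ols(X, y)
    return names, beta
```

On simulated data where only one predictor truly drives the response, the procedure retains that predictor with a coefficient close to its true value; on real happenstance data the caveats above (confounding, causation without correlation) still apply in full.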
