iSixSigma

Robert Butler

Forum Replies Created

  • #203154

    Robert Butler
    Participant

    I spent some time checking various reference books looking for something that might help and I’m afraid I don’t have much to offer.

    Two thoughts
    1. We are assuming you are running ANOVA on continuous variables. If you are running categorical variables then linear and interaction are all you can expect to compute.
    2. The level of sophistication of modern statistics packages is such that I doubt you will find anything relating to manual calculations in any of the modern statistical texts for anything other than main effects and interactions.

    I did find a reasonable discussion of the problem and what looks like a guide for generating the necessary equations in Statistical Analysis in Chemistry and the Chemical Industry by Bennett and Franklin – Wiley – 5th printing January 1967 under the chapter sub-heading “Analysis of Variance for Polynomial Regression.” The text and discussion are too long to attempt to summarize on a forum of this type but you should be able to get a copy of this book through inter-library loan.

    If you can’t get this book then you might want to do a search on the web for the same topic – Analysis of Variance for Polynomial Regression – and see if anyone has provided the details.

    0
    #203122

    Robert Butler
    Participant

    Then is it fair to say that it makes sense to characterize the workpiece parameter with high and low values? Which is to say – are lambda, theta, and rho connected or can they be varied independently? If it is the former then you have 4 continuous variables which means you could run a screen check in 8 points with two center points for a total of 10 experiments. If it is the latter then you actually have the situation of a 6 parameter study – lambda, theta, rho, voltage, capacitance, and machine speed. In this case you could also run a screen design in 8 experiments again with two center points for a total of 10 experiments.

    An analysis of the 10 points would allow a check for significance of the factors of interest and pave the way for a design with fewer factors and the inclusion of some possible interaction terms. The analysis of the 10 point designs would also provide an initial predictive equation which could be tested to see if the observed correlations actually described possible causation.

    0
    #203118

    Robert Butler
    Participant

    No, that’s called cheating. We are assuming your upper spec limit is there for a reason. If it isn’t arbitrary then, given your description of events, you are just trying to pretend that some kind of special cause which is driving your process to the upper spec limit (and presumably at some future date will drive your process past that limit) doesn’t matter. The only solution your approach provides is a way to ignore problems with your process for the time being – this is not process improvement.

    0
    #203113

    Robert Butler
    Participant

    What you are asking is this: is there a way to solve my problem before I know what the problem is – and the answer is “no”. The whole point of DMAIC is that of problem identification, investigation, and resolution.

    0
    #203112

    Robert Butler
    Participant

    If this isn’t a homework problem it certainly sounds like one. In either case no one can offer anything because you have not provided enough information. You need to tell us what it is that you are trying to do. For example – confidence interval around the defect size, detection of a shift in defect size due to process change, etc. …and then there is the issue of precision – alpha and beta levels and why you need those at the level you have chosen.

    0
    #203110

    Robert Butler
    Participant

    What kind of a measure is “the workpiece”? My guess would be categorical unless you have some way of quantifying this variable (really small, small, medium, large, very large), (really simple, simple, average, above average, complex) etc. If it is categorical then the smallest design (we are assuming voltage, capacitance, and machining speed can be treated as continuous variables) would be a basic 3 variable saturated design (3 factors in 4 experiments) run for each of the 5 workpiece categories for a total of 20 experiments – add one replicate per group and you would have a total of 25 experiments.

    If, on the other hand, workpiece can be quantified then you could run a 2 level factorial with a couple of replicates at the center point for a total of 18 experiments.

    The thing to bear in mind here is that the philosophy of experimental design is that if something is going to occur it is most likely to occur at the extremes so rather than 5 levels for a continuous variable choose the minimum and maximum settings and use variations of a two level factorial design. Of course, if the one variable is categorical then you would run the smaller two level saturated design at each of the 5 categorical values.

    If you are worried about the possibility of curvature then you would want to include a center point on the factorial and run your replication there. For the case of the saturated designs with the categorical variable this would bring the total number of experiments to 30 and, if all can be treated as continuous then, as previously noted, the total would be 18.
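If it helps to see the bookkeeping, here is a rough sketch (Python, with hypothetical factor names and coded -1/0/+1 settings) of the 16 factorial runs plus the two center-point replicates that make up the 18 experiments mentioned above:

```python
# A minimal sketch only: 4 continuous factors coded -1 (low), +1 (high),
# plus two replicated center points coded 0, for 16 + 2 = 18 runs.
from itertools import product

import pandas as pd

factors = ["workpiece", "voltage", "capacitance", "speed"]  # hypothetical names

corners = pd.DataFrame(list(product([-1, 1], repeat=len(factors))), columns=factors)
centers = pd.DataFrame([[0] * len(factors)] * 2, columns=factors)

design = pd.concat([corners, centers], ignore_index=True)
print(design)  # 18 rows: 16 factorial corners + 2 center points
```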

    0
    #203105

    Robert Butler
    Participant

Practically anything you care to name that has anything to do with any aspect of the way the process functions. Is this a homework question or are you actually looking at a process? If you are actually working on a process, tell us about it and perhaps I or someone else could offer more specific details.

    0
    #203098

    Robert Butler
    Participant

    Special cause can impact either the mean or the variability or both and changes in operators and equipment are not limited to changes in mean – these variables (along with lots of other things) can impact variation as well.

    0
    #203090

    Robert Butler
    Participant

    Welcome to the club. I wouldn’t waste my time worrying about it.

    0
    #203089

    Robert Butler
    Participant

    What it means is that your data distribution isn’t a normal distribution. Given that your process is going to truncate your output at both the low and the high end this isn’t terribly surprising. From the standpoint of what it means to your customer – nothing. From the standpoint of what it means to you – see Mike Carnell’s post.

    0
    #203081

    Robert Butler
    Participant

    You don’t use the dummy variable designations in the DOE. What you use are the categorical identifiers 1,2,3,4 and, as noted, you cannot fractionate these. What you can do is run a saturated design for each of the categorical levels or, again as noted, if there is some way to rank the categorical identifiers then you would use just the two extreme levels and treat the variable as you would any other two level factor.

    0
    #203073

    Robert Butler
    Participant

    If that is what you want to do then I wouldn’t recommend using either one. If you want to know the relationship between the two then you should look at a regression along with the 95% confidence intervals for individuals. This will give you an estimate of the change in employment status as a function of change in competency.

Based on what you have said, employment would be the response and competency would be the predictor. The other thing you really should do is get a plotting routine that will allow you to jitter the plotted points. This will give you an excellent visual understanding of the relationship between the two. If you don’t jitter the data points you will just have a graph with a bunch of single data points corresponding to the 1-5 combinations of the two rating scales. If you can’t jitter then you should at least be able to annotate the graph so you can indicate the number of data points at each of the 1-5 combinations.
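If it helps, here is a rough sketch (Python, with made-up 1-5 ratings and hypothetical column names) of what jittering the plotted points looks like:

```python
# Rough sketch only - the "competency" and "employment" ratings are simulated
# stand-ins for the two 1-5 survey scales.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
competency = rng.integers(1, 6, size=200)
employment = np.clip(competency + rng.integers(-1, 2, size=200), 1, 5)

def jitter(x):
    """Add a small random offset so coincident points become visible."""
    return x + rng.uniform(-0.15, 0.15, size=len(x))

plt.scatter(jitter(competency), jitter(employment), alpha=0.4)
plt.xlabel("Competency rating (predictor)")
plt.ylabel("Employment rating (response)")
plt.show()
```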

    0
    #203070

    Robert Butler
    Participant

There are a number of answers to your question. Before trying to offer anything more, could you provide more information on what it is that you are trying to do? For example – you say you are trying to look for a correlation between the answers to two questions – does this mean you want to see if one predicts the other, or is it a case of looking for internal consistency, or is it a case of looking for agreement, or something else? Each one of the above objectives will result in a different answer.

    At the moment, my personal guess with respect to what you are trying to do is assess agreement. In that case you would want to look at things like a Kappa statistic and not an assessment of correlation.

    0
    #203067

    Robert Butler
    Participant

So, what are the variables you think matter and what have you done with respect to examining production data to at least justify your variable choice? If you have a list – put together a screening design, run the design, and use the Box-Meyer method to analyze the results to identify the big hitters with respect to variation.

    Oh yes – what does “crores” mean?

    0
    #203065

    Robert Butler
    Participant

    There are two schools of thought concerning ordinal variables – one says you can’t use standard statistical techniques to analyze them and the other says you can. To the best of my knowledge neither school has been proved to be either right or wrong.

As written it sounds like you have 3 or more ordinal responses and you want to test their correlation with some group of independent variables. The usual methods of regression analysis will work.

One thing to bear in mind: if you are running a regression analysis with an ordinal variable as the dependent, your residual plots will have a banded structure to them. This is to be expected and it is not a problem.

    0
    #203060

    Robert Butler
    Participant

    Sure – it’s called dummy variables and it is my understanding that there are commands in Minitab which will allow you to do this without having to write separate code.

    Before going there however, you say you have “a needle position” – is there any way to characterize the position in terms of location such as height relative to a baseline or some such thing? If there is then you can treat the variable as continuous.

    Dummy variables – in your case you have 4 settings therefore you re-code the settings as follows:
    Setting 1: D1 = 0, D2 = 0, D3 = 0
    Setting 2: D1 = 1, D2 = 0, D3 = 0
    Setting 3: D1 = 0, D2 = 1, D3 = 0
    Setting 4: D1 = 0, D2 = 0, D3 = 1

    and you run your regression on whatever variables you have in your list and instead of the variable setting you use D1,D2, and D3. The coefficients for these three variables will be whatever they are and the way you determine the effect of any of the settings is to plug in the combination of D1,D2, and D3 values associated with a given setting.

    If you want more details the book Regression Analysis by Example by Chatterjee and Price has a good discussion of the method.
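If it helps, here is a rough sketch in Python of the coding above – the response values are made up and the regression routine (statsmodels OLS) is just one option among many:

```python
# Minimal sketch: re-code a 4-level "setting" into D1-D3 exactly as in the
# table above (Setting 1 is the reference level) and regress a made-up y on them.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "setting": [1, 2, 3, 4] * 3,
    "y": [5.1, 6.3, 4.8, 7.2, 5.0, 6.1, 4.9, 7.5, 5.3, 6.4, 4.7, 7.1],
})

for level, dummy in [(2, "D1"), (3, "D2"), (4, "D3")]:
    df[dummy] = (df["setting"] == level).astype(float)

X = sm.add_constant(df[["D1", "D2", "D3"]])
fit = sm.OLS(df["y"], X).fit()
print(fit.params)  # effect of setting k = intercept + its dummy coefficient
```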

    0
    #203052

    Robert Butler
    Participant

    Before I go any further please understand I’m not trying to be sarcastic, mean spirited, or condescending. I have two concerns here – first is your effort to finish a master’s thesis (I know how much time and effort one of these things take – I have a couple of Master’s Degrees myself ) and second is to provide you with some suggestions concerning a resolution of the problem you are trying to address should you want to try DMAIC.

    With this in mind I would say the following: based on your description of what has been done to date it is not DMAIC and I think trying to recast what has been done by tacking on some afterthought investigations in order to call what was done DMAIC is a waste of time.

    What you can do with what you have: Basically someone did some kind of back-of-the-envelope assessment (“a rough previous analysis”) and decided that the issue was employee satisfaction. I realize you said “this is probably partly due to differences in employee satisfaction” but the actions you have described indicate you and/or the people you are working for believe this is THE issue.

    Based on what you have said it would appear you have the means to test this conjecture and put it to rest one way or the other. You have prior productivity measures, someone decided the issue was employee satisfaction, to this end a shop floor board was introduced and…what happened? You must have productivity measures after the board was introduced … so is there a significant difference in pre/post shop floor board implementation or not? If there isn’t then I wouldn’t hold out much hope for the idea that employee satisfaction is the driver.

If you can’t do a proper DMAIC I’d recommend trying the above, doing the minimum additional work as per your adviser’s requirements, writing up what you have, getting your degree, and pressing on.

    On the other hand, if you have time to follow DMAIC, I would offer the following:

1. You said “the process is internalized and presented as SIPOC”. SIPOC is a nice tool for getting people to give some thought to what might go into a process but it is not a substitute for the Define phase. As for “internalized” I don’t know what that means.

2. You said someone did a “rough previous analysis.” My experience with these kinds of analyses is that they aren’t. Rather, they are a reflection of what has been referred to as a B.O.G.G.S.A.T. analysis (Bunch Of Guys & Gals Sitting Around Talking) and all they contain are the impressions of WELL-MEANING people who have no intimate day-to-day contact with the process itself.

    To really understand the process you will have to personally walk the line (by yourself, no managers present), take the time to talk to everyone in the working environment (take copious notes while talking), and actually ask them to describe what they are doing as well as ask their opinion concerning issues that might bear on a difference in productivity between the two shifts (this means talking to both first and second shift). There is, of course, more to the Define phase but this is key and it provides a solid foundation for the rest of the Define effort.

3. You mention “the morning shift is much better than the other shift.” For me, a statement like this screams process, not people. Regardless of employee satisfaction, random chance says there should have been occasions when second was better than first. From the process standpoint, after I had walked the floor and employed the rest of the tools used in the Define phase, I would want to think about issues like the following:
a. Is there a difference in the makeup of the shifts? For example: is the bulk of the first shift comprised of experienced workers and the bulk of the second shift comprised of new hires? Is second shift in any way considered/treated/perceived as a punishment assignment?
b. Is there a difference in the supply chain between shifts (raw material deliveries are ready and waiting for first shift but second shift has to go out and pull new material from the store room, restock various aspects of the operation, clean up after first shift before proceeding, etc.)?
c. With regard to supply chain – are the raw materials provided to first and second uniform – that is, same supplier, same material lots, same location in the factory setting, etc.?
d. Is the break between shifts clean – that is, does the performance of the second shift depend in any way on what was done on the first shift? If so, what is the situation concerning hand-off – and how smooth is it?
e. Is there any aspect of the production cycle that depends on results from the QC lab? If so, is the lab manned to the same level for both shifts? Is there any difference with respect to response times between first and second shift? Is there any difference in the frequency of sample testing/submission between the two shifts?

    Obviously there are many more issues than what I’ve listed above. What I’ve tried to do is provide you with a quick summary of some of the things I’ve seen in the past that, when discovered, proved to be the real driver between differences in shift performance.

    0
    #203042

    Robert Butler
    Participant

    If your post was, by choice, extremely brief and:
    1. If you have run a survey with meaningful measures
    2. If you have determined production/employee hour is a good measure of the process.
    3. If you have defined the process and really understand how it works
    4. If you have determined the key issues with respect to improving production/hour are all under the control of the employees.

    Then what you have at the moment is a baseline measure. Your next step would be to look at the results of your survey, analyze those findings to identify some aspect of the process to change, make the change, and then repeat your measurements at some later date to see if it made a difference.

    What concerns me is the following:

1. There is no indication in your post that you have any understanding of the process – no mention of block diagramming, fishbone assessment, quantification of issues concerning quality or quantity of product produced, and no indication that the measure of quantity produced/employee hour is of any value.

a. If the issue is problems with the process itself then all of the employee motivation in the world isn’t going to change a thing and any correlation you might get between some aspect of your survey and quality change would be due to dumb luck.

b. Quantity produced/employee hour doesn’t seem to me to be of much use. For example, you could ramp output through the roof but if the defect rate increases proportionally then you haven’t done anything – just increased the rate of defect production. You could easily have an improvement with the same level of output/employee hour with a reduction in scrap/defect rate.

    2. If we assume you have defined the process and know it is strictly an issue of employee motivation as defined by you then how did you build the survey, what did you include in it, and why did you choose the measures you did?

a. Your quick list of motivation measures was, of course, not all-inclusive, but I find it interesting it didn’t include a single tangible source of motivation (larger percentage pay increases, improvement in benefits, etc.).

b. Have the questions been constructed in a way that they will permit the identification of aspects of the employee experience which can actually be changed?

    If you don’t know the process, if you don’t know the correctness of your measure, and if your questions identify things that can’t be changed then you need to start over.

    0
    #203039

    Robert Butler
    Participant

    You will have to provide some more detail because, as written, your questions are not clear.

    For example:
    1. You say confidence level 95 vs 98. – This can be taken several different ways and each of them will result in a different response.
a. You mean you want to compute the confidence interval around your measurements of process output and you are debating whether or not to compute the 95% or the 98%. If this is what you mean then most of what you have written concerning 95 vs 98 doesn’t make sense. If what you want is the confidence interval for ordinary process variation then the “usual” choice is plus/minus 3 standard deviations, which corresponds to a 99.73% CI. In this case there isn’t much of an issue with respect to sample size.
    b. You mean you want to be able to detect a shift from some proportion relative to either a proportion of 95% or 98%. To do this you are going to have to state the size of the shift you want to detect and the degree of certainty (the power) that a shift has indeed occurred in order to compute a sample size.
    c. You mean something else entirely.

    2. You said “response distribution – while I have recommended 50%….” – what does this mean? In particular – what do you mean by a “response distribution” and how does 50% relate to this?

    3. You said “for low volume processes where the sample size calc throws a low sample” – as written this makes no sense at all.

    If you could expand on some of your questions and provide clarification perhaps I or someone else could offer some suggestions.

    0
    #203027

    Robert Butler
    Participant

You’re welcome – what did you find out?

    0
    #203012

    Robert Butler
    Participant

    I haven’t run the calculation but if the answer is .21 then it looks like there is a typo in the multiple choice column – perhaps c) is supposed to be .212.

    0
    #203009

    Robert Butler
    Participant

    So what number did you get? Show us your calculations and we can probably show you where you went astray.

    0
    #203005

    Robert Butler
    Participant

Yes you can. There are no restrictions on the distributions of the X’s or the Y’s in a regression analysis. As for “never” using dollar values in a regression equation – how would you propose to identify an optimum combination of ingredients for, say, a plastic compound, such that the dollar cost per pound was minimized other than using the X’s to predict the final dollar cost? In short – perhaps someone has told you this is never done but whoever offered that opinion is wrong.

    0
    #202997

    Robert Butler
    Participant

Entering the phrase “what is the meaning of p-value” on Google gives 392,000,000 hits – a quick check of the first 10 entries suggests they have the answers to your questions.

    0
    #202995

    Robert Butler
    Participant

There is no requirement for data normality with respect to using control charts – see Wheeler and Chambers, Understanding Statistical Process Control, 2nd edition, p. 76, Section 4.4, Myths About Shewhart’s Charts – Myth One: it has been said that the data must be normally distributed before they can be placed upon a control chart.

For calculation of Pp/Cp values for non-normal data see Measuring Process Capability by Bothe – Chapter 8, Measuring Capability for Non-Normal Variable Data.

For regression analysis – there are no restrictions on the distributions of either the X’s or the Y’s. Applied Regression Analysis, 2nd Edition, Draper and Smith, pp. 12-28 – or, if you are in a rush, the bottom of p. 22 to the top of p. 23.

For t-tests and ANOVA – both are robust with respect to non-normal data. The Design and Analysis of Industrial Experiments, 2nd Edition, Owen L. Davies, pp. 51-56.

    You should be able to get all of these books through inter-library loan.

    0
    #202994

    Robert Butler
    Participant

    It’s nice but it has a number of faults and it places restrictions where none exist.

1. The decision tree for variable data normal yes/no is wrong. Both the t-test and ANOVA are robust with respect to non-normal data, so the roadmap with respect to these tests is wrong. The roadmap does correctly identify the fact that Bartlett’s test is very sensitive to non-normality; unfortunately, it implies that Levene’s test should not be used with normal data – which is wrong. Levene’s test can handle normal or non-normal data and it is for this reason that it is the test used by statisticians – I don’t know of anyone in the field who uses or recommends Bartlett’s test anymore.

    2. The decision tree for X > 1 is just plain wrong. You can use any of the methods on the left side of that diagram when there is only one X. Whether you choose to use logistic/multinomial regression or some kind of MxN chi-square table to address a problem with respect to attribute data will depend on what it is that you want to know.

    3. The warning about unequal variances and “proceed with caution” in the lower left and lower right boxes is nice but it is of no value – in particular it does not tell you what to do or how to check to see if the existence of unequal variances matters.

    0
    #202935

    Robert Butler
    Participant

    As I stated before – the data can be treated as continuous and you can run an analysis on the data using the methods of continuous data analysis.

    The basic calculation for Cpk DOES require data normality which is why there are equivalent Cpk calculations for non-normal, attribute, and other types of data. With your data you will need to look into Cpk calculations for non-normal data – Chapter 8 in Measuring Process Capability by Bothe has the details.

    When testing for mean differences t-tests and ANOVA are robust with respect to non-normality and can be used when the data is extremely non-normal – a good discussion of this issue can be found on pages 51-54 of The Design and Analysis of Industrial Experiments 2nd Edition – Owen Davies.

    Variance issues

    When it comes to testing variance differences the Bartlett’s test is sensitive to non-normality. The usual procedure is to use Levene’s test instead.

    If the VARIANCES are HETEROGENEOUS the t-test has adjustments to allow for this as well. Indeed most of the canned t-test routines in the better statistics packages run an automatic test for this issue and make the adjustments without bothering the investigator with the details.

    Too much HETEROGENEITY in the population variances can cause problems with ANOVA in that if the heterogeneity is too extreme ANOVA will declare a lack of significance in mean differences between populations when one exists. When in doubt over this issue one employs Welch’s test for an examination of mean differences in ANOVA.

    Actually, when in doubt about variance heterogeneity you should do a couple of things, histogram your data by group, compute the respective population variances, run ANOVA using the usual test and Welch’s test and see what you see. If you do this enough you will gain a good visual understanding of just how much heterogeneity is probably going to cause problems. This, in turn, will give you confidence in the results of your calculations.
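As a rough sketch of that “when in doubt” routine (Python, with simulated data; the Welch line assumes a statsmodels version that provides anova_oneway):

```python
# Sketch only: three groups with deliberately unequal variances.
import numpy as np
from scipy import stats
from statsmodels.stats.oneway import anova_oneway  # assumes a recent statsmodels

rng = np.random.default_rng(7)
groups = [rng.normal(10, 1, 30), rng.normal(10.5, 3, 30), rng.normal(11, 6, 30)]

print([round(np.var(g, ddof=1), 2) for g in groups])  # group variances
print(stats.levene(*groups))                          # variance heterogeneity check
print(stats.f_oneway(*groups))                        # usual one-way ANOVA
print(anova_oneway(groups, use_var="unequal"))        # Welch's version
```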

    Control Charts

    As for control charts – data normality is not an issue. A good discussion of this can be found in Understanding Statistical Process Control 2nd Edition Wheeler and Chambers in Chapter 4 starting on page 76 under the subsection titled “Myths About Shewhart’s Charts.”

    Regression

    While not specifically mentioned in your initial post – data normality is also not an issue when it comes to regression. There are no restrictions on the distributions of the X’s or the Y’s.

    Residuals need to be approximately normal because the tests for regression term significance are based on the t and F tests. But, as noted above – there is quite a bit of latitude with respect to normality approximation. For particulars you should read pages 8-24 of Applied Regression Analysis 2nd Edition by Draper and Smith and Chapter 3 of the same book “The Examination of the Residuals.” For an excellent understanding of the various facets of regression I would also recommend reading Regression Analysis by Example by Chatterjee and Price.

    I would recommend you borrow the books I have listed (the inter-library loan system is your friend) and read the sections I’ve referenced.

    0
    #202928

    Robert Butler
    Participant

    You can treat your data as continuous.

“Variables are classified as continuous or discrete, according to the number of values they can take. Actual measurement of all variables occurs in a discrete manner, due to precision limitations in measuring instruments. The continuous-discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values. For instance, statisticians often treat discrete interval variables having a large number of values (such as test scores) as continuous, using them in methods for continuous responses.”
– From Categorical Data Analysis, 2nd Edition – Agresti, p. 3

    As for running t-tests – they are robust with respect to non-normality. If you are worried about employing a t-test – run the analysis two ways – Wilcoxon-Mann-Whitney and t-test and see what you get. The chances are very good that you will get the same thing – not the exact same p-values but the same indication with respect to significance.
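A quick sketch of running it both ways (Python, with made-up skewed data):

```python
# Sketch only: two skewed (exponential) samples compared two ways.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.exponential(scale=2.0, size=40)
group_b = rng.exponential(scale=3.0, size=40)

print(stats.ttest_ind(group_a, group_b))     # two-sample t-test
print(stats.mannwhitneyu(group_a, group_b))  # Wilcoxon-Mann-Whitney
```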

    0
    #202908

    Robert Butler
    Participant

There are ways to test the significance of a percent change; the problem is that before anyone can offer anything you still need to tell us what you have done and how you want to test the percent change.

1. You said you have a 3rd column with a percent change – how did you do this computation? For example:
a. Did you just put the male and female columns side-by-side, take a difference, and compute a percent change using one of the two columns as a reference?
b. If you did a), then what was your criterion for pairing the two columns? If you just randomly smashed the two columns together and took a difference then the percentage changes are of little value.
Or did you use some other method. Regardless, we need to know the computational method you used and what is being compared to what.

    2. You said you want to know if this percentage change is significant. Based on your earlier posts it appears that you do not want to test to see if it is different from 0 therefore, what is the target? That is, what target percent change do you want to use for comparison?

    Example: you have a precious metal recovery system that recovers 99.986% of all of the metal from scrap. You have a new method which you believe will recover 99.990% and you have a target for significant change of .002% – under these circumstances if the new process delivers as expected the percent change will be significant.

    0
    #202904

    Robert Butler
    Participant

    @jazzchuck, I could be wrong but I think the OP is after what I commented on in my first post. He/she has, in some manner, computed a percent change and is looking for significance with respect to that difference where the comparison of the generated percentage change is with something external to the data.

The issue isn’t that of difference between the groups – as you and others have noted – that information is already part of the t-test. As written, the OP does not seem to have anything that would justify pairing individual measures so a paired comparison would be of no value. In any event the question usually addressed by pairing – is the mean of the differences significantly different from 0 – does not appear to be of interest to the OP.

    I think if the OP provides a description of how they generated the estimate of percent change someone might be able to offer some insight with respect to answering the original question.

    0
    #202892

    Robert Butler
    Participant

    Years ago a company I worked for was getting beat up with respect to OSHA recordable accidents. The company had tried a number of things over the years. Among the ongoing efforts was an annual shoe fitting for safety shoes for everyone regardless of their position. These shoes were expensive and the company not only paid for the shoes but for the time lost to mandatory shoe fitting and inspection.

    The company had a monthly plant wide meeting to discuss company issues. At the meeting where the latest OSHA statistics were reported the presentation consisted of the usual, here’s what we did last year and here is what we did this year. During the question and answer session I noted that all that had been presented was the data for just 2 years and, as the statistician, I offered to look at all of the OSHA data to see if I could perhaps find some trends that might help us better understand the accident situation.

    After the meeting, one of the safety engineers gave me all of the OSHA data as well as all of the non-OSHA accident data for the last 20 years. I ran an analysis and discovered the following:
    1. The nature of the physical work done at the plant had not changed substantially over that 20 year time frame.
    2. In 20 years there had not been a single OSHA or non-OSHA accident related to legs or feet (and for the first 10 of those years the company did not have a mandatory safety shoe policy in place).
    3. During the entire 20 year span the primary OSHA accident consisted of 2nd degree burns to the fingers, hands or lower arms.
    4. During the entire 20 year span the next largest OSHA accident consisted of cuts requiring one or more stitches. Those cuts occurred on either the fingers or the hands and were due overwhelmingly to improper handling of glass tubing.
    5. The overall 20 year trend in OSHA accidents had been down, steeply down, and the “big” difference between the most recent year and the prior year amounted to nothing more than ordinary process variation.

    When I presented my findings several things happened.
    1. The safety shoe program ended.
    2. The entire plant knew what the two biggest issues with respect to OSHA safety were and had been for the last 20 years.
3. Additional focus/emphasis was placed on wearing proper gloves and lower arm protection when working with anything from the heating ovens.
4. The company purchased puncture resistant gloves for glass tubing and everyone involved in handling glass tubing received additional safety training.

    The end results:
    1. OSHA recordables dropped to 0 for several years running.
    2. As had been the case for the prior 20 years no one had a foot or lower leg injury.
    3. The cost of the shoe program was eliminated.
    4. The time involved in fitting and safety shoe inspection was used for other purposes.

    0
    #202891

    Robert Butler
    Participant

    Since you only have a single percentage change the only way to assess significance is to have something to use as a yardstick. So the question becomes significant compared to what?

    0
    #202882

    Robert Butler
    Participant

I’m afraid I don’t have an answer to that question. I’ve never done either. My GUESS would be it wouldn’t matter. The problem is how you are going to present your Cpk in terms of your original distribution. If you run a capability analysis on transformed data the chances are very good you will not be able to back transform to actual units. What this means is all of your reporting will be in transformed units and I do know from personal experience this does not work very well. People don’t understand that you have transformed the data, they don’t understand the Cpk is in terms of the transformed data, and they don’t understand you can’t apply the Cpk to the actual system measurements.

One of the other problems I have with Box-Cox and Johnson transforms (and this applies to all of the Box-Cox and Johnson transforms) is the issue of the change of transform parameters over time. As you gather more data there is a good chance your transform parameters will change, which means you have to go back, recalculate the new parameters, and then adjust all of your reporting from the past to make it match with the present.

    Of course, you can have a process where the addition of more data does not have a major impact on the parameters but if you don’t then you may find yourself up a well known tributary without sufficient means of propulsion.

    0
    #202880

    Robert Butler
    Participant

Rather than try to force your data to look more or less normal and spend a lot of time trying to transform and back transform, why not just calculate the equivalent Cpk for non-normal data? The book Measuring Process Capability by Bothe has all of the details on how to do this in Chapter 8, whose title is “Measuring Capability for Non-Normal Variable Data”. You should be able to get this book through inter-library loan.

    0
    #202866

    Robert Butler
    Participant

    Not really. One of the key assumptions of the Pearson Correlation Coefficient is normality (or approximate normality) of the variables – binary data is not normal.

    If it was just a case of looking at class attendance in 2 vs class attendance in 3 you could run a 2×2 chi-square analysis which would give you a measure of association. However, given your problem description I still think logistic regression would be the better bet. It would tell you the odds of attending class 3 given attendance (or lack thereof) in class 2 and it would also tell you whether or not the odds ratio was significant.

    0
    #202864

    Robert Butler
    Participant

    I’m not really sure what you mean by correlation. Since you are on-line – post the data and I’ll take a look at it – just 3 columns, one for each class yes/no (1/0) and an indication as to which class is which.

    0
    #202862

    Robert Butler
    Participant

    Based on your description of the problem I think your best bet would be to run a logistic regression with attendance at class 3 being the Y variable and attendance at classes 1 and 2 being the two X variables.

    The output would be in the form of odds ratios – that is you would have an odds ratio for attendance at class 1 correlating with attendance at class 3 and similarly for class 2. One problem you are going to have with this approach is the question of independence of attendance at class 1 relative to attendance at class 2. If the two are not independent enough you will get huge odds ratios with very large Wald confidence limits and the analysis won’t be worth much.

    One thing you need to remember with binary data – to see significance you need a lot more data than you would need for an analysis of continuous data so even if the measures exhibit sufficient independence you may not have enough data to detect a significant difference.
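A rough sketch of that approach (Python, with made-up 1/0 attendance columns – the names and numbers are hypothetical):

```python
# Sketch only: exponentiated logistic-regression coefficients are the odds
# ratios for class 3 attendance given attendance at classes 1 and 2.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "class1": rng.integers(0, 2, 200),
    "class2": rng.integers(0, 2, 200),
})
# fabricate a class-3 outcome loosely tied to class-2 attendance
df["class3"] = rng.binomial(1, 0.2 + 0.5 * df["class2"])

X = sm.add_constant(df[["class1", "class2"]].astype(float))
fit = sm.Logit(df["class3"], X).fit()
print(np.exp(fit.params))      # odds ratios
print(np.exp(fit.conf_int()))  # their 95% confidence limits
```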

    0
    #202857

    Robert Butler
    Participant

Your data says you have a defect rate of .05%. If you want to be 95% confident that this is the actual defect rate then you will need 22,600 samples. Your defect rate is so small that one would wonder why it is of any concern.

As for detecting a change – at this percentage level any change you could possibly see would be very small and the sample size needed to check the before and after proportions would be huge. For example, if you are looking to see a 10% decrease in this defect rate you would need a sample of 5.9 million to be certain with an alpha of .05 and a power of 80%.
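For what it is worth, here is a rough check of that arithmetic using the standard normal-approximation sample size formula for comparing two proportions (Python); it lands at roughly 5.9-6.0 million in total:

```python
# Sketch only: per-group n for detecting p1 vs p2 with two-sided alpha and
# a given power, using the normal approximation.
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

p1 = 0.0005        # current defect rate (.05%)
p2 = 0.9 * p1      # a 10% relative decrease
print(2 * n_per_group(p1, p2))  # total across both groups, about 5.96 million
```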

    0
    #202854

    Robert Butler
    Participant

I don’t follow the reasoning of your post. A design is just that – a design. You say you have 3 factors and you have identified the 5 levels of those factors corresponding to the five settings in a composite design. So, what is left is the matter of running the design, taking the values for each of the responses, using methods of backward elimination and stepwise regression to identify the reduced regression models, and then using those models to identify the combinations of water, nitrogen, and phosphorous that will provide the best trade-off in terms of optimum yield, height, and biomass.

    From the standpoint of actually running the design there are a number of issues you should think about.
    1. What drove your choice of a central composite? Do you have any kind of prior information that would indicate curvilinear behavior in all three factors and possible significance of most of the two way interactions?
2. You will have to grow more than one plant per experimental combination and, since plants grown in the same plot of land will be more alike than plants grown in separate plots of land, you will need to run the same treatments on more than one plot of land.
    3. Yield and biomass will (I assume) be based on a whole plot result but what are you going to do about plant height?
4. Given the labor involved in multiple plots of land I would first recommend a simple saturated design – 3 variables in 4 experimental plots with, in this case, a full replication of the design to get some sense of within-plot variation. This would come to a total of 8 plots of land; it would give you a sense of the main effects of the variables of interest and it may be all that you need. If it doesn’t, then you can augment what you have to flesh out a full central composite.

    0
    #202825

    Robert Butler
    Participant

    Ieeeeee….don’t do that. If you try to optimize them separately you will spend a lot of time wringing your forehead for no reason and get basically nowhere. What you need to do is take both equations and optimize them together. In other words build a table with all of the X variables and the two Y responses and look at how the two Y’s change SIMULTANEOUSLY as you make a change in a specific X.

    If you first build your equations in the form I mentioned in my first post so the coefficients are for normalized (that’s the old term for scaling between -1 and 1) X’s then you should be able to look at the two equations together and almost eyeball which X is going to be the driver for the need for trade-offs.

    0
    #202818

    Robert Butler
    Participant

    I don’t know that much about Minitab but, assuming it isn’t a typo, your statement “I can’t find a way to get the model to fit 2 response variables” would indicate you need to back up and start again.

The way one addresses response variables is one at a time. Therefore, if you have two response variables you construct two models, one for knit line strength and one for porosity, and you take these two models and use them for purposes of prediction.

Since you only have two models I would recommend you build the models based, not on the actual levels of the independent variables, but on their scaled values, where each independent variable has been separately scaled from -1 to 1 corresponding to the minimum and maximum values of the range that variable took in the design**.

    If you do this you can look at the equations for the two responses and directly compare the term coefficients. This allows you to see not only which variables are having the most impact on the responses separately but also where the conflict with respect to optimization is occurring.

    In your case it sounds like even if you do this you will not be able to optimize both of the responses so what you will need to do is work with the models to identify the best trade-off settings where neither one is at an absolute optimum but both are acceptable.

    The need to accept trade-offs with respect to absolute optimums is common and the more product characteristics you want to optimize the more likely you are to have to look for independent variable settings that result in acceptable trade-offs in product properties.

    ** the way you do this scaling is to take the minimum and maximum value for each independent variable and compute the following

    A = (max + min)/2
    B = (max – min)/2

    Scaled X = (X-A)/B
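A quick version of that scaling in code (Python; the 150 to 250 range is just a made-up example):

```python
# Sketch only: scale a factor that ran from 150 to 250 onto [-1, 1].
def scale_to_unit(x, lo, hi):
    a = (hi + lo) / 2   # A = (max + min)/2
    b = (hi - lo) / 2   # B = (max - min)/2
    return (x - a) / b  # scaled X

print(scale_to_unit(150, 150, 250),
      scale_to_unit(200, 150, 250),
      scale_to_unit(250, 150, 250))  # -1.0 0.0 1.0
```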

    0
    #202816

    Robert Butler
    Participant

If we are assuming the lead times you have provided are averages then the quickest way to answer both of the questions would be for you to run a two sample t-test. You would need to have the sample size for both of the lead time calculations and the associated standard deviations. The t-test would answer the question concerning statistically significant differences in the two lead times and the null hypothesis the t-test is examining is that there is no difference between before and after lead times.
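A rough sketch of that t-test using only summary statistics (Python; the means, standard deviations, and sample sizes shown are made-up placeholders):

```python
# Sketch only: two-sample t-test from summary statistics.
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(
    mean1=12.0, std1=3.0, nobs1=40,  # lead time before (placeholder numbers)
    mean2=10.5, std2=2.5, nobs2=40,  # lead time after (placeholder numbers)
    equal_var=False,                 # Welch's version; safer if variances differ
)
print(result)
```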

There is a caution here – if they really mean for you to test the equivalence of the before and after lead times (lead time before = lead time after) then you will have to run an equivalence test, which is a Two One Sided t-test (TOST), and you will have to do some reading to make sure you are running the analysis correctly.

    Given what you are doing I would recommend also providing at least a histogram with both the before and after data overlaid on the same graph. This will help everyone visualize what you have done and what you are comparing. There are any number of graphical packages that will do this for you.

    0
    #202815

    Robert Butler
    Participant

Running a DOE to compare suppliers who happen to be competitors is done all the time. I’m not sure I understand what you mean by “don’t have any control over the process.” If you don’t control anything then what exactly are you proposing to investigate? As for telling your suppliers you are running a DOE with their product – why would you bother to do this?

    0
    #202809

    Robert Butler
    Participant

So, all your process consists of is taking a part (or, if they are small enough, some quantity of parts) from one location in a warehouse, delivering them to some predetermined spot, and then turning around, going back to some other location in the warehouse, and repeating the same process. Based on what I’ve read it sounds like you are looking at personnel performance in an Amazon type setting.

    1. Is the delivery spot the same for all of the pickers?
    2. Since all of the parts are not equidistant from the delivery spot how are you taking into account variation in travel time?
    3. If it is an Amazon type setting the parts are stacked on racks that reach from floor to a very high ceiling – how are you taking into account the time needed to pull something from the first tier vs the time needed to pull something from the top tier?
4. Is part weight, bulkiness, or size an issue? If it is, then how are you taking into account this impact on an individual’s ability to transport the part from the rack to the delivery point?
5. Is the process actually one of getting a single order (part) and immediately delivering that single order to the processing location, or is it a case of having an assigned number of parts and delivering the parts only after all of the parts in a given order have been located? If it is the latter, then how are you taking into account all of the issues listed above?

    Since this is a time-motion study I would recommend, as a first pass, that you simply chart part acquisition to part delivery time for each of the pickers across your chosen time period and look at the patterns.

Based on my understanding there does not appear to be any natural grouping, so the issue is reduced to running IMR charts on each of the pickers and asking if there is any significant difference in means, medians, and standard deviations of elapsed times. This question will probably have to be addressed with more than the overall time plot. For example – burn-in time for pickers – I would assume a novice will have longer delivery times for some period after hiring.

    If you do find significant differences you will need to go back to the time plots and start asking questions such as those outlined above in order to make sure you are comparing like to like.

    0
    #202807

    Robert Butler
    Participant

    Based on a quick Google search it looks like Zone A corresponds to the 2 standard deviation limits on either side of the mean – what percentage of a population could one expect to be encompassed by +- 2 standard deviations around the mean?

    0
    #202806

    Robert Butler
    Participant

1. What, precisely, is a pick and pack time process? – I am not asking this.

Yes, I know – the problem is that I am asking this, since neither you nor the original poster have given anyone any idea of what you are doing, and the statement “from the time the picking is done to the next location” is bereft of information. So, again, what is this process and how does it work?

Given an operational definition of your process someone may be able to look at what you have provided and offer some thoughts. One thing – your comment concerning data normality under the statement about control charts is a bit of a puzzle – the two are not connected.

    0
    #202804

    Robert Butler
    Participant

For starters, how about expanding on what it is that you (and, apparently, Jignesh) are doing?
1. What, precisely, is a pick and pack time process?
    2. What kind of data do you have?
    a. just some counts of picking something from somewhere (pieces of fruit off of a tree, picking bits of litter along a street, etc.)
b. just some estimate of time to pick a piece of fruit, or is someone standing around timing people?
    3. Besides counts of something picked and some measure of time to do this what else are you recording? Time of day, days of the week, shift changes, etc.
    4. Your post gives the impression that you have a bunch of data and you want to go straight to control charting – if this is the case then you have a LOT of work to do before you start that.

    If you will provide a detailed description of the process and what it is that you want to do you might get a response as opposed to the lack of responses to the original poster.

    0
    #202762

    Robert Butler
    Participant

    Fair enough – let’s go with Ms. Andrews and start at the beginning.

    *Key points*

1. The bald statement “You only need a sample size of X to do certain tests”, without any reference to the context of the problem (that is, whatever it is that you are trying to do), is rubbish.

2. The actual sample size needed for any given effort will be driven by things such as the time needed to obtain a given sample, the cost of that sample, the effort needed to get the sample, the size of the effect you want to observe with a given sample size, the degree of certainty you wish to have with respect to any claims you might want to make concerning the observed effect size, etc.

3. What constitutes an effect size will depend on the question you are asking. For example:

a. If the focus is on some continuous population measure and you are interested in how the two populations may differ with respect to that measure, then the effect size will often be expressed in terms of the minimum difference in the measurement mean values between the two populations that results in a significant difference with some degree of certainty.
b. If the focus is on a difference in proportions of the occurrence of something such as a defect count (yes/no), then the effect size will often be expressed as the minimum difference in percentage of occurrence that results in a significant difference with some degree of certainty.
c. If the focus is on determining the confidence bounds of a measurement of a mean or of a percentage, then the effect size will be the degree of confidence associated with that measure and the degree of confidence you wish to have with respect to your assessment of the confidence bounds around that target mean or percentage.

    **The claim that 30 samples is sufficient**

    Let’s turn to your particular situation and see how this holds up.

    In your situation you have the following:

1. You have millions of bills which are processed annually.
2. You have no idea of the proportion of these bills which are incorrect.
3. Your guestimates range from 100 to 10,000 incorrect billings (yes, I know, it could be more, it could be less, but we need to start somewhere). In other words you believe the proportions of incorrect billings are very small.

    For purposes of estimation let’s assume you process exactly 1,000,000 bills annually and the error rates range from 100 to 10,000 as previously stated. This would mean your guestimate of defect rate is between .01% and 1%.

    If we take a random sample size of 30 bills, the smallest non-zero defect we can detect would be a single error. This translates into 1/30 = .033 for a defect rate of 3.3%. This, in turn, means the guestimated defect rates (.01% and 1%) range from over 300 to 3.3 times SMALLER than the smallest possible non-zero defect rate you could detect with 30 samples – in other words – a sample size of 30 will provide 0 information about whether your guestimates are correct or incorrect.

    What the above illustrates is there are many situations where a sample size of 30 is grossly insufficient. In my earlier posts I illustrated situations where a sample size of 30 was gross overkill. Taken together they illustrate that problem context is everything. Without context claims concerning either the necessity or sufficiency of sample sizes of 30, or any other number for that matter, are of no value.

    As to where the 30 sample size estimate came from, my answer as a statistician is I don’t know and I don’t care – given the work I do and have done for many years it is an estimate of no value.

    **What you have done with the online sample size calculator**

    I found a couple of sample size calculators which gave me the same number you generated – 384/385.

    The one I used can be found here:

    http://epitools.ausvet.com.au/content.php?page=1Proportion&Proportion=0.5&Precision=0.05&Conf=0.95&Population=100000

    What you have done is the following:
    You are assuming you have an error rate of 50%. You told the machine you wanted to be 95% certain that this estimate is good to within +- 5%. As you can see this particular online site only allows a maximum population size of 100,000 and, as you can also see, the sample size estimate is in agreement with what you stated earlier.

If you want a sample size estimate to test to see if your error rate is .01%, and if you want that to be precise within 5%, then the numbers you would enter in the above online calculator would be an estimated true proportion of .0001 and a desired precision of .000005 (5% of .0001). For these values the sample size would be 99,354 – basically your entire population of 100,000.

    For the case of 1% the values would be .01 and .0005 respectively and the sample size for a population of 100,000 would be 60,337.
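For what it is worth, here is a rough sketch of the formula behind those numbers (a proportion sample size with a finite population correction); it reproduces the figures above to within rounding:

```python
# Sketch only: n for estimating a proportion p to within +/- d in a
# population of size N, at the stated confidence level.
import math
from scipy.stats import norm

def n_for_proportion(p, d, N, conf=0.95):
    z2 = norm.ppf(1 - (1 - conf) / 2) ** 2
    return math.ceil(N * z2 * p * (1 - p) / (d**2 * (N - 1) + z2 * p * (1 - p)))

print(n_for_proportion(0.5, 0.05, 100_000))         # ~383 (online calculators round to 384/385)
print(n_for_proportion(0.0001, 0.000005, 100_000))  # 99,354
print(n_for_proportion(0.01, 0.0005, 100_000))      # 60,337
```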

    The fact of large sample sizes for small defect estimates with a high degree of precision is to be expected and it was numbers of this magnitude I was getting when I tried to generate sample sizes based on my understanding of your earlier posts.

    I know very little about billing procedures and billing record keeping but I find it hard to believe that someone, somewhere in your organization doesn’t keep track of incorrect billing as well as number of bills processed. If not the exact numbers then possibly other numbers that could stand in as a surrogate for defects and totals and could be used to generate an estimate of proportion defective.

    Regardless of what you might be able to use for generating an estimate of percent defective just calculating that estimate would give you a sense of the magnitude of the problem. Depending on what you found I would think the next question one should ask is this: Given our crude estimate of an error rate – what is the cost of that error rate to our business and is there any benefit with respect to dollars to the bottom line to try to further reduce the error rate?

    If it turns out your error rate is very low and reducing it further would really be of benefit then you will have to explain the sample size situation to everyone and, in that case, I hope what I have provided will be of some value.

    1
    #202754

    Robert Butler
    Participant

Yes, your process guarantees the data will be non-normal. You have a physical lower bound on the measurement you are taking (0 shrinkage – you can’t have negative shrinkage), you are doing your best to have all of your product right at that lower bound, and shrinkage measures are naturally non-normal in distribution.

    Rather than get in a shouting match with your stakeholder see if you can borrow a copy of Measuring Process Capability by Bothe and make a copy of the first page of Chapter 8 – Measuring Capability for Non-Normal data.

    On the first page of Chapter 8 Bothe states “…there are many manufacturing processes that typically generate non-normally distributed outputs, even when in control. Below is a list of twenty such characteristics:
    1. Taper
    2. Flatness
    3. Surface Finish
    4. Concentricity
    5. Eccentricity
    6. Perpendicularity
    7. Angularity
    8. Roundness
    9. Warpage
    10. Straightness
    11. Squareness
    12. Weld or Bond Strength
    13. Tensile Strength
    14. Casting Hardness
    15. Particle Contamination
    16. Hole Location
    17. Shrinkage
    18. Dynamic Imbalance
19. Insertion Depth
    20. Parallelism”

    …I’m sure your stakeholder will find #17 to be of interest. :-)

    As for non-normality indicating an out of control process – sorry no. The reverse is also true for a normal distribution – by itself a distribution tells you nothing about a process being in or out of control.

    If you need a quick proof of this for your stakeholder with respect to the normal distribution indicating an in control process, get a random number generator based on a normal distribution and generate 100 numbers. Order the numbers from lowest to highest and assign the identifier number 1 to the first 25, 2 to the second 25, 3 to the third 25, and 4 to the last 25. Now run a one way ANOVA on the 4 groups to test for significant differences in means – you will find significant differences, which means the distribution of perfectly normal numbers is hiding the results of an unstable process.
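    If you want to hand your stakeholder the actual demonstration, a minimal sketch in Python (numpy and scipy assumed – any statistics package will do the same thing) is:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)                      # fixed seed so the demonstration repeats
    x = np.sort(rng.normal(size=100))                   # 100 perfectly normal numbers, ordered low to high
    groups = [x[0:25], x[25:50], x[50:75], x[75:100]]   # identifiers 1 through 4, 25 numbers each
    f_stat, p_value = stats.f_oneway(*groups)           # one way ANOVA across the 4 groups
    print(f_stat, p_value)                              # the p-value is essentially zero

    The overall histogram of those 100 numbers will look beautifully normal, yet the four groups differ wildly – which is the point.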

    0
    #202751

    Robert Butler
    Participant

    To add to what @Daniel.S, @KatieBarry, @Mike-Carnell, and @Cseider have said. The posts you have made to this forum present you in the light of someone who expects everyone else to do all of the work for them. Your posts also give the impression that you have neither made any attempt to investigate anything on your own using things like Google search nor do you have any interest in doing so.

    To add to what @Daniel.S has said – just go over to the thread started by @chebetz look at the title of that post – Why is Sample Size N>=30 Sufficient – and look at how Tom phrased his question. It is obvious he has done some work, given the issue some thought, presented his understanding of the issue and THEN asked for some help. You will also notice that help was provided.

    0
    #202750

    Robert Butler
    Participant

    Why do you need a specific name for your distribution? You are dealing with data that is guaranteed to be non-normal and there are all kinds of distributions for which no name exists. Take your 20 samples, plot them on normal probability paper, as a box plot, and as a histogram, determine the mean and the median and present your graphical and summary statistics findings and press on.

    0
    #202747

    Robert Butler
    Participant

    The problem is the question you are asking in your second post is not the same question you asked in your first post.

    In your first post you essentially asked: how many samples do I need in order to make inferences about a population? The answer, as I stated, is a minimum of 2.

    Example 1: The view from the perspective of Means and Standard Deviations. I have two populations. From the first population my measurements for a particular property are 2 and 3. From the second population my measurements for the same property are 8 and 11.
    Population #1: Mean = 2.5, Standard Deviation = .71.
    Population #2: Mean = 9.5, Standard Deviation = 2.1
    The t-value for the means test is -4.4 and the associated p-value = .047. If a cutoff of p<.05 had been my choice prior to gathering the samples and running the test then I could say the difference in the population means was significant.

    Example 2: The view from the perspective of percentages of defects: Again I have two populations and for each of the two populations I take two samples and determine if the samples are defective or not. For population #1 I have 2 successes and 0 defects. For population #2 I have 0 successes and 2 defects. If I test these proportions using Fisher’s exact test I get the Fisher statistic of .1667 with a p-value = .33. I have a measure of the proportions defective for the two populations and, with the sample size I have, I cannot say there is any difference in these proportions. (In case you are wondering, you would need a minimum of 4 samples per population with a (4,0), (0,4) split in successes and failures to get statistical significance (P = .03).)
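    If you have access to something like Python's scipy (an assumption on my part – any decent package has these tests), both examples take only a few lines to reproduce:

    from scipy import stats

    # Example 1: two measurements from each population, pooled-variance t-test
    t_stat, p_val = stats.ttest_ind([2, 3], [8, 11])
    print(round(t_stat, 2), round(p_val, 3))    # about -4.4 and .047

    # Example 2: 2x2 tables of (successes, defects) for the two populations
    odds, p = stats.fisher_exact([[2, 0], [0, 2]])
    print(round(p, 2))                          # about .33 - not significant
    odds, p = stats.fisher_exact([[4, 0], [0, 4]])
    print(round(p, 2))                          # about .03 - significant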

    The question you are asking in your second post is not about drawing inferences about a population rather it is about drawing inferences about the detectable difference between measures of two populations predicated on characteristics of those populations.

    That question takes you into the realm of my second post to this thread. Now the question is: What kind of a sample, comprised of successes and failures, must I draw in order to have a specified degree of probability that the sample represents a given level of failure that differs from a null proportion by some amount?

    So the short answer is that the approach you have taken is correct and in that case you will need a sample of 385.

    There is one problem and that is I can’t duplicate your calculation. My guess is I’m not understanding what you are doing.

    The way I read what you have told your program is that you have a null proportion defective of .5 and you want to find the number of samples needed to detect a shift of .05 with an alpha of .05 and a power of .8. If I run that combination I get a sample size (I’m using SAS) of 636 for a one sided test and 803 for a two sided test.

    I’d like to try to duplicate your results so could you re-state what you are doing – in particular why are you saying you are assuming BOTH null proportions are .5 and what does the acronym MOE mean?

    0
    #202743

    Robert Butler
    Participant

    To answer the second part of your question about sample size calculators. As I said previously, the minimum number of data points needed to make statements about a population is 2.

    When you add in the question of sample size and sample size calculators you are now asking about the number of samples needed to make statements about population differences with respect to a degree of certainty that

    1. You have seen a statistically significant difference
    2. If you were to go back and run the experiment again you could observe that same statistically significant difference.

    The statistic associated with #1 is alpha, the significance cutoff against which the p-value is judged, and the statistic associated with #2 is beta. The value (1-beta) is called the power and it expresses the probability that you will be able to achieve a given alpha should you repeat the same experiment with the same number of samples.

    Sample size calculators allow you to specify the level of certainty you would like to have that a difference exists (alpha) and the degree to which you would like to be certain you could repeat this finding (1-beta); they also require that you define the size of the difference you want to detect given your prior choices of alpha and beta.

    The two most common ways of defining a difference are by specifying population means and standard deviations or by specifying changes in observed percents.

    If the sample size calculator is any good it will also allow you to determine your power (1-beta) given that you have already run an experiment (thus fixing the sample size and the measures of either means/spreads or percentage changes) and have found a statistically significant difference (alpha at some level of significance which you previously chose to accept).

    The usual procedure, particularly with respect to initial assessments of a problem, is to take however many samples time/money/effort will permit, run the statistical test needed to answer the question concerning the existence of a statistically significant difference (the level of alpha), and then run a post-hoc power test (using a sample size calculator) to determine the odds of repeating the find.

    There are caveats and yeah-but’s too numerous to mention with respect to the above paragraph but two that are worth remembering are:

    1. If you have something that is statistically significant you, as the investigator, must make the judgement call as to whether that difference has any real physical meaning/value. Given the degree of precision of many measurement instruments it is quite easy to get statistically significant results of no consequence. In this situation what your analysis has told you is your measurement system is really good at detecting differences.

    2. If you have something that is statistically not significant but the difference is such that, if it were true, it would have physical meaning/value then running a post-hoc power analysis will tell you the number of samples needed to demonstrate statistical significance with adequate power.

    This situation often arises in medicine due to sample size restrictions driven by time/money/effort. I’ve run numerous studies where the trend was in a physically meaningful direction but the alpha level was not less than the pre-determined cut point and the power was low. In these cases the existing data will be presented along with a request for increased funding to check out the possibility of the existence of something of medical value.
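    By way of illustration of the forward and post-hoc calculations described above, here is a minimal sketch for a two-sample comparison of means (Python with statsmodels is my assumption about tooling; the effect size and sample size are made-up numbers):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Forward direction: samples per group needed to detect a difference of 0.5
    # standard deviations with alpha = 0.05 and power = 0.8
    print(analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8))   # roughly 64 per group

    # Post-hoc direction: with only 20 samples per group already collected,
    # what power did the study actually have?
    print(analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05))    # roughly 0.33 - poor odds of repeating the finding

    The post-hoc number is the one you take to management when asking for the funding to run the larger study.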

    0
    #202741

    Robert Butler
    Participant

    What you have been told is boilerplate to make sure you won’t go wrong with great assurance given that you have little or no understanding of statistical analysis. Unfortunately, the boilerplate is overly restrictive. It also presumes the cost per sample is very low, the effort needed to acquire the sample is minimal and there really isn’t any issue with respect to time needed to get the sample(s).

    The fact remains the minimum number of samples needed to make inferences about a population is 2. With 2 samples you have an estimate of the mean and the standard deviation and you can use those results to test for differences between your sample mean/variation and another sample mean/variation or a target mean/variation.

    If indeed you have a claim which states a sample of 30 or greater MUST be gathered before you can assess your population then that claim is false. The proof is very simple – go to the back of any basic statistics text and look at the t-table – the minimum sample size is 2. The whole point of Gossett’s 1908 effort with respect to the development of the t-distribution and the t-test was to permit accurate assessments of population parameters and differences between populations with as few samples as possible.

    0
    #202738

    Robert Butler
    Participant

    After hunting through Google it looks like you are saying that you have signals – specifically you are using the control chart rules of 9 on one side of the mean/target and 1 outside of 3 standard deviations and that you are worried about out of tolerance (OOT) results which, because they are OOT do not conform to Good Manufacturing Practice (GMP).

    Item 1 – The usual practice is that one point outside of 3 standard deviations indicates your process is out of control and out of tolerance. The other rules – 2 out of 3 outside of 2 SD, 4 out of 5 outside of 1 SD, 8/9 on one side of the target/mean are usually treated as indicators that the variation observed in your process is no longer random. They are telling you to start looking for special cause variation. The measures corresponding to these data points are still representing product that is within tolerance.

    Item 2 – The one thing you didn’t tell us is the frequency of the sampling. A key assumption of control charts is that the individual data points are independent of one another. If you haven’t checked this independence assumption then, based on what you have described, there is a good chance your data is auto-correlated. What this means is you will have control limits that are far too narrow and, as a result, you will get numerous false signals concerning process shift or actual loss of control.
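    If you want a quick check on the independence assumption, a minimal sketch (Python/pandas assumed; the file name and column name below are made up) would be:

    import pandas as pd

    data = pd.read_csv("process_measurements.csv")   # one row per sample, in time order (hypothetical file)
    lag1 = data["measure"].autocorr(lag=1)           # lag-1 autocorrelation of the charted measurement
    threshold = 2 / len(data) ** 0.5                 # rough 95% bound for the autocorrelation of independent data
    print(lag1, threshold)

    A lag-1 autocorrelation well outside that rough bound is a warning that the usual individuals-chart limits will be too narrow and the chart will throw false signals.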

    0
    #202735

    Robert Butler
    Participant

    So you have 5 different inputs in 27 different experimental combinations, and a single output. What kind of combinations? Is this an experimental design or is it just running 27 WGP’s (Wonder Guess Putter)?

    If it is a design then what terms does the design allow you to investigate?
    If it is a design then backward elimination linear regression on the terms allowed by the design should get you where you want to go.
    If it is a WGP then you will have to determine what terms you can actually include in the model and have them independent of one another before running a regression analysis.
    If you have the design matrix (the list of the 27 experiments and the levels of the 5 inputs – no labels or units are necessary) post it on this forum and I’ll take a look at it.

    0
    #202671

    Robert Butler
    Participant

    As @daniel.S said – a good first step is data plotting and the methods he suggested are appropriate. The big issue would be to check for possible extreme points that might impact any additional analysis and to check for co-linearity (confounding) of the qualitative variables.

    Confounding of variables – for all categorical variables a simple first look would be a series of matrices with one categorical variable against the other. What you would look for is the presence of combinations in each of the matrix cells.

    After proper coding of the categorical variables via dummy/indicator variables (see below) you would also want to run VIF and condition indices for multicollinearity checks. Most statistics packages do not have the ability to run condition indices but almost all of the good ones (Minitab, Statistica, etc.) have VIF capability.

    As for prediction the easiest thing to do for categorical variables would be a multiple regression using the categorical variables as predictors. To do this you will need to build a series of dummy variables in order to use the categorical variables in a regression analysis (note: if your categorical variable has only two levels then a simple coding of either 0,1 or -1,1 for the two levels is sufficient).

    The basic coding structure is as follows:

    Let’s take shift as the variable of interest.
    For shift A we will define dv1 = 0 and dv2 = 0.
    For shift B we will define dv1 = 1 and dv2 = 0.
    For shift C we will define dv1 = 0 and dv2 = 1.

    Therefore the regression equation form for cycle time would be:

    Cycle time = b0 +b1*dv1 +b2*dv2.

    Effect of Shift A : Cycle time = b0 + b1*(0) + b2*(0) = b0
    Effect of Shift B : Cycle time = b0 + b1*(1) + b2*(0) = b0 + b1
    Effect of Shift C : Cycle time = b0 + b1*(0) + b2*(1) = b0 + b2
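    If you end up doing this in something like Python's statsmodels (an assumption on my part about software – Minitab and the rest will do the same thing), the package will build the dummy variables for you. A minimal sketch with made-up cycle time data:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: cycle times recorded against a three-level shift variable
    df = pd.DataFrame({
        "shift": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
        "cycle_time": [10.2, 11.5, 12.1, 9.8, 11.9, 12.4, 10.5, 11.2, 12.0],
    })

    # C(shift) builds the dv1/dv2 coding shown above, with shift A as the reference level
    model = smf.ols("cycle_time ~ C(shift)", data=df).fit()
    print(model.params)   # Intercept = b0 (shift A), C(shift)[T.B] = b1, C(shift)[T.C] = b2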

    You should do some reading on this subject before trying to run your analysis. A good reference on this subject is Chatterjee and Price’s book Regression Analysis by Example. You should be able to get this book through inter-library loan. The Chapter of interest is “Qualitative Variable Analysis”.

    0
    #202662

    Robert Butler
    Participant

    A number of posters took some time to answer this same question on your earlier thread and you were given some good advice which, based on this post, you chose to ignore.

    In particular @cseider asked you about the results of your graphical analysis. Given this post it is evident that you did not run a meaningful graphical analysis of your data. If you had done so you would have learned a great deal.

    Some issues concerning your data:

    1. Do you believe your sample data to be representative of the process?
    a. If the answer is no then running simulations based on the data will be nothing more than an exercise in fertility.
    b. If the answer is yes then the big question is what could have possibly driven you to decide to try to generate 1000 sample simulations based on a negative binomial?

    2. If we assume your original data set is representative and if we assume your desire to run 1000 sample simulations using an underlying distribution of a negative binomial is based on that data then a careful graphical analysis would suggest the basis for this desire is the existence of 2 data points. If this is indeed the case then you are focusing on the behavior of less than 2% of your sample data and ignoring what the bulk of the data, in conjunction with the two extreme data points, is telling you.

    3. A boxplot of your data for defects and defectives split on glove type tells you in no uncertain terms where you should be looking and what you should be considering with respect to the process.

    a. The plot for defects says there is a shift in the mean and that’s about it. If I remember correctly one of the posters in the other thread indicated this was a significant shift.
    b. The plot for defectives says two things and, I suspect, also highlights the driver for your post on negative binomial simulations.
    1. There is a shift in the means between the two glove types
    2. There is a marked difference in the VARIABILITY around the means of the two glove types.
    3. There are two extreme data points in Ansell data for defectives – as mentioned, these represent less than 2% of the total sample and, as near as I can tell, they are the sole justification for a desire to run 1000 sample simulations based on a negative binomial.

    Based on what you have provided the big news, and the three things that demand investigation, are:
    1. Is there any physical connection between glove type and the simultaneous reduction in mean AND variation about the mean for Defectives?
    2. If glove types are really a cause for reduction – why don’t Defects follow the same pattern as Defectives?
    3. What is the story with respect to the two extreme data points for Ansell Defectives?

    An answer to these three questions will probably result in an answer to your original question.

    0
    #202644

    Robert Butler
    Participant

    “Variables are classified as continuous or discrete, according to the number of values they can take. Actual measurement of all variables occurs in a discrete manner, due to precision limitations in measuring instruments. The continuous-discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values. For instance, statisticians often treat discrete interval variables having a large number of values (such as test scores) as continuous, using them in methods for continuous responses.”
    – From Categorical Data Analysis 2nd Edition – Agresti pp.3

    Based on what I can read from the OP’s picture of his spread sheet he has a sufficient spread in counts which means there shouldn’t be any problem analyzing the data using the methods I outlined in my initial post.

    0
    #202636

    Robert Butler
    Participant

    The picture of your data is too small to read – how about posting it as an excel attachment so we can see what you have?

    Based on your description and by doing a lot of squinting it looks like you have recorded a defect count along with the total number of things checked over on the left hand side and the same thing on the right hand side and all you have done is label one attribute and one variable. Assuming I am reading your picture correctly, in both cases if you divide the defect count by total number of items examined you will have a percentage and you can run a two sample t-test on the percentages with the population identifier being glove type.

    0
    #202633

    Robert Butler
    Participant

    As I stated in that discussion back in 2002 you can get negative sigma values. From that discussion

    ” If you go back to your sigma calculator in whatever program you are using (mine is on Statistica and it does add 1.5 to the Sigma Value) and plug in values such as 999,999.999 for your DPMO you will get a Sigma Value of -4.49 or -5.99 depending on how your particular calculator uses 1.5 when computing Sigma Values. These Sigma Values just mean that the vast majority of your product is completely outside the customer spec limits. It says nothing about the standard deviation of your process.”

    I agree calculating the sigma level for the case where there is very little of your product within a customer’s specification is silly because it should be blindingly obvious that this is the case without recourse to any kind of calculation.

    What you do need to remember is this calculation is only commenting on how your process lines up with customer specifications – it doesn’t say anything about process stability. You can have a very stable process and still get this kind of a result. For example there is a sudden drastic change in the customer needs and the change is such that virtually none of the current product from your stable and in control process can meet the new requirements.

    0
    #202626

    Robert Butler
    Participant

    1. If you have some kind of software that allows you to identify the quartiles of your data you will find that the lower quartile (25%) is 3, the median (the 50% point) is 4, and the upper quartile (75%) is 5. Therefore 25% of your data has a color count >5.

    If you don’t have access to any statistical software you can approximate this process by sorting the entries for color count in the Excel sheet in ascending order. The actual count of real entries is 187.

    25% of 187 is 46.75 – rounded up to 47, 50% of 187 is 93.5 rounded up to 94, and 75% of 187 is 140.25 rounded down to 140.

    If you count down the sorted entries to the 47th entry the color count value is 3, down to the 94th the color count value is 4, and down to the 140th entry the color count value is 5. Therefore 25% of the color count values are greater than 5.

    The only problem with doing the calculation this way is that you are running the analysis on the sample so you will not get perfect agreement with a statistics package running the same analysis.

    In particular you will see that, for this sample, for each of the cut points there are still values of 3 after the 47th entry, values of 4 after the 94th and values of 5 after the 140th. For this particular sample 16% of the entries are greater than 5 and 27% of the entries have a value of 5 or more.
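    Should you get access to something like Python/pandas later on, the quartile check above is only a few lines (the file and column names here are my inventions – substitute whatever your spreadsheet actually uses):

    import pandas as pd

    cc = pd.read_excel("color_counts.xlsx")["color_count"]   # the 187 color count entries
    print(cc.quantile([0.25, 0.50, 0.75]))   # lower quartile, median, upper quartile - 3, 4, and 5 for your sample
    print((cc > 5).mean())                   # fraction of batches needing more than 5 additives - about 16% per your data
    print((cc >= 5).mean())                  # fraction needing 5 or more - about 27%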

    2. What I mean by simple bean counts or means is that you take all of the variables you listed as possibly being related to the total count of color additives and you class their values according to whether or not they are associated with color counts less than or equal to 5 or greater than 5.

    Example: Bean count using the actual color count values from the sample

    Color count less than or equal to 5 = 156, color count greater than 5 = 31

    Variable:Blade Types 1,2,3,4

    Blade Type CC<=5 : Type 1: 106, Type 2: 35, Type 3: 10, Type 4: 5 Total count 156
    Blade Type CC>5: Type 1: 1, Type 2: 4, Type 3: 7, Type 4: 19 Total count 31

    Therefore color counts <= 5 are most likely associated with blade types 1 and 2 (141 out of 156) whereas color counts >5 are most likely associated with Type 4 with Type 3 running a distant second (19 out of 31 or, combining type 3 and 4, 26 out of 31).
    This would suggest blade type might be associated in some way with larger or smaller numbers of color addition. In this case you would want to look at the various physical aspects of the blades to see if there might be some connection between some of the physical properties and number of colors added.

    Example: Means using the total count of 187

    Variable: Dispersion time

    Dispersion times for CC<=5 : Mean dispersion time is 60 seconds with a standard deviation of +-2 (N = 156)
    Dispersion times for CC>5: Mean dispersion time is 15 seconds with a standard deviation of +-3 (N = 31)

    Run a two sample t-test on the means

    t = (60 – 15)/sqrt[{(2*2)/156} + {(3*3)/31}] = 45/.5621 = 80 therefore there is a significant difference between the mean dispersion times for color counts <=5 and color counts >5. This would suggest there is something about dispersion time that impacts the number of colors added to a given batch.

    You can run these calculations manually but it will be very labor intensive and time consuming. Given the number of comparisons you will need to make you will really need to have access to some kind of statistical software and you will need to have an understanding of what that software is doing. For a basic level of understanding of the statistics I would recommend getting a copy of The Cartoon Guide to Statistics by Gonick and Smith. You should note that the section on the t-test makes reference to the “need” to have normally distributed data – this is an unneeded constraint. The t-test does quite well with non-normal data. If you need a reference check pages 52-53 of The Design and Analysis of Industrial Experiments 2nd Edition by Owen Davies.
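    For instance, if that software turns out to be something like Python's scipy (my assumption about what you might have available), the dispersion time example above can be run straight from the summary statistics:

    from scipy.stats import ttest_ind_from_stats

    # Means, standard deviations, and counts from the dispersion time example
    t_stat, p_val = ttest_ind_from_stats(mean1=60, std1=2, nobs1=156,
                                         mean2=15, std2=3, nobs2=31,
                                         equal_var=False)   # unequal-variance form matches the hand calculation
    print(t_stat, p_val)                                     # t is about 80, p-value is essentially zero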

    3. The splits are just the act of grouping the various measures according to whether they correspond to a color count less than or equal to 5 or color counts greater than 5.

    0
    #202620

    Robert Butler
    Participant

    Not so fast! :-) If your most recent post is an accurate summary of all you have done then you still have some work to do before thinking about a design.

    If you are going to try to run the analysis as I mentioned above then you should first split your data into the group with more than 5 additives and the group with 5 or less and run either simple bean counts or means tests on all of the factors you have listed: Tinter Strength, Tinter Additions, Dispersion time, Machine speed, Raw materials, Skill levels, Scales/weight, Specification, Spindle type & Blade type & size.

    I’m not sure just how you would characterize some of the variables such as skill levels, specification and scales/weight but if you have some way of classifying these things then you should look at a summary of the differences in the counts/statistics of all of these properties split on the basis of the two groups.

    1. To see if there are any significant differences – for the continuous measures you could use a two sample t-test. At a guess these would be tinter strength and additions, dispersion time and machine speed. Depending on how scales/weight, specifications, and skill level are characterized (either continuous or ordinal) they might be included in this list as well.

    For those variables that are most likely nominal – Raw materials (suppliers?),Spindle type & Blade type & size you would want to run a bean count to see if there are obvious splits in the counts of these things between the two groups.

    2. Given the above information you will be in a much better position to decide which of these variables might be worth checking at the level of a DOE.

    0
    #202614

    Robert Butler
    Participant

    I don’t think the OP was concerned about process control nor about control chart issues. My take on his post was that he was given the task of reducing the average number of additives per paint lot. To that end I focused on what his current set of data was telling him about his process and I suggested a possible way of looking at his process that, if successful, would result in reducing the odds of having to add more than 5 additives to a lot of material.

    I realize that someone in his organization said they wanted a normal distribution but I didn’t get the impression this had anything to do with control charts either.

    0
    #202604

    Robert Butler
    Participant

    I’ll second Kristen’s recommendation and add The Cartoon Guide to Statistics by Gonick and Smith. It’s the book I recommend to my engineers when they want to get some additional understanding about basic statistics issues.

    You should also understand the value of good statistical graphics and what they are – to this end I’d recommend The Visual Display of Quantitative Information by Tufte. His other three books on the issue of graphical presentation are also worth reading/examining visually (after all these are books about graphical representations).

    0
    #202597

    Robert Butler
    Participant

    That may be the requirement but the basic physics of your situation will guarantee it won’t happen. As I mentioned in my first post you have an actual physical lower bound of 0 additives and you are trying to work very close to that lower bound which means you are going to have an asymmetric distribution. Just look at the histogram of your process as it is currently running – it is already skewed. If you manage to eliminate the upper 25% of that distribution the data will still be skewed just with a shorter tail.

    What you can do is this. If you can identify and eliminate the occurrence of the upper 25% of the current distribution (or drastically reduce the odds of anything exceeding 5 additives) then you could take the data for your new process, build a normal probability plot of the data, pick off the values corresponding to the 2.5 and 97.5 percentiles and you will have an equivalent two sigma spread around your process mean.

    By way of showing improvement in the process you would compute the equivalent 95% bounds of your process as it is now running and then show the difference after you have improved the process.

    If you want to do this you will need justification for the approach. Try borrowing the book Measuring Process Capability by Bothe through inter-library loan and read Chapter 8 Measuring Capability for Non-Normal Variable Data. It covers the details of this method for determining equivalent spread.
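    The percentile picking itself is trivial once you have the data in electronic form. A minimal sketch (Python/numpy assumed; the file name is made up):

    import numpy as np

    counts = np.loadtxt("additive_counts.txt")        # hypothetical file - one additive count per batch
    low, high = np.percentile(counts, [2.5, 97.5])    # equivalent 95% (roughly +/- two sigma) spread
    print(low, high)

    Run the same calculation on the current process data and on the improved process data and the comparison of the two spreads is your statement of improvement – no normality assumption required.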

    0
    #202590

    Robert Butler
    Participant

    The product names are interesting but we still need all of the other things mentioned in my first post before anyone can offer any kind of advice.

    0
    #202588

    Robert Butler
    Participant

    You will have to provide a lot more detail about what those things are, what it is about them that you are measuring and what those measurements look like in terms of both data type and data distribution before anyone can offer much of anything in the way of specific suggestions.

    0
    #202587

    Robert Butler
    Participant

    1. Regardless of how you choose to parse the need for repair – it is still a matter of repair and all of the issues/concerns that apply to one form of maintenance/repair facility apply to the rest.

    2. Complex and Complicated are synonyms; therefore there is no distinction between the two.

    3. You triage aircraft repair and maintenance the same way you triage patient care. Aircraft maintenance and repair is very dynamic and the idea that every plane can just sit around somewhere burning money until a group of people decides what is best is incorrect.

    4. In aircraft maintenance and repair there are the equivalent of cure and care teams and they too have people from diverse educational backgrounds. The cure teams (note the “s” on the end of the last word) are involved in the cure – that is the actual act of repair. The care teams are involved with all of the issues surrounding the plane before it goes back into service and when it is in service.

    I’m not interested in turning this into a flame war or a he-said-she-said exchange but I think the bigger issue here is that your understanding of continuous improvement needs refinement. Since I have worked in the health field for a number of years and I apply six sigma methods on an almost daily basis let me provide you with a specific example of how it is done.

    Some years back we had a home care medication program that, in theory, should have been very good but was plagued by a number of problems with respect to getting the program underway and in use by the patient at home.

    I first met with a representative group of supervisors and managers to get an overview of the process. I used what they told me to build a flow chart of the process. This allowed me to tentatively identify the major parts of the process. I say “tentatively” because, regardless of how good a supervisor or a manager is they, by virtue of what they should be doing namely running interference for their employees, are not privy to all of the critical details of the process.

    Next I went to the floor and met with groups of nurses/technicians/suppliers associated with each of the main components of the flow chart. With each group I built a fishbone diagram of their part of the process. The fishbone not only diagrammed their specific part of the process it also included what they needed from prior steps in the process and what they expected to deliver to the next step of the process. From this I learned the following:

    1. There were critical major parts of the process which had not been listed by the manager/admin group.
    2. There was a lot of re-work going on – a total of three hidden factories in the process.
    3. The flow of needed information through the process as described by the manager/admin group had missed several critical elements and was in error with respect to other information elements.

    In each case, when we finished building the fishbone diagram of a particular piece of the process everyone present said they had no idea that their part of the effort was that complicated/complex.

    I took all of the diagrams, got a very large conference room, taped each of the fishbone diagrams in order of process flow on the wall and called a large meeting that included people from all levels of the process. Once everyone got over the surprise of actually seeing what the process really looked like we got down to specifics with respect to where and what kinds of measurements we needed to take in order to quantify what was going on.

    We agreed on a sampling plan for a number of different areas. I pointed out to everyone that I realized gathering this data would be an added burden to their respective work schedules so I also told them we were going to have a specific start and end date for all of the measurements and the start and end dates would be the same for everyone. We discussed the trade off for the need for representative data vs. the time involved and agreed that a month worth of data should be sufficient.

    Once all of the data was compiled I ran an analysis. For this particular project the statistical analysis was largely graphical (and yes, an appropriate graphical analysis is a statistical analysis) along with a few very basic statistical summaries.

    What we found was that well over 90% of the problems were due to two of the group of physicians involved in the evaluation process which determined who should and should not be permitted to have the home medication treatment. The analysis also showed it wasn’t their fault rather it was the fault of the system.

    The workload of the two physicians in question was much higher than the others in the evaluation group (I checked this just in case someone somewhere would try to claim they weren’t pulling their share of the load) and the added burden of correctly filling out all of the details in the paperwork that need to be provided was a simple case of work overload. The solution was simple – an assistant was assigned to them to take care of the paperwork as needed. The problem disappeared and various aspects of the existing process such as the hidden factories closed shop.

    0
    #202582

    Robert Butler
    Participant

    I think the problem is the wording of the part of the second statement which says “…Attribute data (aka Discrete data)…” It is true that in order to run a statistical analysis on attribute data you will typically have to assign numeric values to the attribute categories which, because of the nature of the assignment, are discrete numeric identifiers. However, just because data is discrete does not mean it is attribute data.

    I also don’t like the part of the second statement that says “It is typically things counted in whole numbers.” Rather, I think it should read “typically whole numbers are assigned to attribute values for purposes of statistical analysis. The assignment of numbers may be purely random or, if there is some logical ranking/ordering of the attribute, assigned as a function of rank/order.”

    Examples:
    Attribute – nominal – no obvious ranking – gender, pill type, geographical location, work centers, make of automobile, ice cream flavors, etc.
    Attribute – ordinal – ranking – age group, education level, ratings of any kind (poor, fair, good), (strongly disagree, disagree, neither agree or disagree, agree, strongly agree), etc.
    Discrete – no attributes – count

    0
    #202580

    Robert Butler
    Participant

    I’m not trying to be mean spirited or offensive but your first major problem is your view as expressed in your second point.

    “2. Healthcare falls mainly in “services” domain and applying six sigma principles, which are mainly applicable for “manufacturing” domain, is a big challenge.”

    This just is not true. A hospital is nothing more than a repair and maintenance facility. The doctors are body mechanics, the nurses are technicians, and everyone else is there to support the throughput of repaired items. As a result, the issues faced by a hospital are no different than the issues faced by a jet aircraft repair and maintenance facility.

    From the standpoint of unit repair the doctors actually have it simultaneously easier and harder than an aircraft mechanic. They have it easier because they are dealing with a single model with components located in exactly the same place in every model. The reason they have it harder is because while they only have one model that model is an evolved model whereas the jet mechanic’s models, while exhibiting a wide variety with respect to components and component location are designed. The hallmark of evolution is complexity and the hallmark of design is simplicity.

    0
    #202573

    Robert Butler
    Participant

    Your target average may be 2.7 but, as you noted the average of your sample is 4.03 and your median is 4. The histogram of all of the data indicates quite a skew towards the high side in terms of counts of color additives per batch. In your spread sheet you have numbers highlighted in yellow along with summary statistics – I’m assuming this is to indicate some kind of lot identifier (so many batches/lot?)

    In any event an overall histogram illustrates the fact that your process is a long way from any kind of average of 2.7 color additions. A check of boxplots by lot number reveals the same thing. In addition to having to move the mean you will also need to get feedback from management concerning the amount of variation they are willing to tolerate around a target mean of 2.7. Without that piece of information you could have a situation where your grand average is 2.7 but the variability is such that your changes to the process don’t matter.

    Since you have a physical lower bound of 0 color additives and you have a target very close to 0 the histogram of your process will be non-normal and the issue for you is that of determining the causes for the need for large numbers of additives per batch. As noted, for the entire data set the median is 4 and a check of the lower and upper quartiles indicate a value of 3 and 5 respectively. In other words 75% of your product requires more than 3 additives and 25% requires more than 5.

    If you think your data is representative then I would say the next task is to understand the need for so many additives to so much of your product. From the standpoint of the 80/20 rule you might want to consider first looking at those cases where the color count is greater than 5 (the upper 25%).

    If you can answer that question and find a solution such that the maximum additive count becomes 5 then, based on your data, you would have a situation where your lower quartile is 3, your upper quartile is 4, your median would be 3 (a whole number color count additive) and your mean would be 3.3 with a standard deviation of 1.05. Eliminating the group above 5 additives wouldn’t get you to 2.7 but it would be a large improvement over your current situation.

    I’ve attached two graphs – a histogram and a lot (?) level boxplot – to help you think about what it is that your data is telling you.

    0
    #202563

    Robert Butler
    Participant

    The smallest unit of independence is the individual animal. Your measures are repeated measures and the analysis must take that fact into account. If you don’t you will find all kinds of significant effects where none exist. This is due to the fact that the machine will treat each measure as independent. The repeated measure-to-measure variability will be much less than real independent measure-to-measure variation which means you are testing measurement differences using smaller estimates of variance.

    You could run a repeated measures regression analysis with firing rate being the outcome variable. The issue is going to be that of independent variable definition. Based on what you have posted I think the best bet would be a 4 parameter model: outcome = forced choice right, forced choice left, choice right, and choice left. I say this since your basic question is firing rate of neurons as a function of direction. The fact that an animal chooses to take the same direction as previously (?) forced to take is interesting but hardly an error, and a model of this type will cover the question you are asking.

    If all you care about is mean differences of the 4 choices and if you don’t have any ready means for analyzing repeated measures data then you could build a 2 way ANOVA with the 4 choice types as one variable and the animal identification as the other variable. By including animal in the analysis you will have taken into account the repeated measures nature of the measurements. If you choose the ANOVA approach you will want a reference – the same problem, with pill type across patients and a measure of fecal fat as the outcome substituted for choice type across animals with neuron firing rate, can be found on pages 254-259 of Regression Methods in Biostatistics by Vittinghoff, Glidden, Shiboski, and McCulloch. That book also does a very good job of discussing the issues of repeated measures and how to deal with them in general.
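    If you go the ANOVA route in something like Python's statsmodels (an assumption on my part; the file layout and column names below are made up), the model is just:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical layout: one row per trial with the animal id, the choice type
    # (forced right, forced left, choice right, choice left), and the firing rate
    df = pd.read_csv("firing_rates.csv")

    model = smf.ols("firing_rate ~ C(choice_type) + C(animal)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # the animal term accounts for the repeated measures structure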

    As for issues of normality of outcomes – this not an issue either with regression or with t-tests and ANOVA. In regression analysis the issue of approximate normality only applies to the residuals and both the t-test and ANOVA are robust with respect to lack of data normality. By the way, when it comes to testing for normality do yourself a favor and first plot the data – histogram and normal probability plots being the first choice. The various normality tests are so sensitive that it is possible to take generated data using the normal distribution as the underlying generator and have even that data fail one or more of the tests.

    0
    #202561

    Robert Butler
    Participant

    Apparently the Isixsigma admins decided your other post with this same topic was the redundant one so – as I said in my response to the same post – you will have to tell us more about what it is that you are trying to do. Simple data plotting is just a matter of putting data points on an x-y plot and has nothing to do with MSE.

    0
    #202560

    Robert Butler
    Participant

    You could analyze it both ways – it would depend on what you want to do but before you try anything there are a couple of things you might want to consider.

    1. You say that your responses are dimensions which presumes you have taken multiple measures of each dimension of interest at each experimental condition. This, in turn, probably means that you built multiple items for each of the experimental conditions and took measurements off of each of the items. If this is the case then your measurements are repeated measures – not independent measures and you will have to take this into account when running your analysis.

    2. If you choose to run the DOE on in/out spec responses you will have to use logistic regression and your coefficients will be odds ratios which will express the odds of getting defects for certain combinations of experimental conditions. You will need more data for a logistic regression in order to detect significant differences (as compared to running an analysis on actual measurements) and you will still have to take into account the repeated measures nature of the data.

    3. Another possibility is this – it sounds like you are more concerned with in/out spec issues than mean responses. Since your post gives the impression that you ran a full factorial you might want to become familiar with the Box-Meyer’s method for data analysis. If you analyze your DOE data using this method you will be able to identify the variables that are contributing the most to your observed measurement variation and thus are most likely to contribute to the occurrence of out of spec material.

    If #3 is what you are really trying to do then get a copy of Understanding Industrial Designed Experiments – Schmidt and Launsby – 4th edition through inter-library loan. Sections 5-28 to 5-31 have the details for running a Box-Meyer’s analysis.

    As for the issue of repeated measures I don’t know Minitab but I seem to recall from other threads on this site that Minitab can handle repeated measures – your best bet would be to check their website and/or talk it over with their support people.

    0
    #202532

    Robert Butler
    Participant

    It’s not a trick or short formula; rather it is a matter of modulo arithmetic and defining contrasts, and it can get gory fast.

    The “trick” is the identification of the defining contrast (also called the defining word or the defining relation) associated with the principal block (the set of experiments which includes the I experiment – the experiment with everything at its low level).

    Once you have that/those contrasts you use modulo arithmetic to quickly work out the confounding structure. For some designs, such as a half replicate, finding the defining contrast is a simple matter of setting the identity experiment (the experiment with everything at its low level) equal to the largest interaction with everything at its high level.

    For example – for a half rep of a 2**4 the contrast is I = ABCD so with modulo arithmetic you get the generator of D = ABC. This means C = ABD, B = ACD, A = BCD. Thus all mains are clear of one another and all two way interactions. However with modulo arithmetic AB = CD, AC = BD, and AD = BC so you have at most 3 two way interactions clear of one another but confounded with other two way interactions.

    On the other hand, if you want to do 5 variables in 8 experiments then you have I = ABCD = -BCE = -ADE so the generator for the design is D = ABC and E = -BC and the modulo arithmetic will show you what is confounded with what. A good book with the details is The Design and Analysis of Industrial Experiments by Davies 2nd edition. If you can get the book through inter-library loan the section to read is the chapter on confounding in experimental design and appendix 9D which goes into more detail.
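    Since squared letters drop out, the modulo arithmetic amounts to taking the symmetric difference of the letter sets, which is easy to script if you want to check an aliasing structure quickly. A minimal sketch (Python, ignoring the minus signs that show up in contrasts such as -BCE):

    # "Multiply" two factor words modulo 2: squared letters cancel
    def multiply(word1, word2):
        return "".join(sorted(set(word1) ^ set(word2))) or "I"

    defining_word = "ABCD"   # I = ABCD for the half replicate of a 2**4
    for effect in ["A", "B", "C", "D", "AB", "AC", "AD"]:
        print(effect, "=", multiply(effect, defining_word))
    # A = BCD, B = ACD, C = ABD, D = ABC, AB = CD, AC = BD, AD = BC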

    0
    #202530

    Robert Butler
    Participant

    For your first attempt I would recommend plotting your defect count data against all of the temperature variables of interest and see what you get. If all you have for plots are a series of shotgun blasts then you will know without wasting a lot of time that there isn’t any clear relationship between the variables of interest and the defect count. On the other hand if you do get some clear trending you can go about running a regression to model what you see. Based on my experience there is a good chance that if you find a graphical relationship it will not be a simple straight line – hence the recommendation of plotting before trying anything else.

    0
    #202514

    Robert Butler
    Participant

    Based on your description it doesn’t sound like you have data that could be used in an agreement analysis (I’m assuming when you say “agreement” you want to look at something like the Kappa statistic).

    For an agreement study you have to have some kind of contrast on the same things you are measuring. For example, if you want to know how well an individual rater agrees with themselves the usual procedure is to provide a series of things (let’s use x-ray images of joint injury as an example) and have the individual rate each of the images on whatever scale is being used to assess the images. The images are given to the rater in a random order and then, at some later date they are re-randomized and given to the same rater. The rating scores are matched by images and the Kappa statistic is calculated on the before and after ratings.

    On the other hand, if by “agreement” you mean equivalence then you would use the t-test and run the Schuirmann’s TOST equivalence test. However, the problem with this approach is the same as the one for the Kappa – you need some kind of pairing of measured responses.

    Rather than continue with a series of “if” statements I think you will need to provide more detail with respect to what it is that you are trying to do.

    Some issues to address:

    500 cases – what are these – all the same kind of thing, groups of similar things but not all the same, etc.?

    10 questions on an ordinal scale – same questions for all – all questions different or do they group into question types (i.e. questions 1-3 focus on patient demographics, questions 4 and 5 are focused on economic status, etc.)

    Why 5 people per case?

    Are the cases randomly assigned?

    What do you mean by overall attributes across the sample?

    0
    #202512

    Robert Butler
    Participant

    …are we talking about the 27 September 2002 exchange? If so what I said then still applies:

    “The real problem here is one of labeling. Someone made a very poor choice when it came to renaming the Z score. Whoever they were they at least chose to call the Z score the Sigma Value and not Sigma. Unfortunately, the same cannot be said of any number of articles and computer programs. I know that Statistica calls their program the Sigma Calculator and the final result Sigma. The problems that can result from confusing Sigma Value (which can have negative values) with process standard deviation (which cannot have negative values) are too numerous to mention.”

    0
    #202505

    Robert Butler
    Participant

    The Taguchi OA’s are just renamed and cosmetically altered standard designs. Given what you are trying to do it would be much easier to take a fractional design of 5 factors in 8 experiments and add a couple of experiments to make sure the interactions of interest are clear of the 5 main factors.

    The principal block for the 5 factor, 8 experiment design is as follows: (1), ad, bc, abcd, abe, bde, ace, cde. This gives you all 5 factors clear of one another but confounded with 2 way interactions which is why you would need to augment this design with a couple of other design points to clear the needed 2 way interactions. Minitab has the capability to run augmentation using D-optimal methods.

    0
    #202501

    Robert Butler
    Participant

    …Rats, cut off the last part of my post..

    The t-test is robust to normality and the issue is that of “approximate” normality. In order to get a good visual understanding of what that means you should borrow Fitting Equations to Data by Daniel and Wood from the library and look at the cumulative distribution plots of normal data (for various sample sizes) on pages 34-43 (in the 2nd edition). Another thing you should do is run side-by-side checks for significant differences between data distributions using both the t-test and the Wilcoxon-Mann-Whitney test to give yourself an understanding of just how crazy non-normal data has to be before the t-test fails to indicate significance when the Wilcoxon-Mann-Whitney test indicates a significant difference exists.
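    A minimal sketch of that side-by-side check (Python/scipy assumed – the exponential distribution is just one convenient way to manufacture skewed data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a = rng.exponential(scale=1.0, size=30)   # deliberately skewed, non-normal data
    b = rng.exponential(scale=2.0, size=30)   # same shape, larger scale

    print(stats.ttest_ind(a, b))                             # t-test on the raw, skewed data
    print(stats.mannwhitneyu(a, b, alternative="two-sided")) # Wilcoxon-Mann-Whitney on the same data

    Repeat the comparison over a range of sample sizes and degrees of skew and you will see how rarely the two tests disagree about the existence of a difference.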

    0
    #202500

    Robert Butler
    Participant

    You haven’t told us what you mean by “first run all variables, and normality is not ok
    run only significant variables, normality is OK.” Specifically, how are you determining that “normality is not ok”? If all you are doing is running your residuals through some test such as Shapiro-Wilk then you need to back up and run a real residual analysis. That means looking at plots of residuals vs predicted, plotting the residuals on a normal probability plot, a residual histogram, etc. Any good book on regression will tell you what you need to do.

    It is very easy to have residuals fail one or more of the normality tests for the simple reason that these tests are very sensitive to non-normality. Indeed, it is easy for data from a random number generator with an underlying normal distribution to fail these tests, and it is for this reason that so much emphasis is placed on an evaluation of the graphics of the residuals.

    The t-test is robust to normality and the issue

    0
    #202470

    Robert Butler
    Participant

    No

    0
    #202466

    Robert Butler
    Participant

    @cseider I just got back to this thread. Unfortunately, I don’t have any contact with anyone involved in R&R with respect to blood testing so there isn’t much I can offer.

    0
    #202459

    Robert Butler
    Participant

    I agree with @cseider – only I would add the adjective “expensive” to the word “wallpaper”.

    As to your measurements – it does not appear that you have any obvious grouping of the data, just individual data points from various chemists. Under those circumstances I would recommend an IMR chart with the proviso of identifying the individual chemists. What this would buy you would be a check on chemist-to-chemist variability. If everyone checks out the same – well and good – and if they don’t you will know whom to address in the event one or more individuals have significantly worse variation.

    Of course, on the positive side, in the event you have someone whose measurements are significantly better with respect to variation it would be worth understanding their approach to sample testing with the possibility that if everyone adopts that individuals methods there will be an overall improvement in measurement variation.

    0
    #202453

    Robert Butler
    Participant

    A lot of the posts are locked and will say so at the very bottom. Frankly, given a time lag of 3 years it would probably be better to start a new thread. They started locking old threads several years ago because people were not paying attention to the original thread date and were doing things like asking for information from the original poster who had long since quit visiting the site.

    0
    #202444

    Robert Butler
    Participant

    For a binomial outcome the regression method you would use is logistic regression. The final model is a log odds model of the form:

    log (odds Y) = b0 +b1*x1 +b2*x2+…etc. which means

    odds Y = exp(b0 +b1*x1 +b2*x2+…etc.)

    Depending on the program you are using the output will be either in the form of the beta’s, in which case you will need to exponentiate them to get the odds ratio associated with the X’s or the exponentiation will have been done for you and the output will be in the form of odds ratio point estimates for each of the X’s.

    So, if the interest is in the odds of something happening where something happening is coded as a “1” and not happening is coded as a “0” then the output will be a table listing each of the X’s and there will be an associated point estimate of the odds ratio along with a Wald confidence interval for that point estimate.

    For example, let’s take one of your percentages and say we find the coefficient for X1 is .1906. The point estimate will be 1.21 and, assuming it is statistically significant, the way you would interpret the results would be to say that for every unit increase in X1 the odds of Y occurring increase by 21%. If X1 is binomial and you are using the “0” setting as the reference then you would say that when X1 = 1 the odds of Y occurring are 1.21 times those for the situation where X1 = 0.
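    A minimal sketch of that output in something like Python's statsmodels (my assumption about software; the file and variable names are made up – the arithmetic is the same in any package):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("outcomes.csv")   # hypothetical file with a 0/1 outcome y and predictors x1, x2

    model = smf.logit("y ~ x1 + x2", data=df).fit()
    print(model.params)                # the b's on the log-odds scale
    print(np.exp(model.params))        # exponentiate to get the odds ratio point estimates, e.g. exp(.1906) = 1.21
    print(np.exp(model.conf_int()))    # Wald confidence intervals for those odds ratios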

    0
    #202441

    Robert Butler
    Participant

    Let’s try for a re-cap.

    In your initial post you said, “I am learning DoE and want to simulate responses for fractional factorial design. As an input for simulation I want to use estimated effect sizes of main factors and some interactions. I know there is a formula for calculating effects from obtained responses:”

    If you want to do this then the only way this can be accomplished is to use matrix algebra to solve an array of simultaneous equations. However, in order to do this you do not get to just use some estimates of “effect sizes of main factors and some interactions”. For this to work you will have to come up with estimates of an effect size for every single main effect and every other term that would be supported by the design in question (interactions, curvilinear terms etc.) What you will get will be predictions of the results of each of the experiments in the design.

    As I see it the problem with your approach is that designs are not used in this manner – ever. As a result, I really don’t see how what you want to do will help you understand experimental designs.

    The reason for running a design is to determine the effect sizes of the factors that were studied. You do this because, independent of all other factors in your system, you really don’t know what those effect sizes are.

    Typically, before you run a design, you have some prior data on the effect size of SOME of the factors you want to include in your study (indeed it is the possibility that these variables might really matter that drove you to include them in the design in the first place). The problem with this preliminary data is that there is no guarantee that the effect size you have measured using it is actually providing an INDEPENDENT measure of the effect size of the factor. In addition, you and your crew will most likely have a list of some other factors which common sense, a report from the latest trade journal, your boss’s insistence, a Dirty Harry do-you-feel-lucky gut feel, etc. indicate might be worth examining and for which there is no prior data for effect size estimation.

    So, you choose a design to look at these factors and you run the experiments in that design. By definition, these runs are “experiments” and therefore their results are not guaranteed. Once you have the results in hand you analyze them using methods of regression analysis. The final regression model will not only tell you the effect sizes of all of the variables/interactions/curvilinear factors of interest, independent of one another, it will also tell you if they are greater than the ordinary noise of your system.

    In my estimation, if you really want to gain an understanding of designs by running what-if analyses, I would choose one of the smaller designs, say a full factorial 3 variable design, look at the +/- settings for the various factors, set response levels to match these patterns, and run the analysis to see the impact of these ordered changes on the response outcome.

    For example – choose a range of large responses and a range of small responses and assign the large and small values to match the pattern of one of the interactions and then run the analysis to see not only if the interaction is significant but if the main effect terms associated with the interaction are also significant. If you do this you can also include a couple of replicate points and vary the replicate error and see how this impacts your “initial” findings with respect to term significance and effect size.
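
    A minimal sketch of that exercise in Python (the design is a standard 2^3 full factorial; the response values, effect sizes, and noise level are all invented for the what-if run):

    import itertools
    import numpy as np
    import statsmodels.api as sm

    # 2^3 full factorial in coded (-1, +1) units: columns are A, B, C.
    design = np.array(list(itertools.product([-1, 1], repeat=3)))
    ab = design[:, 0] * design[:, 1]

    # Responses deliberately set large where A*B = +1 and small where A*B = -1,
    # plus a little random noise.
    rng = np.random.default_rng(7)
    y = 50 + 10 * ab + rng.normal(0, 1, size=len(design))

    # Fit main effects plus the A*B interaction and look at the p-values to see
    # which terms stand out from the noise.
    X = sm.add_constant(np.column_stack([design, ab]))
    fit = sm.OLS(y, X).fit()
    print(fit.params)
    print(fit.pvalues)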

    As for the case where you want to test some other system to see if it is in agreement with respect to effect size – just assign values for the outcomes for each of the experiments in your design, run the analysis, identify the effect sizes using regression methods and then feed the outcome values to the program of interest and see if what it gets matches what you got using regression.

    0
    #202436

    Robert Butler
    Participant

    @gustavjung @gchollar is telling you the same thing I did but perhaps I wasn’t clear enough. So again – define the response values for each of the experiments (1, a, b, etc.), define your low and high values for each of the factors, fill in the factorial table, and then build a regression equation which contains only the significant terms (use backward elimination and stepwise to see if the regressions converge to the same model – with a design they should more often than not).

    With the equation in hand, first see how well it predicts each of the experimental outcomes and then feed the design and the outcomes into your unknown program and test it on that block of data to see if the unknown program can match the values generated by your regression equation and identify the same significant terms.

    0
    #202422

    Robert Butler
    Participant

    You are, of course, welcome to do what you think will help you the most, but as someone who is formally trained as a statistician and has built and analyzed literally hundreds of designs of all types in my career in both industry and medicine I would submit the following:

    1. Based on your posts you already know the basic concept of DOE – namely the fact that all experiments in a design are used to assess the impact of a given variable and that assessment is achieved via averaging of the results of the experiments with the respective high and low settings of the variable of interest. This is the basis of all designs. All that is left is knowing the strengths and weaknesses of the various design types and that information is available in any good book on design (recommended reference – Understanding Industrial Designed Experiments – Schmidt and Launsby)

    2. There have been a couple of times when I was ordered to provide a sample size estimate for a design effort and, with the exception of a single instance where it was written in the contract, we never bothered running experiments that were anywhere near the number “required” by the sample calculations. In every case the size of the observed effects after running the complete design and a couple of single experiment replicates was either very large, in which case additional experimentation would have been a waste of time, or very small, in which case there was no point in further experimentation because any statistically significant effect we might find by refining our estimate of experimental error would have no physical/clinical value.

    For all of the other designs I’ve built the main focus was on the strengths of what a design has to offer – parsimony with respect to time/money/effort. In every case the entire experimental effort consisted of the basic design augmented with a replication of one or two of the design points and in every instance we were able to provide definitive answers to the questions the DOE was built to answer.

    3. In the past, when I’ve had to check the performance of some kind of external software what I’ve done is take an experimental design, assign random response values to each of the experiments in a design, choose high and low levels for each of the factors in the design and then analyzed that design to identify the significant terms. Armed with that information I fed the same factor level/responses into the software program of interest and checked it to see if it identified the same significant variables.

    0
    #202419

    Robert Butler
    Participant

    I’m afraid I can’t offer much help with respect to the programming. What coding I need to do is done in SAS and I’ve never had to build a program to run matrix algebra calculations. To the best of my knowledge you will have to use matrix algebra to generate solutions for simultaneous equations when the count is greater than 4.

    The one thing you will have in your favor is that as long as you use factorial designs the equations for the various effects will be independent of one another so you won’t have to worry about having equations that are not independent.

    One thing is puzzling me. You say you are in psychology/marketing so what is the point of trying to “reverse engineer” an experimental design? I’m in the medical field and I have built and run designs where the response was some measured output such as hospital length of stay or patient satisfaction. In those instances there wouldn’t have been any point in estimating an effect size and then determining the result of a specific experiment because there isn’t any way to ensure our effect size estimate has anything to do with reality. What I have done, and this in itself is extremely difficult, is find patients within the population who individually meet the criteria for each of the experimental conditions and then run the analysis on their responses. The drawback to this approach is the issue of finding a sufficient number of people for each of the experimental conditions. The organization I work for has huge resources at its disposal but even with those resources we found it was impossible to find people who met the experimental condition criteria for more than 4 variables.

    Rather than try to fill a design with people meeting specific design criteria what I usually do is take the provided patient population data and run co-linearity diagnostics on that data (VIF and condition indices) to determine which variables of interest to the clinician are sufficiently independent of one another to permit their inclusion in a multivariable model building effort.

    The key point of this approach is that, within the block of data, the variables that pass the co-linearity check are sufficiently independent of one another, however, it is impossible to say what other variables, either as part of the recorded data set or as part of a larger family of unknown/lurking variables are confounded with each of the terms used in the analysis.

    Consequently, we cannot claim the variables used in the analysis are independent of any other variable out there in the universe but we can/do claim (and this is a BIG improvement on what is usually done) that within the block of data used in the analysis the variables we considered were sufficiently independent of one another to allow us to make claims about their importance relative to one another with respect to the outcome of interest.
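
    As a rough sketch of what that co-linearity check can look like in practice (Python/statsmodels here; the data frame, column names, and cut-off are hypothetical, not from any actual patient data set):

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools import add_constant

    # Invented observational data with one nearly redundant predictor.
    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "age": rng.normal(60, 10, 200),
        "bmi": rng.normal(28, 4, 200),
    })
    df["weight"] = 2.5 * df["bmi"] + rng.normal(0, 1, 200)

    # VIF for each candidate predictor; a common rule of thumb starts worrying
    # around 5-10, but the choice of cut-off is a judgment call.
    X = add_constant(df)
    vif = {col: variance_inflation_factor(X.values, i)
           for i, col in enumerate(X.columns) if col != "const"}
    print(vif)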

    0
    #202417

    Robert Butler
    Participant

    Sure, because now it is 3 equations and 3 unknowns but, as I said, the solution will depend on your specification of “1” and will change every time you change that value. There are a couple of other things you will need to consider.

    1. The tone of your post suggests you want to solve for the value for any number of experimental conditions. In that case you will need to use matrix algebra to run the analysis.

    2. Your simulation is assuming perfection. That is, it assumes you will get a single, exact value for each experimental condition, and that is not the case in the real world.

    3. You stated you were interested in binomial responses. This means your regression equation response will be an odds ratio and that means you will have to work your way through expressing your effect sizes in terms of log odds.

    0
    #202411

    Robert Butler
    Participant

    In order to solve simultaneous equations you have to have the same number of equations as unknowns and this isn’t the case for what you want to do. If we look at a two variable two level factorial then the equations for A,B, and AB are:

    A = 1/2[(a + ab) – (1 + b)]
    B = 1/2[(b + ab) – (1 + a)]
    AB = 1/2[(1 + ab) – (a + b)]

    and the unknowns are a, b, ab, and 1. You could, of course, specify a value for “1” but then the values of a, b, and ab would be unique only to that chosen value.
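
    To make the “specify a value for 1” point concrete, here is a small Python sketch that fixes a value for the “1” cell and solves the three equations above with matrix algebra; the effect sizes and the value chosen for “1” are arbitrary, so a different choice gives a different a, b, ab.

    import numpy as np

    A, B, AB = 4.0, -2.0, 1.0   # assumed effect sizes (made up)
    c1 = 10.0                   # chosen value for the "1" cell

    # Coefficients of a, b, ab in the three effect equations after moving the
    # specified "1" value to the right-hand side.
    M = 0.5 * np.array([[ 1, -1, 1],
                        [-1,  1, 1],
                        [-1, -1, 1]])
    rhs = np.array([A + 0.5 * c1, B + 0.5 * c1, AB - 0.5 * c1])

    a, b, ab = np.linalg.solve(M, rhs)
    print(a, b, ab)   # here: 13.0, 7.0, 12.0 - change c1 and these shift with it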

    0
    #202402

    Robert Butler
    Participant

    1. You said, “in my case each animal was tested in 6 different conditions of vision and those conditions were selected randomly before the experiment began.” In other words you are using the same animal and testing six different vision conditions on that same animal. This is the same thing as I described when I mentioned running an entire experimental design on a single animal and the issue remains exactly the same – the smallest unit of independence is the animal which means the measurements within the animal are repeated – not independent.

    Perhaps checking a reference might help. Your situation, with vision conditions substituted for pills and your measured outcome substituted for fecal fat, is the one described in the repeated measures problem on pages 254-259 in Regression Methods in Biostatistics – Vittinghoff, Glidden, Shiboski, and McCulloch.

    There really isn’t much I can add to your comments about the QQplot – it’s the wrong plot and you need to plot your residuals on a normal probability plot. I did some checking and can’t find the normal probability plot called by any other name. The normal probability plot is used to evaluate the normality of the distribution of a variable, that is, whether and to what extent the distribution of the variable follows the normal distribution. The selected variable will be plotted in a scatterplot against the values expected from the normal distribution.

    As for the failure of the Shapiro-Wilk test – that’s not surprising nor would it be surprising if the data failed one of the other tests such as the Anderson-Darling. These tests are extremely sensitive to any deviation from perfect normality. Indeed, it is quite possible to take data generated using a random number generator with an underlying normal distribution and have that data fail one or more of these tests.

    This extreme sensitivity is the reason that the focus of residual analysis is the visual assessment of the graphical representations of the residuals. If you fail to run residual analysis the way it should be run and instead rely on summary statistics such as Shapiro-Wilk’s you will waste a great deal of time worrying about nothing of any consequence.

    Again, as a possible reference to help you better visualize the situation you should look at pages 33-44 of Fitting Equations to Data – Daniel and Wood 2nd edition. This appendix (3A) provides normal probability plots of random normal deviates of various sample sizes. It clearly illustrates just how non-normal perfectly normal data can look.

    I would recommend that you check out these books. Hopefully, you are in the position of being able to borrow both of the referenced books through inter-library loan.
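
    If it helps, the same point can be seen with a quick simulation in Python (sample sizes and seed are arbitrary): every sample below is drawn from a true normal distribution, yet Shapiro-Wilk still flags roughly 5% of them at alpha = 0.05, which is why the graphical assessment carries the weight.

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    n_sims, n_obs = 1000, 50

    # Count how often Shapiro-Wilk rejects samples that really are normal.
    rejections = sum(stats.shapiro(rng.normal(size=n_obs)).pvalue < 0.05
                     for _ in range(n_sims))
    print(rejections / n_sims)   # close to 0.05 - truly normal data can and does "fail"

    # The visual check: a normal probability plot of one truly normal sample.
    stats.probplot(rng.normal(size=n_obs), dist="norm", plot=plt)
    plt.show()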

    0
    #202400

    Robert Butler
    Participant

    For whatever reason I couldn’t get the link to your post on the other site to work last night; however, it seems to be working this morning, so I took a look. The histogram of the residuals gives the impression of approximate normality and if I had nothing else to go on I’d continue with the analysis. However, since you do have other means at your disposal I would want to see the results of the graphs I mentioned in my first post.

    In addition to this, you will still need to consider the issue of repeated measures and how to correctly handle them.

    0
    #202398

    Robert Butler
    Participant

    Sorry typo – the sentence “The experiments guarantee that the factors within the experiments are independent of one another but the measures across the entire design within a given animal have to be treated as repeated measures.”

    Should read “The design guarantees that the factors within the experimental design are independent of one another but the measures across the entire design within a given animal have to be treated as repeated measures.”

    0
    #202397

    Robert Butler
    Participant

    If you are using one individual more than once then the smallest unit of independence is that individual. Therefore the measures on that individual are repeated. Repeated measures do not have to be across time or in any time order; they only have to be measures within the same unit. In the event that you are not measuring across time, a good guess with respect to the structure of the repeated measures would be that of compound symmetry.

    To give you an example of non-time ordered repeated measures, let’s say you have an experimental design and you will run each of the experiments of that design on the same animal and you will repeat this process across a series of animals. The experiments guarantee that the factors within the experiments are independent of one another but the measures across the entire design within a given animal have to be treated as repeated measures. In those instances when I’ve done this we make sure we randomize the sequence of experiments within each animal but when we analyze the data we have to make sure the machine knows the measures are repeated and not independent.

    I don’t know R so I can’t comment concerning the ability of R to discriminate between an ordered variable and one without order. I would assume there has to be some kind of command in R that lets it know the variable is nominal and perhaps that command constructs dummy variables in the background and runs the analysis on those variables without anything more needed on your end. If you are not sure then I’d recommend checking to make sure R is handling your nominal variable correctly.

    I understand you are using the QQplot to check residuals and that is precisely why I’m questioning the choice. My understanding of QQplots is that they are used to check to see if data from two different sources can be treated as having a common distribution. This isn’t the point of a residual analysis.

    As for testing normality of the dependent variable – there is no need for that just as there is no need to test the normality of the independent variables – contrary to what you can find out on the web and in blog sites from here to eternity, the issue of normality only applies to the residuals. It should also be noted that the issue of residual normality is not one of perfection – it is one of approximation which is the reason that the key aspects of residual analysis are graphical – specifically the graphs I mentioned in my initial post.

    0
    #202395

    Robert Butler
    Participant

    Your post suggests you have run a statistical test and then, for whatever reason, a QQplot. What you need to do is run a residual analysis. The heart and soul of a residual analysis is a plot of the residuals against the predicted and a plot of the residuals on a normal probability plot. You should also look at a histogram of the residuals. If the histogram gives the impression of approximate normality, if the plot of the residuals on the normal probability paper passes the “fat pencil test” and if the plot of the residuals against the predicted values resembles a shotgun blast and does not have any obvious linear, curvilinear, or > or < shape then it is reasonable to assume that your residuals are “normal enough” and that the results of term significance are acceptable.

    If the above approach is unfamiliar to you I would recommend getting a good book on regression through inter-library loan and reading the chapter on residual analysis.
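
    For anyone who prefers to see the plots spelled out, here is a bare-bones Python version of those three graphical checks; the model and data are made up purely to generate residuals to look at.

    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    from scipy import stats

    # Made-up data and a simple OLS fit, just to have residuals to examine.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 80)
    y = 3 + 2 * x + rng.normal(0, 1, 80)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    resid, pred = fit.resid, fit.fittedvalues

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].scatter(pred, resid)                       # want a "shotgun blast"
    axes[0].set(xlabel="predicted", ylabel="residual")
    stats.probplot(resid, dist="norm", plot=axes[1])   # the "fat pencil" check
    axes[2].hist(resid, bins=15)                       # approximate normality
    plt.show()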

    There are a couple of other aspects of your post that need some discussion. You state that you have “Random effect variable: as the same animal performing some behavioural task was measured more than once.”

    That is not a random effect variable; that is a repeated measure, and if, as your post would suggest, you are running an analysis on repeated measures then basic GLM is not the correct approach because the program will take the variability associated with the repeated measures and use it as an estimate of the ordinary variation in the data. The end result will be variable significance where none exists. With repeated measures you will need a program that allows you to identify the sources of the repeated measures so that the machine will not incorrectly use the within-animal variability.

    You also stated you have an “Explicatory variable: a factor with 6 levels”. Are these levels ordinal or nominal? If they are ordinal – no problem, but if they are nominal then you will need to build dummy variables and run the analysis on the dummy variables and not on the 6 levels themselves.
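
    One way to handle both points in a single model, offered only as a sketch and not as the only correct choice, is a mixed model with the animal as the grouping term and the 6-level factor entered as a dummy-coded categorical. In Python/statsmodels it might look like this (the data frame and column names are invented):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented data: 10 animals, each measured under 6 conditions.
    rng = np.random.default_rng(5)
    animals = np.repeat(np.arange(10), 6)
    conditions = np.tile([f"cond{i}" for i in range(1, 7)], 10)
    animal_shift = rng.normal(0, 2, 10)[animals]   # animal-to-animal variation
    outcome = 20 + animal_shift + rng.normal(0, 1, 60)
    df = pd.DataFrame({"animal": animals, "condition": conditions, "outcome": outcome})

    # C(condition) builds the dummy variables; groups= tells the model which
    # measurements share an animal and so are repeated rather than independent.
    model = smf.mixedlm("outcome ~ C(condition)", data=df, groups=df["animal"])
    print(model.fit().summary())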

    0
    #202392

    Robert Butler
    Participant

    @Darth – yes, no, maybe. It would be a function of things like sample variation, distance between the sample means, distance between the sample mean and a reference, etc.

    0
    #202376

    Robert Butler
    Participant

    Use the tests designed for exactly that purpose. If you are interested in comparing means then use the t-test and if you are interested in comparing variability then use the F test.

    I realize there is lots of stuff out there on the net which insists you have to have gobs of data before you can do anything with either of these tests and there is also lots of stuff that claims you have to have 8 or 10 or 15 or 30 or whatever other number the writer chooses to pick before you can run either of these tests (and there is also the hoary old chestnut concerning a need for normality) but a check of the appendices of any good book on statistics will show you that the minimum number of samples needed to run a t-test is 2 and that for comparison of variance the minimum number of tests per group is also 2. Granted, the t value and the F value needed for significance in both cases are large, but that is the way the cookie crumbles.
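
    A tiny Python illustration of the point, with made-up numbers and only two observations per group (the critical values at these degrees of freedom are, as noted, large):

    import numpy as np
    from scipy import stats

    group1 = np.array([10.2, 10.8])
    group2 = np.array([12.1, 13.0])

    # Two-sample t-test on the means (2 observations per group, 2 df).
    t_stat, t_p = stats.ttest_ind(group1, group2)

    # F test on the variances: ratio of sample variances referred to the F
    # distribution with (n1 - 1, n2 - 1) = (1, 1) df; two-sided p-value.
    f_stat = np.var(group1, ddof=1) / np.var(group2, ddof=1)
    f_p = 2 * min(stats.f.cdf(f_stat, 1, 1), stats.f.sf(f_stat, 1, 1))

    print(t_stat, t_p)
    print(f_stat, f_p)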

    Most of the work I have done has been in situations where the cost per sample was very high ($5000 and up). The end result has been situation after situation where the max number of allowed samples was between 1 and 5 with most instances being between 1 and 3. In every case I was able to provide information that helped my engineers get where they needed to go with respect to new product development, process improvement, etc.

    0
Viewing 100 posts - 101 through 200 (of 2,432 total)