## Forum Replies Created

- November 20, 2002 at 4:43 pm #80878

Robert Butler, Participant (@rbutler)

Before running this experiment, make sure that your watch is stopped at a time about an hour or two before the actual running of the test.

Have an overhead ready in advance and then announce to everyone in the room that, on your mark, they should write down the time (to the nearest second) that is displayed on their respective wristwatches. Go around the room and ask each person in turn for their recorded time. Save your “stopped watch” time until last. The variation about the actual current time will illustrate common cause variation and your stopped watch will provide an example of special cause variation. After the exercise is over, you will help drive home the concept of special cause if you admit that you stopped your watch on purpose.

- November 20, 2002 at 3:12 pm #80874

Robert Butler, Participant (@rbutler)

From MIL-HDBK-5H pp. 9-30 we have the following:

” The quantities (skewness corrections) added to the tolerance-limit factor, k99 and k90, represent adjustments which were determined empirically to protect against anticonservatism even in the case of moderate negative skewness. Skewness is not easy to detect, and negative skewness in the underlying distribution can cause severely anticonservative estimates when using the normal model, even when the Anderson-Darling goodness-of-fit test is used to filter samples. These adjustments help to ensure that the lower tolerance bounds computed maintain a confidence level near 95 percent for underlying skewness as low as -1.0. For higher skewness, the procedure will produce significantly greater coverage.”

Since an awful lot of people use the Mil Handbook I guess that the answer to your question is that a lot of people use skewness, or at least have to take it into account.

I can think of a few instances where I had to worry about kurtosis but these would only constitute personal use as opposed to general use of the property.

- November 20, 2002 at 1:35 pm #80862

Robert Butler, Participant (@rbutler)

A number of everyday examples come to mind.

1. If there was only a 99% chance of safely landing an airplane every time it landed – every airport in the country would be littered with wreckage.

2. If there was only a 99% chance of safely crossing the street every time you stepped off of the sidewalk – the most common cause of death would be death by crossing the street.

3. If the postal service only had a 99% success rate in mail delivery there would be mountains of mis-delivered mail piled on street corners.

4. If the local drug store only had a 99% success rate in correctly filling prescriptions there probably wouldn’t be any such thing as a drug store.

5. If there was only a 99% chance of not getting a fatal shock from a light switch each time you turned on the lights – candles would be a very popular means of illumination and anyone who actually used light switches would be viewed as a reckless daredevil.
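To put rough numbers on examples like these, here is a minimal sketch of how quickly a 1% failure rate adds up. It assumes that every repetition is independent and has the same 99% chance of success, which is a simplification, and the function name is my own.

```python
def chance_of_at_least_one_failure(success_rate, repetitions):
    """Probability of one or more failures in a run of independent tries."""
    return 1.0 - success_rate ** repetitions


# How the risk grows as the same 99%-reliable task is repeated.
for n in (1, 10, 100, 1000):
    p = chance_of_at_least_one_failure(0.99, n)
    print(f"{n:>5} repetitions: {p:.1%} chance of at least one failure")
```

Under those assumptions a failure becomes more likely than not after roughly 70 repetitions.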

The point here is, if you are looking for examples of anything where 99% isn’t good enough, just look for things that are done again and again on a daily basis and I’m sure you will be able to work out any number of examples like those listed above.

- November 19, 2002 at 2:03 pm #80827

Robert Butler, Participant (@rbutler)

Since we seem to be pointing Mr. Osuna in the direction of a Box-Meyer approach and since it was mentioned that Minitab is set up to do this analysis, I sat down with a copy of Minitab (version 13.31) and tried to find the Box-Meyer method. I have had no success locating it. A check of the manuals that I have also fails to mention this analytical tool.

Stan, since I understood your first post to mean that Box-Meyer is on Minitab, could you give all of us a tutorial on how to find it? (If I am in error here please accept my apologies in advance). In addition to helping Mr. Osuna, I would like to know where this is in order to have an automated method to replace my built-it-myself version that I have over on another package.

I’d also like to offer the following thoughts on the sidebar discussion in this thread concerning regression:

1. The issues surrounding normality do not apply to the independent or dependent variables in a regression. Normality is an issue with the residuals, and then it is only an issue when you are attempting to use the residual mean squared error term to identify significant effects. The Second Edition of Applied Regression Analysis by Draper and Smith, p. 22, section 1.4, has the details.

2. The suggestion to use regression methods to examine historical data is reasonable but there are a number of caveats concerning historical data that one should never forget:

a. Variables that are known to be important to your process will probably not show up in a regression analysis. This is because in historical data, important variables are so carefully controlled that they are never given a chance to impact the process.

b. Variables that were not part of your control process will have been allowed to change as they see fit. Consequently, you will most likely have massive confounding of uncontrolled variables. Regression packages will not know this and they will give you regression equations loaded with variables that are not independent of one another. In order to check the nature of the confounding of the “independent” variables in historical data you will have to run an eigenvector analysis on your data set and check for such things as VIFs, eigenvector values, and condition indices.

c. The confounding mentioned in b. also means that some of the uncontrolled variables will mask other variables that were not only uncontrolled but completely unknown.
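As a small illustration of the confounding check mentioned in b., here is a sketch of the simplest possible case: two “independent” variables, where the variance inflation factor reduces to 1 / (1 - r²), with r the correlation between them. The variable names and data are hypothetical. With more than two variables each VIF comes from regressing one variable on all of the others, which is a job for a statistics package.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either variable when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical historical records of two uncontrolled process variables.
oven_temp = [150, 160, 170, 180]
line_speed = [10, 30, 20, 40]

print(f"r   = {pearson_r(oven_temp, line_speed):.2f}")           # r   = 0.80
print(f"VIF = {vif_two_predictors(oven_temp, line_speed):.2f}")  # VIF = 2.78
```

A VIF of 1 means the two variables are uncorrelated in the data; the larger it gets, the less the regression can separate their effects.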

The above is not meant to suggest that you ignore historical data. Rather it is offered to emphasize problems with historical data and to help you avoid the all too prevalent problem with such data and its analysis, which can best be summed up as “Garbage in – Gospel out”.

- November 14, 2002 at 5:38 pm #80687

Robert Butler, Participant (@rbutler)

I can only second SoCalBB’s comments. Deming’s comment on this kind of a situation bears repeating: Be careful, the woods are full of HACKS! ...and he always put a lot of emphasis on that last word.

I’d like to ask you if they actually said, “the 4 week session contains much useless statistical training on theory, the computations, statistical history, etc when everyone uses Minitab anyway.” If they did I’ll make it a point to add this one to my hall of shame collection.

- November 13, 2002 at 7:11 pm #80608

Robert Butler, Participant (@rbutler)

The requirement for “reasonably normal” data is driven by the construct of the tests. Both the t-test and ANOVA assume that the data that you are testing consists of independent samples from normal populations. This is the reason that there is such a focus on understanding the underlying distribution of the data before continuing with an investigation. If you wish to use either of the above tests and your data proves to be significantly non-normal you will either have to investigate ways to transform the data so that it is “reasonably normal” and/or you will have to study tests that are equivalent to the t-test and ANOVA for non-normal data.

- November 12, 2002 at 3:04 pm #80541

Robert Butler, Participant (@rbutler)

In order to use DOE to investigate the effect of variable change on process variability (standard deviation) you will need to have an estimate of variation at each design point. This means that you will have to repeat the entire design at least once (in order to have a two point estimate of variance at each design point). The amount of experimentation involved in such an effort can easily get out of hand. For example, for 4 factors, a full factorial 2 level design would mean a minimum of 32 experimental runs.

Two point variance estimates mean that you are not going to be able to chase subtle differences in standard deviation. Consequently, you should view first efforts in this investigation as a screen for main effects. This would recommend the use of fractionated designs which, in turn, would allow more replication (and hence better estimates of variance at each point). With 4 variables, a half fraction would be 8 points. You could do 3 complete runs of the design and thus have a 3 point estimate of variation at each design point for a total of 24 runs (as opposed to 32 runs for a two level full factorial giving only a two point estimate of variation at each design point).
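The run counts above can be checked with a short sketch. It builds the 16-point full factorial for 4 factors and one possible 8-point half fraction. The choice of defining relation (I = ABCD, i.e. keep the runs whose coded levels multiply to +1) is my assumption, since the post does not say which half fraction to use, and the replicate measurements are made up.

```python
from itertools import product
from statistics import stdev


def full_factorial(n_factors):
    """All combinations of coded levels -1/+1 for n_factors factors."""
    return list(product((-1, 1), repeat=n_factors))


def half_fraction(n_factors):
    """Half of the full factorial: runs whose levels multiply to +1."""
    kept = []
    for run in full_factorial(n_factors):
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            kept.append(run)
    return kept


full = full_factorial(4)
half = half_fraction(4)

print("full factorial, run twice:      ", len(full) * 2, "runs")
print("half fraction, run three times: ", len(half) * 3, "runs")

# Hypothetical responses from three runs of a single design point;
# the standard deviation of each such set becomes the response to model.
replicates_at_one_point = [10.2, 9.8, 10.6]
print("std dev at this design point:", round(stdev(replicates_at_one_point), 3))
```

Each design point ends up with its own standard deviation, and it is those values, not the raw measurements, that are analyzed for variable effects on variation.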

Since process variation is rarely uniform across a design space (it is because of this that you try to run as many different replicates of as many different points as you can when using a DOE to examine variable effects on means) the significant shifts in variation that you will detect will highlight variables impacting the variation above and beyond the background non-uniformity of the design space.

I’ve used the above approach several times and have found it quite useful for identifying variables impacting the standard deviation. Of course, a DOE run in this fashion will permit an investigation of the variables impacting the mean as well.

- November 8, 2002 at 4:53 pm #80435

Robert Butler, Participant (@rbutler)

To compute any defect level you wish, do the following:

1. Find a statistics book with the table for the cumulative standardized normal distribution function. This table will be in the appendix.

2. Subtract 1.5 from your Sigma Value – the number that you have is called the Z score.

3. Using the Z score, look up the associated probability in the table listed in #1.

4. Multiply this probability by 1,000,000.

5. Subtract the result obtained in 4 from 1,000,000. This value is your DPMO.

Examples:

Sigma Value = 5

5 – 1.5 = 3.5 = Z score

From the normal table a Z score of 3.5 has a probability of .9997674.

.9997674 X 1,000,000 = 999767.4

1,000,000 – 999767.4 = 232.6 DPMO

Sigma Value = -1.2

-1.2 – 1.5 = -2.7 = Z score

From the normal table a Z score of -2.7 has a probability of .003467.

.003467 X 1,000,000 = 3467

1,000,000 – 3467 = 996,533 Therefore your process has 996,533 DPMO
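If you would rather not use the table, the same five steps can be sketched in a few lines of Python, with the standard library's normal distribution standing in for the lookup. The function name and the `shift` parameter are my own labels, not part of any Six Sigma package.

```python
from statistics import NormalDist


def dpmo_from_sigma_value(sigma_value, shift=1.5):
    """Defects per million opportunities for a given Sigma Value."""
    z_score = sigma_value - shift                  # step 2
    probability = NormalDist().cdf(z_score)        # step 3: cumulative normal
    return 1_000_000 - probability * 1_000_000     # steps 4 and 5


for sigma_value in (5, -1.2):
    print(sigma_value, round(dpmo_from_sigma_value(sigma_value), 1))
```

The two calls reproduce the worked examples: about 232.6 DPMO for a Sigma Value of 5 and about 996,533 DPMO for a Sigma Value of -1.2.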

- November 7, 2002 at 7:45 pm #80408

Robert Butler, Participant (@rbutler)

Adam,

If you are referring to Davis Bothe’s article in Quality Engineering, “Statistical Reason for the 1.5 Sigma Shift”, I think that he has provided the answer to your question. At the beginning of the article he states that “Six Sigma advocates…offer only personal experiences and three dated empirical studies (which he cites) as justification.” He also points out that these three studies are 25 and 50 years old. In short, if you are looking for documented and published cases in support of 1.5, his article strongly suggests that you will look in vain.

The real value of the Bothe article is that he defines the dynamics of a process that could give rise to such a change. His examination of the dynamics of control chart sensitivity and use is reasonable and does make a case for process drift. He identifies a range of drift from 1.3 to 1.7 and he also identifies the assumptions that have been made for drift to occur. In his concluding paragraph he identifies the limitations of the initial assumptions.

His concluding comments go to the heart of a number of the posts to this thread. Specifically, if the dynamics of your process do not match the dynamics of the process he defined then your drift may or may not be greater or less than the 1.5 drift that is advocated by the Six Sigma community.

- November 7, 2002 at 1:19 pm #80382

Robert Butler, Participant (@rbutler)

From the RICOH website we have the following:

“In 1951, The Japanese Union of Scientists (JUSE) developed the Deming Award for quality control based on his work. The competition for the Medal is severe. Judging is strict. Companies spend years in preparation before even becoming eligible for consideration.”

The criteria for examination include corporate policy, quality systems, education and training, results, and future plans.

- October 29, 2002 at 1:48 pm #80088

Robert Butler, Participant (@rbutler)

Mike,

I’m left with the impression that we are talking past each other. Billybob indicated that his experience with M&M’s in training exercises dealt with defects-I’m assuming defects/bag. All I said was that the only experiments involving M&M’s of which I was aware focused on helping people understand the concepts of distribution, variation, standard deviation, etc.

The reference to 3rd and 4th grade students was not meant to denigrate, merely to emphasize the fact that by using something such as the M&M experiment you could make these concepts understandable to almost anyone.

What I do find interesting is that apparently you and Billybob are aware of yet another experiment involving M&M’s that focuses on defects and introduces the concept of attribute gage R&R. If this has been written up could you provide a citation? If it hasn’t could you tell me how it is done?

By way of reciprocity, if you are interested in fine tuning your teaching techniques with respect to using M&M’s, check the past issues of the American Statistical Association’s STN (Statistics Teachers Network) newsletter. There is a four part article “Some Students and a t-test or Two” that may give you some ideas for using M&M’s for the purposes mentioned in the first paragraph.

- October 28, 2002 at 7:22 pm #80059

Robert Butler, Participant (@rbutler)

If your post is in earnest and you aren’t just having some Monday fun, I don’t think the object of using M&M’s in statistics training is to emphasize defect detection. All of the exercises that I have taught use M&M’s to illustrate the concepts of distribution and variation. Properly done, the use of M&M’s can really help people understand the issues that arise when dealing with data.

As I’ve mentioned on other threads, the M&M experiment is so clear and concise that you can use it to teach 3rd and 4th grade students advanced statistical concepts. So the next time you look at a Plain M&M packet don’t think “defect”, think “distribution”.

- October 23, 2002 at 6:15 pm #79918

Robert Butler, Participant (@rbutler)

The purpose of any replication in a design is to develop an estimate of the error. The purpose of a centerpoint in an otherwise two level design is to permit the detection of curvilinear behavior. Replication of the centerpoint allows the investigator to address both of these questions at once.

If you are severely constrained with respect to resources and time, a simple replication of the center point ( i.e. the center point is run twice) will permit an investigation of curvilinear behavior (via residual analysis) and an estimate of error. If you wish to improve your estimate of error you may increase the number of replicates run on the center, however, you need to understand the assumption that you are making if you take this course of action.

Concentrating all of your replicates at the center means that you are assuming that the variance across the experimental space is constant. If it isn’t constant, the estimate of error that you will get from running only multiple replicates on the center will probably be an underestimate of the variance of the space and you will wind up declaring effects significant when they really are not.

Thus, the choice of the number of replicates of the centerpoint is driven by the aims of your experimental effort and there is really no such thing as the right number of centerpoint replicates. If you are looking to conserve resources and time then one or more replicates on only the centerpoint makes sense. If you have a little more latitude with respect to your investigation, centerpoint replication complemented with replication of other points in the design space will provide you with the same check of curvilinear behavior and a more realistic estimate of experimental error.

- October 21, 2002 at 5:00 pm #79814

Robert Butler, Participant (@rbutler)

Plackett-Burman designs, like highly fractionated 2 level factorial designs, are for screening a large number of factors in order to identify the “big hitters” – the main effect variables that have the biggest effect on your process.

Fractionated 2 level designs are always 2 to some power. Plackett-Burman designs are based on multiples of 4. Thus, 2 level designs increase as 4, 8, 16, 32, 64, 128, while Plackett-Burman designs go as 4, 8, 12, 16, 20, etc. This means that if you are interested in checking out, say, 11 factors the minimum traditional 2 level design that you could build would have 16 experiments whereas the Plackett-Burman would have only 12.
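The run-size comparison is easy to sketch. The block below computes the smallest design of each type that leaves at least one more run than factors, and builds the 12-run Plackett-Burman design by cyclically shifting the commonly tabulated generator row `+ + - + + + - - - + -` and appending a row of all minus signs. The function names are my own and the sketch covers only the 12-run case.

```python
def min_runs_two_level(n_factors):
    """Smallest power of 2 giving at least n_factors + 1 runs."""
    runs = 4
    while runs < n_factors + 1:
        runs *= 2
    return runs


def min_runs_plackett_burman(n_factors):
    """Smallest multiple of 4 giving at least n_factors + 1 runs."""
    runs = 4
    while runs < n_factors + 1:
        runs += 4
    return runs


# Generator row for the 12-run design, coded +1 / -1.
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """12 runs x 11 factors: 11 cyclic shifts of the generator, then all -1."""
    n = len(GENERATOR_12)
    rows = [[GENERATOR_12[(col - shift) % n] for col in range(n)]
            for shift in range(n)]
    rows.append([-1] * n)
    return rows


print(min_runs_two_level(11), min_runs_plackett_burman(11))  # 16 12
for run in plackett_burman_12():
    print(" ".join("+" if level == 1 else "-" for level in run))
```

Every column of the resulting design has six plus and six minus settings and is orthogonal to every other column, which is what allows each main effect to be estimated independently of the others.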

If you have adequate statistical software to permit a proper analysis of the design the analysis and interpretation of Plackett-Burman results should be no more difficult than when analyzing any other kind of screening design.

Because it is a main effects design you will have no information concerning interactions or curvilinear behavior. Consequently, the resulting models should be used as a guide for further work and not as a final model for purposes of process control.

If all of the variables of interest are quantitative and the design space is such that you can have three distinct levels for each variable then one way to maximize your results and minimize your efforts would be to set up the P-B design and add a center point and run the replicate on the center point alone. The residual analysis of such a combination will give you a clear indication of the presence of curvilinear effects in your process. You won’t know which variable (or variables) is responsible but you will know that the effect is present and must be identified before you can claim to have an adequate model of your process.

Before you decide to go ahead with a P-B design I can only echo what others have posted to this thread: don’t just pick a design and run it. First consider what you want to do and then let your wants guide you in your design choice. If your wants are such that none of the “look-up” designs meet your needs, you will need to seek the guidance of your local statistician.

- October 16, 2002 at 4:39 pm #79669

Robert Butler, Participant (@rbutler)

In his post Mike Carnell re-emphasizes a key point of all statistical analysis, namely that one should always choose the statistical methods of inquiry before running the analysis, not after. If you don’t do this then, as he points out, it is a very easy matter to cast around and find some statistical tool that will support whatever position it is that you wish to defend.

The well known statistician, Stuart Hunter (of Box, Hunter, and Hunter fame), refers to this kind of abuse of statistics as P.A.R.C. analysis – Planning After Research Completed...and he points out that what you have when you analyze your data in this manner is exactly what you get when you spell P.A.R.C. backwards.

- October 14, 2002 at 7:13 pm #79641

Robert Butler, Participant (@rbutler)

A P value is just the value of alpha at which a decision regarding the null hypothesis would just be on the borderline between acceptance and rejection. It is a common practice to set alpha = .05 or .01; however, there is nothing sacred about either of these numbers. In many fields an alpha level of .05 is unreasonable.
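The “borderline” idea can be made concrete with a small sketch. It assumes, purely for illustration, a two-sided test whose statistic follows the standard normal distribution (a z-test); the function names and the example z value are my own.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P value for a two-sided z-test with observed statistic z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def reject_null(p_value, alpha):
    """Reject exactly when the P value does not exceed the chosen alpha."""
    return p_value <= alpha


p = two_sided_p_value(1.44)          # roughly .15
print(round(p, 3))
print(reject_null(p, alpha=0.15))    # True: significant at the .15 level
print(reject_null(p, alpha=0.05))    # False: not significant at the .05 level
```

The same result is “significant” or “not significant” depending only on the alpha you committed to in advance, which is why that choice has to be made on the merits of the problem and not after the fact.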

A P value of .05 means that you can reject the null hypothesis at a level of significance of .05 and a P value of .15 means that you can reject the null hypothesis at a level of significance of .15. In other words, if there were really no difference, a result at least as extreme as the one observed would turn up by chance about 5% of the time at .05 and about 15% of the time at .15.

So, to answer your question, yes, there is a statistical difference at a P of .15. The question that you need to address is: are you willing to live with roughly a one-in-seven (actually one-in-6.6667) chance of declaring a difference that is not really there, as opposed to a one-in-20 chance if you go with a P = .05?

- October 11, 2002 at 12:46 pm #79593

Robert Butler, Participant (@rbutler)

You’ve asked a very good question. The empirical evidence rests on three old studies –

Bender-Statistical Tolerancing as It Relates to Quality Control and the Designer-Automotive Division Newsletter of ASQC 1975,

Evans-Statistical Tolerancing:The State of the Art, Part III, Shifts and Drifts. JQT 1975, 7 (2), 72-76,

Gilson-New Approach to Engineering Tolerances: Machinery Publishing Co., London, 1951

In the past issue of Quality Engineering 14(3), pp.479-487 Davis Bothe pulled these together with some additional research that he had done in the article ‘Statistical Reason for the 1.5 Shift’.

At the very end of his article Bothe points out that the 1.5 shift is reasonable but that it is based on the assumption of a stable process variance. If the variance is not stable then the 1.5 shift may or may not occur and may or may not be larger or smaller. He also points out that the above assume normality and that the situation may be different (or maybe not) for non-normal responses. I think the Bothe article is the best article on the subject that I have read. His point concerning variance stability is one that has bothered me in the past. In particular, given that you have a process that is in control and that you are making adjustments to keep it in control, the random walk nature of such a process coupled with both a drift in the mean and changes in variance could just as likely result in a drift in the process that resulted in improvement over time.

Thus, this rather lengthy reply to your question can be summarized by recommending that you read the Bothe article not only to understand the justification for the 1.5 shift but also to know the assumptions that have been made concerning its validity.

- October 9, 2002 at 1:53 pm #79556

Robert Butler, Participant (@rbutler)

Abasu has brought up a point that has been bothering me. In her first posting Georgette stated that “the best metric is standard deviation”. After some postings from others, myself included, she responded with a post which included the following: “as we take 30 readings accross the part. I could average the reading at spot #1”. These two comments lead me to believe that what she might be concerned about is part-to-part surface variation.

If this is the case, then examining the parts by grouping measurements made at “spot #1” across all parts or subgroup samples across parts would be a mistake since such a grouping would address variation at spot #1 (and spot #2 etc.) as opposed to the variation of the surface as a whole. Similarly, you wouldn’t want to take the average of 30 readings within a part since this would change the focus of the investigation from that of surface variation to variation of mean surface measurements.

- October 7, 2002 at 12:24 pm #79505

Robert Butler, Participant (@rbutler)

You are correct with respect to your computations. A Z score of 4.5, as you noted, does correspond to 3.4ppm. The Six Sigma claim of 3.4ppm at a Z score of 6 is due to the simple fact of adding 1.5 to the Z score that you would get from a cumulative normal table.

The justification for the addition of this 1.5 factor rests on three studies that were done 25 and 50 years ago. Probably the best (and most recent) discussion of the reasons for this addition can be found in Quality Engineering 14(3) pp.479-487 in an article by Davis Bothe, “Statistical Reason for the 1.5 sigma Shift”. It is an excellent and very readable article. It is the sort of paper that should be committed to memory by anyone claiming any expertise in the world of Six Sigma quality.

- October 4, 2002 at 12:16 pm #79455

Robert Butler, Participant (@rbutler)

I’d appreciate it if you would give a technical definition of “granular data”. I am unfamiliar with this term. I did do a search on the term and it appears to be from data mining. Unfortunately, while there were numerous cites and sites employing the term, no definitions were forthcoming. There was enough discussion about various issues on some of the web sites to lead me to believe that perhaps granular data is nothing more than individual data points but I would want to be sure of this before offering anything.

The lack of response to your question suggests that I’m not alone in not understanding the reference. With a proper definition perhaps I, or someone else, may be able to offer some advice.

- October 3, 2002 at 2:19 pm #79398

Robert Butler, Participant (@rbutler)

As you describe your problem, the question is not one of dependence or independence. Rather, it is a comparison of some average response (measurement) made on two populations which may differ from one another because of the addition of a solution to one and not the other.

There are two possibilities here. If you measured each sample prior to addition of the solution, measured the same samples again after the solution addition, and kept track of the measurements on a sample-by-sample basis, then you can use the t-test to make a paired comparison. If, on the other hand, you did not keep track of the measurements on a sample-by-sample basis but just simply recorded the before and after results, then you will just use a two sample t-test to run your comparison.

Since the t-test assumes that your data is from a normal population and since it will usually assume equality of sample population variance (unless you have a program that allows/demands a decision concerning sample population variances) you will have to check to make sure that the sample variances are equivalent and you will also have to check the assumption of normality. If the sample variances are not equivalent you will have to run a t-test with unequal variance. If the data is not from a normal distribution you will have to use another test such as the Mann-Whitney.
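To show how the choices above differ in practice, here is a minimal sketch of the three t statistics using only the Python standard library. The data are hypothetical and the function names are my own. The sketch stops at the t statistic; turning it into a P value needs a t table or a statistics package, and the normality and equal-variance checks still have to be done separately.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired comparison: one-sample t on the per-sample differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pooled_t(group1, group2):
    """Two sample t assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) +
                  (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group2) - mean(group1)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(group1, group2):
    """Two sample t that does not assume equal variances."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group2) - mean(group1)) / se


# Hypothetical measurements on four samples before and after the addition.
before = [10.0, 12.0, 9.0, 11.0]
after = [11.0, 14.0, 10.0, 13.0]

print("paired t:", round(paired_t(before, after), 3))
print("pooled t:", round(pooled_t(before, after), 3))
print("Welch t: ", round(welch_t(before, after), 3))
```

With these made-up numbers the paired statistic is far larger than either two sample statistic, because pairing removes the sample-to-sample variation from the comparison. That is the practical payoff of keeping track of the measurements on a sample-by-sample basis.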

You mentioned that you are going to check this difference at a specified temperature. You need to understand that any conclusions that you draw from your analysis will only apply to the temperature that you chose. For a different temperature you may get very different results. If temperature is going to vary you may want to re-think your approach and consider revamping your investigation so that you could run a two way ANOVA in order to investigate both temperature and solution addition.

- October 2, 2002 at 8:09 pm #79375

Robert Butler, Participant (@rbutler)

JB, that’s correct. You will have two response columns, one for the pull strength and one for the yes/no responses. If you can only run the design once you can run a regression against just the -1/1 or 0/1 responses but you should use the results of the regression only as a guide to give you some sense of variables impacting burn and not as a final definitive predictive equation. If you can replicate the entire design several times you should be able to convert the yes/no responses into probabilities (e.g., if the first experiment was run 5 times and the responses were yes, yes, yes, no, no, then the estimated probability of a yes for that experiment is 3/5).

There are a number of caveats concerning either approach which are too lengthy to go into in a single post. If you want to read more on the subject I’d recommend Analysis of Binary Data by Cox and Snell, sections 1.1 to 1.4 of the first chapter.

- October 2, 2002 at 2:57 pm #79364

Robert Butler, Participant (@rbutler)

Your description of the problem suggests that there are at least two measured responses: occurrence of burn (this either being a Yes/No response or, if there is some continuous measure of degree of burn, whatever that measure may be) and pull strength of the sonic weld.

Your design would focus on those variables that you have good reason to believe will impact weld strength and burn occurrence. After the design is run you would build separate models for burn occurrence and for weld pull strength. You can then use these models to hopefully identify a region of minimum burn occurrence and maximum weld strength.

- September 29, 2002 at 4:58 pm #79303

Robert ButlerParticipant@rbutler**Include @rbutler in your post and this person will**

be notified via email.Wendy asked about the M&M experiment. There may be more than one but the version which I have used numerous times consists of the following:1. Before the talk go out and purchase an entire BOX of M&M plain. Don’t get the party size, get the regular, at-the-check-out-counter size. If you ask the store clerk or manager many times they will have an unopened box of these in the back. If you have to purchase an opened box make sure that all of the packets are from the same lot (look at the lot markings on the back side of each packet).2. Have ready an overhead slide with a grid. Label the X axis “Red M&M’s” and the Y axis “Number of Packets”.3. Make sure that everyone has a partner and then give out a packet of M&M’s to each participant.4. Tell the groups of two to open their packets, one at a time, and count the total number of M&M’s and the total number of RED M&M’s and write the count down on a piece of paper. Emphaize that after the first count, the second member of the team must verify the total and RED M&M count. 5. Start around the room and ask each person for their RED M&M count. When they give it to you, plot the results on the overhead slide.6. If you have about 20-30 participants and thus 20-30 opened bags, you will have a histogram that should look pretty normal.7. Pull out a couple of unopened bags from the same material lot and ask everyone to give you an estimate of the “most probable number of RED M&M’s in the mystery bags”. 8. People will shout out numbers-write them down so that everyone can see them. Some will give ranges (which is what you want to emphasize) and some will just pick the mode of the distribution. 9. Tear open the mystery bags and plot their results, using a different pen color, on your histogram.10. 
You can then go back to the estimated ranges and see if the “final product” was in or out of the suggested product range.This exercise will provide a springboard for introducing most of the statistical issues of Six Sigma. All of the concepts are present – mean, median, mode, standard deviation, customer spec limits, etc. From time to time I’ve had enough time to prepare for this in advance and I’ve used the experiment twice. The first time I make sure that I’ve purchased the box of M&M’s at least 6 months before I was going to give the talk. I then run this experiment at the beginning of the discussion with the original group of M&M’s. I keep the first histogram and then, when we are at the end of the talk I have ready a second box of M&M’s which I had purchased just hours before. We repeat the experiment with the second lot. Many times the six month interval has seen economic changes that have significantly impacted the total number of M&M’s as well as the total number of RED M&M’s and, in addition to talking about process mean shift, I have a ready made example of the proper use of the t-test. If a significant shift has not occurred, I use the two histograms to discuss process stability and again, I can show the value of the t-test in comparison of means. In the cases where I have used two different lots of material the experiments are run at the beginning and at the end of a series of training periods. You don’t have to have a series of talks to test both lots of M&M’s but if you are going to do both lots in the same presentation you will want to have a supply of large plastic coffee cups so that people will have someplace to put all of those M&M’s. If you take some time to plan your presentation you will be suprised at how this helps everyone understand basic statistical concepts. I’ve had great success using M&M’s to teach 3rd and 4th grade students about statistics and the t-test.

September 27, 2002 at 3:33 am #79260

Robert Butler, Participant, @rbutler

If we go back to Mr. Deveix’s original question, the issue is not the standard deviation of a process; it is the sigma value that he is getting when he puts a given defect rate in his sigma calculator. If you have the table generated by Motorola et al. with the headings %Yield, Sigma Value, and DPMO, and if you also have a statistics book that has the table for the cumulative normal, you can set these side by side and see that the Sigma Value (my Motorola table is the one that does NOT have the 1.5 correction added to the Sigma Value) is the Z statistic and the %Yield divided by 100 is the corresponding area under the normal curve. If you go back to your sigma calculator in whatever program you are using (mine is in Statistica and it does add 1.5 to the Sigma Value) and plug in values such as 999,999.999 for your DPMO, you will get a Sigma Value of -4.49 or -5.99 depending on how your particular calculator uses 1.5 when computing Sigma Values. These Sigma Values just mean that the vast majority of your product is completely outside the customer spec limits. They say nothing about the standard deviation of your process. Thus, as Mr. Deveix has discovered, the calculator for Sigma Values will indeed return a negative Sigma Value, which is what it should do when the defect rate gets high enough.

The real problem here is one of labeling. Someone made a very poor choice when it came to renaming the Z score. Whoever they were they at least chose to call the Z score the Sigma Value and not Sigma. Unfortunately, the same cannot be said of any number of articles and computer programs. I know that Statistica calls their program the Sigma Calculator and the final result Sigma. The problems that can result from confusing Sigma Value (which can have negative values) with process standard deviation (which cannot have negative values) are too numerous to mention.

September 26, 2002 at 2:58 pm #79229

Robert Butler, Participant, @rbutler

The problem in this discussion is that sigma has multiple meanings. As was noted, you can’t have a negative sigma when sigma refers to the standard deviation of a process. For the six sigma calculator the sigma value that is produced is (assuming a normal process) just the Z score. Thus a Z score (Sigma Value) of 4.5 corresponds to .9999966 or 99.99966%. Consequently, your “negative sigma” value is a Z score indicating that your process has something less than one half of its output in spec.
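The Z score reading of the Sigma Value can be checked with a few lines of Python. This is only a sketch under the post's normality assumption. The function names and the optional `shift` argument (standing in for the 1.5 offset some calculators add) are hypothetical, not any particular package's calculator.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def yield_from_sigma_value(z):
    """Fraction of output in spec implied by a Sigma Value (Z score)."""
    return STD_NORMAL.cdf(z)

def sigma_value_from_dpmo(dpmo, shift=0.0):
    """Z score for a given DPMO; `shift` is the optional offset (e.g. 1.5)."""
    in_spec = 1.0 - dpmo / 1_000_000
    return STD_NORMAL.inv_cdf(in_spec) + shift

print(round(yield_from_sigma_value(4.5), 7))    # 0.9999966
print(round(yield_from_sigma_value(-0.2), 4))   # 0.4207
print(sigma_value_from_dpmo(700_000) < 0)       # more than half out of spec gives a negative value
```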

In this context, a Sigma Value (Z score) of 0 would correspond to .5 which would translate into 50% of your product in spec and a Sigma Value of say -.2 would indicate that only 42.07% of your product met spec.

September 24, 2002 at 8:20 pm #79153

Robert Butler, Participant, @rbutler

I don’t know of any plotting routines that would do this. However, I would like to offer the following thoughts concerning such an effort.

Giving people a sense of where they are at some particular moment is laudable; however, just knowing where you are and not knowing where you have been or where you might be going is the woodsman’s definition of being lost. As you described it, this is exactly what your speedometer will do: give location without any sense of direction.

A much better approach would be a time plot (not a control chart-just a time sequence plot) of the measurements over time. The chart can have colored bands corresponding to excellent, good, indifferent, poor, and unacceptable regions. By providing a snapshot of the process for say 3 months you will have all of the visual interest of a speedometer along with the visual value of a map.

If the measurements are too frequent so that a point-by-point plot will resemble nothing more than a busy squiggle, try grouping the data on some convenient basis such as hour, day, week, etc. and represent it as a series of time sequenced box-plots.

By putting the current measurements in this context you will have a much more meaningful graphic which will help promote independent thinking about your process.

If you wish to consider meaningful ways of representing your process you might want to read the three books by Tufte – The Visual Display of Quantitative Information, Visual Explanations, and Envisioning Information. They are very readable and all three are packed with example after visual example of ways to present your data.

September 13, 2002 at 4:32 pm #78917

Robert Butler, Participant, @rbutler

While 15 variables may be a result of a poor effort in the measure phase of a project it is also quite possible that 15 is the minimum number.

If you have no knowledge of statistics and the power of designed experiments you will find it very difficult to deal with much more than 4-5 variables. Under such circumstances it can become an article of faith that there can’t be more than 4-5 critical variables in a process. When the no-more-than-five-variables barrier is removed all sorts of interesting things come out of the woodwork and people start to tell you what they REALLY think is impacting the process. Having done this sort of thing for 20+ years I have found that even after doing all of the things that are lumped under Measure in DMAIC it is not uncommon to have a final variable list in the 10-20 range.

The benefits of a design that investigates the top however many (even if some of them are questionable) far outweigh any other concerns. A screen that checks them will not only aid in an identification of the really important variables, it will also lay to rest a lot of myth, misunderstanding and disagreement about the process itself.

September 12, 2002 at 4:23 pm #78873

Robert Butler, Participant, @rbutler

Replication does not necessarily mean that you have to replicate an entire design. When you are interested in screening for variables that have a major impact on your process and/or when the experiments are too costly or time consuming to run, you can opt for running a saturated design with replication restricted to only one or two of the points in the design.

Based on my understanding of your problem it would appear that a way to address the issue of screening 15 variables would be for you to generate the usual two-level full factorial design for 4 variables (16 runs) and then assign the other 11 variables to the two-, three-, and four-factor interaction columns. If the variables are such that there is a natural center point for each one then I would recommend adding a center point to this design and then replicating just the center point. This would give a total of 18 runs. If there isn’t a natural center point (a type variable instead of a continuous variable, for example) then toss the numbers 1-16 in a hat and draw the number of the experiment that you will replicate. If you can afford to replicate more than one experimental condition then do so.
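As a rough sketch of the construction described above, the following builds the 16-run full factorial in 4 variables and generates the 11 interaction columns (6 two-factor, 4 three-factor, 1 four-factor) to which the remaining variables would be assigned. The function name is hypothetical, and center points and replicates are left out.

```python
from itertools import combinations, product
from math import prod

def saturated_16_run_design():
    """16 runs x 15 columns: 4 base factors plus their 11 interaction columns."""
    base_runs = list(product((-1, 1), repeat=4))
    # All subsets of the 4 base factors of size 2, 3 and 4: 6 + 4 + 1 = 11 columns
    subsets = [c for size in range(2, 5) for c in combinations(range(4), size)]
    design = []
    for run in base_runs:
        generated = [prod(run[i] for i in subset) for subset in subsets]
        design.append(list(run) + generated)
    return design

design = saturated_16_run_design()
columns = list(zip(*design))

# Orthogonality check: every pair of columns has a zero dot product
orthogonal = all(
    sum(a * b for a, b in zip(c1, c2)) == 0 for c1, c2 in combinations(columns, 2)
)
print(len(design), len(columns), orthogonal)
```

With all 15 columns in use there are no degrees of freedom left for error, which is why the post recommends adding replicated points.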

It is understood that this approach will not give you the degree of precision with respect to an estimate of your error as would a complete replication of the entire design but it is statistically sound. Indeed, replication of just the center point for an estimate of error is standard practice with full or fractional composite designs.

This approach will permit a check of the 15 variables of interest while keeping the cost of the experimental effort within your budget.

If you want more information on nonreplicated experiments I would recommend looking at Analysis of Messy Data-Vol 2 – Nonreplicated Experiments by Milliken and Johnson.

September 12, 2002 at 12:30 pm #78853

Robert Butler, Participant, @rbutler

In addition to following Hermanth’s suggestion for tracking individuals I would also recommend that for the two month test period you track and plot the arrival times and quantity per arriving batch. If you just track daily time you are assuming that there is no difference between a batch of one and a batch of ten and you are assuming that something arriving at 8:30 A.M. will have the same chance for initial touch time as something arriving five minutes before lunch break or fifteen minutes before quitting time. Granted that gathering this additional data may be cumbersome and/or inconvenient but if you make it clear to everyone that this data gathering effort has a reason and has a definite “drop dead” date you will probably have little difficulty gathering it. The combined data will permit a check for trending in waiting time as a function of arrival time and batch size.

If the check reveals no connection, well and good. If it does, it will help guide you in making appropriate changes to your plan.

September 9, 2002 at 8:35 pm #78763

Robert Butler, Participant, @rbutler

Based on your posts it sounds as if your primary interest is in having some form of graphical representation of your data to give you some idea of where you have been and a sense of where you might be going rather than trying to control anything in particular. Given that you are taking your data in monthly blocks, a much better way to view the results of your process would be to run monthly boxplots and plot them over time.

Your posts lead me to understand that your data may not be normal and that the issue is really waste per customer order as opposed to average waste for some meaningful groupings of customer types. In this instance, monthly boxplots make even more sense. If you want to have some “control limits” for the boxplots, take your prior data, plot it on normal probability paper, identify the .135% and 99.865% points, and use these as your ±3 standard deviation limits to visually assess how your process is doing over time.

September 9, 2002 at 1:00 pm #78728

Robert Butler, Participant, @rbutler

The question that you raised concerning problems with non-normal data has been raised by other posts to this forum. Many of the replies to questions such as yours are helpful and offer good advice. There are a few points concerning some of the advice that need some clarification.

Box-Cox transform: This transform has been recommended by many and it is indeed a very useful transform. It allows the investigator to check a huge family of power transforms of the data. This family includes 1/y, square root of y, and log y. If you have a package that will permit you to use this transform and if the results of your analysis indicate that no transform is necessary, this does not necessarily mean that your data is normal; it only means that a power transform will not be of value to you. There are, of course, other options such as the Johnson transform, Weibull estimation, etc. However, if the issue is one of process capability then it is possible to assess the capability without having to worry about the actual distribution.

If you take your data and plot it on normal probability paper and identify the .135 and 99.865 percentile values (Z = +-3) then the difference between these two values is the span for producing the middle 99.73% of the process output. This is the equivalent 6 sigma spread and the capability goal of this spread is to have this equal to .75 x Tolerance. If you have software that permits you to fit Johnson distributions it will find these values for you but if you don’t the above will permit you to do it by hand.
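A minimal sketch of the percentile approach follows. It assumes a sample large enough that the .135 and 99.865 percentiles can be read from the data directly (with small samples, probability paper or a fitted Johnson curve would be needed to extrapolate). The data and spec limits are hypothetical and the function names are made up for illustration.

```python
import random

def percentile(sorted_values, pct):
    """Linear-interpolation percentile (pct in 0..100) of an already sorted list."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def equivalent_six_sigma_spread(data):
    """Distance between the .135 and 99.865 percentiles (middle 99.73% of output)."""
    ordered = sorted(data)
    return percentile(ordered, 99.865) - percentile(ordered, 0.135)

random.seed(1)
# Hypothetical skewed process data, invented for illustration only
data = [random.lognormvariate(0.0, 0.25) for _ in range(20000)]
lsl, usl = 0.2, 3.0  # hypothetical lower and upper spec limits

spread = equivalent_six_sigma_spread(data)
goal = 0.75 * (usl - lsl)  # the .75 x Tolerance goal from the post
print(f"spread = {spread:.3f}, goal = {goal:.3f}, meets goal: {spread <= goal}")
```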

If you would like a further reference try Measuring Process Capability by Bothe. Chapter 8 is titled “Measuring Capability for Non-Normal Variable Data.”

September 4, 2002 at 3:35 pm #78613

Robert Butler, Participant, @rbutler

RC,

What Vijay is asking for is your design matrix with some indication as to which experiments gave a result and which did not. Since I wrote out the complete 2^3 matrix in my second post, just indicate which of the eight experiments actually gave you some kind of measurable result. For example, if 1,2,4,6,7 had measurable results just list them in this order. Armed with that information we can tell you what you can and cannot estimate with the results from your first experimental effort.

September 4, 2002 at 2:17 pm #78608

Robert Butler, Participant, @rbutler

I wouldn’t recommend substituting 0 for missing values in a design. If you do this you will drastically alter the model resulting from the analysis. To check this take the simple 2^3 design below and just assign as a response the numbers 1-8 and build the model. Next set the values 2 and 6 to missing. Your regression diagnostics on the reduced set will show that you cannot estimate the effect of the v1xv2 interaction. Put all of the other main effects and two-way interactions back in the model expression and, using backward elimination, run the model again. Finally, substitute “0” for those points that you set to missing and run the model the third time.

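| v1 | v2 | v3 | resp1 | resp2 | resp3 |
|----|----|----|-------|-------|-------|
| -1 | -1 | -1 | 1 | 1 | 1 |
| 1 | -1 | -1 | 2 | – | 0 |
| -1 | 1 | -1 | 3 | 3 | 3 |
| 1 | 1 | -1 | 4 | 4 | 4 |
| -1 | -1 | 1 | 5 | 5 | 5 |
| 1 | -1 | 1 | 6 | – | 0 |
| -1 | 1 | 1 | 7 | 7 | 7 |
| 1 | 1 | 1 | 8 | 8 | 8 |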
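One way to see why the v1xv2 interaction drops out of the second case without running a regression package: on the six remaining runs the v1xv2 column is an exact linear combination of the intercept, v1 and v2. The sketch below checks that identity. The variable and function names are only illustrative.

```python
from itertools import product

# Standard-order 2^3 design with v1 changing fastest, matching the table
runs = [(v1, v2, v3) for v3, v2, v1 in product((-1, 1), repeat=3)]
responses = list(range(1, 9))

full = list(zip(runs, responses))
# Second case: the runs whose responses were 2 and 6 are treated as missing
reduced = [(run, y) for run, y in full if y not in (2, 6)]

def v1v2_is_linear_in_v1_v2(rows):
    """True if v1*v2 == 1 + v1 - v2 on every run, i.e. the interaction is aliased."""
    return all(v1 * v2 == 1 + v1 - v2 for (v1, v2, _v3), _y in rows)

print(v1v2_is_linear_in_v1_v2(full))     # False: estimable in the full design
print(v1v2_is_linear_in_v1_v2(reduced))  # True: aliased once the two runs are lost
```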

For the first response v1, v2, v3, v1xv2, v1xv3, and v2xv3 are all significant when running a backward elimination with a selection of .1. In the second case the v1xv2 interaction cannot be estimated but all others can (zero df for error of course) and the coefficients for the remaining terms are the same as those in the first case. In the last case, all of the terms except v2 and v3 are eliminated because inserting “0” for missing has altered the “measured” responses of the experiments.

September 2, 2002 at 2:51 pm #78566

Robert Butler, Participant, @rbutler

I think Clint has been given some excellent advice concerning the use of Xbar and R charts but I would caution that before he attempts to use these charts to track and perhaps control his process he should first make sure that the data points he extracts for purposes of plotting are independent of one another. Given the frequency of sampling that he has described I think there is a very good chance that sequential data points are not independent. If sequential data points exhibit significant autocorrelation the control limits extracted from such data will not reflect the true natural variability of the process. As a result, the control limits for Xbar and R will be too narrow and Clint will find himself reacting to “significant changes” in his process which really are not significant at all. To check for autocorrelation take a block of data and do a time series analysis using the time series module in Minitab (assuming you have that package). If significant autocorrelation exists you can use the results of your analysis to determine the proper selection of data for use in building an Xbar and R chart.
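For readers without Minitab, a rough version of the autocorrelation check can be sketched in a few lines. The series here is simulated (an assumed first-order autoregressive process), and keeping every k-th point is only one simple reading of "proper selection of data". The function names are hypothetical.

```python
import random
from statistics import mean

def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    center = mean(series)
    deviations = [x - center for x in series]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[i] * deviations[i + lag] for i in range(len(series) - lag))
    return numerator / denominator

def thin(series, step):
    """Keep every `step`-th point so that retained samples are further apart in time."""
    return series[::step]

random.seed(7)
# Simulated, strongly autocorrelated measurements (illustration only)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

r_all = autocorrelation(series)
r_thinned = autocorrelation(thin(series, 10))
print(f"lag-1 autocorrelation: all points {r_all:.2f}, every 10th point {r_thinned:.2f}")
```

A common rule of thumb treats a lag-1 value beyond about 2/sqrt(n) in magnitude as worth worrying about.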

August 27, 2002 at 4:42 pm #78449

Robert Butler, Participant, @rbutler

There are a number of things you could do. If we assume that you have not yet run the design and that your problem is that of looking at the proposed combinations and realizing that one or more cannot be run because, as you said, a weld cannot be made, then you can do any of the following:

1. Change the high and low limits of one or more of the parameters so that the resulting design can be run.

Possible problem with this-The ranges become so narrow that the region of interest is not the region you wish to investigate.

2. Choose a RANGE of low values and a RANGE of high values for each of the variables in your design. Take ONLY those design points that cannot be run and check to see if by substituting a different low or high value in for one or more of the variables in that particular combination you can convert the design point into something that can be run.

Possible problem with this- your design will no longer be perfectly orthogonal. In many cases this is more of a theoretical concern than anything else-most designs when they are actually run will not be perfectly orthogonal. In order to check for trouble make up a dummy response to the resulting design and run it and check the values for the variance inflation factors and also for any ill conditioning warnings that your particular package may issue. If the design checks out, set it up and run it.

3. Find a degreed statistician and have him/her build you a restricted design using one of the many optimality criteria (A,G,D etc.) There are a number of packages on the market that will do this for you but unless you know how to check the resulting design and how to trick the package into doing what you want instead of what it thinks you want you can go wrong with great assurance.

If we assume that you have already run the design then your question becomes one of analysis. For those cases where the horses died just indicate missing data. Run the full model through your statistics package. If it is any good it will come back and tell you that one or more of the terms in your model cannot be estimated. Drop these terms from the initial model and submit the reduced model for consideration. Keep doing this until you get a series of terms that can be examined.

For a 2^3 design the dead horses will probably translate into the inability to estimate one or more of the interaction terms. The power of factorial designs is that they are robust: they can really take a beating and still deliver main effect estimates.

August 27, 2002 at 12:49 pm #78443

Robert Butler, Participant, @rbutler

Based on your description I picture a system with a scanner moving vertically across a moving web so that the actual path of the trace is that of a diagonal scan of about a minute duration. The question that remains is how does the scanner return to its initial position: shut off and quick return, resulting in a sawtooth scan, or continuous scanning, resulting in a zig-zag pattern back and forth across the web?

Either way, the structure of the data will result in multiple measurements down the web at the same X,Y position over time. If your automatic data gathering records the X,Y location of the measurement you can take data and bin it across time for particular sets of X,Y coordinates. For the time (MD) data you will have to check for autocorrelation and identify the time interval needed for data independence if you wish to run a control chart in the MD direction. Frankly, for a first look I wouldn’t even think about a control chart. I’d just plot the data in meaningful ways and look at the results. For example, for across web (CD) you could stratify by Y across time (we are assuming MD is the X direction) and look at boxplots of the data arranged by Y location. Given that you scan about every minute for about an hour you could also do the CD boxplots by Y and by time, grouping the data in terms of Y location for a particular time interval. This would give you a picture of changes in Y over longer periods of time. Time plots of this nature across multiple paper rolls will give you a picture of your process. Once you have these pictures you can sit down and give some serious thought as to what they suggest about the process and what you might want to do next.

Obviously this is a lot of plotting. However, if you tape your sequential plots in time order on a wall so you can really “see” your process you will probably find the resulting pictures to be very interesting.

I’ve done similar things with rolls of flocked material. The resulting picture forced a complete rethinking of the process and what it was that we really wanted to do with the data we had collected and the information that had been extracted.

August 23, 2002 at 5:16 pm #78370

Robert Butler, Participant, @rbutler

Management thinks that DMAIC is a selection criterion for members of your team and that it stands for Doesn’t Make Any Important Contribution.

Your champion thinks that rumor, innuendo, and hearsay are the only tools you need for the Define, Measure, and Analyze phases.

August 23, 2002 at 1:59 pm #78349

Robert Butler, Participant, @rbutler

While I agree that Minitab is probably the best choice for general use I do think that it is a mistake to downplay the poor Minitab graphics. We have found the Minitab graphics to be a real problem not from the “pretty picture” standpoint but from the standpoint that the inflexibility of the graphs does not permit us to do what we need to do. For example, the simple matter of not being able to adjust the settings of the X and Y values on the plots or make them uniform from one graph to the next can be a real problem, particularly when you are doing things like sequential boxplots and you need to illustrate the process over multiple graphs scaled in the same manner.

I use SAS for my analytical work and Statistica for my graphing. The engineers I work for use Minitab for their analysis and Statistica for graphing. I’d recommend that you get Minitab and get at least one copy of the Statistica Base product for those situations where the Minitab graphics don’t do what you want them to do.

August 21, 2002 at 1:04 pm #78276

Robert Butler, Participant, @rbutler

Jan is pointing you in the right direction. You will have to investigate the issues of stratified random sampling in order to determine your sample size. With an expensive perfume, you will need to stratify on such things as income, disposable income spent on luxury items, age, ethnic background, religious background, etc. If you don’t stratify you will wind up with survey results that will tell you the opinions of the number of people you polled and nothing more. If you attempt to extrapolate non-stratified findings to the population at large you will be asking for trouble.

You may want to check Cochran – Sampling Techniques – Wiley for a discussion of the issues surrounding sampling methods and sample size.

August 15, 2002 at 12:29 pm #78130

Robert Butler, Participant, @rbutler

The usual focus of safety efforts in any situation is to increase safety by reducing accidents or their causes. Several years ago I did precisely what James A recommended except instead of looking at just a single year worth of data I managed to locate the OSHA reports for the past 20 years. By running a number of pareto and time series plots of the data I managed to identify several broad trends in accident frequency and types which had completely eluded everyone for the simple reason that most of the safety stats that had been presented over the years were nothing more than a comparison of this year with last year. I was also able to identify groups most at risk. This last permitted a better focus of safety efforts and it also eliminated a number of meaningless practices that had been put in place.

Central to the effort was the fact that in spite of the time involved there had been no major change in accident type over the 20 years. The single biggest problem that I had with the analysis was convincing management that the analysis was valid. The argument offered was that things had changed so much that I was comparing apples and oranges by looking at such a long time line. The rebuttal to this was that I was able to show that for any multi-year period there was no significant difference in types of accidents and their frequencies.

I’m offering this war story to highlight some of the things that you might want to try with your own data. If you are U.S. based you may have trouble getting more than 5 years worth of data. There is/was a 5 year limit on data retention and many companies will delete data the second 5 years have passed. Even 5 years of data will be better than one or two and you have a better chance of spotting trends that simply won’t be apparent with one or two years worth of data. You might also want to contrast the OSHA and non-OSHA types of accidents over time. In my case there was pretty good agreement but you won’t know for sure unless you check.

August 14, 2002 at 6:27 pm #78103

Robert Butler, Participant, @rbutler

If you have a program that will permit graph overlays or one that will permit you to plot multiple Y’s against a given X then you should be able to do Multivariate plotting.

I would recommend taking each response of interest and normalizing them to their respective +-3sigma limits. If we call the -3sigma limit Ymin and the +3sigma limit Ymax then to normalize each Y to a -1 to 1 range you would compute the following:

y(norm) = (y-Yavg)/Ydiff

where Yavg = (Ymax+Ymin)/2

and

Ydiff = (Ymax – Ymin)/2
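The normalization above is easy to script. A minimal sketch, with a hypothetical function name and made-up limits:

```python
def normalize(y, y_min, y_max):
    """Scale y so that y_min maps to -1 and y_max maps to +1."""
    y_avg = (y_max + y_min) / 2
    y_diff = (y_max - y_min) / 2
    return (y - y_avg) / y_diff

# Hypothetical response with -3sigma limit 10 and +3sigma limit 20
print([normalize(y, 10, 20) for y in (10, 12.5, 15, 20)])  # [-1.0, -0.5, 0.0, 1.0]
```

Each response gets its own Ymin and Ymax, so responses measured in different units end up on the same -1 to 1 scale.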

Overlaid plots of the averages and ranges of the normalized Y’s will give you your multivariate plots.

The only problem I can see with graphs of this sort is that for more than 3 Y’s or so the chart will run the risk of being so busy that its utility as an analytical tool may be diminished.

August 12, 2002 at 1:41 pm #78012

Robert Butler, Participant, @rbutler

Here are some of my favorites:

“In matters of scientific investigation the method that should be employed is think, plan, calculate, experiment and first, last, and foremost, think. The method most often employed is wonder, guess, putter, theorize, guess again, and above all avoid calculation.”

-A.G. Webster 1910

“I don’t mind lying, but I can’t stand inaccuracy.” – Samuel Butler

“It’s better to be approximately right than exactly wrong.” – Can’t remember author on this one

The insult

“There’s lies, damned lies, and statistics.” -Twain and others

The response

“Only the truly educated can be moved to tears by statistics.” G.B. Shaw

“Ignorance won’t kill you but it will make you sweat a lot.” -African Proverb

“We trained hard-but it seemed that every time we were beginning to form up into teams we would be reorganized. I was to learn later in life that we tend to meet any new situation by reorganizing; and a wonderful method it can be for creating the illusion of progress while producing confusion, inefficiency and demoralization.” – Petronius; 70 A.D.

“There are no uninteresting problems. There are just disinterested people.”

-John Bailar Jr.

“When attacking a problem the good scientist will utilize anything that suggests itself as a weapon.” – George Kaplan

“There is nothing more tragic than the spectacle of a beautiful theory murdered by a brutal gang of facts.” – Don’t know this author

“Every now and then a fool must, by chance, be right.” – Don’t know this author

August 8, 2002 at 5:04 pm #77958

Robert Butler, Participant, @rbutler

ANOM and ANOVA under many circumstances will give similar results. The key difference between the two is that ANOM compares the sub-group mean to the grand mean of the data under examination while ANOVA compares the sub-group means with one another. Thus the answer to the question “which one is better” is: it depends on the question you wish to answer. If you have access to Minitab open up the help document and do a search on ANOM. You will get a brief overview as well as a number of cites of critical papers on this subject.

If you don’t have access to Minitab then it would appear that the following paper should help answer any other questions you might have:

Ott – Analysis of Means-A Graphical Procedure – JQT, 15, pp. 10-18, 1983

July 17, 2002 at 6:47 pm #77329

Robert Butler, Participant, @rbutler

I guess we need clarification from Aush. The way I read his/her second post, it is possible to set the two independently; it’s just that they drift over time. If this is the case then it is possible to set up the (+,+), (-,+) etc. combinations and run the experiment. On the other hand, if they are linked so that only the (+,+) and (-,-) combinations can be run then, of course, it is not possible to run a design with these two factors since they cannot be varied independently of one another.

July 17, 2002 at 6:06 pm #77325

Robert Butler, Participant, @rbutler

If you can set your variables so that when they are at the “low setting” they drift over some range of values, all of which are lower than the lowest value of your “high setting”, you can go ahead and generate your experimental design in the usual fashion. When it comes to analyzing the design you will have to normalize the range of low and high values. The resulting design matrix will not be a field of -1’s and 1’s but it will be an array with values between -1 and 1.

Such a design will not be perfectly orthogonal but you will have enough separation of effects to enable you to make statements concerning the effects of your main variables. As for being able to identify interaction effects, the answer will depend on the scatter in your low and high settings.

To simulate this, set up a standard design. Choose a low and a high value, pick a range around each of these values, and substitute random values from the low and the high ranges into your design. Re-normalize the design and then check the matrix using some of the diagnostic tools available. In particular, look at the aliasing structure. If you have access to something like SAS, or you know someone who can run it for you, have them put the matrix through Proc Reg and run it with the “vif” and “collin” options.

July 17, 2002 at 5:38 pm #77324

Robert Butler, Participant, @rbutler

The exponential probability density function is:

p(x) = Theta*exp(-Theta*x)

The average for an exponential distribution is 1/Theta

The variance for an exponential distribution is 1/Theta^2

Thus for a mean of 4 minutes we have

4 = 1/Theta so Theta = .25

The probability that the time for stamping is less than three minutes is

P{X<3} = 1 – 1/exp(Theta*3) = 1 – 1/exp(.25*3) = .53

So, regardless of the time interval chosen you can expect to see parts turned out in less than 3 minutes about 53% of the time.
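The arithmetic above can be verified directly. A small sketch (the function name is hypothetical):

```python
from math import exp

def exponential_cdf(x, mean):
    """P{X < x} for an exponential distribution with the given mean (Theta = 1/mean)."""
    theta = 1.0 / mean
    return 1.0 - exp(-theta * x)

print(round(exponential_cdf(3, 4), 2))  # 0.53
```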

Check Brownlee, Statistical Theory and Methodology, pp. 42-60 for more details.

July 12, 2002 at 3:27 pm #77212

Robert Butler, Participant, @rbutler

As I read the posts and the replies I’m left with the impression that there are a number of issues here; however, the main thrust of your question seems to focus on the issue of control limits for a log-normal distribution. The way to deal with this is as follows.

1. log your data

2. compute the average (Mln) and standard deviation (Sln) of the logged terms

i.e. Mln = (Sum (logx))/N, Sln = sqrt[(Sum (logx – Mln)^2)/(N-1)]

3. For control limits in terms of the original, unlogged terms compute the following:

exp(Mln – t(1-alpha/2)*Sln) = lower limit

exp(Mln + t(1-alpha/2)*Sln) = upper limit.
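The three steps can be sketched as follows. Python's standard library has no t quantile function, so this sketch assumes the critical value t(1-alpha/2) is looked up in a table and passed in. The measurements are invented and the function name is hypothetical.

```python
from math import exp, log, sqrt

def lognormal_limits(data, critical_value):
    """Lower and upper limits in original units from the logged mean and standard deviation."""
    logs = [log(x) for x in data]                                   # step 1: log the data
    n = len(logs)
    m_ln = sum(logs) / n                                            # step 2: mean of the logs
    s_ln = sqrt(sum((v - m_ln) ** 2 for v in logs) / (n - 1))       # ... and their standard deviation
    return exp(m_ln - critical_value * s_ln), exp(m_ln + critical_value * s_ln)  # step 3

# Hypothetical measurements; 2.262 is the tabled t value for alpha = .05 with 9 degrees of freedom
data = [1.2, 0.8, 1.5, 2.3, 0.9, 1.1, 3.0, 1.7, 1.3, 0.7]
lower, upper = lognormal_limits(data, 2.262)
print(f"lower = {lower:.3f}, upper = {upper:.3f}")
```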

These limits will not be symmetric in the unlogged units but they will include whatever percent of the population you wish to call acceptable, i.e. alpha = .05 will give 95% limits etc.

July 11, 2002 at 12:26 pm #77142

Robert Butler (Participant, @rbutler)

As James A. observed, the trimming function will remove X% of the lowest and highest values in a distribution. The idea here is to minimize the effect of “outliers” on the estimate of the mean. Like all statistical methods this should not be done blindly. If you have a situation where your data is normal and in the course of your investigation there was an upset that generated a few extreme points that will impact your estimate of the mean then trimming may be of value. If, on the other hand, your data is not normal (tails too heavy, distribution skewed to one side, etc.) then trimming will encourage underestimation and, as the inquisition said, you will go wrong with great assurance.
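To make the underestimation point concrete, here is a small Python sketch of a trimmed mean applied to a right-skewed data set. The function and the numbers are hypothetical and are not taken from any particular package's trimming routine.

```python
def trimmed_mean(data, fraction):
    """Mean after dropping `fraction` of the points from each tail."""
    ordered = sorted(data)
    k = int(len(ordered) * fraction)        # number of points removed per tail
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("fraction is too large for this sample size")
    return sum(kept) / len(kept)

# Hypothetical right-skewed data: the long upper tail is real, not an upset
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]

plain_mean = sum(right_skewed) / len(right_skewed)
trimmed = trimmed_mean(right_skewed, 0.10)
print(plain_mean, trimmed)
```

With a skewed distribution like this one the trimmed value falls well below the ordinary mean, so whether that is a fix or an error depends entirely on whether the extreme point is an upset or a genuine part of the distribution.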

July 9, 2002 at 1:58 am #77050

Robert Butler (Participant, @rbutler)

Neural nets, like MARS, are just one of a number of black box non-linear regression techniques. Early papers on the subject tried to compare the functioning of the nets with that of the brain. Current papers and books have backed away from such claims. Black box regression methods are acceptable if you are only interested in mapping inputs to outputs and do not care about cause and effect. Thus neural nets have had demonstrated success in detection of credit card fraud and in determining the liquid level in complex shaped vessels.

The biggest problem with the net literature is that an awful lot of it is written and reviewed by people with no real understanding of statistics. Consequently, you will find paper after paper whose claims and conclusions are based on what can only be characterized as a misunderstanding of the advantages and disadvantages of statistical analysis. If you are looking for a good introduction to the topic as well as a list of credible authors I’d recommend the following:

Neural Networks in Applied Statistics – Stern – Technometrics, August 1996, Vol 38 #3

For what it is worth below is a quick translation of some net terms to those of regression:

Neural Nets              Regression Analysis
Training Set             Initial Data
Weights                  Coefficients
Learning                 Parameter Estimation
Optimal Brain Surgery    Model Selection/Reduction
Network                  Statistical Model
Bias                     Constant
Nodes                    Sums and Transformations

July 8, 2002 at 2:34 pm #77034

Robert Butler (Participant, @rbutler)

I think you will have to be more specific and explain what you consider to be a “correct representation” of your population. If this is a question concerning the representation of the population mean and if you have checked to make sure that the population is normal then you could recast the question in the following manner:

Given that the population is normal, what kind of a sample should I take in order to be 95% certain that the allowable error in the sample mean is L?

For a normal population the confidence limits of the mean are

mean +- 2S/sqrt(n)

Thus if you put L = 2S/sqrt(n) you can compute the sample size (n) that will give you this allowable error.
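Solving L = 2S/sqrt(n) for n gives n = (2S/L)^2, rounded up to the next whole sample. A minimal Python sketch, with made-up values for S and L:

```python
import math

def sample_size_for_error(s, allowable_error):
    """Smallest n for which 2*s/sqrt(n) does not exceed the allowable error L."""
    return math.ceil((2.0 * s / allowable_error) ** 2)

# Hypothetical numbers: standard deviation S = 10, allowable error L = 2
n = sample_size_for_error(s=10.0, allowable_error=2.0)
print(n)  # 100
```

Note that S has to come from somewhere (prior data or a pilot sample), so the answer is only as good as that estimate.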

If, on the other hand, you are interested in characterizing the standard error for simple random sampling then, with the finite population correction applied, the standard error of the mean will be

[S/sqrt(n)]*sqrt(1-phi) where phi = the sampling fraction = n/N.
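A quick way to get a feel for this expression is to hold S and n fixed and vary the population size N. The sketch below uses hypothetical values (S = 10, n = 100) and an illustrative function name.

```python
import math

def standard_error(s, n, population_size):
    """Standard error of the mean with the finite population correction."""
    phi = n / population_size               # sampling fraction n/N
    return (s / math.sqrt(n)) * math.sqrt(1.0 - phi)

# Same sample size, very different population sizes
for big_n in (1_000, 10_000, 1_000_000):
    print(big_n, round(standard_error(10.0, 100, big_n), 3))
```

Without the correction the standard error would be S/sqrt(n) = 1 in every case, and the corrected values stay close to that even when the sampling fraction changes by three orders of magnitude.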

An examination of these equations illustrates the point that the standard error of the mean depends mainly on the size of the sample and only to a minor extent on the fraction of the population sampled.

July 8, 2002 at 12:49 pm #77027

Robert Butler (Participant, @rbutler)

What you are asking for is roughly the first two years of a 4 year undergraduate course in statistics. There are books that will zip through the list you have posted but I doubt that they will be of much value to you. I can’t offer a single text but I would offer the following:

Basic statistics and hypothesis testing:

A Cartoon Guide to Statistics – by way of introduction

Statistical Theory and Methodology in Science and Engineering-Brownlee

Chapter 1 Mathematical Ideas, Chapter 2-Statistical Ideas – has great graphical depiction of the concept of critical areas, Chapter 6- Control Charts, Chapters 8-10 and 14.

Applied Regression Analysis -2nd Edition – Draper and Smith Chapters 1-4, with particular emphasis on Chapter 3 -The Examination of Residuals, Chapter 8 -Multiple Regression, and Chapter 9 – Multiple Regression Applied to ANOVA

Statistics for Experimenters Box, Hunter, Hunter – the entire book.

for backup have a copy of

Quality Control – Duncan

Regression Analysis by Example – Chatterjee and Price

Statistical Methods – Snedecor and Cochran

July 3, 2002 at 5:43 pm #76893

Robert Butler (Participant, @rbutler)

The design your BB is proposing (a half fraction of a 2^4, one complete replicate of the entire fractional factorial, and 25 samples per condition, for a total of 16 runs and 400 samples) would permit an assessment of the effects of process changes on the within run variability as well as an assessment of the impact of factors on process variability. The design you are proposing will permit an assessment of the impact of process factors on process variability.

Given what you have written, it sounds like your BB is confusing within and between run variability. If within run variability is indeed of concern then as long as you understand that you will have to compute within run variation and run-to-run variation for each experimental condition and model the two types of variation independently you should have no problem. I’ve built and analyzed a number of designs over the years that focused on the issue of variables impacting process variability but I’ve never had to look at within experiment variation.

For assessing variables impacting process variability, the approach that I have used is to take the resultant design, add one of the design points to that design (for a design replicate) and then replicate the entire design including the replicate point. Thus for each design point you will have a two point estimate of the variability associated with that particular experimental condition and you will have a two point estimate for the variability of the replicate point as well.

If you run a stepwise regression against these computed variabilities you can develop a model describing the process variability as a function of process variables. You can also use the same data to identify those variables impacting the process mean by running your analysis in the usual manner.
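As a rough illustration of the two point estimate described above, the Python sketch below computes a variance for each design point from a run and its replicate. The design points and responses are invented for the example; the resulting variances (often logged first) are what you would then feed into the regression as the response.

```python
import statistics

# Hypothetical data: coded design point -> (first run, replicate run)
runs = {
    (-1, -1): (10.1, 10.5),
    ( 1, -1): (12.0, 14.2),
    (-1,  1): (9.7, 9.9),
    ( 1,  1): (13.1, 15.6),
}

def two_point_variances(replicated_runs):
    """Sample variance of the two responses at each design point."""
    return {point: statistics.variance(ys) for point, ys in replicated_runs.items()}

def point_means(replicated_runs):
    """Average response at each design point, for the usual analysis of the mean."""
    return {point: statistics.mean(ys) for point, ys in replicated_runs.items()}

variances = two_point_variances(runs)
means = point_means(runs)
for point in runs:
    print(point, round(means[point], 2), round(variances[point], 3))
```

For two points the sample variance reduces to (y1 - y2)^2 / 2, which is why each estimate is so noisy and why only large effects on variability are likely to show up.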

Since, with this approach, you only have a two point estimate for the variation at each design point you should focus on big hitters first and worry about interactions later. Both of your designs will only give two point estimates of process variability associated with each design point. A possible compromise between you and your BB would be to take your full factorial and select those experimental conditions corresponding to the half replicate. Randomize this half fraction and run them and their full replicate first. You will have to include a 9th data point from the fraction for purposes of replication of the process variation. Analyze the data from this and then make a decision as to whether or not you want to continue with the other half of the experiment.

July 3, 2002 at 1:32 pm #76878

Robert Butler (Participant, @rbutler)

I was re-reading all of the posts to this thread last night and while each post offers excellent advice I think that all of us are at risk of misleading Marty because several of us (myself included) have used the same term to mean different things. This becomes apparent when I re-read Marty’s thank you to all of us.

Given the complexity of the discussion I would first echo Dave’s advice to another poster on a similar topic – get a copy of Box Hunter and Hunter’s book Statistics for Experimenters.

I would like to address what I think is a key miscommunication among all of us (if I am in error in my understanding of the previous posts please accept my apologies in advance).

Replication vs. Duplication

Central to the discussion was the issue of experimental replication. A genuine replicate of an experimental design point requires the experimenter to COMPLETELY rerun the experimental condition. This means that you have to start all over and run the experiment again. Thus, if you are going to replicate an entire design you will have to run double the number of experiments. While, as Dave noted, this will drastically increase your power this can also be very costly.

The compromise that is often used is to run either a replicated center point (assuming that it is possible to build a center point in the design) or to replicate one or two of the design points in the design. While you will not be able to detect as small a difference as you may wish, you will still find that you are able to find significant effects if they are indeed present.

A duplicate is a repeat measure on the same experimental condition. For example, if I am measuring output viscosity of a process and for a single experimental condition I take repeated measurements on the viscosity of that condition every minute for 15 minutes I am taking duplicate measurements. Multiple grab samples from the output of a machine for a given experimental condition also constitute duplicate measurements. If I try to treat the results of these duplicate measurements as replicates what I will do is substitute analytical variance for run-to-run variance. In general, analytical variance is much smaller than run-to-run variance and the computer program will use the analytical variance to determine the significance of the effects. The end result will be that a number of effects will test significant when they really aren’t.

It is possible to use duplicate measurements in your analysis. The field is called repeat measures analysis and you will need the services of a highly trained statistician in order to have any hope of doing it.

If you can get the Box, Hunter, Hunter book check section 10.6 – calculation of standard errors for effects using replicated runs – for further discussion of the difference between duplicate and replicate. You might also want to read section 10.8 which discusses the ways of getting an estimate of error if no replication can be performed.

July 2, 2002 at 4:15 pm #76852

Robert Butler (Participant, @rbutler)

If it is not too difficult to take multiple samples for each experimental condition it is worth the effort if for no other reason than team comfort. If you take the time to do this then you should do the following:

1. Label each sample to indicate time order.

2. Choose the first sample from each group of samples and perform the planned set of measurements.

3. Keep the other samples in reserve.

4. If any of the measured results for any particular experiment are “surprising” pull the additional samples and measure them for confirmation. If the additional samples confirm the initial measurement, put them aside and keep your original measurement. If the duplicates (note these are NOT replicate measurements because they constitute multiple samples from the same experimental run) do not confirm the initial results you will have to investigate to determine which measurement is correct.

5. Run your analysis with a single measurement for each independent experimental run from your DOE.

I wouldn’t recommend averaging anything. You can hide the world behind an average and never see it. You also do not want to include all of your duplicate measurements in your analysis. The reason for this is that your software will interpret these duplicates as genuine replicates and you will wind up with an error estimate based on duplicate, not replicate, variability. Duplicate variability will be much smaller than replicate variability and the end result will be an analysis that indicates significant terms where none really exist.

If questions concerning such things as trending over time should arise you can take advantage of your stored samples and do such things as analyze the last sample in each run and then rerun your DOE analysis to see if the model terms change or if there is a significant shift in the coefficients of the original model.

July 1, 2002 at 1:41 pm #76787

Robert Butler (Participant, @rbutler)

It appears that you are using the term interaction in two different ways. What makes it interesting is that both, by themselves, are correct. Let’s try the following:

Two factors X1 and X2

X1 low = 200C, X1 high = 300C

X2 low = 4, X2 high = 6

Experimental combinations for two levels, no reps or center points would be

experiment   X1   X2   X1X2
(1)          -1   -1     1
a             1   -1    -1
b            -1    1    -1
ab            1    1     1

The COLUMN corresponding to the interaction of X1 and X2 is derived by multiplying together the columns for X1 and X2. If you look at the result for X1X2 for each experiment you see, exactly as you described, an “interaction” for each combination. When it comes to running a regression and getting a model of the form:

Y = a0 +a1*X1 +a2*X2 +a3*X1*X2

you will have the second situation you described, namely that when you plug in the low, low and the high high combinations you will get, for the interaction term, the same value of 1. It is also true that when you plug in low, high and high, low you will also get the same value which, for these combinations, will be -1.
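The column multiplication is easy to reproduce. Here is a minimal Python sketch that builds the four coded runs in the order (1), a, b, ab and derives the X1X2 column; the function name is just for illustration.

```python
from itertools import product

def two_level_design():
    """Coded 2x2 design with the interaction column, in the order (1), a, b, ab."""
    rows = []
    for x2, x1 in product((-1, 1), repeat=2):   # X1 changes fastest
        rows.append((x1, x2, x1 * x2))          # interaction = product of the columns
    return rows

design = two_level_design()
for label, (x1, x2, x1x2) in zip(("(1)", "a", "b", "ab"), design):
    print(label, x1, x2, x1x2)
```

The low/low and high/high runs both give +1 in the interaction column, and the two mixed runs both give -1, which is the behavior described above.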

Thus, as you observed, the interaction term in the regression equation will treat the above listed combinations in the same manner. The differences in these combinations, from the standpoint of the regression and the response, will make themselves apparent in the linear terms for X1 and X2. If you should have a regression situation where the only thing that is significant is the interaction, your graph of the response vs X1 and X2 will be a large X.

If this last situation arises, your analysis is telling you that your process can have the same output for two different combinations of your X’s. This could be a good or a bad thing. For example, if you have been running X1 high and X2 low and it would be much cheaper to run X1 low and X2 high your analysis would tell you that, at least for that one response, you could save money just by reversing the levels of X1 and X2.

June 27, 2002 at 1:53 pm #76706

Robert Butler (Participant, @rbutler)

Your measured response can be anything you wish. A go/no go is just a binary response. You could check Analysis of Binary Data by Cox and Snell. In particular you should look at sections 2.6 to 2.8 which discuss multiple regression and factorial arrangements.

June 20, 2002 at 12:49 pm #76529

Robert Butler (Participant, @rbutler)

As stated, you seem to be asking two different questions. If I read your question one way it appears that you are asking for a comparison of the effects of length, material, and peak force. If you are interested in just knowing if there is a difference between materials and length and peak force you can set up a three way ANOVA with these variables in the following manner:

            PTFE   Utem   HDPE
Long         X      X      X
Standard     X      X      X
Short        X      X      X

and then run this matrix for each of the peak forces (sorry, this format does not allow me to draw the matrix as a 3 dimensional box)

The X’s represent your choice of the number of samples per treatment combination. Since many programs cannot handle unbalanced designs you will probably have to make sure that you have the same number of measurements per treatment combination. If you do all of the usual checks for variance equality you can use the Scheffe method to check all of the means against each other.

This will answer the question concerning the mean differences connected with material, length, and peak force.

If peak force is the measured response then the problem reduces to a two way ANOVA and everything I mentioned above still applies except that you now have one less dimension to the problem.

ANOVA will only give you an understanding of which means are different from the others. In order to say, as you wrote, “PTFE produces the same peak-force as HDPE, hence we don’t have to spend more money buying PTFE, etc” you will need to take the same data and run it through a regression.

Since length and peak force would appear to be continuous variables, you can code these in the usual way for doing a regression. For the materials you will have to use dummy variables.

Code in the following manner:

if PTFE v1 = 1, v2 = 0

if Utem v1 = 0, v2 = 1

if HDPE v1 = 0, v2 = 0

Run a regression with coded variables for peak force, length and v1 and v2 (of course, as above, if peak force is the response then just use length and v1 and v2). The resulting equation will permit an assessment of equivalence of effects.

June 17, 2002 at 12:59 pm #76460

Robert Butler (Participant, @rbutler)

Two factors at 6 levels will be a 6**2 experiment for a total of 36 experiments. I’m not aware of any package that will do this for you. The easiest way to set this up is to take a piece of graph paper and simply plot out the 36 points that would be part of the 6×6 matrix. Before doing any of this however, I would recommend asking some hard questions concerning the need for 6 levels. In the vast majority of cases this is definitely overkill. The operating philosophy behind DOE is that if change is going to be observed it will best be seen by contrasting extremes, hence the focus on 2 and 3 level designs.
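If graph paper is not handy, the 36 combinations can also be listed with a few lines of Python. This is only a sketch of the enumeration; the level values below are placeholders, not settings from the original question.

```python
from itertools import product

def full_factorial(*factor_levels):
    """Every combination of the supplied factor levels, one tuple per experiment."""
    return list(product(*factor_levels))

# Hypothetical level settings for the two factors
x1_levels = [1, 2, 3, 4, 5, 6]
x2_levels = [10, 20, 30, 40, 50, 60]

design = full_factorial(x1_levels, x2_levels)
print(len(design))  # 36
```

The run order produced this way is systematic, so it would still need to be randomized before the experiments are carried out.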

If your circumstances are such that you will not be permitted to consider fewer than 6 levels per factor, I’d recommend arguing for a 3 level “screening” design over the same region. This would give you a 9 point design and with a couple of replicates you would have 11 experiments which would permit a check of all interactions and all linear and curvilinear effects. It is true that with such a design only the corner points would correspond exactly to points from a 6 level design but I’d have a hard time believing that the small difference between the other points of a three level design and those of a 6 level design would make that much difference. Thus, if there was still some doubt you could use the 3 level design as a starting point and then fill in other areas of the design with points from the 6 level design. Since you could use your regression equation to predict the responses at the levels of the 6 level design, the additional design points would act as confirmation runs for the findings from your initial effort.

June 12, 2002 at 6:11 pm #76348

Robert Butler (Participant, @rbutler)

Thanks David, that gives me a better understanding of what you are trying to do. Over lunch I sat down and re-read Chapter 10 of Cochran and Cox and based on your description of the Multiple Subjective Evaluation Technique it appears that it is wedded to the Lattice Square protocol. This would explain why you had to have 4 dummy samples in order to get up to 9 treatments with 3 samples each. Going by the book, this would also mean that an increase to 5 samples would violate that protocol and, at least using Lattice Squares, I can’t see a way around this. However, there are other possibilities; more on this in a minute.

To continue with this thought, a full factorial with 16 treatments would require 4 samples per treatment and, again based on the book, it would appear that your choices of designs and levels are dictated by the Lattice Square requirements.

The main problem I’m having is trying to understand why one would use a Lattice Square to set up a rating plan. I can understand using such a plan to guarantee randomization with respect to raters and samples but if one is going to use a DOE this usually means that one is interested in expressing a given rating as a function of process variables. I can’t offer any more on this line but I am curious and I’ll have to look into this some more.

If you are interested in expressing a rating as a function of process variables you can run a regression on the discrete Y’s. Many Six Sigma courses take the very conservative approach to regression and state that this is incorrect. This is just to make sure that you don’t make too many mistakes when you first use regression methods. If your attribute data is in rank form (for example 1-5, best to worst; a rating on a scale from 1-100; a bunch of defects, not so many defects, a few defects; etc.) you can use the numbers or assign meaningful rank numbers to the verbal scores and run your regression on these responses. If you do this, you can take advantage of more sample ratings (which is what I’m assuming you want to do when you asked about increasing the samples from 3 to 5) without having to worry about the restrictions of Lattice Squares.

There are a number of things that you should keep in mind if you try this:

1. As with any attribute protocol, all of your raters of the attribute in question must be trained by the same person, with the same materials, so that their ratings will be consistent.

2. The discrete nature of the Y’s will probably mask interaction effects so if you are trying to fine tune a process, as opposed to looking for major factors impacting the process, there is a chance that your success will be limited.

3. Your residual analysis, particularly your residual plots of residuals vs. predicted are going to look odd. They will consist of a series of parallel lines each exhibiting a slope of -1.

4. The final model may indeed be able to discriminate between one rating and the next but a much more probable outcome will be the ability to only discriminate between good, neutral, and bad or even just good and bad.
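The parallel-line pattern in item 3 follows directly from the definition of a residual: for every point with the same rating y, residual = y - predicted, so plotting residual against predicted gives one line of slope -1 per rating value. A small Python sketch with invented ratings and a simple one-variable least squares fit shows this:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical process settings and the discrete ratings they received
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ratings = [1, 1, 2, 2, 3, 3, 4, 5]

intercept, slope = fit_line(xs, ratings)
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ratings, predicted)]

# Points sharing a rating satisfy residual + predicted = rating,
# i.e. they lie on a line of slope -1 in the residual vs. predicted plot.
for y, p, r in zip(ratings, predicted, residuals):
    print(y, round(p, 3), round(r, 3))
```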

An example of running a regression on the Y response of “How you rate your supervisor” can be found in Chapter 3 of Regression Analysis by Example by Chatterjee and Price and if you want to check up on the residual patterns you can read Parallel Lines in Residual Plots by Searle in the American Statistician for August 1988.

June 12, 2002 at 2:06 pm #76341

Robert Butler (Participant, @rbutler)

I may be wrong but I think you will have to provide more detail before anyone can offer any constructive thoughts. Lattice squares are pretty restrictive with respect to treatments, replicates, and such. Based on what I know of them 8 treatments are not permitted. The book you reference, Cochran & Cox, indicates that “the number of treatments must be an exact square”. In the edition I use they further list the useful plans that they knew of at the time of printing. These consisted of designs for 9, 16, 25, 49, 64, and 81 treatments. Furthermore, this type of design, like the Latin Square, is used primarily in those cases where the treatment cannot be viewed as a continuous variable – for example 4 different tire types and 4 different driving styles. If you could provide some more details concerning the experimental effort you are attempting I, for one, would be willing to try to offer some suggestions.

June 10, 2002 at 12:22 pm #76246

Robert Butler (Participant, @rbutler)

jay,

That is correct. A well controlled X variable will typically not appear as significant when running a regression on historical data. Given your interest in the subject, Dave’s comment, which was seconded by others, is a good one. The Box, Hunter, Hunter book is a very readable statistics book and it covers many of the issues that you will face in your efforts.

June 6, 2002 at 1:15 pm #76142

Robert Butler (Participant, @rbutler)

You can certainly use regression analysis to look at historical data and the results may help guide you in your thinking but there are a number of caveats that you need to keep in mind.

1. There is a very high probability that your analysis will show variables that you know to be important to your process are not significant. The reason this will occur is because these are the process variables that you control. The fact that they don’t appear merely suggests that you have done an excellent job of controlling them. Because of this control, any interactions with these variables will also appear to be not significant.

2. You will have to do a great deal of data preparation. In particular you will have to perform a full blown regression analysis of your X’s – eigenvector analysis, VIF, etc. What you cannot do, unless you plan on going wrong with great assurance, is to take your X’s and just plug them into a simple correlation matrix.

3. Data consistency: When recording production data people do not record everything that is done to the process. Consequently, many changes are made to variables that are not part of the record keeping process. If you have enough data and enough X’s to play with there is a good chance that, just by dumb luck, some of these unrecorded changes will correlate with the X’s that were tracked. The end result will be correlation with no hope of identifying the underlying cause.

Your initial attempt to ask for suggestions concerning possible factors is a good one. You mentioned that this was a mistake. I guess I’d like to know why this was so. Too many X’s? Too few? If it was too many I’d recommend brainstorming with a wider audience and then a secret ballot of all of the proposals with everyone rating the list from most important to least important. If the people filling out the ballots are the people who know the process you should wind up with a pretty decent set of X’s to investigate. Also, remember that for a first look you are going after big hitters. Don’t sweat the interactions; use saturated designs: 15 variables in 16 experiments plus one additional experiment for an error estimate, 31 variables in 32 experiments, etc.

June 3, 2002 at 12:28 pm #76024

Robert Butler (Participant, @rbutler)

From your description you have a design that will permit a check of main effects as well as interactions. If you have no replicates and you want to check for significance of your effects the usual practice is to use the mean squares of your highest order interaction in place of the mean squares for the error. Thus the interaction mean squares will become your measurement of noise against which all other mean squares are compared. I’m not familiar with Minitab but since this approach is standard statistical practice it is very likely that this is what Minitab is doing.

As a quick check plug the following into your program

Expt#1 A = -1, B = -1, response = 210

Expt#2 A = 1, B = -1, response = 240

Expt#3 A = -1, B = 1, response = 180

Expt#4 A = 1, B = 1, response = 200

you should get MS for A = 625, MS for B = 1225, and MS for Error (the AB interaction) = 25
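These values can also be verified by hand or with a few lines of Python. In an unreplicated two level design each effect has one degree of freedom, so its mean square equals its sum of squares, (contrast)^2 / N. The sketch below uses the four runs listed above; the function name is illustrative.

```python
# (A, B, response) for the four runs listed above
runs = [(-1, -1, 210), (1, -1, 240), (-1, 1, 180), (1, 1, 200)]

def mean_square(signs, responses):
    """Single degree of freedom mean square: (sum of sign * response)^2 / N."""
    contrast = sum(s * y for s, y in zip(signs, responses))
    return contrast ** 2 / len(responses)

responses = [y for _, _, y in runs]
ms_a = mean_square([a for a, _, _ in runs], responses)
ms_b = mean_square([b for _, b, _ in runs], responses)
ms_ab = mean_square([a * b for a, b, _ in runs], responses)  # used as the error term

print(ms_a, ms_b, ms_ab)  # 625.0 1225.0 25.0
```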

If you get this result then Minitab is indeed using this method.

May 29, 2002 at 9:19 pm #75913

Robert Butler (Participant, @rbutler)

If you are using the attribute measurements as independent variables you can set up a dummy variable matrix for these X’s and run the regression against the dummy variables.

For example if you have an independent attribute measurement for texture and the choices are smooth, not so smooth, and rough you can set up the dummy variables v1, and v2 in the following matrix:

if smooth then v1=1, v2 = 0

if not so smooth then v1=0, v2=1

if rough then v1=0, v2=0

The number of dummy variables will always be one less than the number of categories. A regression model built using the variables v1 and v2 will express the response as a function of the texture.
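A minimal Python sketch of this coding scheme is given below. The helper `dummy_code` is hypothetical; it treats the last category in the list as the baseline (all zeros), which matches the smooth / not so smooth / rough layout above.

```python
def dummy_code(value, categories):
    """Return len(categories) - 1 indicator variables; the last category is the baseline."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1 if value == category else 0 for category in categories[:-1])

textures = ["smooth", "not so smooth", "rough"]

for texture in textures:
    v1, v2 = dummy_code(texture, textures)
    print(texture, v1, v2)
```

The columns v1 and v2 then go into the regression in place of the texture label, and each coefficient is read as the shift in the response relative to the baseline category.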

A number of six sigma courses teach that regression cannot be used when the X’s are discrete. This is just a very conservative approach to make sure that you don’t make too many mistakes when you start using regression methods.

I would recommend that if you are going to use variables in this fashion that you do a little reading first. A good starting point would be Regression Analysis by Example by Chatterjee and Price. Chapter 4 is titled Qualitative Variables as Regressors and should answer questions you may have about this approach.

May 24, 2002 at 5:50 pm #75772

Robert Butler (Participant, @rbutler)

I think you will have to give some more specifics about your problem before anyone can offer meaningful suggestions. As stated, your question doesn’t make sense.

If you have data over time and you want to model the data using time series methods then you need to investigate the data using ARIMA techniques. You will have to address issues of stationarity, overfitting etc.

If you have data that has been gathered over time and you are interested in trying to correlate that data with parameters known to have changed during that time you will have to check the independence of those parameters and then proceed remembering to consider all of the caveats concerning efforts involving regression and regression diagnostics.

In either case, the issue of data normality does not matter. Normality is an issue when investigating the residuals of your time series/regression efforts.

If you could give a clearer description of what you are trying to do I, for one, would be glad to try to offer some suggestions.

May 23, 2002 at 4:58 pm #75700

Robert Butler (Participant, @rbutler)

Does correlation imply causation? Give an example either way.

Does causation imply correlation? Give an example either way.

If something is statistically significant does this guarantee that it matters?

If I had 151 variables to check at two levels how many one-at-a-time experiments would I have to run?

Assuming I ran one experiment a second how long would it take to finish?

What is the smallest saturated design that I could use?

Given that I ran one experiment a second, how long would it take to run the saturated design?

Given that each experiment cost $1,000 to run – estimate the cost savings of the saturated design as opposed to the one-at-a-time approach.

May 23, 2002 at 1:45 pm #75684

Robert Butler (Participant, @rbutler)

Actually, the point made concerning the relationship between the t and the F statistic goes much deeper.

“The distribution of a unit normal deviate to the square root of an independent Chi square with f degrees of freedom, divided by f, is the t distribution with f degrees of freedom. The ratio of Chi square(f)/f tends to 1 as f tends to infinity. Thus the t distribution with infinite degrees of freedom is identical with the standardized normal distribution.

The t distribution is related to the F distribution because if we made the degrees of freedom =1 for the numerator of the F test we will have

F(1,f2) = t**2(f2)

For example F.95(1,12) = 4.75 = 2.179**2 = [t.975(12)]**2”

The above is from Brownlee, 2nd Edition pp. 289-290.

May 22, 2002 at 8:48 pm #75659

Robert Butler (Participant, @rbutler)

As has been noted by others, the t-test is for a test of two means or averages. The F test is for a test of two variances. A quick check of various references indicates that you will find information on the F test listed under “F test” or “F distribution”.

The basic t-test which most packages use assumes equality of variance with respect to the two populations under consideration. Many also assume equal sample sizes. You need to check your variances for equality. If they are not significantly different you can go ahead and use the default t-test available in most packages. If they are significantly different you will have to run a t-test with unequal variances, and if the sample sizes are not the same you will have to run one with unequal sample sizes and unequal variances. To the best of my knowledge, unequal variance t-tests are not available as point and clicks in any packages. You will have to do that test from first principles.
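For anyone doing the check from first principles, here is a minimal sketch. It assumes the unequal-variance test meant here is the usual Welch form with Satterthwaite's approximate degrees of freedom; the post itself only points to Brownlee, so treat that choice as my assumption. The data and function names are hypothetical.

```python
import math
from statistics import mean, variance

def variance_ratio(a, b):
    """F statistic for comparing two variances: larger sample variance over smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def welch_t(a, b):
    """Return (t, approximate degrees of freedom) for unequal variances and sizes."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # squared standard errors of each mean
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical data with unequal sample sizes
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

print(variance_ratio(group_a, group_b))   # compare with the tabled F value
t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)                         # compare t with the t table at df
```

The variance ratio is looked up against the F table first; the Welch statistic is only needed when that ratio is significant.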

References:

F test, F distribution

- Quality Control and Industrial Statistics, Duncan: listed as “F distribution, in testing ratio of two variances”
- Statistical Methods, Snedecor and Cochran: listed as “F test”
- Statistical Theory and Methodology, Brownlee: listed as “F distribution”

t-test, unequal sample sizes, unequal variances

- Statistical Theory and Methodology, Brownlee, 2nd Edition, pp. 297-304

There are, of course, any number of reference books you could check. I’m just including a list of some of those with which I am familiar.

May 17, 2002 at 2:40 pm #75554

Robert Butler, Participant (@rbutler)

I guess it depends on what you have in mind for investigation. If you are looking to apply Six Sigma to proficiency testing I would have to recommend that you look elsewhere. I’ve done a lot of analysis of the proficiency data for my local school district and most of the factors that one has to address in order to use Six Sigma in that environment are completely out of your control. For instance, in order to do any kind of a gauge R&R on the sensitivity of the state proficiency test you would have to be able to have some control over what the state does to the test on a year-by-year basis. For my state, my analysis shows beyond a shadow of a doubt that the reading level of the tests is so high that while my state thinks they are testing math, science, citizenship, reading, and writing they are actually just testing reading four times and writing once.

While question content varies from year to year the high reading bar of the tests has not changed. Thus the tests are of little value when it comes to assessing the quality of the math, science, and citizenship programs.

May 16, 2002 at 4:06 pm #75518

Robert Butler, Participant (@rbutler)

We have paired data from two treatments A (before) and B (after). The null hypothesis is that nothing happened. We measure the tube diameter using a set of fixed pins and determine measurements in one of a couple of ways.

1. specific pin fits tube #1 before but does or does not after treatment

2. specific pin fits tube #1 before, doesn’t fit after but either a smaller or a larger pin will now fit

In the first instance we have three possibilities and the differences between before and after will be either 0 – same pin fit, 1 -larger pin fit, or -1 smaller pin fit.

In the second case we will measure the diameters by checking first to see if the same pin will fit before and after giving a 0 difference. If the same pin doesn’t fit we will find the closest larger or smaller pin that does fit and take the difference between the original diameter measurement and the new, closest fit.

If there has been no effect the null hypothesis that we are checking is that the median of the differences is zero.

Take the differences and assign a value of Z as follows:

Z = 1 if the difference is greater than zero

Z = 0 if the difference is less than zero.

The original distribution is continuous and the distribution of the differences will also be continuous. Since the differences are independent the Z values are also independent so we have a binomial situation of making n independent trials in which the probability that Z = 1 is 1/2 on each trial.

The probability of a tie (i.e. a difference of 0) is assumed zero. Since ties do occur in practice, those differences which are 0 are excluded from the analysis and the sample size for the test is reduced by the number of zero differences in the data set.

For small sample sizes the probability of getting at least m positive differences when the median of the differences is 0 is given by

(1/2)**n * Sum (n|x), where the sum over x runs from 0 to n-m,

n = number of non zero differences, m = number of positive differences, and (n|x) = n!/(x!(n-x)!) is the binomial coefficient.

For larger sets we can use the normal approximation to the binomial which is

U(1-p) = (m-.5-n*.5)/sqrt(n*.5*(1-.5))

Using the normal lookup table you can find the value for 1-p, and p, the probability of a result this extreme under a zero median, then follows directly.

Example:

We have 16 differences between paired samples as follows:

.3, -1.7, 6.3, 1.6, 3.7,-1.8, 2.8, .6, 5.8, 4.5, -1.4, 1.9, 1.7, 2.4, 2.3, 6.8

therefore Z=1 for 13 of these differences and Z=0 for three of them

First way = (1/2)**16 Sum from 0 to 16-13 of (16|x)

= (1/65,536)*(16!/0!16! +16!/1!15! +16!/2!14! +16!/3!13!) =.01064

or

U(1-p) = (13-.5-16*.5)/sqrt(16*.5*(1-.5)) = 2.25

thus 1-p = .9878 and p = .0122
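The two calculations in the example can be reproduced with a short script. This is a sketch using only the Python standard library; the variable names are my own, and the data are the 16 differences listed above.

```python
from math import comb, sqrt
from statistics import NormalDist

diffs = [.3, -1.7, 6.3, 1.6, 3.7, -1.8, 2.8, .6, 5.8, 4.5,
         -1.4, 1.9, 1.7, 2.4, 2.3, 6.8]

nonzero = [d for d in diffs if d != 0]      # ties are dropped from the analysis
n = len(nonzero)                            # number of non zero differences
m = sum(1 for d in nonzero if d > 0)        # number of differences with Z = 1

# Exact binomial tail: (1/2)**n * sum of (n|x) for x = 0 .. n-m
p_exact = sum(comb(n, x) for x in range(n - m + 1)) / 2 ** n

# Normal approximation with the .5 continuity correction
u = (m - 0.5 - n * 0.5) / sqrt(n * 0.5 * (1 - 0.5))
p_normal = 1 - NormalDist().cdf(u)

print(n, m)
print(round(p_exact, 5))
print(u, round(p_normal, 4))
```

With only 16 pairs the exact sum is cheap to compute, so the normal approximation is mainly useful as a cross-check.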

Depending on the book this is called the sign test, discrete scales, or randomization test. Doing a bean count of my reference books, sign test seems to be the most common label.

May 9, 2002 at 5:59 pm #75340

Robert Butler, Participant (@rbutler)

I’ve built and analyzed a number of DOEs where the response variable was variance. The simplest approach is to replicate the entire design and compute the variance of the two samples for each experimental condition. Granted, variability based on two samples is not what one would normally want when estimating variance as a Y response but this approach does work. My approach has been to use saturated designs in order to screen as many variables as possible and to replicate the design 3 times. Three times guards against the loss of a single run and thus against the loss of an experimental data point for the analysis of variables impacting the variance. Of course, with such an approach you can also analyze for variables impacting mean shift. If 3 reps is too many you can get by with 2 but if one of the horses dies you will need to run regression diagnostics on the leftover design in order to determine what X variables you can still use for an analysis of the variance.
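A minimal sketch of the bookkeeping, assuming three replicates per run: each experimental condition gets one mean and one variance, and each of those becomes a response column for the usual DOE analysis. The run labels and numbers below are hypothetical.

```python
from statistics import mean, variance

# Hypothetical replicated design: run label -> responses from the 3 replicates
replicates = {
    "run1": [10.0, 12.0, 11.0],
    "run2": [20.0, 20.5, 19.5],
    "run3": [15.0, 18.0, 12.0],
}

# One (mean, variance) pair per experimental condition; each is a separate Y
summary = {run: (mean(ys), variance(ys)) for run, ys in replicates.items()}

for run, (ybar, s2) in summary.items():
    print(run, ybar, s2)
```

If a replicate is lost, the variance for that run is computed from whatever readings remain, which is why a third replicate gives some protection.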

May 9, 2002 at 3:33 pm #75330

Robert Butler, Participant (@rbutler)

That was the best they could do????

I’m sure this group could do better. For starters how about…

Should I Guess My Answer

Shouted Insults Get My Attention

SIGnificant Mistakes Avoided

...the possibilities are endless.

May 8, 2002 at 1:39 pm #75261

Robert Butler, Participant (@rbutler)

If I understand your problem correctly you have a series of pins of given diameters and you test for pin fit to the inner diameter before and after annealing. If this is the case you have paired samples with a measurement system that can be viewed as a discrete scale with a limited range of values. In this case the way to test the mean difference between the two groups is to analyze the data using a t test that includes a correction for continuity.

If you have say 15 tubes measured before and after annealing you would take the differences between the measurements and sum them. The null hypothesis between non-annealed and annealed is that the signs on the differences are equally likely to be + or -.

Another way to check for significant differences with this kind of data would be Fisher’s randomization test. The methods for setting up your data and analyzing it using either of the above techniques can be found on pp. 146 of the Seventh Edition of Statistical Methods by Snedecor and Cochran.

May 7, 2002 at 4:28 pm #75230

Robert Butler, Participant (@rbutler)

The ASQ book Measuring Customer Satisfaction-Survey Design, Use, and Statistical Analysis Methods, by Hayes is the best book I have read on the subject. As for building your own survey I would recommend, after you have read the book, using Infopoll. They are a web-based survey group and you can build your survey and have them host it. I’ve used them for our company customer satisfaction survey. I plan to use them again next year when we do our next survey.

April 26, 2002 at 5:01 pm #74882

Robert Butler, Participant (@rbutler)

If I understand your problem correctly you have a situation with multiple gauges and multiple lots and you are concerned about gauge and lot differences. If this is the case then the easiest way to examine linearity of gauges and lots is to set up a two way ANOVA with gauges and lots being your two variables of interest. If you were to then run multiple samples from each lot (say 5 per gauge) you would have the grid illustrated below and you could use the data from this experiment to check for between and within gauge and lot variability and mean differences. In order to check for gauge linearity over some range of values you could either choose your lots for different levels or, if you can identify distinct levels within lots, you could run a three way ANOVA with levels as the third variable.

| | G1 | G2 | G3 |
|---|---|---|---|
| Lot 1 | 5 | 5 | 5 |
| Lot 2 | 5 | 5 | 5 |
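For a balanced grid like this one, the two way ANOVA sums of squares can be sketched in a few lines. This is only an illustration under the assumption of equal replicates in every cell; the function name, the data layout, and all readings are hypothetical, and a statistics package would normally do this for you.

```python
from statistics import mean

def two_way_anova(data):
    """data[lot][gauge] is a list of replicate readings (balanced design).

    Returns sums of squares and F ratios for lots, gauges, and their interaction.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean([y for lot in data for cell in lot for y in cell])
    lot_means = [mean([y for cell in lot for y in cell]) for lot in data]
    gauge_means = [mean([y for lot in data for y in lot[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in lot] for lot in data]

    ss_lot = b * n * sum((m - grand) ** 2 for m in lot_means)
    ss_gauge = a * n * sum((m - grand) ** 2 for m in gauge_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_inter = ss_cells - ss_lot - ss_gauge
    ss_error = sum((y - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for y in data[i][j])

    ms_error = ss_error / (a * b * (n - 1))   # within-cell mean square
    return {
        "ss": {"lot": ss_lot, "gauge": ss_gauge,
               "interaction": ss_inter, "error": ss_error},
        "F": {"lot": (ss_lot / (a - 1)) / ms_error,
              "gauge": (ss_gauge / (b - 1)) / ms_error,
              "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error},
    }

# Hypothetical readings: 2 lots x 3 gauges x 5 samples per cell
readings = [
    [[10.1, 10.2, 9.9, 10.0, 10.3], [10.4, 10.6, 10.5, 10.3, 10.7], [10.0, 9.8, 10.1, 10.2, 9.9]],
    [[11.0, 11.2, 10.9, 11.1, 11.3], [11.5, 11.4, 11.6, 11.3, 11.7], [11.1, 10.9, 11.0, 11.2, 10.8]],
]
result = two_way_anova(readings)
print(result["ss"])
print(result["F"])
```

Each F ratio is compared with the F table using its own numerator degrees of freedom and the within-cell degrees of freedom, 2*3*(5-1) = 24 for this grid.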

April 4, 2002 at 7:55 pm #74009

Robert Butler, Participant (@rbutler)

Ok, so now that everyone has had fun at Diane’s expense let’s see if we can’t work the problem. Based on the rather sketchy information provided, you leave one with the impression that your firm makes decisions based on averages. If this is the case then that is indeed an error. The key point to remember about an average is that while it is numeric in nature it is not a pure number bereft of meaning. An average is a descriptor and what it attempts to describe is the central tendency of a group of numbers (that is, give you some understanding of what is typical). Since you are working for an insurance company there is a very good chance that the average is not representative of typical. A better measure might be the median, the 50% point (that is, 50% of the data is less than the median and 50% is greater; think of the grassy strip between the lanes of a super highway, with half of the traffic on one side and half on the other, except, of course, during rush hour). Even just the median will not suffice to get an understanding of what your process is about.
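A small illustration of the point, using made-up completion times in which a single slow case drags the average away from what is typical:

```python
from statistics import mean, median

# Hypothetical completion times in days; one slow case skews the average
completion_days = [2, 3, 3, 4, 4, 5, 6, 40]

print(mean(completion_days))    # pulled upward by the single 40-day case
print(median(completion_days))  # half the cases fall below this, half above
```

Seven of the eight cases finish in 6 days or less, yet the average sits above all of them, while the median stays in the middle of the bulk of the data.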

It sounds like your BB is attempting to get you and others to really look at the distribution of whatever it is that you are measuring in order to gain an appreciation of what your process is doing and why averages by themselves are meaningless. One simple test for yourself would be to make a histogram of your data using Excel just to see how little information is conveyed by the average.

If your BB has access to a plotting routine that will permit him to show your process as time ordered box plots (box plots by say week or month) and if your BB is able to explain box plots to you then the two of you may just be able to start to understand each other’s point of view. In particular, he may be able to help you understand the concept of variance about the mean or the median and what that suggests to him about the overall process. If he can convey this to you then perhaps his comments concerning increased performance and reduced average completion time may begin to make some sense. If they do then perhaps you and he can get on with the business of trying to set your company straight.

March 22, 2002 at 6:45 pm #73535

Robert Butler, Participant (@rbutler)

Joe BB makes some interesting assertions concerning black belt training and his reference to Roger’s article is interesting but I think misses the point. As Roger says, a black belt is not a statistician. Unfortunately, what Roger says is not what is being heard out in the real world. The sad fact is that in way too many instances a black belt is in fact viewed as a competent statistician and regardless of his/her particular state of statistical ignorance his/her statements are taken to be statistical truth. The reason for this is, I think, because most managers have absolutely no understanding of statistical concepts nor do they have any real understanding of the effort needed to acquire and properly apply statistical training.

As for the notion of a BB absorbing more in a given period of time, this is all well and good. However, the issue is not about absorbing information, it is about correctly applying what little you have learned and, more importantly, seeking competent help in those situations where you are completely out of your depth.

Lest you decide that I’m “whining about BB’s” rest assured that I’m not. BB’s who believe that they are statistically competent after 4 weeks of training and who allow their management to treat them as such deserve more than a whine. Such people deserve censure because they are in the position of doing great damage, not only to the institutions for which they work but also to the idea that statistical analysis is of value in the industrial setting.

March 21, 2002 at 9:16 pm #73496

Robert Butler, Participant (@rbutler)

Andy Schlotter is correct. The issue of variation reduction has been a central tenet of statistical analysis from the beginning. As a formal area of study statistics is over 200 years old. If you really want to read something on the origins of variation reduction and indeed on the origins of most of the major issues of statistical focus, I’d recommend “The History of Statistics before 1900”.

March 19, 2002 at 1:34 pm #73383

Robert Butler, Participant (@rbutler)

You can use anything you want for a response in a DOE. The issue of normality only comes into play when you are developing your regression equations and wish to test for levels of significance of the factors. Then, the issue of normality focuses not on the responses or on the independent factors but on the normality of the residuals.

If the Y’s are not independent of one another this will be reflected in the terms entering your regression models. You will discover (assuming that your X’s are indeed independent of each other) that Y’s that are not independent will tend to have the same terms in their correlation equations and the magnitudes and signs of the respective coefficients of the X’s (given that these have been normalized to a -1,1 range) will be similar.
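To make the two ideas concrete, here is a minimal sketch: the X settings are coded to the -1,1 range, a straight line is fitted to each response, and the residuals (the quantities whose normality matters) are returned alongside the coefficients so the fits for two responses can be compared. The function names, the single-factor straight-line model, and the data are all assumptions for illustration.

```python
def code_to_unit_range(xs):
    """Scale factor settings linearly so the low level is -1 and the high level is +1."""
    lo, hi = min(xs), max(xs)
    mid, half = (hi + lo) / 2, (hi - lo) / 2
    return [(x - mid) / half for x in xs]

def line_fit_residuals(xs, ys):
    """Least-squares straight line y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1, [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical temperature settings and two responses measured on the same runs
temps = [100, 150, 200, 100, 150, 200]
x = code_to_unit_range(temps)
y1 = [5.1, 7.0, 9.2, 4.9, 7.1, 8.8]
y2 = [10.3, 14.1, 18.2, 9.8, 13.9, 17.9]

for y in (y1, y2):
    b0, b1, resid = line_fit_residuals(x, y)
    # The residuals, not the raw responses, are what a normality check looks at.
    print(round(b0, 3), round(b1, 3), [round(r, 3) for r in resid])
```

Because the X values are coded, the slopes for the two responses are on the same footing and their signs and relative sizes can be compared directly.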

For a discussion of these issues I would recommend pages 22-33 of Applied Regression Analysis Second Edition by Box and Draper.

March 11, 2002 at 1:46 pm #73076

Robert Butler, Participant (@rbutler)

There are a number of good points that have been brought up in this discussion thread. However, based on the description of the process, I would recommend that you first check the existing subgroup averages for auto-correlation. We have multispindle machines here and we had the x-bar R charts in place and our operators were following all of the good practices that one would expect of an SPC effort. The problem was that, based on our charting, our process wasn’t that good and we were continually reacting to “out of control” signals from the charts. An examination of the subgroup averages indicated that they were not independent measurements and consequently our control limits were not reflective of the actual process capability; in short, they were too narrow. After a proper assessment of the auto-correlation we found that in order to ensure that the numbers we were plotting indeed met the criteria of independent measurements (and thus that we had meaningful x-bar R charts) we had to build the charts using every 4th subgroup average and range.
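A rough sketch of that kind of check: compute the sample autocorrelation of the time-ordered subgroup averages at the first few lags and, if the early lags are clearly correlated, chart only every k-th subgroup. The function name and the data are hypothetical, and deciding which lags count as significant is left to the analyst.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a time-ordered series at the given lag."""
    n = len(values)
    avg = sum(values) / n
    denom = sum((v - avg) ** 2 for v in values)
    numer = sum((values[i] - avg) * (values[i + lag] - avg) for i in range(n - lag))
    return numer / denom

# Hypothetical subgroup averages in time order
xbar = [10.1, 10.3, 10.4, 10.2, 9.9, 9.7, 9.8, 10.0, 10.3, 10.5, 10.4, 10.1]

for lag in range(1, 5):
    print(lag, round(autocorrelation(xbar, lag), 3))

# Keeping every 4th subgroup, as in the post, if lags 1 to 3 turn out to be correlated
thinned = xbar[::4]
print(thinned)
```

In practice far more than a dozen subgroups are needed before the lag estimates mean much; the short list here only shows the mechanics.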

March 8, 2002 at 11:56 pm #73019

Robert Butler, Participant (@rbutler)

Hmmmmm, based on your description I would assume that you are supposed to compute the t statistic for a paired experiment. If this is the case then the expression for standard deviation for the paired differences would be as follows:

Sum of differences = 208

Sum of squared differences = 3910

Number of pairs = 20

Average difference = 208/20 = 10.4

Thus

Square Root( (3910 - (208*208/20))/19 ) = 9.59 = standard deviation of the differences

Thus the estimate of the standard deviation of the sample mean difference would be

9.59/Square Root(20) = 2.14

To test to see if the average difference 10.4 is significantly different from zero the t statistic would be 10.4/2.14 = 4.85 which for 19 degrees of freedom is significant. Thus the average difference is significantly different from zero for that particular standard deviation.
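The same arithmetic as a short script, working only from the three summary figures listed (sum of differences, sum of squared differences, number of pairs). Note that it computes the mean from the listed sum and number of pairs, 208/20 = 10.4; the variable names are my own.

```python
from math import sqrt

n = 20            # number of pairs
sum_d = 208       # sum of the differences
sum_d2 = 3910     # sum of the squared differences

mean_d = sum_d / n
sd_d = sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))   # standard deviation of the differences
se_mean = sd_d / sqrt(n)                            # standard deviation of the mean difference
t_stat = mean_d / se_mean                           # compare with the t table at n - 1 df

print(round(mean_d, 1), round(sd_d, 2), round(se_mean, 2), round(t_stat, 2))
```

The t value is then compared with the tabled t for 19 degrees of freedom at whatever significance level is required.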

If you need to read some more on differences from paired experiments you might want to check pp.84-85 of the Seventh Edition of Statistical Methods by Snedecor and Cochran.

Hope this is of some help.
