# Robert Butler

## Forum Replies Created

Viewing 32 posts - 2,401 through 2,432 (of 2,432 total)
• #77034 Robert Butler
Participant

I think you will have to be more specific and explain what you consider to be a “correct representation” of your population. If this is a question concerning the representation of the population mean, and if you have checked to make sure that the population is normal, then you could recast the question in the following manner:
Given that the population is normal, what kind of a sample should I take in order to be 95% certain that the allowable error in the sample mean is L?
For a normal population the confidence limits of the mean are
mean +- 2S/sqrt(n)
Thus  if you put L = 2S/sqrt(n)  you can compute the sample size (n) that will give you this allowable error.
If, on the other hand, you are interested in characterizing the standard error for simple random sampling then the finite population correction for the population standard deviation will be
[S/sqrt(n)]*sqrt(1-phi) where phi = the sampling fraction = n/N.
An examination of these equations illustrates the point that the standard error of the mean depends mainly on the size of the sample and only to a minor extent on the fraction of the population sampled.
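The two calculations above can be sketched in a few lines; this is a hypothetical illustration, assuming z is taken as 2 for roughly 95% confidence and with made-up values for S, L, and N:

```python
import math

def sample_size_for_error(s, L, z=2.0):
    """Solve L = z*S/sqrt(n) for the sample size n, rounding up."""
    return math.ceil((z * s / L) ** 2)

def standard_error_fpc(s, n, N):
    """Standard error of the mean with the finite population correction."""
    phi = n / N  # sampling fraction
    return (s / math.sqrt(n)) * math.sqrt(1 - phi)

# Hypothetical numbers: S = 10, allowable error L = 2 -> n = (2*10/2)^2 = 100
n = sample_size_for_error(10, 2)
se_small_frac = standard_error_fpc(10, 100, 100000)  # tiny sampling fraction
se_large_frac = standard_error_fpc(10, 100, 200)     # half the population sampled
```

The two standard errors differ only through the factor sqrt(1 - phi), which illustrates the closing point: for small sampling fractions the correction is negligible, and the sample size n does nearly all the work.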

#77027 Robert Butler
Participant

What you are asking for is roughly the first two years of a 4 year undergraduate course in statistics.  There are books that will zip through the list you have posted but I doubt that they will be of much value to you.  I can’t offer a single text, but I would suggest the following:
Basic statistics and hypothesis testing:
A Cartoon Guide to Statistics – by way of introduction
Statistical Theory and Methodology in Science and Engineering-Brownlee
Chapter 1 Mathematical Ideas, Chapter 2-Statistical Ideas – has great graphical depiction of the concept of critical areas,  Chapter 6- Control Charts, Chapters 8-10 and 14.
Applied Regression Analysis -2nd Edition – Draper and Smith Chapters 1-4, with particular emphasis on Chapter 3 -The Examination of Residuals, Chapter 8 -Multiple Regression, and Chapter 9 – Multiple Regression Applied to ANOVA
Statistics for Experimenters – Box, Hunter, and Hunter – the entire book.
For backup, have copies of:
Quality Control – Duncan
Regression Analysis by Example – Chatterjee and Price
Statistical Methods – Snedecor and Cochran

#76893 Robert Butler
Participant

The design your BB is proposing – a half fraction of a 2^4, one complete replicate of the entire fractional factorial, and 25 samples per condition, for a total of 16 runs and 400 samples – would permit an assessment of the effects of process changes on within-run variability as well as an assessment of the impact of factors on process variability.  The design you are proposing will permit an assessment of the impact of process factors on process variability.
Given what you have written, it sounds like your BB is confusing within and between run variability.  If within run variability is indeed of concern then as long as you understand that you will have to compute within run variation and run-to-run variation for each experimental condition and model the two types of variation independently you should have no problem. I’ve built and analyzed a number of designs over the years that focused on the issue of variables impacting process variability but I’ve never had to look at within experiment variation.
For assessing variables impacting process variability, the approach that I have used is to take the resultant design, add one of the design points to that design (for a design replicate) and then replicate the entire design including the replicate point.  Thus for each design point you will have a two point estimate of the variability associated with that particular experimental condition and you will have a two point estimate for the variability of the replicate point as well.
If you run a stepwise regression against these computed variabilities you can develop a model describing the process variability as a function of process variables.  You can also use the same data to identify those variables impacting the process mean by running your analysis in the usual manner.
Since, with this approach, you only have a two point estimate for the variation at each design point you should focus on big hitters first and worry about interactions later.  Both of your designs will only give two point estimates of process variability associated with each design point. A possible compromise between you and your BB would be to take your full factorial and select those experimental conditions corresponding to the half replicate.  Randomize this half fraction and run them and their full replicate first.  You will have to include a 9th data point from the fraction for purposes of replication of the process variation.  Analyze the data from this and then make a decision as to whether or not you want to continue with the other half of the experiment.
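The replicate-and-model-the-variance approach described above can be sketched as follows. Everything here is hypothetical – the 2^3 design, the simulated noise structure, and the choice of X3 as the factor driving the spread are all assumptions for illustration:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# 2^3 full factorial in coded units; the whole design is run twice (a full replicate)
design = np.array(list(product([-1, 1], repeat=3)), dtype=float)

def run_process(x):
    # Hypothetical process: X1 shifts the mean, X3 drives the run-to-run spread
    sigma = 1.0 + 0.6 * x[2]
    return 50 + 5 * x[0] + rng.normal(0.0, sigma)

rep1 = np.array([run_process(x) for x in design])
rep2 = np.array([run_process(x) for x in design])

# Two-point variance estimate at each experimental condition
point_var = np.var(np.column_stack([rep1, rep2]), axis=1, ddof=1)

# Model log-variance as a function of the factors (the log keeps predictions positive)
X = np.column_stack([np.ones(len(design)), design])
coef, *_ = np.linalg.lstsq(X, np.log(point_var), rcond=None)
```

The same rep1/rep2 data can be averaged and regressed in the usual manner to identify the variables moving the process mean, as the post notes.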

#76878 Robert Butler
Participant

I was re-reading all of the posts to this thread last night and while each post is excellent advice I think that all of us are at risk of misleading Marty because several of us (myself included) have used the same term to mean different things.  This becomes apparent when I re-read Marty’s thank you to all of us.
Given the complexity of the discussion I would first echo Dave’s advice to another poster on a similar topic – get a copy of Box Hunter and Hunter’s book Statistics for Experimenters.
I would like to address what I think is a key miscommunication among all of us (if I am in error in my understanding of the previous posts please accept my apologies in advance).
Replication vs. Duplication
Central to the discussion was the issue of experimental replication.  A genuine replicate of an experimental design point requires the experimenter to COMPLETELY rerun the experimental condition.  This means that you have to start all over and run the experiment again.  Thus, if you are going to replicate an entire design you will have to run double the number of experiments.  While, as Dave noted, this will drastically increase your power this can also be very costly.
The compromise that is often used is to run either a replicated center point (assuming that it is possible to build a center point in the design) or to replicate one or two of the design points in the design.  While you will not be able to detect as small a difference as you may wish, you will still find that you are able to find significant effects if they are indeed present.
A duplicate is a repeat measure on the same experimental condition.  For example, if I am measuring output viscosity of a process and for a single experimental condition I take repeated measurements on the viscosity of that condition every minute for 15 minutes I am taking a duplicate measurement. Multiple grab samples from the output of a machine for a given experimental condition also constitutes duplicate measurements. If I try to treat the results of these duplicate measurements as replicates what I will do is substitute analytical variance for run-to-run variance.  In general, analytical variance is much smaller than run-to-run and the computer program will use the analytical variance to determine the significance of the effects.  The end result will be that a number of effects will test significant when they really aren’t.
It is possible to use duplicate measurements in your analysis.  The field is called repeated measures analysis and you will need the services of a highly trained statistician in order to have any hope of doing it.
If you can get the Box, Hunter, Hunter book check section 10.6 – calculation of standard errors for effects using replicated runs – for further discussion of the difference between duplicate and replicate.  You might also want to read section 10.8 which discusses the ways of getting an estimate of error if no replication can be performed.
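A quick simulation, with made-up numbers for the two variance components, shows why substituting duplicates for replicates understates the error term:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed variance components for illustration
run_to_run_sd = 3.0   # true replicate (run-to-run) variation
analytical_sd = 0.3   # measurement-only variation within a single run

# 20 independent runs of one condition, 15 duplicate readings within each run
run_means = rng.normal(100.0, run_to_run_sd, size=20)
readings = run_means[:, None] + rng.normal(0.0, analytical_sd, size=(20, 15))

# Error estimate from genuine replicates: one reading taken from each run
replicate_sd = np.std(readings[:, 0], ddof=1)

# Error estimate if the duplicates within one run are treated as replicates
duplicate_sd = np.std(readings[0, :], ddof=1)
```

With these assumed components the duplicate-based estimate comes out far smaller than the replicate-based one, so effects tested against it will look much more significant than they really are – exactly the failure mode described above.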

#76852 Robert Butler
Participant

If it is not too difficult to take multiple samples for each experimental condition it is worth the effort, if for no other reason than team comfort.  If you take the time to do this then you should do the following:
1. Label each sample to indicate time order.
2. Choose the first sample from each group of samples and perform the planned set of measurements.
3. Keep the other samples in reserve.
4. If any of the measured results for any particular experiment are “surprising” pull the additional samples and measure them for confirmation.  If the additional samples confirm the initial measurement, put them aside and keep your original measurement.  If the duplicates (note these are NOT replicate measurements because they constitute multiple samples from the same experimental run) do not confirm the initial results you will have to investigate to determine which measurement is correct.
5. Run your analysis with a single measurement for each independent experimental run from your DOE.
I wouldn’t recommend averaging anything.  You can hide the world behind an average and never see it.  You also do not want to include all of your duplicate measurements in your analysis.  The reason for this is that your software will interpret these duplicates as genuine replicates and you will wind up with an error estimate based on duplicate, not replicate, variability.  Duplicate variability will be much smaller than replicate variability and the end result will be an analysis that indicates significant terms where none really exist.
If questions concerning such things as trending over time should arise you can take advantage of your stored samples and do such things as analyze the last sample in each run and then rerun your DOE analysis to see if the model terms change or if there is a significant shift in the coefficients of the original model.

#76787 Robert Butler
Participant

It appears that you are using the term interaction in two different ways.  What makes it interesting is that both, by themselves, are correct.  Let’s try the following:
Two factors  X1 and X2
X1 low = 200C, X1 high = 300C
X2 low = 4, X2 high = 6
Experimental combinations for two levels, no reps or center points would be
experiment    X1    X2    X1X2
(1)           -1    -1      1
a              1    -1     -1
b             -1     1     -1
ab             1     1      1
The COLUMN corresponding to the interaction of X1 and X2 is derived by multiplying together the columns for X1 and X2.  If you look at the result for
X1X2 for each experiment you see, exactly as you described, an “interaction” for each combination.  When it comes to running a regression and getting a model of the form :
Y = a0 +a1*X1 +a2*X2 +a3*X1*X2
you will have the second situation you described, namely that when you plug in the low, low and the high high combinations you will get, for the interaction term, the same value of 1.  It is also true that when you plug in low, high and high, low you will also get the same value which, for these combinations, will be -1.
Thus, as you observed, the interaction term in the regression equation will treat the above listed combinations in the same manner.  The differences in these combinations, from the standpoint of the regression and the response, will make themselves apparent in the linear terms for X1 and X2.  If you should have a regression situation where the only thing that is significant is the interaction, your graph of the response vs X1 and X2 will be a large X.
If this last situation arises, your analysis is telling you that your process can have the same output for two different combinations of your X’s.  This could be a good or a bad thing.  For example, if you have been running X1 high and X2 low and it would be much cheaper to run X1 low and X2 high your analysis would tell you that, at least for that one response, you could save money just by reversing the levels of X1 and X2.
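The design table and the “large X” case can be verified numerically. The response values below are fabricated so that only the interaction is active:

```python
import numpy as np

# The 2^2 design from the table; the interaction column is the product X1*X2
X1 = np.array([-1,  1, -1,  1])
X2 = np.array([-1, -1,  1,  1])
X1X2 = X1 * X2  # -> [ 1, -1, -1,  1]

# Fabricated response in which only the interaction matters
Y = 10.0 * X1X2

# Fit Y = a0 + a1*X1 + a2*X2 + a3*X1*X2 by least squares
A = np.column_stack([np.ones(4), X1, X2, X1X2])
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)
# low/low and high/high predict one value; low/high and high/low the other
```

The fitted linear coefficients come out at zero and only a3 survives, which is precisely the situation where the response surface forms the large X described above.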

#76706 Robert Butler
Participant

Your measured response can be anything you wish. A go/no go is just a binary response.  You could check Analysis of Binary Data by Cox and Snell.  In particular you should look at sections 2.6 to 2.8 which discuss multiple regression and factorial arrangements.

#76529 Robert Butler
Participant

As stated, you seem to be asking two different questions.  If I read your question one way it appears that you are asking for a comparison of the effects of length, material, and peak force. If you are interested in just knowing if there is a difference between materials and length and peak force you can set up a three way ANOVA with these variables in the following manner:
             PTFE    Ultem    HDPE
Long           X       X        X
Standard       X       X        X
Short          X       X        X
and then run this matrix for each of the peak forces (sorry, this format does not allow me to draw the matrix as a 3 dimensional box)
The X’s represent your choice of the number of samples per treatment combination.  Since many programs cannot handle unbalanced designs you will probably have to make sure that you have the same number of measurements per treatment combination.  If you do all of the usual checks for variance equality you can use the Scheffe method to check all of the means against each other.
This will answer the question concerning the mean differences connected with material, length, and peak force.
If peak force is the measured response then the problem reduces to a two way ANOVA and everything I mentioned above still applies except that you now have one less dimension to the problem.
ANOVA will only give you an understanding of which means are different from the others.  In order to say, as you wrote, “PTFE produces the same peak-force as HDPE, hence we don’t have to spend more money buying PTFE, etc.” you will need to take the same data and run it through a regression.
Since length and peak force would appear to be continuous variables, you can code these in the usual way for doing a regression.  For the materials you will have to use dummy variables.
Code in the following manner:
if PTFE     v1 = 1, v2 = 0
if Ultem   v1 = 0, v2 = 1
if HDPE   v1 = 0, v2 = 0
Run a regression with coded variables for peak force, length and v1 and v2 (of course, as above, if  peak force is the response then just use length and v1 and v2).  The resulting equation will permit an assessment of equivalence of effects.
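A minimal sketch of the dummy-variable regression, using fabricated peak-force data and the coding scheme above (HDPE as the reference level with v1 = v2 = 0):

```python
import numpy as np

# Fabricated data: (material, coded length, peak force)
rows = [
    ("PTFE", -1, 12.1), ("PTFE", 0, 13.0), ("PTFE", 1, 14.2),
    ("Ultem", -1, 11.8), ("Ultem", 0, 12.7), ("Ultem", 1, 13.9),
    ("HDPE", -1, 12.0), ("HDPE", 0, 12.9), ("HDPE", 1, 14.1),
]

# The coding from the post: HDPE is the reference level
def dummies(material):
    return {"PTFE": (1, 0), "Ultem": (0, 1), "HDPE": (0, 0)}[material]

X = np.array([[1.0, length, *dummies(m)] for m, length, _ in rows])
y = np.array([force for _, _, force in rows])

# coefs = [intercept, length effect, PTFE-vs-HDPE shift, Ultem-vs-HDPE shift]
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
```

A v1 or v2 coefficient near zero says that material behaves like HDPE for this response, which is exactly the kind of equivalence statement the post describes.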

#76460 Robert Butler
Participant

Two factors at 6 levels will be a 6**2 experiment for a total of 36 experiments.  I’m not aware of any package that will do this for you.  The easiest way to set this up is to take a piece of graph paper and simply plot out the 36 points that would be part of the 6×6 matrix.  Before doing any of this however, I would recommend asking some hard questions concerning the need for 6 levels.  In  the vast majority of cases this is definitely overkill.  The operating philosophy behind DOE is that if change is going to be observed it will best be seen by contrasting extremes-hence the focus on 2 and 3 level designs.
If your circumstances are such that you will not be permitted to consider fewer than 6 levels per factor, I’d recommend arguing for a 3 level “screening” design over the same region.  This would give you a 9 point design and with a couple of replicates you would have 11 experiments which would permit a check of all interactions and all linear and curvilinear effects.  It is true that with such a design only the corner points would correspond exactly to points from a 6 level design but I’d have a hard time believing that the small difference between the other points of a three level design and those of a 6 level design would make that much difference. Thus, if there was still some doubt you could use the 3 level design as a starting point and then fill in other areas of the design with points from the 6 level design.  Since you could use your regression equation to predict the responses at the levels of the 6 level design, the additional design points would act as confirmation runs for the findings from your initial effort.
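Generating the two candidate grids makes the trade-off concrete; the coded levels below assume both designs are spread evenly over the same region:

```python
from itertools import product

# 3-level "screening" grid and 6-level grid over the same coded region
three_level = list(product([-1, 0, 1], repeat=2))                    # 9 runs
six_level = list(product([-1, -0.6, -0.2, 0.2, 0.6, 1], repeat=2))   # 36 runs

# Only the four corner points are common to both grids
corners = [p for p in three_level if p in six_level]
```

This shows the point made above: the 9-run design shares only its corners with the 36-run grid, and the remaining 6-level points could be held back as confirmation runs for the fitted model.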

#76348 Robert Butler
Participant

Thanks David, that gives me a better understanding of what you are trying to do.  Over lunch I sat down and re-read Chapter 10 of Cochran and Cox and based on your description of the Multiple Subjective Evaluation Technique it appears that it is wedded to the Lattice Square protocol.  This would explain why you had to have 4 dummy samples in order to get up to 9 treatments with 3 samples each.  Going by the book, this would also mean that an increase to 5 samples would violate that protocol and, at least using Lattice Squares, I can’t see a way around this.  However, there are other possibilities – more on this in a minute.
To continue with this thought, a full factorial with 16 treatments would require 4 samples per treatment and, again based on the book, it would appear that your choices of designs and levels are dictated by the Lattice Square requirements.
The main problem I’m having is trying to understand why one would use a Lattice Square to set up a rating plan.  I can understand using such a plan to guarantee randomization with respect to raters and samples but if one is going to use a DOE this usually means that one is interested in expressing a given rating as a function of process variables.  I can’t offer any more on this line but I am curious and I’ll have to look into this some more.
If you are interested in expressing a rating as a function of process variables you can run a regression on the discrete Y’s.  Many Six Sigma courses take a very conservative approach to regression and state that this is incorrect; this is just to make sure that you don’t make too many mistakes when you first use regression methods. If your attribute data is in rank form – for example 1-5 best to worst, a rating on a scale from 1-100, or “a bunch of defects, not so many defects, a few defects” – you can use the numbers directly or assign meaningful rank numbers to the verbal scores and run your regression on these responses.  If you do this, you can take advantage of more sample ratings (which is what I’m assuming you want to do when you asked about increasing the samples from 3 to 5) without having to worry about the restrictions of Lattice Squares.
There are a number of things that you should keep in mind if you try this:
1. As with any attribute protocol, all of your raters of the attribute in question must be trained by the same person, with the same materials, so that their ratings will be consistent.
2. The discrete nature of the Y’s will probably mask interaction effects so if you are trying to fine tune a process, as opposed to looking for major factors impacting the process, there is a chance that your success will be limited.
3. Your residual analysis, particularly your plots of residuals vs. predicted values, is going to look odd.  The plots will consist of a series of parallel lines, each exhibiting a slope of -1.
4. The final model may indeed be able to discriminate between one rating and the next but a much more probable outcome will be the ability to discriminate only between good, neutral, and bad, or even just good and bad.
An example of running a regression on the Y response of “How you rate your supervisor” can be found in Chapter 3 of Regression Analysis by Example by Chatterjee and Price and if you want to check up on the residual patterns you can read Parallel Lines in Residual Plots by Searle in the American Statistician for August 1988.

#76341 Robert Butler
Participant

I may be wrong but I think you will have to provide more detail before anyone can offer any constructive thoughts.  Lattice squares are pretty restrictive with respect to treatments, replicates, and such.  Based on what I know of them 8 treatments are not permitted.  The book you reference, Cochran & Cox, indicates that “the number of treatments must be an exact square”.  In the edition I use they further list the useful plans that they knew of at the time of printing.  These consisted of designs for 9, 16, 25, 49, 64, and 81 treatments.  Furthermore, this type of design, like the Latin Square, is used primarily in those cases where the treatment cannot be viewed as a continuous variable – for example 4 different tire types and 4 different driving styles.  If you could provide some more details concerning the experimental effort you are attempting I, for one, would be willing to try to offer some suggestions.

0
#76246 Robert Butler
Participant

jay,
That is correct.  A well controlled X variable will typically not appear as significant when running a regression on historical data.  Given your interest in the subject, Dave’s comment, which was seconded by others, is a good one.  The Box, Hunter, Hunter book is a very readable statistics text and it covers many of the issues that you will face in your efforts.

#76142 Robert Butler
Participant

You can certainly use regression analysis to look at historical data and the results may help guide you in your thinking but there are a number of caveats that you need to keep in mind.
1. There is a very high probability that your analysis will show variables that you know to be important to your process are not significant.  The reason this will occur is because these are the process variables that you control.  The fact that they don’t appear merely suggests that you have done an excellent job of controlling them.  Because of this control, any interactions with these variables will also appear to be not significant.
2. You will have to do a great deal of data preparation.  In particular you will have to perform a full blown regression analysis of your X’s – eigenvector analysis, VIF, etc. What you cannot do, unless you plan on going wrong with great assurance, is to take your X’s and just plug them into a simple correlation matrix.
3. Data consistency: When recording production data people do not record everything that is done to the process.  Consequently, many changes are made to variables that are not part of the record keeping process.  If you have enough data and enough X’s to play with there is a good chance that, just by dumb luck, some of these unrecorded changes will correlate with the X’s that were tracked.  The end result will be correlation with no hope of identifying the underlying cause.
Your initial attempt to ask for suggestions concerning possible factors is a good one.  You mentioned that this was a mistake.  I guess I’d like to know why this was so.  Too many X’s? Too few?  If it was too many I’d recommend brainstorming with a wider audience and then a secret ballot of all of the proposals with everyone rating the list from most important to least important.  If the people filling out the ballots are the people who know the process you should wind up with a pretty decent set of X’s to investigate.  Also, remember that for a first look you are going after big hitters.  Don’t sweat the interactions – use saturated designs: 15 variables in 16 experiments plus one additional experiment for an error estimate, 31 variables in 32 experiments, etc.
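A saturated two-level design for 15 variables in 16 experiments can be built from a Hadamard matrix; the Sylvester construction below is one standard way to generate such a design, not necessarily the one the post had in mind:

```python
import numpy as np

# Sylvester construction: double a 1x1 Hadamard matrix four times to get 16 x 16
H = np.array([[1]])
for _ in range(4):
    H = np.block([[H, H], [H, -H]])

# Drop the constant first column: 15 two-level factors in 16 runs
factors = H[:, 1:]

# The columns are mutually orthogonal, so all 15 main effects are estimable
gram = factors.T @ factors
```

Each row of `factors` is one experiment, each column the +1/-1 setting of one variable; the diagonal Gram matrix confirms the orthogonality that makes the design saturated but still analyzable.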

#76024 Robert Butler
Participant

From your description you have a design that will permit a check of main effects as well as interactions.  If you have no replicates and you want to check for significance of your effects the usual practice is to use the mean squares of your highest order interaction in place of the mean squares for the error.  Thus the interaction mean squares will become your measurement of noise against which all other mean squares are compared.  I’m not familiar with Minitab but since this approach is standard statistical practice it is very likely that this is what Minitab is doing.
As a quick check plug the following into your program
Expt#1  A = -1, B = -1, response = 210
Expt#2  A =  1, B = -1, response = 240
Expt#3  A = -1, B =  1, response = 180
Expt#4  A =  1, B =  1, response = 200
you should get MS for A = 625, MS for B = 1225, and MS for Error (the AB interaction) = 25
If you get this result then Minitab is indeed using this method.
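The quick check above can be reproduced directly; for an unreplicated 2^2 design each mean square is (contrast)^2/4 with one degree of freedom:

```python
import numpy as np

# The four runs given above
A = np.array([-1,  1, -1,  1])
B = np.array([-1, -1,  1,  1])
y = np.array([210, 240, 180, 200])

def mean_square(col, y):
    # Unreplicated 2^2: SS = (contrast)^2 / 4, and each effect has 1 df
    return (col @ y) ** 2 / 4

ms_a = mean_square(A, y)       # 625
ms_b = mean_square(B, y)       # 1225
ms_ab = mean_square(A * B, y)  # 25, the stand-in for error
```

If a package reports these three numbers for this data, it is using the highest-order-interaction-as-error convention described above.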

#75913 Robert Butler
Participant

If you are using the attribute measurements as independent variables you can set up a dummy variable matrix for these X’s and run the regression against the dummy variables.
For example if you have an independent attribute measurement for texture and the choices are smooth, not so smooth, and rough you can set up the dummy variables v1, and v2 in the following matrix:
if smooth then v1=1, v2 = 0
if not so smooth then v1=0, v2=1
if rough then v1=0, v2=0
The number of dummy variables will always be one less than the number of  categories.  A regression model built using the variables v1 and v2 will express the response as a function of the texture.
A number of six sigma courses teach that regression cannot be used when the X’s are discrete.  This is just a very conservative approach to make sure that you don’t make too many mistakes when you start using regression methods.
I would recommend that if you are going to use variables in this fashion that you do a little reading first.  A good starting point would be Regression Analysis by Example by Chatterjee and Price.  Chapter 4 is titled Qualitative Variables as Regressors and should answer questions you may have about this approach.

#75772 Robert Butler
Participant

I think you will have to give some more specifics about your problem before anyone can offer meaningful suggestions.  As stated, your question doesn’t make sense.
If you have data over time and you want to model the data using time series methods then you need to investigate the data using ARIMA techniques. You will have to address issues of stationarity, overfitting, etc.
If you have data that has been gathered over time and you are interested in trying to correlate that data with parameters known to have changed during that time you will have to check the independence of those parameters and then proceed remembering to consider all of the caveats concerning efforts involving regression and regression diagnostics.
In either case, the normality of the raw data does not matter.  Normality is an issue when investigating the residuals of your time series/regression efforts.
If you could give a clearer description of what you are trying to do I, for one, would be glad to try to offer some suggestions.

#75700 Robert Butler
Participant

Does correlation imply causation? Give an example either way.
Does causation imply correlation? Give an example either way.
If something is statistically significant does this guarantee that it matters?
If I had 151 variables to check at two levels how many one-at-a-time experiments would I have to run?
Assuming I ran one experiment a second how long would it take to finish?
What is the smallest saturated design that I could use?
Given that I ran one experiment a second, how long would it take to run the saturated design?
Given that each experiment cost \$1,000 to run – estimate the cost savings of the saturated design as opposed to the one-at-a-time approach.

#75684 Robert Butler
Participant

Actually, the point made concerning the relationship between the t and the F statistic goes much deeper.
“The ratio of a unit normal deviate to the square root of an independent chi-square with f degrees of freedom, divided by f, is the t distribution with f degrees of freedom.  The ratio chi-square(f)/f tends to 1 as f tends to infinity. Thus the t distribution with infinite degrees of freedom is identical with the standardized normal distribution.
The t distribution is related to the F distribution because if we make the degrees of freedom = 1 for the numerator of the F test we will have
F(1,f2) = t**2(f2)
For example, F.95(1,12) = 4.75 = 2.179**2 = [t.975(12)]**2”
The above is from Brownlee, 2nd Edition, pp. 289-290.
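The quoted identity is easy to confirm numerically, assuming SciPy is available:

```python
from scipy.stats import f, t

f_crit = f.ppf(0.95, 1, 12)   # F at the 95th percentile with (1, 12) df
t_crit = t.ppf(0.975, 12)     # two-sided t at 95% with 12 df

# F(1, f2) at level 1-alpha equals the square of t(f2) at level 1-alpha/2
```

The two critical values agree to machine precision, reproducing the book’s F.95(1,12) = 4.75 = [t.975(12)]^2.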

#75659 Robert Butler
Participant

As has been noted by others, the t-test is for a test of two means or averages.  The F test is for a test of two variances.  A quick check of various references indicates that you will find information on the F test listed under “F test” or “F distribution”.
The basic t-test which most packages use assumes equality of variance with respect to the two populations under consideration.  Many also assume equal sample sizes.  You need to check your variances for equality.  If they are not significantly different you can go ahead and use the default t-test available in most packages.  If they are significantly different you will have to run a t-test with unequal variances… and if the sample sizes are not the same you will have to run one with unequal sample sizes and unequal variances.  To the best of my knowledge, unequal variance t-tests are not available as point-and-click options in most packages.  You will have to do that test from first principles.
References:
F test, F distribution
Quality Control and Industrial Statistics – Duncan – listed as “F distribution, in testing ratio of two variances”
Statistical Methods -Snedecor and Cochran – listed as “F test”
Statistical Theory and Methodology – Brownlee – listed as “F distribution”
t-test unequal sample sizes, unequal variances
Statistical Theory and Methodology – Brownlee – 2nd Edition pp.297-304
There are, of course, any number of reference books you could check-I’m just including a list of some of those with which I am familiar.
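A from-first-principles sketch of the unequal-variance, unequal-sample-size t-test mentioned above; this is the Welch form, with the Welch-Satterthwaite approximation for the degrees of freedom, run here on two small made-up samples:

```python
import math

def welch_t(x, y):
    """Two-sample t statistic with unequal variances and sample sizes."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny
    t_stat = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t_stat, df

# Hypothetical samples of unequal size
t_stat, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
```

The resulting t_stat is referred to a t distribution with df degrees of freedom; note that df is generally not an integer.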

#75554 Robert Butler
Participant

I guess it depends on what you have in mind for investigation.  If you are looking to apply Six Sigma to proficiency testing I would have to recommend that you look elsewhere.  I’ve done a lot of analysis of the proficiency data for my local school district and most of the factors that one has to address in order to use Six Sigma in that environment are completely out of your control.  For instance, in order to do any kind of a gauge R&R on the sensitivity of the state proficiency test you would have to be able to have some control over what the state does to the test on a year-by-year basis.  For my state, my analysis shows beyond a shadow of a doubt that the reading level of the tests is so high that while my state thinks they are testing math, science, citizenship, reading, and writing they are actually just testing reading four times and  writing once.
While question content varies from year to year the high reading bar of the tests has not changed. Thus the tests are of little value when it comes to assessing the quality of the math, science, and citizenship programs.

#75518 Robert Butler
Participant

We have paired data from two treatments A (before) and B (after).  The null hypothesis is that nothing happened.  We measure the tube diameter using a set of fixed pins and determine measurements in one of a couple ways.
1. specific pin fits tube #1 before but does or does not after treatment
2. specific pin fits tube #1 before, doesn’t fit after but either a smaller or a larger pin will now fit
In the first instance we have three possibilities and the differences between before and after will be either 0 (same pin fits), 1 (a larger pin fits), or -1 (a smaller pin fits).
In the second case we will measure the diameters by checking first to see if the same pin will fit before and after giving a 0 difference.  If the same pin doesn’t fit we will find the closest larger or smaller pin that does fit and take the difference between the original diameter measurement and the new, closest fit.
If there has been no effect the null hypothesis that we are checking is that the median of the differences is zero.
Take the differences and assign a value of Z as follows:
Z = 1 if the difference is greater than zero
Z = 0 if the difference is less than zero.
The original distribution is continuous and the distribution of the differences will also be continuous. Since the differences are independent the Z values are also independent so we have a binomial situation of making n independent trials in which the probability of  Z is 1/2 on each trial.
For a continuous distribution the probability of a tie (i.e. a difference of exactly 0) is zero in theory.  Since ties do occur in practice, differences of 0 are excluded from the analysis and the sample size for the test is reduced by the number of ties.
For small sample sizes the probability that the median of the differences is 0 is given by
(1/2**n) * Sum of C(n,x), where the sum runs over x from 0 to n-m,
n = number of nonzero differences, m = number of positive differences, and C(n,x) = n!/(x!(n-x)!) is the binomial coefficient.
For larger sets we can use the normal approximation to the binomial, which is
U(1-p) = (m - .5 - n*.5)/sqrt(n*.5*(1-.5))
Using the normal lookup table you can find the value for 1-p, and then the probability of a zero median is computed directly.
Example:
We have 16 pairs of samples whose differences are as follows:
.3, -1.7, 6.3, 1.6, 3.7,-1.8, 2.8, .6, 5.8, 4.5, -1.4, 1.9, 1.7, 2.4, 2.3, 6.8
therefore Z = 1 for 13 of these differences and Z = 0 for three of them.
Exact calculation: (1/2**16) * Sum from x = 0 to 16-13 of C(16,x)
= (1/65,536)*(16!/(0!16!) + 16!/(1!15!) + 16!/(2!14!) + 16!/(3!13!)) = .01064
or
U(1-p) = (13 - .5 - 16*.5)/sqrt(16*.5*(1-.5)) = 2.25
thus 1-p = .9878 and p = .0122
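The two calculations above can be transcribed into a few lines of Python as a check; this is just an illustration of the arithmetic, not code from the original discussion:

```python
import math

# The 16 paired differences from the example above
diffs = [0.3, -1.7, 6.3, 1.6, 3.7, -1.8, 2.8, 0.6,
         5.8, 4.5, -1.4, 1.9, 1.7, 2.4, 2.3, 6.8]

nonzero = [d for d in diffs if d != 0]    # drop ties (zero differences)
n = len(nonzero)                          # number of nonzero differences
m = sum(1 for d in nonzero if d > 0)      # number of positive differences

# Exact small-sample calculation: (1/2**n) * sum of C(n, x) for x = 0..n-m
p_exact = sum(math.comb(n, x) for x in range(n - m + 1)) / 2 ** n

# Normal approximation with continuity correction
u = (m - 0.5 - n * 0.5) / math.sqrt(n * 0.5 * (1 - 0.5))

print(round(p_exact, 5))  # 0.01064
print(round(u, 2))        # 2.25
```

Both routes agree with the hand calculation: the exact tail probability is about .011 and the normal approximation gives U = 2.25, i.e. p = .0122.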

Depending on the book this is called the sign test, a test for discrete scales, or a randomization test.  Doing a bean count of my reference books, “sign test” seems to be the most common label.

0
#75340 Robert Butler
Participant

I’ve built and analyzed a number of DOE’s where the response variable was variance.  The simplest approach is to replicate the entire design and compute the variance of the two samples for each experimental condition.  Granted, variability based on two samples is not what one would normally want when estimating variance as a Y response, but this approach does work.  My approach has been to use saturated designs in order to screen as many variables as possible and to replicate the design 3 times.  Three replicates guard against the loss of a single run, and thus against the loss of an experimental data point for the analysis of variables impacting the variance.  With such an approach you can also analyze for variables impacting mean shift.  If 3 reps is too many you can get by with 2, but if one of the horses dies you will need to run regression diagnostics on the leftover design in order to determine which X variables you can still use for an analysis of the variance.
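As a hedged sketch of the bookkeeping involved (the design size and the measured values below are invented for illustration), the variance response for a replicated design can be computed like this:

```python
import statistics

# Hypothetical data: an 8-run screening design replicated 3 times,
# one measured response per run per replicate.
replicates = [
    [12.1, 14.3, 9.8, 11.0, 13.5, 10.2, 12.8, 11.9],   # replicate 1
    [12.4, 13.9, 10.1, 11.4, 13.1, 10.6, 12.5, 12.2],  # replicate 2
    [11.8, 14.6, 9.5, 10.8, 13.8, 9.9, 13.0, 11.6],    # replicate 3
]

# For each experimental condition, the variance across the replicates
# becomes the Y response for the analysis of variables impacting variance.
var_response = [statistics.variance(vals) for vals in zip(*replicates)]

# The replicate means can serve as the response for mean-shift analysis.
mean_response = [statistics.mean(vals) for vals in zip(*replicates)]

print([round(v, 3) for v in var_response])
```

Each experimental run then carries two responses, a mean and a variance, and each can be regressed against the design factors separately.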

0
#75330 Robert Butler
Participant

That was the best they could do????
I’m sure this group could do better. For starters how about…
Shouted Insults Get My Attention
SIGnificant Mistakes Avoided
….the possibilities are endless.

0
#75261 Robert Butler
Participant

If I understand your problem correctly, you have a series of pins of given diameters and you test for pin fit to the inner diameter before and after annealing.  If this is the case you have paired samples, with a measurement system that can be viewed as a discrete scale with a limited range of values.  In this case the way to test the mean difference between the two groups is to analyze the data using a t test with a correction for continuity.
If you have say 15 tubes measured before and after annealing you would take the differences between the measurements and sum them.  The null hypothesis between non-annealed and annealed is that the signs on the differences are equally likely to be + or -.
Another way to check for significant differences with this kind of data would be Fisher’s randomization test. The methods for setting up your data and analyzing it using either of the above techniques can be found on p. 146 of the Seventh Edition of Statistical Methods by Snedecor and Cochran.

0
#75230 Robert Butler
Participant

The ASQ book Measuring Customer Satisfaction-Survey Design, Use, and Statistical Analysis Methods, by Hayes is the best book I have read on the subject.  As for building your own survey I would recommend, after you have read the book, using Infopoll.  They are a web based survey group and you can build your survey and have them host it.  I’ve used them for our company customer satisfaction survey. I plan to use them again next year when we do our next survey.

0
#74882 Robert Butler
Participant

If I understand your problem correctly, you have a situation with multiple gauges and multiple lots and you are concerned about gauge and lot differences.  If this is the case then the easiest way to examine the linearity of gauges and lots is to set up a two-way ANOVA with gauges and lots as your two variables of interest.  If you were to then run multiple samples from each lot (say 5 per gauge) you would have the grid illustrated below, and you could use the data from this experiment to check for between and within gauge and lot variability and mean differences.  In order to check for gauge linearity over some range of values you could either choose your lots for different levels or, if you can identify distinct levels within lots, you could run a three-way ANOVA with level as the third variable.
            G1    G2    G3
Lot 1        5     5     5
Lot 2        5     5     5
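Here is a hedged sketch of the sums-of-squares bookkeeping for that grid (the measurement values are invented, and in practice a stats package would do this for you), showing how the total variation splits into lot, gauge, interaction, and within-cell pieces:

```python
# Hypothetical measurements: 2 lots x 3 gauges, 5 parts per cell
lots = ["Lot 1", "Lot 2"]
gauges = ["G1", "G2", "G3"]
data = {
    ("Lot 1", "G1"): [10.1, 10.3, 10.2, 10.0, 10.4],
    ("Lot 1", "G2"): [10.2, 10.1, 10.3, 10.2, 10.2],
    ("Lot 1", "G3"): [10.0, 10.2, 10.1, 10.3, 10.1],
    ("Lot 2", "G1"): [10.5, 10.6, 10.4, 10.7, 10.5],
    ("Lot 2", "G2"): [10.6, 10.5, 10.7, 10.6, 10.4],
    ("Lot 2", "G3"): [10.4, 10.6, 10.5, 10.5, 10.7],
}
n = 5  # parts per cell

all_vals = [v for cell in data.values() for v in cell]
grand = sum(all_vals) / len(all_vals)

lot_means = {l: sum(v for g in gauges for v in data[(l, g)]) / (n * len(gauges))
             for l in lots}
gauge_means = {g: sum(v for l in lots for v in data[(l, g)]) / (n * len(lots))
               for g in gauges}
cell_means = {k: sum(vals) / n for k, vals in data.items()}

# Two-way ANOVA sum-of-squares decomposition for a balanced design
ss_lot = n * len(gauges) * sum((lot_means[l] - grand) ** 2 for l in lots)
ss_gauge = n * len(lots) * sum((gauge_means[g] - grand) ** 2 for g in gauges)
ss_inter = n * sum((cell_means[(l, g)] - lot_means[l] - gauge_means[g] + grand) ** 2
                   for l in lots for g in gauges)
ss_error = sum((v - cell_means[k]) ** 2 for k, vals in data.items() for v in vals)
ss_total = sum((v - grand) ** 2 for v in all_vals)

print(round(ss_lot, 4), round(ss_gauge, 4), round(ss_inter, 4), round(ss_error, 4))
```

The error term estimates within gauge-and-lot variability, while the lot and gauge terms carry the between-lot and between-gauge mean differences.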

0
#74009 Robert Butler
Participant

Ok, so now that everyone has had fun at Diane’s expense, let’s see if we can’t work the problem.  Based on the rather sketchy information provided, you leave one with the impression that your firm makes decisions based on averages.  If this is the case then that is indeed an error.  The key point to remember about an average is that while it is numeric in nature it is not a pure number bereft of meaning.  An average is a descriptor, and what it attempts to describe is the central tendency of a group of numbers (that is, it gives you some sense of what is typical).  Since you are working for an insurance company there is a very good chance that the average is not representative of typical.  A better measure might be the median, the 50% point: 50% of the data is less than the median and 50% is greater (think of the grassy strip between the lanes of a superhighway, with half of the traffic on one side and half on the other, except, of course, during rush hour).  Even the median alone will not suffice to give an understanding of what your process is about.
It sounds like your BB is attempting to get you and others to really look at the distribution of whatever it is that you are measuring in order to gain an appreciation of what your process is doing and why averages by themselves are meaningless.  One simple test for yourself would be to make a histogram of your data using Excel just to see how little information is conveyed by the average.
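A hedged illustration of the point (the claim amounts below are invented): a few very large values drag the average far away from what is typical, while the median stays put.

```python
import statistics

# Invented claim amounts in $1000s: mostly small, plus two very large claims
claims = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 45.0, 60.0]

print(round(statistics.mean(claims), 2))    # pulled far up by the two big claims
print(round(statistics.median(claims), 2))  # much closer to a "typical" claim
```

Here the mean is roughly ten times the median, which is exactly the sort of distortion a histogram of your own data would reveal.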

0
#73535 Robert Butler
Participant

Joe BB makes some interesting assertions concerning black belt training, and his reference to Roger’s article is interesting but I think misses the point.  As Roger says, a black belt is not a statistician. Unfortunately, what Roger says is not what is being heard out in the real world.  The sad fact is that in way too many instances a black belt is in fact viewed as a competent statistician, and regardless of his/her particular state of statistical ignorance his/her statements are taken to be statistical truth.  The reason for this is, I think, that most managers have absolutely no understanding of statistical concepts, nor do they have any real understanding of the effort needed to acquire and properly apply statistical training.
As for the notion of a BB absorbing more in a given period of time, this is all well and good. However, the issue is not about absorbing information, it is about correctly applying what little you have learned and, more importantly, seeking competent help in those situations where you are completely out of your depth.
Lest you decide that I’m “whining about BB’s” rest assured that I’m not.  BB’s who believe that they are statistically competent after 4 weeks of training and who allow their management to treat them as such deserve more than a whine. Such people deserve censure because they are in the position of doing great damage, not only to the institutions for which they work but also to the idea that statistical analysis is of value in the industrial setting.

0
#73496 Robert Butler
Participant

Andy Schlotter is correct.  The issue of variation reduction has been a central tenet of statistical analysis from the beginning.  As a formal area of study statistics is over 200 years old.  If you really want to read something on the origins of variation reduction, and indeed on the origins of most of the major issues of statistical focus, I’d recommend “The History of Statistics: The Measurement of Uncertainty before 1900” by Stephen Stigler.

0
#73383 Robert Butler
Participant

You can use anything you want for a response in a DOE.  The issue of normality only comes into play when you are developing your regression equations and wish to test for the significance of the factors.  Then the issue of normality focuses not on the responses or on the independent factors but on the normality of the residuals.
Indeed, if the Y’s are not independent of one another this will be reflected in the terms entering your regression models.  You will discover (assuming that your X’s are indeed independent of each other) that Y’s that are not independent will tend to have the same terms in their correlation equations, and the magnitudes and signs of the respective coefficients of the X’s (given that these have been normalized to a -1,1 range) will be similar.
For a discussion of these issues I would recommend pages 22-33 of Applied Regression Analysis, Second Edition, by Draper and Smith.
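To make the residuals point concrete, here is a minimal sketch (the data are simulated, and the line is fit by hand rather than with a stats package): fit the model first, then examine the residuals, not the raw responses.

```python
import random
import statistics

# Simulated data: y = 2 + 1.5x plus normal noise (invented for illustration)
random.seed(1)
x = [i / 10 for i in range(30)]
y = [2.0 + 1.5 * xi + random.gauss(0, 0.3) for xi in x]

# Ordinary least squares fit of a straight line
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

# It is these residuals whose normality matters for significance testing
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(round(sum(residuals), 10))              # ~0 by construction
print(round(statistics.stdev(residuals), 3))  # should be near the noise level
```

A normal probability plot or histogram of the residuals (rather than of the Y’s) is then the appropriate check before trusting the significance tests.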

0
#73076 Robert Butler
Participant

There are a number of good points that have been brought up in this discussion thread.  However, based on the description of the process, I would recommend that you first check the existing subgroup averages for auto-correlation.  We have multispindle machines here, we had the x-bar r charts in place, and our operators were following all of the good practices one would expect of an SPC effort.  The problem was that, based on our charting, our process wasn’t that good and we were continually reacting to “out of control” signals from the charts.  An examination of the subgroup averages indicated that they were not independent measurements and, consequently, that our control limits were not reflective of the actual process capability; in short, they were too narrow.  After a proper assessment of the auto-correlation we found that, in order to ensure that the numbers we were plotting met the criteria of independent measurements (and thus that we had meaningful x-bar r charts), we had to build the charts using every 4th subgroup average and range.
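A simple way to screen for this, sketched below with invented subgroup averages (not our actual machine data), is to compute the sample autocorrelation of the plotted points at a few lags:

```python
def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Hypothetical subgroup averages that drift slowly up and down
subgroup_means = [10.0, 10.1, 10.3, 10.4, 10.3, 10.2, 10.0, 9.9,
                  9.8, 9.9, 10.1, 10.2, 10.4, 10.5, 10.3, 10.2]

# A strong lag-1 autocorrelation signals that consecutive points are not
# independent; plotting every k-th subgroup (here every 4th) is one remedy.
print(round(autocorr(subgroup_means, 1), 3))
print(round(autocorr(subgroup_means, 4), 3))
```

If the lag-1 value is large, the chart points violate the independence assumption behind the control limits, and the limits will come out too narrow.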

0
#73019 Robert Butler
Participant

Hmmmmm, based on your description I would assume that you are supposed to compute the t statistic for a paired experiment.  If this is the case then the expression for standard deviation for the paired differences would be as follows:
Average Difference = 20.8
Sum Differences = 208
Sum of squared differences = 3910
number of pairs = 20
Thus
Square Root((3910 - (208*208/20))/19) = 9.59 = standard deviation of the differences
Thus the estimate of the standard deviation of the sample mean difference would be
9.59/Square Root(20) = 2.14
To test to see if the average difference 20.8 is significantly different from zero the t statistic would be  20.8/2.14 = 9.7 which for 19 degrees of freedom is significant. Thus the average difference is significantly different from zero for that particular standard deviation.
If you need to read more on differences from paired experiments you might want to check pp. 84-85 of the Seventh Edition of Statistical Methods by Snedecor and Cochran.
Hope this is of some help.

0