A Simple Problem for the Experts
This topic has 14 replies, 6 voices, and was last updated 18 years, 4 months ago by Gabriel.
February 7, 2004 at 3:59 am #34489
Reigle Stewart (Participant)

For the Six Sigma Experts: Who can solve this problem?

Here is a set of interrelated but typical engineering questions that a design engineer is often confronted with. This particular problem is quite simple and represents an "entry level" situation that is part of the DFSS training for design engineers. This problem has mechanical and electrical design corollaries.

SITUATION: An engineer has just finished configuring a certain assembly. The envelope specification was given as 4.976 +/- .003 inches. Inside this envelope are 4 parts, where each part was specified as 1.240 +/- .003 inches. The same NC machine makes the envelope and the 4 parts. The parts are to be randomly selected for assembly. Based on a process sampling of n=30 production parts, the process standard deviation was determined to be S = .001. The sample data appeared to be normally distributed. Based on these facts, the design engineer wanted to determine the statistical probability of an interference fit (i.e., gap < 0).

QUESTION 1: What is the probability that the assembly gap will be less than zero?
QUESTION 2: Could it be that the given process standard deviation is biased due to random sampling error and, if so, to what extent?
QUESTION 3: Given such error in the process standard deviation, what is the statistical worst-case expectation in terms of probability of interference fit?
QUESTION 4: If the design goal is no more than 3.4 interferences-per-million-assemblies, what component-level standard deviation would be required to ensure the goal is met given the presence of random sampling error?
QUESTION 5: From a design engineering point of view, and in terms of the Z.gap calculation, should the potential random sampling error be factored into the numerator term as a vectored worst-case linear offset in the component means, or into the denominator term of the pooled error? Either way, what is the rationale?

PS: This problem also has a Monte Carlo solution that will confirm the computational solution.

Reigle Stewart
February 7, 2004 at 6:16 am #95174
Tim Folkerts (Member)

I always like a good challenge, so I'll give it a shot.
We were told: "The envelope specification was given as 4.976 +/- .003 inches. Inside of this envelope are 4 parts, where each part was specified as 1.240 +/- .003 inches. The same NC machine makes the envelope and the 4 parts. The parts are to be randomly selected for assembly. Based on a process sampling of n=30 production parts, the process standard deviation was determined to be S=.001."
(I'm assuming the process was pretty well centered at 1.240 and 4.976, rather than somewhere near the limits of the engineering specs. If that is wrong, then the rest is tainted.)
QUESTION 1: What is the probability that the assembly gap will be less than zero?
When combining standard deviations, find the RSS (root sum of squares). In this case the combined st dev is sqrt(0.001^2 + 0.001^2 + 0.001^2 + 0.001^2) = 0.002. Thus the combined 4 parts are 4 * 1.240 = 4.960 with a st dev of 0.002. The envelope of 4.976 is (4.976 − 4.960)/0.002 = 8 st dev away from the mean, so basically no parts will fail; Excel won't even give an answer other than 0. (This assumes the st dev was actually found to be 0.00100. Since only 1 digit is quoted, it would also be reasonable to assume that the st dev was somewhere between 0.00050 and 0.00150.)
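A quick way to check this arithmetic is a few lines of Python (my sketch of the math above, not part of the original post; scipy is assumed for the tail probability):

from math import sqrt
from scipy.stats import norm

s_part = 0.001
s_stack = sqrt(4 * s_part**2)        # RSS of the four parts: 0.002
z = (4.976 - 4 * 1.240) / s_stack    # 8 standard deviations to the envelope
print(z, norm.sf(z))                 # upper-tail probability ~6e-16, effectively zero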
QUESTION 2: Could it be that the given process standard deviation is biased due to random sampling error and, if so, to what extent?
Yes – otherwise you wouldn't ask :-). A guide I found says that for 30 parts, the calculated standard deviation itself has a standard deviation of ~13%. That is, if you kept drawing random samples of 30 and measuring the st dev, the spread in those estimates would itself have a st dev of about 13% of the true value. If you take a 2 st dev (26%) range for a 95% certainty, then the st dev of 0.00100 could actually be in the range of 0.00074 to 0.00126.
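The ~13% figure agrees with the normal-theory approximation sd(S) ~ sigma / sqrt(2(n-1)); here is a quick numpy check (my sketch, not part of the original post):

import numpy as np

n = 30
print(1 / np.sqrt(2 * (n - 1)))      # ~0.131: the "~13%" rule of thumb

rng = np.random.default_rng(1)
draws = rng.normal(0.0, 1.0, size=(100_000, n))
s = draws.std(axis=1, ddof=1)        # sample st dev of each n=30 draw
print(s.std())                       # ~0.13, matching the guide's figure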
QUESTION 3: Given such error in the process standard deviation, what is the statistical worst-case expectation in terms of “probability of interference fit?”
Now the combined four parts have a st dev of up to sqrt(0.00126^2 + 0.00126^2 + 0.00126^2 + 0.00126^2) = 0.0025. This is still about 6.4 st dev from the envelope, for a failure rate of 8E-11.
QUESTION 4: If the design goal is no more than 3.4 interferences-per-million-assemblies, what component level standard deviation would be required to ensure the goal is met given the presence of random sampling error?
As all six sigma practitioners know, 3.4 PPM is 4.5 st dev from the mean. Thus the st dev of the combined parts should be no bigger than (4.976 − 4.960) / 4.5 = 0.0036. The st dev of each part should be no bigger than 0.0036 / (4^0.5) = 0.0018. To get our 26% cushion, the st dev should be measured as no more than 0.0018 / 1.26 = 0.0014.
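The same arithmetic in Python (a sketch; scipy assumed):

from scipy.stats import norm

z = norm.isf(3.4e-6)               # ~4.50: one-tailed z for 3.4 ppm
s_stack_max = (4.976 - 4.960) / z  # ~0.0036
s_part_max = s_stack_max / 4**0.5  # ~0.0018 per part
print(s_part_max / 1.26)           # ~0.0014 after the 26% sampling-error cushion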
QUESTION 5: From a design engineering point of view, and in terms of the Z.gap calculation, should the potential random sampling error be factored into the numerator term as a vectored worst-case linear offset in the component means, or into the denominator term of the pooled error? Either way, what is the rationale?
I don't know!
PS: This problem also has a Monte Carlo solution that will confirm the computational solution.
4 sets of 1,000,000 random data points in Minitab (mean 1.24, st dev 0.0018) gave no values over 4.976, but several close to 4.976, so I think I’m on the right track.
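An equivalent Monte Carlo in Python, mirroring the Minitab run described above (my sketch; numpy assumed, counts will vary with the seed):

import numpy as np

rng = np.random.default_rng(0)
parts = rng.normal(1.240, 0.0018, size=(1_000_000, 4))
stacks = parts.sum(axis=1)         # one million simulated 4-part assemblies
print((stacks > 4.976).sum())      # interference count: expect only a handful
print(stacks.max())                # the largest stacks come close to 4.976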
February 7, 2004 at 2:43 pm #95176
Reigle Stewart (Member)

Tim Folkerts: Very nice work; however, you forgot to factor in the envelope. The envelope must be a part of the total RSS as well. It too has a tolerance and will vary in accordance with the same standard deviation (S = .001). Don't forget to compute the nominal gap: 4.976 − (1.240 * 4) = .016. So, Z.gap is given as (0 − .016) / RSS.

Great work,
Reigle Stewart

February 7, 2004 at 10:47 pm #95183
Reigle Stewart (Participant)
Your assumption is interesting. You stated: "I'm assuming the process was pretty well centered at 1.240 and 4.976, rather than somewhere near the limits of the engineering specs. If that is wrong, then the rest is tainted." Perhaps this is why the generalization of a 1.5 sigma shift is needed in engineering. The problem you faced (given by your assumption) is that you recognize a process will shift and drift … but by how much? You also recognize that your analysis could be tainted if your assumption of a centered process is false (which it would be most of the time, given that a process rarely remains perfectly centered over time). Perhaps one of the design goals should be to devise a set of tolerances that are "robust" to shift and drift. Great job and good insights. Together, we will get through this problem.
Respectfully,
Reigle Stewart

February 8, 2004 at 3:14 am #95188
Tim Folkerts (Member)

Mostly I was interested in clarifying the question. You stated that the engineering specs were 1.240 and 4.976; you implied (but didn't specifically state) that the measured production values were the same. I just wanted to point out that I was using those numbers as the measured values and therefore as the basis of my calculations. (We all know that what the engineer asks for and what gets produced are often two different things!)
Beyond the simple mathematical need to know the measured values, there are of course other challenges as you point out – the physical challenge of keeping the process performing as desired and the statistical challenge to get a good estimate of this performance. I don’t pretend to know how well the postulated 1.5 sigma shift emulates either of these difficulties in real life.
As you state, "Perhaps one of the design goals should be to devise a set of tolerances that are 'robust' to shift and drift." It seems that you can always make the process more robust in several ways:
1) You can create a design that is tolerant of variation.
2) You can use better methods of manufacture to reduce variation.
3) You can measure a lot to catch changes and use feedback to correct the problems.
All of these cost money, so you have to balance the costs vs the benefits to decide how to most effectively “robustify” a process.
Tim
February 8, 2004 at 10:34 am #95190

Hi Reigle,
I must congratulate you … you've made your point well. I just wondered why you didn't consider using a Six Sigma step process with a 'process performance' of 0.0005″. I believe 'honing' can meet this criterion.
Regards,
Andy

February 8, 2004 at 4:09 pm #95194
Reigle Stewart (Participant)

Tim Folkerts: Perhaps now we should consider the full solution to our design engineering problem. It should be pointed out that the following solution is easily confirmed by Monte Carlo simulation.

For the given design, the nominal assembly gap is computed as Nom.gap = 4.9760 − (1.240 * 4) = .0160. The assembly gap standard deviation is computed as S.gap = sqrt(.001^2 * 5) = 0.0022361. Of course, this assumes zero covariance (which is most reasonable when the assembly is random). We are also assuming a normal distribution (again, for many machining operations, this is quite rational). So, the gap capability can be expressed as Z.gap = Nom.gap / S.gap = .016 / 0.0022361 = 7.155. Obviously, this is greater than 6S. It is also interesting to note that 3S components will produce a 7S assembly.

However, it is entirely possible that sampling error could have been present at the time of process qualification. Given this, the statistical worst-case condition is given by the upper confidence interval of the standard deviation. For the adopted process, we compute this potential worst-case condition for each of the COMPONENT standard deviations as S.wc = S * sqrt((n-1) / X^2), where X^2 is the chi-square value. For the case alpha = .005 and df = 29, we have S.wc = .001 * sqrt(29 / 13.121) = 0.0014867. Based on this, the worst-case GAP standard deviation would be S.gap.wc = sqrt(0.0014867^2 * 5) = 0.0033243. This means that the worst-case gap capability would be given as Z.gap.wc = Nom.gap / S.gap.wc = .016 / 0.0033243 = 4.81.

Given this, the assembly gap will experience an equivalent shift in capability of Z.shift = 7.155 − 4.81 = 2.34 sigma, owing to an inflation of c = 1.4867 in the standard deviation of each of the 5 components (due to the potential presence of random sampling error). Of course, this means that each component would experience an equivalent shift in capability such that Z.shift = 3(c − 1) = 3(1.4867 − 1) = 1.46 (or about 1.5).

Now, let us apply such an equivalent linear shift to the nominal specifications of each component and then recompute Z.gap. For example, consider the nominal worst-case envelope condition: 4.976 − (1.46 * .001) = 4.97454. The nominal worst-case part condition would be: 1.240 + (1.46 * .001) = 1.24146. So, the worst-case nominal gap would be 4.97454 − (1.24146 * 4) = .0087. Hence, the worst-case gap capability would be Z.gap.wc = Nom.gap.wc / S.gap = .0087 / 0.0022361 = 3.89.

Thus, we recognize the dynamic worst-case condition of the assembly gap to be Z.gap.dynamic = 4.81 (due to dynamic expansion of the standard deviations) and the equivalent static worst-case condition to be Z.gap.static = 3.89 (due to an equivalent offset in the nominals). So, for this particular design scenario, an equivalent worst-case shift in the numerator of Z.gap produces a lower gap capability than the equivalent amount of error located in the denominator (in the form of expanded standard deviations). Consequently, the design optimization should be focused on the numerator term of Z.gap and not the denominator term. In other words, the designer should focus on optimizing the nominals, not the tolerances (in this particular case).

Since the overall design goal is Z.gap.wc = 4.5, then (4.50 − 3.89) * 0.0022361 = 0.0013623 inches must be added to the nominal gap so as to account for a worst-case shift in each of the component nominals (if the gap capability goal of Z = 4.5 is to be met). If this full amount is added only to the envelope (merely for the sake of example), we note 4.9760 + .0013623 = 4.97736, or about 4.9774. Working back through the problem to check our assertions, we note that Nom.gap = 4.97736 − (1.2400 * 4) = 0.01736. So, Z.gap = 0.01736 / 0.0022361 = 7.765. But if the envelope and each of the parts were to shift in their respective worst-case directions by 1.46 sigma, we note that Z.gap = .0100 / 0.0022361 = 4.50.

So, the designer set the envelope specification at 4.978 +/- .003 and the parts at 1.240 +/- .003, recognizing the assembly would be built using a process with a standard deviation of .001. By way of this set of performance specifications, the designer was able to meet the operational expectation of no more than 3.4 interferences-per-million-assemblies while concurrently ensuring a design that is robust to sustained shifts in process centering of 1.46 sigma (or to an increase in the process bandwidth by a factor of 1.487).
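The chain of numbers in this post can be reproduced in a few lines of Python (my sketch of the computation described above; scipy assumed for the chi-square and normal quantiles):

from math import sqrt
from scipy.stats import chi2, norm

S, df = 0.001, 29
nom_gap = 4.976 - 4 * 1.240                   # .016
s_gap = sqrt(5 * S**2)                        # 0.0022361 (envelope + 4 parts)
z_gap = nom_gap / s_gap                       # 7.155

c = sqrt(df / chi2.ppf(0.005, df))            # 1.4867: worst-case sigma inflation
z_gap_dynamic = nom_gap / (c * s_gap)         # 4.81 (inflated denominator)

shift = 3 * (c - 1) * S                       # 1.46-sigma offset per component
z_gap_static = (nom_gap - 5 * shift) / s_gap  # 3.89 (offset numerator)

print(z_gap, z_gap_dynamic, z_gap_static)
print(norm.sf(z_gap_static) * 1e6)            # static worst-case interference PPM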
Respectfully submitted,
Reigle Stewart

February 8, 2004 at 8:03 pm #95200

John H. (Participant)

For the Six Sigma Experts: Who can solve this problem?
The VP of R&D at an electronics firm informs the CEO that you can't apply the 1.5 sigma shift to a new resonator they designed, because the output obeys a Cauchy distribution with probability density P(x) = 1/(pi*(1 + x^2)), and the shift does not apply.
The Company’s BB has been asked by the CEO to demonstrate the 1.5 shift for a Cauchy Distribution via a Monte Carlo simulation and is having difficulty illustrating it. How would you aid the BB in his task and what would be the predicted shift?
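A short numpy sketch (my addition, not John H.'s) of why such a simulation frustrates the BB: a Cauchy distribution has no finite mean or variance, so the running sample statistics never settle and shift arithmetic built on means and sigmas has nothing to attach to.

import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_cauchy(1_000_000)   # P(x) = 1/(pi*(1 + x^2))
for k in (100, 10_000, 1_000_000):
    # the running sample mean and st dev refuse to converge as k grows
    print(k, x[:k].mean(), x[:k].std())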
Respectfully submitted,
John H.

February 8, 2004 at 10:51 pm #95201

Reigle,
Let me get this straight. An engineer has just finished configuring a certain assembly. He has already determined the specifications on each of the 5 components without determining the requirements on the CTQ (which is the gap between the 4 parts and the envelope). The process is already spec'ed and up and running without consideration of the tolerances needed to meet the design requirement. He has already taken a sample, having arbitrarily determined that 30 is sufficient, with no consideration of the power of the sample. And now he wants to know the probability of interference fit?
Talk about a contrived example. Or maybe this is what you consider DFSS and what you are teaching as DFSS. I can't think of a situation that could be more anti-DFSS. Look at your situation: arbitrary specifications on the components without consideration of the requirements on the CTQ; CTQ requirements (range of acceptability) not established prior to configuration (don't you think there may be an upper requirement on the gap? why not design the parts at 1.23 nominal so that there is no possible way to have an interference fit?); acceptable risk of violating the CTQ requirements not established but estimated after the fact; qualification sample set at 30 random parts with no consideration of the power in estimating process parameters or the ability to assess process stability due to sampling at random.
This isn't DFSS; this is what has always been done. The only thing you have added is an overly conservative fudge factor due to sampling error. And you couldn't even do that correctly.
Let's look at the proper sequence:
1. Determine the CTQ (the gap between the 4 parts and the envelope).
2. Determine the range of acceptability of the CTQ (I will assume, since the nominal is .016, that the range is (0, 0.032)).
3. Determine the acceptable risk of violating the requirements on the CTQ (3.4 ppm).
4. Determine the maximum acceptable level of variation in the CTQ:
   S.gap = .016/4.65 = 0.00344, where 4.65 is the Z-value that cuts off 1.7*10^-6 in each tail.
5. Establish the relationship between the components and the CTQ:
   Gap = E − (P1 + P2 + P3 + P4)
   S.gap = sqrt((S.e)^2 + (S.p1)^2 + (S.p2)^2 + (S.p3)^2 + (S.p4)^2)
6. Determine the allocation of the variation in gap across the components. Since the parts will all be made on the same process, use equal allocation:
   S.gap = sqrt(5*S.c^2)
7. Determine the maximum acceptable level of variation in the components:
   .00344 = sqrt(5*S.c^2) → S.c = 0.001538
8. Establish a validation sampling plan to assure that the components will meet requirements. This should be based on the minimum detectable bias in both the mean and the variation. For example, a validation plan that calls for 25 subgroups of 4 will detect an inflation in S.c of 20% (c = 1.2) with a power of 0.9895; a sample of 30 will detect an inflation of 30% with a power of 0.964.
9. Determine S.design as S.c/c, where c is the minimum detectable inflation (with a sample size of 25 subgroups of 4, c = 1.2, so S.design = 0.00128).
10. Develop or determine the process for the components that will meet the requirements on the components (nominal and variation (S.design)).
11. Verify the process using the validation plan.
12. Establish specifications and controls on the component processes.
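Steps 4 through 9 of this sequence, sketched in Python (my reading of the numbers above; scipy assumed):

from math import sqrt
from scipy.stats import norm

z = norm.isf(1.7e-6)             # ~4.65: cuts off 1.7e-6 in each tail
s_gap_max = 0.016 / z            # ~0.00344
s_c_max = s_gap_max / sqrt(5)    # equal allocation: ~0.001538 per component
s_design = s_c_max / 1.2         # c = 1.2 (25 subgroups of 4): ~0.00128
print(s_gap_max, s_c_max, s_design)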
In your example, the 30-part sample has estimated S.component at .001. With a sampling plan of 30 random samples, this is less than what is required for qualification and less than the level needed to reject the null (0.001538/1.3 = 0.001183). Therefore, you would qualify the process. The estimated worst-case sigma level due to sampling error would be:
- S.component = .001 * 1.3 = 0.0013
- S.gap = sqrt(5 * (0.0013)^2) = 0.0029
- Z = .016/.0029 = 5.50
- PPM = .02
You know, of course, that you are stacking the deck in setting the alpha risk as low as .005 when the critical risk is the beta risk, since we are concerned with S.qual being greater than what is estimated. In other words, we are concerned that when we fail to reject the null, the actual S.component is significantly greater than what is estimated.
But even when you have used the .005 worst-case inflation of c = 1.4867, you have shown once again that a shift in the numerator does nothing but confuse and add complexity. If c = 1.4867, then the worst case due to sampling error is:
- S.component = .001 * 1.4867 = .0014867
- S.gap = sqrt(5 * (0.0014867)^2) = 0.003324
- Z = .016/.003324 = 4.81
- PPM = .75
Your "equivalent static worst-case condition of Z.gap.static = 3.89 (due to an equivalent offset in the nominals)" lacks any validity. You cannot assess the amount of bias in the mean by assessing the sampling error in the variance.
Your clients will realize the additional cost of over-design due to the voodoo statistics you are promoting.
Statman

February 9, 2004 at 2:37 am #95203
Reigle Stewart (Participant)

Given: The initial bandwidth of process capability is recognized as Xbar +/- 3S. Given the symmetrical nature of a normal distribution, we will consider only the right-hand side: Xbar + 3S, or the upper process limit (UPL) as some would say.

1) We naturally recognize that error in the mean and variance will enlarge the total bandwidth of process capability. Such inflation (due to random sampling error) will inflate the upper process limit to such an extent that UPL = Xbar + UCL.xbar + 3*UCL.stdev.

2) A "typical" and normally recognized level of statistical confidence is 95%.

3) If we seek to simultaneously consider both sources of error (i.e., mean and variance), then the confidence of each independent source of error would need to be .95^.5 = .9748.

4) Given 1-a = .9748 and df = 29, the corresponding t value would be t = 2.3582. Hence, the upper confidence interval of the mean would be UCL.xbar = .43054.

5) Given 1-a = .9748 and df = 29, the corresponding value of chi-square would be X^2 = 16.0749. Hence, the upper confidence interval of the standard deviation would be UCL.stdev = 1.3432. Considering the upper process limit, we have 3 * 1.3432 = 4.02946.

6) So, the upper process limit (the upper limit of the process bandwidth) under worst-case conditions (based on error in the mean and variance) would be given as UPL = Xbar + UCL.xbar + 3*UCL.stdev = 0 + .43054 + 4.02946 = 4.46.

7) Since the initial (unbiased) upper process limit is given as UPL.in = 3.0 and the worst-case condition is UPL.wc = 4.46, we can say the initial process bandwidth has shifted upward by UPL.wc − UPL.in = 4.46 − 3.00 = 1.46, or about 1.5S. The same can be said on the lower side of things. So, UPL.wc = UPL.in + 1.46S. Consequently, we observe a positive shift in the upper process limit: Z.shift = 1.46, or about 1.5S, but only for the given alpha and df. Owing to this, we note that c = 4.46 / 3 = 1.4867, or about 1.5.

8) Let's now calibrate Z.shift by considering only the upper confidence interval of the initial standard deviation, but using 1-a = .995 and df = 29. Doing so reveals that UPL.wc = Xbar + (3 * UCL.stdev) = 0 + (3 * 1.4867) = 4.46. Consequently, we observe a shift in the upper process limit of Z.shift = 1.46, or about 1.5S, but only for the given alpha and df. Owing to this, we note that c = 4.46 / 3 = 1.4867, or about 1.5.
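The quoted values can be reproduced approximately with scipy (my sketch; I read the t value as a two-sided bound with alpha/2 per tail and the chi-square as a one-sided lower-tail bound, which is an assumption on my part):

from math import sqrt
from scipy.stats import t, chi2

n, df = 30, 29
alpha = 1 - 0.95 ** 0.5                # .0252 per error source

t_val = t.ppf(1 - alpha / 2, df)       # ~2.36 (the post quotes 2.3582)
ucl_xbar = t_val / sqrt(n)             # ~.4305

chi2_val = chi2.ppf(alpha, df)         # ~16.07 (the post quotes 16.0749)
ucl_stdev = sqrt(df / chi2_val)        # ~1.3432

print(ucl_xbar + 3 * ucl_stdev)        # UPL.wc ~4.46, a shift of ~1.46 from 3.0
print(sqrt(df / chi2.ppf(0.005, df)))  # step 8: c ~1.4867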
Respectfully submitted,
Reigle Stewart

February 9, 2004 at 3:20 am #95204

Reigle,
Do you understand the difference between alpha risk and beta risk? Do you know how the power of a test is defined? How about an error of type I vs. an error of type II?
Maybe you have not been trained in the concept that there is a difference between Reject-Support significance tests and Accept-Support significance tests.
Let me give you a little help in this area. A while ago you proposed a simulation by creating thousands of random normal samples with mean = 100 and standard deviation = 10. You showed that about 0.5 percent of the sample standard deviations will be as large as 14.9. You concluded that the worst-case sampling error would cause you to under-estimate the true standard deviation, because it can be as big as 14.9.
However, THE TRUE POPULATION STANDARD DEVIATION IS 10. Therefore, if you conclude that it is greater than 10, you are making an error of type I: concluding the population variation has increased when it actually has not.
Acting in a way that might be construed as highly virtuous in the reject-support situation, for example maintaining a very low alpha level like .005, is actually "stacking the deck" in favor of the design engineer's theory that the variation has not increased (an Accept-Support situation). I do understand how easy it is for the novice to sometimes get confused and retain the conventions applicable to RS testing.
Statman
February 9, 2004 at 5:37 am #95205
Tim Folkerts (Member)

I don't know that I want to get in the middle of what is apparently a continuing debate between Reigle & Statman. I have a degree in Math, but I am self-taught in stats and six sigma. Mostly, it is just an interesting topic and one that I want to understand.
I decided to try a different Monte Carlo Simulation. I ran 30 columns by 10,000 rows of random normal data (Minitab), with mean = 0, st dev = 1.0. For each of the 10,000 rows I calculated the st dev. I counted all of the rows where the st dev was close to 1 (0.95 <= st dev < 1.05). In this case, 2636 fit the criterion.
Then I repeated the process using standard deviations of 0.6 to 1.8 in steps of 0.1. The results are as follows:

True st dev    Count with 0.95 <= st dev < 1.05
0.6            0
0.7            31
0.8            615
0.9            2155
1.0            2636
1.1            2386
1.2            1296
1.3            558
1.4            223
1.5            77
1.6            29
1.7            11
1.8            4
(By coincidence, there are very nearly 10,000 results; 10,021 to be precise).
This could be interpreted to say "of all the ways to calculate a st dev of 1.0 from a sample of 30 pieces, here are the odds that it came from a distribution with a specific 'true' standard deviation." In this particular case, there turns out to be approximately a 0.5% chance that the calculated st dev of 1.0 could have come from a distribution with a 'true' st dev greater than 1.5.
Certainly, this approach is not perfect. I could run more than 10,000 rows. I could look at results closer to 1.0. I could make finer steps in the st dev. I could try something other than n = 30. Perhaps most fundamentally, the biggest question is whether it is legitimate to assume a uniform step size in st devs as the universe of all possible st devs. Perhaps a geometric series would be more appropriate. (I have no idea if such a study has been published, but it seems someone must have done this before.)
So, in this particular case, if you observe a st dev of 1.0, you can be quite sure that the true st dev was 0.7 – 1.5.
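Tim's experiment is easy to replicate in Python (my sketch of the Minitab runs; numpy assumed, and the counts will vary with the seed):

import numpy as np

rng = np.random.default_rng(42)
rows, n = 10_000, 30
counts = {}
for sigma in np.arange(0.6, 1.81, 0.1):
    s = rng.normal(0.0, sigma, size=(rows, n)).std(axis=1, ddof=1)
    counts[round(float(sigma), 1)] = int(((s >= 0.95) & (s < 1.05)).sum())
print(counts)

# Of the runs that *looked* like st dev ~1.0, what fraction actually came
# from a true st dev of 1.5 or more?
total = sum(counts.values())
print(sum(v for k, v in counts.items() if k >= 1.5) / total)   # ~1%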
Cheers,
Tim
February 9, 2004 at 4:37 pm #95227

Gabriel (Participant)

Tim,
I am not sure I understood you.
You said:
- "of all the ways to calculate a st dev of 1.0 from a sample of 30 pieces, here are the odds that it came from a distribution with a specific 'true' standard deviation." In this particular case, there turns out to be approximately a 0.5% chance that the calculated st dev of 1.0 could have come from a distribution with a 'true' st dev greater than 1.5.
In fact, there was always ONE "true" stdev, and it was always 1.0. All 300,000 individuals belong to a population with stdev 1.0, which is the "true" one. Then you took several samples from this population and found different "sample standard deviations", each of which is used as an estimator of the "true" "population standard deviation".
So, in fact, what you are saying is that if you take a sample of size 30, there is a 0.5% chance of getting a sample standard deviation of 1.5 or greater when the true standard deviation is 1.0.
You must understand the difference between that statement and a statement like "if I take a sample of size 30 and find a sample standard deviation of 1.0, there is an X% chance that the actual standard deviation was in fact 1.5 or greater".
To use a different example: There is a coin and I don’t know if it is fair (heads and tails are 50/50) or if it delivers more heads than tails.
I do not need a Monte Carlo simulation to know that, if the coin were fair, the rate of heads would be 50% (it's a binomial distribution). But let's do it anyway: I run 10 columns by 10,000 rows where each cell has either a 0 or a 1 with equal probability (50/50), and count the number of ones (which will be taken as "heads"). I get the following result:

Heads   Cases   %        Theory
0       16      0.16%    0.10%
1       105     1.05%    0.98%
2       470     4.70%    4.39%
3       1166    11.66%   11.72%
4       2035    20.35%   20.51%
5       2472    24.72%   24.61%
6       2102    21.02%   20.51%
7       1102    11.02%   11.72%
8       425     4.25%    4.39%
9       102     1.02%    0.98%
10      5       0.05%    0.10%
Total   10000   100%     100%
The columns "Cases" and "%" correspond to the simulation run in Excel. The column "Theory" is according to the binomial distribution.
Now, that clearly means that, GIVEN THAT THE RATE OF HEADS OF THE COIN IS 50%, the chance of getting 9 or 10 heads out of 10 trials is 1.07% (simulated) or 1.08% (theoretical).
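The theoretical tail is a one-line binomial computation (a sketch; scipy assumed):

from scipy.stats import binom

p = binom.pmf(9, 10, 0.5) + binom.pmf(10, 10, 0.5)
print(p)   # 11/1024 ~= 1.07%, in line with the "Theory" column (0.98% + 0.10%)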
BUT IT DOES NOT TELL YOU ANYTHING ABOUT THE CHANCES OF THE ACTUAL RATE OF HEADS OF THE COIN, GIVEN THAT YOU GOT ANY GIVEN RESULT OUT OF 10 TRIALS.
That is what Statman was trying to explain to Reigle.
In other words, who cares about the chance of getting a bad standard deviation in the sample given that the actual standard deviation is good? The real risk is that you get a good standard deviation in the sample, and assume that the actual one is also good, when in fact it is not.

February 9, 2004 at 11:29 pm #95248
Tim Folkerts (Member)

Gabriel,
I was trying to go a level or two deeper than your coin example. Let me see if I can explain it a little better.
Suppose you have 20 sets of blocks, and each set has lots and lots of blocks. All 20 sets have an average length of 10.000″. The first set has a st dev of 0.100″ and the lengths follow a normal distribution. The next set has a st dev of 0.200″, and so on up to the 20th set with a st dev of 2.000″.
Now you go to the first set of blocks and draw out a random group of 30 blocks. The true st dev is indeed always 0.1″, but you won't always get 0.1″. Now repeat this 10,000 times (equivalent to the Monte Carlo simulation I did). Of all the different st devs you get, how many will happen to be as large as 1″ (or more specifically, how many are in the range 0.950″ – 1.050″)? It turns out none were that large.
Now go to the next set of blocks, with a st dev of 0.2 and repeat the previous paragraph. Then repeat for all the other sets of blocks.
For every one of these 20 x 10,000 = 200,000 experiments, I will get some st dev. One might be 0.234″, the next might be 1.435″, the next 1.010″. I went through and picked out just the experiments where I estimated the st dev to be in the range of 0.95 to 1.05. This happened to include 10,021 of the 200,000 total experiments.
Now I can ask the question “If I did indeed observe the st dev to be ~1.0″, which set of blocks did the data likely come from?” That is where the table comes in.
St dev   True odds
0.6      0 / 10021
0.7      31 / 10021
0.8      615 / 10021
0.9      2155 / 10021
1.0      2636 / 10021
1.1      2386 / 10021
1.2      1296 / 10021
1.3      558 / 10021
1.4      223 / 10021
1.5      77 / 10021
1.6      29 / 10021
1.7      11 / 10021
1.8      4 / 10021
1.9      0 / 10021
The odds it came from the blocks with a true st dev of 1.0 is 2636/10021 = 26.3%. The odds it came from a set of blocks with a st dev of 1.5 or greater is (77+29+11+4)/10021 = 1.2%.
I completely agree with you when you say "In other words, who cares about the chance of getting a bad standard deviation in the sample given that the actual standard deviation is good? The real risk is that you get a good standard deviation in the sample, and assume that the actual one is also good, when in fact it is not."
I was trying to simulate what you call the “real” risk. I did 10,021 experiments where I found a “good” st dev of 1.0. Some of these came from experiments where the “real” st dev was better than I thought, BUT some of these actually came from experiments with a “bad” st dev. Specifically, 1.2% of these came from a distribution where the st dev was at least 1.5 times worse than I thought.
I hope that clarifies what I was saying. :-)
Tim
February 10, 2004 at 11:38 am #95260

Gabriel (Participant)

Ok, I had misunderstood you. Thanks for the clarification.