# Teach Darth Something


Viewing 64 posts - 1 through 64 (of 64 total)
• #50968

Darth
Participant

The population I wish to sample from is exponential based on some historical data. Obviously the standard sample size calculations are not appropriate because the s.d. is so large. I don’t want to take multiple samples and hope the central limit theorem will save me. Simple question is how do I calculate an appropriate sample size and power for a non normal population? The data is continuous in nature. Thanks for the enlightenment.

0
#175948

Darth
Participant

We seem to be getting lots of posts on silly stuff. How about one of you experts actually tackle the stat question I have posted?

0
#175950

Sloan
Participant

Everyone thinks it’s a trap. No one wants to risk stepping into the Dark Lord’s domain for fear of being smited.
Smite me not, for I am merely the court fool.

0
#175951

Darth
Participant

Not a trap but a true question for a real situation. I have a few ideas but am not confident in them.

0
#175957

Taylor
Participant

Darth
I’ll take the bait. I promise I’m no expert in this category, but this is how I would do it.
The calculation for sample size n depends on the departure from normality. To be practical, how large is large enough? If the distribution of X is severely non-normal, you will want a large n, say >50 (or 100), for an adequate approximation to normality. However, if the underlying distribution does not differ too greatly from normality, then a smaller sample size will obviously do.
This is based on the fact that the sampling distribution of Xbar is approximately normal for large n, regardless of the probability distribution of X.
Yes I know this gets back to CLT, but just the way I would do it.

0
#175958

Mr. IAM
Participant

Check out “statistical power of non-parametric tests: a quick guide for designing sampling strategies.” by PJ Mumby
Sounds like he uses monte carlo simulation to some extent to calc power… – M

0
#175959

Darth
Participant

Good start, but I said I didn’t want to deal with the CLT and take multiple samples. Just as if it were a normal distribution, I want to take a single sample for inference while achieving some alpha and beta level. We have simple sample size calculators that require the alpha, precision and s.d.. They are predicated upon a normal distribution. Minitab allows us to adjust for power as well. But, in an exponential distribution, the s.d. is very high and thus the calculated sample size is ridiculous. Is there an equivalent method for calculating sample size for an exponential or any skewed distribution that allows me to determine a sample size but adapts to the skewness? Thanks.

0
#175964

Mikel
Member

I know the answer, but I am still waiting for you to answer my
question about sample size when you are interested in standard
deviations, not means. I know the answer to that one too.

0
#175968

Darth
Participant

Stan, I answered that one years ago. But after all the tequila, wine and grappa that we consumed you probably don’t remember. Plus, BOAS was there and looking hot.

0
#175977

Mikel
Member

If you don’t know the answer just say so, I’ll still answer your exponential sample size quiz. You really should do your own homework though.
BOAS is ready to pop BTW.

0
#175979

Darth
Participant

Ok, maybe I was the one that was too drunk to remember and it was only a dream that I answered it. You can send both answers offline. Again, I swear I am not the father despite what she claims.

0
#175982

Mikel
Member

Too bad, I suspect it would have been fun.

0
#175987

Taylor
Participant

OK Darth, My curiosity got the best of me, so went on a Google Search, Really sorry I did, because this stuff is greek to me. Anyway here it is.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176325491

0
#175990

Darth
Participant

Good try. Unfortunately the entire article wasn’t available only the abstract. But it does sound fascinating doesn’t it? I was hoping for something simple like an excel spreadsheet. Keep working on it.

0
#175992

Severino
Participant

I’ve gotten as far as a confidence interval for 1/lambda.
http://en.wikipedia.org/wiki/Exponential_distribution

0
#175993

Darth
Participant

That thought occurred to me in the plane flying home last night. If we derive the simple sample size calculations from the confidence interval of a normal mean, why can’t we do the same for the confidence interval of the mean of the exponential? Until Stan rescues us with the RIGHT answer, I think I will explore your idea. Hopefully this will trigger somebody else to think this through. Would be cool to develop a sample size calculator for all distributions. The standard ones are binomial and normal but other distributions exist with some frequency. Thanks.

0
#175996

Bower Chiel
Participant

Hi Darth
An interesting question! Some thoughts:
Suppose that you wish to test the null hypothesis that the mean of an exponential distribution is 80 against the alternative that it is less than 80. With a “large” sample the central limit theorem kicks in and we could use a z-test and would reject the null in favour of the alternative, at the 5% level of significance, if the z test statistic was less than -1.64. Since the mean and standard deviation of the exponential distribution are equal we could use 80 as the standard deviation in calculating z.
Suppose now that we wish to have power of 90% to detect a reduction in the mean from 80 to 75 with a test performed at the 5% level of significance. If the mean is actually 75 then so will be the standard deviation so if we perform a z-test power calculation we can use 75 for the standard deviation. This yielded n = 1927.
In order to see if this was sensible I carried out some simulation in Minitab. With sample size 1927 the critical region for the test of Z
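The z-test sample size quoted in this post can be checked with a few lines of Python. This is only a sketch under the post's own assumptions: a one-sided test at alpha = 0.05, power 0.90, a shift of 5 in the mean, and a standard deviation of 75 taken equal to the alternative mean. The function name `z_test_sample_size` is made up for illustration.

```python
import math
from statistics import NormalDist

def z_test_sample_size(mean_alt, delta, alpha=0.05, power=0.90):
    """One-sided z-test sample size, with sigma set equal to the alternative mean."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)       # about 1.282 for power = 0.90
    sigma = mean_alt  # exponential: the standard deviation equals the mean
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n = z_test_sample_size(mean_alt=75, delta=5)
print(n)  # 1927
```

Because sigma scales with the mean, the answer depends only on the ratio delta/mean, which is why the sample size looks large for a shift of roughly 6%.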

0
#175997

Darth
Participant

Bower,
Thanks for the continuing dialog.  But here is the kicker and misconception of the CLT.  CLT refers to the distribution of sample means not a single sample.  As the number of sample means increases, the average of the sample means X doublebar approaches the average of the population as well as approaches normality.  A single large sample from a non normal population will just replicate the non normal distribution, not become normal.  Our traditional sample size calculators work under the assumption of normality of the underlying population and are derived from the relevant confidence interval formulas.  I stated at the beginning that I want to just capture a single random sample not multiple samples.  Given all the purported stat experts posting on this site, it is amazing that all of a sudden they have gone quiet.  Thanks for the continuing effort.

0
#175998

melvin
Participant

Couldn’t you use bootstrapping and draw a large number of secondary samples from your original sample to build the CLT distribution and hence get the mean and confidence interval?
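A rough sketch of that bootstrap idea in Python, assuming a percentile interval for the mean. The data here are simulated from an exponential with mean 80 purely as a stand-in; the function name `bootstrap_mean_ci`, the seeds, and the number of resamples are all illustrative choices, not anything from the thread.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_boot=2000, conf=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the original sample with replacement and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - conf) / 2 * n_boot)
    hi_idx = int((1 + conf) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical data: 200 draws from an exponential with mean 80
data_rng = random.Random(42)
sample = [data_rng.expovariate(1 / 80) for _ in range(200)]

lo, hi = bootstrap_mean_ci(sample)
print(round(lo, 1), round(statistics.fmean(sample), 1), round(hi, 1))
```

Note that this only needs one physical sample; the "secondary samples" are drawn by the computer from the data already collected.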

0
#175999

Vallee
Participant

Darth,
Unless it’s posts on keeping Robert S. in line, I find that most substantial posts started on Friday are usually not responded to as often.  Especially, note that Robert Butler has not replied… at least publicly.
I confess that most of my drill down (back to the fundamentals) in stats was in grad school with most of it covering the normal distro. Most of the time now, it’s just using the basics in use and explanation.
With that said, I found that formulas used in simulation models in industrial engineering courses covered many of these topics with time based scenarios. Often using the formulas geared to if the mean of the population is not known.  Here is also a link that might help that focuses on using the median and also discusses bootstrap with exponential distro’s. I like the idea of using the median which does not require as many samples as the bootstrap method.
Hope this helps,
HF Chris Vallee

0
#176002

Darth
Participant

Bob,
Bootstrapping and CLT are intended to create a normal distribution of sample means.  I already know from prior data that the distribution is exponential and the estimated mean and s.d., which of course are approximately equal.   I just want to take a single sample of size n to approximate the confidence intervals for the unknown population mean which is exponential.  I would have to take a good number of samples which defeats the purpose of the single sample.  Possibly I could use the confidence interval formulas for the exponential that are estimated in Minitab and solve for n which is how we get the standard sample size formulas for continuous data that is assumed to be normal.  Thought about doing some transformations but not sure how well that will work.  Challenging question isn’t it?  Surprised nobody has developed sample size calculations for distributions other than the standard normal and binomial.

0
#176003

Darth
Participant

Possibly I need to provide more clarification. I have about 5,000 documents containing financial information. Because of that, the distribution of the financial information bumps against a natural boundary of 0. There are historical indications that the distribution is exponential and I have some estimate of the mean and thus the s.d.. I want to draw a sample of size n from that “population” and make some inference about the specific financial characteristic with a power of B, and confidence of A. I don’t want to draw multiple small samples and hope the CLT works. Once I draw the appropriate sample size n, I can use Minitab and calculate the CI of the various parameters using the exponential. This can be found under some right censoring function for the exponential. Key is what sample size to see the desired precision, alpha and beta. Hope that helps.

0
#176005

Robert S
Member

Chris, keeping me in line? Just what is it you feel I have done that is out of line?

0
#176006

Vallee
Participant

Darth,
Knowing that your median for the historical documents would be more representative, why not use it for the sample size calculation instead of the mean? It would at least limit the number of samples required for bootstrap.
HF Chris Vallee

0
#176008

Vallee
Participant

Robert S/ AKA Brandon,
Put the boxing gloves back in the drawer and read the message for what it is… what posts received lots of attention recently on weekends? Yours. Darth’s original post was a well placed question with a small number of responses… thus my comment.
Since I have already reduced the quality of Darth’s original post by my comment of what gets weekend answers, I will just answer your question and leave it alone. While all of us have put our foot in our mouth at one time or another, there has been humility and lessons learned.  From Brandon to Robert S. I have seen none of that. Get the chip off of your shoulder, do some introspective review and see how to recover…. a wise man once told me (when I was in the Chief’s office), to pick your fights… not everything is a battle or worth fighting at the time.
HF Chris Vallee

0
#176010

Robert S
Member

Chris, there is merit to what you say however I will always fight when under personal attack for unfounded reasons.
I posted as Brandon for likely similar reasons Stan and Darth and HB^2 and others post under other than their real names. I changed that.
Second, the post of yours I questioned said the extraneous posts were “to keep me in line”. You therefore must be saying I was out of line. You didn’t say the posts between Stan & I were extraneous…an opinion which I could understand you holding.
I participate in the same manner I do here in the forum on lean.org and have NEVER had to deal with attacks like those I have had to endure from Stan. This garbage is about Stan – not me.

0
#176013

Szentannai
Member

Hi,
looking at the exponential distribution in Wikipedia it is pretty clear that there is no clear-cut formula for the sample size in the case of an exponential, but a pragmatic approach might work in Excel.
Taking a confidence of 95% I did the following:
1. I created a column with numbers 1 to 100.
2. A second column with the values of the function CHIINV(0,025, 2*n) where n is the integer from column one.
3. A third column with the values CHIINV(0,925, 2*n)
4. A fourth column with the values 2*n*(1/column3 – 1/column2)
Now, when I have an estimated mean value for my exponential distribution I can just multiply the values from the 4th column with this value and it will give something like the precision of the estimate (the width of the confidence interval). Then I can go and pick the n number that gives me the required precision.
A quick warning – the point estimate is NOT in the middle of the confidence interval, so the width is not so nicely related to the precision of the measurement as in the normal case, but it is still better than nothing.
If you need a different confidence (like 90% or 99%) you need of course to change the parameter of the CHIINV function accordingly.
Hope this helps.
Regards
Sandor

0
#176014

?
Participant

Hi Stan.
Are you going to help us all out with your answer ?

0
#176018

Szentannai
Member

Ooops,
the function in point 3 should be CHIINV(0,975, 2*n) not CHIINV(0,925, 2*n)!!
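For anyone without Excel, the four-column table (using 0.975 in step 3) can be sketched in Python. Assumptions on my side: Excel's CHIINV takes an upper-tail probability, so CHIINV(0,025, 2*n) is the larger quantile, and the width is computed so that it comes out positive. Because the degrees of freedom 2*n are always even, the chi-squared CDF can be written exactly as a Poisson sum, so no statistics library is needed. The helper names and the example numbers (estimated mean 80, target width 40) are made up for illustration.

```python
import math

def chi2_cdf_even(x, n):
    """P(chi-square with 2n degrees of freedom <= x), exact for even df."""
    if x <= 0:
        return 0.0
    lam = x / 2
    log_lam = math.log(lam)
    # Identity: P(chi2_{2n} <= x) = 1 - P(Poisson(x/2) <= n - 1)
    tail = sum(math.exp(-lam + k * log_lam - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail

def chi2_ppf_even(p, n):
    """Lower-tail quantile of chi-square with 2n df, found by bisection."""
    lo, hi = 0.0, 2 * n + 20 * math.sqrt(2 * n) + 50
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def relative_width(n, alpha=0.05):
    """Width of the confidence interval for the mean, as a multiple of the sample mean."""
    upper_q = chi2_ppf_even(1 - alpha / 2, n)  # Excel: CHIINV(alpha/2, 2*n)
    lower_q = chi2_ppf_even(alpha / 2, n)      # Excel: CHIINV(1 - alpha/2, 2*n)
    return 2 * n * (1 / lower_q - 1 / upper_q)

def smallest_n(target_width, est_mean, max_n=100):
    """First n in 1..max_n whose expected interval width meets the target."""
    for n in range(1, max_n + 1):
        if est_mean * relative_width(n) <= target_width:
            return n
    return None  # target not reachable within max_n rows

# Hypothetical inputs: estimated mean 80, desired interval width 40
print(smallest_n(target_width=40, est_mean=80))
```

With only 100 rows the table cannot deliver a width much below about 40% of the mean, so a tighter precision needs a larger `max_n`.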

0
#176019

Anonymous
Participant

Darth,
What are you trying to determine? Confidence Interval for Mean? Power for One-Sample t? Two-Sample t? ANOVA? Factorial DOE? Equal Variance? Are you comparing the exponential scale parameter (lambda)?
Are you able to transform the data to normality using Box-Cox or Johnson?

0
#176021

melvin
Participant

So to make sure I understand the question.
1. You are looking to find the sample size you need to give you a confidence interval for the population mean based on your chosen alpha/beta values?
2. What you want is the equation for exponential distributions so you can experiment a bit with the values before you actually draw your sample?
If so, I think Forrest Breyfogle covers this in his books, will take a look when I get home.
Bob

0
#176022

Mikel
Member

Booby,You represent a business on here. You sound like a crybaby. Can you

0
#176032

Darth
Participant

Bob,
That’s about it. Minitab calculates a CI for parameters of an exponential set of data under Stat/Reliability Survival/Distribution Analysis (Right censoring)/parametric distribution analysis. I am able to find the formula that she uses to calculate the CI for the mean. It contains an n variable and if someone with better algebra skills than I can solve for n in that equation it appears that this might work. Of course, it only covers alpha but not beta but is better than nothing. Will be interesting to see what Breyfogle comes up with. Thanks.

0
#176035

Taylor
Participant

Darth, Did you select the PDF link?
At this point I have lost interest, Especially since Stan and Robert have once again turned to Peeing on each other
My only suggestion at this point is to use the FORCE and pull out a SWAG and hope for the best………

0
#176036

Darth
Participant

Sandor,
I am setting up the spreadsheet and trying to understand what is going on. Will be back with my thoughts. Thanks.

0
#176037

Darth
Participant

Yes, it appears very interesting from the title. I am slogging my way through the Greek and trying to see if something practical is contained within. Thanks for searching it out. I guess we called Stan’s bluff on this one and he would rather pee on Robert than answer the question. Oh well.

0
#176039

Robert S
Member

Chad, I have not and will not respond to Stan….so we aren’t “peeing on each other again”. Anybody here capable of tracking reality?

0
#176040

Vallee
Participant

Chad and Darth,
Blame me for stirring the nest again… I knew better. As far as the original post, I am still wondering why the median cannot be used instead of the mean? While not optimal it is less susceptible to skewness as opposed to the mean. Also since I got you on the post Chad, did you get the answer you needed on DuPont’s Stop application? Never got a reply on that one.
HF Chris Vallee

0
#176041

Sloan
Participant

[A quick warning – the point estimate is NOT in the middle of the confidence interval, so the width is not so nicely related to the precision of the measurement as in the normal case, but it is still better then nothing.]
I’m not a statistician or a mathematician, far from it in fact, but intuitively, wouldn’t you expect the “shape” of the confidence interval to be somewhat similar to the shape of the exponential curve? And therefore is it a surprise that the point estimate is not in the middle of the confidence interval?
Just thinking out loud and I could be completely off base in my thinking.
It happens a lot.

0
#176042

Darth
Participant

It is a given that you are usually off base but this time you are correct in the fact that the CI will not be symmetrical much like what happens with the F or Chi square and other skewed distributions. So, what is KL going to do with his latest toy? Should be some more action on the work front, eh???

0
#176043

Darth
Participant

Nothing inherently wrong with the median. Have you figured out how to identify an appropriate sample size to satisfy desired alpha and possibly beta?

0
#176044

Taylor
Participant

Chris-DuPont STOP-Yes, and we are moving along nicely with the program. We injected some common sense with the upper VP’s and they agreed. And to date it has made a positive impact within the company. As with any program of this nature, maintaining the momentum is key, so time will tell if the “Drivers” can actually “Drive”.
I read one of your posts the other day, and found it very interesting how you’re applying Sigma tools to Safety. I may want to contact you in the future about Process Safety Analysis and Job Safety Analysis. I don’t feel the group as a whole has a grasp on the functions of these systems. I have your Email so no need to post again. Thanks for the follow up.
As for Rob and Stan, Guys it’s insanely funny, and without the differences in opinion this site would be extremely boring. I respect the both of you and that’s why I brought it up, why not throw some more fuel on the fire, hell it’s Monday, and boring around here.
I remember the days when poor Mario was the subject of Stan’s affection, “Mario Honey” this and “Mario Honey” that, boy those were the days…………..

0
#176045

Robert S
Member

Chad, really? You find Stan insulting people and posting to them in a demeaning manner hilariously funny? Wow, some sense of humor. You must enjoy American Idol then, huh? To each his own.

0
#176046

Sloan
Participant

Hurray me! There’s a cliche about blind pigs and truffles that seems appropriate here.
KL does indeed like shiny expensive things. He’s a lot like Mrs. Outlier in that regard. “…But I got it on sale and look how much money I saved!”
ME: “Yes dear, but do we really need a \$300 cheese straightener?”
SHE: “You just can’t appreciate real value.”
I just hope this shopping spree keeps us busy. It has in the past. I gotta pay for this stuff somehow.

0
#176047

Taylor
Participant

Robert
Exactly. Our world has become so ridden with people that have thin skin and think that everyone is supposed to be nicey nicey to each other it makes me want to hurl. If you can’t stand the heat, stay out of the kitchen, or if you can, put your big girl panties on and deal with it. What I find hilariously funny is that you take such offense to someone that, as you have stated, “Doesn’t even know me”.

0
#176049

melvin
Participant

Afraid Forrest does not go into the CI for exponential distributions. Here is the link to a worked example on google books, not worked through the maths, might help?

0
#176052

Stevo
Member

Darth,

I almost have it figured out.  It involves time travel, but first I have to iron out a wrinkle in the time/space continuum.

You can find this and many other theorems in my book Stevo’s big book of Who Cares?

Stevo

Ps. Sorry, sorry. I’m just lashing out because these topics make me feel that maybe my 3rd grade education was inadequate.

Pss.  Is the answer jelly donut?

0
#176053

Darth
Participant

And how can we ever forget Joe BB who Stan outed.  There were enough Joe Honeys that I started to wonder about Stan.

0
#176054

Darth
Participant

Mini does the CI for an exponential so that isn’t an issue.  I was even able to replicate her output.  Thanks for the effort.

0
#176056

Darth
Participant

Don’t worry Stevo.  You serve your purpose on this Forum and do a darn fine job of it.  Although the answer might be jelly donut, we are awaiting Stan’s official answer so we can close this thread down.

0
#176058

Darth
Participant

Chad, I just tried reading the article you linked me to.  Now I know how Stevo feels with normal posts.  I didn’t understand a single thing the article said.  I tried to get to the end where the author would say, “Now do this…..”  Considering I failed calculus twice in college, most of those symbols and letters were Greek to me….wait, they were Greek, go figure.  I will continue on my quest.

0
#176059

Taylor
Participant

Geez, we must be related, I too failed Calculus, not twice, but nonetheless………I thought it was Egyptian hieroglyphics, OH well, shows you what I know

0
#176060

Robert Butler
Participant

I didn’t realize until awhile ago that this thread really had a serious question as its point of origin.
You can run a power and sample size calculation for the Wilcoxon-Mann-Whitney test for two independent groups.  I do know that this capability exists in SAS 9.2.  Part of the program is the specification of the underlying distribution and one of the options is exponential.  In the SAS documentation they state:
“Note that the O’Brien-Castelloe approach to computing power for the Wilcoxon test is approximate, based on asymptotic behavior as the total sample size gets large. The quality of the power approximation degrades for small sample sizes; conversely, the quality of the sample size approximation degrades if the two distributions are far apart, so that only a small sample is needed to detect a significant difference. But this degradation is rarely a problem in practical situations, in which experiments are usually performed for relatively close distributions.”
Unfortunately, I don’t have ready access to that reference so there isn’t much else I can provide at the moment.

0
#176061

Bower Chiel
Participant

Hi Darth
Two scenarios concerning light sabres whose lives are believed to follow an exponential distribution:
Scenario 1
We are interested in the new Sc1 light sabre and wish to estimate mean life with 95% confidence (alpha = 0.05) using a sample of size n. We are also interested in having a confidence interval of width 10 days, say. Having specified alpha and a “precision” (width) for the interval estimate we turn to statistical theory to help us pin down a suitable value for n.
Scenario 2
The old Sc2 sabre has been around for a while and operates with a mean life of 80 days. An improvement initiative has been looking at improving its life. An increase of 20 days in the mean is deemed worthwhile. We plan to take a sample of size n of the prototype Sc2A model to formally test the null hypothesis that mean life is still 80 against the alternative hypothesis that the mean life is greater than 80, with significance level 5% (alpha = 0.05). In addition we stipulate that we’d like to be pretty certain to pick up an increase in the mean of 20 days – let’s say 90% certain (power = 0.90). The probability of failing to pick up such an increase would then be 10% (beta = 0.10). Having specified alpha and beta for the test of hypotheses we again turn to statistical theory to help us pin down a suitable value for n.
I feel that in the discussion the two scenarios have been confused at times.
Best Wishes
Bower Chiel
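Scenario 2 can be worked through numerically without leaning on the CLT. The sketch below assumes the standard exact test for an exponential mean: under the null, 2*n*xbar/80 follows a chi-squared distribution with 2n degrees of freedom, so the null is rejected when that statistic exceeds the upper 5% point. The power at a true mean of 100 is then a chi-squared tail probability, and n is found by stepping upwards. The helper names are illustrative, and the even-df Poisson-sum form of the CDF is used only to avoid a statistics library.

```python
import math

def chi2_cdf_even(x, n):
    """P(chi-square with 2n degrees of freedom <= x), exact for even df."""
    if x <= 0:
        return 0.0
    lam = x / 2
    log_lam = math.log(lam)
    # Identity: P(chi2_{2n} <= x) = 1 - P(Poisson(x/2) <= n - 1)
    tail = sum(math.exp(-lam + k * log_lam - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail

def chi2_ppf_even(p, n):
    """Lower-tail quantile of chi-square with 2n df, found by bisection."""
    lo, hi = 0.0, 2 * n + 20 * math.sqrt(2 * n) + 50
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_power(n, mean0, mean1, alpha=0.05):
    """Power of the one-sided test H0: mean = mean0 against H1: mean > mean0."""
    crit = chi2_ppf_even(1 - alpha, n)  # reject when 2*n*xbar/mean0 > crit
    # Under H1, 2*n*xbar/mean1 is chi-square(2n), so rescale the critical value
    return 1.0 - chi2_cdf_even(crit * mean0 / mean1, n)

def sample_size(mean0, mean1, alpha=0.05, power=0.90, max_n=5000):
    """Smallest n (up to max_n) whose exact power reaches the target."""
    for n in range(1, max_n + 1):
        if exact_power(n, mean0, mean1, alpha) >= power:
            return n
    return None

n = sample_size(mean0=80, mean1=100)
print(n, round(exact_power(n, 80, 100), 4))
```

Scenario 1 is a different calculation: it fixes alpha and an interval width rather than alpha and beta, so it needs an estimate of the mean before n can be pinned down.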

0
#176070

Markert
Participant

What is the process ?????
Without a process this is meaningless babble.

0
#176071

Szentannai
Member

Hi,
once you know the data is exponentially distributed what else would you need to make this more meaningful and less of a babble ?
IMHO all the relevant information is already there in the statement “exponentially distributed”
Regards
Sandor

0
#176077

Darth
Participant

Data is financial in nature and thus bumps against a natural boundary of 0.  Shape is not normal and while a number of skewed distributions fit, exponential seems to work.  Two questions are out there; first, how would I calculate an appropriate sample size to do some inference estimating and secondly what would be an appropriate sample size for a one sample or two sample hypothesis test.  Here are recommendations so far:
1. For inference just grab a sample and calculate the confidence interval. An excel approach was suggested that lets me back into a sample size based on the estimated mean and desired alpha and the formula for the ci of the exponential. Was hoping to find a simple approach to let me calculate desired sample size directly if possible.
2. For hypothesis testing we can transform the data and then use standard alpha/power/sd calculations in Minitab, remembering to transform the precision as well using the same transformation.
3. For hypothesis testing we can use a non parametric test and thus the medians. In this case, we have to find a comparable alpha/power/sample size calculation. No one seemed to have one at hand. I did link to a simple program called Studysize that seems to contain everything.

0
#176079

Lke Skywalker
Participant

Can’t you leverage the mean being equal to the variance for the exponential and work with an F or Chi-square distribution to back calculate what you want from more easily defined confidence interval formulas? Just a thought.

0
#176080

Darth
Participant

Luke, I saw that somewhere but haven’t gotten a chance to think it through.  Thanks for bringing it to my attention.  It could be an easier solution.

0
#176081

luke skywalker
Participant

I know it’s in the course material here these days (CI for variances, etc.). I’ll e-mail the nuts and bolts to you – just let me know if it works for you.

0
#176087

Taylor
Participant

I hope this works for Stan’s and Robert’s sake

0
#176089

Mikel
Member

Luke has it right. CI for exponential is derived using Chi-Squared.
Sample size for variance and exponential will be the same.
Now if Darth just knew how to derive sample size for variance.
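For reference, the two chi-squared results being compared here can be written out as follows. This is the standard textbook form, not something derived in the thread: θ denotes the exponential mean, and χ²_{p,ν} is taken to be the lower-tail p quantile with ν degrees of freedom (a notational assumption).

```latex
% Exponential mean: X_1, ..., X_n i.i.d. exponential with mean \theta
\frac{2\sum_{i=1}^{n} X_i}{\theta} = \frac{2 n \bar{X}}{\theta} \sim \chi^2_{2n}
\quad\Longrightarrow\quad
\frac{2 n \bar{X}}{\chi^2_{1-\alpha/2,\,2n}} \;\le\; \theta \;\le\; \frac{2 n \bar{X}}{\chi^2_{\alpha/2,\,2n}}

% Normal variance: X_1, ..., X_n i.i.d. normal with variance \sigma^2
\frac{(n-1)\, s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
\frac{(n-1)\, s^2}{\chi^2_{1-\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)\, s^2}{\chi^2_{\alpha/2,\,n-1}}
```

The two intervals have the same shape, but the degrees of freedom differ (2n against n - 1), so "the same" seems to hold for the method rather than for the resulting n.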

0
#176094

Bower Chiel
Participant

Hi Darth
The Excel formulae for the 95% confidence limits for the mean of an exponential distribution involve taking the sample mean and multiplying by the factor 2*A2/CHIINV(0.025,2*A2), where cell A2 holds the sample size, to get the lower limit and by the factor 2*A2/CHIINV(0.975,2*A2) to get the upper limit. (With 95% confidence we have alpha = 0.05 so the first arguments in the CHIINV formulae are alpha/2 and 1-alpha/2.) For sample size 500, for example, the factors are 0.918 and 1.094 respectively. For sample size 2000 the factors are 0.958 and 1.046 respectively. Thus, in broad terms, with a sample size of 500 the 95% lower confidence limit for the population mean is 92% of the sample mean and the upper limit is 109% of the sample mean. With sample size 2000 the 95% lower confidence limit for the population mean is 96% of the sample mean and the upper limit is 105% of the sample mean.
The large sample 95% confidence interval for a population mean based on the CLT is xbar plus/minus 1.96sigma/root(n). We don’t know the population sigma but with an exponential distribution we do know that it equals the population mean which gives xbar plus/minus 1.96xbar/root(n). The xbar is a common factor so with this approach the lower 95% limit is the sample mean multiplied by the factor 1-1.96/root(n) and the upper limit is the sample mean multiplied by the factor 1+1.96/root(n). For n = 500 the factors are 0.912 and 1.088 respectively and for n = 2000 the factors are 0.956 and 1.044. Thus the performance of the CLT approach is getting closer to that of the exact chi-squared approach as sample size increases. Please note that the CLT approach involves taking a single sample – I think that explanations of the CLT involving reference to repeated sampling from a population may cloud the issue of its application in practice.
I’ve created an Excel spreadsheet that computes and graphs the factors which I’m happy to pass on to anyone who sends an e-mail to [email protected] .  It’s not perfect as the CHIINV formula in Excel 2000 fails to return a value when the second argument reaches 1368 (!) so I had to use a well-documented normal approximation to the chi-squared distribution.  Perhaps someone could modify it in a later version of Excel?
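The two sets of factors can also be reproduced outside Excel, which sidesteps the CHIINV limit. The Python sketch below is only an illustration: it assumes CHIINV's first argument is an upper-tail probability and evaluates the chi-squared CDF through the Poisson-sum identity that holds when the degrees of freedom are even (always true for 2n). The function names are made up.

```python
import math

def chi2_cdf_even(x, n):
    """P(chi-square with 2n degrees of freedom <= x), exact for even df."""
    if x <= 0:
        return 0.0
    lam = x / 2
    log_lam = math.log(lam)
    # Identity: P(chi2_{2n} <= x) = 1 - P(Poisson(x/2) <= n - 1)
    tail = sum(math.exp(-lam + k * log_lam - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail

def chi2_ppf_even(p, n):
    """Lower-tail quantile of chi-square with 2n df, found by bisection."""
    lo, hi = 0.0, 2 * n + 20 * math.sqrt(2 * n) + 50
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_factors(n, alpha=0.05):
    """Multipliers of the sample mean giving the chi-squared confidence limits."""
    lower = 2 * n / chi2_ppf_even(1 - alpha / 2, n)  # Excel: 2*n/CHIINV(alpha/2, 2*n)
    upper = 2 * n / chi2_ppf_even(alpha / 2, n)      # Excel: 2*n/CHIINV(1-alpha/2, 2*n)
    return lower, upper

def clt_factors(n, z=1.96):
    """Multipliers from the CLT interval xbar +/- z*xbar/root(n)."""
    half = z / math.sqrt(n)
    return 1 - half, 1 + half

for n in (500, 2000):
    ex_lo, ex_hi = exact_factors(n)
    cl_lo, cl_hi = clt_factors(n)
    print(n, round(ex_lo, 3), round(ex_hi, 3), round(cl_lo, 3), round(cl_hi, 3))
```

The printed values should land close to the factors quoted in this post, though small differences in the third decimal are possible depending on the approximation used in the spreadsheet.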
I’m puzzled by the last poster’s comment on this thread that sample size for exponential and variance will be the same.  Perhaps he could clarify.
Best Wishes
Bower Chiel

0
#176113

Darth
Participant

Bower,
Just glanced over what you suggested. Looks good and as soon as I finish my bottle of tequila I will dive into it. Looks similar to what Sandor recommended in a previous post. Thanks for the efforts. Glad to see we had a relatively long thread which actually provided some technical debate with only a minimum of silliness . It can be done. Thanks to all. Now let’s get back to trashing newbies and anybody that Stan knows :-).

0