Central limit theorem average of averages
 This topic has 9 replies, 7 voices, and was last updated 13 years, 5 months ago by So Not Stan.


February 27, 2009 at 4:30 pm #51928
erickson (Participant, @chiporip)
I have a CLT explanation problem. My data is continuous time data: each record is an event with a duration, and the number of events per day varies across several processes. The data collection system logs only the average event time per day for each process, so I have 365 daily averages for each of several processes. These distributions are nonnormal. Can I use standard hypothesis testing methods to determine differences, or are nonparametric tests the only option short of a transformation? I can’t determine whether the CLT applies here or whether averaging averages negates it.
Mark

March 2, 2009 at 9:23 am #181843
Hi,
Before you proceed further, please check whether all the events are of the same nature. If not, you may want to analyze them separately.
Thanks

March 2, 2009 at 4:19 pm #181861
Chip…please clarify…
(1) Are you looking at 365 data points, each averaged across one full day?
(2) Which distribution is nonnormal? The individual data points? Or the averages?
Typically, if you were averaging more than about 30 individual data points per day, the distribution of the averages would be approximately normal due to the CLT.
Obiwan

March 2, 2009 at 4:35 pm #181862
Mark:
Approximately how many events occur each day?
Are these data from a process where consecutive events take different times, like an assembly line, or from independent random events, like call length at an incoming call centre?
Cheers, Alastair

March 2, 2009 at 8:02 pm #181870
erickson (Participant, @chiporip)
Thank you for the responses; I’ll try to clarify. The events are the same in nature, but they vary in duration, occur randomly during the day, and are independent. I do not get the time in minutes for each individual event, only the daily average of events for each process. There can be 20 to 100 events per day. The table below may make more sense. The day column would continue to 365, and the number of events per day varies. The trick is that it is difficult to determine the actual number of events per day (manual data collection from another system), say for a weighted average approach. But I do get the daily average by process by day. The business metric is to make sure the average does not exceed spec, but we are educating people on how poor a process may be even though the average of averages meets a spec.
process  day  event  daily avg. in minutes
1        1    1      xbar(event 1,2)
1        1    2
2        1    3      xbar(event 3,4)
2        1    4
1        2    5      xbar(event 5,6)
1        2    6
2        2    7      xbar(event 7,8)
2        2    8

March 2, 2009 at 10:54 pm #181875
Why 30?
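
On the question of why 30: the figure is a rule of thumb rather than a threshold. The skewness of an average of n independent observations is the skewness of a single observation divided by the square root of n, so how many events per day are "enough" depends on how skewed the individual event times are. The following is a minimal Python sketch of that effect, assuming lognormal event times with illustrative parameters; none of the numbers come from the poster's data.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean of the cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def daily_mean_skewness(events_per_day, days=4000, seed=1):
    """Skewness of simulated daily averages of lognormal event times.

    The lognormal parameters (mu=0, sigma=1) are an assumption chosen
    to give strongly right-skewed event times.
    """
    rng = random.Random(seed)
    daily_means = []
    for _ in range(days):
        events = [rng.lognormvariate(0.0, 1.0) for _ in range(events_per_day)]
        daily_means.append(statistics.fmean(events))
    return skewness(daily_means)


# Compare one event per day (raw data) with 5, 30 and 100 events per day.
results = {n: daily_mean_skewness(n) for n in (1, 5, 30, 100)}
for n, skew in results.items():
    print(f"{n:>3} events/day -> skewness of daily averages = {skew:.2f}")
```

With event times this skewed, the daily averages still carry visible right skew at 30 events per day; with milder skew, far fewer events would do.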

March 2, 2009 at 11:37 pm #181877
Mark:
Taking the average of averages can be used to hide all kinds of problems.
Use Minitab to generate some test data assuming both a normal distribution and a lognormal distribution (very common for financial and time-on-task situations). Fiddle the parameters so the average is the same for both distributions. Now use these two datasets to count the individual out-of-spec events versus the number of ‘out of spec’ days based on the daily averages.
Compare the two numbers for the normal data set and the lognormal data set. You should be able to show the group why they should not be using the average of the averages for making decisions.
Customers do not usually make decisions based on daily averages, but on individual transactions.
Cheers, Alastair
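
The experiment Alastair describes can be sketched in Python instead of Minitab. This is only an illustration under assumed numbers: the spec limit, the shared mean, the spreads, and the 50 events per day are all hypothetical, picked so that both distributions have the same average.

```python
import math
import random
import statistics

SPEC_LIMIT = 15.0    # hypothetical upper spec on event time, in minutes
TARGET_MEAN = 10.0   # both distributions are tuned to this mean
EVENTS_PER_DAY = 50  # hypothetical; the thread reports 20 to 100
DAYS = 365

# Lognormal parameters chosen so that exp(MU + SIGMA**2 / 2) == TARGET_MEAN.
SIGMA = 1.0
MU = math.log(TARGET_MEAN) - SIGMA ** 2 / 2


def draw_normal(rng):
    # Standard deviation of 2 minutes is an assumption.
    return rng.gauss(TARGET_MEAN, 2.0)


def draw_lognormal(rng):
    return rng.lognormvariate(MU, SIGMA)


def simulate(draw, rng):
    """Return (share of events over spec, share of daily averages over spec)."""
    events_over = 0
    days_over = 0
    for _ in range(DAYS):
        events = [draw(rng) for _ in range(EVENTS_PER_DAY)]
        events_over += sum(e > SPEC_LIMIT for e in events)
        if statistics.fmean(events) > SPEC_LIMIT:
            days_over += 1
    return events_over / (DAYS * EVENTS_PER_DAY), days_over / DAYS


rng = random.Random(2009)
normal_events, normal_days = simulate(draw_normal, rng)
lognormal_events, lognormal_days = simulate(draw_lognormal, rng)

print(f"normal:    {normal_events:.1%} of events, {normal_days:.1%} of days over spec")
print(f"lognormal: {lognormal_events:.1%} of events, {lognormal_days:.1%} of days over spec")
```

Under these assumptions the two processes share the same average, yet the lognormal one sends a far larger share of individual events over the limit than its daily averages suggest.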

March 3, 2009 at 8:38 am #181886
Don’t check normality if you have grouped different processes with different events; it is misleading. Try to cluster the same processes together and see if the data is normal. If not, try a transformation if it is heavily skewed. If you see a trend, cycle, or shift and the data is not that skewed, try nonparametric hypothesis tests; these will still give you conclusive results provided you have large sample sizes.

March 3, 2009 at 11:24 am #181890
erickson (Participant, @chiporip)
Dr. J, now you are on point: the data transforms poorly and each process is nonnormal. I used Mood’s median test to analyze the data, but I was hoping to determine whether the CLT would hold with my large sample sizes so I could use ANOVA and Tukey’s to analyze the data. I was hoping my daily averages would act like a random sample for that day, but I don’t think that is the intent of the CLT.
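
One way to probe whether the large sample size rescues a mean-based test is to simulate it. The sketch below assumes the 365 daily averages of each process are independent from day to day (a trend or autocorrelation would break this) and draws them from the same skewed lognormal, so any "significant" difference is a false alarm. A two-group Welch statistic stands in for ANOVA here, and the distribution parameters are illustrative only.

```python
import math
import random
import statistics


def welch_statistic(a, b):
    """Welch's statistic for the difference in means of two samples."""
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se_a + se_b)


def false_positive_rate(n_days=365, trials=400, seed=7):
    """Share of trials flagged at the 5% level when both groups are identical.

    Daily averages are drawn from a lognormal (mu=0, sigma=0.75), an
    assumed stand-in for skewed daily averages.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.lognormvariate(0.0, 0.75) for _ in range(n_days)]
        b = [rng.lognormvariate(0.0, 0.75) for _ in range(n_days)]
        # 1.96 is the two-sided 5% normal critical value, which closely
        # approximates the t critical value at this sample size.
        if abs(welch_statistic(a, b)) > 1.96:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"false positive rate with skewed daily averages: {rate:.3f}")
```

A rate near the nominal 5% would suggest the CLT is doing its job for the mean of the 365 daily averages, whatever the shape of each day's data. It says nothing about how many individual events exceed spec.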

March 3, 2009 at 11:37 am #181892
So Not Stan (Member, @SoNotStan)
Stan, you are so not helpful in your postings. Maybe you should just stop for awhile. We are not impressed by your feeble attempts to show your importance and inflated ego. Your inability to communicate clearly is only cluttering this forum and is only serving to confuse those asking legitimate questions. If you are going to post a response, and in your response, if you are going to attempt to teach a lesson, then teach the lesson in a clear, concise, understandable manner. “Why 30?” doesn’t help the original poster at all. Those of us with a greater knowledge of statistics can appreciate your words, but I’m sure many more don’t.
If you want to teach, then teach. If you want to criticize without adding value, you should be asked to leave this forum. I’ll be glad to start that process.
SNS