iSixSigma

Central limit theorem average of averages


Viewing 10 posts - 1 through 10 (of 10 total)
  • #51928

    erickson
    Participant

    I have a problem explaining the CLT. My data are continuous time data: each record is an event with an associated time. The number of events per day varies, and the events occur at several different processes. The data collection system logs only the average event time per day for each process, so I have 365 daily averages for each of several processes. These distributions are non-normal. Can I use standard hypothesis tests to detect differences between processes, or are non-parametric methods the only option short of a transformation? I can’t determine whether the CLT applies here or whether averaging averages negates it.
    Mark

    #181843

    MM
    Participant

    Hi,
    Before you proceed further, please check whether all the events are of the same nature. If not, you may want to analyze them separately.
    Thanks

    #181861

    Obiwan
    Participant

    Chip…please clarify…
    (1)  Are you looking at 365 data points, each averaged across one full day?
    (2)  Which distribution is non-normal?  The individual data points? Or the averages?
    Typically, if you are averaging more than about 30 individual data points per day, the distribution of the daily averages will be approximately normal because of the CLT.
    Obiwan
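The rule of thumb above can be checked by simulation. The sketch below is only an illustration under stated assumptions: individual event times are taken to be log-normal (the parameters `mu` and `sigma` are hypothetical, not from the thread), each day has between 20 and 100 events as described later in the thread, and the helper names are made up for this example. It compares the skewness of the individual events with the skewness of the 365 daily averages.

```python
import random
import statistics


def skewness(xs):
    """Third standardized moment (population form) of a list of numbers."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate_daily_averages(days=365, n_min=20, n_max=100,
                            mu=2.0, sigma=0.8, seed=1):
    """Return (all individual event times, list of daily averages).

    Event times are drawn from a log-normal distribution; this is an
    assumption for illustration, not something the thread states.
    """
    rng = random.Random(seed)
    events, daily = [], []
    for _ in range(days):
        n = rng.randint(n_min, n_max)  # events on this day, inclusive bounds
        day = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        events.extend(day)
        daily.append(statistics.fmean(day))
    return events, daily


events, daily = simulate_daily_averages()
print(f"skewness of individual events: {skewness(events):.2f}")
print(f"skewness of daily averages:    {skewness(daily):.2f}")
```

For independent, identically distributed events, the skewness of an average of n events is the skewness of a single event divided by the square root of n, so how large n needs to be depends on how skewed the underlying data are; 30 is a convention rather than a threshold.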

    #181862

    BTDT
    Participant

    Mark: Approximately how many events occur each day? Are these data from a process where consecutive events take different times, like an assembly line, or from independent random events, like call length at an incoming call centre?
    Cheers, Alastair

    #181870

    erickson
    Participant

    Thank you for the responses; I’ll try to clarify. The events are the same in nature, but they vary in duration, occur randomly during the day, and are independent. I do not get the time in minutes for each individual event, only the daily average of the events at each process. There can be 20 to 100 events per day. The table below may make this clearer. The days would continue to 365 and the number of events varies; the trick is that the actual number of events per day is difficult to determine (it requires manual data collection from another system), which makes, say, a weighted-average approach hard to apply. But I do get the daily average by process by day. The business metric is to make sure the average does not exceed the spec, but we are educating people on how poor a process may be even though the average of averages meets the spec.

    process   day   event   daily avg. in minutes
    1         1     1       xbar(event 1,2)
    1         1     2
    2         1     3       xbar(event 3,4)
    2         1     4
    1         2     5       xbar(event 5,6)
    1         2     6
    2         2     7       xbar(event 7,8)
    2         2     8
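To see why the missing event counts matter, here is a minimal sketch comparing the plain average of daily averages with an event-weighted average. The three daily summaries are invented for illustration, and the counts are exactly the piece of information the post says is hard to obtain.

```python
import statistics

# Hypothetical daily summaries for one process:
# (daily average in minutes, number of events that day).
daily = [(12.0, 20), (30.0, 100), (15.0, 40)]

# Average of averages: every day counts equally, whatever its volume.
unweighted = statistics.fmean([avg for avg, _ in daily])

# Event-weighted average: equals the mean over all individual events.
weighted = sum(avg * n for avg, n in daily) / sum(n for _, n in daily)

print(f"average of daily averages: {unweighted:.1f} min")
print(f"event-weighted average:    {weighted:.1f} min")
```

The two figures agree only when every day has the same number of events or when the daily average is unrelated to the daily volume. If busy days also tend to be slow days, the average of averages understates what a typical event experiences.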

    #181875

    Mikel
    Member

    Why 30?

    #181877

    BTDT
    Participant

    Mark: Taking the average of averages can be used to hide all kinds of problems. Use Minitab to generate some test data assuming both a normal distribution and a log-normal distribution (very common for financial and time-on-task situations). Fiddle with the parameters so the average is the same for both distributions. Now use these two datasets to compare the number of individual defects (events out of spec) with the number of ‘out of spec’ days based on the daily averages. Compare the two numbers for the normal dataset and the log-normal dataset. You should be able to show the group why they should not be using the average of the averages for making decisions. Customers do not usually make decisions based on daily averages, but on individual transactions.
    Cheers, Alastair
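The experiment described above can also be sketched outside Minitab. The following Python version is an illustration only: the mean of 10 minutes, the spec limit of 15 minutes, the normal standard deviation of 2, and the log-normal shape parameter of 1.0 are all hypothetical values chosen so that both distributions share the same mean. The function name `simulate` is likewise made up for this example.

```python
import math
import random
import statistics


def simulate(draw, days=365, n_min=20, n_max=100, spec=15.0, seed=7):
    """Return (fraction of events over spec, fraction of days whose average is over spec)."""
    rng = random.Random(seed)
    events_total = events_over = days_over = 0
    for _ in range(days):
        n = rng.randint(n_min, n_max)  # 20 to 100 events per day
        times = [draw(rng) for _ in range(n)]
        events_total += n
        events_over += sum(t > spec for t in times)
        days_over += statistics.fmean(times) > spec
    return events_over / events_total, days_over / days


MEAN = 10.0       # assumed common mean, in minutes
SIGMA_LOG = 1.0   # assumed log-normal shape parameter
# A log-normal has mean exp(mu + sigma^2 / 2); solve for mu so the mean is MEAN.
MU_LOG = math.log(MEAN) - SIGMA_LOG ** 2 / 2

normal = simulate(lambda rng: rng.gauss(MEAN, 2.0))
lognormal = simulate(lambda rng: rng.lognormvariate(MU_LOG, SIGMA_LOG))

for name, (ev, dy) in (("normal", normal), ("log-normal", lognormal)):
    print(f"{name:>10}: {ev:.1%} of events over spec, {dy:.1%} of days over spec")
```

With these assumed parameters the log-normal process has far more individual events over the spec limit than days over it, which is the gap between the daily-average metric and what individual customers experience.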

    #181886

    Dr.J
    Participant

    Don’t check normality if you have grouped different processes with different events; the result is misleading. Try to cluster like processes and see whether each cluster is normal. If not, try a transformation if it’s heavily skewed. If you see a trend, cycle, or shift and the data are not that skewed, try non-parametric hypothesis tests; these will still give you conclusive results provided you have large sample sizes.

    #181890

    erickson
    Participant

    Dr. J, now you are on point: the data transform poorly and each process is non-normal. I used Mood’s median test to analyze the data, but I was hoping to determine whether the CLT would hold with my large sample sizes so that I could use ANOVA and Tukey’s test instead. I was hoping that, under the CLT, my daily averages would act like a random sample for that day, but I don’t think that is the intent of the CLT.
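One way to probe the question above is to simulate it. The sketch below is not ANOVA or Tukey's test; it swaps in a simpler large-sample two-sample z test on the means, which rests on the same CLT argument. It assumes the 365 daily averages for each process are independent from day to day and, as a stand-in for the non-normal daily averages, draws them from a log-normal distribution with hypothetical parameters. Both simulated processes are identical, so the fraction of rejections estimates the false-positive rate, which should sit near the nominal 5% if the CLT is doing its job.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_test_two_means(a, b):
    """Two-sided p-value of a large-sample z test for equal means."""
    se = math.sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


def false_positive_rate(reps=1000, days=365, alpha=0.05, seed=3):
    """Fraction of simulated comparisons that reject although both processes are identical."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Skewed daily averages for two identical processes (assumed log-normal).
        a = [rng.lognormvariate(2.0, 0.6) for _ in range(days)]
        b = [rng.lognormvariate(2.0, 0.6) for _ in range(days)]
        rejections += z_test_two_means(a, b) < alpha
    return rejections / reps


rate = false_positive_rate()
print(f"false-positive rate at alpha = 0.05: {rate:.3f}")
```

In this setup the CLT applies to the mean of the 365 daily averages, not to each daily average treated as a sample of that day. Note also that such a test compares the unweighted mean of daily averages, which is not the same quantity as the mean time per event when daily volumes differ, and that trends or day-to-day correlation would break the independence assumed here.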

    #181892

    So Not Stan
    Member

    Stan, you are so not helpful in your postings. Maybe you should just stop for a while. We are not impressed by your feeble attempts to show your importance and inflated ego. Your inability to communicate clearly is only cluttering this forum and is only serving to confuse those asking legitimate questions. If you are going to post a response, and if in that response you are going to attempt to teach a lesson, then teach the lesson in a clear, concise, understandable manner. “Why 30?” doesn’t help the original poster at all. Those of us with a greater knowledge of statistics can appreciate your words, but I’m sure many more don’t.
    If you want to teach, then teach.  If you want to criticize without adding value, you should be asked to leave this forum.  I’ll be glad to start that process.
    SNS

