iSixSigma

Sample Size Affirmation


Viewing 10 posts - 1 through 10 (of 10 total)
  • #53306

    Doa
    Participant

    Hi To All,
    I’ve combed through many of the previous posts regarding sample size, its importance in hypothesis testing, and the significance of the AD number for discrete data; as a newbie I’m looking for a little SS affirmation. Given the number of related posts, this appears to be a trigger topic.
    Given 18 samples, one can calculate descriptive statistics and run a Minitab normality test, but this will be of little value; any analysis would be of dubious value.
    Rather than even attempt to assess normality and comment on the descriptive statistics, I’m probably better off requesting that more samples be taken (>30), then conducting a normality assessment, running descriptive stats again, and proceeding with further analysis.
    Thanks….
    Marty

    #189568

    Mikel
    Member

    Why 30? We called that kind of number a brown number where I learned this stuff.

    #189579

    MBBinWI
    Participant

    Marty: You first need to decide what you are trying to do with the data that you are gathering via sampling. This will aid in determining how to sample. Next you need to establish the confidence level of the analysis you need. Finally, decide what level of precision is critical to you. The answers to these questions can/should be determined prior to sampling or analysing any data.
    The final item to determine sample size is the distribution and variation of the data itself.  All of these items combine to determine the method and size of sampling.  If you know absolutely nothing about the population, then begin by gathering some data and look at what you have.  18 is certainly enough for a first pass analysis – so look at the descriptive statistics and histogram.  You appear to use Minitab, so plug the info into the sample size calculator and see how many data points are indicated for what you are trying to answer.  Have you exceeded that amount?  If not, then gather that amount and verify again.  If the increased amount of data has now exceeded the level indicated, you can perform your intended analysis.
    Of course, this assumes that you have a stable process, acceptable MSA, and proper sampling methods.
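    MBBinWI’s confidence-and-precision point can be put into a formula: for estimating a mean, the classic approximation is n = (z·σ/E)². A minimal sketch in plain Python (the σ and margin values below are made up for illustration, not taken from Marty’s data):

    ```python
    import math

    # Two-sided critical z-values for common confidence levels (standard constants).
    Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

    def n_for_margin(sigma, margin, confidence=0.95):
        """Samples needed to estimate a mean to within +/- `margin`
        at the given confidence, assuming sigma is roughly known:
        n = (z * sigma / margin)^2, rounded up."""
        z = Z[confidence]
        return math.ceil((z * sigma / margin) ** 2)

    # Example: if sigma were ~5 units of tensile strength and we wanted
    # the mean pinned down to +/- 2 units at 95% confidence:
    print(n_for_margin(5, 2))  # 25
    ```

    Note how the answer moves with precision: tightening the margin from 2 to 1 roughly quadruples the required n, which is exactly the trade-off the sample size calculator in Minitab is exploring.
    
    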
     

    #189580

    Robert Butler
    Participant

    Your statement “Given 18 samples, one can calculate descriptive statistics and run a Minitab normality test, but this will be of little value; any analysis would be of dubious value.
    Rather than even attempt to assess normality and comment on the descriptive statistics, I’m probably better off requesting that more samples be taken (>30), then conducting a normality assessment, running descriptive stats again, and proceeding with further analysis.” is curious in the extreme.
      I know of no book nor peer reviewed paper that would declare 18 samples to be of little value and the analysis of same of dubious value.  Sample size is a function of many things but two very big issues, at least in every industry I’ve worked in, are time and money.  As I’ve mentioned in other posts on this topic – if the cost, in terms of time and/or money, of gathering and assessing a single sample is something other than trivial the odds that anyone is going to allow you even 18 samples is extremely low.   
      As for analysis remember the following: The t-test and ANOVA are robust with respect to non-normality. The t-table at the back of any statistics book I’ve ever examined always starts with n = 2.  There are no asterisks nor caveats in these tables with respect to the number 2 or any other value of n between 2 and 30. A check of the historical literature indicates the t-test is valued  precisely because of the low number of samples needed to run it  and reach meaningful conclusions concerning whatever it is you are investigating. 
    Will your confidence intervals be larger for a small sample than for a larger one? Of course. But if all you can afford are two samples per condition, then you have one of two choices: you can refuse to run any kind of test and make a blind/gut-feel guess, or you can take your two samples, analyze them, and use the information gained to guide you in your efforts.
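    To make Robert’s point concrete, here is a pooled two-sample t-test worked out in plain Python with only three observations per condition. The tensile-strength numbers are hypothetical; the 2.776 critical value is the standard t-table entry for 4 degrees of freedom at the two-sided 5% level:

    ```python
    import math

    def pooled_t(sample_a, sample_b):
        """Two-sample pooled t statistic (equal variances assumed).
        Returns the t statistic and the degrees of freedom."""
        na, nb = len(sample_a), len(sample_b)
        ma = sum(sample_a) / na
        mb = sum(sample_b) / nb
        va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
        vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
        # Pooled variance, then the standard error of the difference in means.
        sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
        return t, na + nb - 2

    # Hypothetical tensile-strength data, three observations per condition.
    t, df = pooled_t([118.0, 121.0, 119.5], [124.0, 126.5, 125.0])
    print(abs(t) > 2.776)  # True: significant even with n = 3 per group
    ```

    The t-table starts at tiny n precisely because the test was built for situations like this; the small sample just shows up as a larger critical value and wider intervals, not as a prohibition on testing.
    
    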

    #189581

    Darth
    Participant

    Marty, first of all, as Robert indicated, a statistically valid sample size depends on a lot of variables. Typically, the sample size calculators only worry about your alpha error. You specifically mentioned hypothesis testing, so you had better worry about beta error as well (power). How much difference or change you want to see will have a big effect. The variation of the population will have an impact too. You seem too concerned with normality. Most tests are pretty robust, and there is always Plan B if the normality doesn’t work out. Obviously larger is better, after you have considered the resources needed to get a bigger sample. The bottom line question has to be: “What are you trying to learn?”

    #189593

    Doa
    Participant

    Hi,
    The rest of the story: I have four different sets of data; #1 is the current process, and #2–#4 are variations of process #1 which I am trying to evaluate in terms of stability, capability, etc., to ultimately assess or recommend improvements to the current process #1.
    Each data set contains the same key output for that process (tensile strength); #1 has 120 samples (40 subgroups with 3 observations each), #2 and #3 have 10 subgroups with 3 observations each, and #4 has 18 total observations.
    The confidence level is 95% and I am using Minitab 15. When assessing each option, my first step is to assess the data or process as stable or not; #4 is stable, as are #2 and #3, while data set #1 is not. My second step is to assess the distribution as normal or not. When assessing option #4 (as well as #1, #2, and #3) I use Stat > Basic Stats > Normality Test, and it returns AD = .374 and P = .378. I also assess the distribution visually by running a histogram with a curve fit; it appears to be rather bi-modal.
    I then run Stat > Basic Stats > Graphical Summary, which confirms the AD and P values. Yet when I run Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Distribution ID Plot, with the estimation method set to Least Squares or Maximum Likelihood, Weibull carries the day with the highest correlation or the smallest AD number.
    Knowing that P < 0.05 indicates a non-normal distribution (not the case here), I visually compare the histogram of this distribution with the other two I am evaluating; one appears to be more “normal,” yet has a smaller or larger AD/correlation number.
    All that said, I have been reading posts here and elsewhere that directly or indirectly caution against evaluating small data sets (<30) and making any decisions on such a small set.
    I may have a case of analysis paralysis; when P > 0.05, why pursue the distribution analysis with right censoring to find a “better” distribution fit? All part of the learning experience, I guess. Given the number of posts I’ve come across, this seems to be a popular subject.
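    For reference, the adjusted Anderson-Darling number that a Minitab normality test reports can be approximated in plain Python. This is only a sketch following the standard case-3 formula (mean and standard deviation estimated from the sample, with the usual small-sample adjustment); the sample data below is made up:

    ```python
    import math

    def phi(z):
        """Standard normal CDF via the error function."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def anderson_darling(data):
        """Adjusted Anderson-Darling A^2 statistic for normality,
        with mu and sigma estimated from the sample itself."""
        n = len(data)
        mean = sum(data) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
        y = sorted((x - mean) / sd for x in data)
        # A^2 = -n - (1/n) * sum over i of (2i-1)[ln F(y_i) + ln(1 - F(y_{n+1-i}))]
        s = sum((2 * i + 1) * (math.log(phi(y[i])) + math.log(1.0 - phi(y[n - 1 - i])))
                for i in range(n))
        a2 = -n - s / n
        # Small-sample adjustment for estimated parameters.
        return a2 * (1 + 0.75 / n + 2.25 / n ** 2)

    # Rough rule from the published tables: an adjusted A^2 above ~0.752
    # rejects normality at the 5% level.
    ```

    This makes the behavior Marty is seeing less mysterious: AD is a single number summarizing distance between the empirical and fitted CDFs, so two visually different histograms can land on similar values at small n.
    
    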
    Thank you for your time…..
    Marty

    #189599

    Darth
    Participant

    Yes, you seem to have analyzed this to death. Your varying sample sizes are a small complication. If you want, send me the data at [email protected] and I will try to take a look and we can take it offline.

    #189601

    Tierradentro
    Participant

    Marty,
    You’ve got three smart guys helping you here…all their advice should be taken into account. I would simply counsel you to back up, go into MTB > Power and Sample Size, and play with the fields to better educate yourself as to the elements involved (i.e., alpha, beta, variance, precision, sample size, analytical method) and their relationship to one another. This will do more for your understanding than regurgitating prior GB/BB lecture material.
    Don’t outrun your headlights on this one. Decide what you need the information to tell you (e.g., Is there a statistical difference between processes 1–4? Is there a practical one? etc.). I always state my critical research questions first, then build my sampling and data collection plan accordingly. Best of luck.
    And if Darth is offering to help you offline, take it! 
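    The alpha/beta/difference relationships John is pointing at can be sketched with the usual two-group approximation, n = 2·((z₁₋α/₂ + z₁₋β)·σ/δ)² per group. The z constants below correspond to α = 0.05 two-sided and 80% power; the σ and δ values are illustrative, not Marty’s:

    ```python
    import math

    Z_ALPHA2 = 1.960  # two-sided alpha = 0.05
    Z_BETA = 0.842    # power = 0.80

    def n_per_group(sigma, delta, z_alpha2=Z_ALPHA2, z_beta=Z_BETA):
        """Approximate n per group to detect a mean difference `delta`
        between two groups with common standard deviation `sigma`:
        n = 2 * ((z_alpha2 + z_beta) * sigma / delta)^2, rounded up."""
        return math.ceil(2 * ((z_alpha2 + z_beta) * sigma / delta) ** 2)

    # Halving the difference you want to detect roughly quadruples n per group:
    print(n_per_group(5, 4), n_per_group(5, 2))  # 25 99
    ```

    Playing with these four inputs by hand shows the same relationships the MTB > Power and Sample Size dialog does: n blows up as δ shrinks, and grows with σ, confidence, and power.
    
    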

    #189603

    Doa
    Participant

    Hey John,
    Good and timely advice, as we are now addressing the alpha, beta, precision, etc. subjects. It appears I was getting ahead of myself. Thank you for the input!
    Regards,
    Marty

    #189616

    Doa
    Participant

    Good Morning Darth,
    This site is, to say the least, invaluable. Thank you for your offer; I will email my Minitab file to you with a statement of what I am trying to accomplish via DMAIC.
    Many Thanks,
    Marty

