Sample Size Affirmation
 This topic has 9 replies, 6 voices, and was last updated 10 years, 9 months ago by Doa.


February 20, 2010 at 6:07 pm #53306
Doa (Participant, @optomist1)
Hi to all,
I’ve combed through many of the previous posts regarding sample size, its importance for hypothesis testing, and the significance of the AD number for discrete data; as a newbie I’m looking for a little SS affirmation. Given the number of related posts, this appears to be a trigger topic.
Given 18 samples, one can calculate descriptive statistics and run a Minitab normality test, but this will be of little value; any analysis would be of dubious value.
Rather than even attempt to assess normality and comment on the descriptive statistics, I’m probably better off requesting that more samples be taken (>30), then conducting a normality assessment, running descriptive stats again, and proceeding with further analysis.
Thanks….
Marty

February 20, 2010 at 7:01 pm #189568
Why 30? We called that kind of number a brown number where I learned this stuff.

February 21, 2010 at 1:42 pm #189579
MBBinWI (Participant, @MBBinWI)
Marty: You first need to decide what you are trying to do with the data you are gathering via sampling. This will aid in determining how to sample. Next, you need to establish the confidence level of the analysis you need. Finally, decide what level of precision is critical to you. The answers to these questions can and should be determined prior to sampling or analysing any data.
The final item to determine sample size is the distribution and variation of the data itself. All of these items combine to determine the method and size of sampling. If you know absolutely nothing about the population, then begin by gathering some data and look at what you have. 18 is certainly enough for a first pass analysis – so look at the descriptive statistics and histogram. You appear to use Minitab, so plug the info into the sample size calculator and see how many data points are indicated for what you are trying to answer. Have you exceeded that amount? If not, then gather that amount and verify again. If the increased amount of data has now exceeded the level indicated, you can perform your intended analysis.
Of course, this assumes that you have a stable process, acceptable MSA, and proper sampling methods.
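For anyone without Minitab handy, the kind of check the sample size calculator performs can be sketched with the standard normal-approximation formula (a simplification: Minitab uses the noncentral t distribution, so its answers run slightly higher; the effect sizes below are placeholders, not Marty’s data):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size (per group) for a two-sided,
    two-sample comparison of means, with effect_size expressed in
    standard-deviation units."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Hypothetical: detect a one-standard-deviation shift at 95% confidence, 80% power
print(n_per_group(1.0))   # 16 per group
# Halve the detectable shift and the requirement roughly quadruples
print(n_per_group(0.5))   # 63 per group
```

Note how the answer depends entirely on the inputs MBBinWI lists: there is no universal “30”.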
February 21, 2010 at 2:10 pm #189580
Robert Butler (Participant, @rbutler)
Your statement “Given 18 samples, one can calculate descriptive statistics and run a Minitab normality test, but this will be of little value; any analysis would be of dubious value. Rather than even attempt to assess normality and comment on the descriptive statistics, I’m probably better off requesting that more samples be taken (>30), then conducting a normality assessment, running descriptive stats again, and proceeding with further analysis.” is curious in the extreme.
I know of no book nor peer-reviewed paper that would declare 18 samples to be of little value and the analysis of same of dubious value. Sample size is a function of many things, but two very big issues, at least in every industry I’ve worked in, are time and money. As I’ve mentioned in other posts on this topic, if the cost, in terms of time and/or money, of gathering and assessing a single sample is anything other than trivial, the odds that anyone is going to allow you even 18 samples are extremely low.
As for analysis, remember the following: the t-test and ANOVA are robust with respect to non-normality. The t-table at the back of any statistics book I’ve ever examined always starts with n = 2. There are no asterisks nor caveats in these tables with respect to the number 2 or any other value of n between 2 and 30. A check of the historical literature indicates the t-test is valued precisely because of the low number of samples needed to run it and reach meaningful conclusions concerning whatever it is you are investigating.
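To illustrate the point with a hypothetical sketch (made-up numbers, not anyone’s actual data): scipy will happily run a t-test on a handful of samples per condition, and with a clear separation between groups even n = 4 can yield a decisive result.

```python
from scipy import stats

# Hypothetical measurements from two conditions, four samples each
a = [10.1, 9.8, 10.4, 10.0]
b = [10.9, 11.2, 10.7, 11.0]

# Two-sample t-test; n = 4 per group is perfectly legal input
t, p = stats.ttest_ind(a, b)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The price of the small sample is a wide confidence interval, not an invalid test.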
Will your confidence intervals be larger for a small sample than for a larger one? Of course. But if all you can afford are two samples per condition, then you have one of two choices: you can refuse to run any kind of test and make a blind, gut-feel guess, or you can take your two samples, analyze them, and use the information gained to guide your efforts.

February 21, 2010 at 2:38 pm #189581
Marty,
First of all, as Robert indicated, a statistically valid sample size depends on a lot of variables. Typically, the sample size calculators only worry about your alpha error. You specifically mentioned hypothesis testing, so you had better worry about beta error as well (power). How much difference or change you want to see will have a big effect. The variation of the population will also have an impact. You seem too concerned with normality. Most tests are pretty robust, and there is always Plan B if the normality doesn’t work out. Obviously, larger is better once you have considered the resources needed to get a bigger sample size. The bottom-line question has to be, “what are you trying to learn?”
February 22, 2010 at 4:53 pm #189593
Doa (Participant, @optomist1)
Hi,
The rest of the story: I have four different sets of data. #1 is the current process; #2 through #4 are variations of process #1 which I am trying to evaluate in terms of stability, capability, etc., to ultimately assess or recommend as improvements to the current process #1.
Each data set contains the same key output for that process (tensile strength): #1 has 120 samples (40 subgroups with 3 observations each), #2 has 10 subgroups with 3 observations each, and #4 has 18 total observations.
The confidence level is 95% and I am using Minitab 15. When assessing each option, my first step is to assess the data or process as stable or not; #4 is stable, as are #2 and #3, while data set #1 is not. My second step is to assess the distribution as normal or not. When assessing option #4 (as well as #1, #2, and #3) I use Stat > Basic Stats > Normality Test, which returns AD = 0.374 and P = 0.378. I also assess the distribution visually by running a histogram with a fitted curve; it appears rather bimodal.
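The same kind of Anderson-Darling check can be sketched outside Minitab (hypothetical stand-in data, not Marty’s 18 readings; note that scipy reports critical values rather than the p-value Minitab prints):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical stand-in for 18 tensile-strength readings
data = rng.normal(loc=100, scale=5, size=18)

result = stats.anderson(data, dist='norm')
# Normality is rejected at the 5% level only if the AD statistic
# exceeds the critical value tabulated for significance_level == 5.0
crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
print("AD =", round(result.statistic, 3))
print("reject normality at 5%?", result.statistic > crit_5pct)
```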
I then run Stat > Basic Stats > Graphical Summary, which confirms the AD and P values. Yet when I run Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Distribution ID Plot, with the estimation method set to either Least Squares or Maximum Likelihood, Weibull carries the day with the highest correlation or the smallest AD number.
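A rough Python analogue of what a Distribution ID comparison does (a sketch on made-up data, not a reproduction of Minitab’s output; here the candidates are ranked by maximum-likelihood fit rather than by AD or correlation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical strength readings, generated from a Weibull shape
data = rng.weibull(2.5, 30) * 100

# Maximum-likelihood fit of each candidate; location fixed at zero
# for the positive-only distributions
candidates = [("normal", stats.norm, {}),
              ("weibull", stats.weibull_min, {"floc": 0}),
              ("lognormal", stats.lognorm, {"floc": 0})]
loglik = {}
for name, dist, kwargs in candidates:
    params = dist.fit(data, **kwargs)
    loglik[name] = np.sum(dist.logpdf(data, *params))

print(max(loglik, key=loglik.get), "has the highest log-likelihood")
```

As the replies note, a distribution “winning” such a comparison does not by itself mean the normal model should be abandoned when the normality test fails to reject.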
Knowing that when P < 0.05 you have a non-normal distribution (not the case here), I visually compare the histogram of this distribution with the other two I am evaluating; one appears more "normal", yet they have smaller or larger AD/correlation numbers.
All that said, I have been reading posts here and elsewhere that directly or indirectly caution against evaluating small data sets (<30) and making any decisions based on such a small set.
I may have a case of analysis paralysis; when P > 0.05, why pursue the distribution analysis (right censoring) to find a “better” distribution fit? All part of the learning experience, I guess. Given the number of posts I’ve come across, this seems to be a popular subject.
Thank you for your time…..
Marty

February 22, 2010 at 7:59 pm #189599
Yes, you seem to have analyzed this to death. Your varying sample sizes are a small complication. If you want, send me the data at [email protected] and I will try to take a look, and we can take it offline.
February 22, 2010 at 8:48 pm #189601
Tierradentro (Participant, @john)
Marty,
You’ve got three smart guys helping you here; all their advice should be taken into account. I would simply counsel you to back up, go into MTB > Power and Sample Size, and play with the fields to better educate yourself on the elements involved (i.e., alpha, beta, variance, precision, sample size, analytical method) and their relationship to one another. This will do more for your understanding than regurgitating prior GB/BB lecture material.
Don’t outrun your headlights on this one. Decide what you need the information to tell you (e.g., is there a statistical difference between processes 1–4? Is there a practical one?). I always state my critical research questions first, then build my sampling and data collection plan accordingly. Best of luck.
And if Darth is offering to help you offline, take it!

February 22, 2010 at 9:08 pm #189603
Doa (Participant, @optomist1)
Hey John,
Good and timely advice, as we are now addressing the alpha, beta, precision, etc. subjects. It appears I was getting ahead of myself. Thank you for the input!
Regards,
Marty

February 23, 2010 at 2:46 pm #189616
Doa (Participant, @optomist1)
Good Morning Darth,
This site is, to say the least, invaluable. Thank you for your offer; I will email my Minitab file to you with a statement of what I am trying to accomplish via DMAIC.
Many Thanks,
Marty
The forum ‘General’ is closed to new topics and replies.