iSixSigma

Determining Experiment Sample Size


Viewing 24 posts - 1 through 24 (of 24 total)
  • #48525

    Alvarez
    Participant

    Suppose one is going to conduct a series of experiments to determine the value of a new technology. The experiment will consist of 3 different scenarios. Each scenario will be run two times, once with the technology and once without it, by two different teams. Each team has two members. At the end of each paired experiment [once with the technology and once without it] each team member will complete a survey. The survey asks each team member to rate 10 capabilities associated with the technology on a 5-level scale [i.e. significantly improved, somewhat improved, no change, somewhat declined, significantly declined]. Once both teams complete the surveys there will be 12 completed surveys. My questions are: (1) given there are 12 completed surveys by 4 different people, how large is the sample; (2) how is this determined; and (3) how would I determine the size of the sample necessary to draw valid conclusions? Thank you for your kind assistance.

    #163999

    Ken Feldman
    Participant

    Sorry, it ain’t gonna happen from a statistical standpoint. What is your experimental question? Is it just with and without? Does team or team member enter into it? If you don’t stratify by team or member, they will introduce noise into your experiment. The ordinal survey design suggests that some chi-square analysis will be needed. At this point you don’t really have enough information to adequately analyze the data and draw conclusions.

    #164013

    Dr. Scott
    Participant

    Andrew,
    There is no way for you to determine the sample size without knowing what the variance of your chosen metric has been in the past.
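To illustrate why the variance matters, here is a minimal sketch of the usual normal-approximation sample-size formula for comparing two group means, n = 2((z_{1-alpha/2} + z_{power}) * sigma / delta)^2 per group. The function name `n_per_group` and the example values of `sigma` and `delta` (on a 0 to 100 scale) are assumptions made for illustration; they are not figures from this thread.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate runs needed per group to detect a mean shift `delta`.

    sigma: assumed standard deviation of the metric (from historical data)
    delta: smallest difference between the two means worth detecting
    Uses the two-sided normal approximation, so it slightly understates
    the requirement when the resulting n is small.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical inputs: the metric varies with sigma = 10 points.
print(n_per_group(sigma=10, delta=10))  # shift of one sigma
print(n_per_group(sigma=10, delta=5))   # shift of half a sigma
```

Halving the difference to be detected roughly quadruples the required runs, which is why no sample size can be quoted until the variance is known.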
    Furthermore, your numbers don’t add up. It seems to me you can have no fewer than 24 combinations: 3 scenarios × 2 technologies × 2 teams × 2 members = 24.
    Good Luck,
    Dr. Scott
     

    #164077

    Alvarez
    Participant

    First, thank you for your reply. The question we are trying to answer is “Does the technology improve combat capability?” The 12 independent variables being evaluated are assumed to be combat capability factors. There is also an instrument to measure the experience of team members. You wrote, “At this point you don’t really have enough information to adequately analyze the data and draw conclusions.” My original question was basically: how much data do we need, and how do I determine this? Thanks again.
    Regards,
    Andrew

    #164080

    Alvarez
    Participant

    Dr Scott,
    Thank you for your reply. You wrote “Furthermore, your numbers don’t add up. Seems to me you can have no fewer than 24 combinations. That is; 3 scenarios X 2 technologies X 2 teams X 2 members = 24.” The survey was issued once for each scenario and team, that is, after the scenario was run with and without the technology. This explains why there are 12 [not 24] completed surveys.
    I assume that we don’t have enough data to reliably determine the variance of the factors we measured [e.g. situation awareness]?
    My original questions were really designed to learn how to determine how much data we would need to draw valid conclusions if we were to run the experiment again. Thank you again for your time and assistance.
    Regards,
    Andrew

    #164096

    Ron
    Member

    This is not a DOE; it is a one-factor experiment (OFAT, one factor at a time).
     

    #164112

    Dr. Scott
    Participant

    Andrew,
    I think I understand now. So you are taking one survey from each “team” not each of the two “team members”. Is that correct?
    I would suggest you survey each team member, for a number of reasons (I am willing to explain later if you wish).
    As far as sample size goes, you will probably have to replicate the experiment several times (certainly more than once) to get enough data to learn from.
    And finally, I would very much rather you had historical data on this measure and an MSA performed. Is there not a better measure you can use such as success v. failure in attempt (binomial) or total value sold (better because continuous)?
    Good Luck,
    Dr. Scott

    #164127

    Alvarez
    Participant

    Dr Scott,
    No, each team member completes the survey once – after each scenario is run with and without the technology – for three different scenarios.
    Each team [of which there are 2] consists of a pilot and a weapon system officer. They fly 2 missions [based on the scenario] in an F15-E simulator. The first time they fly they use the new technology – the second time they fly without the new technology. Once the two flights are complete each crew member takes the survey once.
    Why fly with and without the technology – and take the survey once? The survey is designed to determine how the technology improved their ability to perform various functions. So the idea was that the crew would have a more objective means to determine whether the technology helped them if they flew the mission without the technology and then with it.
    Then they fly the second scenario twice and both crew members take the survey once. Then the same crew flies the third scenario and takes the survey. Team 1 is done at this point.
    The second team does exactly what the first team did.
    The survey was designed to assess how the technology improved the crew’s combat capability. Each question on the survey measured a combat capability factor – they can be likened to CTQs.
    So there exists a survey taken by all 4 crew members for all three scenarios.
    One of my original questions was – how big is my current sample size? I assume it is 4 for each scenario. Where I was going here was – if I were to perform the experiment again (1) would it be better to run the same scenario or a mix? (2) If the best approach would be to run the same scenario – how large of a sample would I need and how would I determine it?
    MSA – Measurement System Analysis?
    Thank you again for your time and assistance.
    Regards,
    Andrew

    #164128

    annon
    Participant

    Hey Andrew,
    If you are running this thing out of the sim, comparing two different weapon systems, avionics packages, etc., why not run an ANOVA or two-sample t test?
    I would move away from your survey format and use the wealth of continuous information provided by the sim. You will find much greater precision and thus much smaller sample sizes. You can measure the deviations from altitude, airspeed, heading, TOT, glide rates, climb rates, reaction times, time to target acquisition, etc. between the two packages to determine differences (if any) in “combat effectiveness”. And statistically, you are going to require very large sample sizes (larger than flight or squadron size) if you intend to use a survey-style instrument with any degree of statistical precision.
    Good luck.
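As a rough sketch of the two-sample comparison suggested above, the following computes Welch's t statistic and its approximate degrees of freedom using only the Python standard library. The helper name `welch_t` and the altitude-deviation numbers are invented for illustration; they are not data from the simulator. The p-value would still need to be looked up in a t table or computed with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t test on samples a and b."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical mean altitude deviations (feet), one value per sim run.
with_tech = [12.0, 15.0, 11.0, 14.0]
without_tech = [18.0, 21.0, 17.0, 20.0]

t, df = welch_t(with_tech, without_tech)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A negative t here would indicate smaller deviations with the technology than without it.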

    #164162

    Dr. Scott
    Participant

    Andrew,
    Please see my responses below, marked [Dr. Scott: …].
    No, each team member completes the survey once – after each scenario is run with and without the technology – for three different scenarios. [Dr. Scott: In this statement alone you have mentioned three factors: 1) two team members, 2) two levels of technology, and 3) scenarios, which you have stated to be three.]
    Each team [of which there are 2] [Dr. Scott: A 4th factor is identified here, the team, with 2 levels because there are two teams.] consists of a pilot and a weapon system officer. [Dr. Scott: This factor has already been identified as team member above, factor 1.] They fly 2 missions [based on the scenario] [Dr. Scott: This factor has been identified as number 3 above.] in an F15-E simulator. The first time they fly they use the new technology – the second time they fly without the new technology. [Dr. Scott: This factor has been identified as number 2 above.] Once the two flights are complete each crew member takes the survey once.
    Why fly with and without the technology – and take the survey once? The survey is designed to determine how the technology improved their ability to perform various functions. [Dr. Scott: Per annon’s suggestion and the previous suggestion I made in another post, you need to use a more continuous and objective measure. You mention below “combat capability”. I suspect this is a measure (or some similar rating) given or taken after each simulation.] So the idea was that the crew would have a more objective means to determine whether the technology helped them if they flew the mission without the technology and then with it. [Dr. Scott: A measure of “combat capability” or something similar would be much more objective and usable than the survey you suggest to use.]
    Then they fly the second scenario twice and both crew members take the survey once. Then the same crew flies the third scenario and takes the survey. Team 1 is done at this point.
    The second team does exactly what the first team did.
    The survey was designed to assess how the technology improved the crew’s combat capability. [Dr. Scott: Again, what is your measure of combat capability? Surely the Air Force or Navy (whichever it is you are working for) has this or a similar metric that you can use. Furthermore, I suspect there are performance ratings of both the “pilot” and “weapons system officer” separately.] Each question on the survey measured a combat capability factor – they can be likened to CTQs.
    So there exists a survey taken by all 4 crew members for all three scenarios.
    One of my original questions was – how big is my current sample size? I assume it is 4 for each scenario. Where I was going here was – if I were to perform the experiment again (1) would it be better to run the same scenario or a mix? (2) If the best approach would be to run the same scenario – how large of a sample would I need and how would I determine it?
    MSA – Measurement System Analysis? [Dr. Scott: combat capability, annon’s comments regarding other factors, and his observation about moving away from a survey (which goes back to a measure of combat capability).]
    Thank you again for your time and assistance.
    I assume that annon’s references to the various other factors that might be considered are covered by the three different scenarios.
    So, you have at least four factors 1) pilot vs. weapon systems officer, 2) new vs. old technology, 3) team 1 vs. team 2, and 4) mission scenario levels 1, 2, and 3.
    Therefore I would consider an experiment that looks something more like the following:
    Run   Pilot v. WSO   New Tech v. Old   Team 1 v. Team 2   Scenario 1,2,3
      1        -1              -1                 -1                1
      2         1              -1                 -1                1
      3        -1               1                 -1                1
      4         1               1                 -1                1
      5        -1              -1                  1                1
      6         1              -1                  1                1
      7        -1               1                  1                1
      8         1               1                  1                1
      9        -1              -1                 -1                2
     10         1              -1                 -1                2
     11        -1               1                 -1                2
     12         1               1                 -1                2
     13        -1              -1                  1                2
     14         1              -1                  1                2
     15        -1               1                  1                2
     16         1               1                  1                2
     17        -1              -1                 -1                3
     18         1              -1                 -1                3
     19        -1               1                 -1                3
     20         1               1                 -1                3
     21        -1              -1                  1                3
     22         1              -1                  1                3
     23        -1               1                  1                3
     24         1               1                  1                3
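For anyone who wants to rebuild the 24-run layout above rather than type it in, here is a small sketch using `itertools.product`. The variable names are illustrative only; the coding (-1 and 1 for the two-level factors, 1 to 3 for scenario) and the run order follow the table.

```python
from itertools import product

# Levels for each factor, coded as in the table above.
scenarios = (1, 2, 3)
two_level = (-1, 1)

# product() varies its last argument fastest, so listing the factors as
# (scenario, team, tech, role) reproduces the table's run order, in which
# the pilot/WSO column alternates on every run.
design = []
for run, (scenario, team, tech, role) in enumerate(
    product(scenarios, two_level, two_level, two_level), start=1
):
    design.append((run, role, tech, team, scenario))

print("Run  Role  Tech  Team  Scenario")
for run, role, tech, team, scenario in design:
    print(f"{run:>3}  {role:>4}  {tech:>4}  {team:>4}  {scenario:>8}")
```

The row count is 3 × 2 × 2 × 2 = 24, one row per combination of scenario, team, technology, and crew position.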
    Finally, try to take the advice that annon and I gave you, which is to use a more objective and continuous measure. I am sure that the men and women that are participating in these simulations are rated on performance in some way. You can ask their opinion also, but a rating of “combat capability” or similar would be much better.
    Again Good Luck,
    Dr. Scott
     
     

    #164170

    Dr. Scott
    Participant

    Ron,
    Have a look at my previous post and let me know what you think about the idea.
    Thanks in Advance,
    Dr. Scott

    #164204

    Alvarez
    Participant

    Hi Annon,
    I appreciated the advice. There is sim data, but I’m not sure how valuable it is, though it is worth looking into. The experiment focused on pushing and pulling data to and from the cockpit from a “simulated” info grid to improve combat effectiveness. For classification reasons I can’t speak to details. Thanks again.
    Regards,
    Andrew

    #164206

    Alvarez
    Participant

    Dr Scott,
    Please see my response to annon regarding continuous data. Can you point me to a reference to better understand the experiment you recommended? I can’t thank you enough for all your time and advice.
    Regards,
    Andrew

    #164210

    annon
    Participant

    Andrew,
    I understand….you could tell me, but then you would have to kill me.
    It sounds like you have a basic CTQ drilldown exercise in front of you, starting with the pilots’ idea of ‘combat effectiveness’ and ending with continuous variables you can actually analyze, such as the magnitude of the deviations from the stated flight parameters while using tech 1 v. tech 2.
    Once you have done that, you could run a simple full factorial DOE, blocking on the pilot, which allows a minimum number of runs (4 to 8), a wider inductive base, and a statistically valid assessment. That could then be compared against what amounts to customer satisfaction surveys (i.e., pilot surveys). That part would be really interesting.
    Good luck. Keep me posted, I like the topic.
     

    #164224

    Dr. Scott
    Participant

    Andrew,
    The DOE is a simple full factorial. Here is a simulation with a “combat capability” between 0 and 100.

    Try to take a shot at analyzing it, and then get back to me.
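The simulated table did not make it into the thread, so the sketch below uses an invented response purely to show what a first-pass analysis of such a full factorial could look like: the main effect of each two-level factor, estimated as the mean score at the +1 level minus the mean score at the -1 level. The scoring formula, the factor names, and the helper `main_effect` are all assumptions for illustration and are not Dr. Scott's data or method.

```python
from itertools import product
from statistics import mean

# Build the 24-run full factorial with a made-up "combat capability" score.
runs = []
for scenario, team, tech, role in product((1, 2, 3), (-1, 1), (-1, 1), (-1, 1)):
    # Invented, noise-free response on a 0 to 100 scale (illustration only).
    score = 50 + 10 * tech + 3 * role + 2 * team + scenario
    runs.append(
        {"role": role, "tech": tech, "team": team,
         "scenario": scenario, "score": score}
    )


def main_effect(runs, factor):
    """Mean score at the +1 level minus mean score at the -1 level."""
    high = mean(r["score"] for r in runs if r[factor] == 1)
    low = mean(r["score"] for r in runs if r[factor] == -1)
    return high - low


effects = {f: main_effect(runs, f) for f in ("role", "tech", "team")}
for factor, effect in effects.items():
    print(f"{factor}: {effect:+.1f}")
```

Because the design is balanced, each main effect is estimated independently of the others; with real, noisy data the effects would then be tested with an ANOVA.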
    Regards,
    Dr. Scott

    #164225

    Dr. Scott
    Participant

    Ok Andrew,
    I cannot figure out how to get a table pasted in here. So you will have to email for more help [email protected] .
    Dr. Scott

    #164297

    Alvarez
    Participant

    Hi annon,
    Appreciate your interest and assistance. Dr Scott and I are going to go through a full factorial DOE. Will keep you posted.
    Regards,
    Andrew

    #164302

    annon
    Participant

    Good deal.  The doc knows what he is talking about.  Best of luck.

    #164304

    Dr. Scott
    Participant

    annon,
    Thanks for what I take as a compliment.
    Dr. Scott

    #164307

    New ATI
    Participant

    Please send me a copy.

    #164311

    annon
    Participant

    Doc,
    Absolutely.  Thanks for your insights.  Keep us posted.

    #164530

    Perryman
    Participant

    Andrew,
    I agree with Annon.  Since you are looking at evaluating the effectiveness of a technology, could you not look at the mission results as your key output variable and determine the impact of the technology on it versus the results achieved without the technology?
    My 2 cents (as a former army requirements officer)

    #164532

    Dr. Scott
    Participant

    annon,
    If you are willing, email at [email protected] and I will send you the design idea I recommended to Andrew. I would love to get your feedback to see if there is something I am missing that he needs to know.
    Thanks,
    Dr. Scott

    #164568

    annon
    Participant

    Sure doc…would love to see it…[email protected]


The forum ‘General’ is closed to new topics and replies.