Dumb Question Extravaganza

Six Sigma – iSixSigma Forums Old Forums General Dumb Question Extravaganza

Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
  • #45451


    Thanks to the forum for indulging my ignorance… I've got a couple of quick ones:

    How do you explain rational subgrouping given the assumption of randomization?  If I want to ensure a data set is representative (includes different shifts, times, days, operators, etc.), am I simply identifying the subgroups and then randomly selecting observations across said subgroups until I reach my desired sample size for a given power and variance?  Thanks!
    Does the use of proper randomization always induce independence in the data set?  For example, must I randomize a simple two-sample t-test, or can I make 10 runs, make the change, run ten more, and then test?
    Lastly, can identical distribution be induced into a data set using a methodology?



    Erik L

    Wow, none of these questions are quickies.  There is much to cover, foundationally, to really get at the need for rational subgrouping and the part that concept plays in deriving valuable control charts.  I guess the broad brushstrokes would be to first talk about the various sampling methodologies available.  Our choices are: random, sequential, interval, stratified, and cluster sampling.
    Ultimately, what we’re trying to do is obtain as reflective a picture of the VOP (voice of the process) as possible with our sampling strategy.  To be most efficient with the sampling, we need to understand where, within the data, the dependent and independent elements are.  Why is this important?  It’s important because data with dependencies in it adds nothing to our knowledge of the VOP.  One of the core issues in determining a ‘rational subgroup’ is to determine what constitutes the independent element in the variance components of the process.  We can gain this insight through a combination of the following: looking at the process steps, the product, and/or stratifying the process data (through Multi-Vari/COV analysis).  Here’s a link to a past post that I gave to introduce some of the key elements of the MV/COV analysis:
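    To make the variance-components idea concrete, here is a minimal sketch (hypothetical data; any resemblance to a real process is coincidental) of the crude split behind a Multi-Vari look: compare the average within-subgroup variance to the variance of the subgroup means.

```python
# Sketch: crude variance-component split for rational subgrouping.
# The hourly measurements below are made-up illustration data.
from statistics import mean, variance

data = {
    "hour1": [10.1, 10.3, 9.9, 10.2],
    "hour2": [11.0, 11.2, 10.8, 11.1],
    "hour3": [9.5, 9.7, 9.4, 9.6],
}

# Within-subgroup variance: average of the per-hour sample variances
within = mean(variance(v) for v in data.values())

# Between-subgroup variance: sample variance of the hourly means
between = variance([mean(v) for v in data.values()])

print(f"within-subgroup variance:  {within:.4f}")
print(f"between-subgroup variance: {between:.4f}")
# If 'between' dominates, hour-to-hour is the biggest source of
# variation, and the hour is a sensible rational subgroup.
```

    A full Multi-Vari study nests more levels (piece-to-piece, time-to-time, shift-to-shift), but the same comparison drives the conclusion at each level.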

    As a result, we have much better guidance for our DCP (data collection plan) on the who, what, when, where, and how much to collect.
    Mathematically, random sampling is usually the go-to starting point.  I have a population of values, I can somehow go in and randomly pull samples from the process, and somehow maintain an equal likelihood that any one unit would be pulled.  Process-based scenarios largely blow away the assumptions that make random sampling viable.  This is an issue, since the typical sample size calculators are giving you a minimum ‘n’ on this academic basis.  Rut-roh!
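    For reference, the "academic basis" behind those calculators is roughly the textbook normal-approximation formula below — a sketch only, with made-up sigma and delta values, and it assumes exactly the independent random sampling being questioned here.

```python
# Sketch: textbook minimum n per group for a two-sample comparison of
# means, via the normal approximation n ~ 2*((z_a + z_b)*sigma/delta)^2.
# The sigma and delta values in the example call are hypothetical.
import math

def min_n_per_group(sigma, delta, z_alpha=1.96, z_beta=0.84):
    """Minimum samples per group for alpha=0.05 (two-sided), power=0.80."""
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detect a 1-unit shift when the process sigma is 2 units:
print(min_n_per_group(sigma=2.0, delta=1.0))  # 63 per group
```

    Note the formula is silent about dependence in the data — it simply assumes it away, which is the "rut-roh" in process settings.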
    Understanding the variance components lets us determine where the process is similar and where the largest differences exist.  So, could we then just randomly pull data within an hour (let’s say, for simplicity’s sake, we determined that was the rational subgroup)?  We could potentially use the idea of randomization at this point, but maintaining time series order is rather important in control charts.  This random selection of values was a technique I used in tracking patient wait time within any one hour.
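    A sketch of that technique, with hypothetical wait-time data: sample randomly *within* each hourly subgroup, but keep the hours themselves in time order for the chart.

```python
# Sketch: random sampling within hourly subgroups while preserving
# the time order of the subgroups for charting.  The wait times
# (minutes) and subgroup size k are hypothetical.
import random

random.seed(42)  # reproducible for the example

wait_times = {
    "09:00": [12, 45, 30, 22, 18, 50, 27],
    "10:00": [33, 41, 29, 38, 25, 47, 31],
    "11:00": [55, 48, 60, 52, 44, 58, 49],
}

k = 3  # observations to pull per hourly subgroup
# Python dicts preserve insertion order, so the hours stay chronological.
chart_points = [(hour, sorted(random.sample(obs, k)))
                for hour, obs in wait_times.items()]
for hour, sample in chart_points:
    print(hour, sample)
```

    Randomization happens only inside the subgroup; the subgroup-to-subgroup sequence — the thing the control chart actually plots — is never shuffled.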
    The next issue is how much data to pull within the rational subgroup.  We could look at power and sample size calculations to provide guidance on the size of n.  Another approach would be to use instant-in-time calculations for the variation of the process and look at the 95% CI of this variance component.  We could look at this as a ratio to the tolerance for the measure and use the percentages as guidance as to whether we’re sampling too little or too much.
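    As a rough sketch of that check (the subgroup data and tolerance are hypothetical; the chi-square critical values are standard table values for n − 1 = 9 degrees of freedom):

```python
# Sketch: 95% chi-square CI for a subgroup's variance, then the implied
# 6-sigma spread as a percent of tolerance.  Data and tolerance are
# hypothetical; chi2 values are table values for df = 9.
import math
from statistics import variance

subgroup = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2, 10.1]
n = len(subgroup)          # n = 10
s2 = variance(subgroup)    # sample variance

chi2_upper = 19.023   # chi-square 0.975 quantile, df = 9
chi2_lower = 2.700    # chi-square 0.025 quantile, df = 9

# CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower)
ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

tolerance = 2.0  # spec width (USL - LSL), hypothetical
pct = [100 * 6 * math.sqrt(v) / tolerance for v in ci_var]
print(f"95% CI for sigma^2: ({ci_var[0]:.4f}, {ci_var[1]:.4f})")
print(f"6*sigma as % of tolerance: {pct[0]:.0f}% to {pct[1]:.0f}%")
```

    If even the lower end of that percentage band is wide, the subgroup is too small to pin down the instant-in-time variation; a very tight band suggests you could sample less.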
    2.  Randomization does not necessarily ensure independence, but it is one of the most effective techniques we have in our hip pocket.  Randomization, replication, and blocking are some of the best resources we have to ensure that the effects observed actually are what caused the impact to the response’s average/variability.  Could you run 10 points, make a change, and then run the next 10?  Sure.  However, there will be questions that challenge whether the effects that were seen (or not seen) were really due to the change or to other lurking/noise variables that were moving in parallel.  Diagnostics are there to look and see if we got bitten by noise variables, so we can choose not to randomize.  But what happens if we see something?  Now we potentially need to re-run the analysis.  And it’s difficult enough to get the go-ahead for the first study, let alone to come back with hat in hand and beg for more time, money, and resources.
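    To illustrate the difference, here is a toy sketch of the two run orders: randomizing scatters the A/B settings in time, so a noise variable drifting in parallel cannot masquerade as the effect of the change.

```python
# Sketch: "10 runs, change, 10 more" vs a randomized run order.
# A = before the change, B = after the change (toy labels).
import random

random.seed(7)  # reproducible for the example

runs = ["A"] * 10 + ["B"] * 10

sequential = list(runs)      # all A first, then all B
randomized = list(runs)
random.shuffle(randomized)   # randomized run order

print("sequential:", "".join(sequential))
print("randomized:", "".join(randomized))
```

    In the sequential plan, anything that drifted between run 10 and run 11 (tool wear, ambient temperature, a shift change) is completely confounded with the A-to-B change; in the randomized order it averages out across both settings.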
    3.  Not sure what you are asking…
    Well, hope this has helped a little.
    Erik L



    Erik L,
    Thanks so much for taking the time… it was very helpful and indicates the need for more quality time (so to speak) with one of these titillating authors… thanks again!

