Dumb Question Extravaganza
- December 6, 2006 at 6:27 pm #45451
annon (Participant)
Thanks to the forum for indulging my ignorance… I've got a couple of quick ones:
How do you explain rational subgrouping alongside the assumption of randomization? If I want to ensure a data set is representative (includes different shifts, times, days, operators, etc.), am I simply identifying the subgroups and then randomly selecting observations across said subgroups until I reach my desired sample size for a given power and variance? THANKS
Does the use of proper randomization always induce independence into the data set? For example, must I randomize a simple 2-sample t-test, or can I make 10 runs, make the change, run ten more, and then test?
Lastly… can identical distribution be induced into a data set using a methodology?
NO MORE….THANK YOU AGAIN
- December 6, 2006 at 8:50 pm #148557
Erik L (Participant)
Wow, none of these questions are quickies. There is much to cover, foundationally, to really get at the need for rational subgrouping and the part that concept plays in deriving valuable control charts. I guess the broad brushstrokes would be to first talk about the various sampling methodologies that are available. Our choices are: random, sequential, interval, stratified, and cluster sampling.
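Since stratified sampling is the one that maps most directly onto your "shifts, times, days, operators" question, here's a minimal sketch of the idea. Everything here (the shift names, the measurements) is made up purely for illustration, not pulled from any real process:

```python
import random

# Hypothetical process data: (shift, measurement) pairs.
random.seed(42)
data = [(shift, random.gauss(10, 1))
        for shift in ("day", "swing", "night")
        for _ in range(50)]

def stratified_sample(records, key_index, n_per_stratum):
    """Randomly draw n observations from each stratum (e.g., each shift)."""
    strata = {}
    for rec in records:
        strata.setdefault(rec[key_index], []).append(rec)
    drawn = []
    for group in strata.values():
        drawn.extend(random.sample(group, n_per_stratum))
    return drawn

drawn = stratified_sample(data, 0, 5)
print(len(drawn))  # 3 shifts x 5 observations = 15
```

The point of the stratify-then-randomize structure is that every stratum is guaranteed representation, while the draw within each stratum stays random.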
Ultimately, what we're trying to do is obtain as reflective a picture of the VOP as possible with our sampling strategy. To be most efficient with the sampling, we need to understand where, within the data, the dependent and independent elements are. Why is this important? It's important because we're not adding to our knowledge of the VOP with data that has dependencies in it. One of the core issues in determining rational subgrouping is figuring out what constitutes the independent element in the variance components of the process. We can gain this insight through a compilation of the following: looking at the process steps, the product, and/or stratifying the process data (through Multi-Vari/COV analysis). Here's a link to a past post where I introduced some of the key elements of the MV/COV analysis:
As a result, we have much better guidance for our DCP on the who, what, when, where, and how much to collect.
Mathematically, random sampling is usually the go-in discussion: I have a population of values, I can somehow go in and randomly pull samples from the process, and somehow maintain an equal likelihood that any one unit would be pulled. Process-based scenarios largely blow away the assumptions that make a random choice of sampling viable. This is an issue, since the typical sample size calculators are giving you a minimum n based on this academic premise. Rut-roh!
Understanding the variance components can get us to determine where the process is similar and where the largest differences exist. So, could we then just randomly pull data within an hour (let's say, for simplicity's sake, we determined that was the rational subgroup)? We potentially could use the idea of randomization at this point, but maintaining time series order is kind of important in control charts. This random selection of values was a technique that I used in statusing patient wait time within any one hour.
The next issue is how much data to pull within the rational subgroup. We could look at power and sample size calculations to provide guidance on the size of n. Another approach would be to use instant-in-time calculations for the variation of the process and look at the 95% CI of this variance component. We could look at this as a ratio to the tolerance for the measure and use the percentages as guidance as to whether we're sampling too little or too much.
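One way to put numbers on that "CI of the variance component versus tolerance" idea is a chi-square interval on the within-subgroup sigma. This is only a sketch: the eight measurements and the tolerance width below are invented for illustration, and it leans on SciPy for the chi-square quantiles:

```python
from math import sqrt
from statistics import variance
from scipy.stats import chi2

# Hypothetical instant-in-time subgroup; tolerance width is an assumed spec.
subgroup = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7]
tolerance_width = 4.0  # USL - LSL, made up for this example

n = len(subgroup)
s2 = variance(subgroup)  # sample variance (n - 1 denominator)
alpha = 0.05

# 95% CI for sigma from the chi-square distribution:
# (n-1)s^2 / chi2_upper <= sigma^2 <= (n-1)s^2 / chi2_lower
lo = sqrt((n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1))
hi = sqrt((n - 1) * s2 / chi2.ppf(alpha / 2, n - 1))

# Express 6*sigma as a fraction of tolerance; a wide interval here is a
# hint that the subgroup size n is too small to pin down the variation.
print(f"sigma CI: [{lo:.3f}, {hi:.3f}]")
print(f"6-sigma as % of tolerance: [{6 * lo / tolerance_width:.1%}, "
      f"{6 * hi / tolerance_width:.1%}]")
```

If the upper and lower ends of that percentage band tell very different stories about the process, that is the signal to pull more data per subgroup.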
2. Randomization does not necessarily ensure independence, but it is one of the most effective techniques that we have in our hip pocket. Randomization, replication, and blocking are some of the best resources that we have to ensure that the effects that were observed actually are what caused the impact to the response's average/variability. Could you run 10 points, make a change, and then run the next 10? Sure. However, there will be questions about whether the effects that were seen/not seen were really due to the change or to lurking/noise variables that were moving in parallel. Diagnostics are there to look and see if we got bit by noise variables, so we can choose not to randomize. But what happens if we see something? Now we potentially need to re-run the analysis. And it's difficult enough to get the go-ahead for the first study, let alone to come back hat in hand and beg for more time, money, and resources.
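To make the run-order point concrete, here's a toy simulation of the 10-runs-then-10-runs scenario. A lurking variable (a steady drift over time) is baked in, and there is no real process change at all; yet the sequential plan confounds the drift with the "change". All names and numbers are invented for illustration:

```python
import random

random.seed(7)

# Toy data: steady time-ordered drift plus noise, with NO real change.
n = 20
drift = [0.05 * t for t in range(n)]
noise = [random.gauss(0, 0.1) for _ in range(n)]
y = [drift[t] + noise[t] for t in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

# Sequential plan: runs 0-9 are "before", runs 10-19 are "after".
seq_labels = ["before"] * 10 + ["after"] * 10
# Randomized plan: the same 20 runs, with the order shuffled.
rand_labels = list(seq_labels)
random.shuffle(rand_labels)

for name, labels in (("sequential", seq_labels), ("randomized", rand_labels)):
    before = [y[t] for t in range(n) if labels[t] == "before"]
    after = [y[t] for t in range(n) if labels[t] == "after"]
    print(f"{name}: after - before = {mean(after) - mean(before):+.3f}")
```

Under the sequential plan the drift shows up as a sizable "effect" of a change that never happened; randomizing the run order spreads the drift across both conditions so it tends to wash out of the comparison.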
3. I'm not sure what you're asking with this one.
Well, hope this has helped a little.
Erik L
- December 7, 2006 at 2:02 am #148575
The forum ‘General’ is closed to new topics and replies.