Population vs. Sample
 This topic has 9 replies, 9 voices, and was last updated 19 years ago by George Chynoweth.

July 28, 2003 at 12:27 pm #32900
This is a stupid question, but I’m going to ask for opinions anyway. I have a sales region with 200 stores. Over a fixed amount of time – let’s say 6 months – a promotion was conducted for ‘participating’ stores. 50 of the 200 stores in this region participated in this promotion. Sales data for this time period were then compared to the same time period a year ago, and expressed as a percentage. The 50 stores that participated in this event posted sales that were 131% of prior year sales; the 150 stores that did not participate were 118% of prior year.
Now, is that significant? Forget any contributing factors, or whether this delta is the result of the promotion – is this difference significant? Since this is a population, not a sample, isn’t any difference significant? My first reaction was to run a 2-sample t-test, but then I scratched my head and said, “Why am I running a hypothesis test? I don’t need a confidence interval – this is the population.”
Your comments, and your patience if this is truly stupid, are welcome.
Thanks.

July 28, 2003 at 12:56 pm #88396
Glenn Caudill
Jeff,
What is the standard deviation or variance of the two? This is a hypothesis test about the difference between the means.
Ho: U1 - U2 > 0
Ha: U1 - U2 <= 0
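A minimal sketch of why the standard deviations matter, since the thread gives only the two means (131% and 118%). The function name `welch_t` and the per-store standard deviations of 20 and 60 points are illustrative assumptions, and the normal approximation to the p-value is a simplification that is reasonable only because both groups are large.

```python
import math
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, approximate one-sided p) for the difference mean1 - mean2.

    The p-value uses a normal approximation to the t distribution,
    an assumption that is only reasonable for large groups.
    """
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    t = (mean1 - mean2) / se
    p = 1.0 - NormalDist().cdf(t)  # chance of a gap this large if the true means are equal
    return t, p


# Means come from the thread; the standard deviations are hypothetical.
for sd in (20.0, 60.0):
    t, p = welch_t(131.0, sd, 50, 118.0, sd, 150)
    print(f"sd={sd:>4}: t={t:.2f}, approx. one-sided p={p:.4f}")
```

The same 13-point gap looks very different depending on how much the individual stores vary, which is why the spread has to be known before the means can be compared.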
Troy

July 28, 2003 at 1:04 pm #88397
Mike Archer
Jeff,
I’m not going to pretend to be an expert on the subject, but in my mind, you are always working with a sample in inferential statistics no matter how inclusive it is. If I shot the catapult 100 times and looked at that data, is that the population? Not really – the inferences would represent an assumption that I continue to shoot the catapult without changing the process. I’m only a GB, so you may have just got a stupid answer to your question :0
Mike

July 28, 2003 at 1:53 pm #88399
Mike Archer
Jeff,
Okay... maybe I am pretending to be a little bit of an expert. I just wanted to add one extra thought. If you assumed that your data set was a population and not subject to hypothesis testing, then you would have to say that the difference between 119.0% and 119.01% is significant. I say proceed with hypothesis testing.
Mike

July 31, 2003 at 6:28 am #88504
David Moreno
You need to do an ANOVA test or a 2-sample t-test, as you did. Although you are dealing with populations, the difference between the sales of the 50 stores and the sales of the other 150 stores could be due just to random factors. Obviously the average sales will not be exactly the same in the two groups even if you did not run a promotion in those 50 stores. This is what you want to test. Is this difference between means really significant? Is the promotion really making the difference? The same thing happens if you divide the 150 stores into two groups, one of 100 and another of 50 stores: the average sales will not be the same; however, if you run an ANOVA test, you will probably find that the difference between the means is not significant.
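The point about random splits can be sketched with a permutation test: shuffle the group labels many times and count how often chance alone produces a gap as large as the observed one. This swaps in a permutation test for the ANOVA mentioned in the post. The per-store figures are simulated placeholders, since the thread gives only group averages, and `permutation_p_value` is a hypothetical helper name.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """One-sided permutation p-value for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign stores to the two groups at random
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Simulated year-over-year percentages (hypothetical data, not the poster's).
rng = random.Random(42)
non_participating = [rng.gauss(118, 30) for _ in range(150)]
participating = [rng.gauss(131, 30) for _ in range(50)]

# Promotion group vs. the rest.
print(permutation_p_value(participating, non_participating))
# An arbitrary split of the 150 non-participating stores into 50 and 100.
print(permutation_p_value(non_participating[:50], non_participating[50:]))
```

A small value in the first comparison and an unremarkable one in the second would match the argument in the post: some gap always appears between two groups, and the test asks whether the observed gap is larger than random grouping tends to produce.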

July 31, 2003 at 7:17 am #88506
My 2 bits:
Presumably, you are trying to gauge the effect of the promotion between participating and nonparticipating stores. In this case last year’s sales data is really of no consequence (unless a similar promo was run last year too).
Therefore I suggest the following course of action:
1) Check the variance of the two samples (50 vs. 150 stores) – either it will or won’t be the same (statistically speaking).
2) Either way, use the requisite option (equal variances / unequal variances) when testing whether the difference between means is significant (2-sample t).
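The two steps above can be sketched as follows, assuming the per-store percentages are available as plain lists. The data are made-up placeholders, `two_sample_t` is a hypothetical name, and the rule of thumb used to pick the formula (a variance ratio above 2) is a stand-in assumption for a formal F-test, which would compare the ratio with an F table.

```python
import math
from statistics import fmean, variance


def two_sample_t(a, b, ratio_cutoff=2.0):
    """Return (variance_ratio, method, t) for the difference of means a - b."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    na, nb = len(a), len(b)
    ratio = max(va, vb) / min(va, vb)  # larger over smaller, so always >= 1
    if ratio <= ratio_cutoff:
        # Equal-variance option: pool the two variances.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        method = "pooled"
    else:
        # Unequal-variance (Welch) option: keep the variances separate.
        se = math.sqrt(va / na + vb / nb)
        method = "welch"
    return ratio, method, (fmean(a) - fmean(b)) / se


# Hypothetical year-over-year percentages for a handful of stores.
promo = [125.0, 140.0, 131.0, 128.0, 136.0]
no_promo = [110.0, 122.0, 119.0, 121.0, 115.0, 118.0]
print(two_sample_t(promo, no_promo))
```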
The gist of the matter is that in any process that varies over time (like sales at a store), whatever data you collect is essentially a sample and therefore needs to be treated thus.
Hope this clarified rather than confused.
Bee

July 31, 2003 at 8:26 am #88509
Jeff,
I don’t think this is a “stupid” question at all. There has been a lot of discussion about hypothesis tests for this kind of data set, such as percentages, and it is not easy to say who is right and who is wrong when people struggle over whether to use a hypothesis test.
I think you can proceed with a 2-sample t-test to see whether the difference is “statistically significant.” Intuitively, I would say this promotion program gave significant positive support to sales revenue. I would be more concerned with the cost of the promotion program versus the gains (the difference between 131% and 118%, i.e., 13% times the revenue of your 50 stores), and with the potential opportunity in the remaining 150 stores if you extend the promotion program to them (this could be estimated as 13% times the sales of the 150 stores). I would suggest more promotion programs.
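The cost-versus-gain arithmetic can be sketched as follows. Every figure is a hypothetical placeholder, since the thread gives no revenue or cost numbers, and the 13-point lift is applied to prior-year sales because both percentages are expressed relative to the prior year.

```python
def incremental_revenue(prior_year_sales, lift_points=13.0):
    """Extra revenue implied by a lift of `lift_points` percent of prior-year sales."""
    return prior_year_sales * lift_points / 100.0


# Hypothetical prior-year totals and promotion cost (not from the thread).
prior_sales_50 = 10_000_000.0    # the 50 participating stores
prior_sales_150 = 30_000_000.0   # the 150 non-participating stores
promo_cost = 500_000.0           # assumed cost of running the promotion

gain = incremental_revenue(prior_sales_50)
potential = incremental_revenue(prior_sales_150)
print(f"gain from the 50 stores:     {gain:,.0f}")
print(f"net of promotion cost:       {gain - promo_cost:,.0f}")
print(f"potential in the other 150:  {potential:,.0f}")
```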
Still, since we are in a really complex economic environment, we can’t ignore marketing issues in specific areas (countries or cities).
Jeff, are you in an industrial market or a consumer market? This could lead to a different conclusion.
Back to the question: I would suggest running a t-test to check the difference in means; you should also conduct an F-test to check the difference in variances.
Hope this helps the discussion.
Regards,
Mike

July 31, 2003 at 1:01 pm #88516
Dean Bottorff
Jeff,
Getting the results of the means and variances tests is a good starting point, but it likely will not conclude anything of particular value. The issues of bias, root cause, nonlinear vs. linear relationships, mixed effects and marginal economic analysis may be more important to your future promotion planning than statistical significance at 10,000 feet. This is the problem with applying linear methods in behavioral studies, which tend to be more nonlinear in nature.
Were the 50 participants randomly selected, or did they “opt in”? If the latter, biases are quite likely: for example, the management may have been more aggressive, or for one reason or another had more propensity to try promotion ideas. Perhaps they were the stronger sales markets and had more disposable income to invest in promotion. Collinearity may also be a problem. In national chain stores, a limited promotion in one region can sometimes accrue benefits to other regions. Perhaps all stores saw a benefit, not just the 50. Also, for any root cause to be concluded, there needs to be validation on the ground that this promotion caused changes in consumer behavior, and that these changes were not the result of accounting variations or timing effects.
Finally, consumer influences often result in mixed effects, which can be explained better by nonlinear studies. There can be changes in relationships at work, such as lags, “pain first, then gain,” or “gain first, then pain.” When all is better understood, the bottom line is more a matter of marginal economic analysis than statistical significance. The key is to learn something about your customer from this promotion, not merely to conclude its apparent significance.

July 31, 2003 at 2:58 pm #88523
DANG Dinh Cung
Good morning,
Let
CTO_LY50 be the sum of last year’s turnover for the 50 stores participating in the promotion,
CTO_LY150, the sum of last year’s turnover for the 150 stores not participating in the promotion,
CTO_TY50, the sum of this year’s turnover for the 50 stores participating in the promotion,
CTO_TY150, the sum of this year’s turnover for the 150 stores not participating in the promotion,
CTO_LY = CTO_LY50 + CTO_LY150, the sum of last year’s turnover for all 200 stores, whether participating or not,
CTO_TY = CTO_TY50 + CTO_TY150, the sum of this year’s turnover for all 200 stores, whether participating or not,
CTO_50 = CTO_LY50 + CTO_TY50, the sum of this year’s and last year’s turnover for the 50 stores participating in the promotion,
CTO_150 = CTO_LY150 + CTO_TY150, the sum of this year’s and last year’s turnover for the 150 stores not participating in the promotion,
TO = CTO_50 + CTO_150 = CTO_LY + CTO_TY, the sum of this year’s and last year’s turnover for all 200 stores, whether participating or not,
TTO_LY50 = CTO_LY*CTO_50/TO, the theoretical sum of last year’s turnover for the 50 stores participating in the promotion,
TTO_LY150 = CTO_LY*CTO_150/TO, the theoretical sum of last year’s turnover for the 150 stores not participating in the promotion,
TTO_TY50 = CTO_TY*CTO_50/TO, the theoretical sum of this year’s turnover for the 50 stores participating in the promotion,
TTO_TY150 = CTO_TY*CTO_150/TO, the theoretical sum of this year’s turnover for the 150 stores not participating in the promotion.
If the turnover of the 50 participating stores changed from last year to this year in the same proportion as that of the 150 non-participating stores, we should have
TTO_LY50 = CTO_LY50
TTO_LY150 = CTO_LY150
TTO_TY50 = CTO_TY50
TTO_TY150 = CTO_TY150
As this may not be the case, you need to calculate the chi-square (Khi2) statistic and look it up in the chi-square table with one degree of freedom.
You may do the calculation yourself, or supply me with the data so that I can send you the answer to your question.
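The expected-value construction above is the usual chi-square calculation on a 2x2 table (last year vs. this year, participating vs. not). A minimal sketch follows; the turnover totals are hypothetical, chosen only so that the two groups come out at 131% and 118% of prior year, and `chi_square_2x2` is an illustrative name. Because the cells here are turnover sums rather than counts, the statistic scales with the unit of measurement, so the result should be read with care.

```python
def chi_square_2x2(ly50, ly150, ty50, ty150):
    """Chi-square statistic (1 degree of freedom) for a 2x2 table of totals."""
    observed = [[ly50, ly150], [ty50, ty150]]
    row_totals = [ly50 + ly150, ty50 + ty150]   # CTO_LY, CTO_TY
    col_totals = [ly50 + ty50, ly150 + ty150]   # CTO_50, CTO_150
    total = sum(row_totals)                     # TO
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total  # the TTO_* values
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical totals: 50 stores at 131% of prior year, 150 stores at 118%.
chi2 = chi_square_2x2(ly50=1000.0, ly150=3000.0, ty50=1310.0, ty150=3540.0)
print(f"chi-square = {chi2:.3f}")
print("exceeds 3.841 (5% critical value, 1 d.f.):", chi2 > 3.841)
```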
Best regards,
DANG Dinh Cung

July 31, 2003 at 4:40 pm #88527
George Chynoweth
It seems to me that two issues are at hand: 1) was the sales promotion responsible for the increase in revenue, and 2) was the increase significant?
1. If you didn’t randomize, you cannot conclude cause and effect, even if it does exist. Any number of intervening vars (e.g., location, hours, economic situation of customer base, timing, etc.) could be individually or collectively responsible for the difference.
2. Assuming valid sampling, a simple t-test will determine the level of significance. However, if I were an executive, I wouldn’t need to know the level of significance – I can evaluate the difference between 118% and 131% without knowledge of any mind-numbing statistics. Just give me the bottom line. Assume you have a 3 sigma difference: how does that knowledge contribute to making an executive decision? Often, numbers will speak for themselves, and speak very well, without additional analyses.