Dice and Minitab
Six Sigma – iSixSigma › Forums › Old Forums › General › Dice and Minitab
 This topic has 13 replies, 8 voices, and was last updated 14 years, 2 months ago by DeadHorse Alert.


May 12, 2008 at 6:31 pm #50056
Mike Archer (Participant):
Hello. I am playing with dice to create random distributions in Minitab. There is something I don't understand. If I roll 2 or more dice, I get a nice bell-shaped curve (expected). What is baffling me is that the more I roll, the lower the Anderson-Darling p-value becomes. The data become nonnormal somewhere between 100 and 200 rolls. The shape of the distribution is very normal, but the Anderson-Darling p-value just goes to nothing after a while. What is the statistical explanation for this?
Thanks,
Mike

May 12, 2008 at 6:56 pm #171914
Mike,
I assume you mean that you roll 2 dice at a time and record the sum.
You are limited to the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. These values will be repeated over and over and over, fewer repeats for the values 2 and 12, more repeats for the values 6, 7, 8. In a histogram, this starts to look like the bell curve. However, on a normal probability plot, the data are “stacked” because of the repeats.
This is nonnormal data that approximates a normal distribution. Normal data are continuous, and true normality is only theoretical. With enough tosses of the pair of dice, the AD test recognizes the nonnormality of dice-tossing.
I wouldn't let this discourage me from using dice-tossing as an illustration of normal distributions. Just tell your disciples that "in the limit, this becomes the normal distribution."
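The "stacked repeats" point can be seen in a few lines of Python (a minimal sketch with an arbitrary seed and sample size, not from the thread):

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is reproducible

# Simulate 200 tosses of a pair of dice, recording the sum each time.
rolls = [random.randint(1, 6) + random.randint(1, 6) for _ in range(200)]
counts = Counter(rolls)

# Every observation lands on one of only 11 possible values (2..12),
# so the "continuous-looking" data are really heavy stacks of repeats.
print(sorted(counts))          # at most 11 distinct values
print(counts.most_common(3))   # the middle values carry the most repeats
```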
Hope this helps.

May 12, 2008 at 7:22 pm #171918
You are experiencing normal process variation.
Throw another 200 rolls and it may go back again.
May 13, 2008 at 12:04 pm #171938
Mike Archer (Participant):
Actually, the more data that is collected, the lower the p-value gets. I found that to be true even for randomly generated numbers with lots of decimal places. So I guess that more data is not always better for testing normality?
Mike

May 13, 2008 at 12:09 pm #171939
Mike, go read BC's answer. It is correct. Think about it.
May 13, 2008 at 12:44 pm #171941
Does the concept of normality apply to this at all?
There are a finite number of possible outcomes; some two-dice totals are more frequently represented in the possible outcomes than others:

Total 2: 1+1
Total 3: 1+2, 2+1
Total 4: 1+3, 3+1, 2+2
Total 5: 1+4, 4+1, 2+3, 3+2
Over time the observed results will match the relative frequency of the possible outcomes, which just happens to be symmetrical, but is NOT bell-shaped. The more throws, the more obvious the fact that it is not normal.
Glen_A
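The enumeration above can be completed exhaustively in a couple of lines (an illustrative Python sketch, not part of the original post):

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely ordered outcomes of two dice.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Counts rise linearly to 7 and fall linearly after it: a triangle, not a bell.
print([totals[t] for t in range(2, 13)])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```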
May 13, 2008 at 1:42 pm #171943
Glen_A,
Absolutely right. With 1 die at a time, you get a uniform distribution. With 2 at a time, you get a triangular distribution (for the very reason you say). With 6 at a time (I did it in Excel), it starts to smooth out and look bell-shaped.
BC

May 13, 2008 at 1:51 pm #171944
Mike,
How are you doing the simulation in MINITAB?

May 13, 2008 at 2:00 pm #171945
Mike Archer (Participant):
Hello. Thanks to everyone so far for the responses.
Daves – For the dice, I am creating random integers from 1 to 6. I create a column for each die, then add the columns together for the total roll.
What I am only just now realizing, though, is that the p-value for normality becomes smaller for all normal data as the sample size increases. So data that is measured to even the hundred-thousandth decimal place will eventually get a p-value less than .05 no matter how normal it is. So as the sample size increases, the confidence in the mean and SD increases, but the confidence in the p-value decreases. Makes me wonder how much data is too much when testing for normality? 30 works, but is 100 too much??
Mike

May 13, 2008 at 2:49 pm #171946
Not true.
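The terse reply above is right for genuinely continuous normal data, whose p-values do not systematically shrink; for dice sums the story is different. The sum of two dice sits a fixed, nonzero distance from every normal curve, and normality test statistics scale that distance up with sample size, driving the p-value to zero. A stdlib Python sketch of that fixed distance (illustrative only, not from the thread):

```python
import math

# Exact pmf of the sum of two dice: P(t) = (6 - |t - 7|) / 36 for t in 2..12.
pmf = {t: (6 - abs(t - 7)) / 36 for t in range(2, 13)}
mean = 7.0
var = sum(p * (t - mean) ** 2 for t, p in pmf.items())  # 35/6
sd = math.sqrt(var)

def normal_cdf(x):
    # CDF of the best-fitting normal curve, via the stdlib error function.
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Kolmogorov-style distance between the dice CDF and the normal CDF,
# evaluated at the jump points of the discrete distribution.
cdf = 0.0
D = 0.0
for t in range(2, 13):
    cdf += pmf[t]
    D = max(D, abs(cdf - normal_cdf(t)))

# D is a fixed positive constant; test statistics grow roughly like
# sqrt(n) * D, so the p-value is driven toward zero as n increases.
print(round(D, 4))
```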
May 13, 2008 at 2:53 pm #171947
Iain Hastings (Participant):
Mike, your questions at the end of your post would concern me a bit. There is no point at which there is "too much data when testing for normality." The point is that your particular example of tossing two dice does not really represent a normal distribution. Reread BC's response – what you have is a histogram with discrete bins (2, 3, 4 … 12). You do not have values between these integers (e.g. 2.25, 2.5, 2.63 or whatever). Furthermore, the AD statistic is more heavily weighted toward the tails of the distribution. Approximately 3% of the time you will get a "2" or a "12", and there is nothing outside these limits – quite a difference from a normal distribution. This, of course, is less obvious with fewer samples. If you take the time to look at the data on a normal probability plot you will immediately see the issue. With that in mind you may want to reconsider your last question.
Having said all that – if your goal is to show how the histogram of the die-tossing broadly forms the shape of a normal distribution, I wouldn't see any issue other than the above caveats.

May 13, 2008 at 3:30 pm #171949
Mike,
As others have said, your problem is the lack of extreme values. You have nothing below 2 or above 12.
The Anderson-Darling test that you must be using is very sensitive to departures in the tails. I recreated your method for 300 rolls. The AD test under Stat > Basic Statistics > Normality Test fails. The RJ and KS tests pass the distribution. These tests are far less sensitive to extrema.
You can't have too much data if the data really come from the theoretical distribution. You can have too much if the underlying data are better fit by another distribution. Try something like a Weibull(2,2). 30 points will pass normality on all three tests. 100 will not on the AD test nor the RJ, but will on the KS. If you graph this distribution as a histogram with a bunch of points (10,000 or so) you can see that it is not terribly skewed. I suspect it would be fine for t-tests, ANOVA, control charting, and other techniques which are robust to normality. It would probably be a bit off for capability studies.
Bottom line: don't be a tool slave to any particular method or assumption. Think it out and get advice from qualified statistical types if unsure.
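The Weibull(2,2) suggestion is easy to eyeball without Minitab. A rough stdlib sketch (the seed and sample size here are arbitrary choices, not from the post) that estimates how skewed that distribution actually is:

```python
import random
import statistics

random.seed(42)
# Draw a large sample from a Weibull distribution with scale 2, shape 2.
sample = [random.weibullvariate(2, 2) for _ in range(10_000)]

mu = statistics.fmean(sample)
sigma = statistics.pstdev(sample)
# Sample skewness: the third standardized moment.
skew = sum(((x - mu) / sigma) ** 3 for x in sample) / len(sample)

# The theoretical skewness of a shape-2 Weibull is about 0.63 -- noticeably
# asymmetric, but far from extreme, which is why moderate sample sizes can
# still pass some normality tests.
print(round(mu, 2), round(skew, 2))
```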
May 13, 2008 at 8:15 pm #171954
DeadHorse Alert (Participant):
At risk of beating a dead horse…
I just ran across another point of view / food for thought (browsing my Analyze material):
Think of your plot as a probability curve. There is a higher probability of hitting a 7 (6 different combinations) than a 4 or a 10 (3 different combinations each). The area under the curve represents 100% of the answers you can get. This is limited to values greater than or equal to 2 and less than or equal to 12. So it really isn't a normal bell curve, where you have slight possibilities at the extremes.
Your rolls just conform to and confirm the probabilities.
MrMHead Rides Again!
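Those probabilities can be checked exactly with stdlib fractions (an illustrative sketch, not part of the original post):

```python
from fractions import Fraction
from itertools import product

# Probability of each two-dice total, as exact fractions.
prob = {}
for a, b in product(range(1, 7), repeat=2):
    prob[a + b] = prob.get(a + b, Fraction(0)) + Fraction(1, 36)

print(prob[7], prob[4], prob[10])   # 1/6 1/12 1/12
print(sum(prob.values()))           # 1: the whole "curve" lives on 2..12
```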
The forum ‘General’ is closed to new topics and replies.