# Dice and Minitab

Six Sigma – iSixSigma Forums Old Forums General Dice and Minitab


Mike Archer
Participant

Hello.  I am playing with dice to create random distributions in Minitab.  There is something I don’t understand.  If I roll 2 or more dice, I get a nice bell-shaped curve (expected).  What is baffling me is that the more I roll, the lower the Anderson-Darling p-value becomes.  The data become non-normal somewhere between 100 and 200 rolls.  The shape of the distribution looks very normal, but the Anderson-Darling p-value just goes to nothing after a while.  What is the statistical explanation for this?
Thanks,
Mike


BC
Participant

Mike,
I assume you mean that you roll 2 dice at a time and record the sum.
You are limited to the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.  These values will be repeated over and over and over, fewer repeats for the values 2 and 12, more repeats for the values 6, 7, 8.  In a histogram, this starts to look like the bell curve.  However, on a normal probability plot, the data are “stacked” because of the repeats.
This is non-normal data that approximates a normal distribution.  Normal data are continuous and true normality is only theoretical.  With enough tosses of the pair of dice, the AD recognizes the non-normality of dice-tossing.
I wouldn’t let this discourage me from using dice-tossing as an illustration of normal distributions.  Just tell your disciples that “in the limit, this becomes the normal distribution.”
Hope this helps.
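If it helps to see BC’s point outside Minitab, here is a quick Python sketch (an assumed stand-in for the original Minitab session, using scipy’s `anderson`, which reports the A² statistic against critical values rather than a p-value).  As the number of two-dice rolls grows, the statistic climbs past its 5% critical value, exactly because the stacked, discrete sums are not truly normal:

```python
# Sketch (not the poster's Minitab session): simulate sums of two dice and
# watch the Anderson-Darling statistic grow with the number of rolls.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n in (50, 100, 200, 500):
    sums = rng.integers(1, 7, size=n) + rng.integers(1, 7, size=n)  # two dice
    result = stats.anderson(sums, dist='norm')
    crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
    verdict = "rejects normality" if result.statistic > crit_5pct else "looks normal"
    print(f"n={n:4d}  A2={result.statistic:.3f}  crit(5%)={crit_5pct:.3f}  -> {verdict}")
```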


Ron
Member

You are experiencing normal process variation.
Throw another 200 rolls and it may go back again.


Mike Archer
Participant

Actually, the more data that is collected, the lower the p-value gets.  I found that to be true even for randomly generated numbers with lots of decimal places.  So I guess that more data is not always better for testing normality?
Mike


Mikel
Member

Mike, go read BC’s answer.  It is correct.  Think about it.


Glen_A
Participant

Does the concept of normality apply to this at all?
There are a finite number of possible outcomes, and some two-dice totals are represented more frequently among them:

Total 2: 1+1
Total 3: 1+2, 2+1
Total 4: 1+3, 3+1, 2+2
Total 5: 1+4, 4+1, 2+3, 3+2

Over time the observed results will match the relative frequency of the possible outcomes, which just happens to be symmetrical, but is NOT bell shaped.  The more throws, the more obvious the fact that it is not normal.
Glen_A
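The enumeration Glen_A started can be completed in a few lines of Python (an illustrative sketch, not anything from the thread itself).  Counting all 36 equally likely outcomes shows the exact triangular shape:

```python
# Sketch: enumerate all 36 equally likely two-dice outcomes and count how
# often each total occurs.  The result is triangular, not bell-shaped.
from collections import Counter

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
for total in range(2, 13):
    print(f"{total:2d}: {counts[total]}/36  {'#' * counts[total]}")
# Total 7 is the most common, at 6/36; totals 2 and 12 occur 1/36 each.
```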


BC
Participant

Glen_A,
Absolutely right.  With 1 die at a time, you get a uniform distribution.  With 2 at a time, you get a triangular distribution (for the very reason you say).  With 6 at a time (I did it in Excel), it starts to smooth out and look bell-shaped.
BC
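BC’s Excel experiment can be reproduced exactly (no simulation noise) by convolving the single-die distribution with itself; the sketch below is a Python restatement of that idea, not BC’s original spreadsheet.  It prints the exact probabilities for 1, 2, and 6 dice, showing the uniform → triangular → bell-ish progression:

```python
# Sketch: exact distribution of the sum of k dice via repeated convolution,
# showing how the shape smooths out as more dice are added.
import numpy as np

die = np.ones(6) / 6.0          # fair die: P(face) = 1/6
for k in (1, 2, 6):
    dist = die.copy()
    for _ in range(k - 1):
        dist = np.convolve(dist, die)   # add one more die
    print(f"{k} dice (totals {k}..{6 * k}):")
    print("  " + " ".join(f"{p:.3f}" for p in dist))
```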


DaveS
Participant

Mike,
How are you doing the simulation in MINITAB?


Mike Archer
Participant

Hello.  Thanks to everyone so far for the responses.
DaveS – For the dice, I am creating random integers from 1 to 6.  I create a column for each die, then add the columns together for the total roll.
What I am only just now realizing, though, is that the p-value for normality becomes smaller for all normal data as the sample size increases.  So even data measured to the hundred-thousandth decimal place will eventually get a p-value less than .05, no matter how normal it is.  So as the sample size increases, the confidence in the mean and SD increases, but the confidence in the p-value decreases.  Makes me wonder how much data is too much when testing for normality?  30 works, but is 100 too much?
Mike


Mikel
Member

Not true
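Mikel’s terse reply can be checked directly.  In the Python/scipy sketch below (an assumed stand-in for a Minitab session), the AD statistic for truly continuous normal data stays flat as the sample size grows; it is discreteness (here, rounding to integers), not sample size, that drives rejection:

```python
# Sketch: for genuinely continuous normal data the Anderson-Darling
# statistic does NOT drift upward with n; the same data rounded to
# integers (a few discrete "bins") fails ever more decisively.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
for n in (100, 1000, 10000):
    x = rng.normal(size=n)
    a_cont = stats.anderson(x, dist='norm').statistic
    a_rounded = stats.anderson(np.round(x), dist='norm').statistic  # nearest integer
    print(f"n={n:5d}  continuous A2={a_cont:.2f}  rounded A2={a_rounded:.2f}")
```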


Iain Hastings
Participant

Mike, your questions at the end of your post would concern me a bit.  There is no point at which there is “too much data when testing for normality”.  The point is that your particular example of tossing two dice does not really represent a normal distribution.  Re-read BC’s response – what you have is a histogram with discrete bins (2, 3, 4 … 12).  You do not have values between these integers (e.g. 2.25, 2.5, 2.63 or whatever).  Furthermore, the AD statistic is more heavily weighted towards the tails of the distribution.  Approximately 3% of the time you will get a “2” or a “12”, and there is nothing outside these limits – quite a difference from a normal distribution.  This, of course, is less obvious with fewer samples.  If you take the time to look at the data on a normal probability plot you will immediately see the issue.  With that in mind you may want to reconsider your last question.
Having said all that – if your goal is to show how the histogram of the die tossing broadly forms the shape of a normal distribution, I wouldn’t see any issue other than the above caveats.

DaveS
Participant

Mike,
As others have said, your problem is in the lack of extreme values. You have nothing below 2 or above 12.
The Anderson-Darling test that you must be using is very sensitive to departures in the tails.  I recreated your method for 300 rolls.  The AD test under Stat > Basic Stat > Normality Test fails.  The RJ and KS tests pass the distribution.  These tests are far less sensitive to extrema.
You can’t have too much data if the test distribution is a theoretical one.  You can have too much if the underlying data are better fit by another distribution.  Try something like a Weibull(2,2).  30 points will pass normality on all three tests.  100 will not on the AD nor the RJ test, but will on the KS.  If you graph this distribution as a histogram with a bunch of points (10,000 or so) you can see that it is not terribly skewed.  I suspect it would be fine for t-tests, ANOVA, control charting, and other techniques which are robust to normality.  It would probably be a bit off for capability studies.
Bottom line, don’t be a tool slave to any particular method or assumption. Think it out and get advice from qualified statistical types if unsure.
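DaveS’s Weibull(2,2) experiment can be approximated in Python with scipy stand-ins for Minitab’s tests: `anderson` for AD, `shapiro` (a close cousin of Ryan-Joiner) for RJ, and `kstest` with estimated parameters for KS (a Lilliefors-style check, so treat its p-value as approximate).  A hedged sketch, not a reproduction of his Minitab run:

```python
# Sketch: compare normality checks on Weibull(shape=2, scale=2) samples.
# AD and Shapiro-Wilk are tail/shape-sensitive; KS is the least powerful.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
for n in (30, 100):
    x = 2.0 * rng.weibull(2.0, size=n)          # scale 2, shape 2
    ad = stats.anderson(x, dist='norm').statistic
    sw_stat, sw_p = stats.shapiro(x)
    ks_stat, ks_p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
    print(f"n={n:3d}  AD stat={ad:.2f}  Shapiro p={sw_p:.3f}  KS p={ks_p:.3f}")
```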
