
Dice and Minitab

  • #50056

    Mike Archer
    Participant

Hello.  I am playing with dice to create random distributions in Minitab.  There is something I don’t understand.  If I roll 2 or more dice, I get a nice bell-shaped curve (expected).  What is baffling me is that the more I roll, the lower the Anderson-Darling p-value becomes.  The data become non-normal somewhere between 100 and 200 rolls.  The shape of the distribution is very normal-looking, but the Anderson-Darling p-value just goes to nothing after a while.  What is the statistical explanation for this?
    Thanks,
    Mike
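
    (If you want to reproduce this effect outside Minitab, here is a rough sketch in Python with numpy/scipy; the sample sizes and random seed are arbitrary choices, not part of Mike's setup.)

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)  # arbitrary seed

    # Sum of two fair dice, checked for normality at increasing numbers of rolls.
    for n in (50, 100, 200, 500, 1000):
        rolls = rng.integers(1, 7, size=(n, 2)).sum(axis=1)
        result = stats.anderson(rolls, dist="norm")
        # scipy's anderson() reports the A-squared statistic and critical values
        # rather than a p-value; the statistic climbing past the 5% critical value
        # (index 2) corresponds to Minitab's p-value dropping below 0.05.
        print(f"n={n:4d}  A2={result.statistic:.3f}  5% critical={result.critical_values[2]:.3f}")
    ```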

    0
    #171914

    BC
    Participant

    Mike,
    I assume you mean that you roll 2 dice at a time and record the sum.
    You are limited to the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.  These values will be repeated over and over and over, fewer repeats for the values 2 and 12, more repeats for the values 6, 7, 8.  In a histogram, this starts to look like the bell curve.  However, on a normal probability plot, the data are “stacked” because of the repeats.
This is non-normal data that approximates a normal distribution.  Normal data are continuous, and true normality is only theoretical.  With enough tosses of the pair of dice, the AD test recognizes the non-normality of dice tossing.
    I wouldn’t let this discourage me from using dice-tossing as an illustration of normal distributions.  Just tell your disciples that “in the limit, this becomes the normal distribution.”
    Hope this helps.
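
    (BC's "stacking" point is easy to see numerically. A small sketch in Python/numpy, with an arbitrary seed, counting distinct values in each kind of sample:)

    ```python
    import numpy as np

    rng = np.random.default_rng(2)  # arbitrary seed
    n = 200

    dice = rng.integers(1, 7, size=(n, 2)).sum(axis=1)  # two-dice totals: only 11 possible values
    cont = rng.normal(7.0, 2.4, size=n)                  # continuous data with a similar mean and SD

    print("distinct two-dice totals:  ", np.unique(dice).size)  # at most 11, so points stack on a probability plot
    print("distinct continuous values:", np.unique(cont).size)  # essentially n, no stacking
    ```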

    0
    #171918

    Ron
    Member

You are experiencing normal process variation.
Throw another 200 rolls and it may go back again.
     

    0
    #171938

    Mike Archer
    Participant

    Actually, the more data that is collected, the lower the p-value gets.  I found that to be true even for randomly generated numbers with lots of decimal places.  So I guess that more data is not always better for testing normality?
    Mike

    0
    #171939

    Mikel
    Member

Mike, go read BC’s answer. It is correct. Think about it.

    0
    #171941

    Glen_A
    Participant

    Does the concept of normality apply to this at all?
There are a finite number of possible outcomes, and some two-dice totals are represented by more of those outcomes than others:

Total 2: 1+1
Total 3: 1+2, 2+1
Total 4: 1+3, 3+1, 2+2
Total 5: 1+4, 4+1, 2+3, 3+2
Over time the observed results will match the relative frequency of the possible outcomes, which just happens to be symmetrical, but is NOT bell-shaped.  The more throws, the more obvious the fact that it is not normal.
    Glen_A
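
    (Glen_A's enumeration can be completed programmatically. A sketch using only the Python standard library, counting all 36 equally likely pairs:)

    ```python
    from collections import Counter
    from itertools import product

    # Count how many of the 36 equally likely (die1, die2) pairs give each total.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

    for total in sorted(counts):
        print(f"{total:2d}: {counts[total]} combinations ({counts[total] / 36:.3f})  {'#' * counts[total]}")
    ```

    The '#' bars trace out the exact distribution, which is triangular rather than bell-shaped.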
     

    0
    #171943

    BC
    Participant

    Glen_A,
    Absolutely right.  With 1 die at a time, you get a uniform distribution.  With 2 at a time, you get a triangular distribution (for the very reason you say).  With 6 at a time (I did it in Excel), it starts to smooth out and look bell-shaped.
    BC
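
    (A sketch of the same smoothing in Python with numpy/scipy, standing in for BC's Excel run; the 10,000-roll count and seed are arbitrary. Excess kurtosis moving toward 0 as more dice are summed shows the totals getting closer to a normal shape, even though they remain discrete.)

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)  # arbitrary seed

    # Sum k dice per roll; more dice per roll gives a smoother, more bell-like total.
    for k in (1, 2, 6):
        totals = rng.integers(1, 7, size=(10_000, k)).sum(axis=1)
        print(f"{k} dice: {np.unique(totals).size:2d} distinct totals, "
              f"skewness={stats.skew(totals):+.3f}, excess kurtosis={stats.kurtosis(totals):+.3f}")
    ```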

    0
    #171944

    DaveS
    Participant

    Mike,
    How are you doing the simulation in MINITAB?

    0
    #171945

    Mike Archer
    Participant

    Hello.  Thanks to everyone so far for the responses.
DaveS – for the dice, I am creating random integers from 1 to 6.  I create a column for each die, then add the columns together for the total roll.
What I am only just now realizing, though, is that the p-value for normality becomes smaller for all normal data as the sample size increases.  So data that is measured even to the hundred-thousandth decimal place will eventually get a p-value less than .05 no matter how normal it is.  So as the sample size increases, the confidence in the mean and SD increases, but the confidence in the p-value decreases.  Makes me wonder how much data is too much when testing for normality?  30 works, but is 100 too much??
    Mike
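
    (That claim is easy to check directly with genuinely continuous normal data. A sketch in Python/scipy, with an arbitrary seed and arbitrary sample sizes; if the data really are normal, the AD statistic should hover around its null distribution rather than growing steadily with n.)

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)  # arbitrary seed

    # Anderson-Darling check on truly continuous normal samples of growing size.
    for n in (30, 100, 1000, 10_000):
        x = rng.normal(loc=0.0, scale=1.0, size=n)
        result = stats.anderson(x, dist="norm")
        print(f"n={n:6d}  A2={result.statistic:.3f}  5% critical={result.critical_values[2]:.3f}")
    ```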

    0
    #171946

    Mikel
    Member

    Not true

    0
    #171947

    Iain Hastings
    Participant

Mike, your questions at the end of your post would concern me a bit. There is no point at which there is “too much data when testing for normality”. The point is that your particular example of tossing two dice does not really represent a normal distribution. Re-read BC’s response – what you have is a histogram with discrete bins (2, 3, 4 … 12). You do not have values between these integers (e.g. 2.25, 2.5, 2.63 or whatever). Furthermore, the AD statistic is more heavily weighted towards the tails of the distribution. Approximately 3% of the time you will get a “2” or a “12”, and there is nothing outside these limits – quite a difference from a normal distribution. This, of course, is less obvious with fewer samples. If you take the time to look at the data on a normal probability plot you will immediately see the issue. With that in mind you may want to reconsider your last question.
Having said all that – if your goal is to show how the histogram of the dice tossing broadly forms the shape of a normal distribution, I wouldn’t see any issue other than the above caveats.
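
    (A sketch of that probability-plot check in Python; scipy computes the plotting positions and matplotlib draws the plot, and the 300-roll count and seed are arbitrary.)

    ```python
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(5)  # arbitrary seed
    rolls = rng.integers(1, 7, size=(300, 2)).sum(axis=1)  # 300 two-dice totals

    # On a normal probability plot the repeated integer totals stack into flat steps,
    # and the points stop abruptly at 2 and 12 instead of tailing off like normal data.
    stats.probplot(rolls, dist="norm", plot=plt)
    plt.show()
    ```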

    0
    #171949

    DaveS
    Participant

    Mike,
    As others have said, your problem is in the lack of extreme values. You have nothing below 2 or above 12.
The Anderson-Darling test that you must be using is very sensitive to departures in the tails.  I recreated your method for 300 rolls.  The AD test under Stat > Basic Statistics > Normality Test fails.  The RJ (Ryan-Joiner) and KS (Kolmogorov-Smirnov) tests pass the distribution.  These tests are far less sensitive to extreme values.
You can’t have too much data if the data really follow the theoretical distribution.  You can have too much if the underlying data are better fit by another distribution.  Try something like a Weibull(2,2).  30 points will pass normality on all three tests; 100 will not on the AD or RJ tests, but will on KS.  If you graph this distribution as a histogram with a bunch of points (10,000 or so) you can see that it is not terribly skewed.  I suspect it would be fine for t-tests, ANOVA, control charting, etc. and other techniques which are robust to non-normality.  It would probably be a bit off for capability studies.
    Bottom line, don’t be a tool slave to any particular method or assumption. Think it out and get advice from qualified statistical types if unsure.
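
    (A rough version of DaveS's comparison in Python/scipy. Note the substitutions: scipy has no Ryan-Joiner test, so Shapiro-Wilk, which behaves similarly, stands in for it, and the KS p-value is only approximate because the mean and SD are estimated from the data. The seed is arbitrary.)

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)  # arbitrary seed

    for n in (30, 100):
        x = 2.0 * rng.weibull(2.0, size=n)  # Weibull, shape 2, scale 2: mildly skewed
        ad = stats.anderson(x, dist="norm")
        sw_stat, sw_p = stats.shapiro(x)    # stand-in for Minitab's Ryan-Joiner test
        ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
        print(f"n={n:3d}: AD={ad.statistic:.2f} (5% critical {ad.critical_values[2]:.2f}), "
              f"Shapiro p={sw_p:.3f}, KS p={ks.pvalue:.3f}")
    ```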
     
     

    0
    #171954

    DeadHorse Alert
    Participant

    At risk of beating a dead horse ….
    I just ran across another point of view / food for thought (browsing my Analyze material):
Think of your plot as a probability curve.  There is a higher probability of hitting a 7 (6 different combinations) than a 4 or a 10 (3 different combinations each).  The area under the curve represents 100% of the answers you can get, and it is limited to totals greater than or equal to 2 and less than or equal to 12.  So it really isn’t a normal bell curve, where there are slight possibilities at the extremes.
    Your rolls just conform to and confirm the probabilities.
    MrMHead Rides Again!

    0