What Is Distribution?
This topic contains 5 replies, has 4 voices, and was last updated by Rip Stauffer 11 months, 2 weeks ago.
- December 21, 2017 at 2:57 pm #55902
Anyone want to take a shot at explaining this in simplified terms?
I would appreciate it.
- December 22, 2017 at 1:28 pm #202082
@andycroniser – why don’t you provide your explanation and we can give you feedback?
- December 26, 2017 at 6:42 am #202087
Try a histogram, or Stat > Basic Statistics > Graphical Summary in Minitab.
@MBBinWI please tell me you’re staying warm and safe. Brrr, it’s kinda cold up there! GO PACK GO!
- December 26, 2017 at 6:45 am #202088
A distribution is a description of the amount of variation and the kind of variation.
- December 26, 2017 at 10:04 am #202091
It’s an arrangement of data that reflects the frequency of the values under study. Think of a forest: the heights of the trees would be one kind of distribution. A few trees will be very small (1-3 feet), most (probably the greatest number) will be of average height (15-30 feet), and a very few will be taller than 30 feet. That range of individual heights, together with how often each height occurs, usually shown as a histogram, is the distribution.
- December 31, 2017 at 10:16 am #202099
We usually talk about a distribution in two different senses:
1. Empirical distributions: how does some set of collected data pile up, with a number scale that includes all the values as the x-axis and frequency (number of observations) as the y-axis? This pile of data can be called the “distribution” of the observations. It is common to depict the empirical distribution as a histogram for continuous data, or as a plot of counts for each value in the case of discrete data. We use statistics calculated from these empirical piles of data to characterise their center, spread and other important features, and to estimate the corresponding parameters (xbar, for instance, is the statistic we use to estimate mu; S is the estimate of sigma).
2. Theoretical distributions: models we use for inferential statistics. These models are mathematical functions defined by parameters for shape, center and spread (mu or sigma, for example). If the empirical pile appears similar to a theoretical distribution, we often assume that the data from the phenomenon of interest (population or process) will be distributed similarly to the theoretical model; in other words, we extrapolate from the empirical to the theoretical.
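To make the two senses concrete, here is a minimal Python sketch (standard library only; `statistics.NormalDist` needs Python 3.8 or later). It assumes a hypothetical process with mu = 50 and sigma = 5, draws 500 simulated observations, computes xbar and S from that empirical pile, and then compares the observed count in each bin with the count a normal model built from xbar and S would predict. The parameter values, the seed and the bin edges are illustrative assumptions; with real data you would have only the observations, not the true mu and sigma.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical "true" process parameters (unknown in practice).
MU, SIGMA = 50.0, 5.0
data = [random.gauss(MU, SIGMA) for _ in range(500)]

# Empirical side: statistics computed from the pile of data.
xbar = mean(data)  # estimates mu
s = stdev(data)    # sample standard deviation, estimates sigma

# Theoretical side: a normal model whose parameters are the estimates.
model = NormalDist(xbar, s)

# Compare observed counts with model-expected counts in a few bins.
edges = [35, 40, 45, 50, 55, 60, 65]
for low, high in zip(edges, edges[1:]):
    observed = sum(1 for x in data if low <= x < high)
    expected = len(data) * (model.cdf(high) - model.cdf(low))
    print(f"[{low}, {high}): observed {observed:3d}, expected {expected:6.1f}")

print(f"xbar = {xbar:.2f}, S = {s:.2f}")
```

Observed and expected counts that agree reasonably well only tell you the data are not inconsistent with the normal model; they do not prove that the process "is" normal.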
A few important notes:
1. Many commonly used distribution models have asymptotic curves; that is, the tails of the distribution approach zero but never actually reach it. In a histogram, the tails do end: they stop at the smallest and largest observed values rather than stretching out to infinity in one or the other (or both) directions.
2. One important notion that’s often neglected in these discussions is that a distribution implies predictability or homogeneity: that the observations all come from the same universe (population or process cause system).
3. From Don Wheeler: “a probability model is, at best, a limiting characteristic of an infinite sequence of data. Therefore, it cannot be a property of any finite portion of that sequence.” What this means, essentially, is that data are never “normally distributed.” The best we can say is that our observations pile up in a way that is not inconsistent with the model we are going to use to characterise them.
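A small Python sketch of the first note, assuming the standard normal as the model (any other asymptotic model would make the same point): `statistics.NormalDist.pdf` stays positive however far out you evaluate it, while a finite sample (1,000 simulated values here, with an arbitrary seed) always has a definite smallest and largest value, which is where a histogram's tails stop.

```python
import random
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)

# The model's density shrinks toward zero in the tails but stays positive.
for z in (3, 6, 10, 20):
    print(f"pdf({z:>2}) = {std_normal.pdf(z):.3e}")

# A finite sample, by contrast, has a smallest and a largest value.
random.seed(7)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(f"sample min = {min(sample):.2f}, sample max = {max(sample):.2f}")
```

Far enough out (roughly 39 standard deviations) the computed density underflows to 0.0; that is a limit of double-precision arithmetic, not of the model.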