iSixSigma

Percentage….attribute or variable data


Viewing 11 posts - 1 through 11 (of 11 total)
  • Author
    Posts
  • #33387

    Hersey
    Participant

    I am trying to end a debate that started during a training class earlier. Is a percentage described as an Attribute data type or a variable data type?

    #90179

    Doc
    Participant

    Attribute. Though it may seem like there could be any number of decimal places, there is only a finite number of possible percentages: one for each integer count from 0 to n, where n is the sample size.
     

    #90181

    Gabriel
    Participant

    Don,
    Are you really concerned about whether it is “attribute or variable”, or about whether it is “continuous or discrete”?
    If it is “attribute or variable”, it is variable (unless the only possible outcomes are something like “low %” and “high %”).
    If it is “discrete or continuous”, it is discrete. Why? Because you used the word “data” before “type”. Because of the lack of infinite resolution, data, no matter which data, is always discrete, even if it is the result of measuring a continuous characteristic. Take, for example, a measurement of time using a chronograph with a resolution of 1 second. Clearly, there will be no available value between two consecutive integer seconds, so the time data can NOT be continuous, even though “time” IS continuous.
    However, if you have enough different possible values you can treat the “discrete” data as if it were “continuous” and, for example, check whether the data is normally distributed (which makes no sense if you take the data as discrete, because no discrete distribution can be normal).
    Returning to the time example, if you are measuring a “time to push the stop button after the alarm starts” that varies from 3 to 5 seconds, clearly you cannot take a time measured in seconds as “continuous” (you have only 3 possible values: 3, 4 and 5). Now, if you are measuring a “time to deliver pizza” that goes from 10 to 20 minutes, clearly the time measured in seconds can be taken as if it were continuous (you have 601 possible values in that range).
    So, is % continuous? No. Can it be taken as if it were continuous? It depends. Not if the % is measured on samples of size 3 (only 0%, 33%, 67% and 100% are possible outcomes). And if the sample size is about 1000? It depends again. Not if the % is always between 74.9% and 75.1% (only 74.9%, 75.0% and 75.1% are possible). Yes if the actual % is between 70% and 80% (101 possible values).
    A rule of thumb? 10 or more possible values in the actual range (+/- 3 sigmas).
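
    The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function names and the `min_values=10` default (taken from the rule of thumb) are assumptions made for the example, not part of any standard library or Six Sigma tool.

```python
from fractions import Fraction
import math


def count_possible_values(n, low_pct, high_pct):
    """Count the percentages k/n (k = 0..n) that fall inside [low_pct, high_pct].

    Exact fractions are used so that bounds such as 74.9 do not suffer
    from floating-point rounding.
    """
    low = Fraction(str(low_pct))
    high = Fraction(str(high_pct))
    k_min = max(0, math.ceil(low * n / 100))    # smallest count inside the range
    k_max = min(n, math.floor(high * n / 100))  # largest count inside the range
    return max(0, k_max - k_min + 1)


def can_treat_as_continuous(n, low_pct, high_pct, min_values=10):
    """Hypothetical helper applying the '10 or more possible values' rule of thumb."""
    return count_possible_values(n, low_pct, high_pct) >= min_values


print(count_possible_values(3, 0, 100))         # sample size 3, full range
print(count_possible_values(1000, 74.9, 75.1))  # sample size 1000, narrow range
print(count_possible_values(1000, 70, 80))      # sample size 1000, wider range
print(can_treat_as_continuous(1000, 70, 80))
```

    With the numbers from the post, the three counts come out as 4, 3 and 101, so only the last case passes the rule of thumb.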

    #90183

    Gabriel
    Participant

    The following can clarify the concepts about “type of variables”. It is taken from StatSoft, Inc. (2003). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB: http://www.statsoft.com/textbook/stathome.html.
    “Measurement scales. Variables differ in “how well” they can be measured, i.e., in how much measurable information their measurement scale can provide. There is obviously some measurement error involved in every measurement, which determines the “amount of information” that we can obtain. Another factor that determines the amount of information that can be provided by a variable is its “type of measurement scale.” Specifically variables are classified as (a) nominal, (b) ordinal, (c) interval or (d) ratio.

    Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. For example, all we can say is that 2 individuals are different in terms of variable A (e.g., they are of different race), but we cannot say which one “has more” of the quality represented by the variable. Typical examples of nominal variables are gender, race, color, city, etc.
    Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say “how much more.” A typical example of an ordinal variable is the socioeconomic status of families. For example, we know that upper-middle is higher than middle but we cannot say that it is, for example, 18% higher. Also this very distinction between nominal, ordinal, and interval scales itself represents a good example of an ordinal variable. For example, we can say that nominal measurement provides less information than ordinal measurement, but we cannot say “how much less” or how this difference compares to the difference between ordinal and interval scales.
    Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.
    Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.” (end of citation)
    Type a, Nominal variables, is clearly what we call “attribute”. Types c, Interval variables, and d, Ratio variables, are clearly what we call “variable” (note that even an attribute is a variable, since its value can “vary”, i.e. there is more than 1 possible value). Type b, Ordinal variables, falls in a grey zone to me. Sometimes we assign a consecutive number to each consecutive value in the rank and use it as if it were an Interval variable. For example, in a survey: “Compared with our competitors, the performance of our product is: among the worst, below average, average, above average, among the best”. The answer to this question is clearly an ordinal variable, and that nature does not change if we ask people to put an “X” under columns marked 1, 2, 3, 4 and 5 respectively. Yet somehow we find ourselves calculating the average and standard deviation of the answers. To make these calculations we used the numbers 1 to 5 as the value of the variable, as if it were an interval or ratio variable, but in fact we cannot say “4 is better than 3 by the same magnitude that 5 is better than 4”.
    Now, note that these definitions of scales do not mention “continuous” or “discrete”.
    Types a, Nominal variables, and b, Ordinal variables, can only be discrete, since there are no “additional” categories between any 2 consecutive categories.
    Types c, Interval variables, and d, Ratio variables, can, in theory, be continuous or discrete. For example, a time would be Ratio and Continuous, while the “number of defects per ft^2” would be Ratio and Discrete.
    However, as I said in the previous post, even if the physical characteristic being measured is continuous (such as time), the data, which is the outcome of the measurement, is always discrete because of the lack of infinite resolution (both in the measurement result itself and when “writing down” that result, for example on a spreadsheet). Yet, if there are enough distinct values in the population, it can be treated as if it were a continuous variable.

    #90192

    Hersey
    Participant

    Hi Doc & Gabriel, thanks for the feedback. It has helped a lot.

    #90198

    Doc
    Participant

    Here is a little description of the relationship between the binomial distribution (attribute or discrete), the Poisson distribution (attribute or discrete), and the normal distribution (variable or continuous) that I wrote many years ago. From a historical perspective, the normal distribution was originally derived as an approximation of the binomial distribution.
    A quick lesson in the relationship between the Binomial, Poisson, and Normal Distributions:
    Data that involve the number of successes (x) out of some number of trials (n), where the probability of a success, p (estimated by x/n), is fixed, are said to follow a Binomial{n,p} distribution. Note that this is a discrete distribution, since x consists of integers falling between 0 and n, inclusive. All proportions (and percentages) follow a binomial distribution. As p becomes small and n becomes large (n>100), the binomial distribution tends to have a nearly continuous, but skewed shape, quite similar to the shape of the Poisson{λ=np} distribution for the count x. As p continues to get smaller and n continues to get larger (np>15), the binomial distribution appears even more continuous and becomes more symmetric about the mean, and the distribution of the proportion x/n approaches the shape of the Normal{μ=p, σ²=p(1-p)/n} distribution.
     
    What Gabriel is saying, is that as np becomes large, the proportion can be treated as a continuous normal variable.
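
    The relationship described above can be checked numerically. The sketch below is only an illustration under stated assumptions: it works on the count scale (x rather than x/n), uses the usual parameterizations Poisson with λ = np and Normal with mean np and variance np(1-p), and the sample values of n and p are made up for the example. All function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def max_gap(n, p, approx):
    """Largest pointwise difference between the binomial pmf and approx(k), k = 0..n."""
    return max(abs(binomial_pmf(k, n, p) - approx(k)) for k in range(n + 1))


# Small p, large n: the count should look Poisson with lam = n * p.
n, p = 200, 0.02
poisson_gap = max_gap(n, p, lambda k: poisson_pmf(k, n * p))

# Large np: the count should look Normal with mean np and variance np(1 - p).
n2, p2 = 200, 0.3
sigma2 = math.sqrt(n2 * p2 * (1 - p2))
normal_gap = max_gap(n2, p2, lambda k: normal_pdf(k, n2 * p2, sigma2))

print(f"max gap vs Poisson: {poisson_gap:.5f}")
print(f"max gap vs Normal:  {normal_gap:.5f}")
```

    Both gaps are small compared with the height of the distributions, which is the sense in which the discrete binomial can be treated as Poisson or normal in these regimes.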

    #90204

    Gabriel
    Participant

    What I am saying is that:
    1) Any real-world set of data must belong to a discrete distribution, unlike the normal distribution, which is continuous.
    2) If there are enough distinct possible values for the individuals of a discrete distribution, it can be treated as if it were continuous.
    This is general and applies to any kind of data. The normal approximation to the binomial distribution when np is large is just one example of that.

    #90944

    Andrew Brody
    Participant

    Please forgive my apparent bluntness, but don’t you have something important to do?  Thousands of projects have been completed without knowing the answer to this question.  Take your view and run with it.  The string here indicates varied viewpoints. Who is right is irrelevant; the important thing is that you save money for your company, that’s what they pay you for.
    Again, no offense intended.  I am an extreme pragmatist.
    Andrew Brody

    #90950

    Gabriel
    Participant

    Please forgive my apparent bluntness too, but:
    1) You must understand what you are doing. “Just do it” is a nice slogan for Nike, but not for a belt. Otherwise you may “think” that you are saving a lot of money for your company, when in fact you are not. I have seen “indicators” going up when “the thing” was going down.
    2) This is a discussion forum. You can ask whatever you want, as long as it is related to the subject of the forum (which is pretty wide in this case). Don just wanted to know if % was discrete or not. Do you have a problem with that?
    3) Don didn’t say that he was waiting for this answer to make a decision that would save money for the company. He said he wanted to end a debate that started in a training course. A student asks “Is % discrete?” As a trainer, I would not answer “Don’t you have something important to do?”. And as a student I would not accept a “Who cares!?” as an answer.
    Knowledge is power.
    Gabriel.

    #90953

    Andrew Brody
    Participant

    Gabriel,
    Point well made and taken.
    Andrew Brody

    #90968

    Tab
    Member

    I thought the student’s question was thoughtful as were the preceding messages.  I had a couple of thoughts that I hope frame the student’s question properly.
    When conducting Six Sigma training, I usually get similar questions about the selection of control charts (or, less frequently, about developing hypothesis tests), as the type of data figures in the decision tree for control chart selection.  When classifying a percentage measurement as attribute or variable data for control chart purposes, I think it is important to communicate to the student that this factor (type of data) is really asking whether the distribution underlying the chosen control chart is appropriate: in this example, continuous (e.g., I-MR) vs. binomial (i.e., p or np chart).  My concern is that the control chart should neither improperly signal special cause variation (resulting in overadjustment) nor miss instances of it (resulting in doing nothing).
    As noted by earlier respondents, as the sample size of binomial data increases (and p approaches 50%), the binomial distribution becomes more symmetric and looks increasingly like a continuous distribution.  This mitigates the problem in those situations when students use the I-MR chart.
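
    To make the chart comparison concrete, here is a minimal sketch that computes both sets of limits for the same data. The defective counts and the subgroup size of 100 are made-up illustration values, and the function names are hypothetical. The formulas are the textbook ones: p-bar ± 3·sqrt(p-bar(1 - p-bar)/n) for the p chart, and x-bar ± 2.66·MR-bar for the individuals chart (2.66 = 3/d2 with d2 = 1.128).

```python
import math


def p_chart_limits(defectives, n):
    """3-sigma p-chart limits (LCL, center, UCL) for a constant subgroup size n."""
    p_bar = sum(defectives) / (n * len(defectives))
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    # Proportions cannot fall outside [0, 1], so the limits are clipped.
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)


def imr_limits(values):
    """Individuals-chart limits (LCL, center, UCL) from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(values) / len(values)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Hypothetical data: defectives found in 10 subgroups of 100 units each.
defectives = [12, 9, 15, 11, 8, 14, 10, 13, 7, 11]
n = 100
proportions = [d / n for d in defectives]

p_lcl, p_center, p_ucl = p_chart_limits(defectives, n)
i_lcl, i_center, i_ucl = imr_limits(proportions)

print(f"p chart: LCL={p_lcl:.4f}  CL={p_center:.4f}  UCL={p_ucl:.4f}")
print(f"I chart: LCL={i_lcl:.4f}  CL={i_center:.4f}  UCL={i_ucl:.4f}")
```

    The two charts share a center line but get their limits from different sources: the p chart from the binomial model, the I chart from the observed point-to-point variation. How much they disagree on a given data set is one way to see whether the choice of data type matters in practice.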
    I hope this helps.
