iSixSigma

Are percentages continuous data?


Viewing 18 posts - 1 through 18 (of 18 total)
  • #52475

    David Baker
    Participant

    If we count the number of processed item failures and express them as a percentage of the total items processed, can we consider this to be continuous data, even though the underlying data is discrete (count) data?

    0
    #184597

    Mikel
    Member

    No you may not.

    0
    #184603

    Indrajit Lahiri
    Participant

    By convention, percentages are treated as continuous, since they can take any value.
    However, in some cases they may not be effective for measuring process capability or for hypothesis testing.
    I agree that a percentage is an easy way to normalize and compare, but I would resist using this metric unless it is the only feasible one.

    0
    #184605

    Mikel
    Member

    So you are saying that a discrete number divided by a discrete number makes a continuous number?
    According to research done by Harry, Wheeler, Box, and Feldman, the only legitimate way to make discrete data continuous is to take the 5th root of the number.

    0
    #184609

    Russell
    Member

    Percentages are not continuous; however, they are often treated as if they were. Try this to get an indication of what happens when you ‘normalize’ the data. Make a set of normal data, say 100 data points, of defects with opportunities. This would be similar to the count data that you have. Then convert it to a percentage. Now run a 1-sample t-test for a 20% improvement on the percentage data and a 1-proportion test for the same improvement on the count data. This will give you the number of samples required to indicate that a shift in the data is observable. Compare the sample sizes and have fun with it.
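
One way to try this comparison without a statistics package is sketched below, using the textbook normal-approximation sample-size formulas in place of a full power calculation. Everything numeric here is an assumption made for illustration and not taken from the post: a 10% baseline defect rate, a 20% relative improvement (down to 8%), percentages computed from subgroups of 100 items, a one-sided test at alpha = 0.05, and 90% power. The helper names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)


def n_for_mean(delta, sigma, alpha=0.05, power=0.90):
    """Approximate n for a one-sided one-sample test on a mean
    (normal approximation to the t-test)."""
    return ceil(((z(1 - alpha) + z(power)) * sigma / delta) ** 2)


def n_for_proportion(p0, p1, alpha=0.05, power=0.90):
    """Approximate n for a one-sided one-proportion test
    (normal approximation to the binomial)."""
    numerator = (z(1 - alpha) * sqrt(p0 * (1 - p0))
                 + z(power) * sqrt(p1 * (1 - p1)))
    return ceil((numerator / (p1 - p0)) ** 2)


# Assumed scenario (illustrative values only).
p0 = 0.10            # baseline defect proportion
p1 = 0.08            # proportion after a 20% relative improvement
subgroup = 100       # items behind each percentage value

# Treating each percentage as one "continuous" observation whose spread
# is the binomial standard error for a subgroup of this size.
sigma_pct = sqrt(p0 * (1 - p0) / subgroup)

n_percentages = n_for_mean(abs(p1 - p0), sigma_pct)
n_items = n_for_proportion(p0, p1)

print("percentage values needed (mean test):", n_percentages)
print("individual items needed (proportion test):", n_items)
```

The two results count different things: the first is the number of percentage values, each of which summarizes a whole subgroup, while the second is the number of individual items. Reading the first number as if it were an item count is where the percentage approach can mislead.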

    0
    #184617

    Darth
    Participant

    Although my rigorously peer reviewed and seminal work in this area is well known, I do take a more practical approach when working with Clients. While you are correct that the underlying data is still discrete no matter how many decimals or how many roots you take, it sometimes makes sense to loosen up a bit. If there is sufficient ordinal data spread across a wide range and distributed in a somewhat symmetrical fashion, I can often demonstrate that it will indeed take on many characteristics of continuous data. Each circumstance is unique, so I put forth no absolutes, but let’s face it, sometimes we have to ease up in the “real world” if the downside risk is not overly punitive.

    0
    #184618

    Shafi Khalisdar
    Member

    It depends on the type of data the percentages are derived from. E.g., 50% male in a classroom is attribute data, while 50% of students having a high fever is continuous data.

    0
    #184619

    GB
    Participant

    What?
    That is absolutely wrong.

    0
    #184620

    Darth
    Participant

    A measurement of temperature will be continuous data. If you have categorized students into high fever and not high fever then it is still discrete despite the underlying continuous nature of temperature. If you were using temperature as continuous, you wouldn’t be using percentage but average temperature and s.d. of temperature. The minute you take robust continuous data and change to categorical data as you suggest, you now have discrete data.

    0
    #184622

    BTDT
    Participant

    The reasons percentage data are not considered continuous are numerous:
    - For each subgroup you have no estimate of the within-group variation
    - Statistical tests for continuous data require an estimate of the within-group variation
    - The effect of subgroup size is ignored and can lead to manipulation of the results

    The consequences of ignoring the subgroup size are manifested in Simpson’s paradox, a situation where statistically based conclusions seem reversed when the data are subdivided or combined. Please read the following Wiki and look at the examples for:
    - Berkeley sex bias case
    - Kidney stone treatment
    - Batting averages

    http://en.wikipedia.org/wiki/Simpson%27s_paradox

    Simpson’s Paradox also showed up in the Numb3rs episode “Conspiracy Theory”.

    On a more personal note, we saw percentage data being presented at the GE corporate level that was subdivided in a manner that made all the sales regions look much better than the global picture.

    Cheers, Alastair
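
The reversal described here can be checked in a few lines. The sketch below uses the success counts usually quoted for the kidney stone treatment example on the linked Wikipedia page; the figures are reproduced from memory of that example and were not part of the post, so treat them as illustrative. The variable and function names are hypothetical.

```python
# (successes, patients) per treatment and stone size, as commonly quoted
# for the kidney stone example of Simpson's paradox (illustrative data).
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    """Success proportion for one group."""
    return successes / total


def pooled_rate(groups):
    """Success proportion after combining groups, weighting by group size."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return successes / total


for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size:5s} stones: A = {a:.1%}, B = {b:.1%}")

overall_a = pooled_rate(results["A"])
overall_b = pooled_rate(results["B"])
print(f"overall     : A = {overall_a:.1%}, B = {overall_b:.1%}")
```

Treatment A has the higher success percentage within each stone size, yet B has the higher percentage once the groups are pooled, because the group sizes behind each percentage are very different. The percentages alone cannot show this; the counts and subgroup sizes can.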

    0
    #184624

    Darth
    Participant

    BTDT,
    Not sure where the issue of subgroup fits in the discussion. I can do an I/MR chart with percentages as suggested by Wheeler without worrying about subgroup size. I also don’t see the relevance of within sample variance if I wish to do a one sample t test to see if my data meets some % spec.

    0
    #184626

    BTDT
    Participant

    Darth:

    Yes, you can construct an I/MR using a column of data values expressed as percentages. The software will estimate the standard deviation by using the n to n-1 differences between the subgroup means. The result is that the estimate of standard deviation for constructing the control limits is for the n-1 subgroups. This does not include any contribution within the subgroups.

    Wheeler’s advice works well when constructing a control chart where the number of samples within each subgroup is similar and the number within each subgroup is large enough that the assumption of normality of error distribution between subgroups is not seriously violated. The subgroup size must be much larger than the 30 rule-of-thumb if the data is highly skewed.

    The following set of data can be put into an I/MR chart by calculating percentages of each subgroup and running the chart. The mean will be 50.21% with UCL and LCL of 69.24% and 31.19% respectively. The conclusion will be that the process is under control.

    A p chart using the defect and subgroup size will correctly show the mean of 54.04% with the UCL and LCL of 68.99% and 39.09% respectively. It also identifies subgroup 9 as out of control.

    The relevance to Simpson’s paradox is that when subgroup size is ignored, conclusions can be misleading.

    Cheers, Alastair

    Defects   Subgroup size
    48        100
    51        100
    44        100
    55        100
    42        100
    52        100
    55        100
    46        100
    600       1000
    51        100
    53        100
    52        100
    45        100
    49        100
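
The two sets of limits quoted above can be recomputed from the data in the post. The sketch below assumes the conventional chart formulas, which the post does not spell out: individuals limits of mean ± 2.66 × average moving range, and p chart limits of p̄ ± 3·sqrt(p̄(1 − p̄)/n) evaluated per subgroup size. Variable names are hypothetical.

```python
from math import sqrt

# Data from the post: defects and subgroup sizes for 14 subgroups.
defects = [48, 51, 44, 55, 42, 52, 55, 46, 600, 51, 53, 52, 45, 49]
sizes = [100] * 8 + [1000] + [100] * 5

# --- I/MR chart on the percentages (subgroup size is ignored) ---
pct = [100 * d / n for d, n in zip(defects, sizes)]
mean_pct = sum(pct) / len(pct)
moving_ranges = [abs(b - a) for a, b in zip(pct, pct[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)
imr_ucl = mean_pct + 2.66 * mr_bar   # 2.66 = 3 / d2, with d2 = 1.128
imr_lcl = mean_pct - 2.66 * mr_bar
imr_out = [i + 1 for i, x in enumerate(pct)
           if not imr_lcl <= x <= imr_ucl]

# --- p chart on the counts (subgroup size is used) ---
p_bar = sum(defects) / sum(sizes)


def p_limits(n):
    """Lower and upper 3-sigma limits for a subgroup of size n."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width


p_out = []
for i, (d, n) in enumerate(zip(defects, sizes), start=1):
    lcl, ucl = p_limits(n)
    if not lcl <= d / n <= ucl:
        p_out.append(i)

print(f"I/MR: mean {mean_pct:.2f}, LCL {imr_lcl:.2f}, UCL {imr_ucl:.2f}, out: {imr_out}")
lcl100, ucl100 = p_limits(100)
print(f"p:    mean {100 * p_bar:.2f}, LCL {100 * lcl100:.2f}, UCL {100 * ucl100:.2f} (n=100), out: {p_out}")
```

The I/MR mean gives every subgroup equal weight, so the 1000-item subgroup counts no more than a 100-item one. The p chart weights by size and tightens the limits for the larger subgroup, which is why only the p chart flags subgroup 9.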

    0
    #184627

    Saherngu
    Participant

    I don’t agree with the NEVER. The key is the basis of the underlying data.
    For example, what about a concentration where the result is expressed as a percentage?
    Concentration = weight of ABCD / total weight
    Weight is a continuous variable, therefore the % is a continuous variable in this case.

    0
    #184633

    Jim T
    Participant

    Thank you, Alastair! I learned something today!

    0
    #184634

    BTDT
    Participant

    JimT:

    You are welcome. This is one of my bailiwicks. I have seen percentage data misused so often that I propose to never permit its use in a project.

    Cheers, Alastair

    0
    #184637

    Jim Bossert
    Participant

    Darth, are you mellowing in your old age??? I guess you are coming out of the “Dark Side” into the light.
    Yoda

    0
    #184640

    Darth
    Participant

    Hey Yoda. Hope you are finding peace in your new environment. Certainly not mellowing, just keeping a lower profile out of respect for the more tenuous times we live in.

    0
    #184705

    Brar
    Participant

    Most Black Belts treat percentage data as continuous and analyze it that way. It is standard practice, and the X-mR chart is also used.
    However, you should not do this, because converting counts into a percentage loses the information carried by the underlying proportion, namely the counts and the sample size. It would be better to use a P chart for percentages. If you need to calculate the capability, use the binomial distribution capability analysis.

    0
