# Are percentages continuous data?

Six Sigma – iSixSigma Forums Old Forums General Are percentages continuous data?

Viewing 18 posts - 1 through 18 (of 18 total)
• #52475

David Baker
Participant

If we count the number of processed item failures and express them as a percentage of the total items processed, can we consider this to be continuous data, even though the underlying data is discrete (count) data?
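To make the question concrete, here is a minimal sketch. The helper name `attainable_percentages` and the batch size of 20 are illustrative assumptions, not something from the thread. With n items processed, the failure percentage can only land on n + 1 distinct values:

```python
from fractions import Fraction


def attainable_percentages(n_items):
    """Return every failure percentage that is possible when n_items are processed."""
    # k failures out of n_items gives 100 * k / n_items percent, for k = 0..n_items.
    return [float(Fraction(100 * k, n_items)) for k in range(n_items + 1)]


values = attainable_percentages(20)
print(len(values))   # 21 distinct values
print(values[:4])    # [0.0, 5.0, 10.0, 15.0]
```

The decimals make the result look continuous, but the set of possible values is still a finite grid whose spacing depends on the number of items counted.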

#184597

Mikel
Member

No, you may not.

#184603

Indrajit Lahiri
Participant

By convention it is continuous, as it can take all possible values.
However, in some cases it may not be effective for measuring process capability or for hypothesis testing.
I agree that it is an easy way to normalize and compare, but I would resist using this metric unless it is the only feasible one.

#184605

Mikel
Member

So you are saying that a discrete number divided by a discrete number makes a continuous number?

According to research done by Harry, Wheeler, Box, and Feldman the only legitimate way to make discrete data continuous is to take the 5th root of the number.

#184609

Russell
Member

Percentages are not continuous; however, they are often treated that way. Try this to get an indication of what happens when you ‘normalize’ the data. Make a set of normal data, say 100 data points, of defects with opportunities. This would be similar to the count data that you have. Then convert it to a percentage. Now run a 1-sample t-test for a 20% improvement on the percentage data and a 1-proportion test for the same improvement on the count data. This will give you the number of samples required to detect a shift in the data. Compare the sample sizes and have fun with it.
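The comparison can be roughed out without a statistics package. The sketch below uses textbook normal-approximation sample size formulas rather than the exact calculations a package would run, and every number in it is a hypothetical assumption: a 10% baseline defect rate, a 20% relative improvement to 8%, a standard deviation of 3 percentage points for the percentage column, a two-sided alpha of 0.05, and 90% power.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_one_sample_mean(sigma, delta, alpha=0.05, power=0.9):
    """Approximate number of data points for a two-sided one-sample test of a mean.

    Uses the normal approximation, so it slightly understates what a t-test needs.
    """
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return ceil(((z_a + z_b) * sigma / delta) ** 2)


def n_one_proportion(p0, p1, alpha=0.05, power=0.9):
    """Approximate number of items for a two-sided one-proportion test (p0 vs p1)."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    numerator = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Hypothetical scenario: 10% defective improved to 8% (a 20% relative improvement).
baseline, improved = 0.10, 0.08
assumed_sd_of_percentages = 0.03  # assumption: spread of the percentage column

n_points = n_one_sample_mean(assumed_sd_of_percentages, baseline - improved)
n_items = n_one_proportion(baseline, improved)
print("data points (percentages) for the mean test:", n_points)
print("individual items for the proportion test:", n_items)
```

Note that the two answers count different things: the first is a number of percentage values, each of which hides its own subgroup size, while the second is a number of individual items inspected.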

#184617

Ken Feldman
Participant

Although my rigorously peer reviewed and seminal work in this area is well known, I do take a more practical approach when working with Clients. While you are correct that the underlying data is still discrete no matter how many decimals or how many roots you take it sometimes makes sense to loosen up a bit. If there is sufficient ordinal data spread across a wide range and distributed somewhat in a symmetrical fashion, I often can demonstrate that it will indeed take on many characteristics of continuous data. Each circumstance is unique so I put forth no absolutes but let’s face it, sometimes we have to ease up in the “real world” if the downside risk is not overly punitive.

#184618

Shafi Khalisdar
Member

It depends on the type of data the percentages are derived from. For example, 50% male in a classroom is attribute data, while 50% of students having a high fever is continuous data.

#184619

GB
Participant

What?
That is absolutely wrong.

#184620

Ken Feldman
Participant

A measurement of temperature will be continuous data. If you have categorized students into high fever and not high fever then it is still discrete despite the underlying continuous nature of temperature. If you were using temperature as continuous, you wouldn’t be using percentage but average temperature and s.d. of temperature. The minute you take robust continuous data and change to categorical data as you suggest, you now have discrete data.
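A small sketch of the distinction drawn above. The temperature readings and the 38.0 °C cutoff are made-up values for illustration only. The same readings can be summarized as continuous data (mean and standard deviation) or collapsed into a category count, at which point only a proportion is left:

```python
from statistics import mean, stdev

# Hypothetical body temperatures in degrees Celsius (illustrative values only).
temps_c = [36.6, 36.9, 37.2, 38.4, 39.1, 36.8, 38.0, 37.0]
FEVER_THRESHOLD_C = 38.0  # assumed cutoff for "high fever"

# Continuous treatment: summarize the measurements themselves.
avg_temp = mean(temps_c)
sd_temp = stdev(temps_c)

# Categorical treatment: each student becomes fever / no fever.
has_fever = [t >= FEVER_THRESHOLD_C for t in temps_c]
pct_fever = 100 * sum(has_fever) / len(has_fever)

print(f"mean = {avg_temp:.2f} C, s.d. = {sd_temp:.2f} C")
print(f"{sum(has_fever)} of {len(has_fever)} students have a fever ({pct_fever:.1f}%)")
```

Once the readings are reduced to `has_fever`, a student at 38.0 and a student at 39.1 are indistinguishable, which is why the percentage is discrete even though the thermometer reading was not.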

#184622

BTDT
Participant

The reasons percentage data are not considered continuous are numerous:

- For each subgroup you have no estimate of the within-group variation
- Statistical tests for continuous data require an estimate of the within-group variation
- The effect of subgroup size is ignored and can lead to manipulation of the results

The consequences of ignoring the subgroup size are manifested in Simpson’s paradox, a situation where statistically based conclusions seem reversed when the data are subdivided or combined. Please read the following Wiki and look at the examples for:

- Berkeley sex bias case
- Kidney stone treatment
- Batting averages

http://en.wikipedia.org/wiki/Simpson%27s_paradox

Simpson’s Paradox also showed up in the Numb3rs episode “Conspiracy Theory”.

On a more personal note, we saw percentage data being presented at the GE corporate level that was subdivided in a manner that made all the sales regions look much better than the global picture.

Cheers, Alastair
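The reversal can be checked by hand or with a few lines of code. The sketch below uses the success counts from the kidney stone treatment example on the Wikipedia page linked above; the function and variable names are my own.

```python
# (successes, patients) per treatment and stone size, from the kidney stone example.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success proportion for one group."""
    return successes / patients


def pooled_rate(groups):
    """Success proportion after adding up the counts across all groups."""
    total_successes = sum(s for s, _ in groups.values())
    total_patients = sum(n for _, n in groups.values())
    return total_successes / total_patients


for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A = {a:.1%}, B = {b:.1%}")

print(f"pooled: A = {pooled_rate(results['A']):.1%}, B = {pooled_rate(results['B']):.1%}")
```

Treatment A has the higher success rate within each stone size, yet B has the higher rate once the counts are pooled, because the subgroup sizes are very unequal. Comparing the percentages alone, without the counts behind them, hides this.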

#184624

Ken Feldman
Participant

BTDT,
Not sure where the issue of subgroup fits in the discussion. I can do an I/MR chart with percentages as suggested by Wheeler without worrying about subgroup size. I also don’t see the relevance of within sample variance if I wish to do a one sample t test to see if my data meets some % spec.

#184626

BTDT
Participant

Darth:

Yes, you can construct an I/MR using a column of data values expressed as percentages. The software will estimate the standard deviation by using the n to n-1 differences between the subgroup means. The result is that the estimate of standard deviation for constructing the control limits is for the n-1 subgroups. This does not include any contribution within the subgroups.

Wheeler’s advice works well when constructing a control chart where the number of samples within each subgroup is similar and the number within each subgroup is large enough that the assumption of normality of error distribution between subgroups is not seriously violated. The subgroup size must be much larger than the 30 rule-of-thumb if the data is highly skewed.

The following set of data can be put into an I/MR chart by calculating percentages of each subgroup and running the chart. The mean will be 50.21% with UCL and LCL of 69.24% and 31.19% respectively. The conclusion will be the process is under control.

A p chart using the defect and subgroup size will correctly show the mean of 54.04% with the UCL and LCL of 68.99% and 39.09% respectively. It also identifies subgroup 9 as out of control.

The relevance to Simpson’s paradox is that when subgroup size is ignored, conclusions can be misleading.

Cheers, Alastair

| Subgroup | Defects | Subgroup size |
|---|---|---|
| 1 | 48 | 100 |
| 2 | 51 | 100 |
| 3 | 44 | 100 |
| 4 | 55 | 100 |
| 5 | 42 | 100 |
| 6 | 52 | 100 |
| 7 | 55 | 100 |
| 8 | 46 | 100 |
| 9 | 600 | 1000 |
| 10 | 51 | 100 |
| 11 | 53 | 100 |
| 12 | 52 | 100 |
| 13 | 45 | 100 |
| 14 | 49 | 100 |
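The figures quoted in this post can be reproduced from the posted data. The sketch below is not the output of any particular SPC package; it assumes the usual individuals-chart constant of 2.66 (3 / 1.128) applied to the average moving range, and three-sigma binomial limits for the p chart, with the p chart limits shown as those for subgroups of 100.

```python
from math import sqrt

# Defect counts and subgroup sizes as posted (subgroup 9 has 1000 items).
defects = [48, 51, 44, 55, 42, 52, 55, 46, 600, 51, 53, 52, 45, 49]
sizes = [100] * 8 + [1000] + [100] * 5

# --- I/MR chart on the percentages: subgroup size is ignored ---
pct = [100 * d / n for d, n in zip(defects, sizes)]
mean_pct = sum(pct) / len(pct)
moving_ranges = [abs(b - a) for a, b in zip(pct, pct[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)
imr_ucl = mean_pct + 2.66 * mr_bar
imr_lcl = mean_pct - 2.66 * mr_bar
imr_out = [i + 1 for i, x in enumerate(pct) if not imr_lcl <= x <= imr_ucl]

# --- p chart on the counts: limits depend on each subgroup's size ---
p_bar = sum(defects) / sum(sizes)


def p_limits(n):
    """Three-sigma binomial control limits for a subgroup of size n."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


p_out = []
for i, (d, n) in enumerate(zip(defects, sizes), start=1):
    lcl, ucl = p_limits(n)
    if not lcl <= d / n <= ucl:
        p_out.append(i)

lcl_100, ucl_100 = p_limits(100)
print(f"I/MR: mean {mean_pct:.2f}, LCL {imr_lcl:.2f}, UCL {imr_ucl:.2f}, out: {imr_out}")
print(f"p:    mean {100 * p_bar:.2f}, LCL {100 * lcl_100:.2f}, UCL {100 * ucl_100:.2f}, out: {p_out}")
```

Subgroup 9 sits at 60%, inside both the I/MR limits and the p chart limits for n = 100. It is flagged only because the p chart tightens the limits for its subgroup size of 1000.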

#184627

Saherngu
Participant

I don’t agree with the NEVER. The key is the basis of the underlying data.
For example, what about a concentration where the result is expressed as a percentage?
Concentration = weight of ABCD / total weight
Weight is a continuous variable, therefore the % is a continuous variable in this case.

#184633

Jim T
Participant

Thank you, Alastair! I learned something today!

#184634

BTDT
Participant

JimT:

You are welcome. This is one of my bailiwicks. I have seen percentage data misused so often that I propose to never permit its use in a project.

Cheers, Alastair

#184637

Jim Bossert
Participant

Darth, are you mellowing in your old age???  I guess you are coming out of the “Dark Side” into the light.
Yoda

#184640

Ken Feldman
Participant

Hey Yoda. Hope you are finding peace in your new environment. Certainly not mellowing just keeping a lower profile out of respect for the more tenuous times we live in.

#184705

Brar
Participant

Most Black Belts treat percentage data as continuous and analyze it that way. It is standard practice, and the X-mR chart is also used.
However, you should not do it, because converting counts into a percentage discards the counts and sample size behind the proportion. It would be better to use a p chart for percentages. If you need to calculate capability, use binomial capability analysis.
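A rough sketch of what a binomial capability summary works from. The function name and the sample counts are hypothetical, and reporting a process Z as the negated standard normal quantile of the overall proportion defective is an assumption about the convention used by common SPC software, not a quote from any package. The point is that the calculation starts from the counts, not from a column of percentages:

```python
from statistics import NormalDist


def binomial_capability(defectives, inspected):
    """Summarize capability from defective counts and subgroup sizes.

    Assumes 0 < overall proportion defective < 1, as the normal quantile requires.
    """
    p_bar = sum(defectives) / sum(inspected)  # pooled proportion, weighted by size
    return {
        "pct_defective": 100 * p_bar,
        "ppm_defective": 1_000_000 * p_bar,
        "process_z": -NormalDist().inv_cdf(p_bar),  # assumed convention
    }


# Hypothetical inspection results: defectives found and items inspected per subgroup.
summary = binomial_capability([3, 5, 2, 4, 6], [200, 250, 180, 220, 250])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the proportion is pooled from the raw counts, larger subgroups carry more weight, which is exactly the information lost when only the percentages are kept.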


The forum ‘General’ is closed to new topics and replies.