Are percentages continuous data?
 This topic has 17 replies, 12 voices, and was last updated 12 years, 1 month ago by Brar.


July 28, 2009 at 9:20 am #52475
David Baker (Participant)
If we count the number of processed item failures and express them as a percentage of the total items processed, can we consider this to be continuous data, even though the underlying data is discrete / count data?
July 28, 2009 at 9:25 am #184597
No, you may not.
July 28, 2009 at 10:33 am #184603
Indrajit Lahiri (Participant)
By convention it is continuous, as it can take all possible values.
However, in some cases it may not be effective in the measurement of process capability or for hypothesis testing.
I agree that it is an easy way to normalize and compare, but I would resist using this metric unless it is the only feasible one.
July 28, 2009 at 10:40 am #184605
So you are saying that a discrete number divided by a discrete
number makes a continuous number? According to research done by Harry, Wheeler, Box, and Feldman, the
only legitimate way to make discrete data continuous is to take the
5th root of the number.
July 28, 2009 at 11:20 am #184609
Percentages are not continuous; however, they are often treated as if they were. Try this to get an indication of what happens when you ‘normalize’ the data. Make a set of normal data, say 100 data points, of defects with opportunities. This would be similar to the count data that you have. Then convert it to a percentage. Now run a 1-sample t-test for a 20% improvement on the percentage data and a 1-proportion test for the same improvement on the count data. This will give you the number of samples required to indicate that a shift in the data is observable. Compare the sample sizes and have fun with it.
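The experiment suggested above can be roughed out without a statistics package. The following is a minimal sketch, not the poster's own procedure: the baseline defect rate, the number of opportunities per data point, the significance level, and the power are all assumed values, and both sample sizes use normal-approximation formulas rather than the exact calculations a package would run.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# All of the values below are assumptions chosen for illustration.
P0 = 0.10            # assumed baseline defect rate
OPPORTUNITIES = 50   # assumed opportunities per data point
N_POINTS = 100       # "say 100 data points"
IMPROVEMENT = 0.20   # 20% relative improvement to detect
ALPHA, POWER = 0.05, 0.80

z_a = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided test
z_b = NormalDist().inv_cdf(POWER)

# Count data: number of defects out of OPPORTUNITIES for each data point.
defects = [sum(random.random() < P0 for _ in range(OPPORTUNITIES))
           for _ in range(N_POINTS)]
fractions = [d / OPPORTUNITIES for d in defects]

# Treating the fractions as continuous: data points needed to detect the
# shift in the mean (normal approximation to the 1-sample t-test).
delta = IMPROVEMENT * mean(fractions)
n_t = math.ceil(((z_a + z_b) * stdev(fractions) / delta) ** 2)

# Treating the data as counts: individual units needed for a
# 1-proportion test (normal approximation).
p1 = P0 * (1 - IMPROVEMENT)
numerator = z_a * math.sqrt(P0 * (1 - P0)) + z_b * math.sqrt(p1 * (1 - p1))
n_prop = math.ceil((numerator / (P0 - p1)) ** 2)

print(f"1-sample test on percentages: {n_t} data points "
      f"({n_t * OPPORTUNITIES} units in total)")
print(f"1-proportion test on counts:  {n_prop} units")
```

Note that the two answers are in different units: the first counts data points (each summarizing many opportunities), the second counts individual items, which is part of why the comparison is instructive.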
July 28, 2009 at 8:46 pm #184617
Although my rigorously peer reviewed and seminal work in this area is well known, I do take a more practical approach when working with clients. While you are correct that the underlying data is still discrete no matter how many decimals or how many roots you take, it sometimes makes sense to loosen up a bit. If there is sufficient ordinal data spread across a wide range and distributed somewhat in a symmetrical fashion, I often can demonstrate that it will indeed take on many characteristics of continuous data. Each circumstance is unique so I put forth no absolutes but let’s face it, sometimes we have to ease up in the “real world” if the downside risk is not overly punitive.
July 28, 2009 at 9:54 pm #184618
Shafi Khalisdar (Member)
It depends on the type of data the percentages are derived from. E.g., 50% male in a classroom is attribute data, while 50% of students having a high fever is continuous data.
July 28, 2009 at 9:56 pm #184619
What?
That is absolutely wrong.
July 28, 2009 at 10:24 pm #184620
A measurement of temperature will be continuous data. If you have categorized students into high fever and not high fever, then it is still discrete despite the underlying continuous nature of temperature. If you were using temperature as continuous, you wouldn’t be using a percentage but the average temperature and s.d. of temperature. The minute you take robust continuous data and change it to categorical data as you suggest, you have discrete data.
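The point about categorizing temperature can be made concrete with a small sketch. The class size, the temperature distribution, and the 39.0 °C cutoff below are all hypothetical values picked for illustration.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so the sketch is repeatable

THRESHOLD_C = 39.0  # hypothetical cutoff for "high fever"
# Hypothetical class of 30 students with measured temperatures in Celsius.
temps = [random.gauss(37.5, 0.9) for _ in range(30)]

# Continuous summary: uses every measured value.
avg, sd = mean(temps), stdev(temps)

# Categorical summary: each student collapses to True/False.
high_fever = [t >= THRESHOLD_C for t in temps]
pct_high = 100 * sum(high_fever) / len(high_fever)

# With 30 students the percentage can only take 31 distinct values.
possible_values = len(temps) + 1

print(f"mean = {avg:.2f} C, s.d. = {sd:.2f} C")
print(f"high fever: {pct_high:.1f}% ({possible_values} possible values)")
```

Once the temperatures are reduced to a yes/no flag, the percentage is just a rescaled count, whatever the nature of the original measurement.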
July 29, 2009 at 2:41 pm #184622
The reasons percentage data are not considered continuous are numerous:
- For each subgroup you have no estimate of the within-group variation.
- Statistical tests for continuous data require an estimate of the within-group variation.
- The effect of subgroup size is ignored and can lead to manipulation of the results.
The consequences of ignoring the subgroup size are manifested in Simpson’s paradox, a situation where statistically based conclusions seem reversed when the data are subdivided or combined. Please read the following Wiki and look at the examples for:
- Berkeley sex bias case
- Kidney stone treatment
- Batting averages
http://en.wikipedia.org/wiki/Simpson%27s_paradox
Simpson’s Paradox also showed up in the Numb3rs episode “Conspiracy Theory”.
On a more personal note, we saw percentage data being presented at the GE corporate level that was subdivided in a manner that made all the sales regions look much better than the global picture.
Cheers, Alastair
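For readers who do not follow the link, the reversal can be checked with a few lines of arithmetic. The sketch below uses the success counts commonly quoted for the kidney stone treatment example; the figures are supplied here for illustration and are not taken from the post itself.

```python
# (successes, patients) per treatment, split by stone size.
# Counts as commonly quoted for the kidney stone example.
groups = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a fraction."""
    return successes / patients


# Within each stone size, treatment A has the higher success rate.
a_wins_each = all(rate(*g["A"]) > rate(*g["B"]) for g in groups.values())

# Pooling the groups ignores how unevenly the patients are split.
totals = {
    t: (sum(g[t][0] for g in groups.values()),
        sum(g[t][1] for g in groups.values()))
    for t in ("A", "B")
}
b_wins_overall = rate(*totals["B"]) > rate(*totals["A"])

for name, g in groups.items():
    print(f"{name}: A = {rate(*g['A']):.1%}, B = {rate(*g['B']):.1%}")
print(f"combined: A = {rate(*totals['A']):.1%}, B = {rate(*totals['B']):.1%}")
```

Treatment A looks better in both subgroups and worse overall, purely because the subgroup sizes differ so much between treatments.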
July 29, 2009 at 3:22 pm #184624
BTDT,
Not sure where the issue of subgroup fits in the discussion. I can do an I/MR chart with percentages as suggested by Wheeler without worrying about subgroup size. I also don’t see the relevance of within-sample variance if I wish to do a one-sample t-test to see if my data meets some % spec.
July 29, 2009 at 5:20 pm #184626
Darth:
Yes, you can construct an I/MR chart using a column of data values expressed as percentages. The software will estimate the standard deviation by using the n to n-1 differences between the subgroup means. The result is that the estimate of standard deviation for constructing the control limits is for the n-1 subgroups. This does not include any contribution within the subgroups.
Wheeler’s advice works well when constructing a control chart where the number of samples within each subgroup is similar and the number within each subgroup is large enough that the assumption of normality of error distribution between subgroups is not seriously violated. The subgroup size must be much larger than the 30 rule-of-thumb if the data is highly skewed.
The following set of data can be put into an I/MR chart by calculating percentages of each subgroup and running the chart. The mean will be 50.21 pc with UCL and LCL of 69.24 pc and 31.19 pc respectively. The conclusion will be that the process is under control.
A p chart using the defects and subgroup size will correctly show the mean of 54.04 pc with the UCL and LCL of 68.99 pc and 39.09 pc respectively. It also identifies subgroup 9 as out of control.
The relevance to Simpson’s paradox is that when subgroup size is ignored, conclusions can be misleading.
Cheers, Alastair
Defects  Subgroup size
48       100
51       100
44       100
55       100
42       100
52       100
55       100
46       100
600      1000
51       100
53       100
52       100
45       100
49       100
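The figures quoted in this post can be reproduced from the data with a short script. This is a sketch under the usual textbook formulas, assumed rather than stated in the post: I chart limits at the mean plus or minus 2.66 times the average moving range, and p chart limits at three binomial standard errors for each subgroup size.

```python
import math

# Data from the post: subgroup 9 has 600 defects in 1000 items.
defects = [48, 51, 44, 55, 42, 52, 55, 46, 600, 51, 53, 52, 45, 49]
sizes = [100] * 8 + [1000] + [100] * 5

# --- I/MR chart on percentages (subgroup size ignored) ---
pct = [100 * d / n for d, n in zip(defects, sizes)]
center_i = sum(pct) / len(pct)
moving_ranges = [abs(b - a) for a, b in zip(pct, pct[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)
ucl_i = center_i + 2.66 * mr_bar  # 2.66 = 3 / d2, with d2 = 1.128
lcl_i = center_i - 2.66 * mr_bar
i_signals = [i for i, x in enumerate(pct, start=1)
             if not lcl_i <= x <= ucl_i]

# --- p chart (limits depend on each subgroup's size) ---
p_bar = sum(defects) / sum(sizes)
p_signals = []
for i, (d, n) in enumerate(zip(defects, sizes), start=1):
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    if not p_bar - half_width <= d / n <= p_bar + half_width:
        p_signals.append(i)

# Limits for the subgroups of size 100, as quoted in the post.
half_100 = 3 * math.sqrt(p_bar * (1 - p_bar) / 100)

print(f"I chart: mean {center_i:.2f}, UCL {ucl_i:.2f}, LCL {lcl_i:.2f}, "
      f"signals {i_signals}")
print(f"p chart: mean {100 * p_bar:.2f}, "
      f"UCL {100 * (p_bar + half_100):.2f}, "
      f"LCL {100 * (p_bar - half_100):.2f}, signals {p_signals}")
```

The I chart weights the 1000-item subgroup the same as every 100-item subgroup, so its 60 pc looks unremarkable; the p chart tightens the limits for the larger subgroup and flags it.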
July 29, 2009 at 6:09 pm #184627
I don’t agree with the NEVER; the key is the basis of the underlying data.
For example – what about a concentration where the result is expressed as a percentage………
Concentration = weight of ABCD / total weight
weight is a continuous variable therefore the % is a continuous variable in this case
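The distinction drawn here, between a percentage built from counts and one built from measurements, can be shown with a small sketch. The batch size and the weights are hypothetical numbers chosen only to illustrate the idea.

```python
# Percentage from counts: with n items processed, the failure percentage
# can only land on the grid 0/n, 1/n, ..., n/n.
n_items = 20  # hypothetical batch size
possible_failure_pcts = [100 * k / n_items for k in range(n_items + 1)]

# Percentage from measurements: a ratio of two weights can fall anywhere
# in the interval, limited only by the resolution of the scale.
weight_abcd_g = 12.3471    # hypothetical measured weight of ABCD, grams
total_weight_g = 250.0182  # hypothetical measured total weight, grams
concentration_pct = 100 * weight_abcd_g / total_weight_g

print(f"{len(possible_failure_pcts)} possible failure percentages, "
      f"in steps of {possible_failure_pcts[1]:.0f}")
print(f"concentration = {concentration_pct:.4f}%")
```

The same "%" label covers both cases, which is why the basis of the underlying data has to be checked before choosing a test or chart.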
July 30, 2009 at 12:50 pm #184633
Thank you, Alastair! I learned something today!
July 30, 2009 at 1:12 pm #184634
JimT:
You are welcome. This is one of my bailiwicks. I have seen percentage data misused so often that I propose to never permit its use in a project.
Cheers, Alastair
July 30, 2009 at 2:50 pm #184637
Darth, are you mellowing in your old age??? I guess you are coming out of the “Dark Side” into the light.
Yoda
July 30, 2009 at 7:52 pm #184640
Hey Yoda. Hope you are finding peace in your new environment. Certainly not mellowing, just keeping a lower profile out of respect for the more tenuous times we live in.
August 4, 2009 at 7:32 am #184705
Most Black Belts take percentage data as continuous and do the analysis. It is standard practice, and the X mR chart is also used.
However, you should not be doing it, as converting anything into a percentage loses the information about the underlying proportion (the counts behind it). It would be better to use a P chart for percentages. If you need to calculate the capability, use the binomial distribution capability analysis.
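As a rough idea of what a binomial capability summary involves, here is a minimal sketch. The counts are hypothetical, and the "process Z" convention used (the standard normal quantile matching the overall proportion defective) is an assumption about how such analyses are usually reported, not something stated in the thread.

```python
from statistics import NormalDist

# Hypothetical inspection results: defectives found and items inspected
# in each of five samples.
defectives = [12, 9, 15, 11, 8]
inspected = [400, 380, 420, 410, 390]

# The capability summary works from the raw counts, not from an average
# of per-sample percentages.
p_bar = sum(defectives) / sum(inspected)
ppm_defective = p_bar * 1_000_000

# Assumed convention: process Z is the standard normal quantile that
# leaves a tail area equal to the overall proportion defective.
process_z = NormalDist().inv_cdf(1 - p_bar)

print(f"% defective = {100 * p_bar:.2f}")
print(f"PPM defective = {ppm_defective:.0f}")
print(f"process Z = {process_z:.2f}")
```

A full analysis would also check that the proportion is stable over time (for example with a P chart) before quoting a capability figure.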
The forum ‘General’ is closed to new topics and replies.