iSixSigma

Continuous vs. Discrete Data


Viewing 22 posts - 1 through 22 (of 22 total)
  • #36131

    Michael Miller
    Participant

    A debate has been ongoing among MBBs here as to the essential difference(s) between discrete and continuous data. We are certainly aware of popular descriptions such as discrete being countable and indivisible vs. continuous being measurable. However, can this be applied when the metric is a rate? Suppose you have a discrete, countable activity that can be counted as having occurred so many times within a certain timeframe. Is a time-based discrete metric such as Xs/hr discrete or continuous, and why?
    Also, discrete data are countable in that we are counting items which have a particular attribute. A reasonable definition of attribute is a constraint whereby objects or individuals can be distinguished. For something to be in the classification of having a particular attribute “X”, is it not true that it must be distinguishable from something that has the attribute “Not X?”
    I look forward to responses; thanks. 
    Mike
     

    #103307

    John H.
    Participant

    Michael
    If c is a category count, then a rate, because it involves the time variable, is treated in most mathematical models (e.g., Poisson-distributed processes) as the instantaneous (calculus) rate of change, R = dc/dt, which makes it continuous (e.g., reliability engineering, chemical kinetics, etc.).
    I hope this helps
    John H.

    #103308

    Darth
    Participant

    It is usually easy to distinguish until you get to the ratio that you mention. While you can sometimes “fake it” and assume continuous, the criterion I use is to look at the underlying characteristic you are actually measuring. If the numerator and denominator are both continuous, then I treat the ratio as continuous. If the characteristic you are measuring is discrete (counts), I treat the ratio as discrete even though the denominator might be time, as in your example. Then again, from a practical sense: what are you trying to do, what tool are you planning on using, and does it really, really matter which you assume? Common sense might dictate.

    #103312

    Sigmordial
    Member

    John H started out nicely, but went awry with his conclusion: the Poisson is a discrete distribution, not a continuous distribution.  Michael’s post (a discrete countable activity which can be counted as having occurred so many times within a certain timeframe) does suggest a Poisson distribution. I should note that there is an asymptotic relationship between the Poisson (discrete) and the Normal (continuous) distributions.
    If the number of trials needed to reach a pre-specified number of occurrences is of interest, then Michael may want to consider the Negative Binomial distribution.
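A small sketch of the asymptotic relationship Sigmordial mentions: the Poisson stays discrete, but a Normal curve with the same mean and variance (both equal to the rate, lambda) tracks it more and more closely as lambda grows. The function names, the lambda values and the ±0.5 continuity correction are illustrative assumptions, not something from the thread.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam (computed in log space)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF with mean mu and standard deviation sigma, via erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_approx_error(lam: float) -> float:
    """Largest gap between the exact Poisson pmf and a continuity-corrected
    Normal(mean=lam, variance=lam) approximation over a wide range of k."""
    sigma = math.sqrt(lam)
    upper = int(lam + 10 * sigma) + 10
    worst = 0.0
    for k in range(upper + 1):
        exact = poisson_pmf(k, lam)
        # Probability the Normal assigns to the bar [k - 0.5, k + 0.5].
        approx = normal_cdf(k + 0.5, lam, sigma) - normal_cdf(k - 0.5, lam, sigma)
        worst = max(worst, abs(exact - approx))
    return worst

for lam in (2, 20, 200):
    print(f"lambda={lam:>3}: max |pmf error| = {max_approx_error(lam):.4f}")
```

The error shrinks as lambda grows, which is the practical sense in which a high-rate count "behaves" continuously even though each value is still an integer.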

    #103325

    Gabriel
    Participant

    “Discrete” means “not continuous”, and “continuous” means that you can always find a possible value between any two values.
    “Countable in that we are counting items which have a particular attribute” is not a good definition of “discrete”. Even though an item count is always discrete, discrete data is not necessarily an item count.
    In fact, the data is ALWAYS DISCRETE, even when the characteristic you are measuring is continuous. That is because there is a difference between the data and the characteristic itself, and the data is “truncated” due to the lack of infinite resolution in your measuring and data-recording systems.
    For example, say that the characteristic is a diameter. It is clearly a continuous characteristic, since between any two diameters you can always find another possible value for a diameter (Laxman, don’t jump in saying that between any two diameters there is only a limited number of possible diameters since the atoms of the material are of finite size).
    The question is: how will you measure the diameter? With a digital caliper to 0.01 mm? Then the data (the record of the measurement) is discrete, because there is no possible recorded value for a diameter between two consecutive values of the measuring scale, like 10.12 mm and 10.13 mm.
    Am I splitting hairs? Maybe. If all the data is distributed among, say, 10.12, 10.13 and 10.14, then the resolution of the data is the same as what you would get using a go/no-go gage at 10.125-10.135. Would you say that the outcome data of checking with a go/no-go gage can be continuous? No.
    At the other extreme, if the data is distributed in a range from 10 mm to 15 mm, then saying that the data is discrete would be nearly like saying that the diameter itself is discrete due to the finite size of the atoms. In this case you still have no possible data between 10.12 and 10.13, so what’s the difference? That you now have the data distributed among about 500 classes, not 3. So the data can safely be taken as continuous.
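To make Gabriel's resolution argument concrete, here is a minimal sketch that models a digital caliper as rounding the true (continuous) diameter to its resolution, and counts how many distinct readings are possible in each of his two scenarios. The function names `record` and `resolution_classes` are hypothetical, and half-up rounding is an assumption about how the instrument behaves.

```python
from decimal import Decimal, ROUND_HALF_UP

def record(true_diameter: float, resolution: str = "0.01") -> Decimal:
    """What the caliper stores: the continuous true value rounded to the
    instrument's resolution (half-up rounding is an assumption)."""
    return Decimal(str(true_diameter)).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)

def resolution_classes(low: str, high: str, resolution: str) -> int:
    """Number of distinct readings possible between low and high (inclusive).
    Decimal strings avoid binary floating-point rounding surprises."""
    lo, hi, step = Decimal(low), Decimal(high), Decimal(resolution)
    return int((hi - lo) / step) + 1

# Two different true diameters collapse onto the same recorded value.
print(record(10.1234), record(10.1249))
# Narrow spread: only a handful of classes, like a go/no-go gage.
print(resolution_classes("10.12", "10.14", "0.01"))
# Wide spread: hundreds of classes, so a continuous model is reasonable.
print(resolution_classes("10.00", "15.00", "0.01"))
```

The recorded data is discrete in both cases; what changes is how many classes the readings can fall into.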
    We’ve been discussing a case where the characteristic itself was continuous. What if what I am measuring is intrinsically discrete? For example, I am counting occurrences in a time frame. Let’s be more specific and say that they are customer complaints per month.
    It is the same case. The data will always be discrete, but if it has enough resolution I can take it as continuous. If I always have 0, 1 or 2 complaints in a month, then I cannot use it as continuous (note that the same is true if I always had 999, 1000 or 1001, but it is very unlikely that with such a high average number of complaints per month the variation would be small enough to keep the number within such a narrow range). Now, if I always have something between 50 and 200 complaints per month, then “continuity” will be a very good model.
    One final remark: note that making the number a ratio will change neither its nature nor the suitability of the continuity approach.
    For example, imagine that I’m counting defectives every day, but to be able to make comparisons I divide the number of defective units by the number of units produced. The rate of defectives ranges between 0.1% and 0.2%, while production ranges from 1000 to 2000 units. In this case the number of defectives ranges from 1 to 4. So never mind that it is a rate: there are too few possible values to treat this as “continuous” data. Now, if the defectives ranged from 5% to 10% with production from 10,000 to 20,000, it would be perfectly suitable to consider it continuous (500 to 2000 defectives, about 1500 possible values). As a rule, it is the numerator (the count itself) that defines whether the data (which, again, is always DISCRETE) can be treated as continuous or not. A rule of thumb is: 4 or fewer possible values, no; 10 or more, yes; in between, the “continuity” model is marginal (but I would give it a try and see if I run into problems; probably I won’t).
    And to your last question: yes, if you count red and blue, then you must classify each item either as red or blue (defective / not defective, conforming / not conforming…). Note, however, that there is a difference between “classifying them” and “classifying them well”, but that is the scope of Measurement System Analysis. And it holds true for continuous characteristics too: you can classify parts by their diameter, but doing it and doing it well are two different things.
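Gabriel's defectives example and rule of thumb can be sketched as a few lines of arithmetic. This assumes the extreme rate and extreme production values can occur together (so the count range is rate_lo × prod_lo to rate_hi × prod_hi); the function names and the verdict strings are hypothetical labels for his "no / marginal / yes".

```python
def defective_count_range(rate_lo, rate_hi, prod_lo, prod_hi):
    """Fewest and most defectives implied by a rate range (as fractions)
    and a production range, assuming the extremes can coincide."""
    return round(rate_lo * prod_lo), round(rate_hi * prod_hi)

def possible_values(count_lo, count_hi):
    """How many distinct integer counts lie in [count_lo, count_hi]."""
    return count_hi - count_lo + 1

def continuity_verdict(n_values):
    """Rule of thumb from the post: 4 or fewer possible values, don't treat
    as continuous; 10 or more, fine; in between, marginal (try it and check)."""
    if n_values <= 4:
        return "treat as discrete"
    if n_values >= 10:
        return "continuous OK"
    return "marginal"

scenarios = [
    ("low volume", (0.001, 0.002, 1000, 2000)),      # 0.1-0.2%, 1000-2000 units
    ("high volume", (0.05, 0.10, 10_000, 20_000)),   # 5-10%, 10,000-20,000 units
]
for label, args in scenarios:
    lo, hi = defective_count_range(*args)
    n = possible_values(lo, hi)
    print(f"{label}: {lo}-{hi} defectives, {n} values -> {continuity_verdict(n)}")
```

The same two helpers apply directly to the complaints example: `possible_values(0, 2)` is 3 and `possible_values(50, 200)` is 151.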

    #103340

    Michael Miller
    Participant

    Thanks for all the responses; I will digest and see how these hit the mark.
    mpm

    #103347

    Hmnnn…
    Participant

    Is this the same as what drives the different methods/calculations for control charting of variables (measured) vs. attribute (counts) data? E.g., shouldn’t attribute p, np, c or u control charts be used for daily yield/defect rates (counts) rather than variables SPC control charts?

    #103387

    John H.
    Participant

    Sigmordial
    Re: Your Comments on The Poisson as a Discrete Distribution
    Wrong! Not always, in its application as a mathematical model.
    If a cumulative distribution function is continuous everywhere and possesses a continuous derivative (except maybe at a certain number of interval finite points) then the stochastic variable and distribution is continuous (Statistical Theory with Engineering Applications-Hald)
    Example: If the probability of an equipment failure in a short time interval $\Delta t$ is assumed to be $K\,\Delta t$ ($K$ a constant) and failures are independent, then dividing an interval of length $T$ into $N$ subintervals of length $T/N$ gives the probability of no failures within $T$ as
    $P_0(T) = \left(1 - \frac{KT}{N}\right)^N \to e^{-KT}$ as $N \to \infty$.
    A similar model would apply to density fluctuations in a gas, with the time interval replaced by a volume interval.
    John H.
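The limit in John's example can be checked numerically. This sketch splits an interval of length T into N independent slices, each failing with probability K·T/N, and compares the probability of no failure with exp(−K·T). The values K = 0.3 and T = 2.0 are arbitrary illustrations, and `p_no_failure` is a hypothetical helper name.

```python
import math

def p_no_failure(K: float, T: float, N: int) -> float:
    """Probability of no failure in an interval of length T, split into N
    independent slices that each fail with probability K*T/N (needs K*T <= N)."""
    return (1.0 - K * T / N) ** N

K, T = 0.3, 2.0            # illustrative values, not taken from the post
limit = math.exp(-K * T)   # the exponential survival probability
for N in (1, 10, 100, 10_000):
    approx = p_no_failure(K, T, N)
    print(f"N={N:>6}: {approx:.6f}  (exp(-KT) = {limit:.6f})")
```

As N grows the product approaches exp(−K·T) from below, which is the continuous survival curve used in reliability work.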

    #103395

    Sigmordial
    Member

    Hi John,
     
    Michael’s situation is suggesting the number of events in an interval of time. Michael has described this “a discrete countable activity which can be counted as having occurred so many times within a certain timeframe.”  This is a tailor-made scenario for the Poisson distribution, which is a discrete distribution.  I did mention that under certain conditions, the Poisson (as the rate gets “large”) approaches the Normal distribution.
     
    Now, your quote: “If a cumulative distribution function is continuous everywhere and possesses a continuous derivative (except maybe at a certain number of interval finite points) then the stochastic variable and distribution is continuous (Statistical Theory with Engineering Applications-Hald)”
     
    The cdf for the Poisson is not continuous everywhere.  The cdf is P(X <= x) –  we are placing these individual probabilities at discrete masses (x = 0, 1, …).  As a recommendation, be wary of tossing definitive quotes that have qualifiers such as “except maybe…” Plus, maybe Hald was not referring to the Poisson distribution. By no means is this a dig on you – stochastic modeling, though exciting, can get a tad bit challenging.
     One last comment: if Michael was interested in the time between events, then we are definitely dealing with a continuous distribution (Exponential).  Your example is closer to this scenario – reliability.
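Sigmordial's two points can be seen side by side in a short sketch: the Poisson CDF is flat between integers and jumps at each integer (so it is not continuous), while the CDF of the time between events at the same rate (Exponential) has no jumps. The helper names and the rate of 3.0 are assumptions for illustration.

```python
import math

def poisson_cdf(x: float, lam: float) -> float:
    """P(X <= x) for Poisson(lam). X takes only the values 0, 1, 2, ...,
    so the CDF is flat between integers and jumps at each integer."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)          # P(X = 0)
    total = term
    for k in range(1, math.floor(x) + 1):
        term *= lam / k            # P(X = k) from P(X = k - 1)
        total += term
    return total

def exponential_cdf(t: float, rate: float) -> float:
    """P(time between events <= t) for events arriving at the given rate;
    continuous everywhere."""
    return 0.0 if t < 0 else 1.0 - math.exp(-rate * t)

lam = 3.0
# Same value anywhere in [1, 2): the step function is flat there.
print(poisson_cdf(1.0, lam), poisson_cdf(1.5, lam), poisson_cdf(1.999, lam))
# At x = 2 it jumps by P(X = 2).
print(poisson_cdf(2.0, lam) - poisson_cdf(1.999, lam))
# The Exponential CDF barely moves over the same tiny step.
print(exponential_cdf(2.0, lam) - exponential_cdf(1.999, lam))
```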

    #103453

    John H.
    Participant

    Hi Sigmordial
    Re: Your Comments
    The exponential function illustrated in my example is a special case of a Poisson-distributed process, $P_n(T) = e^{-KT}(KT)^n/n!$, $n = 0, 1, 2, \ldots$, subject to the probability constraints $P_0(0) = 1$ (initial condition) and $P_n(0) = 0$ for $n \ge 1$, thus generating $P_0(T) = e^{-KT}$, which has familiar applications in reliability engineering, physical chemistry and nuclear decay processes. I.e., in this model the probability is assumed to be a function of a time interval of length T, hence the representation by continuous curves. I hope this also clarifies my original post with regard to rate. As regards Hald’s statement, he did not list the Poisson distribution as an exception to it.
    I apologize for the abbreviated responses but I hate typing and usually  am not “long winded” .
    John H.
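One way to see why both posters can be right: in John's formula the count n is discrete (the probabilities are point masses over n = 0, 1, 2, … that sum to 1), while for a fixed n the probability P_n(T) varies smoothly with the interval length T. This sketch just evaluates the formula from the post; the rate K = 0.5 and the helper name `p_n` are illustrative assumptions.

```python
import math

def p_n(n: int, K: float, T: float) -> float:
    """P_n(T) = exp(-K*T) * (K*T)**n / n!: probability of exactly n events
    in an interval of length T for a process with constant rate K."""
    return math.exp(-K * T) * (K * T) ** n / math.factorial(n)

K = 0.5  # illustrative rate, not from the post
# Initial conditions from the post: P_0(0) = 1 and P_n(0) = 0 for n >= 1.
print(p_n(0, K, 0.0), p_n(1, K, 0.0), p_n(2, K, 0.0))
# Over the count n, the probabilities are discrete masses summing to 1 ...
print(sum(p_n(n, K, 4.0) for n in range(60)))
# ... while for fixed n = 0, P_0(T) = exp(-K*T) changes smoothly with T.
print([round(p_n(0, K, T), 4) for T in (1.0, 1.01, 1.02)])
```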

    #103465

    Sigmordial
    Member

    Hi John H,
    No worries on the abbreviated response.  Looks like we were in agreement.

    #104032

    arihalos
    Participant

    I’d say that in this case the most appropriate chart to use is a c chart, and we know that a c chart is an attribute chart. I can’t imagine how we could use the XmR or other variables control charts in this case.
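For reference, a minimal c chart sketch using the usual Poisson-based limits: center line c̄ (the average count), limits c̄ ± 3√c̄, with the lower limit floored at zero. It assumes each count comes from an equal-sized area of opportunity (otherwise a u chart is the usual choice); the daily counts are made up and the function name is hypothetical.

```python
import math

def c_chart_limits(counts):
    """Lower limit, center line and upper limit for a c chart, using the
    Poisson property that the variance of a count equals its mean."""
    c_bar = sum(counts) / len(counts)
    sigma = math.sqrt(c_bar)
    lcl = max(0.0, c_bar - 3 * sigma)   # a count cannot go below zero
    ucl = c_bar + 3 * sigma
    return lcl, c_bar, ucl

daily_counts = [4, 7, 5, 3, 6, 8, 5, 2, 6, 4]   # made-up example data
lcl, center, ucl = c_chart_limits(daily_counts)
out_of_control = [c for c in daily_counts if c < lcl or c > ucl]
print(f"LCL={lcl:.2f}  CL={center:.2f}  UCL={ucl:.2f}  signals={out_of_control}")
```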

    #104042

    gg
    Participant

    Read Don Wheeler’s books on SPC for info on using XmR charts for rates.
    GG

    #104058

    Gabriel
    Participant

    Simple: each rate is an X, and the difference between two consecutive rates is an mR.
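Gabriel's recipe translates directly into code: each rate is an individual value X, and each moving range is the absolute difference between consecutive rates. This sketch uses the commonly quoted constants for moving ranges of two points (2.66 for the X limits, 3.268 for the mR upper limit); the function name and the weekly rates are hypothetical.

```python
def xmr_limits(values):
    """Individuals (X) and moving-range (mR) chart limits computed from a
    sequence of individual values, e.g. one defect rate per period."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # mR_i = |X_i - X_(i-1)|: difference between consecutive rates.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "moving_ranges": moving_ranges,
        "x_center": x_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.268 * mr_bar,   # no lower limit for two-point ranges
    }

weekly_rates_pct = [2.1, 2.6, 1.9, 2.4, 2.2, 2.8, 2.0]   # made-up data
limits = xmr_limits(weekly_rates_pct)
print({k: v for k, v in limits.items() if k != "moving_ranges"})
```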

    #104061

    Hersey
    Participant

    Just had to destroy a nice complicated purely theoretical thread with a straightforward, practical example,  didn’t you?
     
     

    #104065

    Sinnicks
    Participant

    Don,
    I suspect you were kidding Gabriel, but Gabriel nicely addressed the practical core issue.
    That’s the difference between engineering and science, engineering being the application of science to the practical solution of problems. And what I believe a lot of people, including many SS practitioners, don’t get is that SS is intended as engineered improvement.
    It is fun to debate theoreticals.  Unfortunately we sometimes mystify or turn off people by debating theoreticals when what they need is a practical solution and understanding.
    Gabriel, good explanation.    Mark

    #104071

    Gabriel
    Participant

    Don,
    Sorry for spoiling it. I can’t help jumping in when practitioners talk about things such as normality, continuity, stability, etc. as if those things actually existed in real-life problems. Not that I don’t like theory. In fact, I LOVE theory, but it is important to understand how it applies to real problems. In fact, my post has a lot of theory, only explained with examples instead of with theorems. The theory is: sometimes you can use a discrete variable as if it were continuous, and sometimes you cannot use a continuous variable as a continuous one and have to use it as if it were discrete.
    As someone said (now I can’t recall who): “No model is right. But some do work”.

    #104080

    Darth
    Participant

    I believe you paraphrased George Box.

    #104082

    Schuette
    Participant

    All models are wrong; some models are useful. – George Box
    In my opinion, the Official Quote for this site – for Six Sigma – should be:
    What we have to learn to do, we learn by doing.
    Aristotle

    #104083

    Markert
    Participant

    Or
    “One has to be extraordinarily lucky, in our society, to meet one nymphomaniac in a lifetime.”
    Alex Comfort in “Darwin and the Naked Lady”

    #104087

    unlucky
    Member

    Amen.

    #104190

    Jonathon L. Andell
    Participant

    In the strictest sense one should use non-parametric statistics for discrete events. Realistically, however, we can regard countable events as approximately continuous if 1) there are fairly high counts, like >100 per subgroup, and 2) there are many “shades of gray” among different subgroups, like at least 10-20 different count values.
    Thus, we readily could approximate continuous data if we counted calls to a phone center ranging anywhere between 100 and 150 calls per day. If the number were more like 30 to 40 per day, the approximation would be more tenuous.
    Also, if your events are quite rare, like work-time accidents, consider tracking person-hours per accident as a quasi-continuous variable.
    Hope this helps.
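The person-hours-per-accident idea amounts to converting rare event counts into "time between events", which is a measured quantity. A minimal sketch, assuming you log the cumulative person-hours worked at the moment of each accident and that tracking starts at zero hours; the function name and the figures are hypothetical.

```python
def hours_between_events(cumulative_hours_at_event):
    """Turn the cumulative person-hours logged at each accident into the
    person-hours elapsed between consecutive accidents (first gap from 0)."""
    marks = sorted(cumulative_hours_at_event)
    starts = [0.0] + marks[:-1]
    return [end - start for start, end in zip(starts, marks)]

accident_log = [12_000.0, 30_500.0, 41_000.0]   # made-up cumulative hours
print(hours_between_events(accident_log))
```

Each gap can then be plotted as an individual value (for example on an XmR chart), rather than charting a count that is usually 0 or 1.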


The forum ‘General’ is closed to new topics and replies.