Steel Scrap Process Capability Calculations
- August 10, 2018 at 12:18 pm #56067
I’m new to Six Sigma and have a basic question about data types. Does continuous data, after being divided by continuous or discrete data, become discrete, and how should the statistics be performed?
Example: in a steel plant, monthly scrap is 2.2, 3.1, 5.6 kg, etc. for respective steel production of 155.5, 170.3, 200.7 kg, etc. The meaningful metric is the ratio of scrap to production because the two are dependent, e.g. 2.2/155.5 = 0.0141 or 1.41%. In this case weight is continuous data but the ratio becomes a proportion, so it is confusing to apply variable-data statistics.
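A minimal sketch of the ratio calculation in Python, using only the three monthly figures quoted above (the months hidden behind "etc." are not shown). The function name scrap_ratio is an illustrative choice, not something from the thread.

```python
# Monthly figures quoted in the post, in kg.
scrap_kg = [2.2, 3.1, 5.6]
production_kg = [155.5, 170.3, 200.7]


def scrap_ratio(scrap, production):
    """Return scrap as a fraction of production for each month."""
    return [s / p for s, p in zip(scrap, production)]


ratios = scrap_ratio(scrap_kg, production_kg)
for r in ratios:
    # Show each ratio both as a fraction and as a percentage.
    print(f"{r:.4f} = {r:.2%}")
```

Each ratio is still a value on a continuous scale; dividing one weight by another does not turn it into a count.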
Please advise,
- August 10, 2018 at 12:38 pm #202916
@anavesh Ratio data can be considered either continuous or discrete. It depends on how you use it.
- August 12, 2018 at 11:44 am #202923
Thanks Mr Mike,
I want to use X-bar and R charts, estimate confidence intervals, and run t/z tests for hypotheses. Is it OK to go ahead?
Please advise,
- August 13, 2018 at 3:18 am #202924
I think your metric is a great metric. The data you presented is continuous. I think of discrete numbers as binary “yes or no” information. Even then, depending on what you do to discrete data, the output variables could be either continuous or discrete. In a sense, SPC takes continuous data and applies a yes or no (discrete) outcome to it. That is, is the data in control at point x(i)?
One thing I’d be careful about, though, is that your data probably won’t be normally distributed. So t or z statistics can be very limited in describing your dataset. Log transforms can be very beneficial in making the normality assumptions for t or z distributions hold.
- August 13, 2018 at 4:05 am #202926
What are you trying to achieve?
You can use XmR using proportions. Do not use hypothesis tests.
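A rough sketch of XmR limits on scrap proportions, assuming the usual scaling constants for individuals charts (2.66 for the natural process limits, 3.268 for the upper range limit). The first three percentages are computed from the kg figures in the opening post; the remaining values and the function name xmr_limits are hypothetical.

```python
from statistics import mean


def xmr_limits(values):
    """Natural process limits for an individuals and moving range chart."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "centre": centre,
        "lower": centre - 2.66 * mr_bar,
        "upper": centre + 2.66 * mr_bar,
        "mr_upper": 3.268 * mr_bar,
    }


# First three values come from the opening post; the rest are made up.
scrap_pct = [1.41, 1.82, 2.79, 1.95, 2.10, 1.60, 2.40, 1.75]
limits = xmr_limits(scrap_pct)
outside = [x for x in scrap_pct
           if not limits["lower"] <= x <= limits["upper"]]
print(limits)
print("points outside the limits:", outside)
```

A computed lower limit below zero has no physical meaning for a scrap percentage, so in practice only the upper limit would be drawn in that case.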
Dr Wheeler’s paper here will help you: https://www.qualitydigest.com/inside/quality-insider-article/what-about-p-charts.html
- August 13, 2018 at 4:43 am #202927
Alejandro Estrada Flores
Something that is useful for me when deciding between discrete and continuous:
# Defects / continuous = discrete
Continuous (such as weight) / continuous = continuous
- August 13, 2018 at 8:43 am #202928
You can treat your data as continuous.
“Variables are classified as continuous or discrete, according to the number of values they can take. Actual measurements of all variables occurs in a discrete manner, due to precision limitations in measuring instruments. The continuous-discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values. For instance, statisticians often treat discrete interval variables having a large number of values (such as test scores) as continuous, using them in methods for continuous responses.”
– From Categorical Data Analysis, 2nd Edition – Agresti, p. 3
As for running t-tests – they are robust with respect to non-normality. If you are worried about employing a t-test – run the analysis two ways – Wilcoxon-Mann-Whitney and t-test and see what you get. The chances are very good that you will get the same thing – not the exact same p-values but the same indication with respect to significance.
- August 13, 2018 at 2:17 pm #202931
You claim “Log transforms can be very beneficial”. You should never normalize data. Control charts work for any distribution. Please read “Normality and the Process Behavior Chart” – Dr Wheeler.
This may also help you: https://www.qualitydigest.com/inside/six-sigma-article/predictable-061318.html
- August 14, 2018 at 11:55 am #202933
If it is continuous data, why can’t I use an X-bar and R chart, and why can’t I go for Cp and Cpk?
Why are some experts suggesting proportion statistics?
In short, why can’t I perform statistics based on continuous data?
Please clarify,
- August 14, 2018 at 12:15 pm #202934
Data should absolutely be made normal if you are using statistical tests that rely on the data being normal. That’s a basic tenet of Gaussian statistics. Log transforms are usually appropriate to deal with skewness.
- August 14, 2018 at 3:51 pm #202935
As I stated before – the data can be treated as continuous and you can run an analysis on the data using the methods of continuous data analysis.
The basic calculation for Cpk DOES require data normality which is why there are equivalent Cpk calculations for non-normal, attribute, and other types of data. With your data you will need to look into Cpk calculations for non-normal data – Chapter 8 in Measuring Process Capability by Bothe has the details.
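For reference, the basic normal-theory indices are usually written as below, followed by one commonly cited percentile-based analogue for non-normal data. Treating the percentile form as the relevant non-normal method is an assumption here; it may differ from the treatment in Bothe's Chapter 8. USL and LSL are the specification limits, \mu and \sigma the process mean and standard deviation, and x_p the p-th quantile of the distribution fitted to the data.

```latex
C_p = \frac{USL - LSL}{6\sigma}, \qquad
C_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)

% Percentile-based analogue: quantiles x_p replace \mu \pm 3\sigma.
C_{pk} = \min\left(\frac{USL - x_{0.5}}{x_{0.99865} - x_{0.5}},\
                   \frac{x_{0.5} - LSL}{x_{0.5} - x_{0.00135}}\right)
```

If only an upper limit is specified, as is typical for a scrap percentage, only the USL term applies.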
When testing for mean differences t-tests and ANOVA are robust with respect to non-normality and can be used when the data is extremely non-normal – a good discussion of this issue can be found on pages 51-54 of The Design and Analysis of Industrial Experiments 2nd Edition – Owen Davies.
When it comes to testing variance differences, Bartlett’s test is sensitive to non-normality. The usual procedure is to use Levene’s test instead.
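A sketch of a Levene-type statistic using only the Python standard library. It is a one-way ANOVA F statistic computed on absolute deviations from each group's centre. The median is used as the centre here (often called the Brown-Forsythe variant), which is an assumption about which form is wanted. The function name levene_w and the example groups are hypothetical, and the p-value, which would come from an F distribution with (k - 1, N - k) degrees of freedom, is not computed.

```python
from statistics import mean, median


def levene_w(groups, center=median):
    """Levene-type W: ANOVA F on absolute deviations from the group centre."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every point from its own group's centre.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scrap percentages for two groups (not from the thread).
group_a = [1.4, 1.8, 2.1, 1.6, 1.9]
group_b = [1.2, 2.9, 0.8, 3.4, 2.2]
print(f"W = {levene_w([group_a, group_b]):.3f}")
```

A large W relative to the F critical value suggests the group spreads differ.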
If the VARIANCES are HETEROGENEOUS the t-test has adjustments to allow for this as well. Indeed most of the canned t-test routines in the better statistics packages run an automatic test for this issue and make the adjustments without bothering the investigator with the details.
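The standard unequal-variance adjustment is Welch's t-test. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; whether a given package applies exactly this adjustment is an assumption, and the function name welch_t and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error contributed by each sample.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scrap percentages from two periods (not from the thread).
before = [1.4, 1.8, 2.1, 1.6, 1.9]
after = [1.2, 2.9, 0.8, 3.4, 2.2]
t_stat, dof = welch_t(before, after)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The p-value would then be read from a t distribution with the fractional degrees of freedom returned here.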
Too much HETEROGENEITY in the population variances can cause problems with ANOVA in that if the heterogeneity is too extreme ANOVA will declare a lack of significance in mean differences between populations when one exists. When in doubt over this issue one employs Welch’s test for an examination of mean differences in ANOVA.
Actually, when in doubt about variance heterogeneity you should do a couple of things: histogram your data by group, compute the respective population variances, run ANOVA using the usual test and Welch’s test, and see what you see. If you do this enough you will gain a good visual understanding of just how much heterogeneity is probably going to cause problems. This, in turn, will give you confidence in the results of your calculations.
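The variance-comparison step can be sketched as follows. Sample variances stand in for the population variances, and the group names, data, and function name variance_summary are all hypothetical. The ANOVA and Welch comparisons themselves are not shown.

```python
from statistics import variance


def variance_summary(groups):
    """Sample variance per group plus the largest-to-smallest ratio."""
    variances = {name: variance(values) for name, values in groups.items()}
    ratio = max(variances.values()) / min(variances.values())
    return variances, ratio


# Hypothetical scrap percentages for two furnaces (not from the thread).
groups = {
    "furnace_a": [1.4, 1.8, 2.1, 1.6, 1.9],
    "furnace_b": [1.2, 2.9, 0.8, 3.4, 2.2],
}
variances, ratio = variance_summary(groups)
for name, v in variances.items():
    print(f"{name}: variance = {v:.3f}")
print(f"largest / smallest variance = {ratio:.1f}")
```

Tracking this ratio alongside the two ANOVA results is one way to build the intuition described above.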
As for control charts – data normality is not an issue. A good discussion of this can be found in Understanding Statistical Process Control 2nd Edition Wheeler and Chambers in Chapter 4 starting on page 76 under the subsection titled “Myths About Shewhart’s Charts.”
While not specifically mentioned in your initial post – data normality is also not an issue when it comes to regression. There are no restrictions on the distributions of the X’s or the Y’s.
Residuals need to be approximately normal because the tests for regression term significance are based on the t and F tests. But, as noted above – there is quite a bit of latitude with respect to normality approximation. For particulars you should read pages 8-24 of Applied Regression Analysis 2nd Edition by Draper and Smith and Chapter 3 of the same book “The Examination of the Residuals.” For an excellent understanding of the various facets of regression I would also recommend reading Regression Analysis by Example by Chatterjee and Price.
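A small sketch of extracting residuals from a straight-line least-squares fit, so that they can be plotted or checked for approximate normality. It uses the three scrap and production figures from the opening post purely as placeholders; three points are far too few for a real residual check. The function names fit_line and residuals are illustrative.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Production (x) and scrap (y) in kg, as quoted in the opening post.
production_kg = [155.5, 170.3, 200.7]
scrap_kg = [2.2, 3.1, 5.6]
print([round(r, 3) for r in residuals(production_kg, scrap_kg)])
```

It is these residuals, not the raw X or Y values, that the normality comments apply to.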
I would recommend you borrow the books I have listed (the inter-library loan system is your friend) and read the sections I’ve referenced.
- August 14, 2018 at 8:46 pm #202937
@anvesh The fundamental difference between discrete and continuous data is that discrete data has a finite set of values (Yes/No; A, B, C; etc.), while continuous data can take infinitely many values between any two points. The more decimal places you can measure, the more different values a data point can have. Consider a histogram, which lumps continuous data together into discrete “buckets”. The output is a function of the inputs. Is the output specification (what the end or internal customer needs) measured as discrete? Yes, because there are upper and lower specification limits, which make continuous data discrete. So in my opinion all data are ultimately discrete. We cannot measure infinitely small differences. In the end it’s within specifications or it isn’t.
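The point about specification limits turning a continuous measurement into a discrete outcome can be illustrated in a few lines. The scrap percentages are computed from the opening post; the limits and the function name within_spec are hypothetical.

```python
def within_spec(values, lsl, usl):
    """Map each continuous measurement to a pass/fail flag."""
    return [lsl <= v <= usl for v in values]


# Scrap percentages from the opening post; the limits are made up.
scrap_pct = [1.41, 1.82, 2.79]
flags = within_spec(scrap_pct, lsl=0.0, usl=2.5)
print(flags)  # [True, True, False]
```

The flags are binary, but the underlying percentages keep the extra information that charts and capability indices rely on.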