
Steel Scrap Process Capability Calculations

Tagged: data type, Process Capability, sigma level

This topic contains 11 replies, has 6 voices, and was last updated by Strayer 2 months, 1 week ago.


Hi,

I’m new to Six Sigma and have a basic question about data types. When one continuous measurement is divided by another continuous (or discrete) measurement, does the result become discrete, and how should statistics be performed on it? For example, in a steel plant, monthly scrap is 2.2, 3.1, 5.6 kg, etc., for respective steel production of 155.5, 170.3, 200.7 kg, etc. The meaningful metric is the ratio of scrap to production, because they are dependent, e.g. 2.2/155.5 = 0.0141, or 1.41%. The weights are continuous data, but the ratio becomes a proportion, so it’s confusing whether to apply variable-data statistics.

Please advise,
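For illustration, the ratio calculation can be written in a couple of lines of plain Python, using the figures from the post:

```python
# Monthly scrap and production figures from the post, in kg.
scrap = [2.2, 3.1, 5.6]
production = [155.5, 170.3, 200.7]

# Scrap ratio per month, expressed as a percentage.
ratios = [s / p for s, p in zip(scrap, production)]
percentages = [round(100 * r, 2) for r in ratios]
print(percentages)  # first value: 2.2 / 155.5 ≈ 1.41 %
```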

@anvesh Ratio data can be considered either continuous or discrete. It depends on how you use it.

Thanks Mr Mike,

I want to use X-bar & R charts, estimate confidence intervals, and run t/z tests for hypotheses. Is it OK to go ahead? Please advise,

Anvesh,

I think your metric is a great metric. The data you presented is continuous. I think of discrete numbers as binary “yes or no” information. Even then, depending on what you do to discrete data, the output variables could be either continuous or discrete. In a sense, SPC takes continuous data and applies a yes or no (discrete) outcome to it: is the data in control at point x(i)?

One thing I’d be careful about, though, is that your data distribution probably won’t be normal. So t or z statistics can be very limited in describing your dataset. Log transforms can be very beneficial for making the normality assumptions behind t or z distributions hold.
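As a sketch of the log-transform idea, assuming SciPy is available (the right-skewed lognormal data below is simulated, not real scrap data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated right-skewed scrap ratios (lognormal), a common shape for ratio data.
ratios = rng.lognormal(mean=-4.2, sigma=0.4, size=50)

# Shapiro-Wilk normality test on the raw data, then on the log-transformed data.
_, p_raw = stats.shapiro(ratios)
_, p_log = stats.shapiro(np.log(ratios))
print(p_raw, p_log)
```

A higher p-value after the transform would indicate the log-transformed data is more consistent with normality.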

Anvesh,

What are you trying to achieve?

You can use an XmR chart with proportions. Do **not** use hypothesis tests. Dr Wheeler’s paper here will help you: https://www.qualitydigest.com/inside/quality-insider-article/what-about-p-charts.html
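The XmR (individuals and moving range) limits can be sketched in a few lines of plain Python; the proportion values below are illustrative, not real plant data:

```python
# XmR chart limits, treating each scrap proportion as an individual value.
ratios = [0.0141, 0.0182, 0.0279, 0.0156, 0.0203, 0.0168]  # illustrative data

mean_x = sum(ratios) / len(ratios)
moving_ranges = [abs(b - a) for a, b in zip(ratios, ratios[1:])]
mean_mr = sum(moving_ranges) / len(moving_ranges)

# Standard XmR constants: 2.66 for the X chart, 3.268 for the mR chart.
ucl_x = mean_x + 2.66 * mean_mr
lcl_x = mean_x - 2.66 * mean_mr
ucl_mr = 3.268 * mean_mr
```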

Something that is useful for me when deciding between discrete and continuous:

Number of defects / continuous = discrete

Continuous (such as weight) / continuous = continuous

You can treat your data as continuous.

“Variable are classified as continuous or discrete, according to the number of values they can take. Actual measurements of all variables occurs in a discrete manner, due to precision limitations in measuring instruments. The continuous-discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values. For instance, statisticians often treat discrete interval variables having a large number of values (such as test scores) as continuous, using them in methods for continuous responses.”

– From Categorical Data Analysis, 2nd Edition, Agresti, p. 3

As for running t-tests – they are robust with respect to non-normality. If you are worried about employing a t-test, run the analysis two ways – Wilcoxon-Mann-Whitney and t-test – and see what you get. The chances are very good that you will get the same thing – not the exact same p-values, but the same indication with respect to significance.
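One way to run this two-way comparison in Python, assuming SciPy is available (the skewed samples below are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two skewed (lognormal) samples with a shifted location.
a = rng.lognormal(mean=0.0, sigma=0.5, size=40)
b = rng.lognormal(mean=0.4, sigma=0.5, size=40)

# Run the parametric and nonparametric tests side by side.
_, p_t = stats.ttest_ind(a, b)
_, p_mw = stats.mannwhitneyu(a, b, alternative='two-sided')
print(p_t, p_mw)
```

Compare the two p-values: they will usually agree on significance even when the data is far from normal.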

You claim “log transforms can be very beneficial”. You should **never** normalize data. Control charts work for **any** distribution. Please read “Normality and the Process Behavior Chart” by Dr Wheeler. This may also help you: https://www.qualitydigest.com/inside/six-sigma-article/predictable-061318.html

Hi,

If it is continuous data, why can’t I use an X-bar & R chart, and why can’t I go for Cp & Cpk?

Why are some experts suggesting going with proportion statistics?

In short, why can’t I perform statistics based on continuous data?

Please clarify,

Data should absolutely be made normal if you are using statistical tests that rely on the data being normal. That’s a basic tenet of Gaussian statistics. Log transforms are usually appropriate for dealing with skewness.

As I stated before – the data can be treated as continuous and you can run an analysis on the data using the methods of continuous data analysis.

The basic calculation for Cpk DOES require data normality which is why there are equivalent Cpk calculations for non-normal, attribute, and other types of data. With your data you will need to look into Cpk calculations for non-normal data – Chapter 8 in Measuring Process Capability by Bothe has the details.
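A minimal sketch of the basic (normal-theory) Cpk calculation in plain Python; the specification limits below are hypothetical, purely for illustration:

```python
import statistics

# Basic Cpk for normal data: min(USL - mean, mean - LSL) / (3 * sigma).
data = [0.0141, 0.0182, 0.0279, 0.0156, 0.0203, 0.0168]  # illustrative ratios
usl, lsl = 0.05, 0.0  # hypothetical scrap-ratio specification limits

mean = statistics.mean(data)
sigma = statistics.stdev(data)  # sample standard deviation
cpk = min(usl - mean, mean - lsl) / (3 * sigma)
print(round(cpk, 2))
```

For non-normal data this formula is exactly what breaks down, which is why Bothe’s Chapter 8 methods are needed instead.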

When testing for mean differences t-tests and ANOVA are robust with respect to non-normality and can be used when the data is extremely non-normal – a good discussion of this issue can be found on pages 51-54 of The Design and Analysis of Industrial Experiments 2nd Edition – Owen Davies.

Variance issues

When it comes to testing variance differences the Bartlett’s test is sensitive to non-normality. The usual procedure is to use Levene’s test instead.
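Both variance tests are available in SciPy and can be run side by side; the skewed, equal-variance samples below are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two skewed (exponential) samples with equal true variances.
a = rng.exponential(scale=1.0, size=50)
b = rng.exponential(scale=1.0, size=50)

_, p_bartlett = stats.bartlett(a, b)  # sensitive to non-normality
_, p_levene = stats.levene(a, b)      # robust alternative
print(p_bartlett, p_levene)
```

With skewed data, Bartlett’s test will tend to flag spurious variance differences more often than Levene’s.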

If the VARIANCES are HETEROGENEOUS the t-test has adjustments to allow for this as well. Indeed most of the canned t-test routines in the better statistics packages run an automatic test for this issue and make the adjustments without bothering the investigator with the details.
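In SciPy, the heterogeneous-variance adjustment is the `equal_var=False` option of `ttest_ind` (Welch’s t-test); a sketch with simulated unequal-variance samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10.0, 1.0, size=30)  # small variance
b = rng.normal(10.0, 5.0, size=30)  # large variance

_, p_pooled = stats.ttest_ind(a, b)                  # assumes equal variances
_, p_welch = stats.ttest_ind(a, b, equal_var=False)  # Welch adjustment
print(p_pooled, p_welch)
```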

Too much HETEROGENEITY in the population variances can cause problems with ANOVA in that if the heterogeneity is too extreme ANOVA will declare a lack of significance in mean differences between populations when one exists. When in doubt over this issue one employs Welch’s test for an examination of mean differences in ANOVA.

Actually, when in doubt about variance heterogeneity you should do a couple of things, histogram your data by group, compute the respective population variances, run ANOVA using the usual test and Welch’s test and see what you see. If you do this enough you will gain a good visual understanding of just how much heterogeneity is probably going to cause problems. This, in turn, will give you confidence in the results of your calculations.

Control Charts

As for control charts – data normality is not an issue. A good discussion of this can be found in Understanding Statistical Process Control 2nd Edition Wheeler and Chambers in Chapter 4 starting on page 76 under the subsection titled “Myths About Shewhart’s Charts.”

Regression

While not specifically mentioned in your initial post – data normality is also not an issue when it comes to regression. There are no restrictions on the distributions of the X’s or the Y’s.

Residuals need to be approximately normal because the tests for regression term significance are based on the t and F tests. But, as noted above – there is quite a bit of latitude with respect to normality approximation. For particulars you should read pages 8-24 of Applied Regression Analysis 2nd Edition by Draper and Smith and Chapter 3 of the same book “The Examination of the Residuals.” For an excellent understanding of the various facets of regression I would also recommend reading Regression Analysis by Example by Chatterjee and Price.
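A small sketch of checking the residuals rather than the raw data, using NumPy’s `polyfit` on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)  # true line plus noise

# Fit a line; the X's and Y's need no particular distribution,
# but the residuals should be roughly normal for the t and F tests
# on term significance to apply.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
```

Plotting or normality-testing `residuals` (not `y`) is the relevant diagnostic here.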

I would recommend you borrow the books I have listed (the inter-library loan system is your friend) and read the sections I’ve referenced.

@anvesh The fundamental difference between discrete and continuous data is that discrete data takes finite values (Yes/No; A, B, C; etc.), while the variation between continuous data points is infinite. The more decimal places you can measure, the more different values a data point can take. Consider a histogram, which lumps continuous data together into discrete “buckets”. The output is a function of the inputs. Is the output specification (what the end or internal customer needs) measured as discrete? Yes, because there are upper and lower specification limits, which makes continuous data discrete. So in my opinion all data are ultimately discrete. We cannot measure infinitely small differences. In the end, it’s within specifications or it isn’t.

