correlation between variable data and attribute data



    [email protected]

    I need some help… I have a test (Y) that gives my product either a pass or a fail.  I have a second test (X) on this product that gives a time as a result, and the product passes or fails that test depending on whether the time is greater than a specification.  I know that when I see failures in Y, I see lower values in X, and when I see passes in Y, I see higher values in X.  However, I need to know if there is a special kind of regression that shows this relationship, just to prove that a correlation exists.
    I tried to do a simple correlation in Minitab, and the R-Sq(adj) was around 24.8% (giving an R of 0.498)… my MBB wants to see R >= 0.60 before saying that there is a correlation.
    Why am I doing this?  I’m doing a project to minimize failures, and if I can track it using variable data, my life would be easier, but the darn attribute test only reports pass or fail. If I can relate the variable test to the attribute test, I can state my project in terms of the variable test and show that I’m reducing the number of batches that fail the attribute test.
    Any help is greatly appreciated.


    Robert Butler

      The problem that you have described is the situation where you have a continuous response variable (time) and you are regressing it on a coded X variable (-1 = fail, 1 = pass).  Usually, this kind of coding is understood to represent the high and low settings of a continuous X variable but it is also used to code a 2 level qualitative variable. 
      If the t values for the beta values in your model are statistically significant, then you have a significant correlation regardless of the value of R², and you have demonstrated that a correlation does exist. All that R² says about your regression equation is the fraction of the observed variation in the data that the model can explain: about 25% here, given your R² of 24.8% (an R of about 0.5).  R² is one measure of the quality of a regression, but one should never accept or reject a correlation just on the R² value (there are many areas of effort where R² values of 0.2 to 0.3 are the best that can be expected and the resulting models are quite useful), and one should never insist on a given R² before declaring a model acceptable.
      You may want to check Chapter 1 of Applied Regression Analysis by Draper and Smith, which has an excellent discussion of the issues surrounding the significance of a regression equation.
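
The point about significance versus R² can be checked by hand. For a regression with a single predictor, the t statistic of the slope follows from the correlation coefficient r and the number of observations n as t = r * sqrt((n - 2) / (1 - r^2)). The sketch below applies this to the R of 0.498 quoted in the question. The sample sizes are hypothetical, since the thread does not give one, and the critical values are the usual two-sided 5% entries from a t table. It also treats the adjusted R² as if it were the plain R², which is only an approximation.

```python
import math

def t_from_r(r, n):
    """t statistic for the slope of a one-predictor regression."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom (n - 2).
T_CRIT_05 = {8: 2.306, 18: 2.101, 38: 2.024}

r = 0.498  # value quoted in the question
for n in (10, 20, 40):  # hypothetical sample sizes, not from the thread
    t = t_from_r(r, n)
    verdict = "significant" if abs(t) > T_CRIT_05[n - 2] else "not significant"
    print(f"n = {n:2d}: t = {t:.2f} -> {verdict} at the 5% level")
```

Under these assumptions, an R near 0.5 clears the 5% threshold once there are roughly 20 observations, even though the model explains only about a quarter of the variation.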


    Marc Richardson

    Try using binary logistic regression in Minitab. Binary logistic regression is a method used to determine how well a predictor variable predicts a binary response. For our purposes, we will define a binary response as the result of a test for which there are only two possible outcomes: in your case, pass or fail.
    By the way, why isn’t your MBB able to help you with your question?
    Marc Richardson
    Sr. Q.A. Engineer
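
For anyone without Minitab at hand, the idea behind binary logistic regression can be sketched in a few lines of plain Python. This is a minimal illustration, not Minitab's procedure: it fits P(pass) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson for a single predictor. The times and pass/fail results are made up, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit P(pass) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre x for numerical stability
    a, b1 = 0.0, 0.0                # intercept (centred scale) and slope
    for _ in range(max_iter):
        p = [sigmoid(a + b1 * xi) for xi in xc]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # 2x2 information matrix (negative Hessian)
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b1 = a + da, b1 + db
        if abs(da) + abs(db) < tol:
            break
    return a - b1 * mean_x, b1  # intercept converted back to the original x scale

def predict_pass(b0, b1, x):
    """Estimated probability of a pass at time x."""
    return sigmoid(b0 + b1 * x)

# Hypothetical data: X test times and Y results (1 = pass, 0 = fail).
times = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
passed = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(times, passed)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
for t in (12, 16, 20):
    print(f"time {t}: estimated P(pass) = {predict_pass(b0, b1, t):.2f}")
```

A positive slope means longer times go with a higher chance of passing Y. Minitab additionally reports a p-value for the slope, which this sketch leaves out. Note that the fit breaks down if the data are perfectly separated, that is, if every pass has a longer time than every fail.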



    If you convert the pass/fail counts to a percent defective, you are approaching variables data. Use an ImR chart to plot the data.
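
As a rough sketch of the ImR suggestion, the limits for an individuals and moving-range chart can be computed with the standard constants for moving ranges of two points (2.66 and 3.267). The percent-defective figures below are invented for illustration, and flooring the lower limit at zero is a choice made here because a percentage cannot be negative.

```python
def imr_limits(values):
    """Centre line and control limits for an individuals and moving-range (ImR) chart."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": mean,
        "ucl": mean + 2.66 * mr_bar,            # 2.66 = 3 / d2, with d2 = 1.128 for ranges of 2
        "lcl": max(0.0, mean - 2.66 * mr_bar),  # a percentage cannot go below zero
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,               # D4 constant for ranges of 2
    }

# Hypothetical percent defective per batch.
percent_defective = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0]
limits = imr_limits(percent_defective)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```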



    Be careful… that all depends on how large your sample size is for each observation.  Percent defective data only begins behaving like variable data when the sample size reaches roughly 20 to 30 or more.
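
One way to see why the sample size matters: with n pass/fail results per observation, percent defective can only take n + 1 distinct values, and its binomial standard error shrinks with the square root of n. The sketch below compares a subgroup of 5 with a subgroup of 30 at an assumed true defect rate of 10%. Both the rate and the subgroup sizes are illustrative, and the code does not by itself establish the 20 to 30 rule of thumb.

```python
import math

def percent_steps(n):
    """All percent-defective values a subgroup of n pass/fail results can take."""
    return [100.0 * k / n for k in range(n + 1)]

def percent_std_error(p, n):
    """Binomial standard error of percent defective, in percentage points."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

p_true = 0.10  # assumed true defect rate, for illustration only
for n in (5, 30):
    steps = percent_steps(n)
    print(
        f"n = {n:2d}: {len(steps)} possible values, "
        f"step = {steps[1]:.1f} points, "
        f"std error = {percent_std_error(p_true, n):.1f} points"
    )
```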
