correlation between variable data and attribute data
 This topic has 4 replies, 5 voices, and was last updated 19 years, 5 months ago by Intrepid.


November 25, 2002 at 11:28 pm #30887
Hi:
I need some help… I have a test (Y) that rates my product as either pass or fail. A second test (X) on the same product gives a time as its result, and the product passes or fails that test depending on whether the time is greater than a specification. I know that when I see failures in Y, I see lower values in X, and passes in Y go with higher values in X. However, I need to know whether there is a special kind of regression that shows this correlation, just to prove that a correlation exists.
I tried a simple correlation in Minitab, and the Rsqd(adj) was around 24.8% (giving an R of 0.498)… my MBB wishes to see an R >= 0.60 to say that there is a correlation.
Why am I doing this? I’m doing a project to minimize failures, and if I can track using variable data, my life would be easier, but the darn attribute test only reports pass or fail; so if I relate the variable test to the attribute test, I can then state my project in terms of the variable test, and show that I’m reducing the number of batches that fail the attribute test.
Any help is greatly appreciated.
Pritesh.

November 26, 2002 at 2:29 am #81049
Robert Butler
The problem that you have described is the situation where you have a continuous response variable (time) and you are regressing it on a coded X variable (-1 = fail, 1 = pass). Usually, this kind of coding is understood to represent the high and low settings of a continuous X variable, but it is also used to code a two-level qualitative variable.
If the t values for the beta values in your model were statistically significant, then you have a significant correlation regardless of the value of R², and you have demonstrated that a correlation does exist. All that R² says about your regression equation is that the resulting model can explain about 25% of the observed variation in the data. R² is one measure of the quality of a regression, but one should never accept or reject a correlation just on the R² value (there are many areas of effort where R² values of .2 to .3 are the best that can be expected and the resultant models are quite useful), and one should never insist on a given R² before declaring a model acceptable.
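To make the point about significance versus R² concrete, here is a minimal sketch in Python. It relies on the standard identity for a regression with one predictor, t = r·sqrt((n − 2)/(1 − r²)), which equals the t statistic for the slope. The value r = 0.498 comes from the original post; the sample sizes are hypothetical, since the thread never states how many batches were tested.

```python
import math


def t_statistic_for_correlation(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for H0: no linear relationship,
    given a sample correlation r computed from n paired observations."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r = 0.498  # R reported in the original post
# Hypothetical sample sizes: the thread does not say how many batches were tested.
for n in (10, 30, 100):
    t = t_statistic_for_correlation(r, n)
    print(f"n={n:4d}  R^2={r * r:.3f}  t={t:.2f}  (df={n - 2})")
```

The same R² gives a larger t as n grows, so the t value should be compared against a t table with n − 2 degrees of freedom rather than judging the correlation by a fixed cutoff on R.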
You may want to check Chapter 1 of Applied Regression Analysis by Draper and Smith, which has an excellent discussion of the issues surrounding the significance of a regression equation.

November 26, 2002 at 2:09 pm #81059
Marc Richarsdon
Robert,
Try using binary logistic regression in Minitab. Binary logistic regression is a method used to determine how well a predictor variable predicts a binary response. For our purposes, we will define a binary response as the result of a test for which there are only two possible outcomes: in your case, pass or fail.
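For readers without Minitab, the idea can be sketched in plain Python. This is only an illustration, not Minitab's procedure: it fits P(pass) = 1 / (1 + exp(-(b0 + b1·time))) by Newton-Raphson on the log-likelihood. The `times` and `passed` values are made-up data, and `fit_logistic` and `sigmoid` are hypothetical helper names.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve (X'WX) d = score for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up data: X = measured time, Y = 1 for pass, 0 for fail.
times = [3.1, 3.4, 3.8, 4.0, 4.3, 4.5, 4.9, 5.2, 5.6, 6.0, 6.3, 6.8]
passed = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(times, passed)
print(f"intercept={b0:.3f}  slope={b1:.3f}")
for t in (3.5, 5.0, 6.5):
    print(f"time={t}: estimated P(pass)={sigmoid(b0 + b1 * t):.2f}")
```

A positive slope means higher times go with a higher estimated probability of passing. Note that if the passes and fails were perfectly separated by time, the maximum-likelihood estimates would not exist and this simple loop would not settle on finite values.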
By the way, why isn’t your MBB able to help you with your question?
Marc Richardson
Sr. Q.A. Engineer

November 26, 2002 at 4:47 pm #81067
If you convert the pass/fail numbers to a percent defective, you are approaching variables data. Use an ImR chart to plot the data.
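As a rough illustration of the ImR suggestion, the sketch below computes individuals and moving-range chart limits using the usual constants for a moving range of two consecutive points (2.66 and 3.267). The percent-defective figures are invented, and `imr_limits` is a hypothetical helper name.

```python
def imr_limits(values):
    """Center lines and control limits for an individuals / moving-range chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(values) / len(values)
    return {
        "i_center": center,
        "i_ucl": center + 2.66 * mr_bar,
        # Can come out negative; for percent data a floor of 0 is often used.
        "i_lcl": center - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
        "mr_lcl": 0.0,
    }


# Invented percent-defective values, one per batch.
percent_defective = [4.0, 6.5, 5.0, 3.5, 7.0, 5.5, 4.5, 6.0]
for name, value in imr_limits(percent_defective).items():
    print(f"{name:10s} {value:.2f}")
```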

December 7, 2002 at 4:21 am #81333
Intrepid
Be careful… that all depends on how large your sample size is for each observation. Percent defective data only begins behaving like variable data when the sample size approaches 20 to 30 or higher…