# Correlation


This topic contains 36 replies, has 15 voices, and was last updated by PDF 10 years, 7 months ago.

February 20, 2009 at 9:18 am #51884
Hi,

Can anyone tell me, is it correct to use a paired t-test to perform correlation? My customer is telling me that I should use a paired t-test to analyze two sets of data when studying the correlation.

February 20, 2009 at 10:33 am #181497

It might be.

What's your data and what shape is it?

February 20, 2009 at 12:21 pm #181502

A paired t-test requires a unique set of data where the two sets are matched before and after. It is used to look for a difference. Correlation is used for determining a relationship. I don't know where your customer is coming from or what their rationalization is for that request. The basic rule is that customers are usually dumb and as such ask for dumb things. The key is what research question you are trying to answer. Overall, hypothesis testing and correlation analysis are for different questions.

February 20, 2009 at 1:25 pm #181508

Maybe it's worth tying down what they mean by "correlation"?

It can be used by some interchangeably with the word "relationship", which is where the use of hypothesis testing may have come in. That's not its true meaning, but it is often confused by people wanting to sound as if they know what they're talking about.

February 20, 2009 at 1:39 pm #181509

Sue, go look up paired t-test and correlation. Look at what type of data you have and tell us what you really are trying to do.

February 20, 2009 at 1:51 pm #181510

A paired t is only appropriate when the two groups are identified before and after an action, such as before training and after training.

The comparison has to be made with the identical unit before and after the activity it receives.

A two-sample t is appropriate if this is not the case.

February 20, 2009 at 2:36 pm #181512

Use the hypothesis test, babble about P-values and confidence, thank them for the directional tip, and charge them accordingly.

In the meantime, use the correct tool and wait for them to come back asking for the right information; give it to them and charge them accordingly.

From the book of Stevo.

February 20, 2009 at 3:25 pm #181516

Sue,

I guess the real question at hand here is (based on the other responses) “Do you give them exactly what they ask for, or do you give them what they really need?”

Find out what practical question they are trying to answer and then use the right statistical tool to answer it.

February 20, 2009 at 6:03 pm #181525

Bower Chiel

Hi Sue

Box, Hunter and Hunter give the following data set.

Boy A B

1 13.2 14.0

2 8.2 8.8

3 10.9 11.2

4 14.3 14.2

5 10.7 11.8

6 6.6 6.4

7 9.5 9.8

8 10.8 11.3

9 8.8 9.3

10 13.3 13.6

Each boy wore one shoe soled with material A and one soled with material B. The decision whether the shoe made with A went on the right foot or the left was made at random. The data are measurements of wear. A paired-sample t-test demonstrates "a highly statistically significant increase in the amount of wear associated with the cheaper material B".

If you plot a scatter plot of B versus A you will in fact find a strong positive correlation, which simply reflects the obvious: less active boys with low wear on the A sole will have low wear on the B sole; more active boys with high wear on the A sole will also have high wear on the B sole. Does it add anything useful to the analysis? It looks as though the other respondents are saying a very firm no! However, I believe that when you run a paired t-test using the SPSS statistical package it gives you both the P-value for the paired t-test and the P-value for testing the null hypothesis that the population correlation coefficient is zero. I imagine that the team that developed that software must have had a sound reason for doing so!

Best Wishes

Bower Chiel

February 20, 2009 at 9:29 pm #181541

Bower, as you know, the paired t-test is really just a one-sample t-test that examines whether the difference between sample one and sample two is zero. You can get the same results by calculating another column of the differences and then doing a one-sample t-test against a hypothesized value of zero. As you pointed out, correlation and the t-test answer different questions. I still don't know what the original poster is trying to do. A common conclusion for a set of data might be that there is a significant difference between the two groups, yet they are highly correlated. "Different" and "correlated" are two different conclusions. The key is what the research question is.
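That equivalence can be checked against the Box, Hunter and Hunter shoe-wear data posted above. This is a minimal plain-Python sketch (the helper function is mine; no statistics library is assumed) that runs the paired t-test as a one-sample t-test on the column of differences:

```python
import math

# Box, Hunter and Hunter boys' shoe-wear data quoted above
A = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
B = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]

def one_sample_t(values, hypothesized=0.0):
    """One-sample t statistic against a hypothesized mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - hypothesized) / math.sqrt(var / n)

# The paired t-test is just a one-sample t-test on the differences vs. zero
diffs = [b - a for a, b in zip(A, B)]
t = one_sample_t(diffs)
print(f"t = {t:.2f} on {len(diffs) - 1} degrees of freedom")
```

The result, t ≈ 3.35 with 9 degrees of freedom, corresponds to the paired p-value of roughly 0.0085 quoted elsewhere in this thread.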

February 21, 2009 at 4:32 am #181543

Bower Chiel,

Nice job. The example really brings home the point. The correlation between factors A and B is very good. The p-value for comparing the population means is not statistically significant, yet the p-value for the paired analysis shows a statistical difference. I used Excel for the example, and the numbers that I obtained are below.

Sue,

Here is the major question: are you trying to show that the means of two different populations are statistically different? Note: a linear regression with a good correlation coefficient does not mean that the intercept and slope are very near 0 and 1, respectively. With the intercept or slope differing from 0 and 1 you can still get a great correlation coefficient, yet the means of the two populations are statistically different.

Perhaps your customer was saying that the individual measurements are correlated. For example, equal numbers of samples were exposed to the same treatment (heated 45 minutes at 350 F) at the same time. In which case, the great example given by Bower Chiel reinforces the power of a paired comparison.

Analysis: p-value
ANOVA (equal variances): 0.716497954
ANOVA (unequal variances): 0.716501068
Paired: 0.008538781

February 21, 2009 at 3:27 pm #181549

Looks like Sue has gone AWOL. Without her input as to the original question, all this discussion is just a guess. We all know that looking for difference is not the same as looking for relationship.

February 21, 2009 at 9:57 pm #181556

Sue:

Plot your data on an X-Y chart using Excel. Right-click a data point, add a linear trend line, and print the equation on the graph.

The general equation for a straight line is y = mx + b. If the pairs of data differ only by a constant, then the intercept will not be equal to zero and the slope will be close to unity. The correlation will be quite high. This constant difference can be statistically tested using a paired t-test.

However, I'm sure you can see that pairs of data could be related in all kinds of other ways even though there is still a linear relationship. The slope may be different from unity and the intercept can be anything, even if the correlation is very high.

In short, the paired t-test can be used to verify if the pairs of data are related by ONLY a constant difference, while correlation and regression can confirm a more general relationship.

Try it on the following two sets of data:

Set 1 (significant paired t-test, significant correlation): (1, 2), (2, 4), (3, 5), (4, 5), (5, 7)

Set 2 (NOT significant paired t-test, significant correlation): (1, 7), (2, 5), (3, 5), (4, 4), (5, 2)

Cheers, Alastair
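Alastair's two data sets can be checked numerically. This is a minimal sketch in plain Python (the helper functions and the critical value are my additions; no stats library is assumed) that computes the paired t statistic and Pearson's r for each set, comparing |t| against 2.776, the two-sided 5% critical value for 4 degrees of freedom:

```python
import math

def paired_t(x, y):
    """Paired t statistic: a one-sample t on the differences vs. zero."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

T_CRIT = 2.776  # two-sided 5% critical value for t with 4 degrees of freedom

x = [1, 2, 3, 4, 5]
set1 = [2, 4, 5, 5, 7]
set2 = [7, 5, 5, 4, 2]

for name, y in [("Set 1", set1), ("Set 2", set2)]:
    t = paired_t(x, y)
    r = pearson_r(x, y)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: paired t = {t:.2f} ({verdict}), r = {r:.3f}")
```

Set 1's differences are nearly constant, so its paired t is well beyond the critical value; Set 2's are not, even though both sets show a strong linear relationship (Set 2's correlation is strong but negative).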

February 23, 2009 at 1:44 am #181576

Hi CT,

The data show that:

For the correlation method: the Pearson correlation between the two data sets = 0.998 with p-value = 0.000; it does show linearity, with R-Sq = 99.6%.

As for the paired t-test, the P-value = 0.064.

Both do show that the correlation is quite high.

Following is the data:

Set A: 0.16463, 0.16347, 0.16246, 0.16378, 0.16197, 0.16388, 0.916276, 0.16424, 0.16343, 0.16383

Set B: 0.16465, 0.16341, 0.16237, 0.16378, 0.16189, 0.16387, 0.16268, 0.16416, 0.16351, 0.16374
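Sue's figures can be reproduced with a short plain-Python sketch. The helper functions are mine (no stats library assumed), and the data as entered assume two apparent typos: the 7th Set A value is taken as 0.16276 rather than 0.916276, and "0,16341" in Set B is read as 0.16341.

```python
import math

# Sue's data, with the two apparent typos corrected (see lead-in)
set_a = [0.16463, 0.16347, 0.16246, 0.16378, 0.16197,
         0.16388, 0.16276, 0.16424, 0.16343, 0.16383]
set_b = [0.16465, 0.16341, 0.16237, 0.16378, 0.16189,
         0.16387, 0.16268, 0.16416, 0.16351, 0.16374]

def paired_t(x, y):
    """Paired t statistic: a one-sample t on the differences vs. zero."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

t = paired_t(set_a, set_b)
r = pearson_r(set_a, set_b)
# |t| < 2.262 (the two-sided 5% critical value for 9 df), so there is no
# significant difference, even though the correlation is very strong.
print(f"paired t = {t:.2f}, r = {r:.3f}")
```

This matches Sue's numbers: a very high correlation, but a paired-t P-value above 0.05, i.e. no significant difference between the two sets.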

February 23, 2009 at 8:51 pm #181618

Roger G Preece

I would agree with Darth's comments regarding the use of the paired t-test and the requirement for a unique set of data. The paired t-test is used when the same sample population provides both before and after test data – you are trying to determine if there is a statistically significant difference. Correlation, in contrast, is strictly used to determine the relationship between two variables. So, depending on your test design and the type of analysis results you are looking for, you need to determine the appropriate analytical tool.

February 23, 2009 at 10:42 pm #181619

Sue:

Data! I have looked at your data and have run the same analyses you have. I can conclude that this is data from a process with two measurements from each item or event. The interpretation depends on the circumstances.

If you are testing whether there is a significant difference between process A and process B, the two-sample paired t-test is the appropriate test (in Minitab: Stat > Basic Statistics > Paired t…). It shows there is NO significant difference between the two sets. Sorry.

If you are running a measurement system analysis and are comparing two sets of measurements, the high correlation shows that the intra-measurement variation is very small compared with the part-to-part variation. This shows the measurement system is adequate. (In Minitab: Stat > Regression > Fitted Line Plot…, with "display prediction interval" and "display confidence interval".) Even the difference between pair #9 is very small compared with the average variation in the measurements. The residual plots show the measurement error is not sensitive to the size of the measurement.

I suspect you did the paired t-test (correctly) and showed there is no significant difference, then a stakeholder decided that you should run a correlation analysis to TRY to find some kind of statistical test to verify his/her opinion. Even then, the correct way is to do a linear fit, correlation, and residual analysis all in one.

Hope this helps,

Cheers, Alastair

February 23, 2009 at 11:34 pm #181621

Bower Chiel

Hi Sue

I think that you intended the 7th value in Set A to be 0.16276, but that is nitpicking. What would be good to have is the story behind these data. Are you able to post that?

Best Wishes

Bower Chiel

February 28, 2009 at 3:19 am #181808

Rajeev seth

Hi

Actually it depends upon what your objective is; let me explain the purpose of these tools.

Correlation: Used to study the correlation between two variables: 1) strength (strong, weak or no correlation), 2) direction (+ve, -ve). But it is not a cause-and-effect relationship.

Regression: Is a cause-and-effect relationship, and you can use it to predict one variable on the basis of another.

Paired t-Test: Is used to see if two interlinked populations are significantly different or not. Examples: line performance before and after putting a new process in place; call-center efficiency before and after training.

Rajeev, Hong Kong

February 28, 2009 at 5:41 am #181810

Regression is definitely NOT a cause-and-effect tool…never has been, never will be.

Obiwan

February 28, 2009 at 5:43 am #181811

Oh…and those are the wrong examples of a paired t-test as well. The examples you are stating cover almost every possible situation.

A better example is a comparison of tennis shoe wear…whether a person's left foot or right foot wears the shoe out quicker…that is truly interlinked. A process before and after, unless you hold absolutely every other variable constant, cannot be said to be truly interlinked.

Obiwan

February 28, 2009 at 6:05 am #181812

Rajeev's definition of regression is, of course, wrong – regression will not help determine a 'cause and effect' relationship between two variables. A paired t-test is used to test the difference between the means of samples when the samples are DEPENDENT. The examples of employee performance 'before' and 'after' training, or weight loss 'before' and 'after' a weight-loss program, are quite relevant.

bbusa

February 28, 2009 at 12:33 pm #181814

Anonymous

Obiwan,

I'm interested in your recent comment. If regression isn't a cause-and-effect tool, what is? My interest is coincidental, as I'm currently reading Chapter X of Foundations of Physics by Lindsay and Margenau, entitled "The Problem of Causality".

Cheers,

Andy

February 28, 2009 at 1:19 pm #181815

Regression and correlation are measures of relationship. The classic example of shark attacks and ice cream sales shows strong correlation, and you can develop a regression formula, but it is not a cause-and-effect relationship. The cause is likely a lurking variable such as temperature. DOE is a useful tool for demonstrating cause and effect since it is controlled and real-time rather than after the effect. Right, Obi?
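The lurking-variable point can be illustrated with a toy simulation (entirely made-up numbers, purely for illustration): both series below are driven by a common lurking variable, temperature, so they correlate strongly with each other even though neither causes the other.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)
# Lurking variable: daily temperature
temperature = [random.uniform(10, 35) for _ in range(200)]
# Both outcomes depend on temperature plus independent noise
ice_cream_sales = [50 + 3 * t + random.gauss(0, 10) for t in temperature]
shark_attacks = [0.2 * t + random.gauss(0, 1) for t in temperature]

# Strong correlation, yet neither variable causes the other
print(f"r = {pearson_r(ice_cream_sales, shark_attacks):.2f}")
```

Any regression of shark attacks on ice cream sales here would fit reasonably well, yet the only causal driver in the simulation is temperature.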

February 28, 2009 at 2:15 pm #181816

Anonymous

Darth,

Regression can be based on happenstance data, or it can be based on setting levels; which do you mean? I also think it is important to define whether the system is closed or open. For example, would you not say there was a causal relationship between gas pressure and temperature?

Cheers,

Andy

February 28, 2009 at 2:47 pm #181817

Darth

As everyone who knows Star Wars knows…you were, at one point, my Padawan learner…and it appears that you learned well! Of course, in real life, you are of much greater and grander senior standing than I am…and I have learned much from you!

Either way…you are, as usual, dead on with your explanation of regression and correlation!

Obiwan

February 28, 2009 at 2:49 pm #181818

Granted, a DOE is nothing but a big honkin' regression with the outcome being a prediction equation. I believe that most of the questions and applications that are posted here refer to a case of looking retrospectively at some characteristics and trying to infer cause and effect. Likewise, most seem to refer to an open system, as you put it. That is why it might be safer to say regression demonstrates a relationship that may or may not be cause and effect. In your case, increased pressure will likely cause increased temperature. In my example, increased shark attacks can't be said to necessarily be an effect of increased ice cream sales. Increased temperature means more people at the beach, which may affect ice cream sales, and more people at the beach may mean more fish food. So, while ice cream sales and shark attacks may be highly correlated, there is not necessarily a causal effect. The cause is the lurking variable of temperature.

February 28, 2009 at 2:51 pm #181820

Andy

Regression is based on historical data, period. If you are manipulating the levels of the inputs and reacting to that information, then you should be performing a designed experiment, which would typically use regression and/or ANOVA for its analysis. That does NOT mean that regression by itself can establish causality. How do you know that the relationship is not from happenstance data?

On the closed-versus-open question…regression did not establish the causal relationship between gas pressure and temperature…active experimentation did that.

Obiwan

February 28, 2009 at 2:52 pm #181821

Thank you, Master. I still vividly recall our little "mine is bigger than yours" demo we did in front of the class at the beginning of our time together. The issue of whether to use coded or uncoded values still comes up now and then. I just refer them to what Minitab says in the Help section and don't argue anymore.

February 28, 2009 at 3:00 pm #181823

Former Padawan…and bringer of balance to the Force…I, as always, am humbled by your presence and cannot agree with you more strongly…the tempestuous days of our youth have led to wisdom in these later years…

Obiwan

March 1, 2009 at 8:07 am #181831

Anonymous

Obiwan,

Do you hold the view that causality is a property of the experimentation, which has been lost in regressing the independent variable? Isn't it true that Darth's example does not fix the number of shark attacks, but has to accept whatever happens on a particular day? (Anyone remember the territorial sharks on Mustang Island?)

In my view, the ability to set the value of an independent variable is what distinguishes cause and effect. In this poster's example, from memory, as I'm not prepared to read all the posts again, the point of discussion was to determine whether the within-samples are correlated or not. If the experimenter has taken the trouble of orienting the sample and uniquely tagging the measurement locations, then the independent variable has been fixed, and we (I, at least) would say any relationship between the two measurement locations is causal.

Cheers,

Andy

March 1, 2009 at 6:13 pm #181837

Anonymous

From http://www.mysanantonio.com/news/Then__Now_Dangerous_waters.html:

"In April 1987, a girl's arm was bitten off by a shark at Mustang Island State Park, the Associated Press reported. Three months later, a teenager and a 32-year-old woman were bitten in the legs in separate shark attacks about 30 minutes apart in 4-foot-deep water off the island, near Port Aransas."

March 2, 2009 at 12:04 am #181839

Andy

Causality is not a property of the experimentation…it is a physical characteristic that the two factors represent.

By definition, you can "set" the value of the independent variable. Just because I can "set" it does not mean that it truly is causal. Forget the shark example…I have numerous examples of totally unrelated factors that show correlation…due to a lurking variable that affects both. In one shop I worked at, we actually saw changing the settings on one machine correlate with a change in the output of an unrelated machine…obviously not causal.

Obiwan

March 2, 2009 at 7:49 am #181842

Anonymous

Obiwan,

I don't claim that two correlated variables have a causal relationship – that is old hat. What I've claimed is that regression can be causal. If not, as Darth anticipated, why conduct DOEs?

Andy

March 2, 2009 at 4:16 pm #181860

Andy

As with many threads on this bulletin board, this one has quickly become tiring. Regression simply does not show causality…end of discussion. If you persist in believing that, please do so at your own risk…you would be the only professional (?) in the Six Sigma world who does believe it.

Adios

Obiwan

March 2, 2009 at 7:13 pm #181868

Anonymous

For anyone else interested in this subject and not so easily bored, the following note might be of interest:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1334690

"A Causal Law and Simple Regression Models" by Chendrayan Chendroyaperumal

