# Correlation


Viewing 37 posts - 1 through 37 (of 37 total)
• #51884

sue
Member

Hi,
Can anyone tell me whether it is correct to use a paired t-test to assess correlation? My customer is telling me that I should use a paired t-test to analyze two sets of data when studying the correlation.

0
#181497

CT
Participant

It might be,
What’s your data and what shape is it?

0
#181502

Darth
Participant

A paired t test requires a unique set of data where the two sets are matched before and after. It is used to look for a difference. Correlation is used for determining relationship. Don’t know where your customer is coming from and what their rationalization is for that request. Basic rule is that customers are usually dumb and as such ask for dumb things. The key is what is the research question that is trying to be answered. But overall, hypothesis testing and correlation analysis are for different questions.

0
#181508

CT
Participant

Maybe worth tying down what they mean by correlation?
Some people use it interchangeably with the word relationship, which is where the use of hypothesis testing may have come in. That’s not its true meaning, but the two are often confused by people wanting to sound as if they know what they’re talking about.

0
#181509

Mikel
Member

Sue,
Go look up paired t-test and correlation. Look at what type of data you have and tell us what you really are trying to do.

0
#181510

Ron
Member

A paired t-test is only appropriate when the two groups are identified before and after an action, such as before training and after training.
The comparison has to be made with the identical unit before and after the activity it receives.
A two sample t is appropriate if this is not the case.

0
#181512

Stevo
Member

Use the hypothesis test, babble about P values and confidence, thank them for the directional tip and charge them accordingly.

In the meantime, use the correct tool and wait for them to come back asking for the right information, give it to them and charge them accordingly.

From the book of Stevo.

0
#181516

Sloan
Participant

Sue,
I guess the real question at hand here is (based on the other responses) “Do you give them exactly what they ask for, or do you give them what they really need?”
Find out what practical question they are trying to answer and then use the right statistical tool to answer it.

0
#181525

Bower Chiel
Participant

Hi Sue,

Box, Hunter and Hunter give the following data set.

| Boy | A | B |
|---|---|---|
| 1 | 13.2 | 14.0 |
| 2 | 8.2 | 8.8 |
| 3 | 10.9 | 11.2 |
| 4 | 14.3 | 14.2 |
| 5 | 10.7 | 11.8 |
| 6 | 6.6 | 6.4 |
| 7 | 9.5 | 9.8 |
| 8 | 10.8 | 11.3 |
| 9 | 8.8 | 9.3 |
| 10 | 13.3 | 13.6 |

Each boy wore one shoe soled with material A and one soled with material B. The decision whether the shoe made with A was the right one or the left was made at random. The data are measurements of wear. A paired sample t-test demonstrates “a highly statistically significant increase in the amount of wear associated with the cheaper material B”.

If you plot a scatter plot of B versus A you will in fact find a strong positive correlation which will simply reflect the obvious – less active boys with low wear on the A sole will have low wear on the B sole; more active boys with high wear on the A sole will also have high wear on the B sole. Does it add anything useful to the analysis? It looks as though other respondents are saying a very firm no! However I believe that when you run a paired t-test using the SPSS statistical package it gives you both the P-value for the paired t-test and also the P-value for testing the null hypothesis that the population correlation coefficient is zero. I imagine that the team that developed that software must have had a sound reason for doing so!

Best Wishes
Bower Chiel
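
As a rough check on the shoe data above, the following sketch computes the paired t statistic and the Pearson correlation with nothing but the Python standard library. The function names and the critical value 2.262 (two-sided 5% point of Student's t with 9 degrees of freedom) are additions for illustration, not part of the original post.

```python
import math
import statistics

# Wear measurements for the ten boys, copied from the data set above.
wear_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
wear_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]


def paired_t_statistic(first, second):
    """t statistic for H0: mean(second - first) == 0, with n - 1 df."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


t_stat = paired_t_statistic(wear_a, wear_b)
r = pearson_r(wear_a, wear_b)

T_CRIT_DF9 = 2.262  # two-sided 5% critical value, Student's t, 9 df

print(f"paired t  = {t_stat:.2f} (exceeds 5% critical value: {abs(t_stat) > T_CRIT_DF9})")
print(f"Pearson r = {r:.3f}")
```

Both results hold at once: the mean wear differs between the materials (t is roughly 3.3) and the two columns are strongly correlated (r is close to 0.99). The two numbers answer different questions about the same ten pairs.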

0
#181541

Darth
Participant

Bower, as you know the paired t test is really just a one sample t test that examines whether the difference between sample one and sample two is zero. You can get the same results by calculating another column of the difference and then doing a one sample t against a hypothesized value of zero. As you pointed out, correlation and the t test answer different questions. I still don’t know what the original poster is trying to do. A common conclusion for a set of data might be that there is a significant difference between the two groups yet they are highly correlated. Different yet correlated are two different conclusions. Key is what is the research question.
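
The "difference column" point can be sketched in a few lines. The example below reuses the shoe-wear data quoted earlier in the thread and computes the statistic two ways: as a one-sample t on the differences, and from the two columns' variances and covariance. The helper names are illustrative assumptions. The second route also hints at why correlation still matters to a paired test: a large positive covariance shrinks the standard error of the mean difference.

```python
import math
import statistics

# Shoe-wear data quoted earlier in the thread (Box, Hunter and Hunter).
a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]
n = len(a)


def one_sample_t(values, hypothesized_mean=0.0):
    """One-sample t statistic against a hypothesized mean."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - hypothesized_mean) / se


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


# Route 1: add a difference column and test it against zero.
diffs = [bi - ai for ai, bi in zip(a, b)]
t_from_diffs = one_sample_t(diffs, 0.0)

# Route 2: the same statistic written in terms of the two columns, using
# Var(B - A) = Var(A) + Var(B) - 2 Cov(A, B).
var_diff = statistics.variance(a) + statistics.variance(b) - 2 * sample_covariance(a, b)
t_from_columns = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(var_diff / n)

print(f"t from difference column: {t_from_diffs:.4f}")
print(f"t from column variances:  {t_from_columns:.4f}")
```

The two routes agree to rounding error, which is all the "paired t is a one-sample t" statement claims.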

0
#181543

Scott
Member

Bower Chiel,
Nice job. The example really brings home the point. The correlation between factors A & B is very good. The p-value for the difference in population means is not statistically significant when the data are treated as unpaired. Yet the p-value for the paired analysis shows a statistically significant difference. I used Excel for the example, and the numbers I obtained are below.
Sue,
Here is the major question: are you trying to show that the means of two different populations are statistically different? Note: a linear regression with a good correlation coefficient does not mean that the intercept and slope are very near 0 and 1 respectively. If either the intercept or the slope differs from 0 or 1, you can still get a great correlation coefficient while the means of the two populations are statistically different.
Perhaps your customer was saying that the individual measurements are correlated. For example, equal numbers of samples were exposed to the same treatment (heated 45 minutes at 350F) at the same time. In which case, the great example given by Bower Chiel reinforces the power of a paired comparison.

| Analysis | p-value |
|---|---|
| ANOVA, equal variances | 0.716497954 |
| ANOVA, unequal variances | 0.716501068 |
| Paired | 0.008538781 |
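
The contrast in the p-values above can be illustrated with test statistics alone. The sketch below assumes the figures were computed on the shoe-wear data from earlier in the thread, and compares a pooled two-sample t (for two groups this is equivalent to one-way ANOVA, with F = t²) against the paired t. The critical values 2.101 (18 df) and 2.262 (9 df) are standard two-sided 5% points, added here for reference.

```python
import math
import statistics

# Shoe-wear data from earlier in the thread (Box, Hunter and Hunter).
a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]
n = len(a)

mean_diff = statistics.mean(b) - statistics.mean(a)

# Unpaired (pooled, equal-variance) two-sample t: the 20 values are treated
# as two independent groups, so boy-to-boy variation stays in the noise term.
pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2  # equal group sizes
t_unpaired = mean_diff / math.sqrt(pooled_var * (2 / n))

# Paired t: works on within-boy differences, which removes boy-to-boy variation.
diffs = [bi - ai for ai, bi in zip(a, b)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

T_CRIT_DF18 = 2.101  # two-sided 5%, 18 df (unpaired)
T_CRIT_DF9 = 2.262   # two-sided 5%, 9 df (paired)

print(f"unpaired t = {t_unpaired:.2f}, significant: {abs(t_unpaired) > T_CRIT_DF18}")
print(f"paired t   = {t_paired:.2f}, significant: {abs(t_paired) > T_CRIT_DF9}")
```

The mean difference is identical in both tests; only the noise term changes. That is the sense in which a strong correlation between the paired columns gives the paired comparison its power.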

0
#181549

Darth
Participant

Looks like Sue has gone AWOL. Without her input as to the original question, all this discussion is just a guess. We all know that looking for difference is not the same as looking for relationship.

0
#181556

BTDT
Participant

Sue:
Plot your data on an X-Y chart using Excel. Right click a data point, add a linear trend line, and print the equation on the graph.
The general equation for a straight line is y=mx+b. If the pairs of data are only different by a constant, then the intercept will not equal zero and the slope will be close to unity. The correlation will be quite high. This constant difference can be statistically tested using a paired t-test.
However, I’m sure you can see that pairs of data could be related in all kinds of other ways even though there is still a linear relationship. The slope may be different from unity and the intercept can be anything even if the correlation is very high.
In short, the paired t-test can be used to verify if the pairs of data are related by ONLY a constant difference, while correlation and regression can confirm a more general relationship.
Try it on the following two sets of data.
Set 1 (significant paired t-test, significant correlation)
1, 2
2, 4
3, 5
4, 5
5, 7
Set 2 (NOT significant paired t-test, significant correlation)
1, 7
2, 5
3, 5
4, 4
5, 2
Cheers, Alastair
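
For anyone without Excel to hand, the two five-point sets can be run through a short script instead. This is a minimal sketch, reading each set as (x, y) pairs. The `summarize` helper and the critical values (2.776 for 4 df, 3.182 for 3 df, both two-sided 5%) are illustrative additions.

```python
import math
import statistics

# The two five-point sets suggested above, read as (x, y) pairs.
set_1 = [(1, 2), (2, 4), (3, 5), (4, 5), (5, 7)]
set_2 = [(1, 7), (2, 5), (3, 5), (4, 4), (5, 2)]

T_CRIT_DF4 = 2.776  # paired t-test, n - 1 = 4 df
T_CRIT_DF3 = 3.182  # test of zero correlation, n - 2 = 3 df


def summarize(pairs):
    """Least-squares line, Pearson r, and t statistics for a list of (x, y) pairs."""
    x = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    n = len(pairs)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    diffs = [yi - xi for xi, yi in pairs]
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "r": r,
        # t statistic for H0: population correlation is zero
        "t_corr": r * math.sqrt(n - 2) / math.sqrt(1 - r * r),
        # t statistic for H0: mean paired difference is zero
        "t_paired": statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)),
    }


for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    s = summarize(data)
    print(
        f"{name}: y = {s['slope']:.2f}x + {s['intercept']:.2f}, r = {s['r']:.3f}, "
        f"correlation significant: {abs(s['t_corr']) > T_CRIT_DF3}, "
        f"paired t = {s['t_paired']:.2f}, significant: {abs(s['t_paired']) > T_CRIT_DF4}"
    )
```

Both sets have the same mean difference (1.6) and the same strength of correlation. In Set 1 the slope is near 1 and the differences are nearly constant, so the paired t is large. In Set 2 the slope is negative, the differences vary widely, and the paired t is small.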

0
#181576

sue
Member

Hi CT,
the data shows that:
For the correlation method: Pearson correlation between the 2 data sets = 0.998 with p-value = 0.000, and it does show linearity, with R-Sq of 99.6%.
As for the paired t-test, p-value = 0.064.
Both sets of data show that the correlation is quite high.
The data are as follows:
Set A: 0.16463, 0.16347, 0.16246, 0.16378, 0.16197, 0.16388, 0.916276, 0.16424, 0.16343, 0.16383
Set B: 0.16465, 0,16341, 0.16237, 0.16378, 0.16189, 0.16387, 0.16268, 0.16416, 0.16351, 0.16374.

0
#181618

Roger G Preece
Member

I would agree with Darth’s comments regarding the use of the paired t-test and the requirement for a unique set of data. The paired t-test is used when the same sample population provides both before and after test data – you are trying to determine if there is a statistically significant difference. Whereas correlation is strictly used to determine the relationship between two variables. So…depending on your test design and the type of analysis results you are looking for, you need to determine the appropriate analytical tool.

0
#181619

BTDT
Participant

Sue:
Data!
I have looked at your data and have run the same analyses you have. I can conclude that this is data from a process with two measurements from each item or event.
The interpretation depends on the circumstances.
If you are testing if there is a significant difference between process A and process B, the two sample paired t-test is the appropriate test (In Minitab, Stat – Basic Stats – paired-t…). It shows there is NO significant difference between the two sets. Sorry.
If you are running a measurement system analysis and are comparing two sets of measurements, the high correlation shows that the intra-measurement variation is very small compared with the part-to-part variation. This shows the measurement system is adequate. (In Minitab, Stat – Regression – Fitted line plot… with display prediction interval and display confidence interval). Even the difference between pair #9 is very small compared with the average variation in the measurements. The residual plots show the measurement error is not sensitive to the size of the measurement.
I suspect you did the paired t-test (correctly) and showed there is no significant difference, then a stakeholder decided that you should run a correlation analysis to TRY to show some kind of statistical test to verify his/her opinion. Even then, the correct way is to do a linear fit, correlation, and residual analysis all in one.
Hope this helps,
Cheers, Alastair
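
Both readings can be reproduced roughly without Minitab. The sketch below rests on two loud assumptions about typos in the posted data: the 7th value of Set A is taken as 0.16276 rather than 0.916276, and the "0,16341" in Set B is read as 0.16341. Under those assumptions the statistics line up with the figures Sue reported. The variable names are illustrative.

```python
import math
import statistics

# Sue's readings as posted, with two ASSUMED typo corrections (flagged inline).
set_a = [0.16463, 0.16347, 0.16246, 0.16378, 0.16197,
         0.16388, 0.16276,  # ASSUMPTION: posted as 0.916276
         0.16424, 0.16343, 0.16383]
set_b = [0.16465, 0.16341,  # ASSUMPTION: posted as "0,16341"
         0.16237, 0.16378, 0.16189,
         0.16387, 0.16268, 0.16416, 0.16351, 0.16374]
n = len(set_a)

# Paired t statistic on the within-item differences (A minus B).
diffs = [ai - bi for ai, bi in zip(set_a, set_b)]
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Pearson correlation between the two sets of readings.
ma, mb = statistics.mean(set_a), statistics.mean(set_b)
sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(set_a, set_b))
saa = sum((ai - ma) ** 2 for ai in set_a)
sbb = sum((bi - mb) ** 2 for bi in set_b)
r = sab / math.sqrt(saa * sbb)

T_CRIT_DF9 = 2.262  # two-sided 5% critical value, Student's t, 9 df

print(f"paired t  = {t_stat:.2f}, significant at 5%: {abs(t_stat) > T_CRIT_DF9}")
print(f"Pearson r = {r:.4f}")
print(f"largest |difference| is pair #{max(range(n), key=lambda i: abs(diffs[i])) + 1}")
```

The paired t falls just short of the 5% critical value while r sits near 0.998, so "no significant difference" and "very high correlation" are both true of the same data.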

0
#181621

Bower Chiel
Participant

Hi Sue
I think that you intended the 7th value in Set A to be 0.16276 but that is nitpicking. What would be good to have is the story behind these data. Are you able to post that?
Best Wishes
Bower Chiel

0
#181808

Rajeev seth
Participant

Hi
Actually it depends upon what your objective is. Let me explain the purpose of these tools.
Correlation: Used to study the correlation between two variables: 1) Strength (strong, weak or no correlation) 2) Direction (+ve, -ve). But it is not a cause and effect relationship.
Regression: Is a cause and effect relationship & you can use it to predict a variable on the basis of another variable.
Paired t-Test: Is used to see if two interlinked populations are significantly different or not. Example: line performance before & after putting a new process in place, call center efficiency before & after training.
Rajeev, HongKong

0
#181810

Obiwan
Participant

Regression is definitely NOT a cause and effect tool…never has been, never will be.
Obiwan

0
#181811

Obiwan
Participant

Oh…and wrong examples of paired t-test as well.  The examples you are stating cover almost every possible situation.
A better example is a comparison of tennis shoe wear…and whether the left foot of a person or right foot of a person wears the shoe out quicker…that is truly interlinked.  A process before and after, unless you hold absolutely every other variable constant, cannot be said to truly be interlinked.
Obiwan

0
#181812

bbusa
Participant

Rajeev’s definition of regression is, of course, wrong – regression will not help determine a ‘cause and effect’ relationship between two variables. Paired t-test is used to test the difference between means of samples when the samples are DEPENDENT. The examples of employee performance ‘before’ and ‘after’ training or weight loss ‘before’ and ‘after’ the weight loss program are quite relevant.
bbusa

0
#181814

Anonymous
Guest

Obiwan,
I’m interested in your recent comment.
If regression isn’t a cause and effect tool, what is?
My interest is coincidental as I’m currently reading Chapter X, of Foundations of Physics by Lindsay and Margenau, entitled The Problem of Causality.
Cheers,
Andy

0
#181815

Darth
Participant

Regression and correlation are measures of relationship. The classic example of shark attacks and ice cream sales shows strong correlation and you can develop a regression formula but it is not a cause and effect relationship. The cause is likely a lurking variable such as temperature. DOE is a useful tool for demonstrating cause and effect since it is controlled and real time rather than after the effect. Right Obi?

0
#181816

Anonymous
Guest

Darth,
Regression can be based on happenstance data, or it can be based on setting levels; which do you mean?
I also think it is important to define whether the system is closed or open. For example, would you not say there was a causal relationship between gas pressure and temperature?
Cheers,
Andy

0
#181817

Obiwan
Participant

Darth
As everyone that knows Star Wars knows…you were, at one point, my Padawan learner…and it appears that you learned well!  Of course, in real life, you are of much greater and grander senior standing than I am…and I have learned from you much!
Either way…you are, as usual, dead on with your explanation of regression and correlation!
Obiwan

0
#181818

Darth
Participant

Granted, a DOE is nothing but a big honkin regression with the outcome being a prediction equation. I believe that most of the questions and applications that are posted here refer to a case of looking retrospectively at some characteristics and trying to infer c&e. Likewise, most seem to refer to an open system as you put it. That is why it might be safer to say regression demonstrates relationship and may or may not be c&e. In your case, increased pressure will likely cause increased temperature. In my example, increased shark attacks can’t be said to necessarily be an effect of increased ice cream sales. Increased temperature means more people at the beach and thus may affect ice cream sales and more people at the beach may mean more fish food. So, while ice cream sales and shark attacks may be highly correlated, there is not necessarily a causal effect. The cause is the lurking variable of temperature.
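
The lurking-variable argument can be illustrated with a small simulation. Everything in the sketch below is made up for illustration: the temperature range, the coefficients, and the noise levels are assumptions, and "shark attacks" is treated as a continuous index. By construction, temperature drives both series and neither series influences the other.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N_DAYS = 2000


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]


# Hypothetical data-generating model: temperature is the only driver.
temperature = [random.uniform(15, 35) for _ in range(N_DAYS)]
ice_cream_sales = [10 * t + random.gauss(0, 20) for t in temperature]
shark_attack_index = [0.5 * t + random.gauss(0, 1.5) for t in temperature]

# Raw correlation between the two outcomes.
raw_r = pearson_r(ice_cream_sales, shark_attack_index)

# Correlation after removing the temperature trend from each series
# (a partial correlation controlling for temperature).
partial_r = pearson_r(
    residuals(ice_cream_sales, temperature),
    residuals(shark_attack_index, temperature),
)

print(f"raw correlation:                 {raw_r:.2f}")
print(f"correlation given temperature:   {partial_r:.2f}")
```

The raw correlation is strong, yet it largely vanishes once temperature is accounted for, which is the pattern a lurking variable leaves behind.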

0
#181820

Obiwan
Participant

Andy
Regression is based on historical data, period.  If you are manipulating the levels of the inputs and reacting to that information, then you should be performing a designed experiment, which would happen to use regression and/or ANOVA for its analysis, typically.  That does NOT mean that regression, by itself can establish causality.  How do you know that relationship is not by “happenstance data?”
On the closed versus open question…regression did not establish the causal relationship between gas pressure and temperature…active experimentation did that.
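
One way to picture the distinction between happenstance data and a designed experiment is to run the same regression arithmetic on both. The simulation below is purely hypothetical: the effect sizes, the unobserved variable `z`, and the way `x` is generated are all assumptions chosen for illustration. It is a sketch of the general idea, not a claim about any poster's process.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

N = 5000
TRUE_EFFECT = 1.0     # hypothetical causal effect of x on y
LURKING_EFFECT = 2.0  # hypothetical effect of the unobserved variable z on y


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def response(x, z):
    """Simulated process output for input x and lurking variable z."""
    return TRUE_EFFECT * x + LURKING_EFFECT * z + random.gauss(0, 1)


z = [random.gauss(0, 1) for _ in range(N)]

# Happenstance data: x is whatever the process produced, and it drifts with z.
x_observed = [zi + random.gauss(0, 1) for zi in z]
y_observed = [response(xi, zi) for xi, zi in zip(x_observed, z)]

# Designed experiment: x is set at random, independently of z.
x_set = [random.uniform(-2, 2) for _ in range(N)]
y_experiment = [response(xi, zi) for xi, zi in zip(x_set, z)]

slope_observed = ols_slope(x_observed, y_observed)
slope_experiment = ols_slope(x_set, y_experiment)

print(f"true effect of x on y:          {TRUE_EFFECT:.2f}")
print(f"slope from happenstance data:   {slope_observed:.2f}")
print(f"slope from randomized settings: {slope_experiment:.2f}")
```

The fitting step is identical in both cases. What differs is how `x` came about: when it is set independently of everything else, the slope lands near the true effect, and when it merely tracks the lurking variable, the slope absorbs that variable's influence too.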
Obiwan

0
#181821

Darth
Participant

Thank you Master. I still vividly recall our little “mine is bigger than yours is” demo we did in front of the class during the beginning of our time together. The issue of whether to use coded or uncoded values still comes up now and then. I just refer them to what Mini says in the Help section and don’t argue anymore.

0
#181823

Obiwan
Participant

Former Padawan…and bringer of balance to the force…I, as always, am humbled by your presence and cannot agree with you any stronger…much of the tempestuous days of our youth have led to wisdom in these later years…
Obiwan

0
#181831

Anonymous
Guest

Obiwan,
Do you hold the view causality is a property of the experimentation, which has been lost in regressing the independent variable?
Isn’t it true that Darth’s example does not fix the number of shark attacks, but has to accept whatever happens on a particular day?
(Anyone remember the territorial sharks on Mustang Island?)
In my view, the ability to set the value of an independent variable is what distinguishes cause and effect. In this poster’s example, from memory, as I’m not prepared to read all the posts again, the point of discussion was to determine if the within samples are correlated or not.
If the experimenter has taken the trouble of orienting the sample and uniquely tagging the measurement locations, then the independent variable has been fixed and we (I at least) would say any relationship between the two measurement locations is causal.
Cheers,
Andy

0
#181837

Anonymous
Guest

From http://www.mysanantonio.com/news/Then__Now_Dangerous_waters.html
“In April 1987, a girl’s arm was bitten off by a shark at Mustang Island State Park, the Associated Press reported. Three months later, a teenager and a 32-year-old woman were bitten in the legs in separate shark attacks about 30 minutes apart in 4-foot-deep water off the island, near Port Aransas.”

0
#181839

Obiwan
Participant

Andy
Causality is not a property of the experimentation…it is a physical characteristic that the two factors represent.
By definition, you can “set” the value of the independent variable.  Just because I can “set” it does not mean that it truly is causal.  Forget the shark example…I have numerous examples of totally unrelated factors that show correlation…due to a lurking variable that affects both factors.  In one shop I worked at, we actually saw a correlation of changing the settings on one machine changing the output of an unrelated machine…obviously not causal.
Obiwan

0
#181842

Anonymous
Guest

Obiwan,
I don’t claim two correlated variables have a causal relationship – that is old hat.
What I’ve claimed is regression can be causal. If not, as Darth anticipated, why conduct DOEs?
Andy

0
#181860

Obiwan
Participant

Andy
As with many threads on this bulletin board, this has quickly become tiring.  Regression simply does not show causality…end of discussion.  If you persist in believing that, please do so at your own risk…and you would be the only professional (?) in the Six Sigma world that does believe it.
Obiwan

0
#181868

Anonymous
Guest

For anyone else interested in this subject and not so easily bored, the following note might be of interest:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1334690
A Causal Law and Simple Regression Models by Chendrayan Chendroyaperumal

0
#181940

PDF
Participant