# Choosing the Right Hypothesis Test


- This topic has 7 replies, 4 voices, and was last updated 2 weeks, 6 days ago by Fausto Galetto.

- February 18, 2020 at 1:16 am #246209

amir_h2opolo (Participant, @amir_h2opolo)

I’m trying to find the right hypothesis test for a set of data. I want to check whether the presence of a condition (recorded as Yes/No) affects the outcome (recorded as a percentage). If anyone can help me with that, it would be greatly appreciated.

February 18, 2020 at 3:40 am #246210

Andrew Parr (Participant, @Andy-Parr)

Have you looked at the details on this site, for example,

https://www.isixsigma.com/community/blogs/the-history-of-the-hypothesis-testing-flow-chart/

and other material that Michael @michaelcyger and Mike @mikecarnell have contributed?

February 18, 2020 at 8:38 am #246212

Robert Butler (Participant, @rbutler)

There are any number of ways you could do this. You could run a regression with percentage as the Y and yes/no as the X. You could use a two-sample t-test with the percentages corresponding to a “no” as one group and the percentages corresponding to a “yes” as the other group. If the distribution of the percentages is badly non-normal you might want to run the t-test and the Wilcoxon-Mann-Whitney test side by side to see if they agree (both either find or do not find a significant difference).
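A minimal sketch of running the two tests side by side, using only the Python standard library. The data values are hypothetical, the t-test shown is the unequal-variance (Welch) form, which is an assumption since the post does not say which form to use, and the Mann-Whitney p-value uses a plain normal approximation without tie or continuity correction. A statistics package would normally do all of this for you.

```python
import math
from statistics import NormalDist, mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def welch_t(a, b):
    """Welch two-sample t statistic, approximate df, and two-sided p-value."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)


def mann_whitney(a, b):
    """Mann-Whitney U for sample a and a normal-approximation two-sided p-value."""
    values = sorted(list(a) + list(b))
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2  # midrank of positions i+1 .. j
        i = j
    n1, n2 = len(a), len(b)
    u = sum(ranks[v] for v in a) - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - n1 * n2 / 2) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical percentages, not the poster's data.
no = [62.0, 58.5, 71.2, 66.4, 49.0, 68.1, 64.3, 60.7]
yes = [70.5, 66.2, 74.8, 69.9, 88.0, 72.3, 67.1, 85.5]

t, df, p_t = welch_t(yes, no)
u, p_u = mann_whitney(yes, no)
print(f"Welch t = {t:.3f}, df = {df:.1f}, p = {p_t:.4f}")
print(f"Mann-Whitney U = {u:.1f}, p = {p_u:.4f}")
print("tests agree at 0.05:", (p_t < 0.05) == (p_u < 0.05))
```

If the two p-values fall on the same side of the chosen cutoff, the conclusion does not hinge on the normality assumption.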

February 21, 2020 at 7:32 am #246271

amir_h2opolo (Participant, @amir_h2opolo)

Thanks @rbutler, that helped.

How can I run a regression when one column of the table is just “Yes” and “No”?

Could you please explain how I should interpret P = .127?

February 21, 2020 at 8:38 am #246274

Robert Butler (Participant, @rbutler)

The yes/no is the X variable and the percentage is the Y, so just code no = -1 and yes = 1 and regress the percentage against those two values. As for the p-value, the “traditional” cutoff for significance is p < .05, so by that criterion a p of .127 says you don’t have a significant correlation between the yes/no and the percentage. This would argue for the case that whatever is associated with yes/no is not having an impact on the percentages.
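The coding step can be sketched as follows, assuming ordinary least squares with a single predictor. The data and the helper name `fit_line` are hypothetical. With the -1/+1 coding the intercept is the midpoint of the two group means and the slope is half the difference between them, and the t statistic for the slope matches the pooled-variance two-sample t statistic.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, t for b1, residual df)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b0, b1, b1 / se_b1, n - 2


# Hypothetical data, not the poster's.
answers = ["No", "No", "No", "No", "No",
           "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"]
percent = [61.0, 58.5, 66.0, 63.5, 70.0,
           64.0, 69.5, 72.0, 67.5, 75.0, 62.0]

code = {"No": -1, "Yes": 1}  # the coding suggested in the post
x = [code[a] for a in answers]

b0, b1, t_slope, df = fit_line(x, percent)
mean_no = mean(p for a, p in zip(answers, percent) if a == "No")
mean_yes = mean(p for a, p in zip(answers, percent) if a == "Yes")

print(f"intercept = {b0:.3f} (midpoint of group means = {(mean_yes + mean_no) / 2:.3f})")
print(f"slope     = {b1:.3f} (half the difference     = {(mean_yes - mean_no) / 2:.3f})")
print(f"t for slope = {t_slope:.3f} on {df} df")
```

The p-value reported by a package for the slope is the two-sided tail probability of that t statistic on n - 2 degrees of freedom.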

Since correlation does not guarantee causation, and causation will not guarantee you will find correlation, what you need to do (you should do this in every instance anyway) is put your residuals through a wringer before concluding that nothing is happening. You would want to plot the residuals against the predicted values as well as against the yes/no response. If there are other things you know about the data (for instance, you know it was gathered over time and you have a time stamp for each piece of data) you will want to look at the data and the residuals against these variables as well.
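A rough sketch of preparing the values for those residual plots. The rows, including the time index, are hypothetical. It relies on the fact that with a two-level X the least-squares prediction for each point is simply its group mean.

```python
from statistics import mean

# Hypothetical rows in collection order: (time index, answer, percentage).
rows = [
    (1, "No", 61.0), (2, "Yes", 64.0), (3, "No", 58.5), (4, "Yes", 69.5),
    (5, "Yes", 72.0), (6, "No", 66.0), (7, "No", 63.5), (8, "Yes", 67.5),
    (9, "Yes", 75.0), (10, "No", 70.0),
]

# With only two X levels, the fitted value at each level is that level's mean.
group_mean = {
    level: mean(p for _, a, p in rows if a == level)
    for level in ("No", "Yes")
}

# One row per observation: (time, answer, predicted, residual).
table = [(t, a, group_mean[a], p - group_mean[a]) for t, a, p in rows]

print("time  answer  predicted  residual")
for t, a, pred, resid in table:
    print(f"{t:>4}  {a:<6}  {pred:>9.2f}  {resid:>8.2f}")
```

Plotting the `residual` column against `predicted`, against `answer`, and against `time` gives the three views described above; the plotting tool is left to the reader.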

Since you had a good reason for suspecting a relationship, a check of the residual patterns will help you find data behavior that might account for your lack of significance. If the data structure is adversely impacting the regression you may see things like clusters of data, a few extreme data points, or trends in yes/no choices that are non-random over time, any of which can adversely influence the correlation.

If you should find such patterns you would want to identify the data points to which they correspond, remove them, and re-run the regression. If the revised analysis results in statistical significance then you will need to go back to the process and try to identify the source of the influential data points. If it turns out there is something physically wrong with those data points you could justify eliminating them from the analysis and reporting your findings, but you will also want to make sure that you clearly discuss this decision in your report.

March 8, 2020 at 7:42 am #246569

amir_h2opolo (Participant, @amir_h2opolo)

As you suggested, I plotted yes/no vs. percentage and came up with the graph that I attached here. But I can’t draw any conclusion about the regression from this plot. Is this the plot you meant?

This is the explanation from Minitab: alpha, which is the risk of concluding that the mean of Yes differs from the mean of No when in fact it doesn’t, is .05. If the p-value is less than .05, you can conclude that the mean of Yes differs from the mean of No at the .05 level of significance.

I kind of don’t understand why. We calculate the t-value and then transfer that into a p-value. When the p-value gets bigger, the t-value gets smaller, and for a smaller t-value the probability that we say there is no difference while there is a difference gets higher. What am I missing here?

March 8, 2020 at 12:10 pm #246570

Robert Butler (Participant, @rbutler)

It’s not a matter of calculating the t value and then transferring it into a p-value. What you are doing is computing a t statistic and then checking it to see whether the value you get meets the test for significance. Small t-values breed large p-values, and conversely. :-)
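To make the small-t, large-p relationship concrete, here is a small sketch that tabulates the two-sided p-value for several t statistics at a fixed number of degrees of freedom. The choice of 18 degrees of freedom and the t values are assumptions for illustration only; the p-value is obtained by numerically integrating the Student's t density.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: probability of |T| >= |t| when there is no true difference."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # Simpson's rule on [0, |t|]; steps must be even
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


df = 18                           # assumed degrees of freedom
ts = [0.5, 1.0, 1.6, 2.101, 3.0]  # assumed t statistics
p_values = [t_two_sided_p(t, df) for t in ts]

for t, p in zip(ts, p_values):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"t = {t:5.3f}  p = {p:.4f}  {verdict} at 0.05")
```

The p-value is the chance of seeing a t statistic at least this extreme if the two means were truly equal, so a small t is simply weak evidence against "no difference" and yields a large p.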

You are doing what I recommended and your plot is telling you what you computed: there is a slight offset between the two groups of data, which means that the averages are numerically different, but there really is no statistically significant difference. If you want an assessment of the means then you could turn the analysis around: treat the yes/no as categorical and run a one-way ANOVA on the results. You’ll get the mean values for the two choices and you will also get no significant difference.
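A brief sketch of why the one-way ANOVA gives the same verdict: with only two groups, the ANOVA F statistic equals the square of the pooled-variance two-sample t statistic. The data and function name are hypothetical.

```python
import math
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of label -> list of values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for vals in groups.values():
        m = mean(vals)
        ss_between += len(vals) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in vals)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical percentages, not the poster's data.
groups = {
    "No": [61.0, 58.5, 66.0, 63.5, 70.0, 64.5],
    "Yes": [64.0, 69.5, 72.0, 67.5, 75.0, 62.0],
}

f_stat, df_b, df_w = one_way_anova(groups)

# Pooled-variance t statistic computed directly, for comparison.
no, yes = groups["No"], groups["Yes"]
ss = sum((v - mean(no)) ** 2 for v in no) + sum((v - mean(yes)) ** 2 for v in yes)
sp2 = ss / (len(no) + len(yes) - 2)
t_pooled = (mean(yes) - mean(no)) / math.sqrt(sp2 * (1 / len(no) + 1 / len(yes)))

for label, vals in groups.items():
    print(f"mean({label}) = {mean(vals):.2f}")
print(f"F({df_b}, {df_w}) = {f_stat:.4f}, t^2 = {t_pooled ** 2:.4f}")
```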

All of the above is based on commenting on the overall plot you have provided. There are a couple of interesting things about your plot you should investigate. It is a small sample size and it looks like your lack of significance is driven by 4 points – the two points at -1 that are much lower than the main group and the two points at 1 that are higher than the main group.

As a check, remove those 4 points and run the analysis again. My guess is you will either have statistical significance (P < .05) or be very close. If you do get significance, and if it is a situation where you were expecting significance, then you will want to go back to the data to see if there is anything out of the ordinary with respect to those 4 points. If there is something that **physically** differentiates those points from their respective populations and if their deletion results in significance then I would recommend the following:

1. Report out the findings with all of the data.

2. Include the plot as part of the report.

3. Emphasize the small size of the sample.

4. Report the results with the deletion of the 4 points.

5. Comment on what you have found to be different about the 4 points.

6. Recommend additional samples be taken with an eye towards controlling for whatever it was that you found to be different about the 4 points and re-run the analysis with an increased sample size to see what you see.
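The with-and-without comparison behind steps 1 and 4 can be sketched like this. The data are hypothetical and were constructed so that two low "No" points and two high "Yes" points mask a difference in the main groups; the flagged row indices stand in for points picked out by eye from the plot. The sketch reports the pooled-variance t statistic and its degrees of freedom for each run, leaving the p-value lookup to a statistics package.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic (a minus b) and its degrees of freedom."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    df = len(a) + len(b) - 2
    se = math.sqrt(ss / df * (1 / len(a) + 1 / len(b)))
    return (ma - mb) / se, df


# Hypothetical data: (answer, percentage).
rows = [
    ("No", 55.0), ("No", 58.0), ("No", 76.0), ("No", 77.0),
    ("No", 78.0), ("No", 79.0), ("No", 80.0), ("No", 78.5),
    ("Yes", 70.0), ("Yes", 71.0), ("Yes", 72.0), ("Yes", 73.0),
    ("Yes", 74.0), ("Yes", 72.5), ("Yes", 90.0), ("Yes", 93.0),
]
flagged = {0, 1, 14, 15}  # hypothetical indices of the 4 suspect points


def split(rows, skip=frozenset()):
    """Return (yes values, no values), leaving out any row index in skip."""
    kept = [(a, p) for i, (a, p) in enumerate(rows) if i not in skip]
    yes = [p for a, p in kept if a == "Yes"]
    no = [p for a, p in kept if a == "No"]
    return yes, no


t_all, df_all = pooled_t(*split(rows))
t_trim, df_trim = pooled_t(*split(rows, flagged))

print(f"all data:          t = {t_all:.2f} on {df_all} df")
print(f"4 points removed:  t = {t_trim:.2f} on {df_trim} df")
```

Both results belong in the report, along with the reason the flagged points were judged to be different.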

- This reply was modified 3 weeks ago by Robert Butler. Reason: typo

March 9, 2020 at 4:49 am #246580

Fausto Galetto (Participant, @fausto.galetto)
