SUNDAY, MAY 19, 2013

Data Transformation


This topic has 5 voices, contains 6 replies, and was last updated by Mike Carnell 316 days ago.

Viewing 9 posts - 1 through 9 (of 9 total)
Author Posts
June 26, 2012 at 7:15 am #183424
Manbir Singh
Reputation - 76
Rank - Aluminum

It is advised that if data is not normal, we should transform it to make it normal. A commonly used transformation is Box-Cox. What is the need to transform data when we have nonparametric tests available?
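
For reference, the Box-Cox transformation mentioned above is a power transformation indexed by a parameter lambda, with the natural log as the lambda = 0 case. The following is a minimal sketch of the formula only; the function name `box_cox` and the sample values are hypothetical, and it does not estimate the best lambda the way statistical software does.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox power transformation to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # The lambda = 0 case is defined as the natural log (the limit of the power form).
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed cycle times (illustrative only).
cycle_times = [1.0, 2.0, 4.0, 8.0, 16.0]
log_scale = box_cox(cycle_times, 0)      # log transform
sqrt_scale = box_cox(cycle_times, 0.5)   # square-root-type transform
print(log_scale)
print(sqrt_scale)
```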

June 26, 2012 at 9:01 am #183431
Joel Smith
Reputation - 974
Rank - Copper

Manbir-

Whether or not you should use a transformation depends greatly on what tool you are using. For example, it is not necessary on a control chart unless your data are highly skewed or you are getting nonsensical control limits and even then a different chart may solve the issue without transformation.

With respect to nonparametric tests, there are two things to consider. The first is that although normality may be an assumption of a test in the mathematical sense, research shows that many of those tests (1- and 2-Sample T, One-way ANOVA, etc.) are very robust and that normality is not required. Second, nonparametric tests are almost across-the-board much less powerful tests than their parametric counterparts, so you give up a lot when switching to one.

Capability Analysis is one tool that is highly sensitive to the distribution assumed. If your data is non-normal, then Capability is available for many other distributions. If your data doesn’t seem to fit a distribution and is bi-modal or has some other very unusual shape, your process may not be in control or your data may represent more than one process.
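
To illustrate how sensitive capability estimates are to the assumed distribution, here is a rough sketch that compares the out-of-spec fraction predicted by a fitted normal model with the fraction actually observed. The data are synthetic (evenly spaced lognormal quantiles) and the upper spec limit `usl` is a made-up value; neither comes from a real process.

```python
import math
from statistics import NormalDist, mean, stdev

# Synthetic, deterministic right-skewed data: evenly spaced lognormal quantiles.
n = 1000
std_normal = NormalDist()
data = [math.exp(std_normal.inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]

usl = 8.0  # hypothetical upper spec limit

# Fraction out of spec predicted by fitting a normal distribution to the data.
fitted = NormalDist(mean(data), stdev(data))
normal_estimate = 1 - fitted.cdf(usl)

# Fraction actually observed beyond the limit.
observed = sum(x > usl for x in data) / n

print(f"normal model: {normal_estimate * 1e6:.0f} ppm, observed: {observed * 1e6:.0f} ppm")
```

With skewed data like this, the normal model understates the upper tail, so the capability it reports looks better than what the data show.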

Good luck,

Joel

June 26, 2012 at 9:03 am #183432
Robert Butler
Reputation - 2138
Rank - Silver

If that is what has been advised then it’s not very good advice. The first thing you need to do is plot the data – histogram, normal probability plot, time plot, whatever makes sense, and look at what you have. Once you know what it looks like you then need to have some understanding of what it is that you want to do with the data since, in many instances, data normality isn’t an issue.

As for non-parametric tests – they are not a cure-all. They too have issues and you need to know what they are before you decide to use them.
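
As one way to take the "plot the data" step, the points for a normal probability plot can be computed with nothing but the standard library. This is only a sketch: the helper name `normal_probability_points` and the measurements are hypothetical, and the `(i - 0.5) / n` plotting position is one common convention among several.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """Return (theoretical z-score, observed value) pairs for a normal probability plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is one common convention; others exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical measurements (illustrative only).
measurements = [9.8, 10.1, 10.0, 10.4, 9.9, 10.2, 13.5, 10.0, 9.7, 10.3]
for z, x in normal_probability_points(measurements):
    print(f"{z:6.2f}  {x:5.1f}")
```

If the pairs fall close to a straight line when plotted, the data are roughly normal; a point far off the line, like the largest value here, is worth investigating before deciding on any test or transformation.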

June 26, 2012 at 10:05 pm #183454
Manbir Singh
Reputation - 76
Rank - Aluminum

Thanks Robert & Joel. This helps a lot.
So can I say that even if the data is not normal, I can go ahead with ANOVA and 1- and 2-Sample T tests as long as the data is not too skewed and only a few data points are outside control limits? However, I need to figure out what distribution type it is to capture accurate process capability?

June 26, 2012 at 10:19 pm #183456
Chris Seider
Reputation - 3002
Rank - Titanium

@talk2manbir

If you are talking 5, 10, 15, 20+% defective, don’t even bother coming up with the right distribution to get the most appropriate process capability. As @rbutler stated well, go look for your causes and begin to eliminate the reasons. Customers don’t care what distribution the data follows, they care about the % out of specification.

If you do a basic process capability (normal) using Minitab, just use the % or ppm observed in the lower left box. Make this your primary metric, and do what rbutler advised.
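
The observed % or ppm out of specification can also be counted directly from the raw data, with no distribution assumed. A minimal sketch, where the function name `observed_ppm`, the measurements, and the spec limits are all hypothetical:

```python
def observed_ppm(values, lsl, usl):
    """Parts per million observed outside [lsl, usl], with no distribution assumed."""
    out_of_spec = sum(1 for v in values if v < lsl or v > usl)
    return 1_000_000 * out_of_spec / len(values)

# Hypothetical measurements and spec limits (illustrative only).
measurements = [9.8, 10.1, 10.0, 10.4, 9.9, 10.2, 13.5, 10.0, 9.7, 10.3]
print(observed_ppm(measurements, lsl=9.5, usl=10.5))
```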

Some advocate transformations, but I advise my belts to begin to solve the problems and then rationalization can occur as to which distribution might make sense.

  • This reply was modified 326 days ago by Katie Barry.
July 5, 2012 at 2:26 pm #183720
Darth
Reputation - 1285
Rank - Silver

@joelatminitab Hey Joel, hope all is well. Circling back to the last comment the poster made, he concluded that he could go ahead and do ANOVA if the normality wasn’t too bad. Let’s not forget about the notion of equal variances. That might be a bigger issue than normality, eh?

July 6, 2012 at 5:28 am #183733
Joel Smith
Reputation - 974
Rank - Copper

@Darth – That’s right. Alternatively, the Welch Test is not sensitive to differences in variation. The Assistant Menu in Minitab uses the Welch Test for One-Way ANOVA while the Stat Menu uses the traditional F-Test.

If the equal variances assumption is not met, I’d recommend Welch first before doing a nonparametric.
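
For anyone curious what the Welch approach does differently, here is a sketch of the textbook Welch one-way ANOVA statistic using only the standard library. It is not a reproduction of Minitab's output; the function name `welch_anova` and the sample groups are hypothetical, each group needs at least two observations with nonzero variance, and a p-value would still require the upper tail of an F distribution with the returned degrees of freedom.

```python
from statistics import mean, variance

def welch_anova(*groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA (equal variances not assumed)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisier groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_weight) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2

# Hypothetical groups with visibly different spreads (illustrative only).
line_a = [10.1, 9.9, 10.0, 10.2, 9.8]
line_b = [10.5, 9.0, 11.8, 8.7, 12.1, 10.4]
line_c = [10.9, 11.1, 11.0, 10.8, 11.2]
print(welch_anova(line_a, line_b, line_c))
```

With exactly two groups, the F value equals the square of Welch's two-sample t statistic, which is a handy sanity check.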

July 6, 2012 at 8:39 am #183735
Darth
Reputation - 1285
Rank - Silver

@joelatminitab Just making sure you are still on top of things :-).

July 7, 2012 at 10:31 am #183752
Mike Carnell
Reputation - 3168
Rank - Titanium

@Darth @joelatminitab I am sure Joel feels much better knowing that you are checking up on him. It looks like you have picked up a Canadian accent?
