The Dangers of Johnson (and Other) Transformations
This topic contains 12 replies, has 8 voices, and was last updated by Keller 2 years, 8 months ago.
Hi,
I have a quick question for those of you with a more in-depth statistical understanding than I have.
When I first went through my six sigma training before I got my green belt, I was taught to NEVER use a Johnson or other transformation in order to analyze non-normal data. The instruction was to instead try to find out why my data was non-normal and see what I could do to fix it.
I have lived by this mantra now for several years, but I was challenged on it the other day in a project in which there was no desire to take extra time and samples to figure out their deliverables. It was much easier for them to just do a Johnson transformation and be done with it.
This is when I realized that I don’t understand the dangers well enough to make an educated decision about when or why it is safe to use a Johnson transformation.
Does anyone have a good understanding of what – exactly – the dangers are when it comes to these transformations, and why it is a good idea to avoid them?
Thanks,
Matt
@mkarlsson The advice to fix all your non-normal data was just plain stupid advice. There are a lot of things in this world that are not normally distributed. Time is frequently not normally distributed particularly when it is bounded by zero on one end (actually it is always bounded by zero – always on the same end even – but the data doesn’t always bump up against it).
It is always good to run through several iterations of graphs just to look at the data in different ways to try to understand it. I wouldn’t get wrapped around the axle just because it was not normal. I would call your instructor and let him know what crap advice he handed out (I wasn’t your instructor, was I? If I was, don’t call).
The other thing I don’t understand is why you want to transform things. There are tools to analyze non-normal data. If you have Minitab there is Capability for non-normal data, nonparametric tests, Levene’s test. The help menu is good in terms of telling you the assumptions. Stop screwing around making the analysis a science project. Taking the result of the analysis and doing something with it is where the fun (and your future) is. Sitting at your computer transforming stuff that doesn’t need to be transformed makes you look like a nerd.
Just my opinion.
It’s an interesting mantra, it’s boilerplate, and it is simultaneously right and wrong. There’s nothing dangerous about transforms. The only danger, and this is what your instructor was trying to make you understand, is the blind application of a transform (or any other statistical tool for that matter) without looking at the data (that is, really looking at it). In short, he/she wanted to make sure that you do first things first.
First thing – plot the data and look at it and spend some time understanding the process from whence the data came.
a. Plots should include a histogram, a boxplot, and a normal probability plot.
b. You should know how various distributions look when they are plotted in the above manner.
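For anyone without a stats package handy, here is a minimal sketch of what sits behind a normal probability plot, using only the Python standard library. The cycle-time numbers are made up for illustration, and the (i - 0.5)/n plotting position is one common convention among several. Nothing here replaces actually looking at the plot.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Return (theoretical z-score, observed value) pairs for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, s.d. 1
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # which inv_cdf requires.
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means a nearly straight line."""
    n = len(points)
    mz = sum(z for z, _ in points) / n
    mx = sum(x for _, x in points) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z, _ in points)
    sxx = sum((x - mx) ** 2 for _, x in points)
    return szx / (szz * sxx) ** 0.5

# Hypothetical cycle times (hours): bounded at zero with a long right tail.
cycle_times = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.9, 6.5, 12.0]

points = normal_probability_points(cycle_times)
r_raw = straightness(points)  # well below 1 for data this skewed
```

Plot the pairs in `points` as a scatter. Normal data falls close to a straight line, while a long right tail like this one bends sharply upward at the high end.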
Second thing – if the data is non-normal how is it non-normal? Is it multimodal? Is it truncated? Does it appear to have a natural upper or lower bound? Do the tails look “too heavy”? etc.
Third thing – given the items listed above you should know what to do when confronted with any of them.
For example:
Multimodal – probably means multiple feeds of some kind therefore you are making multiple products – better find out the story between the modes before doing anything else.
Truncated – who’s cutting off the tails of the product distribution and why?
Apparent natural upper/lower bound – why – many processes have natural bounds and if you are operating too close to those bounds your distribution will always be non-normal – see Bothe – Measuring Process Capability Chapter 8 “Measuring Capability for Non-Normal Data” for lots of examples.
Tails too heavy – why – does it matter?
etc.
After satisfying yourself that the data is representative of the process when everything is under control then, and only then, should you give some thought to the need for data transformation.
If you do transform you will need to know what the transform is doing for you and if it matters. For example, if your data is truncated (typically happens when a supplier is cherry picking material lots) all the transforms in the world won’t make that data normal – it is a truncated whatever and it will remain so.
If you do transform – run whatever analysis you are running with both transformed and untransformed data to see if anything changes with respect to outcomes or actions that might be taken as a result of the analysis. If nothing changes then you might want to ask why you would want to bother with a transform in the first place.
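A rough sketch of that side-by-side check, assuming a natural log stands in for whatever transform the team actually uses. The data and the upper limit are hypothetical. The analysis here is simply "what fraction do we expect beyond the limit if we fit a normal curve", run once on each scale.

```python
import math
from statistics import NormalDist, mean, stdev

def fraction_above(data, limit):
    """Estimated fraction beyond `limit`, assuming `data` is normally distributed."""
    fitted = NormalDist(mean(data), stdev(data))
    return 1.0 - fitted.cdf(limit)

# Hypothetical cycle times (hours) and a hypothetical upper limit.
cycle_times = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.9, 6.5, 12.0]
upper_limit = 15.0

# Untransformed: fit a normal curve straight to the raw data.
raw_estimate = fraction_above(cycle_times, upper_limit)

# Transformed: the data AND the limit both go through the same function.
log_estimate = fraction_above([math.log(x) for x in cycle_times], math.log(upper_limit))
```

With these made-up numbers the two estimates differ substantially, so the choice of scale would change the conclusion. When they agree, that is the case where you can fairly ask why you are bothering with the transform.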
One final thought. If your crew is running a Johnson transform – which one are they running?
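For reference, the question presumably points at the three families in Johnson's system, usually labeled S_L, S_B and S_U. A minimal sketch of the three functions follows. The parameter names (gamma, delta, xi, lam) follow the usual textbook notation, and any values you plug in are your own; fitting them to data is a separate step that is not shown here.

```python
import math

def johnson_sl(x, gamma, delta, xi, lam):
    """S_L (lognormal family): only defined for x > xi."""
    return gamma + delta * math.log((x - xi) / lam)

def johnson_sb(x, gamma, delta, xi, lam):
    """S_B (bounded family): only defined for xi < x < xi + lam."""
    return gamma + delta * math.log((x - xi) / (xi + lam - x))

def johnson_su(x, gamma, delta, xi, lam):
    """S_U (unbounded family): defined for every real x."""
    return gamma + delta * math.asinh((x - xi) / lam)
```

The family matters because each one implies something different about the process: S_B says the data has both a lower and an upper bound, S_L says it has a bound on one side only, and S_U says it has none. If the family the software picked contradicts what you know about the process, that is worth a second look.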
@mkarlsson – Both Mike and Robert have given you excellent advice (as would be expected). I’m more concerned that you have a team “in which there was no desire to take extra time and samples to figure out their deliverables. It was much easier for them to just do a Johnson transformation and be done with it.” Does that mean that they have done what Robert describes above? If so, then plow on, if not, how are you going to determine what to do? Seek first to understand.
There is one very real risk in transforming data, and that is that you also need to transform the spec limits for a capability study. This often confuses people and they try to “untransform” them (even putting in reference lines in Minitab). This can cause more problems than originally existed.
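A small sketch of why the scales must not be mixed, again assuming a natural log as the transform and using made-up measurements and a made-up spec limit. Because the transform is increasing, a part is in spec on the original scale exactly when it is in spec on the transformed scale, provided the limit went through the same function.

```python
import math
from statistics import mean

measurements = [0.4, 0.9, 1.3, 2.0, 3.9, 12.0]  # hypothetical, original units
usl = 10.0                                       # hypothetical upper spec limit

logged = [math.log(x) for x in measurements]
usl_logged = math.log(usl)  # the limit goes through the same function as the data

in_spec_original = sum(x <= usl for x in measurements)  # 5 of 6
in_spec_logged = sum(y <= usl_logged for y in logged)   # 5 of 6, same parts

# The mistake: transformed data compared against the untransformed limit.
in_spec_mixed = sum(y <= usl for y in logged)           # 6 of 6, wrong

# "Untransforming" a summary statistic does not give the original statistic:
# exp(mean of logs) is the geometric mean, not the arithmetic mean.
back_transformed_mean = math.exp(mean(logged))
arithmetic_mean = mean(measurements)
```

The last two lines are where people usually get confused: a mean or s.d. computed on the transformed scale does not map back to the mean or s.d. in original units, even though individual values and limits do.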
There is nothing inherently right or wrong with transformation, you just need to use it appropriately.
In summary:
1. Try to understand the data and see if there are some obvious reasons why it might not be normal. No rule says all data and all processes are normally distributed.
2. Try using non parametrics.
3. If the first two don’t really give you what you need, then cautiously transform the data keeping in mind that the units of measure will be totally different than the original data and can cause some confusion if looked at by someone unfamiliar with transformed data.
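On item 2, one of the simplest nonparametric tools is the sign test for a median, which makes no assumption about the shape of the distribution. A minimal sketch with the standard library only; the data and the hypothesized median of 2.0 are made up, and a stats package will offer more powerful alternatives.

```python
from math import comb

def sign_test_p_value(data, hypothesized_median):
    """Exact two-sided sign test: is the median different from the hypothesized value?"""
    above = sum(x > hypothesized_median for x in data)
    below = sum(x < hypothesized_median for x in data)
    n = above + below          # values equal to the hypothesized median are dropped
    k = min(above, below)
    # Probability of a split at least this lopsided if above/below were a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical cycle times (hours).
cycle_times = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.9, 6.5, 12.0]
p_value = sign_test_p_value(cycle_times, 2.0)
```

The test only counts how many points fall on each side of the target, so the long tail has no leverage on the result and the answer stays in the original units.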
Yeayahhhhh!
The voice of reason prevails!
Kudos to all.
@spazwhatsup – Got a problem, dude?
I followed your link above to your website and article. In it you say,
“Any distribution can be characterized by four parameters, whose calculations are the same for any distribution:”
You then provide a generic description of the mean, s.d., skewness and kurtosis. I would beg to differ: the calculations for the s.d. of a binomial and a Poisson are quite a bit different from those for the Normal. While all distributions have descriptors of central tendency, variation and shape, I feel your sentence is misleading and possibly incorrect.
Note to all: The link that @pkeller@qualityamerica.com provided (and @Darth is referring to) has since been removed. We ask that forum participants do not promote products, services or businesses on the discussion forum, including articles contained on their professional websites. [Forum Etiquette guidelines – http://www.isixsigma.com/topic/forum-etiquette/]
Katie: Sorry about that. I was simply trying to provide more information than can easily be typed into this forum.
Very good point Paul.
@pkeller@qualityamerica.com Whoops, now that Katie has removed the links, I can’t really follow up other than to question what you mean by a general formula for calculating s.d. We are all familiar with the formula for continuous data, but the calculations for discrete data are all very different, e.g. the s.d. for a Poisson is the sq. root of lambda while the s.d. for a binomial is the sq. root of n times p times (1-p), etc. I do agree from a more macro perspective that worrying about an uncontrolled process makes little sense, but then again, Shewhart designed the control chart to be very robust to distribution shape, as is well documented by Wheeler in his writings.
As I said, the formula for calculating the standard deviation of a set of empirical data is different than the formulas for calculating the standard deviation of a presumed distribution.
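To make the distinction in this exchange concrete, here is a short sketch. The sample formula is the same whatever the data looks like, while the model formulas depend on which distribution you presume. The defect counts are made up for illustration.

```python
import math
from statistics import mean, stdev

def poisson_sd(lam):
    """S.d. implied by a presumed Poisson model with rate lam."""
    return math.sqrt(lam)

def binomial_sd(n, p):
    """S.d. implied by a presumed binomial model with n trials and probability p."""
    return math.sqrt(n * p * (1 - p))

# Hypothetical defect counts per unit.
defect_counts = [2, 4, 3, 5, 1, 3, 4, 2, 3, 3]

# Empirical: the usual sample formula, applied to the data as it is.
empirical_sd = stdev(defect_counts)

# Model-based: presume a Poisson, estimate lambda by the sample mean.
model_sd = poisson_sd(mean(defect_counts))
```

The two numbers need not agree. A large gap between them is itself a hint that the presumed distribution may not describe the data well.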
You might investigate some of the peer-reviewed journals on quality engineering, including the Journal of Quality Technology (published by ASQ), Quality Engineering (published by ASQ) and Technometrics (jointly published by ASQ and ASA). They’ve been publishing peer-reviewed articles for many years, discussing the statistical basis of the control charts, their detection levels and false alarm rates. It is not magic that the control chart is robust to distribution. It is the very nature of its basis in statistical theory. Yet each control chart, like all statistical tests, has limitations. It should be obvious that a control limit defined well below zero, when the process metric cannot physically go below zero, is modeling the process poorly. The danger of this poor model is failing to detect real process changes, or reacting to perceived process changes that do not exist. Shewhart, Deming, and many others have discussed these issues as central to the need for control charts, so it is rather ironic when a chart is incorrectly used to result in that same dreaded outcome.
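A quick sketch of the below-zero limit problem, assuming an individuals (X) chart with limits set from the average moving range. That chart type is my assumption for illustration, and the wait times are made up. The constant 2.66 is the usual 3/d2 factor with d2 = 1.128 for moving ranges of two points.

```python
from statistics import mean

def individuals_limits(data):
    """Return (LCL, center line, UCL) for an individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    center = mean(data)
    half_width = 2.66 * mean(moving_ranges)  # 2.66 = 3 / 1.128
    return center - half_width, center, center + half_width

# Hypothetical wait times (minutes): cannot go below zero, long right tail.
wait_times = [1, 2, 1, 3, 1, 8, 2, 1, 4, 1]

lcl, center, ucl = individuals_limits(wait_times)
```

Here the computed lower limit lands well below zero, a value the process can never produce, so that side of the chart can never signal anything.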
© Copyright iSixSigma 2000-2014. User Agreement. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.