To Transform, or Not to Transform – That is The Question

In the transactional environment, we frequently run projects around reducing cycle times. More often than not, cycle time distributions are not normal, owing to the fact that there is a hard stop at 0 – negative cycle times to complete transactions are rarely seen.

Although most statistical analysis is built on the assumption of a normal distribution, hypothesis tests are generally robust enough to handle non-normal data (see Robin Barnwell’s post for more details on this).

Whenever I study a process, I always want to know two things: what is the pattern of variation over time, and what is the process’ current capability of meeting its specification. This is where data transformation can come into play, quite usefully.

It should be said that transforming data can be risky – after all, we are taking true data, with all of its idiosyncrasies, and making it “more normal” to ease our analysis. Moreover, trying to explain transformed data in business contexts poses its own challenges. (If you do need to transform, and then explain this in a business context, try the square root option, which is far easier to grasp than logarithmic numbers.)
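To make the trade-off concrete, here is a minimal sketch comparing the square root and natural log transformations. The cycle times are made-up numbers for illustration only, and the `skewness` helper is a hypothetical name, not something taken from Minitab.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical cycle times in hours (illustrative, not real process data).
cycle_times = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 9.0, 16.0]

sqrt_times = [math.sqrt(t) for t in cycle_times]
log_times = [math.log(t) for t in cycle_times]

# Right-skewed data has positive skewness; a value closer to 0 is "more normal".
print(f"skewness, raw data:    {skewness(cycle_times):.2f}")
print(f"skewness, square root: {skewness(sqrt_times):.2f}")
print(f"skewness, natural log: {skewness(log_times):.2f}")

# Explaining a result means undoing the transformation:
# a value on the square root scale is simply squared to get back to hours.
sqrt_mean = statistics.fmean(sqrt_times)
print(f"mean on sqrt scale: {sqrt_mean:.2f}, which is {sqrt_mean ** 2:.2f} hours")
```

Squaring a number to get back to hours is an easy step to walk a business audience through, whereas undoing a log requires exponentiation. Note that the back-transformed mean is not the same as the ordinary mean of the raw data, which is part of what makes transformed results harder to explain.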

Capability studies in Minitab can be run with or without transformation on non-normal data. The main nuance is the difference between the expected process performance (based on the fitted distribution) and the actual process performance (based on the observed data). Often the difference in capability results is small, and actual process performance is sufficient in many applications.
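The distinction between expected and actual performance can be sketched in a few lines. This is an illustration of the idea, not a reproduction of Minitab's calculations: the data and the upper specification limit `usl` are hypothetical, and a normal distribution fitted to square-root-transformed data is assumed.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical cycle times in hours and a hypothetical upper spec limit.
cycle_times = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 9.0, 16.0]
usl = 10.0

# Actual performance: the share of observed transactions beyond the limit.
observed = sum(t > usl for t in cycle_times) / len(cycle_times)

# Expected performance: fit a normal distribution on the transformed scale,
# then ask how much of that distribution lies beyond the limit.
# The spec limit must be transformed the same way as the data.
sqrt_times = [math.sqrt(t) for t in cycle_times]
fitted = NormalDist(fmean(sqrt_times), stdev(sqrt_times))
expected = 1.0 - fitted.cdf(math.sqrt(usl))

print(f"observed fraction above USL: {observed:.3f}")
print(f"expected fraction above USL: {expected:.3f}")
```

The observed figure depends only on the data collected, while the expected figure depends on how well the chosen distribution fits, which is why the two can differ.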

Control charts can also be run with transformation, and this is perhaps its most useful application. Consider these control charts for a transactional process:

Without transformation, a process owner or project leader would be led to investigate the four out-of-control points on this chart to discover the special cause present. Not only could valuable time and effort be wasted on such inquiries, but process adjustments made on the basis of these data points could in fact upset a properly functioning process. With the control limits calculated correctly on the transformed data, no process intervention would be required.
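The same effect can be reproduced with a small sketch of an individuals (I) chart, using the usual limits of the mean plus or minus 2.66 times the average moving range. The data below are made up and are not the data behind the charts discussed above, the natural log is an assumed choice of transformation, and only points beyond the control limits are checked (no run rules).

```python
import math
from statistics import fmean


def individuals_limits(values):
    """Return (lcl, center, ucl) for an individuals chart.

    Limits are center +/- 2.66 * average moving range, the standard
    constant for moving ranges between consecutive points.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = fmean(values)
    spread = 2.66 * fmean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(values):
    """Return the positions (0-based) of points outside the control limits."""
    lcl, _, ucl = individuals_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical right-skewed cycle times in hours, in time order.
cycle_times = [2, 3, 1, 4, 2, 5, 3, 2, 16, 3, 2, 4, 1, 3, 6, 2, 3, 4, 2, 3]
log_times = [math.log(t) for t in cycle_times]

print("flagged on raw scale:", out_of_control(cycle_times))
print("flagged on log scale:", out_of_control(log_times))
```

With these made-up numbers, the raw chart flags the 16-hour transaction as a special cause, while the chart on the log scale flags nothing. The raw chart also produces a negative lower control limit, which is meaningless for a cycle time.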

So readers, what process situations have you encountered which required decisions concerning transforming data? Please post your comments below.
