# Log data transformation when data is zero…


#35087


Hi, I have some non-normal data which transforms very well into normal data when taking its natural log (the data can’t be negative, so this fits theoretically as well).  A few of the data points are zero, for which I can’t take the log.  So, I’m considering a shift in addition to the log as follows:
ln(x + a), where a > 0 is the shift amount,
because taking the log first and then shifting, as follows, still won't work:
ln(x) + a
I've used several transformation techniques over the years, including the log and the shift, but I can only vaguely remember using a combination of two transformations once, and I can't remember which two I used. So, will applying the shift before taking the log work? I'd be inclined to think not, but I wanted some additional insight…
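The difference between the two orderings can be checked in a few lines of Python. This is a minimal sketch, not from the thread: the function names `log_shift` and `log_then_shift` and the sample values are hypothetical, and it assumes the data are nonnegative.

```python
import math

def log_shift(data, a):
    """ln(x + a): the shift is applied before the log, so a zero reading
    becomes ln(a), which is finite for any a > 0."""
    if a <= 0:
        raise ValueError("shift a must be positive")
    return [math.log(x + a) for x in data]

def log_then_shift(data, a):
    """ln(x) + a: the log is taken first, so a zero still hits log(0)."""
    return [math.log(x) + a for x in data]

data = [0.0, 0.5, 1.2, 3.0]  # hypothetical nonnegative measurements
print(log_shift(data, 1.0))  # first element is ln(1) = 0.0
try:
    log_then_shift(data, 1.0)
except ValueError as err:
    print("ln(x) + a fails:", err)  # math.log(0) raises ValueError
```

Only the shift-then-log form removes the zero problem; adding a constant after the log just moves the undefined value.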

#97628

Gabriel

I offer you two possible solutions:
a) Use the shift. It works: not only does it avoid the problem of log(0), but you may also find an "a" that makes the data fit better than they would without the shift.
b) Typically, in this type of measurement where zero is the physical limit, a true zero cannot actually occur in practice; the problem is that your instrument doesn't have enough resolution to measure the true value, which will be somewhere above zero. So one solution is to find a representative value, more realistic than zero, to use when the reading is zero. For example, say you are measuring run-out with a digital dial indicator with a resolution of 0.01 mm. You center your workpiece on a turntable, put the dial indicator on one point of the section to be measured, reset it to zero, and turn the part while watching for a variation in the reading. If all the variation stays within ±0.005 mm around the point where you set the zero, you can have a run-out of up to 0.01 mm but will still read zero. Any run-out beyond that will give a reading greater than zero, and any run-out between that value and a true zero will read zero. So if you assign equal probability to every point of the range where the run-out reads zero (from 0 to 0.01 mm), you conclude that parts reading zero have, on average, a run-out of 0.005 mm. It is therefore more correct to use that value than zero. Then you have no data points of value zero, and no problem with the log transformation. You can also use this in combination with the shift, as the shift can still give you a better fit.
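Both options, and their combination, can be sketched in Python. This is a hedged illustration under my own assumptions, not the method used in the thread: "fits better" is measured here with the normal probability-plot correlation coefficient (Blom plotting positions), the candidate shifts are a hand-picked grid, and the readings are hypothetical run-out values in mm from an indicator with 0.01 mm resolution. The function names are illustrative.

```python
import math
from statistics import NormalDist

def normal_ppcc(values):
    """Normal probability-plot correlation coefficient: correlation between
    the sorted values and normal quantiles at Blom plotting positions.
    Values closer to 1 mean the data look more normal."""
    n = len(values)
    ys = sorted(values)
    qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mq, my = sum(qs) / n, sum(ys) / n
    sqy = sum((q - mq) * (y - my) for q, y in zip(qs, ys))
    sqq = sum((q - mq) ** 2 for q in qs)
    syy = sum((y - my) ** 2 for y in ys)
    return sqy / math.sqrt(sqq * syy)


def substitute_zeros(readings, resolution):
    """Option b): replace zero readings with half the instrument resolution,
    i.e. the mean of a uniform distribution on [0, resolution]."""
    return [resolution / 2 if r == 0 else r for r in readings]


def best_shift(data, candidates):
    """Option a): among candidate shifts, return (a, ppcc) for the one whose
    ln(x + a) looks most normal. Shifts that leave any x + a <= 0 are
    skipped; returns (None, -inf) if every candidate is skipped."""
    best_a, best_r = None, -math.inf
    for a in candidates:
        if any(x + a <= 0 for x in data):
            continue  # ln(x + a) would be undefined for some point
        r = normal_ppcc([math.log(x + a) for x in data])
        if r > best_r:
            best_a, best_r = a, r
    return best_a, best_r


# Hypothetical run-out readings in mm from an indicator with 0.01 mm resolution.
readings = [0.0, 0.0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.07, 0.1]
shifts = [0.0, 0.001, 0.002, 0.005, 0.01, 0.02]  # hypothetical grid to search

print(best_shift(readings, shifts))                          # a) alone: a = 0 is skipped
print(best_shift(substitute_zeros(readings, 0.01), shifts))  # b) first, then a)
```

Including a = 0 in the grid lets the combined run report whether a shift still helps once the zeros have been replaced; on the raw readings that candidate is skipped because ln(0) is undefined.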

#97633


Thanks for the reply, Gabriel. I thought about your second solution as well, i.e. setting the value to a small number close to zero, but wasn't sure it would work. I'll try both solutions and see which works best. Thanks again!

#97636

Gabriel

“I thought about […] setting the value to a small number close to zero”
Be careful. Don't use just any small number close to zero. Bear in mind that a zero under the log maps to minus infinity, i.e. the extreme left end of the normal distribution. A number too close to zero would land too far out on the left tail and look like an outlier. You must use a number that is representative of the real value the part can have when you read zero. Typically half of the instrument's increment (its resolution) will work.
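The outlier effect can be checked numerically. A minimal sketch with hypothetical readings (0.01 mm resolution) and an arbitrary "small number" of 1e-6: it measures how many standard deviations of ln(nonzero readings) each candidate substitute sits below their mean. The function name is illustrative, not from the thread.

```python
import math
from statistics import mean, stdev

def z_against_rest(substitute, nonzero):
    """How many standard deviations of ln(nonzero readings) the value
    ln(substitute) sits from their mean (negative = left tail)."""
    logs = [math.log(x) for x in nonzero]
    return (math.log(substitute) - mean(logs)) / stdev(logs)

nonzero = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.07, 0.1]  # hypothetical readings, mm
resolution = 0.01  # hypothetical instrument increment, mm

print(z_against_rest(1e-6, nonzero))            # "any small number": far out in the left tail
print(z_against_rest(resolution / 2, nonzero))  # half the increment: much closer to the rest
```

On the log scale, 1e-6 sits more than ten standard deviations below the other readings, while half the increment stays within about three.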
“I’ll try both solutions & see which works best”
As I said, maybe both solutions together work best.

#97637


Gabriel, we're on the same page! I was afraid I'd get outliers, and I did indeed get them when I used a "small" number. The log-shift worked fine. I found some published scientific studies on the Internet where the researchers used the log-shift method, so I have backup if someone ever asks…

#103436

Ronald

I'm dealing with data that are skewed and have many zero values (more than half of them). So I am thinking of using ln(x + a). Can anyone point me to scientific literature that justifies this transformation?
Thanks!
Lee :)

#173561

martina

Hi, I just used the log-shift method because I had a number of zero responses in my data. However, I need some information or literature to back me up in order to justify it…