Log data transformation when data is zero…
Six Sigma – iSixSigma › Forums › Old Forums › General
 This topic has 6 replies, 4 voices, and was last updated 13 years, 6 months ago by martina.


March 30, 2004 at 9:15 pm #35087
Tierradentro (Participant)
Hi, I have some non-normal data that transforms very well to normal data when I take its natural log (the data can’t be negative, so this fits theoretically as well). A few of the data points are zero, and I can’t take the log of zero. So I’m considering a shift in addition to the log, as follows:
ln(x + a), where a is the shift amount
because taking the log and then shifting as follows still won’t work:
ln(x) + a
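A minimal Python sketch of the difference, with made-up readings and a hypothetical shift a = 0.01 (neither comes from the thread): shifting first keeps the zero inside the log’s domain, while logging first still evaluates ln(0), which Python’s `math.log` rejects with a `ValueError`.

```python
import math

def shift_then_log(x: float, a: float) -> float:
    """ln(x + a): shift first, then take the log. Defined at x = 0 when a > 0."""
    return math.log(x + a)

def log_then_shift(x: float, a: float) -> float:
    """ln(x) + a: log first, then shift. Still undefined at x = 0."""
    return math.log(x) + a

readings = [0.0, 0.02, 0.05, 0.30]  # hypothetical non-negative data with a zero
a = 0.01                             # hypothetical shift amount

shifted_logs = [shift_then_log(x, a) for x in readings]
print("ln(x + a):", [round(v, 3) for v in shifted_logs])

try:
    logged_then_shifted = [log_then_shift(x, a) for x in readings]
except ValueError as err:  # math.log(0.0) raises ValueError
    logged_then_shifted = None
    print("ln(x) + a fails at x = 0:", err)
```

Because x + a increases with x, ln(x + a) preserves the ordering of the data; the shift only changes how strongly the low end is stretched.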
I’ve used several transformation techniques in my life, including the log & the shift, but I can only vaguely remember using a combination of two transformations once, and I can’t remember which two I used. So, will applying the shift before taking the log work? I’d be inclined to think not, but wanted some additional insight…

March 31, 2004 at 1:44 pm #97628
Gabriel (Participant)
I offer you two possible solutions:
a) Use the shift. It works. Not only does it solve the problem of log(0), but you may also find an “a” that makes the data fit better than without the shift.
b) Typically, in this type of measurement, where zero is the physical limit, a true zero cannot exist in practice; the problem is that your instrument doesn’t have enough resolution to measure the true value, which will be somewhere above zero. So one solution is to find a representative value, more realistic than zero, to use when the reading is zero. For example, let’s say that you are measuring runout with a digital dial indicator with a resolution of 0.01 mm. You center your workpiece on a turntable, put the dial indicator on one point of the section to be measured, reset it to zero, and turn the part, looking for a variation in the reading of the dial indicator. If all the variation stays within ±0.005 mm around the point where you set the zero, you can have a runout of up to 0.01 mm and still read zero. Any runout beyond that will give a reading greater than zero, and any runout between that value and a true zero will read zero. So if you assign an equal probability to every point of the range where the runout reads zero (from 0 to 0.01 mm), you will conclude that parts reading zero have, on average, a runout of 0.005 mm. So it would be more correct to use that value than zero. Then you don’t have a data point of value zero, and you have no problem with the log transformation. You can use this in combination with the shift, as the shift can still give you a better fit.

March 31, 2004 at 2:24 pm #97633
Tierradentro (Participant)
Thanks for the reply, Gabriel. I thought about your second solution as well, i.e., setting the value to a small number close to zero, but wasn’t sure it would work. However, I’ll try both solutions & see which works best. Thanks again!
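Gabriel’s second option can be sketched as a small helper, here the hypothetical `substitute_zero_readings`, that swaps each zero reading for half the instrument resolution (0.005 mm for his 0.01 mm indicator) before the log is taken. The readings below are invented for illustration.

```python
import math

def substitute_zero_readings(readings, resolution):
    """Replace readings of exactly zero with half the instrument resolution.

    Assumes the true value behind a zero reading is equally likely anywhere
    in [0, resolution], so its expected value is resolution / 2.
    """
    half_step = resolution / 2
    return [half_step if r == 0 else r for r in readings]

# Hypothetical runout readings in mm from an indicator with 0.01 mm resolution.
runout_mm = [0.00, 0.01, 0.02, 0.00, 0.04, 0.03, 0.08]

adjusted = substitute_zero_readings(runout_mm, resolution=0.01)
log_runout = [math.log(r) for r in adjusted]  # no log(0) left to evaluate

print(adjusted)
print([round(v, 3) for v in log_runout])
```

To combine this with the shift, as Gabriel suggests, take `math.log(r + a)` of the adjusted values instead of `math.log(r)`.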

March 31, 2004 at 3:21 pm #97636
Gabriel (Participant)
“I thought about […] setting the value to a small number close to zero”
Be careful. Don’t use just any small number close to zero. Bear in mind that the log of zero is minus infinity, i.e., a zero would sit infinitely far out on the left tail of the normal distribution. A number too close to zero would land too far out on the left tail and look like an outlier. You must use a number that is representative of the real value the part can have when you read zero. Typically, half of the measurement increment will work.
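To see why an arbitrary tiny value causes trouble, the sketch below (invented readings, hypothetical helper `log_z_score`) measures how many standard deviations the log of each candidate substitute lies from the mean log of the nonzero readings. Using this z-score as an outlier yardstick is an illustrative choice, not something specified in the thread.

```python
import math
import statistics

def log_z_score(value, reference):
    """How many standard deviations ln(value) lies from the mean of ln(reference)."""
    logs = [math.log(r) for r in reference]
    return (math.log(value) - statistics.mean(logs)) / statistics.stdev(logs)

# Hypothetical nonzero readings (mm) at 0.01 mm resolution; another part read zero.
nonzero_mm = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.08]

z_tiny = log_z_score(1e-9, nonzero_mm)   # an arbitrary "small number close to zero"
z_half = log_z_score(0.005, nonzero_mm)  # half of the 0.01 mm increment

print(f"substitute 1e-9  -> z = {z_tiny:.1f}")
print(f"substitute 0.005 -> z = {z_half:.1f}")
```

With these numbers, the substitute 1e-9 lands about 25 standard deviations below the mean on the log scale, while the half-increment value 0.005 sits roughly 2.6 below it, one increment under the smallest real reading.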
“I’ll try both solutions & see which works best”
As I said, maybe both solutions together work best.

March 31, 2004 at 3:32 pm #97637
Tierradentro (Participant)
Gabriel, we’re on the same page! I was afraid I’d get outliers & did indeed get them when I used a “small” number. The logshift worked fine. I found some published scientific studies on the Internet where the researchers used the logshift method, so I have backup if someone ever asks…
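Gabriel’s point (a), that the shift a can be tuned so the transformed data fit better, could be sketched as a grid search over candidate shifts. The fit criterion used here, picking the a that makes ln(x + a) most symmetric (skewness closest to zero), is an assumption for illustration; the thread does not say how a was chosen, and a normality test or a normal probability plot would be equally reasonable yardsticks. The data and the helper `choose_shift` are hypothetical.

```python
import math

def skewness(values):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (moments about the mean)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def choose_shift(data, candidates):
    """Return (a, skew) for the candidate a whose ln(x + a) has the smallest |skewness|.

    Symmetry is only one possible fit criterion (an assumption for this sketch).
    """
    best_a = min(candidates,
                 key=lambda a: abs(skewness([math.log(x + a) for x in data])))
    return best_a, skewness([math.log(x + best_a) for x in data])

# Hypothetical right-skewed, non-negative data containing zeros.
data = [0, 0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.08, 0.15]
# 200 candidate shifts spaced geometrically from 1e-4 to 10.
candidates = [10 ** (-4 + 5 * k / 199) for k in range(200)]

best_a, best_skew = choose_shift(data, candidates)
print(f"raw skewness         = {skewness(data):.2f}")
print(f"chosen a             = {best_a:.4f}")
print(f"skewness of ln(x + a) = {best_skew:.3f}")
```

A very small a pushes the zeros far into the left tail (strong negative skew), while a very large a makes ln(x + a) nearly linear in x, so it keeps the original right skew; the chosen a sits between those extremes.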

July 14, 2004 at 7:42 pm #103436
I’m dealing with skewed data in which more than half of the values are zero. So I am thinking of using ln(x + a). Can anyone point me to scientific literature that justifies this transformation?
Thanks!
Lee :)

July 6, 2008 at 2:30 pm #173561
martina (Participant)
Hi, I just used the logshift method because I had a number of zero responses in my data. However, I need some information or literature to back me up and justify it…
Any help, please? Links, perhaps…
The forum ‘General’ is closed to new topics and replies.