Statistic for skewed data
 This topic has 9 replies, 8 voices, and was last updated 19 years, 8 months ago by Arturo Ruiz Falcó.

AuthorPosts

August 21, 2002 at 6:16 am #30152
Which statistic should I use to describe a skewed population? Some people have told me to use the mean, but others use the median.
Please advise.
ExGB

August 21, 2002 at 7:38 am #78270
James A, Participant (@JamesA)
Morning ExGB,
I’d try looking at the skewness and kurtosis statistics to describe the skew. The NIST website is a great source of stats-type information, and it’s free. The following link will take you to the area I think you need to look at for more information:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
James A

August 21, 2002 at 8:16 am #78271
Thanks James,
It seems that skewness and kurtosis are used to describe the shape of the distribution. Which parameter should I use to describe the center of the distribution: the arithmetic average, the median, or something else?
Regards
exGB

August 21, 2002 at 8:29 am #78272
James A, Participant (@JamesA)
I’d try using the arithmetic mean. If you think about it, it makes sense, as it is the same statistic you would use in SPC, which also deals with ‘normal’ albeit (on occasion) skewed data.
I’m no statistician, so others may disagree, but that’s where I’d go.
James A

August 21, 2002 at 12:09 pm #78273
If you are not discussing this with statisticians, then use both. Also plot the data in a histogram.
Remember that the customer does not feel the mean, so it is important to talk about both the mean and the median and the differences between them. Displaying the data can also help you tell whether you are dealing with special-cause or common-cause occurrences.
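As a rough illustration of the advice above (report both the mean and the median, and plot a histogram), here is a minimal Python sketch. The sample values, the `text_histogram` helper, and the bin width of 5 are hypothetical choices made for this example and are not from the thread.

```python
import statistics

# Hypothetical right-skewed sample (e.g. cycle times in minutes); not from the thread.
cycle_times = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12, 15, 21, 40]


def text_histogram(data, bin_width):
    """Count values in fixed-width bins; returns {bin_start: count}, sorted by bin."""
    counts = {}
    for x in data:
        start = int(x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


bin_width = 5
mean_value = statistics.mean(cycle_times)
median_value = statistics.median(cycle_times)
histogram = text_histogram(cycle_times, bin_width)

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value:.2f}")
# One '#' per observation; empty bins are simply not listed.
for start, count in histogram.items():
    print(f"{start:>3} to {start + bin_width - 1:<3} {'#' * count}")
```

In a sample like this one, the long right tail pulls the mean above the median, and the histogram makes the reason visible at a glance.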

August 21, 2002 at 1:29 pm #78277
Marc Richardson, Participant (@MarcRichardson)
Kurtosis is a measure of how peaked or heavy-tailed a distribution is relative to the normal, bell-shaped curve. Skewness is a measure of asymmetry, that is, how far the mean is pulled to the left or right in the distribution. A large left or right shift in the mean therefore suggests that the mean is not an accurate measure of the distribution’s central tendency. It is also a strong indication that the data are not normally distributed. In these cases, it is worth analyzing the data using nonparametric methods.
Marc Richardson
Sr. Q.A. Eng.

August 21, 2002 at 1:31 pm #78278
I suppose that depends on how skewed the data is. You can use the skewness statistic to figure out how skewed it is. Once you know that, you can decide whether the mean or median is more appropriate. Most people don’t like to depart from their usual practices, so they will tell you to use the mean all the time. But if your data is not distributed normally or uniformly, then the median is often more appropriate, particularly if you have a very few points that are skewing your data.
EX: Sample A has 1000 data points, each equal to 10. Therefore, mean = median = 10. Sample B has 1000 data points: 999 equal to 10 and 1 equal to 1,000,000. The median still equals 10, but the mean is now equal to 1009.99. So you tell me which statistic better represents the data?

August 26, 2002 at 7:31 am #78405
Here’s a simple answer to your simple question:
Use mean and median and standard deviation to describe the data.
For analysis, median and span might be the best measures. Use your process expertise to understand why the skew is happening. It might be because there are variations in the process or because more than one process is hiding in the data. Also, try segmentation to remove the skew.
All the best. Hope this helps!
Harjot

August 26, 2002 at 1:41 pm #78414
If the data is severely skewed, then the median and interquartile range may be appropriate. If you suspect that the underlying distribution is normal, then it is important to determine the cause of the skewness in your sample.
Avoid using measures of skewness and kurtosis unless the sample size is quite large (rules of thumb range from n >= 50 to n >= 100).

September 6, 2002 at 7:48 am #78689
Arturo Ruiz Falcó, Participant (@ArturoRuizFalcó)
The answers you have received highlight the mean/median dilemma. I suggest performing a data transformation to achieve normality (and therefore symmetry). You can do it using the Box-Cox transformation, which requires a lot of calculations but is available in most statistical packages (e.g. Minitab). I hope it helps.
Regards,
Arturo
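
For a sense of what the Box-Cox calculations involve, here is a minimal standard-library Python sketch. It assumes strictly positive data and picks lambda by a simple grid search over the profile log-likelihood. The sample values, the function names, and the grid from -2 to 2 are hypothetical choices for illustration; statistical packages such as Minitab use their own implementations and may report slightly different values.

```python
import math
import statistics


def box_cox(data, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam is 0."""
    if any(x <= 0 for x in data):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(x) for x in data]
    return [(x ** lam - 1.0) / lam for x in data]


def log_likelihood(data, lam):
    """Profile log-likelihood of lam, assuming the transformed data are normal."""
    transformed = box_cox(data, lam)
    variance = statistics.pvariance(transformed)  # divides by n, as in the MLE
    log_sum = sum(math.log(x) for x in data)
    return -0.5 * len(data) * math.log(variance) + (lam - 1.0) * log_sum


def best_lambda(data, lo=-2.0, hi=2.0, step=0.1):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    steps = int(round((hi - lo) / step))
    candidates = [round(lo + i * step, 10) for i in range(steps + 1)]
    return max(candidates, key=lambda lam: log_likelihood(data, lam))


def sample_skewness(data):
    """Moment-based skewness; zero for perfectly symmetric data."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)


# Hypothetical right-skewed sample; not from the thread.
cycle_times = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12, 15, 21, 40]

lam = best_lambda(cycle_times)
transformed = box_cox(cycle_times, lam)

print(f"chosen lambda        = {lam}")
print(f"skewness (raw)       = {sample_skewness(cycle_times):.2f}")
print(f"skewness (Box-Cox)   = {sample_skewness(transformed):.2f}")
```

A transformed skewness closer to zero suggests the transform helped, but it is still worth checking a histogram or normal probability plot of the transformed values before relying on normal-theory methods.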