Non-normal Data Query

This topic contains 12 replies, has 6 voices, and was last updated by Chris Seider 11 months, 2 weeks ago.


Hitesh Kathuria: I have sample data where the objective is to estimate the number of parties involved in each contract. The sample shows that variation from 1 name to 12000 occurs, and this is most likely also true of the population. How should I deal with such a scenario, given that neither the mean nor the median will be helpful?

What do YOU think you should do? Why or why not?

The iSixSigma audience is helpful, but they are not here to do your work for you.

Why do you think the mean and/or the median would not be helpful?

Which of mean or median WILL be helpful?

Hitesh Kathuria: I meant to say: how to deal with such a scenario, as the mean and median both have an influence of variation?

The median of the data is very low whereas the mean is high. The objective is to avoid the risk of over- or underestimation.

The mean and the median are two measures of central tendency. I’m not sure what you mean by saying “both have an influence of variation.”

Question: Did you plot the data (histogram, normal probability plot, etc.)? If so, what did you see? Just graphing the data and looking at the result should go a long way with respect to helping you decide which descriptive statistics should be applied.

And what is the Mode telling you? How big is your sample?

As @rbutler says, graphing the data will give you some clues, but that does depend on the size of your sample too.
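To make the graphing advice concrete, here is a minimal sketch of a first look at right-skewed count data. The values in `counts` are hypothetical and are not the poster's data; the bin edges and the `text_histogram` helper are likewise illustrative choices, not a prescribed method.

```python
import statistics

# Hypothetical party counts per contract (NOT the poster's data):
# mostly small values with a few very large ones.
counts = [1, 3, 5, 8, 10, 10, 12, 15, 18, 18, 18, 22, 30, 35, 60, 140, 532]

mean = statistics.mean(counts)
median = statistics.median(counts)
mode = statistics.mode(counts)

def text_histogram(values, edges):
    """Count values in each half-open bin [edges[i], edges[i+1])."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        n = sum(lo <= v < hi for v in values)
        rows.append((lo, hi, n))
    return rows

# Bin widths grow by a factor of 10 because the data span
# several orders of magnitude.
edges = [1, 10, 100, 1000]
hist = text_histogram(counts, edges)

print(f"mean={mean:.1f} median={median} mode={mode}")
for lo, hi, n in hist:
    print(f"{lo:>4}-{hi - 1:<4} {'#' * n}")
```

With data shaped like this, the mean sits well above both the median and the mode, which is the pattern a long right tail produces.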

Hitesh Kathuria: I plotted the data, and the summary is as follows. I hope this assists in understanding the data.

Anderson-Darling Normality Test
A-Squared: 40.07
P-Value: <0.005

Mean: 38.844
StDev: 73.609
Variance: 5418.341
Skewness: 4.4121
Kurtosis: 21.3030
N: 212

Minimum: 1.000
1st Quartile: 10.000
Median: 18.000
3rd Quartile: 34.750
Maximum: 532.000

95% Confidence Interval for Mean: 28.879 to 48.810
95% Confidence Interval for Median: 14.000 to 21.000
95% Confidence Interval for StDev: 67.206 to 81.372

In the original post you said “The sample shows that variation from 1 name to 12000”. The summary statistics you provided indicate a minimum of 1 and a maximum of 532 – what happened to 12000?

Also – could you post a jpeg of the histogram of your data?

This is a count, correct?
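As a side check on the posted summary, the 95% confidence interval for the mean can be roughly reproduced from N, the mean, and the standard deviation alone. This is a sketch under one stated assumption: it uses the normal critical value (about 1.96) rather than a t quantile, so it will not match the posted interval exactly.

```python
import math
from statistics import NormalDist

# Values copied from the posted summary.
n = 212
mean = 38.844
stdev = 73.609

# Assumption: normal critical value instead of a t quantile.
z = NormalDist().inv_cdf(0.975)

# Margin of error is the critical value times the standard error of the mean.
margin = z * stdev / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
```

The result lands within about 0.1 of the posted 28.879 to 48.810. Note that this interval and the posted interval for the median (14 to 21) do not overlap, which is another sign of how strongly the right tail pulls the mean.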

Hitesh Kathuria: Yes, this is a count.

I have removed the outlier of 12000, considering it an exception.

I can't paste the image here.
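The effect of dropping that one value can be worked out from the posted summary alone. The sketch below assumes the posted N of 212 and mean of 38.844 already exclude the 12000 contract, which is consistent with the reported maximum of 532.

```python
# Values copied from the posted summary (assumed to exclude the outlier).
n_without = 212
mean_without = 38.844
outlier = 12000

# Recover the total count of parties, then add the outlier back in.
total_without = n_without * mean_without
mean_with = (total_without + outlier) / (n_without + 1)

print(f"mean without outlier: {mean_without:.1f}")
print(f"mean with outlier:    {mean_with:.1f}")
```

A single contract moves the mean from about 39 to about 95, while the median of 213 values could shift by at most one position in the sorted data. Whether the 12000 contract is a true exception or a legitimate part of the population therefore matters a great deal for any estimate based on the mean.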

So – you have a total sample of 212 contracts and the number of people associated with each contract can vary from 1 to 532 with 50% of the contracts having 18 people or fewer.

What is the story with respect to the contracts? Are they all the same kind of contract? If they aren’t then it would be worth splitting the sample by contract type and checking the distributions of number of people by contract type.

If they are all the same kind of contract then the question you would want to address is: How come supposedly identical contract types have such a wide range of individuals associated with them?
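Splitting the sample by contract type could look something like the sketch below. The contract type names and party counts in `records` are entirely hypothetical; the thread does not say which types exist or how the data are stored.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (contract_type, party_count) records, for illustration only.
records = [
    ("lease", 2), ("lease", 4), ("lease", 3),
    ("service", 12), ("service", 18), ("service", 10),
    ("syndicated_loan", 120), ("syndicated_loan", 532), ("syndicated_loan", 85),
]

# Group the party counts by contract type.
by_type = defaultdict(list)
for contract_type, party_count in records:
    by_type[contract_type].append(party_count)

# Summarize each group separately instead of pooling everything.
summary = {
    contract_type: {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }
    for contract_type, values in by_type.items()
}

for contract_type, stats in summary.items():
    print(contract_type, stats)
```

If the groups turn out to have very different medians and ranges, as in this made-up example, a single pooled mean or median says little about any one kind of contract.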

Yes, you may want to analyze the various demographic categories to begin to solve the problem.

