Non-normal Data Query
This topic contains 12 replies, has 6 voices, and was last updated by Chris Seider 3 months, 1 week ago.
I have sample data where the objective is to estimate the number of parties involved in each contract. The sample shows variation from 1 name to 12000 names, and this spread is most likely also present in the population. How should I deal with such a scenario, given that neither the mean nor the median will be helpful?
What do YOU think you should do? Why or why not?
The iSixSigma audience is helpful, but they are not here to do your work for you.
Why do you think the mean and/or the median would not be helpful?
Which of the mean or the median WILL be helpful?
I meant to say: how do I deal with such a scenario, as the mean and median both have an influence of variation?
The median of the data is very low, whereas the mean is high. The objective is to avoid the risk of over- or underestimation.
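To see why the median sits well below the mean in data like this, here is a minimal Python sketch using made-up counts (the list `parties` and the helper `centre_summary` are hypothetical, not the poster's data or method). It shows how a few very large contracts pull the mean up while barely moving the median:

```python
from statistics import mean, median


def centre_summary(values):
    """Return (mean, median) of a list of counts."""
    return mean(values), median(values)


# Hypothetical party counts per contract (NOT the thread's data):
# mostly small contracts plus a couple of very large ones.
parties = [3, 5, 8, 10, 12, 15, 18, 20, 25, 30, 40, 150, 500]

m, med = centre_summary(parties)
print(f"all values:      mean={m:.1f}, median={med}")

# Drop the single largest contract and recompute both measures.
m2, med2 = centre_summary(sorted(parties)[:-1])
print(f"without maximum: mean={m2:.1f}, median={med2}")
```

On these made-up numbers the mean falls from 64.3 to 28.0 when the single largest value is dropped, while the median only moves from 18 to 16.5.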
The mean and the median are two measures of central tendency. I’m not sure what you mean by saying “both have an influence of variation.”
Question: Did you plot the data (histogram, normal probability plot, etc.)? If so, what did you see? Just graphing the data and looking at the result should go a long way toward helping you decide which descriptive statistics to apply.
And what is the Mode telling you? How big is your sample?
As @rbutler says, graphing the data will give you some clues, but that does depend on the size of your sample too.
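Because the counts run from 1 into the hundreds or thousands, a histogram with equal-width bins tends to squash most contracts into the first bar. A minimal Python sketch, using hypothetical counts and power-of-two bins as an assumed binning choice (the helper `log2_histogram` is illustrative only), prints the sample size, the mode(s) and a text histogram:

```python
from collections import Counter
from statistics import multimode


def log2_histogram(values):
    """Count positive integer counts into power-of-two bins: 1, 2-3, 4-7, 8-15, ..."""
    bins = Counter()
    for v in values:
        if v < 1:
            raise ValueError("party counts are expected to be >= 1")
        lower = 1 << (int(v).bit_length() - 1)  # largest power of two <= v
        bins[lower] += 1
    return dict(sorted(bins.items()))


# Hypothetical counts (NOT the thread's data).
parties = [1, 2, 2, 3, 5, 8, 10, 10, 10, 12, 18, 20, 34, 60, 532]

print(f"n = {len(parties)}, mode(s) = {multimode(parties)}")
for lower, count in log2_histogram(parties).items():
    print(f"{lower:>4}-{2 * lower - 1:<4} {'#' * count}")
```

Doubling bin widths keeps the long right tail visible without letting it dominate the picture; with a small sample, though, a few contracts can change the shape noticeably.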
I plotted the data; the summary follows. I hope this helps in understanding it.
Anderson-Darling Normality Test
A-Squared: 40.07
P-Value: < 0.005
Mean: 38.844
StDev: 73.609
Variance: 5418.341
Skewness: 4.4121
Kurtosis: 21.3030
N: 212
Minimum: 1.000
1st Quartile: 10.000
Median: 18.000
3rd Quartile: 34.750
Maximum: 532.000
95% Confidence Interval for Mean: 28.879 to 48.810
95% Confidence Interval for Median: 14.000 to 21.000
95% Confidence Interval for StDev: 67.206 to 81.372
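The 95% interval for the median in the summary (14.000 to 21.000) does not rely on normality. One common distribution-free way to build such an interval uses order statistics and the Binomial(n, 1/2) distribution. The Python sketch below shows that technique; the function names are hypothetical, and it is not necessarily how the poster's software computed its interval (some packages interpolate between order statistics, so results can differ slightly). With heavily tied count data the stated coverage is approximate.

```python
from math import comb


def median_ci_ranks(n, alpha=0.05):
    """Ranks (k, n - k + 1) of a distribution-free CI for the median.

    For continuous data, [x_(k), x_(n-k+1)] misses the population median with
    probability 2 * P(B <= k - 1), where B ~ Binomial(n, 1/2).  Pick the
    largest k that keeps this miss probability <= alpha.
    """
    total = 2 ** n
    cum = 0  # running sum of C(n, j), i.e. 2**n * P(B <= j)
    best = None
    for j in range(n // 2):
        cum += comb(n, j)
        miss = 2 * cum / total
        if miss > alpha:
            break
        best = (j + 1, n - j, 1 - miss)  # (lower rank, upper rank, coverage)
    if best is None:
        raise ValueError("sample too small for this confidence level")
    return best


def median_ci(values, alpha=0.05):
    """Return (lower, upper, coverage) for the median of `values`."""
    xs = sorted(values)
    lo_rank, hi_rank, coverage = median_ci_ranks(len(xs), alpha)
    return xs[lo_rank - 1], xs[hi_rank - 1], coverage


print(median_ci_ranks(10))  # (2, 9, 0.978515625)
```

For a sample of 10, the interval runs from the 2nd to the 9th smallest value with about 97.9% coverage; applied to the 212 counts, the same function would pick two observed counts on either side of the middle rank.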
In the original post you said the sample varied from 1 name to 12000. The summary statistics you provided indicate a minimum of 1 and a maximum of 532, so what happened to 12000?
Also, could you post a JPEG of the histogram of your data?
This is a count, correct?
Yes, this is a count.
I removed the outlier of 12000, treating it as an exception.
I can't paste the image here.
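Whether 12000 should be set aside is a judgment call. One common screening rule is Tukey's fences, Q1 - k*IQR and Q3 + k*IQR, with k = 1.5 (inner) or 3 (outer). The Python sketch below (helper name hypothetical) plugs in the quartiles from the posted summary, Q1 = 10 and Q3 = 34.75; these were computed after the 12000 value was removed, so the fences are only approximate for the full sample.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as posted (computed AFTER the 12000 value was removed).
q1, q3 = 10.0, 34.75
inner = tukey_fences(q1, q3, 1.5)
outer = tukey_fences(q1, q3, 3.0)
print("inner fences:", inner)  # (-27.125, 71.875)
print("outer fences:", outer)  # (-64.25, 109.0)

for value in (532, 12000):
    print(value, "beyond outer fence:", value > outer[1])
```

Here the outer upper fence is 109, so both 532 and 12000 lie beyond it; on data this right-skewed the rule flags large contracts rather than wrong ones, so a flag is a prompt to investigate that contract, not a reason on its own to delete it.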
So, you have a total sample of 212 contracts, and the number of people associated with each contract varies from 1 to 532, with 50% of the contracts having 18 people or fewer.
What is the story with respect to the contracts? Are they all the same kind of contract? If they aren't, then it would be worth splitting the sample by contract type and checking the distributions of number of people by contract type.
If they are all the same kind of contract then the question you would want to address is: How come supposedly identical contract types have such a wide range of individuals associated with them?
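Splitting the sample by contract type, as suggested above, amounts to grouping the counts and comparing each group's summary. A minimal Python sketch, assuming the data can be exported as (contract_type, party_count) pairs; the type labels, counts and the helper `summarize_by_type` are hypothetical:

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_type(records):
    """Group (contract_type, party_count) pairs and return
    {contract_type: (n, median, Q1, Q3)} for each type."""
    groups = defaultdict(list)
    for contract_type, count in records:
        groups[contract_type].append(count)

    summary = {}
    for contract_type, counts in sorted(groups.items()):
        if len(counts) >= 2:
            q1, _, q3 = quantiles(counts, n=4)  # default 'exclusive' method
        else:
            q1 = q3 = counts[0]  # a single contract: quartiles collapse to it
        summary[contract_type] = (len(counts), median(counts), q1, q3)
    return summary


# Hypothetical (contract_type, party_count) records, NOT the thread's data.
records = [("A", 2), ("B", 50), ("A", 4), ("A", 6), ("B", 532), ("A", 8),
           ("A", 10), ("B", 120), ("A", 12), ("B", 300), ("A", 14)]

for contract_type, (n, med, q1, q3) in summarize_by_type(records).items():
    print(f"type {contract_type}: n={n}, median={med}, Q1={q1}, Q3={q3}")
```

On these made-up numbers, type B's median is more than twenty times type A's; pooling two such groups would produce the kind of wide, right-skewed distribution seen in this thread.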
Yes, you may want to analyze the various demographic categories to begin to solve the problem.
© Copyright iSixSigma 2000-2017. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.