I have sample data, and the objective is to estimate the number of parties involved in each contract. In the sample, the number of names per contract ranges from 1 to 12000, and that spread will most likely hold in the population too. How should I deal with this scenario, given that neither the mean nor the median will be helpful?
What do YOU think you should do? Why or why not?
The iSixSigma audience is helpful, but they are not here to do your work for you.
Why do you think the mean and/or the median would not be helpful?
Which of mean or median WILL be helpful?
I meant to say: how do I deal with this scenario when the mean and the median are both influenced by the variation?
The median of the data is very low, whereas the mean is high. The objective is to avoid the risk of over- or underestimation.
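To make the gap between a low median and a high mean concrete, here is a minimal Python sketch. The counts are made up for illustration and are not the poster's data; only the extreme value of 12000 is taken from the original question. It compares the two statistics with and without that one value.

```python
from statistics import mean, median

# Made-up illustrative counts of parties per contract, NOT the poster's data.
typical = [1, 4, 10, 12, 18, 18, 25, 35, 60, 532]

# The same sample with the single extreme value mentioned in the question.
with_extreme = typical + [12000]

for label, data in [("without 12000", typical), ("with 12000", with_extreme)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data)}")
```

In this toy sample the median stays at 18 in both cases, while the mean moves from 71.5 to over 1000 once the extreme value is included. That is the usual pattern for right-skewed count data: the mean follows the tail, the median follows the bulk.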
The mean and the median are two measures of central tendency. I’m not sure what you mean by saying “both have an influence of variation.”
Question: Did you plot the data (histogram, normal probability plot, etc.)? If so, what did you see? Just graphing the data and looking at the result should go a long way toward helping you decide which descriptive statistics should be applied.
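If no plotting tool is at hand, a rough text histogram can already show the shape. The sketch below is only one possible way to do it, using the Python standard library; the helper name `decade_histogram` and the sample values are hypothetical and are not taken from the poster's data. It bins counts by order of magnitude, which suits data spanning 1 to several hundred.

```python
from collections import Counter

def decade_histogram(counts):
    """Tally positive integer counts into decade bins: 1-9, 10-99, 100-999, ..."""
    # The number of digits minus one gives the power-of-ten bin exactly.
    tally = Counter(len(str(c)) - 1 for c in counts)
    rows = []
    for k in range(max(tally) + 1):
        low, high = 10 ** k, 10 ** (k + 1) - 1
        rows.append((low, high, tally.get(k, 0)))
    return rows

# Made-up illustrative values, NOT the poster's data.
parties = [1, 2, 3, 5, 8, 10, 12, 15, 18, 18,
           20, 25, 30, 35, 40, 60, 90, 150, 300, 532]

for low, high, n in decade_histogram(parties):
    print(f"{low:>5}-{high:<5} {'#' * n} ({n})")
```

A long bar in the low bins with a thin tail in the high bins is the signature of a right-skewed distribution, in which case the median and quartiles usually describe a typical contract better than the mean does.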
And what is the Mode telling you? How big is your sample?
As @rbutler says, graphing the data will give you some clues, but that does depend on the size of your sample too.
I plotted the data, and the summary is as follows. I hope this helps in understanding it.
Anderson-Darling Normality Test
1st Quartile 10.000
3rd Quartile 34.750
95% Confidence Interval for Mean
95% Confidence Interval for Median
95% Confidence Interval for StDev
In the original post you said “The sample shows that variation from 1 name to 12000”. The summary statistics you provided indicate a minimum of 1 and a maximum of 532 – what happened to 12000?
Also – could you post a jpeg of the histogram of your data?
Yes, this is a count.
I removed the outlier of 12000, treating it as an exception.
I can't paste the image here.
So – you have a total sample of 212 contracts and the number of people associated with each contract can vary from 1 to 532, with 50% of the contracts having 18 people or fewer.
What is the story with respect to the contracts? Are they all the same kind of contract? If they aren’t then it would be worth splitting the sample by contract type and checking the distributions of number of people by contract type.
If they are all the same kind of contract then the question you would want to address is: How come supposedly identical contract types have such a wide range of individuals associated with them?
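For the case where the contracts are of different kinds, the split by contract type could be sketched as follows. This is a minimal illustration under stated assumptions: the type labels (`lease`, `supply`, `consortium`), the values, and the helper `summarize_by_type` are all invented here and do not come from the poster's data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (contract_type, number_of_parties).
# Type labels and values are invented for illustration only.
contracts = [
    ("lease", 2), ("lease", 3), ("lease", 5), ("lease", 8),
    ("supply", 10), ("supply", 18), ("supply", 25), ("supply", 40),
    ("consortium", 150), ("consortium", 300), ("consortium", 532),
]

def summarize_by_type(records):
    """Group party counts by contract type and summarize each group."""
    groups = defaultdict(list)
    for contract_type, parties in records:
        groups[contract_type].append(parties)
    return {
        t: {"n": len(v), "min": min(v), "median": median(v), "max": max(v)}
        for t, v in groups.items()
    }

for contract_type, stats in summarize_by_type(contracts).items():
    print(contract_type, stats)
```

If the per-type ranges turn out much narrower than the overall range, as they do in this toy example, a separate estimate per contract type is likely to carry less risk of over- or underestimation than a single pooled figure.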