# Non-normal Data Query

- This topic has 12 replies, 5 voices, and was last updated 3 years, 4 months ago by Chris Seider.

March 7, 2017 at 8:26 am #55646

I have sample data where the objective is to estimate the number of parties involved in each contract. The sample shows that variation from 1 name to 12000, which is most likely going to remain in the population. How should I deal with such a scenario, given that neither the mean nor the median will be helpful?

March 7, 2017 at 9:45 am #200937

Katie Barry (Keymaster) @KatieBarry

What do YOU think you should do? Why or why not?

The iSixSigma audience is helpful, but they are not here to do your work for you.

March 7, 2017 at 3:09 pm #200940

Robert Butler (Participant) @rbutler

Why do you think the mean and/or the median would not be helpful?

March 7, 2017 at 6:30 pm #200943

MBBinWI (Participant) @MBBinWI

Which of mean or median WILL be helpful?

March 8, 2017 at 1:19 am #200948

I meant to say: how to deal with such a scenario, as the mean and median both have an influence of variation?

The median of the data is very low whereas the mean is high. The objective is to avoid the risk of over- or underestimation.

March 8, 2017 at 5:40 am #200949

Robert Butler (Participant) @rbutler

The mean and the median are two measures of central tendency. I’m not sure what you mean by saying “both have an influence of variation.”

Question: Did you plot the data (histogram, normal probability plot, etc.)? If so, what did you see? Just graphing the data and looking at the result should go a long way with respect to helping you decide which descriptive statistics should be applied.
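For anyone without a plotting tool at hand, a rough text histogram is enough to see the shape. This is only a sketch: the `counts` list is made-up, right-skewed data (not the poster's sample), and the doubling bin widths are one arbitrary choice that keeps a long right tail readable.

```python
from collections import Counter

# Hypothetical right-skewed counts; NOT the data from this thread.
counts = [1, 2, 3, 5, 8, 10, 12, 15, 18, 18, 20, 25, 30, 35, 60, 120, 532]


def doubling_bin(value):
    """Return the lower edge of the power-of-two bin containing value (value >= 1)."""
    edge = 1
    while edge * 2 <= value:
        edge *= 2
    return edge


# Tally how many observations fall into each bin [edge, 2 * edge - 1].
histogram = Counter(doubling_bin(v) for v in counts)

for edge in sorted(histogram):
    print(f"{edge:>4}-{edge * 2 - 1:<4} {'#' * histogram[edge]}")
```

If most of the bars sit in the low bins with a few stragglers far to the right, the data are right-skewed and the mean will sit well above the median.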

March 8, 2017 at 5:53 am #200950

Andrew Parr (Participant) @Andy-Parr

And what is the Mode telling you? How big is your sample?

As @rbutler says, graphing the data will give you some clues, but that does depend on the size of your sample too.

March 9, 2017 at 8:01 am #200958

I plotted the data and the summary is as follows; I hope this helps in understanding the data.

| Statistic | Value |
| --- | --- |
| Anderson-Darling Normality Test: A-Squared | 40.07 |
| Anderson-Darling Normality Test: P-Value | <0.005 |
| Mean | 38.844 |
| StDev | 73.609 |
| Variance | 5418.341 |
| Skewness | 4.4121 |
| Kurtosis | 21.3030 |
| N | 212 |
| Minimum | 1.000 |
| 1st Quartile | 10.000 |
| Median | 18.000 |
| 3rd Quartile | 34.750 |
| Maximum | 532.000 |
| 95% Confidence Interval for Mean | 28.879 to 48.810 |
| 95% Confidence Interval for Median | 14.000 to 21.000 |
| 95% Confidence Interval for StDev | 67.206 to 81.372 |

March 9, 2017 at 8:40 am #200959

Robert Butler (Participant) @rbutler

In the original post you said “The sample shows that variation from 1 name to 12000”. The summary statistics you provided indicate a minimum of 1 and a maximum of 532 – what happened to 12000?

Also – could you post a jpeg of the histogram of your data?
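The reported confidence interval for the mean can be roughly reproduced from the posted N, mean, and StDev. The sketch below uses a normal critical value from the standard library as an approximation. The software that produced the summary most likely used a Student's t critical value, which is an assumption here and would make the interval slightly wider.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics as posted in the thread.
n, mean, stdev = 212, 38.844, 73.609

# Standard error of the mean.
standard_error = stdev / sqrt(n)

# Two-sided 95% critical value from the normal distribution
# (an approximation; a t critical value would be marginally larger).
z = NormalDist().inv_cdf(0.975)

ci_low = mean - z * standard_error
ci_high = mean + z * standard_error

print(f"approx. 95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
```

Note that this interval describes uncertainty in the mean, not the spread of individual contracts. Since the third quartile (34.75) is below the mean (38.844), at least three quarters of the contracts have fewer parties than the mean.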

March 9, 2017 at 9:32 am #200960

Chris Seider (Participant) @cseider

This is a count, correct?

March 15, 2017 at 7:53 am #201010

Yes, this is a count.

I removed the outlier of 12000, treating it as an exception.

I can't paste the image here.
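To see how much that one value matters, the mean with the outlier put back can be estimated from the posted summary alone. This sketch assumes exactly one contract with 12000 parties was removed, which the thread does not state explicitly.

```python
# Posted summary for the sample after the outlier was removed.
n_kept, mean_kept = 212, 38.844

# Assumption: exactly one observation, with value 12000, was dropped.
removed_value = 12000

# Recover the total from the mean, then add the outlier back in.
total_kept = n_kept * mean_kept
mean_with_outlier = (total_kept + removed_value) / (n_kept + 1)

print(f"mean without outlier: {mean_kept:.1f}")
print(f"mean with outlier:    {mean_with_outlier:.1f}")
```

Under that assumption, a single contract moves the mean from about 39 to about 95. The median would shift by at most one position in the sorted data, which is why the two measures disagree so strongly here.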

March 15, 2017 at 8:08 am #201011

Robert Butler (Participant) @rbutler

So – you have a total sample of 212 contracts and the number of people associated with each contract can vary from 1 to 532 with 50% of the contracts having 18 people or less.

What is the story with respect to the contracts? Are they all the same kind of contract? If they aren’t then it would be worth splitting the sample by contract type and checking the distributions of number of people by contract type.

If they are all the same kind of contract then the question you would want to address is: How come supposedly identical contract types have such a wide range of individuals associated with them?
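Splitting by contract type could look something like the sketch below. The contract type names and the party counts are invented for illustration, since the thread does not say what kinds of contracts are in the sample.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (contract_type, number_of_parties) records; the types and
# values are invented and are NOT taken from the poster's data.
records = [
    ("bilateral", 2), ("bilateral", 3), ("bilateral", 1), ("bilateral", 4),
    ("consortium", 18), ("consortium", 25), ("consortium", 40),
    ("class_action", 310), ("class_action", 532), ("class_action", 120),
]

# Group the party counts by contract type.
by_type = defaultdict(list)
for contract_type, parties in records:
    by_type[contract_type].append(parties)

# Summarize each group separately instead of pooling everything.
summary = {
    contract_type: {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }
    for contract_type, values in by_type.items()
}

for contract_type, stats in summary.items():
    print(contract_type, stats)
```

If the per-type ranges turn out to be narrow, the pooled skew is mostly a mixture of different contract types. If one type still spans a very wide range, that is where the "how come?" question applies.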

March 15, 2017 at 10:36 am #201014

Chris Seider (Participant) @cseider

Yes, you may want to analyze the various demographic categories to begin to solve the problem.
